JSON template: provisions for definitions in other languages #109

Open
Ntsekees opened this issue Sep 27, 2020 · 7 comments

Comments

@Ntsekees
Contributor

Ntsekees commented Sep 27, 2020

Is the current dictionary.json meant to be able to harbor definitions and keywords in languages other than English? If so, how should these be added?

I can imagine two options (taking the Spanish language as an example):

  1. Additional top-level fields spanish, spanish-gloss, spanish-short, spanish-keywords would be added.

  2. Alternatively, the current field english is removed and replaced by a new field named translations, a list of sub-dictionaries; gloss, short and keywords are moved into each sub-dictionary, which also gets a new field language and a field definition holding the text formerly stored in english.
    I.e.

    "english": "▯ is something.",
    "gloss": "something",
    "short": "",
    "keywords": [],

would turn into:

    "translations": [
        {
            "language": "english",
            "definition": "▯ is something.",
            "gloss": "something",
            "short": "",
            "keywords": []
        }
    ]

I think the second option is conceptually more elegant and tidy, although it takes up somewhat more space in the JSON than the first option.
The second option would clearly separate the fields that depend on a particular target language from the fields that are inherent to the Toaq word, irrespective of any translation of it (e.g. type, frame, distribution…).
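
To make the second option concrete, here is a minimal migration sketch in Python. It assumes the flat layout quoted above and the proposed translations layout; the helper name `to_translations`, the `language` parameter, and the placeholder values in the sample entry are illustrative only and not part of the existing dictionary tooling.

```python
import json

# Fields that depend on the target language, per the proposal above.
# (Assumption: "english" becomes "definition" inside each translation.)
LANGUAGE_DEPENDENT = ("gloss", "short", "keywords")


def to_translations(entry, language="english"):
    """Return a copy of a flat entry with its English fields nested
    under a "translations" list; all other fields are kept as-is."""
    migrated = {
        key: value
        for key, value in entry.items()
        if key != "english" and key not in LANGUAGE_DEPENDENT
    }
    translation = {"language": language, "definition": entry.get("english", "")}
    for field in LANGUAGE_DEPENDENT:
        if field in entry:
            translation[field] = entry[field]
    migrated["translations"] = [translation]
    return migrated


# Hypothetical sample entry; "toaq" and "type" hold placeholder values.
flat = {
    "toaq": "placeholder-word",
    "type": "placeholder-type",
    "english": "▯ is something.",
    "gloss": "something",
    "short": "",
    "keywords": [],
}

result = to_translations(flat)
print(json.dumps(result, ensure_ascii=False, indent=4))
```

Adding a second language would then amount to appending another sub-dictionary to the translations list, without touching the language-independent fields.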

What do you think?

@Ntsekees
Contributor Author

Arguably even notes would be moved to the translations subfield, allowing notes in different languages.

@Ntsekees
Contributor Author

#80 is a related issue.

@robintown
Member

Sounds good. I would just encourage the use of ISO 639-3 codes for the language field. And definitely put notes inside translations, as well as examples.

@robintown
Member

robintown commented Sep 27, 2020

Or perhaps just give examples its own translations array. (That would mean less duplication.)

@Ntsekees
Contributor Author

Ntsekees commented Sep 27, 2020

Or yet a third possibility is to automatically move all the examples into separate dictionary entries and leave links to them in the examples field they were originally located in. Once promoted to entries, examples would have a translations field in the same way other entries do.
Several fields don't make much sense for example sentences, but there are already several official entries which are example sentences and not mere lexemes.
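
A rough Python sketch of that promotion step follows. It rests on assumptions that the thread does not confirm: that each embedded example is an object with toaq and english fields, that links are simply the example's toaq string, and that "eng" (the ISO 639-3 code for English) is used as the language value. The function name `promote_examples` and the sample data are hypothetical.

```python
def promote_examples(entries):
    """Move embedded examples into entries of their own and replace
    each original "examples" list with links (toaq strings)."""
    known = {entry["toaq"] for entry in entries}
    rewritten, promoted = [], []
    for entry in entries:
        links = []
        for example in entry.get("examples", []):
            key = example["toaq"]
            links.append(key)
            # Only create a new entry if no entry with this text exists yet.
            if key not in known:
                known.add(key)
                promoted.append({
                    "toaq": key,
                    "translations": [
                        {"language": "eng",
                         "definition": example.get("english", "")},
                    ],
                })
        # Entries without examples end up with an empty list of links.
        rewritten.append({**entry, "examples": links})
    return rewritten + promoted


# Hypothetical data with placeholder strings instead of real Toaq text.
entries = [
    {
        "toaq": "placeholder-word",
        "examples": [
            {"toaq": "placeholder sentence", "english": "A sample sentence."},
        ],
    },
]

promoted_entries = promote_examples(entries)
print(promoted_entries)
```

The input list is left unmodified, so the result can be compared against the original before anything is written back to the file.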

I agree that ISO language codes should be used.

@robintown
Member

Would you identify examples by their toaq field? Otherwise we would need to introduce an id field, which doesn't make much sense for a dictionary where entries are unique anyway. It would not be hard to update the normalization script to check for broken links, though, so identifying examples by their toaq field would certainly be feasible.
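
For illustration, such a broken-link check could look roughly like the Python sketch below. This is not the actual normalization script; it assumes that examples holds a list of toaq strings pointing at other entries, and the name `find_broken_links` and the sample data are made up.

```python
def find_broken_links(entries):
    """Return (entry, link) pairs for every example link that does not
    match the "toaq" field of any entry in the dictionary."""
    known = {entry["toaq"] for entry in entries}
    broken = []
    for entry in entries:
        for link in entry.get("examples", []):
            if link not in known:
                broken.append((entry["toaq"], link))
    return broken


# Hypothetical data: the second link has no matching entry.
entries = [
    {"toaq": "placeholder-word",
     "examples": ["placeholder sentence", "missing sentence"]},
    {"toaq": "placeholder sentence"},
]

broken = find_broken_links(entries)
for source, link in broken:
    print(f"{source}: no entry found for example {link!r}")
```

Because the check only relies on toaq values being unique, it would not need a separate id field.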

@Ntsekees
Contributor Author

A lot of examples already have IDs however, namely official toaq.org example IDs and audio spreadsheet IDs.
