Fluent vs gettext
Gettext is a localization system deeply rooted in the GNU project and its design choices. The Fluent project regards gettext as a good example of a complete, low-level, platform-independent ecosystem of libraries and tools that covers the full release cycle workflow with a human-readable file format. At the same time, Fluent's paradigms lead us to different decisions on several core localization questions, which in turn lead to vastly different APIs and life cycles.
In other words, we believe that gettext is a very good project, but we disagree with how it approaches localization.
Below, we list the significant differences between gettext and Fluent:

| | gettext | Fluent |
|---|---|---|
| Message identifier | source string | developer provided |
| Argument bindings | positional | key-based |
| Translation invalidation | fuzzy matched | id-change |
| Data storage | human-readable (.po) and compiled (.mo) | human-readable (.ftl) |
| External arguments | none | rich support |
| Plural support | special-cased | part of generic variant-selection syntax |
| Plural support span | developer decision, spanning across translations | localizer decision, per locale |
| Designed for | C family of languages | Web, modern client-side languages |
| Message references | by developer | by localizer |
| Message templates | required (.pot) | none |
| Localizer comments | none | fully supported |
| Error recovery | fragile | resilient, strong recovery logic |
| Compound Messages | none | value + attributes per message |
| BiDi | none | bidirectional isolation |
| Intl Formatters | none | explicit and implicit |
The most important difference between gettext and Fluent is the choice of message identifier. Gettext uses the source string (often English) as the identifier. While the choice seems simple, it has long-standing consequences in the form of two limitations.
First, any change to the source string invalidates all translations of that string. This puts considerable pressure on developers never to alter messages in the source language, because doing so means every translation has to be updated.
Second, it makes it harder to introduce multiple messages that share a source string but should be translated differently. For example, a button labeled "Open" and a status label "Open" may need different translations, since one is a command and the other is a description.
Gettext offers an optional context string, `msgctxt`, to disambiguate between two or more messages with the same source string. This approach puts the burden on developers to recognize such scenarios, which goes against the separation of concerns principle.
Because of that, Fluent recommends against reusing messages. Decoupling the source-language text from the other translations is also important for our ability to introduce compound messages (which hold multiple strings for a single translation unit bound to a single UI widget) and to enable message referencing based on the message identifier.
Fluent establishes a social contract between the developer and localizers. The developer introduces a unique identifier and provides a set of variables, such as the number of unread emails or the name of the user, and localizers use Fluent syntax features to construct the best possible translation for that identifier.
The developer does not, and should not, need to be bothered with the details of how such translations are constructed. All they know is that the result of a query for the identifier will be a single, opaque string that contains the right translation to be placed in the UI.
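The "Open" collision can be sketched in a few lines of Python. This is an illustration only, not the gettext or Fluent API: the dictionaries and the identifiers `open-button` and `status-open` are hypothetical, and the Polish strings are example translations.

```python
# Hypothetical catalogs for illustration; neither is a real gettext or Fluent API.

# Keyed by source string (gettext style): both UI elements share the key "Open",
# so only one translation can be stored without an extra context string.
by_source = {
    "Open": "Otwórz",  # the command on the button
}

# Keyed by (context, source string), roughly what msgctxt provides.
# The developer has to notice the ambiguity and add the context by hand.
by_context = {
    ("button", "Open"): "Otwórz",   # command
    ("status", "Open"): "Otwarte",  # description
}

# Keyed by a developer-provided identifier (Fluent style): every use site
# has its own entry from the start.
by_id = {
    "open-button": "Otwórz",
    "status-open": "Otwarte",
}

button_label = by_id["open-button"]
status_label = by_id["status-open"]
```

With identifiers, the two labels never share an entry, so a localizer can translate them differently without the developer having anticipated the ambiguity.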
Gettext supports a limited set of internationalization features, most notably plural rules. But gettext's support for plural rules is a special-cased addition on top of the original gettext syntax, and as such it feels out of place and doesn't scale beyond plural rules. Fluent supports a generic concept of string variants that can be used in combination with a selector. Commonly, a plural rule will be such a selector, but depending on the grammatical features of a language there may be others as well, such as gender, declension, or even environmental values such as the time of day or the operating system. This lets localizers design messages with as many variants as they need.
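The idea of a generic selector can be sketched as follows. This is not Fluent's implementation: `select`, `plural_en`, `plural_pl` and the message functions are hypothetical helpers, and the plural functions are simplified integer-only versions of the CLDR categories for English and Polish.

```python
from typing import Dict


def plural_en(n: int) -> str:
    """Simplified English plural category for integers."""
    return "one" if n == 1 else "other"


def plural_pl(n: int) -> str:
    """Simplified Polish plural category for integers."""
    if n == 1:
        return "one"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "many"


def select(selector: str, variants: Dict[str, str], default: str) -> str:
    """Pick the variant matching the selector, else the default variant."""
    return variants.get(selector, variants[default])


def unread_en(count: int) -> str:
    pattern = select(
        plural_en(count),
        {"one": "{count} unread email", "other": "{count} unread emails"},
        default="other",
    )
    return pattern.format(count=count)


def unread_pl(count: int) -> str:
    # The Polish localizer uses three variants; the English one uses two.
    pattern = select(
        plural_pl(count),
        {
            "one": "{count} nieprzeczytana wiadomość",
            "few": "{count} nieprzeczytane wiadomości",
            "many": "{count} nieprzeczytanych wiadomości",
        },
        default="many",
    )
    return pattern.format(count=count)


def settings_label(platform: str) -> str:
    # The same mechanism works with a non-plural selector.
    return select(
        platform,
        {"windows": "Options", "other": "Preferences"},
        default="other",
    )
```

The same `select` helper serves both the plural case and the platform case, which is the point of treating plurals as one selector among many rather than as a special feature.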
Gettext doesn't support external arguments, which means that string formatting doesn't include any parameter formatting. When arguments are needed, gettext recommends returning a string that can then be passed to `printf`, or running `String.prototype.replace` on the result.
Fluent's support for external arguments is deeply rooted in the syntax. External arguments are not only interpolated, but can also be used to select message variants or be passed to built-in functions. That allows Fluent localizers to construct much more fine-tuned localizations. In addition, Fluent places FSI/PDI markers around placeables to preserve directionality isolation in bidirectional text, and it strongly discourages any manipulation of the resulting strings, which reduces the burden on the developer.
Furthermore, the way gettext handles plural rules requires the developer to decide whether a message will be a multi-variant message or a single string. Fluent's position is that a developer is not in the best position to make such decisions. In many cases, a message that does not require plural variants in English may require them in other languages.
More generally, Fluent assumes that developers should not be required to understand the linguistic requirements of every language their software is translated into, and that each language may want to use different features to construct the translation.
As a result, Fluent keeps each translation separate, without "leaking" the requirements of one language onto others, and keeps all translations opaque to the developer, who doesn't need to decide which features localizers may need for a given string.
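The isolation markers mentioned above are the Unicode characters U+2068 (FIRST STRONG ISOLATE) and U+2069 (POP DIRECTIONAL ISOLATE). A minimal sketch of wrapping interpolated values in them, assuming a hypothetical `format_isolated` helper and using Python's `{name}` placeholders as a stand-in for Fluent placeables:

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def format_isolated(pattern: str, **args: object) -> str:
    """Interpolate named arguments, wrapping each value in FSI/PDI.

    Hypothetical helper for illustration; not part of any Fluent library.
    """
    isolated = {name: f"{FSI}{value}{PDI}" for name, value in args.items()}
    return pattern.format(**isolated)


# A right-to-left name inside a left-to-right sentence.
message = format_isolated("Welcome back, {user}!", user="שרה")
```

The isolates keep the direction of the inserted value from affecting the surrounding text, which is why stripping or rewriting the formatted string afterwards is discouraged.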
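The call-site decision is visible in Python's standard `gettext` module, where the developer picks between `gettext` and `ngettext` in code. The sketch below uses `NullTranslations`, which returns the source strings and applies the English-style "singular if n == 1" rule; the message texts and function names are made up for illustration.

```python
import gettext

# No catalog loaded: source strings come back unchanged.
catalog = gettext.NullTranslations()


def files_selected(n: int) -> str:
    # The developer chose the plural-aware API here, so every locale
    # gets plural variants for this message.
    return catalog.ngettext("%d file selected", "%d files selected", n) % n


def files_count(n: int) -> str:
    # The developer chose the single-string API because English needs no
    # plural here. A locale that does need variants cannot add them
    # without a change to this call.
    return catalog.gettext("Selected files: %d") % n
```

In both functions the choice is fixed in application code, before any localizer has seen the message.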
Generally speaking, in a release cycle we recognize three types of message invalidation:
- Minor: doesn't affect translations (e.g. spelling, or punctuation)
- Medium: does affect how the message is constructed, but does not invalidate the content of the message (e.g. "Show All Bookmarks" -> "Show Bookmarks Manager")
- Major: changes the meaning of the message (e.g. "Click to save" -> "Click to open")
Due to its design decisions, gettext collapses all three levels into a single state called "fuzzy". Any change to the source string, no matter how minor or major, results in all translations being invalidated.
Fluent's use of unique identifiers allows at least two of those three states to remain separate. Minor changes may be applied, and as long as the developer does not alter the unique identifier (that is, does not change the social contract), all translations remain valid. On the other hand, if the developer changes the ID, all translations become invalid and must be updated.
While we believe this design decision to be better for most product release cycles, we recognize that it does not address the "medium" level of changes. It forces the developer to choose between keeping and altering the ID, which turns the change into a minor or a major one respectively.
We're investigating the idea of message versioning, which would allow the developer to mark a translation as updated without invalidating it completely. Such a state would keep the translation valid, on the assumption that the older version of the translation is better than an untranslated string, but would allow tools to notify the localizer that their translation is outdated.
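The difference for a minor change can be sketched with two hypothetical lookup functions. The dictionaries, the `save-hint` identifier and the Polish string are illustrative only. Real gettext tooling would typically keep the old translation and mark it fuzzy for review rather than drop it, but a fuzzy entry is normally not used at runtime, so the visible effect is similar.

```python
# Hypothetical catalogs for illustration.
translations_by_source = {"Click to save": "Kliknij, aby zapisać"}
translations_by_id = {"save-hint": "Kliknij, aby zapisać"}


def lookup_by_source(source: str) -> str:
    # Falls back to the source string when there is no exact match.
    return translations_by_source.get(source, source)


def lookup_by_id(message_id: str, source_fallback: str) -> str:
    # The source text is only a fallback; the key is the identifier.
    return translations_by_id.get(message_id, source_fallback)


# Minor change: the developer adds a period to the English text.
new_source = "Click to save."

# Source-keyed lookup no longer matches, so the user sees English again.
after_source_edit = lookup_by_source(new_source)

# Id-keyed lookup is unaffected as long as the identifier stays the same.
after_id_edit = lookup_by_id("save-hint", new_source)
```

Renaming `save-hint` would be the explicit signal that the meaning changed and the existing translations should no longer be used.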
Gettext uses three file formats (.po, .pot and .mo) over the life cycle of a product. This decision affects how gettext fits into a release cycle by introducing required steps such as message extraction and compilation. Fluent uses a single file format, .ftl, which makes it easier to fit into different workflow models and removes those extra steps, which may cause data misalignment.
Gettext files can be encoded as UTF-8, but that is about as far as its support for the Unicode standard goes. It uses a custom plural rules dataset, does not handle any date, time or number formatting, and does not help with bidirectional messages. Fluent leverages CLDR, ICU and ECMA-402 extensively, benefiting from well-designed and standardized databases and algorithms for many internationalization features, and blends localization and internationalization together.
We believe that both the Fluent API and the Fluent syntax represent a substantial improvement over gettext, and we recommend Fluent over gettext for all multilingual software.