Preserving word order for Arabic, Hebrew #582
Comments
If you are creating docx files, you need to set w:pPr/w:bidi and w:rPr/w:rtl appropriately, as well as w:pPr/w:lang. See:
A program exporting docx then needs to be sensitive to these attributes. For example, docx4j's PDF output via FO should handle them correctly. If these attributes are not present, the procedure recommended in your Strings on the Web reference might be a good fallback. (I wonder what Word does?) Or are you suggesting that docx4j is the consumer, and as such the methods that add text to a run at https://github.com/plutext/docx4j/blob/VERSION_11_4_12/docx4j-openxml-objects/src/main/java/org/docx4j/wml/R.java#L201 should set the appropriate attributes?
We're basically adding a paragraph of text to the main document's content, passing a regular string when creating a Text: Text t = factory.createText(); t.setValue(string); Unicode text can be bidirectional, implicitly and sometimes explicitly, and it would be convenient to have a way to create a content element from a string that automatically works out whether bidi elements are necessary. But we don't know whether the string is Arabic, Hebrew, or a mix of languages, which is why something like an Arabic script processor doesn't really make sense for us. Ideally, this interface would simply follow best practices for interpreting Unicode as a container for bidirectional text.
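To illustrate what "automatically figuring out" the directionality of a string could look like, here is a minimal sketch that uses only the JDK's java.text.Bidi class. It is not docx4j code: BidiRunSplitter and DirRun are hypothetical names made up for this example. A caller could, under that assumption, create one run per DirRun and mark the RTL ones with w:rPr/w:rtl, and set w:pPr/w:bidi when the base direction is RTL.

```java
import java.text.Bidi;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of docx4j: splits a string into
// directional runs using the JDK's implementation of the Unicode
// Bidirectional Algorithm.
class BidiRunSplitter {

    // One stretch of text that shares a single direction.
    static final class DirRun {
        final String text;
        final boolean rtl;

        DirRun(String text, boolean rtl) {
            this.text = text;
            this.rtl = rtl;
        }
    }

    // True if the paragraph's base direction resolves to right-to-left.
    // DIRECTION_DEFAULT_LEFT_TO_RIGHT means: derive the base direction
    // from the text itself, falling back to LTR if nothing is strong.
    static boolean baseIsRtl(String s) {
        Bidi bidi = new Bidi(s, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return !bidi.baseIsLeftToRight();
    }

    // Splits the string into runs in logical order; odd embedding
    // levels are right-to-left, even levels are left-to-right.
    static List<DirRun> split(String s) {
        Bidi bidi = new Bidi(s, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        List<DirRun> runs = new ArrayList<>();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            String part = s.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            boolean rtl = (bidi.getRunLevel(i) & 1) == 1;
            runs.add(new DirRun(part, rtl));
        }
        return runs;
    }
}
```

The point of this design is that no language needs to be known in advance: the direction comes from the Unicode properties of the characters, so Arabic, Hebrew, and mixed strings go through the same path.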
In Unicode text, consumers of right-to-left (RTL) text such as Arabic or Hebrew must identify the string's direction, for example by observing the strong directional property that Unicode assigns to some characters, such as Arabic letters.
That is, if for example a paragraph begins with an Arabic letter, we should align the whole paragraph right and render the glyphs right to left as we progress logically through the string.
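The "first strong character" rule described here can be sketched with nothing but the JDK. This is a simplified illustration, not docx4j code: FirstStrongDirection is a hypothetical name, and the sketch ignores the refinement in the Unicode Bidirectional Algorithm that skips characters inside directional isolates.

```java
// Hypothetical helper, not part of docx4j: decides a paragraph's
// direction from its first strongly directional character.
class FirstStrongDirection {

    // Returns true if the first strong character is right-to-left
    // (for example a Hebrew or Arabic letter), false if it is
    // left-to-right or if the text has no strong character at all.
    static boolean isRtlParagraph(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            byte dir = Character.getDirectionality(cp);
            if (dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                return true;
            }
            if (dir == Character.DIRECTIONALITY_LEFT_TO_RIGHT) {
                return false;
            }
            // Digits, spaces, and punctuation are weak or neutral: skip them.
            i += Character.charCount(cp);
        }
        return false; // no strong character found: default to LTR
    }
}
```

A paragraph for which isRtlParagraph returns true would be the one to align right and lay out right to left.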
In our testing, this does not seem to happen automatically in this library; bidi elements are not emitted.
While there's an ArabicScriptProcessor, we often don't know the specific language of a given paragraph.
Shouldn't this be more or less an automatic process, working out of the box?