-
-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
unicode double-width character support #34
base: master
Are you sure you want to change the base?
Conversation
3634ce2
to
a1dd08f
Compare
I managed to get rid of the |
Does this work with python2/python3? |
I guess so, tests are passing |
Yes. The tests originally failed with python2, but I fixed that. |
It doesn't handle e.g. zero width joiners, though stand-alone emoji work fine. |
Thanks for this already! Unfortunately codecov is acting up, at least with the latest commits, since it would be useful to revisit/see its patch status (i.e. how much / which code is covered in tests after all). |
The tests don't really seem to cover any of the curses stuff. Which is of course hard to test. I think I may have fixed one or two existing bugs. But I may also have introduced some 😅 |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Manual codecov link: https://app.codecov.io/gh/pypy/pyrepl/compare/34/diff)
def width(c): | ||
return 2 if unicodedata.east_asian_width(c) in "FW" else 1 | ||
def wlen(s): | ||
return sum(map(width, s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about using https://pypi.org/project/wcwidth/?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking in from the PyPy side, wcwidth seems like a small package with JIT-freindly structure (no complicated classes or deep function chains). It was dormant from 2016 to 2020, but seems to be active again. The license is MIT, which is fine for PyPy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that that bit of code could be improved, but what would be the advantage of using wcwidth
over python's built-in unicode support? It seems to provide mostly the same functionality except with its own unicode tables that need to be kept up to date.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wcwidth
can also return 0
or -1
, which the current code doesn't know how to handle. And I'm not sure what the best way to handle those would be tbh.
For improved performance, we could use e.g. @functools.lru_cache
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(note that pyrepl
already shows e.g. control characters as ^c
etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is possible to find the Unicode version of a terminal by somewhat intrusive detection https://github.com/jquast/ucs-detect
According to the table in the README, lxterminal and kitty support the same unicode version (11.0.0).
But as my screenshot shows, they don't display emoji the same way.
(Slightly OT: I'd consider switching to kitty, but unfortunately it doesn't support Japanese input 😿)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I briefly tested using wcwidth: obfusk/pyrepl-wide@eba676a.
- I don't notice any improvement atm.
- I had to modify the original prompt length calculation (because that subtracts the length of ansi escape sequences).
_my_unctrl()
currently turns a zero-width joiner into"\u200d"
and I'm not sure returning it unmodified won't break something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@blueyed is there a suspicion that this PR is worse in some cases than the current code? If not maybe it can be merged as-is and other alternatives could be parts of future PRs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even with wcwidth's upcoming width()
function (and setting aside the differences between terminals), handling zero-width joiners etc. properly is not easy, since width(s) == sum(width(c) for c in s)
is no longer true for all strings. This would presumably require more careful and extensive modifications to the code than the current PR.
I think wcwidth is likely to be quite useful in improving unicode support even more in the future, but for now I think this PR is a good first step. Further improvements get a lot more complicated and should probably be separate, future PRs.
I'd also point out that readline
doesn't really seem to handle emoji any better.
AFAIK this PR doesn't make anything worse. Unfortunately (but understandably since I don't really know a good way to test all the curses stuff), the test suite can't rule that out.
I do however think the PR is still a little rough and could use a code review and probably some updates to comments and documentation before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking in from the PyPy side, wcwidth seems like a small package with JIT-freindly structure (no complicated classes or deep function chains). It was dormant from 2016 to 2020, but seems to be active again. The license is MIT, which is fine for PyPy.
I don't think wcwidth supports python2 though, which I presume is an issue for PyPy. [edit: I was mistaken]
That link gives me a 403. |
Any more thoughts here? I would like to get this in the upcoming PyPy release. |
To summarise:
|
Patch mentioned in #33 (but for
master
).