
Allow transcripts of umlauts #16

Open
sebix opened this issue Jan 22, 2015 · 5 comments

Comments

sebix commented Jan 22, 2015

For all users of OpenThesaurus who don't use a German keyboard layout and don't know how to use compose keys, it would be very nice to have an automatic conversion for umlaut transcriptions:

  • ue -> ü
  • ae -> ä
  • oe -> ö
  • ss or sz -> ß

dict.cc does the same.
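The transcription rules above can be sketched as a simple substitution pass. This is only an illustration of the requested behavior, not OpenThesaurus code; the function name and the mapping order are assumptions, and a real implementation would need to handle capitalized sequences ('Ae' etc.) and words where 'ue' is not a transcribed umlaut (e.g. 'Feuer').

```python
# Illustrative sketch of the requested conversion (not OpenThesaurus code).
# Maps ASCII transcriptions to their umlaut forms; order matters, so the
# vowel digraphs are applied before the 'ss'/'sz' rules.
TRANSCRIPTIONS = [
    ("ae", "ä"),
    ("oe", "ö"),
    ("ue", "ü"),
    ("ss", "ß"),
    ("sz", "ß"),
]

def deascii(term: str) -> str:
    """Replace ASCII umlaut transcriptions with their umlaut characters."""
    for ascii_seq, umlaut in TRANSCRIPTIONS:
        term = term.replace(ascii_seq, umlaut)
    return term

print(deascii("Kostuem"))  # Kostüm
```

A naive single-pass replacement like this is lossy (it would also rewrite non-transcribed 'ss'), which is one reason a search engine would rather generate candidate variants than rewrite terms destructively.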

@janschreiber
This feature is now implemented and seems to work. However, I found one strange result today:

  • Search for 'Evaskostum' finds 'Evaskostüm' via fuzzy search
  • Search for 'Evaskostuem' finds nothing
  • Search for 'Kostuem' finds 'Kostüm'


sebix commented Aug 10, 2016

Great!

@danielnaber (Owner)

@janschreiber Thanks for the report. Unfortunately it's not easy to fix, as we apply some tricks to make the search fast: the substring search needed here (the actual item is 'im Evaskostüm') works on an in-memory table, but that table doesn't contain the normalized terms this feature would need.

janschreiber commented Aug 11, 2016

@danielnaber Thanks for your explanation. I'm not sure if the following suggestion makes any sense whatsoever, but wouldn't it be possible to apply the normalization to the search terms rather than to the searched data? I mean, couldn't a search for words that contain "umlaut-ish" character combinations such as 'ae' be transformed into a search for (Cäsar|Caesar) before it is even sent to the search algorithm?
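The query-rewriting idea suggested here can be sketched as expanding the search term into an alternation of spelling variants, leaving the indexed data untouched. This is a hypothetical illustration of the suggestion, not the project's implementation; the function name and variant set are assumptions.

```python
import re

# Sketch of query expansion: rewrite a term containing "umlaut-ish"
# sequences into a regex alternation of all spelling variants, e.g.
# 'Caesar' -> '(Caesar|Cäsar)'. Illustrative only.
PAIRS = [("ae", "ä"), ("oe", "ö"), ("ue", "ü")]

def expand_query(term: str) -> str:
    """Return a regex alternation matching the term and its umlaut variants."""
    variants = {term}
    for ascii_seq, umlaut in PAIRS:
        # Each pass adds the variant with this sequence replaced.
        variants |= {v.replace(ascii_seq, umlaut) for v in variants}
    return "(" + "|".join(sorted(variants)) + ")"

pattern = expand_query("Caesar")
print(pattern)  # (Caesar|Cäsar)
```

Because both spellings stay in the alternation, 'Caesar' still matches literal occurrences of 'Caesar' as well as 'Cäsar'.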

@danielnaber (Owner)

The normalization needs to be applied to both the query and the data, but our in-memory database currently isn't a mapping, just a list of words. We'd need to extend it to map each normalized term back to its original term. (Plus, we actually have two different kinds of normalization.)
