Add support for multiple locales #10
I have created a draft PR #12 (I still need to test properly and clean the code up a bit). I would appreciate input on the approach and on the feasibility of expanding it to even more languages. Here is an overview of the approach followed. It rests on creating 6 sets of dictionaries for each of the languages:
In addition to all the points above, I would appreciate some opinions about using JSON files as the final data source instead of py files, as dateparser does. In my main parser code, based on the language, I load the corresponding JSON and populate the 6 dictionaries. Are there any drawbacks to this method (speed? loading the JSON for each new call to parse)?
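One way to address the "loading JSON for each new call to parse" concern is to cache the loaded dictionaries per locale. The sketch below is a minimal illustration, not the actual PR code: the file layout, the `load_locale_data` name, and the dictionary contents are all assumptions (a temporary directory stands in for the real data directory so the example is self-contained).

```python
import json
import tempfile
from functools import lru_cache
from pathlib import Path

# Hypothetical layout (an assumption, not the repo's actual structure):
# one JSON file per locale, holding the per-language dictionaries.
DATA_DIR = Path(tempfile.mkdtemp())
(DATA_DIR / "es.json").write_text(
    json.dumps({"units": {"mil": 1000}}), encoding="utf-8"
)

@lru_cache(maxsize=None)
def load_locale_data(locale: str) -> dict:
    """Read the per-locale JSON once and cache it, so repeated
    parse() calls don't pay the file-loading cost each time."""
    path = DATA_DIR / f"{locale}.json"
    return json.loads(path.read_text(encoding="utf-8"))

data = load_locale_data("es")
# A second call returns the cached object instead of re-reading the file.
assert load_locale_data("es") is data
```

With this pattern, the disk read happens at most once per locale per process, so using JSON as the data source should not cost more than py modules after the first call.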
Hi @arnavkapoor! The most valuable things in this project at this moment are:
Believe me, we will probably need to rewrite all the code, but the important thing is keeping good tests and making sure that they pass. The interface will be really important too, as we need it to build the tests, but for now, it doesn't matter if it changes a lot. After releasing a new version to PyPI this will change, and from then on it shouldn't change too much. With that said, I think we shouldn't worry too much about Japanese, Chinese, German, and French. It's a really good idea to check them (as doing so, we will see potential future issues), but for now, what we should do is:
After having good coverage for those languages, we can work on adding support for other languages without breaking anything for the currently supported languages.
I have just created a new ticket to track point 2.1: #18 However, as mentioned before, don't worry too much about the other languages and focus on adding support for Hindi, Spanish, and Russian (apart from English). I'm sure we will be able to fix the other languages in the future 😄
This library should support multiple languages.

As a first approach, we could support `English` (default) and three more languages that could be `Spanish`, `Russian`, and `Hindi`, as they are broadly used and have different alphabets.

We could use an approach similar to the one used in `dateparser`. It works like this:

- `json` files coming from CLDR
- `yaml` files containing language-specific exceptions
- `py` files merging both sources

The script used in `dateparser` is this one: https://github.com/scrapinghub/dateparser/blob/master/dateparser_scripts/write_complete_data.py but it's not a good example, as there are a lot of things to be improved and some bad practices.

To allow this library to support it, we could just add a `locale` or similar argument to the `parse()` function (defaulting to English). I don't expect it to autodetect the language, at least in this first iteration.

@arnavkapoor feel free to implement it in the way you think is better. We can also achieve this in separate PRs, no need to do just one PR.