CSV File Import mangles dates - locale? #6499

Open
shakeshuck opened this issue Jul 7, 2023 · 2 comments · Fixed by #6539
Labels
bug report Bug is reported by user, not yet confirmed by the core team

Comments

@shakeshuck

What's wrong?
Importing a CSV file with UK-format dates (day/month/year) and leaving the column type set to 'auto' results in some dates being interpreted in American month/day order (those where the day is 12 or less) while others are parsed correctly. I could set the type to 'text', but then the column order is rearranged, which is also not ideal.

How can we reproduce the problem?
Import a column containing e.g. "28/03/18" and "08/03/18"
"28/03/18" becomes 2018-03-28
"08/03/18" becomes 2018-08-03

What's your environment?

  • Operating system: Linux - OpenSUSE Tumbleweed
  • Orange version: 3.35
  • How you installed Orange: via pip
@shakeshuck added the "bug report" label Jul 7, 2023
@PrimozGodec
Contributor

In #6539, I tried to address the issue, but even though I changed the implementation, the problem still persists because of how pandas parses dates. Pandas tries to guess the datetime format of a column and then parses every value with that same format. When pandas cannot recognize the format, it falls back to the dateutil implementation, and in that case each date is parsed individually, which can lead to different interpretations of dates within the same column. That is what happens here.
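To illustrate why per-value parsing produces the mixed result from the report, here is a minimal sketch using only the standard library. It is a simplified stand-in, not dateutil's or pandas' actual algorithm: the function name `guess_per_value` and the candidate format list are assumptions made for illustration, with month-first tried before day-first for each value on its own.

```python
from datetime import datetime

# Hypothetical stand-in for per-value guessing: month-first is tried
# before day-first, independently for every value in the column.
CANDIDATE_FORMATS = ("%m/%d/%y", "%d/%m/%y")


def guess_per_value(text):
    """Parse one value with the first candidate format that fits it."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format does not fit, try the next one
    raise ValueError(f"unrecognized date: {text!r}")


column = ["28/03/18", "08/03/18"]
parsed = [guess_per_value(value) for value in column]

# "28/03/18" cannot be month-first (there is no month 28), so it falls
# through to day-first; "08/03/18" fits month-first and is taken as such.
print([d.isoformat() for d in parsed])  # ['2018-03-28', '2018-08-03']
```

Because no value ever sees what format its neighbours needed, the two rows end up with different day/month orders.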

I suggest adding an option to specify datetime format (as we did in Edit Domain), but I would first wait for the File and CSV Import widgets to be joined and then implement this in one widget. What do you think, @janezd and @markotoplak?

Meanwhile, when datetimes are not parsed successfully, I suggest reading them as strings and converting them with the Edit Domain widget.
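Outside the widgets, the same workaround can be sketched in plain Python: read the column as strings, then convert every value with one explicit format. This is only an illustration of the idea, not what Edit Domain does internally; the column name `date`, the inline sample data, and the `UK_FORMAT` constant are assumptions.

```python
import csv
import io
from datetime import datetime

# Inline sample standing in for the imported file (assumed layout).
raw = io.StringIO("date\n28/03/18\n08/03/18\n")

# csv.DictReader never guesses types: every field is read as a string.
rows = list(csv.DictReader(raw))

# One explicit day-first format applied to the whole column.
UK_FORMAT = "%d/%m/%y"
dates = [datetime.strptime(row["date"], UK_FORMAT).date() for row in rows]

print([d.isoformat() for d in dates])  # ['2018-03-28', '2018-03-08']
```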

@PrimozGodec
Contributor

Reopening since it is partially solved. As already discussed in #6539, there are two possible solutions:

  • Try parsing the column with each of the formats we currently support, and fall back to the default parser if none of them works (since dateutil supports more formats than we do). We would need to test how time-consuming this is.
  • An even better solution would be to let users specify the format (a dropdown with more supported formats, and maybe an option to enter a custom format).
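The first option could look roughly like the sketch below: each format is tested against the whole column, and the first one that parses every value wins. The format list and the name `parse_column` are hypothetical and do not reflect Orange's actual list of supported formats or its API.

```python
from datetime import datetime

# Hypothetical list of supported formats, tried in order.
SUPPORTED_FORMATS = ("%Y-%m-%d", "%m/%d/%y", "%d/%m/%y")


def parse_column(values, formats=SUPPORTED_FORMATS):
    """Return (format, dates) for the first format that fits ALL values.

    Returns (None, None) when no format fits, so the caller can fall
    back to the default parser.
    """
    for fmt in formats:
        try:
            return fmt, [datetime.strptime(v, fmt).date() for v in values]
        except ValueError:
            continue  # at least one value did not match this format
    return None, None


fmt, dates = parse_column(["28/03/18", "08/03/18"])
print(fmt, [d.isoformat() for d in dates])
# %d/%m/%y ['2018-03-28', '2018-03-08']
```

Note that a column in which every day is 12 or less still matches the month-first format, so this approach cannot resolve that ambiguity on its own; a user-specified format, as in the second option, can.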


2 participants