[FIX] CSV Import - Change datetime format parsing #6539
Merged
Issue
Addresses but doesn't fix #6499
The datetime parsing in the Import CSV widget parses each unique time separately. This was done to make parsing faster when the same times repeat, but it can cause problems because values in the same column may end up being parsed differently.
Description of changes
Since Pandas has improved its datetime parsing speed, I suggest parsing all times in one call to pd.to_datetime instead of parsing unique times separately. This way, Pandas tries to guess the format of the times in a column and then parses them all with that same format.
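To illustrate the difference, here is a minimal sketch, not the widget's actual code. The helper names `parse_per_value` and `parse_column` are hypothetical and only contrast the two strategies: parsing each distinct string on its own versus handing the whole column to pd.to_datetime in one call.

```python
import pandas as pd


def parse_per_value(values):
    """Parse each distinct string separately, then map the results back.

    Hypothetical stand-in for the old strategy: every unique value gets
    its own pd.to_datetime call, so each one is interpreted independently.
    """
    lookup = {v: pd.to_datetime(v) for v in dict.fromkeys(values)}
    return pd.Series([lookup[v] for v in values])


def parse_column(values):
    """Parse the whole column in a single pd.to_datetime call.

    Hypothetical stand-in for the proposed strategy: Pandas sees all
    values at once and can apply one format to the entire column.
    """
    return pd.to_datetime(pd.Series(values))


# An unambiguous, ISO-like column: both strategies agree here.
column = ["2023-01-13 10:00", "2023-01-14 11:30", "2023-01-13 10:00"]
per_value = parse_per_value(column)
whole = parse_column(column)
print(list(per_value) == list(whole))
```

For unambiguous input like this, the two approaches should give the same values. They can differ for ambiguous formats (for example, day-first versus month-first dates), and the exact behavior there depends on the Pandas version.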
This may solve issues with formats that Pandas can recognize, but it will not solve every problem. For example, when Pandas cannot recognize the format, it falls back to the dateutil implementation. In that case, dates are still parsed separately, which can lead to different parsing of dates within the same column. This is what happens in #6499, so that issue is not solved yet.
Includes