[FIX] Fix data reading speed #2923

Merged: 4 commits into biolab:master on Apr 5, 2018

ales-erjavec
Contributor

Issue
Description of changes
  • Replace linear search in string -> index mapping for discrete columns.
  • Generally speed up parsing (~50% improvement, depending on the data).
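To illustrate the first point, here is a minimal sketch of replacing a linear search with a precomputed string -> index mapping. The function names and sample data are hypothetical and are not taken from the code in this PR; the sketch only shows the general technique.

```python
def encode_linear(column, values):
    """Look up each string with list.index: a linear scan of `values` per cell."""
    return [values.index(s) for s in column]


def encode_hashed(column, values):
    """Build the string -> index mapping once; each lookup is then a dict access."""
    index_of = {value: i for i, value in enumerate(values)}
    return [index_of[s] for s in column]


# Hypothetical discrete column and its sorted list of distinct values.
column = ["red", "green", "blue", "green", "red"]
values = sorted(set(column))
encoded = encode_hashed(column, values)
```

With n rows and k distinct values, the linear version does O(n * k) comparisons, while the dict version does O(n + k) work on average. The difference matters most when k approaches n.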
Includes
  • Code changes
  • Tests
  • Documentation

codecov-io commented Feb 26, 2018

Codecov Report

Merging #2923 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2923      +/-   ##
==========================================
- Coverage   81.86%   81.86%   -0.01%     
==========================================
  Files         329      329              
  Lines       56770    56792      +22     
==========================================
+ Hits        46474    46491      +17     
- Misses      10296    10301       +5
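
The "<.01%" decrease can be checked directly from the Hits and Lines rows of the table. The short calculation below is only a sanity check on the reported figures; it makes no assumption about how Codecov rounds its own output.

```python
# Figures copied from the Codecov table above.
hits_before, lines_before = 46474, 56770
hits_after, lines_after = 46491, 56792

coverage_before = 100 * hits_before / lines_before
coverage_after = 100 * hits_after / lines_after
delta = coverage_after - coverage_before  # in percentage points

print(f"{coverage_before:.3f}% -> {coverage_after:.3f}% ({delta:+.4f} points)")
```

Both values round to 81.86%, and the change is roughly two thousandths of a percentage point.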

lanzagar added this to the 3.11 milestone Feb 26, 2018
astaric modified the milestones: 3.11, 3.12 Mar 9, 2018
Do not mix NaN float values in array of strings.
Use single NA mask instead of always checking for NaN.
Optimize out global or builtins loads where appropriate.
... if number of distinct discrete values is ~= number of rows

Fixes gh-1297
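
The commit notes above mention a single NA mask and avoiding repeated global or builtin loads. A rough sketch of those two ideas follows. The set of NA markers and the function name are assumptions for illustration, not the code merged in this PR.

```python
import math

# Assumed set of strings treated as missing; the real reader may use a different set.
NA_STRINGS = frozenset({"", "?", ".", "NA", "~", "nan"})


def parse_continuous(cells, na_strings=NA_STRINGS, to_float=float, nan=math.nan):
    """Parse a column of strings to floats, using one precomputed NA mask.

    The default arguments bind the global and builtin names to locals,
    so the loop below does not repeat global or builtin lookups.
    """
    # Compute the mask once; the input stays a list of strings with no NaN mixed in.
    na_mask = [s in na_strings for s in cells]
    values = [nan if is_na else to_float(s) for s, is_na in zip(cells, na_mask)]
    return values, na_mask


parsed, mask = parse_continuous(["1.5", "?", "3"])
```

Keeping the mask separate means later steps can reuse it instead of testing every value for NaN again.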
lanzagar merged commit 400eea9 into biolab:master Apr 5, 2018
ales-erjavec deleted the io-speed branch April 5, 2018 13:48
Issue that may be closed by this pull request: Discrete attributes with a large number of values