[FIX] Venn Diagram is slow for big datasets #4400

AndrejaKovacic · 2020-02-06T11:01:46Z

Issue

Fixes #3989

Description of changes

Refactored venn over rows.

TODO:

Output duplicates

Includes

Code changes
Tests

codecov · 2020-02-06T21:10:40Z

Codecov Report

Merging #4400 into master will increase coverage by 0.24%.
The diff coverage is 95.96%.

@@            Coverage Diff             @@
##           master    #4400      +/-   ##
==========================================
+ Coverage   87.19%   87.43%   +0.24%     
==========================================
  Files         401      405       +4     
  Lines       72938    73990    +1052     
==========================================
+ Hits        63595    64692    +1097     
+ Misses       9343     9298      -45

janezd

This is quite a piece of work. Nice tricks in the code, too.

I started commenting, but then found it easier to make the changes myself (they are just minor, stylistic ones) and comment on them. Please review my changes and reject any that are wrong.

I spent some time trying to follow and understand the code, but I'll have to invest some more time -- or let you explain it to me on Friday.

Orange/widgets/visualize/owvenndiagram.py

janezd · 2020-02-18T14:02:12Z

Run the widget (from main). Select a few regions. In "Rows (instances)", change the combo to some attribute. It crashes. If it doesn't, switch to another attribute. It crashes then.

The error occurs at a[mask] = values: ValueError: shape mismatch: value array of shape (27,) could not be broadcast to indexing result of shape (27,1).

AndrejaKovacic · 2020-02-18T15:07:55Z

I think reshaping should do the trick. I also removed the relevant_keys parameter, since the indices already contain that information in create_from_rows.
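
A minimal sketch of the shape mismatch reported above and of a reshaping fix. It assumes the target is a one-column 2-D array; the names a, mask and values follow the error message, but the data is made up for illustration and this is not the widget's actual code.

```python
import numpy as np

# Hypothetical stand-ins: a single padded column stored as a 2-D array,
# a boolean row mask, and the 1-D values to write into the masked rows.
a = np.full((5, 1), np.nan)
mask = np.array([True, False, True, True, False])
values = np.array([1.0, 2.0, 3.0])

try:
    # a[mask] has shape (3, 1), but values has shape (3,),
    # so NumPy cannot broadcast the assignment.
    a[mask] = values
    failed = False
except ValueError:
    failed = True

# Reshaping the values into a column makes the shapes agree.
a[mask] = values.reshape(-1, 1)
```

Indexing the column explicitly (a[mask, 0] = values) would be an alternative that avoids the reshape.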

janezd · 2020-02-19T19:00:46Z

Run the widget from main. Use rowwise, by instance. Click Output duplicates and select some area. The widget crashes with:

  File "/Users/janez/orange3/Orange/widgets/visualize/owvenndiagram.py", line 663, in commit
    selected = self.create_from_rows(selected_ids, False)
  File "/Users/janez/orange3/Orange/widgets/visualize/owvenndiagram.py", line 598, in create_from_rows
    return self.extract_rowwise_duplicates(var_dict, relevant_ids)
  File "/Users/janez/orange3/Orange/widgets/visualize/owvenndiagram.py", line 634, in extract_rowwise_duplicates
    x, m, y, t_ids = self.expand_table(extracted, all_atrs, all_metas, all_cv)
  File "/Users/janez/orange3/Orange/widgets/visualize/owvenndiagram.py", line 616, in expand_table
    array[:, perm] = b
ValueError: could not convert string to float: 'YGL048C'
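
The last frame of the traceback above can be reproduced in isolation. The sketch below assumes the cause is string values being written into a float array; the array names and the second identifier are hypothetical, and the object-dtype workaround is one possible remedy, not necessarily the one used in this PR.

```python
import numpy as np

# Hypothetical data: a float block and a column of string identifiers.
float_block = np.zeros((2, 3))
ids = np.array(["YGL048C", "GENE_B"])  # "GENE_B" is a made-up placeholder

try:
    # NumPy tries to parse each string as a float and fails.
    float_block[:, 0] = ids
    failed = False
except ValueError:
    failed = True

# An object-dtype array can hold strings next to numbers.
object_block = np.zeros((2, 3), dtype=object)
object_block[:, 0] = ids
```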

janezd · 2020-02-19T19:02:28Z

Run the widget from main. Use rowwise. Match by "Test". Check Output duplicates and select some (non-empty) area. The widget crashes with:

  File "/Users/janez/orange3/Orange/widgets/visualize/owvenndiagram.py", line 663, in commit
    selected = self.create_from_rows(selected_ids, False)
  File "/Users/janez/orange3/Orange/widgets/visualize/owvenndiagram.py", line 598, in create_from_rows
    return self.extract_rowwise_duplicates(var_dict, relevant_ids)
  File "/Users/janez/orange3/Orange/widgets/visualize/owvenndiagram.py", line 640, in extract_rowwise_duplicates
    values = {'attributes': [np.vstack(all_x)],
  File "/Users/janez/miniconda3/envs/o3/lib/python3.7/site-packages/numpy/core/shape_base.py", line 283, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: need at least one array to concatenate
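
The final error in the traceback above is what np.vstack raises when it receives an empty sequence. A minimal sketch of a guard follows; stack_rows and its n_columns parameter are hypothetical helpers for illustration and are not part of the widget.

```python
import numpy as np

def stack_rows(blocks, n_columns):
    """Stack 2-D blocks vertically; return an empty (0, n_columns)
    array when there is nothing to stack."""
    if not blocks:
        return np.empty((0, n_columns))
    return np.vstack(blocks)

try:
    np.vstack([])  # reproduces the error from the traceback
    raised = False
except ValueError:
    raised = True

empty = stack_rows([], 4)
stacked = stack_rows([np.ones((1, 2)), np.zeros((2, 2))], 2)
```

A guard like this only hides the symptom. Whether an empty list is legitimate at that point depends on the calling code.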

AndrejaKovacic · 2020-02-19T21:40:02Z

This crash was caused by Test being a StringVariable only by name, but not by nature :)

janezd · 2020-02-20T10:12:47Z

I see. Its heart wasn't in it. :)

AndrejaKovacic changed the title ~~[FIX] Rewrite venn over rows~~ [FIX] Venn Diagram is slow for big datasets Feb 6, 2020

AndrejaKovacic force-pushed the venn branch from 86bd845 to 7524b01 Compare February 6, 2020 12:52

Rewrite venn over rows

8784364

AndrejaKovacic force-pushed the venn branch from 7524b01 to 4c79696 Compare February 6, 2020 14:00

AndrejaKovacic changed the title ~~[FIX] Venn Diagram is slow for big datasets~~ [WIP][FIX] Venn Diagram is slow for big datasets Feb 6, 2020

Add output_duplicates function

97a4856

AndrejaKovacic force-pushed the venn branch 3 times, most recently from 55ae682 to 259ab93 Compare February 11, 2020 10:36

Add test_output_duplicates, fix ids copying

6262599

AndrejaKovacic force-pushed the venn branch from 259ab93 to 6262599 Compare February 11, 2020 11:16

Fix column selection

be2bac3

AndrejaKovacic changed the title ~~[WIP][FIX] Venn Diagram is slow for big datasets~~ [FIX] Venn Diagram is slow for big datasets Feb 11, 2020

Fix missing string attribute

52a70a4

AndrejaKovacic force-pushed the venn branch from 237b902 to 52a70a4 Compare February 11, 2020 14:46

janezd self-assigned this Feb 14, 2020

OWVennDiagram: Micro refactoring

0c8c0d7

janezd force-pushed the venn branch from ddf2e27 to 0c8c0d7 Compare February 17, 2020 21:16

janezd reviewed Feb 17, 2020

Fix padded column shape mismatch, remove unnecessary parameters

e8b2416

Fix output duplicates dimension mismatch

9e1a7db

OWVennDiagram: Lint

eb00fd2

janezd merged commit 9541a29 into biolab:master Feb 20, 2020
