Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Venn Diagram is slow for big datasets #4400

Merged
merged 9 commits into from
Feb 20, 2020
Merged

Conversation

AndrejaKovacic
Copy link
Contributor

@AndrejaKovacic AndrejaKovacic commented Feb 6, 2020

Issue

Fixes #3989

Description of changes

Refactored venn over rows.

TODO:

  • Output duplicates
Includes
  • Code changes
  • Tests

@AndrejaKovacic AndrejaKovacic changed the title [FIX] Rewrite venn over rows [FIX] Venn Diagram is slow for big datasets Feb 6, 2020
@AndrejaKovacic AndrejaKovacic changed the title [FIX] Venn Diagram is slow for big datasets [WIP][FIX] Venn Diagram is slow for big datasets Feb 6, 2020
@codecov
Copy link

codecov bot commented Feb 6, 2020

Codecov Report

Merging #4400 into master will increase coverage by 0.24%.
The diff coverage is 95.96%.

@@            Coverage Diff             @@
##           master    #4400      +/-   ##
==========================================
+ Coverage   87.19%   87.43%   +0.24%     
==========================================
  Files         401      405       +4     
  Lines       72938    73990    +1052     
==========================================
+ Hits        63595    64692    +1097     
+ Misses       9343     9298      -45

@AndrejaKovacic AndrejaKovacic changed the title [WIP][FIX] Venn Diagram is slow for big datasets [FIX] Venn Diagram is slow for big datasets Feb 11, 2020
Copy link
Contributor

@janezd janezd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a work. Nice tricks in the code, too.

I started commenting, but then found it easier to do the changes myself (they are just, minor, stylistic) and comment on them. Please review my changes and reject them if any of them are wrong.

I spent some time on trying to follow and understand the code, but I'll have to invest some more time -- or let you explain it to me on Friday.

Orange/widgets/visualize/owvenndiagram.py Outdated Show resolved Hide resolved
Orange/widgets/visualize/owvenndiagram.py Show resolved Hide resolved
Orange/widgets/visualize/owvenndiagram.py Show resolved Hide resolved
Orange/widgets/visualize/owvenndiagram.py Show resolved Hide resolved
Orange/widgets/visualize/owvenndiagram.py Show resolved Hide resolved
Orange/widgets/visualize/owvenndiagram.py Show resolved Hide resolved
Orange/widgets/visualize/owvenndiagram.py Show resolved Hide resolved
Orange/widgets/visualize/owvenndiagram.py Show resolved Hide resolved
Orange/widgets/visualize/owvenndiagram.py Show resolved Hide resolved
Orange/widgets/visualize/owvenndiagram.py Show resolved Hide resolved
@janezd
Copy link
Contributor

janezd commented Feb 18, 2020

Run the widget (from main). Select a few regions. In "Rows (instances)" change the combo to some attribute. It crashes. If it doesn't switch to the other attribute. It crashes then.

The error occurs a[mask] = values: ValueError: shape mismatch: value array of shape (27,) could not be broadcast to indexing result of shape (27,1).

@AndrejaKovacic
Copy link
Contributor Author

I think reshaping should do the trick. I also removed relevant_keys parameter, since indices already contain that information in create_from_rows.

@janezd
Copy link
Contributor

janezd commented Feb 19, 2020

Run widget from main. Use rowwise, by instance. Click Output duplicates and select some area. Widget crashes with

  File "/Users/janez/orange3/Orange/widgets/visualize/owvenndiagram.py", line 663, in commit
    selected = self.create_from_rows(selected_ids, False)
  File "/Users/janez/orange3/Orange/widgets/visualize/owvenndiagram.py", line 598, in create_from_rows
    return self.extract_rowwise_duplicates(var_dict, relevant_ids)
  File "/Users/janez/orange3/Orange/widgets/visualize/owvenndiagram.py", line 634, in extract_rowwise_duplicates
    x, m, y, t_ids = self.expand_table(extracted, all_atrs, all_metas, all_cv)
  File "/Users/janez/orange3/Orange/widgets/visualize/owvenndiagram.py", line 616, in expand_table
    array[:, perm] = b
ValueError: could not convert string to float: 'YGL048C'

@janezd
Copy link
Contributor

janezd commented Feb 19, 2020

Run widget from main. Use rowwise. Match by "Test". Check Output duplicates and select some (non-empry) area. Widget crashes with

  File "/Users/janez/orange3/Orange/widgets/visualize/owvenndiagram.py", line 663, in commit
    selected = self.create_from_rows(selected_ids, False)
  File "/Users/janez/orange3/Orange/widgets/visualize/owvenndiagram.py", line 598, in create_from_rows
    return self.extract_rowwise_duplicates(var_dict, relevant_ids)
  File "/Users/janez/orange3/Orange/widgets/visualize/owvenndiagram.py", line 640, in extract_rowwise_duplicates
    values = {'attributes': [np.vstack(all_x)],
  File "/Users/janez/miniconda3/envs/o3/lib/python3.7/site-packages/numpy/core/shape_base.py", line 283, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: need at least one array to concatenate

@AndrejaKovacic
Copy link
Contributor Author

Run widget from main. Use rowwise. Match by "Test". Check Output duplicates and select some (non-empry) area. Widget crashes with

  File "/Users/janez/orange3/Orange/widgets/visualize/owvenndiagram.py", line 663, in commit
    selected = self.create_from_rows(selected_ids, False)
  File "/Users/janez/orange3/Orange/widgets/visualize/owvenndiagram.py", line 598, in create_from_rows
    return self.extract_rowwise_duplicates(var_dict, relevant_ids)
  File "/Users/janez/orange3/Orange/widgets/visualize/owvenndiagram.py", line 640, in extract_rowwise_duplicates
    values = {'attributes': [np.vstack(all_x)],
  File "/Users/janez/miniconda3/envs/o3/lib/python3.7/site-packages/numpy/core/shape_base.py", line 283, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: need at least one array to concatenate

This crash was caused by Test being a StringVariable only by name, but not by nature :)

@janezd
Copy link
Contributor

janezd commented Feb 20, 2020

This crash was caused by Test being a StringVariable only by name, but not by nature :)

I see. It's heart wasn't in it. :)

@janezd janezd merged commit 9541a29 into biolab:master Feb 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Venn Diagram is very slow with some data sets
2 participants