[ENH] Distances: Optimize PearsonR/SpearmanR #2852
0e915a6 to 1d5c34f
Codecov Report
@@            Coverage Diff             @@
##           master    #2852      +/-   ##
==========================================
+ Coverage   81.91%   81.92%   +0.01%
==========================================
  Files         326      326
  Lines       55997    56031      +34
==========================================
+ Hits        45868    45903      +35
+ Misses      10129    10128       -1
* Use numpy.corrcoef in PearsonR
* Optimize PearsonR/SpearmanR when computing pairwise distances on a single input table
... for the case where computing distances from two tables.
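The idea behind these commits can be sketched in plain NumPy. This is only an illustration, not the code from the PR: the helper name `pearson_distance` is hypothetical, and the distance is assumed to be defined as (1 - r) / 2. With a single table, one `numpy.corrcoef` call yields the full pairwise matrix; with two tables, the rows are stacked and only the cross block is kept.

```python
import numpy as np


def pearson_distance(x1, x2=None):
    """Row-wise Pearson distance, assumed here to be (1 - r) / 2.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    x1 = np.asarray(x1, dtype=float)
    if x2 is None:
        # Single table: one call gives the full n1 x n1 correlation matrix.
        rho = np.corrcoef(x1)
    else:
        x2 = np.asarray(x2, dtype=float)
        n1 = len(x1)
        # Two tables: corrcoef stacks the rows of x1 and x2, so the
        # x1-vs-x2 correlations sit in the upper-right block.
        rho = np.corrcoef(x1, x2)[:n1, n1:]
    return (1.0 - rho) / 2.0
```

Note that the two-table path still computes the full stacked matrix before slicing, so it does more work than strictly needed.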
1d5c34f to f837920
This looks ok. Do you have any measurements of how much we gain with this re-implementation of what we previously handed over to numpy and scipy?
    rho = rho[:2, :2].copy()
else:
    # scalar if n1 == 1
    rho = stats.spearmanr(x1, axis=self.axis)[0]
Are these two cases (if, else) necessary? At first glance stats.spearmanr seems to (efficiently) handle the case of a missing second attribute.
Never mind.
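For readers following the thread, here is a minimal sketch of why the one-table and two-table cases differ in shape. It assumes rows are the items being compared and that Spearman correlation is computed as Pearson correlation on ranks; `spearman_rows` is a hypothetical helper, not the code under review.

```python
import numpy as np
from scipy import stats


def spearman_rows(x1, x2=None):
    """Spearman correlation between rows, always returned as a 2-D array.

    Illustrative helper only; ranks each row, then applies numpy.corrcoef.
    """
    r1 = stats.rankdata(x1, axis=1)
    if x2 is None:
        # corrcoef collapses to a scalar for a single row, so force 2-D.
        return np.atleast_2d(np.corrcoef(r1))
    r2 = stats.rankdata(x2, axis=1)
    n1 = len(r1)
    # Stack both rank tables and keep only the x1-vs-x2 block.
    return np.corrcoef(r1, r2)[:n1, n1:]
```

`scipy.stats.spearmanr` itself returns a scalar rather than a matrix when only two variables are involved, which is the kind of special case the branch in the diff accounts for.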
In [1]: import Orange, numpy
In [2]: d = Orange.data.Table(numpy.random.random(size=(200, 200)))
In [3]: %timeit Orange.distance.PearsonR(d)
Before
After
In [1]: import Orange, numpy
In [2]: d = Orange.data.Table(numpy.random.random(size=(400, 200)))
In [3]: %timeit Orange.distance.SpearmanR(d[1:], d[:-1])
Before
After
Issue
Description of changes
numpy.corrcoef for PearsonR
Includes