[FIX] Fix Chi2 computation for variables with values with no instances #2031
Codecov Report
@@ Coverage Diff @@
## master #2031 +/- ##
==========================================
- Coverage 70.7% 69.69% -1.01%
==========================================
Files 343 343
Lines 54469 54478 +9
==========================================
- Hits 38510 37967 -543
- Misses 15959 16511 +552
Orange/widgets/visualize/owsieve.py
@@ -42,6 +42,8 @@ def __init__(self, data, attr1, attr2):
        self.expected = np.outer(self.probs_y, self.probs_x) * self.n
        self.residuals = \
            (self.observed - self.expected) / np.sqrt(self.expected)
        where_are_NaNs = np.isnan(self.residuals)
        self.residuals[where_are_NaNs] = 0
        gh-2031
        Check if it can calculate chi square when there are no attributes which suppose to be.
        """
        tempdir = tempfile.mkdtemp()
Instead of creating files in code, you could either commit the file along with the test or, even better, create the table directly with something along the lines of:
a = Orange.data.DiscreteVariable("a", values=["y", "n"])
b = Orange.data.DiscreteVariable("b", values=["y", "n", "other"])
t = Orange.data.Table(Orange.data.Domain([a, b]), list(zip("yynny", "ynyyn")))
…which suppose to be
The chi-squared statistic is NaN when a variable has values that do not occur in the data. This is caused by a division by zero, because the code does not handle the limiting case; the residual should actually be 0. The fix checks for NaN in the array and changes those values to 0.
- [X] Code changes
- [X] Tests
- [ ] Documentation
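The failure mode described here can be reproduced with NumPy alone. The following is a minimal sketch, not the code from `owsieve.py`: the function name `pearson_residuals` is hypothetical, and the contingency table is assumed to correspond to the rows `zip("yynny", "ynyyn")` from the review comment, where the value "other" of `b` has no instances.

```python
import numpy as np


def pearson_residuals(observed):
    """Return (O - E) / sqrt(E), using 0 where the expected count is 0.

    Hypothetical helper for illustration; it mirrors the idea of the fix,
    not the actual class in owsieve.py.
    """
    observed = np.asarray(observed, dtype=float)
    n = observed.sum()
    probs_x = observed.sum(axis=0) / n  # column marginals
    probs_y = observed.sum(axis=1) / n  # row marginals
    expected = np.outer(probs_y, probs_x) * n
    # A value with no instances has observed == 0 and expected == 0,
    # so the plain division evaluates 0 / 0 and yields NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        residuals = (observed - expected) / np.sqrt(expected)
    # Same idea as the patch: replace NaN entries with 0.
    residuals[np.isnan(residuals)] = 0
    return residuals


# Rows: values of a ("y", "n"); columns: values of b ("y", "n", "other").
# The "other" column is empty because that value never occurs.
observed = np.array([[1, 2, 0],
                     [2, 0, 0]])
residuals = pearson_residuals(observed)
chi2 = float(np.sum(residuals ** 2))
```

Without the masking step, the empty column would make the sum of squared residuals NaN. With it, the unused value simply contributes nothing to the statistic.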