statistics.utils: Fix stats for sparse when last column missing
`np.bincount` returns an array whose length is one more than the largest value in its input, so it can be shorter than the number of columns (e.g. when the last columns contain only zeros). Passing `minlength=X.shape[1]` forces it to count nonzero elements for every column.
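
The length mismatch described above can be reproduced with NumPy alone. This is a minimal sketch, not code from the commit: `col_idx` and `n_cols` are hypothetical stand-ins for `X.nonzero()[1]` and `X.shape[1]` of a 3x5 matrix whose last two columns are all zeros.

```python
import numpy as np

# Hypothetical column indices of the nonzero entries (stand-in for
# X.nonzero()[1]); only columns 1 and 2 hold nonzero values.
col_idx = np.array([1, 2, 1, 2, 1, 2])
n_cols = 5  # stand-in for X.shape[1]

# Without minlength the result stops at the largest index seen.
short = np.bincount(col_idx)

# With minlength the result is padded with zeros up to n_cols entries.
full = np.bincount(col_idx, minlength=n_cols)

print(short)  # [0 3 3]
print(full)   # [0 3 3 0 0]
```

Without the padding, stacking the counts next to per-column statistics of length `n_cols` would presumably fail or misalign, which is the case the fix addresses.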
nikicc committed Jul 15, 2016
1 parent b77c021 commit ff2a5b4
Showing 2 changed files with 9 additions and 1 deletion.
2 changes: 1 addition & 1 deletion Orange/statistics/util.py
@@ -178,7 +178,7 @@ def stats(X, weights=None, compute_variance=False):
if compute_variance:
raise NotImplementedError

- non_zero = np.bincount(X.nonzero()[1])
+ non_zero = np.bincount(X.nonzero()[1], minlength=X.shape[1])
X = X.tocsc()
return np.column_stack((
X.min(axis=0).toarray().ravel(),
8 changes: 8 additions & 0 deletions Orange/tests/test_statistics.py
@@ -41,3 +41,11 @@ def test_stats_sparse(self):
[0, 1, .2, 0, 4, 1],
[0, 1, .2, 0, 4, 1],
[0, 1, .2, 0, 4, 1]])

+ # assure last two columns have just zero elements
+ X = X[:3]
+ np.testing.assert_equal(stats(X), [[0, 1, 1/3, 0, 4, 1],
+                                    [0, 1, 1/3, 0, 4, 1],
+                                    [0, 1, 1/3, 0, 4, 1],
+                                    [0, 0, 0, 0, 5, 0],
+                                    [0, 0, 0, 0, 5, 0]])
