Continuize: multiple attributes with the same name #6877

Closed
ZanMervic opened this issue Aug 21, 2024 · 1 comment · Fixed by #6878

What's wrong?

This issue is related to the Discretization issue #6876.
If the input to the Continuize widget contains a categorical attribute whose list of values includes duplicates, i.e. several values with the same name (see issue #6876 for a fuller explanation), one-hot encoding creates multiple attributes with the same name, which raises an exception.
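
To illustrate the collision described above, here is a minimal sketch in plain Python. It assumes indicator columns are named in a `variable=value` style; the helper functions and the example variable `PC1` with its value labels are hypothetical and are not taken from Orange's code.

```python
from collections import Counter


def one_hot_names(var_name, values):
    """Build one indicator-column name per value (illustrative naming scheme)."""
    return [f"{var_name}={value}" for value in values]


def duplicated(names):
    """Return the names that occur more than once, sorted."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


# Hypothetical discretized variable whose value list repeats one label.
names = one_hot_names("PC1", ["< 0.5", "< 0.5", ">= 0.5"])
dupes = duplicated(names)

if dupes:
    # A domain that requires unique variable names would reject these columns.
    print("duplicate column names:", dupes)
```

Any consumer that requires unique column names would fail on such a list, which matches the exception in the traceback below.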

Workflow I used (an extension of the workflow from issue #6876):
image

Exception:

Traceback (most recent call last):
  File "C:\Users\zanme\work\orange3\orange3\Orange\widgets\data\owcontinuize.py", line 458, in _on_radio_clicked
    self.commit.deferred()
  File "C:\Users\zanme\miniconda3\envs\orange3\Lib\site-packages\orangewidget\gui.py", line 2006, in conditional_commit
    do_commit()
  File "C:\Users\zanme\miniconda3\envs\orange3\Lib\site-packages\orangewidget\gui.py", line 2014, in do_commit
    commit.call()
  File "C:\Users\zanme\miniconda3\envs\orange3\Lib\site-packages\orangewidget\gui.py", line 1879, in call
    acting_func(instance)
  File "C:\Users\zanme\work\orange3\orange3\Orange\widgets\data\owcontinuize.py", line 517, in commit
    self.Outputs.data.send(self._prepare_output())
                           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\zanme\work\orange3\orange3\Orange\widgets\data\owcontinuize.py", line 534, in _prepare_output
    return self.data.transform(Domain(attrs, class_vars, metas))
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\zanme\work\orange3\orange3\Orange\data\domain.py", line 154, in __init__
    raise Exception('All variables in the domain should have'
Exception: All variables in the domain should have unique names.

Screenshot of the raised exception and the two attributes with the same name:

image

Note

Because of this issue, a test was failing for the ScoringSheet widget. I have temporarily excluded the widget from the test, but it should be included again when the issue is resolved.

Test: Orange.tests.test_classification.LearnerAccessibility.test_all_models_work_after_unpickling_pca

How can we reproduce the problem?

Zip of the workflow: continuize_bug.zip

To reproduce the problem, set the PCA components to 8 in the provided workflow.
image

What's your environment?

  • Operating system: Windows 10
  • Orange version: 3.38
  • How you installed Orange: Using pip in a conda environment

ZanMervic added the "bug report" label (Bug is reported by user, not yet confirmed by the core team) on Aug 21, 2024

janezd commented Aug 21, 2024

Code may assume that values of categorical variables are unique. The bug is thus in discretization. Adding np.unique, as I suggested in a comment in #6876, resolves it.
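
As a rough illustration of the `np.unique` suggestion, the sketch below applies it to an array of cut points in which one threshold repeats. The array and its values are made up for this example; where exactly the call belongs in the discretization code is discussed in #6876 and is not shown here.

```python
import numpy as np

# Hypothetical cut points; 0.5 appears twice.
points = np.array([1.2, 0.5, 0.5, 3.0])

# np.unique returns the sorted distinct elements, so the repeated
# threshold collapses into a single cut point.
unique_points = np.unique(points)

print(unique_points.tolist())
```

With distinct cut points, each interval is defined only once, so values derived from them no longer repeat.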

I nevertheless made #6878 to prevent construction of variables with duplicated values, so any future bugs that result in duplicated values will be reported earlier, at the appropriate place.
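
The idea of failing early can be sketched as a check at construction time. The class below is a hypothetical stand-in and not the code from #6878; the class name, the error type, and the message are assumptions made for illustration.

```python
class CategoricalSketch:
    """Hypothetical stand-in for a categorical variable (not Orange's class)."""

    def __init__(self, name, values):
        values = tuple(values)
        # Reject duplicated values here, so the error points at the code that
        # built the variable rather than at a later consumer of the domain.
        if len(set(values)) != len(values):
            raise ValueError(f"variable '{name}' has duplicated values: {values}")
        self.name = name
        self.values = values


try:
    CategoricalSketch("PC1", ["low", "low", "high"])
except ValueError as err:
    message = str(err)
    print(message)
```

With a check like this, the failure would surface where the variable is created instead of several steps later in another widget.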

janezd removed the "bug report" label (Bug is reported by user, not yet confirmed by the core team) on Aug 21, 2024