Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] table_from_frame: replace nan with String.Unknown for string variable #5795

Merged
merged 1 commit into from
Jan 19, 2022

Conversation

PrimozGodec
Copy link
Contributor

@PrimozGodec PrimozGodec commented Jan 18, 2022

Issue
  1. When transforming pandas data frame to table string columns that contain nan values will
    keep them after the transformation but Orange uses an empty string for the unknown value in the StringVarable.

    For example:
    df = pd.DataFrame(
    [["a", "b"], ["c", "d"], ["e", "f"], [np.nan, np.nan]],
    )

    will be transformed to the table with two string variables and nan values will be kept even String.Unknow = ""

  2. When a column is recognized to be a string and has an object type, it still can contain some values that are not strings. Cast column to string.

Description of changes

Changed that nan values are transformed to String.Unknown for columns that will be transformed to the string variable and values are transformed to strings.

Includes
  • Code changes
  • Tests
  • Documentation

@markotoplak
Copy link
Member

Could you better explain what is the reasoning here?

Does not Orange use the empty string OR nan? I'd lean towards using nan wherever possible. And this goes further from what I'd like to see. :)

@PrimozGodec
Copy link
Contributor Author

Could you better explain what is the reasoning here?

I added a better explanation of the issue.

Does not Orange use the empty string OR nan? I'd lean towards using nan wherever possible. And this goes further from what I'd like to see. :)

I am not sure if nan is supported by all widgets (at least it does not work in Orange3-text addon). I can start working on supporting it there. I changed my implementation that it uses StringVariable.Unknow instead of hardcoded "" such that it will still work when we decide to use nan instead of "".

@PrimozGodec PrimozGodec changed the title table_from_frame: replace nan with "" for string variable [FIX] table_from_frame: replace nan with String.Unknown for string variable Jan 19, 2022
@markotoplak markotoplak self-assigned this Jan 19, 2022
@codecov
Copy link

codecov bot commented Jan 19, 2022

Codecov Report

Merging #5795 (3e6f5bf) into master (c860359) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #5795   +/-   ##
=======================================
  Coverage   86.12%   86.12%           
=======================================
  Files         316      316           
  Lines       66386    66386           
=======================================
  Hits        57173    57173           
  Misses       9213     9213           

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants