SNOW-1247349: Add a function to test for DataFrames equality #1320

Closed
valdAsync opened this issue Mar 18, 2024 · 6 comments · Fixed by #2010
Labels
feature New feature or request status-triage_done Initial triage done, will be further handled by the driver team

Comments

@valdAsync

What is the current behavior?

There is no simple way for a user to compare equality of two DataFrames.

What is the desired behavior?

A simple function for testing equality of two DataFrames.

How would this improve snowflake-snowpark-python?

Currently, there is no straightforward way to test snowflake-snowpark-python DataFrames for equality. The new function would assert DataFrame equality and provide a user-friendly error message when the DataFrames are not equal.
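
To illustrate the kind of behavior requested, here is a minimal sketch in plain Python. It is not the Snowpark API: `assert_rows_equal` is a hypothetical helper name, and it works on column names plus rows given as tuples. It assumes that row order should not matter and that duplicate rows should be counted, which may or may not match what a real implementation would choose.

```python
from collections import Counter
from typing import Iterable, Sequence


def assert_rows_equal(
    actual_columns: Sequence[str],
    actual_rows: Iterable[tuple],
    expected_columns: Sequence[str],
    expected_rows: Iterable[tuple],
) -> None:
    """Hypothetical helper: raise AssertionError with a readable message."""
    # Column names must match exactly, including their order.
    if list(actual_columns) != list(expected_columns):
        raise AssertionError(
            f"Column mismatch: actual={list(actual_columns)} "
            f"expected={list(expected_columns)}"
        )

    # Compare rows as multisets so that row order is ignored
    # but duplicate rows are still counted.
    actual = Counter(tuple(row) for row in actual_rows)
    expected = Counter(tuple(row) for row in expected_rows)
    missing = expected - actual      # rows expected but not present
    unexpected = actual - expected   # rows present but not expected

    if missing or unexpected:
        raise AssertionError(
            "Row mismatch:\n"
            f"  missing rows: {sorted(missing.elements(), key=repr)}\n"
            f"  unexpected rows: {sorted(unexpected.elements(), key=repr)}"
        )


# Same rows in a different order: no exception is raised.
assert_rows_equal(
    ["id", "name"], [(1, "a"), (2, "b")],
    ["id", "name"], [(2, "b"), (1, "a")],
)
```

With Snowpark, one could presumably pass `df.columns` and `df.collect()` for each DataFrame, assuming the collected rows convert cleanly to hashable tuples. This pulls all rows to the client, so it is only practical for small test datasets.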

References, Other Background

PySpark 3.5.0 added assertDataFrameEqual: https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.testing.assertDataFrameEqual.html

@valdAsync valdAsync added the feature New feature or request label Mar 18, 2024
@github-actions github-actions bot changed the title Add a function to test for DataFrames equality SNOW-1247349: Add a function to test for DataFrames equality Mar 18, 2024
@sfc-gh-sghosh sfc-gh-sghosh self-assigned this Mar 20, 2024
@sfc-gh-sghosh

sfc-gh-sghosh commented Mar 20, 2024

Hello @valdAsync ,

Thanks for raising the issue. As of now, Snowpark DataFrames don't have an assertDataFrameEqual API. Please see if the workaround below works for you.

Example
Convert the Snowpark DataFrames to pandas DataFrames and use the equals method.

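`# Assumes an existing Snowpark session; request Arrow-format query results
session.sql("alter session set PYTHON_CONNECTOR_QUERY_RESULT_FORMAT=ARROW").collect()
df1 = session.table("sampletable1")
df2 = session.table("sampletable2")

# Pull both tables into pandas DataFrames on the client
pandasDF1 = df1.to_pandas()
pandasDF2 = df2.to_pandas()

# equals() is True only if shape, dtypes, and elements match, position by position
areEqual = pandasDF1.equals(pandasDF2)

print(areEqual)`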

Regards,
Sujan

@sfc-gh-sghosh sfc-gh-sghosh added the status-triage Issue is under initial triage label Mar 20, 2024
@sfc-gh-sghosh

sfc-gh-sghosh commented Mar 21, 2024

Hello @valdAsync ,

An enhancement request has been raised to add this to the Snowpark Python DataFrame APIs.

Regards,
Sujan

@sfc-gh-sghosh sfc-gh-sghosh removed their assignment Mar 21, 2024
@sfc-gh-sghosh sfc-gh-sghosh added status-triage_done Initial triage done, will be further handled by the driver team and removed status-triage Issue is under initial triage labels Mar 21, 2024
@valdAsync
Author

Hello @sfc-gh-sghosh,

Thank you for your comment and the workaround; it will come in handy. However, a more straightforward solution would be appreciated.

If you consider this a good first issue, I would like to give it a shot myself.

@sfc-gh-sghosh

Hello @valdAsync ,

We are currently addressing the feature request and will keep you updated on its progress in this thread. In the meantime, feel free to use the workaround above until the feature is implemented.

Regards,
Sujan

@duongleh

Hi @sfc-gh-sghosh, I'm looking forward to this feature being implemented. Do you have any update?

@sfc-gh-jdu
Collaborator

Sorry for the late update. This testing function will be added in the next Snowpark release.

4 participants