added cd to upload to conda #2

Merged
merged 3 commits into from
Jul 2, 2024

Conversation

utkarshgupta95

Closes #1

Member

@aidanheerdegen aidanheerdegen left a comment

LGTM

@aidanheerdegen
Member

There seems to be a problem with later versions of python.

This string

https://github.com/ACCESS-NRI/ncigrafana/blob/master/ncigrafana/UsageDataset.py#L462-L469

when it is formatted in the query here

https://github.com/ACCESS-NRI/ncigrafana/blob/master/ncigrafana/UsageDataset.py#L462-L469

ends up looking like this:

'SELECT printf("%s (%s)", Users.fullname, Users.user) as Name, scandate as Date, SUM((\'size\',)) AS totsize \n        FROM (\'UserStorage\',)\n        LEFT JOIN Users ON (\'UserStorage\',).user_id = Users.id\n        WHERE scandate between \'1984-07-01\' AND \'1984-09-30\'\n        AND project_id = (1,)\n        AND storagepoint_id = (4,)\n        GROUP BY Name, Date\n        ORDER BY Date'

so the formatting has rendered every string and number as a one-element tuple, e.g. ('size',) and (1,), instead of the bare value.
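
For illustration, the rendering above matches what str.format produces when it is handed a one-element tuple rather than a bare value. This is a minimal sketch with hypothetical variable names; it does not establish how the values became tuples in UsageDataset.py, only what the symptom looks like in isolation.

```python
# Hypothetical values: a trailing comma turns a plain value into a 1-tuple.
table_as_tuple = "UserStorage",   # tuple: ('UserStorage',)
table_as_str = "UserStorage"      # plain string
project_as_tuple = (1,)
project_as_int = 1

template = "FROM {} WHERE project_id = {}"

# str.format falls back to str() of each argument, so tuples keep their
# parentheses and trailing comma in the resulting SQL text.
broken = template.format(table_as_tuple, project_as_tuple)
fixed = template.format(table_as_str, project_as_int)

print(broken)
print(fixed)
```

Printing the formatted query before it is executed, as done above, is usually the quickest way to spot this kind of problem.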

Very odd, but I assume this is something that changed in Python between v3.8 and v3.9, as the former was fine. We should, in any case, test with much more modern versions (3.11 and 3.12), but I assume they also exhibit the same behaviour.

Is that enough to go on with @utkarshgupta95?

@aidanheerdegen
Member

BTW I debugged this by cloning the repo and running

python -m pytest --pdb -s test

I also added

        import pdb; pdb.set_trace()

in the getstorage function so I could step through the code, print the variables, and see what it was doing.

@utkarshgupta95
Author

utkarshgupta95 commented Jun 27, 2024

@aidanheerdegen The issue seems to be with pandas >= 2.2.0. After hours of debugging, I took the very basic step of printing the exception raised at this line, which gave

'Connection' object has no attribute 'cursor'
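
As a hedged illustration of that debugging step, the sketch below shows how printing a caught exception surfaces this kind of message. The Connection class and run_query function are hypothetical stand-ins, not the real pandas, SQLAlchemy, or dataset objects; they only mimic an object that has no cursor() method being passed to code that expects one.

```python
class Connection:
    """Hypothetical stand-in for a connection object without a cursor() method."""


def run_query(conn):
    # Code written against a DBAPI-style connection calls .cursor();
    # the stand-in above does not provide it, so this raises AttributeError.
    return conn.cursor()


message = None
try:
    run_query(Connection())
except Exception as exc:
    # Printing the exception exposes the root cause that a bare
    # "except: pass" or a generic error message would hide.
    message = str(exc)
    print(message)
```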

The issue is similar to the one below, although it uses SQLAlchemy instead of dataset
https://stackoverflow.com/questions/38332787/pandas-to-sql-to-sqlite-returns-engine-object-has-no-attribute-cursor

I cannot find a similar approach for dataset. Do you have any idea how we can adapt the above solution for the dataset package?

Alternatively I can execute the query with dataset.query() and then convert the result obtained to pandas.

@aidanheerdegen
Member

I cannot find a similar approach for dataset. Do you have any idea how we can adapt the above solution for the dataset package?

No.

Alternatively I can execute the query with dataset.query() and then convert the result obtained to pandas.

That sounds like a good approach.
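
A minimal sketch of that approach, under stated assumptions: the helper name rows_to_frame, the table contents, and the simplified query are all made up for illustration, and the standard-library sqlite3 module stands in for the real database so the example runs on its own. With the dataset package, the rows would instead come from db.query(sql), which yields dict-like rows that can be passed to the same helper.

```python
import sqlite3

import pandas as pd


def rows_to_frame(rows):
    """Build a DataFrame from an iterable of dict-like result rows."""
    return pd.DataFrame([dict(row) for row in rows])


# Stand-in database with made-up data (not the real schema).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows expose keys(), similar to dataset's rows
conn.execute("CREATE TABLE UserStorage (user_id INTEGER, scandate TEXT, size REAL)")
conn.executemany(
    "INSERT INTO UserStorage VALUES (?, ?, ?)",
    [(1, "1984-07-01", 10.0), (1, "1984-07-01", 5.0), (2, "1984-07-02", 7.0)],
)

sql = """
SELECT user_id, scandate AS Date, SUM(size) AS totsize
FROM UserStorage
GROUP BY user_id, scandate
ORDER BY scandate
"""

# With dataset this would be roughly: rows = db.query(sql)
rows = conn.execute(sql)
df = rows_to_frame(rows)
print(df)
```

Running the query outside pandas and building the DataFrame from plain rows avoids handing pandas a connection object it may not recognise.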

Member

@aidanheerdegen aidanheerdegen left a comment

Looks great! Thanks for tracking down that error and fixing it.

@utkarshgupta95 utkarshgupta95 merged commit 5e78942 into master Jul 2, 2024
4 checks passed
Development

Successfully merging this pull request may close these issues.

Add CD to deploy to accessnri conda channel
2 participants