quotas: hard to understand logic #158

Open
VMois opened this issue Dec 21, 2021 · 1 comment

VMois commented Dec 21, 2021

Originated in #156 (review)

I had a few thoughts while working on and reviewing things related to quotas:

  • The update_users_cpu_quota function, and other quota functions that check env variables internally, have non-obvious behavior. For example, the update_... prefix in update_users_cpu_quota implies that it will update CPU quotas, but it silently skips the update when the combination of env variables is not right.

  • The WORKFLOW_TERMINATION_QUOTA_UPDATE_POLICY env variable not only configures which quotas are calculated when a workflow finishes, it also affects whether quotas are calculated in the r-workflow-controller REST API endpoints during file upload or deletion (line). This is not obvious from the name: TERMINATION does not suggest that the variable can also stop quota calculation during upload/deletion. From the user/admin perspective there is probably not much to worry about, but for a developer it is confusing.

The above changes were introduced because we wanted to quickly disable disk quotas to benchmark the cluster (and I was part of that), but I am not sure it is a good long-term solution.

I don't have any concrete suggestions on how to improve the quota logic (except, maybe, moving the env variable checks out of the update_... functions). It would be good to hear your opinion on this, and any other comments you have about the quota logic.
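
To make the "move the env variable checks out of the update_... functions" idea a bit more concrete, here is a minimal sketch. It is not the actual code: the comma-separated policy format, the `parse_policy` and `on_workflow_terminated` helpers, and the dict-based `user` are all assumptions made up for illustration. Only the `update_users_cpu_quota` and `WORKFLOW_TERMINATION_QUOTA_UPDATE_POLICY` names come from the discussion above.

```python
import os


def parse_policy(raw):
    """Parse a comma-separated policy string (assumed format) into a set."""
    return {item.strip() for item in raw.split(",") if item.strip()}


def update_users_cpu_quota(user, cpu_ms):
    """Always update the CPU usage: no env variable checks in here."""
    user["cpu_used_ms"] = user.get("cpu_used_ms", 0) + cpu_ms
    return user["cpu_used_ms"]


def on_workflow_terminated(user, cpu_ms, policy):
    """Hypothetical caller: it decides whether the update happens at all."""
    if "cpu" in policy:
        update_users_cpu_quota(user, cpu_ms)


# The env variable is read once, at the edge, instead of inside update_...
policy = parse_policy(
    os.getenv("WORKFLOW_TERMINATION_QUOTA_UPDATE_POLICY", "cpu,disk")
)
```

With this shape, `update_users_cpu_quota` does what its name says every time it is called, and the decision to skip it is visible at the call site.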

VMois commented Aug 4, 2022

Another thing is that when the workflow is finished/failed, quotas are recalculated inside an sqlalchemy hook (code). This is probably the wrong decision, because we cannot easily retry or delay the recalculation.

A better way might be to handle quota recalculation asynchronously, in a separate consumer with a dedicated queue.
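
A rough sketch of what such a consumer could look like, using only the standard library. The in-process `queue.Queue` stands in for a real dedicated message queue, and `consume`, `MAX_ATTEMPTS`, and the `(workflow_id, attempt)` message shape are hypothetical names chosen for this example, not existing code.

```python
import queue

MAX_ATTEMPTS = 3  # assumed retry limit, purely illustrative


def consume(q, handler, results):
    """Process quota recalculation requests until a None sentinel arrives.

    Each message is a (workflow_id, attempt) tuple. A failed recalculation
    is put back on the queue until MAX_ATTEMPTS is reached.
    """
    while True:
        item = q.get()
        if item is None:  # sentinel: stop the consumer
            q.task_done()
            return
        workflow_id, attempt = item
        try:
            handler(workflow_id)
            results.append((workflow_id, "done"))
        except Exception:
            if attempt < MAX_ATTEMPTS:
                # Re-queue instead of failing inside a database hook.
                q.put((workflow_id, attempt + 1))
            else:
                results.append((workflow_id, "failed"))
        finally:
            q.task_done()
```

The hook would then only publish `(workflow_id, 1)` to the queue, and the consumer (for example, run in a `threading.Thread`) would own retries and delays.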
