[GSoC] Parallelisation of AnalysisBase with multiprocessing and dask #4162

Merged: 288 commits into MDAnalysis:develop, Aug 16, 2024

marinegor
Contributor

@marinegor marinegor commented Jun 5, 2023

Fixes #4158
Also fixes # 4259 as a check in AnalysisBase.run() (no, it doesn't; I'll do another PR that does that).

Related to #4158: does not help f-i-x-i-n-g (kudos to the GitHub bot) the issue per se, but paves the way towards that.

Changes made in this Pull Request:

  • several methods added to analysis.base.AnalysisBase, implementing backend configuration, splitting of the frames for analysis, computation, and results aggregation
  • module analysis.backends introduces BackendBase class, as well as built-in backends BackendMultiprocessing, BackendSerial and BackendDask, implementing the apply method for computations using various backends
  • module analysis.results introduces ResultsGroup class that allows for merging of multiple uniform Results objects from the same module, given appropriate aggregation functions
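
The three pieces listed above (frame splitting, a backend with an apply method, and aggregation of per-chunk results) follow a split-apply-combine pattern. The sketch below illustrates that pattern using only the standard library. All names in it (split_frames, SerialBackend, ThreadBackend, merge_results, run) are hypothetical and are not the MDAnalysis API; threads stand in for the multiprocessing and dask backends only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def split_frames(n_frames: int, n_parts: int) -> List[range]:
    """Split frame indices 0..n_frames-1 into contiguous, near-equal chunks."""
    base, extra = divmod(n_frames, n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks get one additional frame each.
        stop = start + base + (1 if i < extra else 0)
        chunks.append(range(start, stop))
        start = stop
    return chunks


class SerialBackend:
    """Hypothetical backend: runs the computations one after another."""

    def apply(self, func: Callable, computations: Sequence) -> list:
        return [func(task) for task in computations]


class ThreadBackend:
    """Hypothetical backend: runs the computations in a thread pool."""

    def __init__(self, n_workers: int):
        self.n_workers = n_workers

    def apply(self, func: Callable, computations: Sequence) -> list:
        with ThreadPoolExecutor(max_workers=self.n_workers) as pool:
            # pool.map preserves the order of the input chunks.
            return list(pool.map(func, computations))


def merge_results(partials: Sequence[dict], lookup: Dict[str, Callable]) -> dict:
    """Combine per-chunk result dicts with one aggregation function per key."""
    return {key: agg([p[key] for p in partials]) for key, agg in lookup.items()}


def flatten(sequences: Sequence[list]) -> list:
    """Concatenate a sequence of lists into one list."""
    return [item for seq in sequences for item in seq]


def analyse_chunk(frames: range) -> dict:
    """Toy per-chunk analysis: record the frame indices and their sum."""
    return {"frames": list(frames), "total": sum(frames)}


def run(n_frames: int, backend, n_parts: int) -> dict:
    """Split the frames, apply the analysis per chunk, then merge the results."""
    chunks = split_frames(n_frames, n_parts)
    partials = backend.apply(analyse_chunk, chunks)
    return merge_results(partials, {"frames": flatten, "total": sum})
```

Because both backends expose the same apply method, run(10, SerialBackend(), 3) and run(10, ThreadBackend(2), 3) go through identical splitting and merging code; only the execution strategy changes.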

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

📚 Documentation preview 📚: https://mdanalysis--4162.org.readthedocs.build/en/4162/

@github-actions

github-actions bot commented Jun 5, 2023

Linter Bot Results:

Hi @marinegor! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code.

Code Location    Outcome
main package     ⚠️ Possible failure
testsuite        ⚠️ Possible failure

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/10422317333/job/28866526863


Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

@codecov

codecov bot commented Jun 5, 2023

Codecov Report

Attention: Patch coverage is 97.10744% with 7 lines in your changes missing coverage. Please review.

Project coverage is 93.62%. Comparing base (d16b8d4) to head (1dc4613).
Report is 42 commits behind head on develop.

Files with missing lines                  Patch %   Lines
package/MDAnalysis/analysis/base.py       96.52%    2 Missing and 2 partials ⚠️
package/MDAnalysis/analysis/results.py    95.71%    2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##           develop    #4162    +/-   ##
=========================================
  Coverage    93.61%   93.62%            
=========================================
  Files          171      173     +2     
  Lines        21254    21419   +165     
  Branches      3937     3978    +41     
=========================================
+ Hits         19897    20053   +156     
- Misses         898      903     +5     
- Partials       459      463     +4     


@RMeli RMeli self-requested a review June 5, 2023 19:34
@RMeli
Member

RMeli commented Jun 5, 2023

Great to see things are moving @marinegor!

I'll have a proper look at this PR when back from holidays, but I just wanted to point out that GitHub is not very smart, so "Does not fix #4162" will actually automatically close #4162 when this is merged. "Related to #4162" or something similar (with no "fix" before the PR number) would avoid this issue.

Member

@orbeckst orbeckst left a comment

Good start. I left a few comments on the code.

@yuxuanzhuang yuxuanzhuang self-requested a review June 6, 2023 07:15
@orbeckst
Member

orbeckst commented Jun 8, 2023

@yuxuanzhuang any initial comments so that @marinegor can move forward?

@marinegor
Contributor Author

I added the suggestions from @RMeli and @IAlibay (thanks a lot!), and opened issues for a few non-critical ValueErrors raised here and there and noted by @IAlibay.

That's it for me, @orbeckst feel free to proceed!

@IAlibay
Member

IAlibay commented Aug 11, 2024

/azp run


Azure Pipelines successfully started running 1 pipeline(s).

Member

@IAlibay IAlibay left a comment

Thanks!

The Azure Pipelines failures seem completely unrelated; I wouldn't let them be blockers.

@IAlibay
Member

IAlibay commented Aug 11, 2024

@orbeckst I'm letting you do the final call on merging.

Member

@orbeckst orbeckst left a comment

Some minor fixes (which I will add) and a few where I need @marinegor to have a quick look (and update with the suggestions if ok).

@orbeckst
Member

@marinegor please check my two suggested changes. If they are ok, please accept the suggestions or make appropriate changes yourself. Then ping me and assuming that CI still passes (at least non Azure) I will then finally squash-merge (while keeping the co-authors line with the reviewers).

@marinegor
Contributor Author

@orbeckst I agree with both of the changes, thanks. I removed the issubclass check that you added after the failing test, since issubclass only works with classes and not with their instances.

> Then ping me and assuming that CI still passes (at least non Azure)

ping!

> I will then finally squash-merge (while keeping the co-authors line with the reviewers).

🤞🤞🤞
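
The point about the issubclass check can be shown with a small standalone sketch. BackendBase and MyBackend here are stand-in classes defined only for this illustration; they are not imported from MDAnalysis and the code is not the check that was removed from the PR.

```python
class BackendBase:
    """Stand-in base class, defined here only to illustrate the two checks."""


class MyBackend(BackendBase):
    """Stand-in subclass."""


backend_cls = MyBackend      # a class object
backend_obj = MyBackend()    # an instance of that class

# issubclass expects a class as its first argument.
class_ok = issubclass(backend_cls, BackendBase)

# isinstance is the matching check for instances.
instance_ok = isinstance(backend_obj, BackendBase)

# Passing an instance to issubclass raises TypeError instead of returning False.
try:
    issubclass(backend_obj, BackendBase)
    raised = False
except TypeError:
    raised = True
```

Code that may receive either a class or an instance therefore needs to tell the two apart first, for example with isinstance(obj, type), before choosing which check to apply.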

Member

@p-j-smith p-j-smith left a comment

Great work @marinegor! An excellent contribution to MDAnalysis

@orbeckst orbeckst merged commit 481e36a into MDAnalysis:develop Aug 16, 2024
17 of 23 checks passed
@orbeckst
Member

HOORAY!!!!

merged

@RMeli
Member

RMeli commented Aug 16, 2024

Wooooooah! 🥳

Congratulations and thank you for the massive effort @marinegor! 🚀

Development

Successfully merging this pull request may close these issues.

Introducing dask-based parallel backend for the AnalysisBase.run()
9 participants