Improvement: parallelized PrePARE could use all available CPUs #347

Open
senesis opened this issue May 2, 2018 · 2 comments

senesis commented May 2, 2018

This could come with a 'no_limit' value for its max-threads argument, and could use https://docs.python.org/2/library/multiprocessing.html#multiprocessing.cpu_count to detect the number of available CPUs.
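A minimal sketch of what resolving such a value could look like; the resolve_max_threads helper is hypothetical and not part of PrePARE:

```python
import multiprocessing

def resolve_max_threads(max_threads):
    """Hypothetical helper: map the max-threads argument to a worker count.

    The string 'no_limit' falls back to the CPU count reported by the OS;
    anything else is used as an explicit limit.
    """
    if max_threads == "no_limit":
        return multiprocessing.cpu_count()
    return int(max_threads)

# Example: size a worker pool from the resolved limit.
pool = multiprocessing.Pool(processes=resolve_max_threads("no_limit"))
pool.close()
pool.join()
```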


senesis commented May 2, 2018

However, PrePARE can exhaust the available memory of 64 GB nodes when launched with max-threads=150 in a job allocated 21 nodes, each with 40 CPUs.

```
Checking data... /slurmstepd: Job 66629726 exceeded memory limit (65649604 > 62914560), being killed
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/scratch/CMIP6/V1/externals/miniconda2/envs/cmor/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/scratch/CMIP6/V1/externals/miniconda2/envs/cmor/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/scratch/CMIP6/V1/externals/miniconda2/envs/cmor/lib/python2.7/multiprocessing/pool.py", line 328, in _handle_workers
    pool._maintain_pool()
  File "/scratch/CMIP6/V1/externals/miniconda2/envs/cmor/lib/python2.7/multiprocessing/pool.py", line 232, in _maintain_pool
    self._repopulate_pool()
  File "/scratch/CMIP6/V1/externals/miniconda2/envs/cmor/lib/python2.7/multiprocessing/pool.py", line 225, in _repopulate_pool
    w.start()
  File "/scratch/CMIP6/V1/externals/miniconda2/envs/cmor/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/scratch/CMIP6/V1/externals/miniconda2/envs/cmor/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
slurmstepd: Exceeded job memory limit
```

When launched with max-threads=100, it uses 25 GB on the node with the highest memory use.
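A minimal sketch of how a worker limit could be capped by both the CPU count and the node's memory; the safe_worker_count helper and the per-worker memory estimate are assumptions for illustration, not measured PrePARE figures:

```python
import multiprocessing
import os

def safe_worker_count(requested, mem_per_worker_bytes=500 * 1024 ** 2):
    """Hypothetical helper: bound the worker count by CPUs and by memory.

    mem_per_worker_bytes is an assumed per-worker footprint; the real value
    would have to be measured for PrePARE on the target machine.
    """
    cpu_cap = multiprocessing.cpu_count()
    # Total physical memory reported by the OS (POSIX-only sysconf names).
    total_mem = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    mem_cap = max(1, total_mem // mem_per_worker_bytes)
    return min(requested, cpu_cap, mem_cap)

# Example: a requested limit of 150 gets clamped on a 40-CPU, 64 GB node.
print(safe_worker_count(150))
```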

durack1 added this to the 4.0/Future milestone Apr 7, 2024

durack1 commented Apr 7, 2024

This would be useful to assess during CMOR4 planning.
