Issues to run ESM4 #11
Comments
Without knowing the error, it's difficult to know what went wrong. There are several things though that I can guess
Hi Thomas, thank you for your quick answer! Initially, when I tried the default configuration, the error was about the "Partition Nodes Limit".
When I change this to 30 nodes or fewer, the run simply crashes. When I run the command "sacct -j 10597390 --format=Jobname,partition,time,start,end,nnodes,state,nodelist,ncpus", I get these answers: For 30 nodes:
For 10 nodes:
For only 1 node:
However, judging from your answer, maybe the cause is that I forgot to change ocean_ncores in another namelist besides the one in ./ESM4_rundir/input.nml. I will try to find the other namelist and set the correct ncores there. Thomas, in point 1 of your answer, are y and x the number of nodes (nnode, --nodes=y) and the number of cores per node (ncore_node, --ntasks-per-node=x), respectively, or the reverse?
Again, the best/easiest strategy for running the model is to run it with the prescribed number of cores. It's difficult to change the number of cores, especially for the ocean. I don't really understand the information you are showing me. It looks like you are asking me how many nodes you need, and I have never worked with your specific computer. I think you need help from someone local to figure out how to get the model running and how many nodes you need.
OK, I'll go back to the default settings. So, what do the x, y, and 6 numbers mean? In the input.nml file there is
The fv_core_nml refers to the atmosphere only.
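To make the "x, y, and 6" question concrete, here is a rough sketch. It assumes (the thread does not show this) that x and y are the per-tile domain decomposition in fv_core_nml and that 6 is the number of cubed-sphere tiles, so that the atmosphere rank count equals x * y * 6. The function name candidate_layouts is hypothetical and only illustrates the arithmetic.

```python
def candidate_layouts(atmos_npes: int, ntiles: int = 6):
    """List (x, y) pairs with x * y * ntiles == atmos_npes.

    Assumption: the atmosphere uses ntiles cubed-sphere tiles and each
    tile is split into an x-by-y grid of MPI ranks.
    """
    if atmos_npes % ntiles != 0:
        # The rank count cannot be split evenly over the tiles.
        return []
    per_tile = atmos_npes // ntiles
    return [(x, per_tile // x) for x in range(1, per_tile + 1) if per_tile % x == 0]


layouts = candidate_layouts(1728)  # 1728 / 6 = 288 ranks per tile
print(len(layouts), layouts[:5])
print(candidate_layouts(100))      # [] because 100 is not divisible by 6
```

Under that assumption, an arbitrary core count for the atmosphere will usually not work: it has to factor as x * y * 6, which is one reason keeping the prescribed 1728 is the easy path.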
Oh, thank you for your answer, Thomas! Yes, there are atmos_npes and ocean_npes, which are 1728 and 1437, respectively. I'm trying some Slurm configurations using the standard ESM4 core counts. If it runs, I will tell you.
Hi, it's me again, Jaime. I managed to compile the model on our machine, as I mentioned in the previous issue. Now I'm struggling to run it: I'm having problems running the model with the default settings on our machine for the first tests.
The number of nodes/cores needed to run the model is large, and that many are not always available on our machine. The default configuration needs more than 3100 cores (actually 3165: atmos_npes = 1728 cores for the atmospheric model and ocean_npes = 1437 for the ocean), as configured in the namelist (./ESM4_rundir/input.nml) and in the run script (folder ./run/) of the model.
In the partition I have access to, each node has 48 cores and there are about 90 nodes, so that many cores do exist. It seems to me that 66 of the 90 nodes would be enough to run the model with these default settings; however, the logistics of getting them are quite difficult. For that reason, I'm trying fewer nodes (about 10) just as a test to see whether the model will run and how it will behave. However, when I test with 10 nodes, or with fewer or more (in this case, 30), the run crashes before it even starts. Could someone help me understand why?
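The node arithmetic above can be checked with a small sketch. It assumes one MPI rank per core and no extra OpenMP threads per rank, which the thread does not confirm; the helper name nodes_needed is hypothetical.

```python
import math


def nodes_needed(total_ranks: int, cores_per_node: int) -> int:
    """Smallest number of whole nodes that holds total_ranks at one rank per core."""
    return math.ceil(total_ranks / cores_per_node)


atmos_npes = 1728      # default atmosphere ranks, from the post
ocean_npes = 1437      # default ocean ranks, from the post
cores_per_node = 48    # cores per node in the partition, from the post

total_ranks = atmos_npes + ocean_npes
print(total_ranks)                                # 3165
print(nodes_needed(total_ranks, cores_per_node))  # 66
print(10 * cores_per_node)                        # 480 cores on 10 nodes
print(30 * cores_per_node)                        # 1440 cores on 30 nodes
```

Under these assumptions, 10 or 30 nodes provide far fewer cores than the 3165 ranks the default configuration asks for, so a job that still requests the default rank counts on that few nodes cannot be placed at one rank per core.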
Below is the bash script used to run the model:
Please, looking at these settings, can anyone see any errors that are causing the test runs to crash?
Another thing: when I change the default values (atm_cores=1728 and ocn_cores=1437) in the run script, should I also change the values in the namelist (./ESM4_rundir/input.nml)?
I made this change. I don't know if this is what is causing the crash.
Thanks.
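One way to rule out a mismatch between the run script and the namelist is a quick consistency check like the sketch below. It is only an illustration: the sample namelist text and the group name coupler_nml are assumptions standing in for the real ./ESM4_rundir/input.nml, the regex is not a full namelist parser, and the helper names are hypothetical.

```python
import re


def read_npes(namelist_text: str) -> dict:
    """Extract atmos_npes and ocean_npes from namelist text (simple regex only)."""
    found = {}
    for name in ("atmos_npes", "ocean_npes"):
        match = re.search(
            rf"^\s*{name}\s*=\s*(\d+)",
            namelist_text,
            flags=re.MULTILINE | re.IGNORECASE,
        )
        if match:
            found[name] = int(match.group(1))
    return found


def mismatches(namelist_text: str, atm_cores: int, ocn_cores: int) -> dict:
    """Return {name: (namelist_value, run_script_value)} for every disagreement."""
    npes = read_npes(namelist_text)
    expected = {"atmos_npes": atm_cores, "ocean_npes": ocn_cores}
    return {k: (npes.get(k), v) for k, v in expected.items() if npes.get(k) != v}


# Illustrative fragment, not copied from the real input.nml.
sample = """
&coupler_nml
    atmos_npes = 1728,
    ocean_npes = 1437,
/
"""

print(mismatches(sample, 1728, 1437))  # {} when script and namelist agree
print(mismatches(sample, 240, 240))    # both entries reported as mismatched
```

An empty result only says the two rank counts agree; it does not check other settings that depend on the core count, such as the domain layouts.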