model execution and case run error #5697
-
Try -res ne4_ne4. That's the resolution we test that compset with.
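For reference, a minimal sketch of recreating the case at that resolution, assuming E3SM 2.1's standard CIME scripts and the "keeling" machine entry from the attached configs (the case name and paths here are hypothetical):

    # Hypothetical case name and paths; adjust to your local E3SM checkout and scratch space.
    cd E3SM/cime/scripts
    ./create_newcase --case ~/e3sm_cases/FAQP.ne4_ne4.keeling \
        --compset FAQP --res ne4_ne4 --machine keeling
    cd ~/e3sm_cases/FAQP.ne4_ne4.keeling
    ./case.setup
    ./case.build
    ./case.submit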
-
What kind of network is used to connect the nodes in your local cluster?
-
FAQP is an "aqua planet" case and is basically an atmosphere-only model. To get a rough estimate of expected performance, you can look at the atmosphere component performance in this paper: https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2022MS003156. Those results are for the "ne30pg2" atmosphere grid, which is our standard low-resolution model. "ne4" is a very coarse resolution for testing; it should be well over 10x faster, but it can only make efficient use of up to 96 MPI tasks. On a modern Intel Xeon or AMD EPYC cluster with a low-latency interconnect, I'd think you could get close to 100 simulated years per day on 96 cores. The standard 5-day run writes a lot of data for the restart file, which will skew the timings if your I/O is slow.
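As a rough sketch of how you might set up such a benchmark run, assuming a standard CIME case directory (NTASKS, STOP_OPTION/STOP_N, and REST_OPTION are the usual CIME XML variables; the specific values are assumptions to adapt to your cluster):

    # From the case directory: use 96 MPI tasks for the ne4 atmosphere
    # (assumption: this fits your node count; adjust as needed).
    ./xmlchange NTASKS=96

    # Run longer than the default 5 days and skip restart writes so the
    # timing is dominated by compute rather than I/O.
    ./xmlchange STOP_OPTION=nmonths,STOP_N=1,REST_OPTION=never

    # Redo the setup after changing the PE layout, then rebuild and resubmit
    # (a clean build may be needed after a PE-layout change).
    ./case.setup --reset
    ./case.build
    ./case.submit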
-
Your machine file is set up to allow up to 48 tasks per node but only 1 MPI task per node; both should be changed to 48. If your machine has only 24 cores (and the 48 is counting hyperthreading), set them both to 24, and only worry about using hyperthreading after getting good performance without it. Another important thing for performance is core binding: an option to mpirun that binds each MPI task to a single core. You'll need to check locally what that option should be on keeling.
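A minimal sketch of the kind of change meant here, assuming the case uses CIME's MAX_TASKS_PER_NODE / MAX_MPITASKS_PER_NODE settings and an Open MPI-style launcher (the exact binding flag depends on your MPI stack, and on keeling it would normally go into the <mpirun> arguments in config_machines.xml rather than being typed by hand):

    # From the case directory: allow 48 MPI tasks per node
    # (use 24 instead if the 48 is counting hyperthreads), then redo the PE setup.
    ./xmlchange MAX_TASKS_PER_NODE=48,MAX_MPITASKS_PER_NODE=48
    ./case.setup --reset

    # Core binding with an Open MPI-style launcher (assumption: keeling uses Open MPI;
    # Intel MPI, MVAPICH, etc. use different flags):
    mpirun -np 48 --bind-to core ./e3sm.exe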
-
From that paper, I think this resolution on 5 nodes (AMD EPYC, 64 hardware cores per node) was 5 SYPD. You're getting 0.6 SYPD on 3 nodes, about 6x slower than a relatively new AMD EPYC system. That's a little slow, but it could be explained by an old machine, or by skewed benchmarks from a 5-day run (which is dominated by I/O costs instead of compute). At this resolution the model should scale well up to 5400 MPI tasks, so the fact that you get no improvement going from 3 to 6 nodes suggests something is still wrong with the configuration or mpirun command. You should also be able to get a 1.5x speedup by switching to "ne30pg2" for the atmosphere grid, matching the grid used in that paper.
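To check the SYPD numbers being compared here, one place to look is the coupler timing summary, sketched below assuming a standard E3SM/CIME case layout (file names include the case name and job id, so the pattern is an assumption):

    # After a completed run, a timing summary is written under the case directory.
    ls timing/
    # The throughput line reports simulated years per day, e.g.
    #   Model Throughput: ... simulated_years/day
    grep simulated_years timing/e3sm_timing.*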
-
I am a graduate student in UIUC Atmospheric Sciences, trying to port E3SM 2.1 to our local computing cluster. I have successfully created, built, and run cases for the A and X compsets on our machine. However, I am now facing an error while running an F compset case.
I used resolution ne11_ne11 and compset FAQP to create a new case.
MODEL BUILD HAS FINISHED SUCCESSFULLY
E3SM log
e3sm.log.1672085.230517-161207.txt
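For pinpointing the run-time error in a log like the one attached, a quick sketch assuming the default CIME layout (xmlquery and the run-directory logs are standard; the exact log suffixes differ per job):

    # Find the run directory for this case.
    ./xmlquery RUNDIR
    # Then check the end of the most recent logs there, e.g.
    # e3sm.log.*, cpl.log.*, atm.log.* (paths below are placeholders):
    tail -n 50 /path/to/rundir/e3sm.log.*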
Machine Configuration: --mach keeling
keeling.cmake.txt
config_machines.xml.txt
config_batch.xml.txt
build_environment.txt