Updates to v2.0.0 release container for Gaea #167

Open
5 tasks
EdwardSnyder-NOAA opened this issue Oct 30, 2024 · 0 comments

The Land DA release v2.0.0 container can run on Gaea after several modifications. The steps needed to run on Gaea are:

  1. module use /ncrc/proj/epic/rocoto/modulefiles/
  2. module load rocoto
  3. for setup_container.sh script use: -c=intel-classic/2023.2.0 -m=cray-mpich/8.1.28
  4. sed -i 's|which mpiexec| which srun|g' land-DA_workflow/scripts/exlandda_*
  5. sed -i 's|${RUN_CMD} -n ${NPROCS_FORECAST}|${RUN_CMD} -n ${NPROCS_FORECAST} --mpi=pmi2 |g' land-DA_workflow/scripts/exlandda_forecast.sh
  6. sed -i 's|${RUN_CMD} -n ${NPROCS_ANALYSIS}|${RUN_CMD} -n ${NPROCS_ANALYSIS} --mpi=pmi2|g' land-DA_workflow/scripts/exlandda_analysis.sh
  7. sed -i '30 i module reset' land-DA_workflow/parm/task_load_modules_run_jjob.sh
  8. sed -i 's|which singularity|"/usr/bin/singularity"|g' land-DA_workflow/parm/run_container_executable.sh
  9. sed -i 's|<queue>batch</queue>|<native> --clusters=c5 --partition=batch --export=NONE</native>|g' land_analysis.xml
  10. Update the bind options in run_container_executable.sh to: -B $BINDDIR:/contrib -B $CONTAINERBASE:/contrib
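
Before editing the real workflow files, it can help to preview what a substitution does. The sketch below runs the step 5 and step 9 expressions against made-up sample lines through a pipe instead of with `sed -i`. The sample lines and the variable names `sample_launch`, `sample_queue`, `patched_launch`, and `patched_queue` are assumptions for illustration; the actual contents of exlandda_forecast.sh and land_analysis.xml may differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Made-up sample lines; the real files may look different.
sample_launch='${RUN_CMD} -n ${NPROCS_FORECAST} ./forecast_exe'
sample_queue='<queue>batch</queue>'

# Step 5 expression: add --mpi=pmi2 after the process count.
patched_launch=$(printf '%s\n' "$sample_launch" |
  sed 's|${RUN_CMD} -n ${NPROCS_FORECAST}|${RUN_CMD} -n ${NPROCS_FORECAST} --mpi=pmi2 |g')

# Step 9 expression: swap the queue element for a native option string.
patched_queue=$(printf '%s\n' "$sample_queue" |
  sed 's|<queue>batch</queue>|<native> --clusters=c5 --partition=batch --export=NONE</native>|g')

# Print both results so they can be inspected by eye.
printf '%s\n' "$patched_launch" "$patched_queue"
```

Because the expressions are single-quoted, the shell does not expand `${RUN_CMD}` or `${NPROCS_FORECAST}`; sed matches them as literal text.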

These steps are being added to the release notes. Breakdown of the action items:

  • Steps 1-3 only need to be documented.
  • Steps 4-8 can be added to the setup_container.sh script.
  • Step 9 requires upgrading uw tools to 2.4.2, which addresses the native option bug. The Slurm configuration on Gaea requires each task to specify the partition it runs on, so the native and core options must both be defined in the workflow. The develop and release branches currently use uw tools 2.2.2, which contains the bug that prevents these options from being defined together. The suggestion is to upgrade to 2.4.2 when possible. In the meantime, users need to run the step 9 sed command before running rocotorun for the experiment.
  • Step 10 requires rebuilding the container with a /gpfs directory inside it.
  • Once all of these steps are completed, the container needs to be rebuilt, tested, and placed on all T1 platforms.
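
One way steps 4-8 might be folded into setup_container.sh is a single function that is called only when the target platform is Gaea. This is a sketch, not the project's implementation: the function name `apply_gaea_patches`, its directory argument, and the reliance on GNU `sed -i` are assumptions. The sed expressions come from the steps above, with the analysis substitution assumed to mirror the forecast one.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: apply the Gaea-specific edits from steps 4-8.
# $1 is the path to the land-DA_workflow directory (assumed layout).
apply_gaea_patches() {
  local workflow_dir=$1
  local scripts=$workflow_dir/scripts
  local parm=$workflow_dir/parm

  # Step 4: look up srun instead of mpiexec in every exlandda_* script.
  sed -i 's|which mpiexec| which srun|g' "$scripts"/exlandda_*
  # Step 5: pass --mpi=pmi2 to the forecast launch command.
  sed -i 's|${RUN_CMD} -n ${NPROCS_FORECAST}|${RUN_CMD} -n ${NPROCS_FORECAST} --mpi=pmi2 |g' \
    "$scripts/exlandda_forecast.sh"
  # Step 6: same for the analysis launch command (assumed to mirror step 5).
  sed -i 's|${RUN_CMD} -n ${NPROCS_ANALYSIS}|${RUN_CMD} -n ${NPROCS_ANALYSIS} --mpi=pmi2|g' \
    "$scripts/exlandda_analysis.sh"
  # Step 7: insert "module reset" before line 30 of the module-loading script.
  sed -i '30 i module reset' "$parm/task_load_modules_run_jjob.sh"
  # Step 8: use the system singularity binary instead of searching PATH.
  sed -i 's|which singularity|"/usr/bin/singularity"|g' "$parm/run_container_executable.sh"
}
```

A caller could then guard it with a platform check, for example `if [[ "$PLATFORM" == "gaea" ]]; then apply_gaea_patches ./land-DA_workflow; fi`, where `PLATFORM` is a hypothetical variable. Note that running the function twice would append `--mpi=pmi2` a second time, so it should be called once per fresh checkout.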