Add support for nvhpc on ncar machines #192

Merged 20 commits on Oct 7, 2024


mnlevy1981
Collaborator

I also added a "--debug" command-line argument to build_examples.sh, along with some module load statements for running on the NCAR machines. The module loads are skipped on casper nodes (including crhtc nodes).
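
The host check described above could look roughly like the following. This is a minimal sketch, not the code in build_examples.sh: the function name `needs_module_load` and the host-name patterns (`derecho*`, `dec*`, `casper*`, `crhtc*`) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide from a host name whether module loads should run.
# The patterns below are assumed, not copied from build_examples.sh.

needs_module_load() {
  host="$1"
  case "$host" in
    casper*|crhtc*) return 1 ;;  # casper nodes (including crhtc): skip module loads
    derecho*|dec*)  return 0 ;;  # assumed derecho login/compute node names
    *)              return 1 ;;  # any other machine: leave the environment alone
  esac
}

# Print the decision for a few sample host names.
for sample in derecho3 casper-login1 crhtc52; do
  if needs_module_load "$sample"; then
    echo "$sample: load modules"
  else
    echo "$sample: skip module loads"
  fi
done
```

A caller would wrap its `module load` lines in `if needs_module_load "$(hostname)"; then ... fi`, so that casper and crhtc nodes keep whatever environment is already loaded.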

@mnlevy1981
Collaborator Author

It would be great if the compiler flags set in standalone/templates/ncar-*.mk matched what is used in CESM. Eventually, I think we want to use the CIME build system (or at least cime/CIME/scripts/configure to create the Makefile based on configure_machines.xml), but as a short-term fix I'd like to get the various NCAR templates cleaned up -- they are all modifications of different files from mom-ocean/mkmf, so they look fairly different from one another. If they could all follow the same basic pattern and then also get FFLAGS and CFLAGS "right" (matching CESM for that compiler), that would be a useful temporary step.

To run MOM standalone, use ./build_examples_cesm.sh just like
./build_examples.sh. To use the CESM compiler flags (or at least the
rough conversion I've done), add a "--cesm" arg to the run. NVHPC
with the CESM flags is not working, and the Intel debug build makes it
clear there is an issue. The GNU and Intel CESM (non-debug) versions
build and run. See the templates for the changes between non-CESM and
CESM flags.

If you would like to use the builds without building yourself, check out
this repo on derecho in
/glade/u/home/manishrv/documents/installs/mom_interface_pr_192/components/mom/standalone/build.
…,GNU, INTEL)

Made changes to the Intel and NVHPC compiler flags so they build properly
and run with double_gyre. Unfortunately, that meant diverging slightly from
the CESM compiler flags; see the comments in each template for more
information. All of them can now be built and run, but each template needs
some workshopping to make sure it is good. For example, I took out
-Ktrap=fp in NVHPC DEBUG, and I still need to confirm that it can be taken out.
@mnlevy1981 marked this pull request as ready for review September 24, 2024 17:12
Member

@alperaltuntas alperaltuntas left a comment


@mnlevy1981, @manishvenu, was there a specific reason for removing the REPRO flag? Could we consider adding it back, with the option to control it through a CLI argument like --repro, similar to how --debug works?

@manishvenu
Collaborator

manishvenu commented Sep 24, 2024

> @mnlevy1981, @manishvenu, was there a specific reason for removing the REPRO flag? Could we consider adding it back, with the option to control it through a CLI argument like --repro, similar to how --debug works?

Hey @alperaltuntas ,

I think @mnlevy1981 would be better able to answer on the general reasoning hah, but, loosely, I think we discussed there not being a need for OPT, VERBOSE, or OPENMP.

On the code side, that means REPRO is now the default mode: if "--debug" isn't passed, we default to REPRO. The flags that were removed were OPT, VERBOSE, and OPENMP.

Happy to re-add any of them and test them really quick if need be.

Thanks,
Manish

@mnlevy1981
Collaborator Author

> @mnlevy1981, @manishvenu, was there a specific reason for removing the REPRO flag? Could we consider adding it back, with the option to control it through a CLI argument like --repro, similar to how --debug works?

yeah, @manishvenu summed it up -- the previous templates set

DEBUG =
REPRO =
VERBOSE =
OPENMP =

but we set REPRO=1 and didn't have a mechanism for changing any of the other variables. We definitely want to be able to set DEBUG=1, but DEBUG and REPRO are mutually exclusive so the default is REPRO=1,DEBUG= and the --debug flag sets REPRO=,DEBUG=1 instead. I removed the OPENMP options because we'll let the GPU-ification program come up with useful flags, and we can add VERBOSE back in later if there is ever a need for it.
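
The default-plus-override behavior described here can be sketched in a few lines of shell. This is an illustration under stated assumptions, not the contents of build_examples.sh: `REPRO` and `DEBUG` mirror the template variables quoted above, while the function name `set_build_mode` and the parsing loop are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: set_build_mode is a hypothetical name; the real script may
# parse its arguments differently.

set_build_mode() {
  # Default: reproducible build, debug off.
  REPRO=1
  DEBUG=
  for arg in "$@"; do
    case "$arg" in
      --debug)
        # DEBUG and REPRO are mutually exclusive, so turning one on clears the other.
        DEBUG=1
        REPRO=
        ;;
    esac
  done
}

set_build_mode "$@"
echo "make variables: REPRO=${REPRO} DEBUG=${DEBUG}"
```

Because the function always resets both variables before looking at the arguments, there is no way to end up with both set at once, which matches the mutual exclusivity described in the comment.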

Member

@alperaltuntas alperaltuntas left a comment


Awesome, thanks!

@mnlevy1981
Collaborator Author

An update -- the changes @manishvenu put in have let us do more extensive testing in the standalone driver, and that testing has found issues in both MOM6 and MARBL. I'm going to address those with PRs (the MARBL PR is marbl-ecosys/MARBL#470 but the MOM6 PR hasn't been opened yet) and update .gitmodules, at which point this PR will be ready to merge. [I also have one more commit to make the different standalone/templates/ncar* files more consistent.]

General formatting is the same; the only differences are the actual flags and
whatnot used by the different compilers
@mnlevy1981
Collaborator Author

MOM6 PR mentioned in #192 (comment) is NCAR/MOM6#305

@mnlevy1981
Collaborator Author

I'll also add an SMS_Ld1_D test to aux_mom_MARBL and prebeta because it turns out we weren't running any debug tests and it would've caught an issue in using MARBL's chlorophyll field rather than reading it from a file.
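
For readers unfamiliar with CIME test names, the sketch below splits one into its dot-separated fields (test and modifiers, grid, compset, machine_compiler) and checks for the `_D` modifier that requests a debug build. The function name `parse_test_name` and the output variable names are hypothetical, and the parsing ignores any optional trailing fields a test name may carry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a CIME-style test name into its main fields.
# Assumes the layout TEST[_MODIFIERS].GRID.COMPSET.MACHINE_COMPILER.

parse_test_name() {
  name="$1"
  test_part="${name%%.*}"      # e.g. SMS_Ld1_D
  rest="${name#*.}"
  GRID="${rest%%.*}"
  rest="${rest#*.}"
  COMPSET="${rest%%.*}"
  rest="${rest#*.}"
  mach_comp="${rest%%.*}"      # e.g. derecho_intel
  TEST_TYPE="${test_part%%_*}" # text before the first underscore
  MACHINE="${mach_comp%%_*}"
  COMPILER="${mach_comp#*_}"
  # A "_D" modifier anywhere after the test type marks a debug build.
  IS_DEBUG=no
  case "_${test_part#*_}_" in
    *_D_*) IS_DEBUG=yes ;;
  esac
}

parse_test_name "SMS_Ld1_D.TL319_t232.C1850MARBL_JRA.derecho_intel"
echo "type=${TEST_TYPE} debug=${IS_DEBUG} grid=${GRID} compset=${COMPSET} machine=${MACHINE} compiler=${COMPILER}"
```

In this name, `SMS` is the smoke test, `Ld1` sets the run length to one day, and `D` turns on the debug build, which is what makes it a debug test.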

This tag refactors a computation that included a divide-by-0 under certain
forcing conditions (u10_sqr = 0, I believe).
Was: SMS_Ld2_D.TL319_t232.C1850MARBL_JRA.derecho_intel
Now: SMS_Ld1_D.TL319_t232.C1850MARBL_JRA.derecho_intel

I also added this test to the prebeta test list.
Adds timestamp for rpointer file and fixes log10(0) bug in Ohlmann opacity
scheme
This uses the CESM 2/3 degree global grid and turns on the MARBL tracers as
well.
@mnlevy1981
Collaborator Author

mnlevy1981 commented Oct 3, 2024

@alperaltuntas This is ready for re-review and to merge in. A lot ended up going into it:

  1. Update MOM6 and MARBL tags to fix bugs found by nvhpc / intel+debug tests
  2. Update the standalone build scripts to (a) more cleanly support available compilers on derecho [thanks @manishvenu!] and (b) match compiler flag options used in CESM (includes a --debug flag that is equivalent to ./xmlchange DEBUG=TRUE)
  3. Add a global ocean example that uses solo_driver [this is the 2/3 degree setup you put together to get preliminary performance numbers ahead of gpu-ification]
  4. Add a debug smoke test using MARBL to the prebeta test list

@@ -0,0 +1,25 @@
! Parameter changes to run CESM-MOM6 config in standalone mode

#override INPUTDIR = /glade/work/altuntas/mom6.standalone.runs/cesm/INPUT/t232
Member


Instead of my personal work space, we should probably place all these input files under inputdata?

Collaborator Author


good catch! This has been addressed in 7618bcd

1. No longer override INPUTDIR (MOM_input sets it to CESM input directory on
campaign, which is fine for now)
2. Added debug test with MARBL to aux_mom
@mnlevy1981 merged commit 47034fc into ESCOMP:main Oct 7, 2024
3 checks passed