Commit
Merge pull request #1186 from sjsprecious/update_gpu_tests
cam6_4_046: Update GPU tests with new XML options
nusbaume authored Nov 8, 2024
2 parents 7502085 + 9a14ae3 commit 3511875
Showing 12 changed files with 192 additions and 12 deletions.
6 changes: 3 additions & 3 deletions .gitmodules
@@ -144,21 +144,21 @@ fxDONOTUSEurl = https://github.com/ESCOMP/mizuRoute
[submodule "ccs_config"]
path = ccs_config
url = https://github.com/ESMCI/ccs_config_cesm.git
fxtag = ccs_config_cesm1.0.7
fxtag = ccs_config_cesm1.0.8
fxrequired = ToplevelRequired
fxDONOTUSEurl = https://github.com/ESMCI/ccs_config_cesm.git

[submodule "cime"]
path = cime
url = https://github.com/ESMCI/cime
fxtag = cime6.1.29
fxtag = cime6.1.41
fxrequired = ToplevelRequired
fxDONOTUSEurl = https://github.com/ESMCI/cime

[submodule "cmeps"]
path = components/cmeps
url = https://github.com/ESCOMP/CMEPS.git
fxtag = cmeps1.0.16
fxtag = cmeps1.0.22
fxrequired = ToplevelRequired
fxDONOTUSEurl = https://github.com/ESCOMP/CMEPS.git

4 changes: 2 additions & 2 deletions cime_config/testdefs/testlist_cam.xml
@@ -1489,7 +1489,7 @@
<!-- (unsupported) -->
<!-- @@@@@@@@@@@@@@@@@@@@@@@@@@@ -->

<test compset="F2000dev" grid="ne30pg3_ne30pg3_mg17" name="ERS_Ln9_G4-a100-openacc" testmods="cam/outfrq9s_mg3_default">
<test compset="F2000dev" grid="ne30pg3_ne30pg3_mg17" name="ERS_Ln9" testmods="cam/outfrq9s_gpu_default">
<machines>
<machine name="derecho" compiler="nvhpc" category="derecho_gpu"/>
<machine name="derecho" compiler="nvhpc" category="aux_cam"/>
@@ -1498,7 +1498,7 @@
<option name="wallclock">00:30:00</option>
</options>
</test>
<test compset="F2000dev" grid="ne30pg3_ne30pg3_mg17" name="ERS_Ln9_G4-a100-openacc" testmods="cam/outfrq9s_mg3_pcols760">
<test compset="F2000dev" grid="ne30pg3_ne30pg3_mg17" name="ERS_Ln9" testmods="cam/outfrq9s_gpu_pcols760">
<machines>
<machine name="derecho" compiler="nvhpc" category="derecho_gpu"/>
<machine name="derecho" compiler="nvhpc" category="prealpha"/>
12 changes: 12 additions & 0 deletions cime_config/testdefs/testmods_dirs/cam/outfrq9s_gpu_default/shell_commands
@@ -0,0 +1,12 @@
./xmlchange NTASKS=128
./xmlchange NTHRDS=1
./xmlchange ROOTPE='0'
./xmlchange ROF_NCPL=`./xmlquery --value ATM_NCPL`
./xmlchange GLC_NCPL=`./xmlquery --value ATM_NCPL`
./xmlchange CAM_CONFIG_OPTS=' -microphys mg3 -rad rrtmg' --append
./xmlchange TIMER_DETAIL='6'
./xmlchange TIMER_LEVEL='999'
./xmlchange GPU_TYPE=a100
./xmlchange OPENACC_GPU_OFFLOAD=TRUE
./xmlchange OVERSUBSCRIBE_GPU=TRUE
./xmlchange NGPUS_PER_NODE=4
4 changes: 4 additions & 0 deletions cime_config/testdefs/testmods_dirs/cam/outfrq9s_gpu_default/user_nl_cam
@@ -0,0 +1,4 @@
mfilt=1,1,1,1,1,1
ndens=1,1,1,1,1,1
nhtfrq=9,9,9,9,9,9
inithist='ENDOFRUN'
26 changes: 26 additions & 0 deletions cime_config/testdefs/testmods_dirs/cam/outfrq9s_gpu_default/user_nl_clm
@@ -0,0 +1,26 @@
!----------------------------------------------------------------------------------
! Users should add all user specific namelist changes below in the form of
! namelist_var = new_namelist_value
!
! Include namelist variables for drv_flds_in ONLY if -megan and/or -drydep options
! are set in the CLM_NAMELIST_OPTS env variable.
!
! EXCEPTIONS:
! Set use_cndv by the compset you use and the CLM_BLDNML_OPTS -dynamic_vegetation setting
! Set use_vichydro by the compset you use and the CLM_BLDNML_OPTS -vichydro setting
! Set use_cn by the compset you use and CLM_BLDNML_OPTS -bgc setting
! Set use_crop by the compset you use and CLM_BLDNML_OPTS -crop setting
! Set spinup_state by the CLM_BLDNML_OPTS -bgc_spinup setting
! Set irrigate by the CLM_BLDNML_OPTS -irrig setting
! Set dtime with L_NCPL option
! Set fatmlndfrc with LND_DOMAIN_PATH/LND_DOMAIN_FILE options
! Set finidat with RUN_REFCASE/RUN_REFDATE/RUN_REFTOD options for hybrid or branch cases
! (includes $inst_string for multi-ensemble cases)
! Set glc_grid with CISM_GRID option
! Set glc_smb with GLC_SMB option
! Set maxpatch_glcmec with GLC_NEC option
! Set glc_do_dynglacier with GLC_TWO_WAY_COUPLING env variable
!----------------------------------------------------------------------------------
hist_nhtfrq = 9
hist_mfilt = 1
hist_ndens = 1
12 changes: 12 additions & 0 deletions cime_config/testdefs/testmods_dirs/cam/outfrq9s_gpu_pcols760/shell_commands
@@ -0,0 +1,12 @@
./xmlchange NTASKS=64
./xmlchange NTHRDS=1
./xmlchange ROOTPE='0'
./xmlchange ROF_NCPL=`./xmlquery --value ATM_NCPL`
./xmlchange GLC_NCPL=`./xmlquery --value ATM_NCPL`
./xmlchange CAM_CONFIG_OPTS=' -microphys mg3 -rad rrtmg -pcols 760 ' --append
./xmlchange TIMER_DETAIL='6'
./xmlchange TIMER_LEVEL='999'
./xmlchange GPU_TYPE=a100
./xmlchange OPENACC_GPU_OFFLOAD=TRUE
./xmlchange OVERSUBSCRIBE_GPU=TRUE
./xmlchange NGPUS_PER_NODE=4
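
The two blocks of `./xmlchange` commands in this commit differ in NTASKS (128 versus 64), and the ChangeLog entry states that the default test runs on two GPU nodes while the pcols760 test runs on one. The sketch below works out how many MPI tasks would share each GPU in both cases, which is one way to see why OVERSUBSCRIBE_GPU=TRUE is set. The helper name `ranks_per_gpu` is hypothetical (it is not part of CIME), and the calculation assumes tasks are spread evenly over nodes and GPUs.

```shell
#!/bin/sh
# Hypothetical helper, not part of CIME: number of MPI tasks sharing one GPU.
# Assumption: tasks are distributed evenly across nodes and across GPUs.
ranks_per_gpu() {
    ntasks=$1          # value passed to ./xmlchange NTASKS=...
    nodes=$2           # node count quoted in the ChangeLog entry
    gpus_per_node=$3   # value passed to ./xmlchange NGPUS_PER_NODE=...
    echo $(( ntasks / (nodes * gpus_per_node) ))
}

ranks_per_gpu 128 2 4   # outfrq9s_gpu_default: prints 16
ranks_per_gpu 64 1 4    # outfrq9s_gpu_pcols760: prints 16
```

Under these assumptions both tests place 16 tasks on each GPU, so the two configurations differ in node count and column chunk size rather than in per-GPU task load.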
4 changes: 4 additions & 0 deletions cime_config/testdefs/testmods_dirs/cam/outfrq9s_gpu_pcols760/user_nl_cam
@@ -0,0 +1,4 @@
mfilt=1,1,1,1,1,1
ndens=1,1,1,1,1,1
nhtfrq=9,9,9,9,9,9
inithist='ENDOFRUN'
26 changes: 26 additions & 0 deletions cime_config/testdefs/testmods_dirs/cam/outfrq9s_gpu_pcols760/user_nl_clm
@@ -0,0 +1,26 @@
!----------------------------------------------------------------------------------
! Users should add all user specific namelist changes below in the form of
! namelist_var = new_namelist_value
!
! Include namelist variables for drv_flds_in ONLY if -megan and/or -drydep options
! are set in the CLM_NAMELIST_OPTS env variable.
!
! EXCEPTIONS:
! Set use_cndv by the compset you use and the CLM_BLDNML_OPTS -dynamic_vegetation setting
! Set use_vichydro by the compset you use and the CLM_BLDNML_OPTS -vichydro setting
! Set use_cn by the compset you use and CLM_BLDNML_OPTS -bgc setting
! Set use_crop by the compset you use and CLM_BLDNML_OPTS -crop setting
! Set spinup_state by the CLM_BLDNML_OPTS -bgc_spinup setting
! Set irrigate by the CLM_BLDNML_OPTS -irrig setting
! Set dtime with L_NCPL option
! Set fatmlndfrc with LND_DOMAIN_PATH/LND_DOMAIN_FILE options
! Set finidat with RUN_REFCASE/RUN_REFDATE/RUN_REFTOD options for hybrid or branch cases
! (includes $inst_string for multi-ensemble cases)
! Set glc_grid with CISM_GRID option
! Set glc_smb with GLC_SMB option
! Set maxpatch_glcmec with GLC_NEC option
! Set glc_do_dynglacier with GLC_TWO_WAY_COUPLING env variable
!----------------------------------------------------------------------------------
hist_nhtfrq = 9
hist_mfilt = 1
hist_ndens = 1
104 changes: 100 additions & 4 deletions doc/ChangeLog
@@ -1,7 +1,103 @@
===============================================================

Tag name: cam6_4_046
Originator(s): sjsprecious
Date: 06 November 2024
One-line Summary: update GPU regression tests with new XML options
Github PR URL: https://github.com/ESCOMP/CAM/pull/1186

Purpose of changes (include the issue number and title text for each relevant GitHub issue):

. GitHub issue: https://github.com/ESCOMP/CAM/issues/1165

. As discussed in https://github.com/ESMCI/cime/pull/4687, it is better to remove the GPU options from the Python workflow in CIME and instead use XML files to configure a GPU test for CESM.

The following tags should be brought into CAM together to make the new GPU workflow function properly:
- cmeps1.0.22 or newer
- ccs_config_cesm1.0.8 or newer
- cime6.1.33 or newer

Once those new tags are merged in, the GPU test definition here (https://github.com/ESCOMP/CAM/blob/cam_development/cime_config/testdefs/testlist_cam.xml#L1493-L1510) needs to be updated accordingly.
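
As a rough illustration of the "or newer" requirement above, the sketch below compares the fxtag values pinned in .gitmodules by this commit against the listed minimums. The helper names `tag_at_least` and `check_tag` are hypothetical, and the comparison relies on `sort -V` (version sort, available in GNU coreutils), which is an assumption about the local environment rather than anything CAM or CIME provides.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when TAG ($1) is the same as or newer than
# MINIMUM ($2). Assumes a sort(1) that supports -V (version sort).
tag_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Hypothetical wrapper that reports the result in words.
check_tag() {
    if tag_at_least "$1" "$2"; then
        echo "$1 satisfies minimum $2"
    else
        echo "$1 is older than minimum $2"
    fi
}

# Tags pinned by this commit versus the minimums listed above.
check_tag cmeps1.0.22 cmeps1.0.22
check_tag ccs_config_cesm1.0.8 ccs_config_cesm1.0.8
check_tag cime6.1.41 cime6.1.33
# The previous cime pin, shown for contrast.
check_tag cime6.1.29 cime6.1.33
```

The first three calls report that the pinned tag satisfies its minimum; the last reports that cime6.1.29 is older than cime6.1.33.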

Describe any changes made to build system: none

Describe any changes made to the namelist: none

List any changes to the defaults for the boundary datasets: none

Describe any substantial timing or memory changes: none

Code reviewed by: peverwhee

List all files eliminated: none

List all files added and what they do:

. The following files are added to perform a GPU regression test with PCOLS=16 and two GPU nodes on Derecho
- cime_config/testdefs/testmods_dirs/cam/outfrq9s_gpu_default
- shell_commands
- user_nl_cam
- user_nl_clm

. The following files are added to perform a GPU regression test with PCOLS=760 and one GPU node on Derecho
- cime_config/testdefs/testmods_dirs/cam/outfrq9s_gpu_pcols760
- shell_commands
- user_nl_cam
- user_nl_clm

List all existing files that have been modified, and describe the changes:

.gitmodules
. update the tags for the external components

cime_config/testdefs/testlist_cam.xml
. update the GPU regression tests to use the right setups
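
In testlist_cam.xml the test names change from ERS_Ln9_G4-a100-openacc to ERS_Ln9, and the GPU settings now appear as `./xmlchange` lines in the new testmods. The sketch below shows that correspondence by splitting the old suffix into its three parts. Reading the suffix as GPU count, GPU type, and offload method is an interpretation based on the values that appear in this commit, and the function name `gpu_modifier_to_xmlchange` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the xmlchange lines that correspond to an
# old-style GPU test-name suffix such as "G4-a100-openacc".
# Assumption: the suffix has the form G<count>-<type>-<offload>.
gpu_modifier_to_xmlchange() {
    modifier=$1
    count=${modifier%%-*}    # "G4"
    count=${count#G}         # "4"
    rest=${modifier#*-}      # "a100-openacc"
    gpu_type=${rest%%-*}     # "a100"
    offload=${rest#*-}       # "openacc"
    echo "./xmlchange NGPUS_PER_NODE=$count"
    echo "./xmlchange GPU_TYPE=$gpu_type"
    if [ "$offload" = "openacc" ]; then
        echo "./xmlchange OPENACC_GPU_OFFLOAD=TRUE"
    fi
}

gpu_modifier_to_xmlchange G4-a100-openacc
```

The new testmods also set OVERSUBSCRIBE_GPU=TRUE, which has no counterpart in the old test-name suffix and is therefore not produced by this sketch.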

If there were any failures reported from running test_driver.sh on any test
platform, and checkin with these failures has been OK'd by the gatekeeper,
then copy the lines from the td.*.status files for the failed tests to the
appropriate machine below. All failed tests must be justified.

derecho/intel/aux_cam:

All tests had differences in the namelist comparison and FIELDLIST field,
otherwise, unless listed below, they were bit-for-bit.

SMS_D_Ln9.f19_f19_mg17.FXHIST.derecho_intel.cam-outfrq9s_amie (Overall: FAIL)
SMS_D_Ln9_P1280x1.ne0CONUSne30x8_ne0CONUSne30x8_mt12.FCHIST.derecho_intel.cam-outfrq9s (Overall: FAIL)
- pre-existing failures due to build-namelist error requiring CLM/CTSM external update.

ERP_Ln9.f09_f09_mg17.FCSD_HCO.derecho_intel.cam-outfrq9s (Overall: FAIL)
SMS_Ld1.f09_f09_mg17.FCHIST_GC.derecho_intel.cam-outfrq1d (Overall: DIFF)
- pre-existing failures due to HEMCO not having reproducible results (issues #1018 and #856)

derecho/nvhpc/aux_cam:

A new baseline was generated successfully, and a second test
with that new baseline passed as expected.

izumi/nag/aux_cam:

All tests had differences in the namelist comparison and FIELDLIST field,
otherwise, unless listed below, they were bit-for-bit.

DAE.f45_f45_mg37.FHS94.izumi_nag.cam-dae (Overall: FAIL)
- pre-existing failure -- issue #670

izumi/gnu/aux_cam:

All tests had differences in the namelist comparison and FIELDLIST field,
otherwise they were all bit-for-bit.

CAM tag used for the baseline comparison tests if different than previous
tag: cam6_4_045

Summarize any changes to answers: BFB except namelist and field list changes.

===============================================================

Tag name: cam6_4_045
Originator(s): mwaxmonsky
Originator(s): mwaxmonsky
Date: 10/25/2024
One-line Summary: Start refactoring of vertical diffusion to be CCPPized
Github PR URL: https://github.com/ESCOMP/CAM/pull/1176
@@ -197,12 +293,12 @@ Describe any changes made to the namelist:
. change default value of seasalt_emis_scale to 0.75 for cam7 (both lt and mt)
This is a cam7 tuning mod from issue #1143

. update ubc_file_path for cam7 (lt only) to
. update ubc_file_path for cam7 (lt only) to
atm/cam/chem/ubc/b.e21.BWHIST.f09_g17.CMIP6-historical-WACCM.ensAvg123.cam.h0zm.H2O.1849-2014_c240604.nc

List any changes to the defaults for the boundary datasets:

. update ubc_file_path for cam7 (lt only) to
. update ubc_file_path for cam7 (lt only) to
atm/cam/chem/ubc/b.e21.BWHIST.f09_g17.CMIP6-historical-WACCM.ensAvg123.cam.h0zm.H2O.1849-2014_c240604.nc

Describe any substantial timing or memory changes: none
@@ -230,7 +326,7 @@ bld/namelist_files/namelist_defaults_cam.xml
This is a cam7 tuning mod from issue #1143

bld/namelist_files/use_cases/1850_cam_lt.xml
. update ubc_file_path to
. update ubc_file_path to
atm/cam/chem/ubc/b.e21.BWHIST.f09_g17.CMIP6-historical-WACCM.ensAvg123.cam.h0zm.H2O.1849-2014_c240604.nc

cime_config/testdefs/testlist_cam.xml