poor performance on slingshot11 #72
A similar performance difference is seen between Perlmutter and Chrysalis (an AMD machine with InfiniBand) for E3SM cases. (I haven't tried the exact case above yet.)
I just tried the X case on Derecho with the Cray compiler and I am not seeing the poor performance.
Is the MPI library different?
It's the same MPI library, cray-mpich/8.1.25; however, there is a different build of this library for each compiler flavor.
Update: some hardware updates at NERSC made much of the observed behavior go away. @ndkeen can say more.
Running CESM in the coupler test configuration
PFS.ne120_t12.2000_XATM_XLND_XICE_XOCN_XROF_SGLC_SWAV.derecho_intel
we are observing very poor performance of mct_rearrange_rearr on Perlmutter (NERSC) and Derecho (NCAR). Both machines use the Slingshot 11 network and AMD processors.
Using 512 tasks on Derecho with GPTL timing, we see:
"mct_rearrange_rearr" - 512 512 4.426752e+06 1.391128e+05 277.198 ( 268 0) 263.345 ( 505 0)
Compared with the NCAR Cheyenne system:
"mct_rearrange_rearr" - 512 512 4.426752e+06 3.399975e+04 73.911 ( 414 0) 60.767 ( 384 0)
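To make the two timer lines easier to compare, here is a small Python sketch that parses them. It assumes the usual GPTL parallel-summary column layout found in CESM timing files: timer name, "on" flag, processes, threads, cumulative call count, walltotal (summed over all processes), then wallmax and wallmin, each followed by the (process, thread) where it occurred. The names parse_gptl_line and TimerStats are hypothetical helpers for illustration, not part of GPTL or CESM.

```python
# Hypothetical helper (not part of GPTL or CESM). It assumes the GPTL
# parallel-summary column order:
#   name  on  processes  threads  count  walltotal  wallmax (proc thrd)  wallmin (proc thrd)
import re
from dataclasses import dataclass


@dataclass
class TimerStats:
    name: str
    processes: int
    threads: int
    count: float      # cumulative call count over all processes
    walltotal: float  # wall time summed over all processes (seconds)
    wallmax: float    # wall time of the slowest process (seconds)
    wallmax_proc: int
    wallmin: float    # wall time of the fastest process (seconds)
    wallmin_proc: int

    @property
    def wallmean(self) -> float:
        """Mean wall time per process."""
        return self.walltotal / self.processes


# Tolerates the variable spacing GPTL uses inside the "( proc thrd )" groups.
_LINE = re.compile(
    r'"(?P<name>[^"]+)"\s+(?P<on>\S+)\s+'
    r'(?P<procs>\d+)\s+(?P<threads>\d+)\s+'
    r'(?P<count>\S+)\s+(?P<total>\S+)\s+'
    r'(?P<max>\S+)\s+\(\s*(?P<maxp>\d+)\s+\d+\s*\)\s+'
    r'(?P<min>\S+)\s+\(\s*(?P<minp>\d+)\s+\d+\s*\)'
)


def parse_gptl_line(line: str) -> TimerStats:
    m = _LINE.search(line)
    if m is None:
        raise ValueError(f"not a GPTL summary line: {line!r}")
    return TimerStats(
        name=m["name"],
        processes=int(m["procs"]),
        threads=int(m["threads"]),
        count=float(m["count"]),
        walltotal=float(m["total"]),
        wallmax=float(m["max"]),
        wallmax_proc=int(m["maxp"]),
        wallmin=float(m["min"]),
        wallmin_proc=int(m["minp"]),
    )


# The two lines reported in this issue.
derecho = parse_gptl_line(
    '"mct_rearrange_rearr" - 512 512 4.426752e+06 1.391128e+05 277.198 ( 268 0) 263.345 ( 505 0)'
)
cheyenne = parse_gptl_line(
    '"mct_rearrange_rearr" - 512 512 4.426752e+06 3.399975e+04 73.911 ( 414 0) 60.767 ( 384 0)'
)

print(f"mean per task: derecho {derecho.wallmean:.1f} s, cheyenne {cheyenne.wallmean:.1f} s")
print(f"slowdown (walltotal): {derecho.walltotal / cheyenne.walltotal:.2f}x")
print(f"slowdown (wallmax):   {derecho.wallmax / cheyenne.wallmax:.2f}x")
```

Under that interpretation, both runs make the same number of mct_rearrange_rearr calls (4.426752e+06 in total), yet the mean time per task is about 271.7 s on Derecho versus about 66.4 s on Cheyenne. That is roughly a 4.1x slowdown per call, or about 3.75x when comparing the slowest task on each machine.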