I was just looking at the model cost of running some CESM tests with MARBL turned on, and it's not good: an SMS.TL319_t232.G1850MARBL_JRA.derecho_intel test costs 1000 core-hours (a comparable test without MARBL is in the neighborhood of 80 core-hours). I think the bulk of the additional cost comes from having every MARBL diagnostic in the diag_table -- by default MARBL asks MOM6 to write "minimal" output (49 fields) in fully coupled runs, "full" output (238 fields) in ocean-only and FOSI runs, and every diagnostic (353 fields) when the test suite is running.
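A quick way to sanity-check those field counts is to count the registered field lines in the run's diag_table. A minimal sketch (the sample file, its path, and the field names here are illustrative; "ocean_model" is the module name MOM6 registers diagnostics under):

```shell
# Build a tiny sample diag_table and count its ocean_model field lines.
# Field lines in an FMS diag_table start with the quoted module name.
cat > /tmp/diag_table_sample <<'EOF'
"ocean_month", 1, "months", 1, "days", "time"
"ocean_model", "thetao", "thetao", "ocean_month", "all", "mean", "none", 2
"ocean_model", "so", "so", "ocean_month", "all", "mean", "none", 2
EOF
grep -c '^"ocean_model"' /tmp/diag_table_sample   # -> 2
```

Running the same `grep -c` against the diag_table in a test's run directory should show whether a case is getting the 49-, 238-, or 353-field treatment.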
I think we want to do the following:
- the aux_mom_MARBL test list should continue to write the full output, but the tests should be shortened (SMS_Ld2 instead of SMS should cost ~400 core-hours, which isn't quite as bad)
- the prealpha, prebeta, and aux_mom test lists should either (a) write only the default output based on the compset, or (b) write minimal output for both fully coupled and ocean-only / FOSI runs
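For context, trimming the output comes down to which field lines end up in the FMS diag_table. A "minimal" entry pair looks something like this (the file name, output frequency, and field name below are illustrative, not the actual MARBL diagnostic names):

```
"ocean_month_bgc", 1, "months", 1, "days", "time"
"ocean_model", "DIC", "DIC", "ocean_month_bgc", "all", "mean", "none", 2
```

Each additional diagnostic is one more field line like the second one, so the jump from 49 to 353 fields multiplies the per-write volume accordingly.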
@alperaltuntas if we turn on FMS's parallel I/O for the test suite, will cprnc still be able to compare new tests to a baseline? I don't want to lose bit-for-bit checks, but if the archiver and test system don't care that each time slice is broken across multiple files, that would help reduce cost as well.
edit: When I initially posted this, I was looking at core-hours / year rather than total core-hours. I've adjusted the numbers in the opening paragraph, but the point still stands -- we think of MARBL as increasing cost somewhere between 3x and 5x, but the tests are 12x more expensive, and I think a lot of the gap between "3-5x" and "12x" is due to I/O.
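To make that gap concrete, here is the back-of-the-envelope arithmetic (the 80 and 1000 core-hour figures and the 3x-5x expectation are from the discussion above; everything else follows from them):

```python
# Back-of-the-envelope breakdown of the MARBL test cost overhead.
baseline = 80      # core-hours: comparable test without MARBL
measured = 1000    # core-hours: SMS.TL319_t232.G1850MARBL_JRA.derecho_intel

# MARBL is normally expected to multiply compute cost by roughly 3x-5x.
expected_low, expected_high = 3 * baseline, 5 * baseline   # 240-400 core-hours

overhead = measured / baseline   # observed multiplier
print(f"observed multiplier: {overhead:.1f}x")
print(f"cost not explained by MARBL compute alone: "
      f"{measured - expected_high}-{measured - expected_low} core-hours")
```

So roughly 600-760 of the 1000 core-hours sit outside the expected compute overhead, which is the chunk being attributed to I/O here.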