Clean up test lists to reduce total cost #179

Open
3 tasks
mnlevy1981 opened this issue Aug 15, 2024 · 0 comments

Comments

@mnlevy1981
Collaborator

mnlevy1981 commented Aug 15, 2024

I was just looking at the model cost of running some CESM tests with MARBL turned on, and it's not good: an SMS.TL319_t232.G1850MARBL_JRA.derecho_intel test costs 1000 core-hours (a comparable test without MARBL is in the neighborhood of 80 cpu-hours). I think the bulk of the additional cost comes from having every MARBL diagnostic in the diag_table. By default, MARBL asks MOM6 to write "minimal" output (49 fields) in fully coupled runs, "full" output (238 fields) in ocean-only and FOSI runs, and every diagnostic (353 fields) when the test suite is on.
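To make the default behavior described above concrete, here is a minimal sketch of how the output level gets picked. The function name `default_marbl_output` and the run-type labels (`"fully_coupled"`, `"ocean_only"`, `"fosi"`) are hypothetical and not the actual MARBL or MOM6 configuration API; only the field counts and the precedence of the test-suite setting come from the paragraph above.

```python
# Hypothetical sketch of MARBL's default diagnostic output levels.
# Names and run-type labels are illustrative assumptions, not the real
# MARBL / MOM6 interface. Field counts are taken from the issue text.

FIELD_COUNTS = {
    "minimal": 49,   # default for fully coupled runs
    "full": 238,     # default for ocean-only and FOSI runs
    "all": 353,      # every MARBL diagnostic
}


def default_marbl_output(run_type: str, test_suite: bool) -> str:
    """Return the output level MARBL requests from MOM6 by default."""
    if test_suite:
        # With the test suite on, every diagnostic ends up in the diag_table,
        # regardless of run type.
        return "all"
    if run_type == "fully_coupled":
        return "minimal"
    if run_type in ("ocean_only", "fosi"):
        return "full"
    raise ValueError(f"unknown run type: {run_type!r}")


if __name__ == "__main__":
    for run_type in ("fully_coupled", "ocean_only", "fosi"):
        for test_suite in (False, True):
            level = default_marbl_output(run_type, test_suite)
            print(f"{run_type:14s} test_suite={test_suite!s:5s} -> "
                  f"{level} ({FIELD_COUNTS[level]} fields)")
```

The point of laying it out this way is that the test-suite check comes first, so every test (even a fully coupled one) pays for all 353 fields.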

I think we want to do the following:

  • the aux_mom_MARBL test list should continue to write the full output, but the tests should be shortened (switching from SMS to SMS_Ld2 should bring the cost down to ~400 cpu-hours, which isn't quite as bad)
  • the prealpha, prebeta, and aux_mom test lists should either (a) only write the default output based on the compset, or (b) write minimal output for both fully coupled runs and runs that would otherwise write full output
  • @alperaltuntas if we turn on FMS's parallel I/O for the test suite, will cprnc still be able to compare new test output against the baselines? I don't want to lose bit-for-bit checks, but if the archiver and test system don't care that each time slice is split across multiple files, that would help reduce cost as well
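The ~400 cpu-hour figure in the first bullet can be sanity-checked with a back-of-the-envelope estimate. This sketch assumes two things the issue does not state: that a plain SMS test runs 5 model days (the usual CIME default) and that cost scales linearly with simulated days, ignoring initialization and other fixed costs. The helper name `scaled_cost` is hypothetical.

```python
# Back-of-the-envelope estimate for shortening the aux_mom_MARBL tests.
# Assumptions (not stated in the issue): a plain SMS test runs 5 model days,
# and cost scales linearly with simulated days (fixed costs such as
# initialization are ignored, so the real number will be somewhat higher).

SMS_DEFAULT_DAYS = 5        # assumed default run length for an SMS test
SMS_COST_CORE_HOURS = 1000  # measured: SMS.TL319_t232.G1850MARBL_JRA.derecho_intel


def scaled_cost(days: float) -> float:
    """Estimate core-hours for a run of `days` model days under linear scaling."""
    return SMS_COST_CORE_HOURS * days / SMS_DEFAULT_DAYS


if __name__ == "__main__":
    print(f"SMS_Ld2 estimate: {scaled_cost(2):.0f} core-hours")  # -> 400
```

Under these assumptions the linear estimate lands right at 400; the startup cost that this ignores is why "~400" is better read as a lower bound.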

edit: When I initially posted this, I was looking at cpu-hrs / year rather than total cpu-hrs. I've adjusted the numbers in the opening paragraph, but the point still stands: we think of MARBL as increasing cost by somewhere between 3x and 5x, but the tests are 12x more expensive, and I think a lot of the gap between "3-5x" and "12x" is due to I/O
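For reference, the arithmetic behind the "12x" and the size of the gap it leaves. The 80 cpu-hour baseline is the approximate figure from the opening paragraph, and the 3x-5x range is the rule of thumb quoted above; attributing the remaining factor to I/O is the hypothesis of this issue, not something the calculation proves.

```python
# Compare the measured MARBL test overhead with the expected 3x-5x increase.
# Inputs come from the issue text; the baseline cost is approximate.

marbl_test = 1000.0    # cpu-hours, SMS test with MARBL
baseline_test = 80.0   # cpu-hours, comparable test without MARBL (approximate)

observed = marbl_test / baseline_test    # observed slowdown
expected_low, expected_high = 3.0, 5.0   # usual MARBL cost multiplier

# Remaining factor not explained by MARBL's compute cost alone
# (hypothesized in the issue to be mostly I/O).
unexplained_low = observed / expected_high
unexplained_high = observed / expected_low

if __name__ == "__main__":
    # prints: observed 12.5x; unexplained factor 2.5x to 4.2x
    print(f"observed {observed:.1f}x; "
          f"unexplained factor {unexplained_low:.1f}x to {unexplained_high:.1f}x")
```

So even at the high end of the usual MARBL multiplier, the tests are still roughly 2.5x more expensive than expected.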
