-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[develop]: User's Guide Updates #880
[develop]: User's Guide Updates #880
Conversation
I think only item #2 needed a response. All the others were totally fine as is. I was just curious.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great @gspetro-NOAA. I appreciate you taking the time to respond so thoroughly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delayed review, I didn't have time to go through all of the changes, and most of these are very minor suggestions that are optional.
* Bug fixes since the v2.1.0 release | ||
* Pre-implementation Rapid Refresh Forecast System (RRFS) forecast configurations | ||
* Air Quality Modeling (AQM) capabilities | ||
* Updates to :term:`CCPP` that target the top of the ``main`` branch (which is ahead of CCPP v6.0.0). See :ref:`this page <CCPPUpdates>` for a detailed summary of updates that came in ahead of the v2.1.0 release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need a similar summary of CCPP updates for this release? If so I can work on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mkavulich Yes! That would be very helpful! :) If you can provide a Google Doc, I can edit/format it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll plan to have that on my plate. If it's getting close to release time and I still don't have it to you feel free to bug me.
docs/UsersGuide/source/BuildingRunningTesting/ContainerQuickstart.rst
Outdated
Show resolved
Hide resolved
docs/UsersGuide/source/BuildingRunningTesting/ContainerQuickstart.rst
Outdated
Show resolved
Hide resolved
.. note:: | ||
|
||
On ``JET``, users should add ``PARTITION_DEFAULT: xjet`` and ``PARTITION_FCST: xjet`` to the ``platform:`` section of the ``config.yaml`` file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The defaults from the jet.yaml machine file lists all partitions, not just xjet. Unless there's something I'm missing this note can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point, Jet was running into errors when "xjet" wasn't specified. I am trying to figure out if this was only when running containers or in all experiments. Unfortunately, we have a few teammates out on vacation, so I think we might have to make the change (if necessary) in a separate PR when I'm able to get a clearer answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I am like 95% sure it's not necessary outside of containers, but it won't cause any serious problems to have this included, it just limits the number of nodes available so will increase queue wait times. Hopefully you can get an answer and just remove this in a future PR once you do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mkavulich I checked with @natalie-perlin who is updating the SRW App modulefiles, and even outside of containers, experiments often fail on Jet unless xjet is specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I've never run into that myself (or maybe I have and didn't realize what was causing the failure!). @natalie-perlin can you open an issue for this? This seems like something we should get to the root cause of since only using xjet would significantly reduce the number of nodes available for testing, and currently the workflow does not contain this partition restriction.
Since xjet does not have the most memory or the most cores per node (https://jetdocs.rdhpcs.noaa.gov/wiki/index.php/Jet_System_Overview), and the fact that it only occasionally fails makes me think that maybe only one or two older partitions is the problem, and maybe it's a memory issue (that could be solved by specifying the necessary memory rather than restricting by partition) or something similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mkavulich - please note this is not a bug, but the differences in architecture. xjet is the newer system, and as far as I remember, the only one that has AVX2 vector instructions that are needed for the UFS. The software stack had to be built with no special instructions and AVX2, so some of the SRW tasks could run on other Jet partitions, too (sjet, vjet, kjet). The weather model needs to go to xjet only.
Co-authored-by: Michael Kavulich <[email protected]>
Co-authored-by: Michael Kavulich <[email protected]>
Co-authored-by: Michael Kavulich <[email protected]>
Co-authored-by: Michael Kavulich <[email protected]>
Co-authored-by: Michael Kavulich <[email protected]>
…er-app into text/ug-updates
Co-authored-by: Michael Kavulich <[email protected]>
…er-app into text/ug-updates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing all my comments 👍
DESCRIPTION OF CHANGES:
This PR updates the SRW App documentation to align better with the head of develop. It also reorganizes the documentation to make it more user-friendly. The documentation has been split into four sections:
This PR also switches many tables to list-tables, which are easier to maintain because they do not rely on exact spacing of
|
symbols and other symbols used to construct the table. When workflows or other major details change, reformatting the tables is much easier. The same is true when adjusting table contents for a release; less time can be spent reformatting the tables.EPIC has separate tickets to edit the ConfigWorkflow/Workflow Parameters chapter and the LAM Grid chapter, so these are not yet up to date (but coming soon!). The Container Quickstart chapter requires an update to the develop branch container, so the instructions do not work as written because the container is currently broken. This chapter will be reexamined ahead of the next release. All other chapters have been updated.
All PNG files were removed from the docs in the reorganization. They are now hosted on the SRW App wiki and linked from there instead.
Type of change
TESTS CONDUCTED:
None required (text-only changes). The documentation builds on my fork's RTD account and can be viewed here: https://srw-ug.readthedocs.io/en/text-ug-updates/.
In general, I have run through instructions for various chapters on Hera or Jet.
DEPENDENCIES:
None.
DOCUMENTATION:
N/A. All docs!
ISSUE:
Will partially resolve Issue #879 . Further PRs to update ConfigWorkflow and LAM grid chapters will also be required.
CHECKLIST
CONTRIBUTORS (optional):
@mkavulich contributed updates for VX tasks.