Using gcchem_internal_checkpoint for grid-stretching simulation but fails with 'Factories not equal' error. #404
Comments
Thanks for writing @yuz-wx. There was a similar issue #318 that was solved by using longitude in 0-360 range when specifying the target longitude. But I'm not sure that applies to your issue. However, we just received a fix in GCPy (see geoschem/gcpy#311) for cubed sphere regridding. We will bring this into the dev branch soon. In the meantime you could pull the fix into your GCPy clone and then see if that results in a better-regridded restart file.
Dear yantosca, the restart file made by GCPy can be used successfully, but the GCHP-generated restart file has this issue. I think the cause is the conversion from double64 to real32. A double64 number is assigned directly to a real32 variable, which introduces a truncation error because the value is not rounded. I revised the related lines in 'base/MAPL_CubedSphereGridFactory.F90' in an inelegant way to work around the problem temporarily and make the factories equal (it turns 0.645771756658875 into 0.6457718 instead of 0.6457717). But I think it might be better to compare the target_lat written in the restart file and in GCHP.rc directly (compare 37.0 with 37.0), instead of comparing the calculated values (compare 0.6457718 with 0.6457718). I hope this can be updated in the future. Thank you very much!
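To illustrate the two points in this comment, here is a minimal Python sketch. It is not the MAPL code, and the helper names `round_7` and `truncate_7` as well as the two `target_lat_*` variables are made up for this example. It shows that keeping seven decimal digits of 0.645771756658875 by rounding versus by truncation gives the two values seen in this thread, while the configured angle in degrees compares exactly.

```python
import math

def round_7(x: float) -> float:
    """Keep 7 decimal digits, rounding to nearest."""
    return round(x, 7)

def truncate_7(x: float) -> float:
    """Keep 7 decimal digits, discarding the rest (assumes x >= 0)."""
    return math.floor(x * 1e7) / 1e7

value = 0.645771756658875  # value quoted in the comment above

print(round_7(value))     # 0.6457718
print(truncate_7(value))  # 0.6457717

# Comparing the configured angle in degrees avoids the derived value.
target_lat_rc = 37.0       # hypothetical: as read from GCHP.rc
target_lat_restart = 37.0  # hypothetical: as read from the restart file
print(target_lat_rc == target_lat_restart)  # True
```

The design point is only that 37.0 is exactly representable in both single and double precision, so a comparison in degrees does not depend on how the radians value was derived or stored.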
Hi @yuz-wx, thanks for raising this issue! We have not had a report before of this problem but I can see how it might show up. Doing a check that the stretched grid attributes match up between restart and config is a relatively recent addition to MAPL. I will create an issue with NASA GMAO and also try to put a work-around into our MAPL fork in time for 14.4.0. I will link to the PR here once I have one.
Hi @lizziel, thank you for your comment! Looking forward to the update! Here is my conclusion, as a suggestion:
Here are the related lines in the code:
Hi @yuz-wx, here is a quick fix for the stretched grid restart file issue: geoschem/MAPL#34
Hi @yantosca, @lizziel - I think I've found another way that this problem can sneak in! @amy1916 was seeing the same issue as described by @yuz-wx, so we read through this thread and applied the fix that Lizzie made to the MAPL submodule. We were still seeing the same error. To demo this, let's take a stretched grid restart file that was created using GCPy regridding tools (FILE_1) and a checkpoint file produced by GCHP (FILE_2). If we inspect the two files, we get:
FILE_1:
FILE_2:
All looks good, right? Except, if we open both using
Just to sanity check against some kind of printing problem I modified I haven't had a chance to dig into the checkpoint writing code to figure out why this number was getting into the checkpoint with an error, but something I did notice is that there are type differences between variables in FILE_1 and FILE_2, for example:
which hints to me that somewhere the type of variables going into the checkpoint files is not being made explicit. If I can find some time I'll get into the checkpoint writing code, but my GEOS-Chem time is quite thin at the moment! Hope that's helpful, and if you think this would be more appropriate as a separate bug report I'll happily move it.
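As a rough illustration of why the stored type matters, the following Python sketch round-trips a value through 4-byte and 8-byte IEEE storage using only the standard library. It does not use the GCHP checkpoint writer or netCDF at all; the helper names `store_as_float32` and `store_as_float64` are hypothetical and stand in for "written as float" versus "written as double".

```python
import struct

def store_as_float32(x: float) -> float:
    """Pack x into 4 bytes (IEEE single) and read it back as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def store_as_float64(x: float) -> float:
    """Pack x into 8 bytes (IEEE double) and read it back."""
    return struct.unpack("<d", struct.pack("<d", x))[0]

exact = 37.0    # exactly representable in single and double precision
inexact = 0.1   # not exactly representable in binary floating point

print(store_as_float32(exact) == exact)      # True
print(store_as_float64(inexact) == inexact)  # True
print(store_as_float32(inexact) == inexact)  # False
```

A value that survives an 8-byte round trip unchanged can come back slightly different after a 4-byte round trip, so an exact equality check between a freshly computed value and one read from a file depends on the type used when the file was written.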
Thanks for reporting that this still isn't working. I'll take a look at a more comprehensive fix. It does indeed look like we need to better track/preserve the type. This may already be fixed in a newer version of MAPL. I'll see what I can come up with.
I still have not been able to reproduce the issue. So far I have been able to submit sequential stretched grid runs without hitting the error. I'll try your exact stretch params and see if that makes a difference.
I created a stretched grid restart file using your parameters but still cannot reproduce the issue. Besides running for three consecutive simulations I opened each checkpoint and printed the stretch attributes.
Have you tried using the latest MAPL version used in 14.4? I wonder if a mistake crept in when you manually applied the fix.
@kilicomu, I just remembered that when this issue came up a long time ago I was not able to reproduce it because I was using the transport tracer simulation, which has no issues. I will try again with fullchem.
Following up on this, I'm going to implement a fix from GMAO: GEOS-ESM/MAPL#1979.
@lizziel Cool - let me know if / when you want me to test it my end!
@kilicomu, I expected to reproduce the problem with a fullchem run but actually I don't see it. Could you try an out-of-the-box run with 14.4.1? You can try your own restart as well as the one at http://geoschemdata.wustl.edu/ExtData/GEOSCHEM_RESTARTS/GC_14.3.0/.
Hi @lizziel - sorry it's taking a while to get back to this, pushed for time at the moment. I'll let you know when I can get a test going.
@kilicomu, no worries! I am putting this issue on the back-burner until we get confirmation it is still a problem in the latest version. If you still get the error then I will try testing it on other systems since it does not seem to be a problem on the Harvard cluster.
This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days it will be closed. You can add the "never stale" tag to prevent the issue from being closed.
Name and Institution (Required)
Name: YU, Zheng
Institution: The Chinese University of Hong Kong, EASC
Confirm you have reviewed the following documentation
Description of your issue or question
Please provide as much detail as possible. Always include the GCHP version number and any relevant configuration and log files.
I used GCPy to create a restart file called new_restart_file.nc and ran a grid-stretching case with GCHP 14.2.2; it ran successfully. After that, I linked Restarts/gcchem_internal_checkpoint as gchp_restart.nc4 to run a new case from a new start time. That run fails with a 'Factories not equal' error. I've checked the global attributes of gcchem_internal_checkpoint and new_restart_file, and they are the same.
settings in GCHP.rc:
new_restart_file:
gcchem_internal_checkpoint:
So I printed out some values in base/MAPL_CubedSphereGridFactory.F90 and found that a%target_lat (from the case settings) and this%target_lat (from gcchem_internal_checkpoint) are not equal, even though both are calculated as 37.0 x pi / 180.d0: the values of a%target_lat and this%target_lat are 0.6457718 and 0.6457717, respectively. It looks like a floating-point precision trap caused this problem. Does GCHP have compiler options to avoid such an issue? Or could the comparison be made simpler in the future, for example by just comparing the '37.0' values?
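One generic way to make such a check robust is to compare with a tolerance instead of exact equality. The Python sketch below shows the idea with the two printed values from this report; it is not what MAPL does, and the function name `angles_match` and the tolerance `1e-6` are assumptions chosen only for illustration.

```python
import math

def angles_match(a: float, b: float, rel_tol: float = 1e-6) -> bool:
    """Compare two angles in radians with a relative tolerance.

    rel_tol is an illustrative choice, several times larger than
    single-precision epsilon (about 1.2e-7).
    """
    return math.isclose(a, b, rel_tol=rel_tol)

a_target_lat = 0.6457718     # printed value from the case settings
this_target_lat = 0.6457717  # printed value from the checkpoint

print(a_target_lat == this_target_lat)              # False
print(angles_match(a_target_lat, this_target_lat))  # True

# A genuinely different stretch target is still rejected.
print(angles_match(math.radians(37.0), math.radians(38.0)))  # False
```

With a tolerance this loose, a last-digit difference in single precision passes, while a one-degree change in the target latitude is still detected.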
The cause of 'Factories not equal'
The calculation of target_lat:
The calculation for a%target_lat:
The calculation for this%target_lat: