
GCHP simulation stopped after 7 month simulation (total set time for 1 yr) #405

Closed
Hemrajbhattarai opened this issue Apr 15, 2024 · 9 comments
Assignees
Labels
category: Bug (Something isn't working) · topic: Stretched Grid (Specific to stretched grid simulation)

Comments

@Hemrajbhattarai

Name and Institution (Required)

Name: Hemraj Bhattarai
Institution: The Chinese University of Hong Kong

Description of your issue or question

I am running GCHP version 14.3.0
cap_restart: 20170101 000000
Run_Duration="00010000 000000"

CS_RES=48
STRETCH_GRID=ON
STRETCH_FACTOR=4.0
TARGET_LAT=28.0
TARGET_LON=80.0

The simulation runs fine and produces output from 20170101 to 20170731, but then it stops.
I couldn't find any useful information in the log files:
1514335_print_out.log.txt
1514335_error.log.txt

I also tried resubmitting the run with a new restart file generated by the model (I set Checkpoint_Freq=monthly), but it still fails, with a new type of error (
1526524_error.log.txt
)
I am not sure where this problem comes from. For the test with the model-generated restart file, I changed cap_restart to 20170801 (since the model had already run for 7 months) and changed the run duration to 00000500 000000, but it still failed to run.

I would appreciate any suggestions for debugging this issue.
Thank you.

@Hemrajbhattarai
Author

Let me provide some additional information:

I wonder whether I can directly use the internal checkpoint files stored in the Restart directory (e.g., gcchem_internal_checkpoint.20170801_0000z.nc4), or whether these files need some processing first. I presume they can be used directly, but I have not been able to do so.

When I tested with a different restart file (not one generated by the model during the simulation), the model ran well; with gcchem_internal_checkpoint.xxxx.nc4 the model does not run. I think this means the problem is in the restart file.

To clarify again:
I am running with a stretched grid, and the settings in my setCommonRunSettings.sh look like:
CS_RES=48
STRETCH_GRID=ON
STRETCH_FACTOR=4.0
TARGET_LAT=28.0
TARGET_LON=80.0

And the beginning and end of the header of gcchem_internal_checkpoint.xxxx.nc4 look like:
dimensions:
lat = 288 ;
lev = 72 ;
lon = 48 ;
time = 1 ;
variables:

// global attributes:
:STRETCH_FACTOR = 4.f ;
:TARGET_LAT = 28.f ;
:TARGET_LON = 80.f ;

I hope this additional information helps clarify the problem.
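A quick sanity check (my suggestion, not from this thread) is to confirm that the stretched-grid attributes stored in the checkpoint file match the run settings. This is a minimal sketch; the helper `check_stretch_attrs` is hypothetical, and the attribute names follow the ncdump excerpt above:

```python
# Hypothetical helper: compare stretched-grid run settings against the
# global attributes stored in a GCHP checkpoint file.
def check_stretch_attrs(run_settings, file_attrs, tol=1e-6):
    """Return a list of (name, expected, found) mismatches."""
    mismatches = []
    for name, expected in run_settings.items():
        found = file_attrs.get(name)
        if found is None or abs(float(found) - float(expected)) > tol:
            mismatches.append((name, expected, found))
    return mismatches

if __name__ == "__main__":
    # Settings from setCommonRunSettings.sh (values from this thread)
    run_settings = {"STRETCH_FACTOR": 4.0, "TARGET_LAT": 28.0, "TARGET_LON": 80.0}

    # In practice, read these from the checkpoint file, e.g.:
    #   from netCDF4 import Dataset
    #   ds = Dataset("gcchem_internal_checkpoint.20170801_0000z.nc4")
    #   file_attrs = {k: ds.getncattr(k) for k in ds.ncattrs()}
    file_attrs = {"STRETCH_FACTOR": 4.0, "TARGET_LAT": 28.0, "TARGET_LON": 80.0}

    print(check_stretch_attrs(run_settings, file_attrs))  # [] means all match
```

An empty list means the checkpoint file's grid metadata agrees with the run configuration; any tuple returned points at a setting to fix before resubmitting.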

@yantosca
Contributor

Thanks for writing @Hemrajbhattarai. There was a similar issue geoschem/geos-chem#318 not long ago. In that case the problem was solved by using target longitude in 0..360 coordinates when regridding a regular GCHP grid to a GCHP stretched grid. I don't know if that will solve your issue but that would be the first thing to check.

@yantosca yantosca added the category: Bug Something isn't working label Apr 16, 2024
@yantosca yantosca transferred this issue from geoschem/geos-chem Apr 16, 2024
@yantosca yantosca added the topic: Stretched Grid Specific to stretched grid simulation label Apr 16, 2024
@Hemrajbhattarai
Author

Thanks Bob for the prompt reply. My target point is near Delhi, India, so the latitude and longitude are the same in both the 0 to 360 and -180 to 180 conventions (target lat = 28N, lon = 80E).
More importantly, I am using the restart file obtained from the internal checkpoint of the simulation that was interrupted after running for 7 months. In other words, it is effectively a continuation run for the 8th month, so I think the target lat/lon settings should not be the issue.

@lizziel
Contributor

lizziel commented Apr 16, 2024

Hi @Hemrajbhattarai, I think there are two separate issues. For your original post, where the model ran for seven months and then stopped, the log file has this error:
pe=00070 FAIL at line=02608 ExtDataGridCompMod.F90 <unknown error>
This indicates a problem with an input file. The regular log shows that the run stopped at hour 23 of Aug 1, 2017, which makes me think there is an issue finding an hourly file for Aug 2, 2017, possibly a meteorology file. Do you have the output log file allPEs.log for that run? It might contain an error message from the ExtData component of MAPL, which handles input files.

The second issue is starting a stretched-grid run from an output restart file from the model. We have a separate report of this in issue #402 and I am looking into it. You should be able to start a stretched-grid simulation with any of the checkpoint files; there may be a bug preventing that which somehow was not reported until now.

@lizziel lizziel self-assigned this Apr 17, 2024
@Hemrajbhattarai
Author

Dear @lizziel
Thank you for pointing this out; the OFFLINE_DUST files for Aug 2, 2017 onwards were missing. I downloaded the missing OFFLINE_DUST files and stored them in their respective directories.

I reran the simulation and the problem is still the same. I double-checked that I stored them in the right directory, and they seem to be fine, but the problem persists.

Attached are the log files and other pieces of information.

allPEs.log
1529154_print_out.log
1529154_error.log
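Since gaps like the missing OFFLINE_DUST files are easy to overlook, a small script can scan a date range for missing daily input files. This is a sketch under assumptions: the filename template below is illustrative only, so check the actual names in ExtData.rc for the collection you are verifying.

```python
from datetime import date, timedelta

# Hypothetical helper: find dates in a range for which an expected daily
# input file is missing. The template is illustrative, not the real
# OFFLINE_DUST naming -- adapt it to the entries in ExtData.rc.
def missing_dates(start, end, have, template="dust_emissions_05.{:%Y%m%d}.nc4"):
    """Return the expected filenames between start and end (inclusive)
    that are not in the set `have`."""
    missing = []
    d = start
    while d <= end:
        name = template.format(d)
        if name not in have:
            missing.append(name)
        d += timedelta(days=1)
    return missing

if __name__ == "__main__":
    # In practice, build `have` from the data directory, e.g.:
    #   from pathlib import Path
    #   have = {p.name for p in Path(dust_dir).glob("*.nc4")}
    have = {"dust_emissions_05.20170801.nc4"}
    print(missing_dates(date(2017, 8, 1), date(2017, 8, 3), have))
```

Running this over the full simulation period, for each hourly or daily input collection, would catch missing files before ExtData fails mid-run.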

@Hemrajbhattarai
Author

I did two test runs; the only difference between them is the restart file.

  1. The run whose log files I attached a few hours ago uses the restart file produced by checkpointing (gcchem_internal_checkpoint.xxxxz.nc4) during the model simulation. With this restart file the model fails to run (error log files in the comment above).

  2. This simulation is exactly the same as 1), changing only the restart file. Here the restart file is the one generated by GCPy from the default nc file, and the model runs fine.

I suspect a problem with the restart file obtained from checkpointing.

I did a quick comparison of the variables, and it seems some variables are missing from the checkpoint restart file. I am not sure if this is the reason; please check:
check_original_checkpoint_test
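The variable comparison above can be scripted. A minimal sketch (my own helper, not from the thread; the filenames in the comments are illustrative) that diffs the variable names of two restart files:

```python
# Hypothetical helper: compare the variable names in two restart files,
# e.g. a GCPy-generated restart vs. a gcchem_internal_checkpoint file.
def diff_variables(vars_a, vars_b):
    """Return (only_in_a, only_in_b) as sorted lists of variable names."""
    a, b = set(vars_a), set(vars_b)
    return sorted(a - b), sorted(b - a)

if __name__ == "__main__":
    # In practice, read the variable lists with netCDF4 or xarray, e.g.:
    #   from netCDF4 import Dataset
    #   vars_a = Dataset("GEOSChem.Restart.20170101_0000z.nc4").variables.keys()
    #   vars_b = Dataset("gcchem_internal_checkpoint.20170801_0000z.nc4").variables.keys()
    vars_a = ["SPC_O3", "SPC_NO2", "SPC_CO"]   # illustrative species names
    vars_b = ["SPC_O3", "SPC_NO2"]
    only_a, only_b = diff_variables(vars_a, vars_b)
    print("only in working restart:", only_a)  # candidates for the missing variables
    print("only in checkpoint:", only_b)
```

Any names listed only in the working restart are the variables the checkpoint is missing, which narrows down whether the failure is metadata-related or a missing-species problem.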

@lizziel
Contributor

lizziel commented Apr 18, 2024

Thanks @Hemrajbhattarai for the update. I am going to try to reproduce the restart file issue for stretched grid. It does appear to be a bug.

@lizziel
Contributor

lizziel commented May 14, 2024

Hi @Hemrajbhattarai, this bug will be fixed in version 14.4.0. You can apply the fix manually using the update here: geoschem/MAPL#34. This does not address missing variables in the restart. Try the fix and see if it solves your problem.

@lizziel
Contributor

lizziel commented Jun 10, 2024

I believe any remaining issues with stretched grid are summarized in #404. If there are additional problems, please create a new GitHub issue.

@lizziel lizziel closed this as completed Jun 10, 2024