Regression tests in CI #155
Hi @blimlim, thank you for raising this issue. Yes, what you describe is definitely doable.

Where to run the tests

I think the better option is to run the tests (and also store the data) on Gadi (either through a PBS job or on the login node, depending on the "workload" of the tests). I think a good option for storing data is within the

Which tests to run (and which fields file to use)

I think the main objective of the regression tests is to make sure the code functionality is as expected.
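For concreteness, a PBS submission for running the tests on Gadi might look roughly like the sketch below. The queue, resource requests, storage/project names, and script path are all assumptions, not settings from this repository:

```shell
#!/bin/bash
# Hypothetical PBS job sketch for running the regression tests on Gadi.
# All directives below are placeholders to be tuned to the tests' actual
# workload; PROJECT stands in for a real gdata project code.
#PBS -q normal
#PBS -l ncpus=1
#PBS -l mem=8GB
#PBS -l walltime=00:30:00
#PBS -l storage=gdata/PROJECT
#PBS -l wd

./binary_diff.sh
```

A job like this could be submitted on a schedule or triggered remotely by CI, similar to how the model config reproducibility tests run.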
The um2nc repository contains a script for running manual regression tests. These tests run the um2nc conversion on a subset of an ESM1.5 output file, and compare the results with the output of the original `um2netcdf4.py` script. They require you to have a local copy of the ESM1.5 output fields file, and output netCDF from the original `um2netcdf4.py` script with different options applied. These files are:

For both netCDF files, the `--nohist` flag was also applied.

I need to update the tests to allow for different versions of hdf5; however, they seem like a good candidate for implementing as part of the CI. For example, I think it would be really valuable if we could run them automatically for each PR into `main`.

Speaking with @aidanheerdegen, this sounds doable! The main decisions to make are where the fields files and netCDF files should be stored, and where the regression tests should be run. Options mentioned by @aidanheerdegen include storing the data and running on Gadi, as with the model config reproducibility tests, or running on GitHub directly with the data stored elsewhere.
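The comparison at the heart of these tests can be sketched roughly as follows; the function name and file names are our invention, and the real `binary_diff.sh` may differ:

```shell
# Hypothetical sketch of the regression-test comparison step: byte-compare
# a freshly converted netCDF file against a stored reference output.
# The helper name and its arguments are assumptions, not the repo's API.
compare_outputs() {
    new_file="$1"
    reference_file="$2"
    if cmp -s "$new_file" "$reference_file"; then
        echo "PASS: $new_file matches reference"
    else
        echo "FAIL: $new_file differs from reference"
        return 1
    fi
}
```

In CI, a nonzero return from the comparison would fail the job. If bit-for-bit reproducibility across hdf5 versions turns out to be unachievable, a tolerance-based comparison (e.g. with a tool such as nccmp) could replace the byte-level `cmp`.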
Another decision is around the fields file to use – the `aiihca.paa1jan.subset` file is small, but it only contains two variables (`pr` and `ta_plev`), and so some important parts of the code aren't run. The `heaviside` variables are missing, and as a result the masking step is skipped. This part of the code is not covered by the unit tests, as it involves hard-to-test iris methods.

If it's viable, I'd be in favour of using a slightly bigger file which requires more of the code to run. The whole `aiihca.paa1jan` file that `aiihca.paa1jan.subset` is derived from is 511M, perhaps too big. I'm wondering whether there'd be a possible middle ground, and whether the choice of file determines where the CI tests should run?

For some extra information, the following is the `/usr/bin/time -l` output from running the `binary_diff.sh` script on my MacBook:

The following shows the resource usage when swapping the small `aiihca.paa1jan.subset` file for the bigger `aiihca.paa1jan` file.

@aidanheerdegen, @CodeGat, @atteggiani – your thoughts on the best way to set up the CI tests would be valuable!
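For reference, a rough sketch of how `/usr/bin/time` figures like those above can be collected; the flag differs between BSD and GNU `time`, and the script path is an assumption:

```shell
# Hypothetical sketch of collecting the resource-usage figures quoted above.
# BSD time (macOS) takes -l and reports peak memory as
# "maximum resident set size"; GNU time (Linux, e.g. Gadi) takes -v.
case "$(uname)" in
    Darwin) TIME_FLAG="-l" ;;
    *)      TIME_FLAG="-v" ;;
esac

# Run the existing regression script under time, if present
# (the ./binary_diff.sh path is a guess at the repo layout).
if [ -x ./binary_diff.sh ]; then
    /usr/bin/time "$TIME_FLAG" ./binary_diff.sh
else
    echo "binary_diff.sh not found; skipping timing run" >&2
fi
```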