Regression tests in CI #155

Open
blimlim opened this issue Nov 12, 2024 · 1 comment

@blimlim
Collaborator

blimlim commented Nov 12, 2024

The um2nc repository contains a script for running manual regression tests. These tests run the um2nc conversion on a subset of an ESM1.5 output file and compare the results with the output of the original um2netcdf4.py script. They require a local copy of the ESM1.5 output fields file, along with netCDF output from the original um2netcdf4.py script generated with different options.

These files are:

8.2M	aiihca.paa1jan.subset (the fields file)
1.2M	aiihca.paa1jan.subset.nomask.orig.nc (um2netcdf4.py output with --nomask option)
128K	aiihca.paa1jan.subset.orig.nc (um2netcdf4.py output without the --nomask flag)

For both netCDF files, the --nohist flag was also applied.
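
For context, the comparison itself is conceptually simple. Below is a rough pytest-style sketch of what the check does – the um2nc CLI invocation, data paths, and helper names are illustrative assumptions, not the actual contents of binary_diff.sh:

    # Rough pytest-style sketch of the regression check. The `um2nc` CLI
    # invocation, data paths, and helper names are illustrative assumptions,
    # not the actual contents of binary_diff.sh.
    import subprocess
    from pathlib import Path

    import netCDF4
    import numpy as np

    DATA = Path("test-data")  # local copy of the three files listed above


    def convert(fields_file: Path, out: Path, *flags: str) -> None:
        # Run the um2nc conversion as a subprocess (CLI name assumed).
        subprocess.run(["um2nc", *flags, str(fields_file), str(out)], check=True)


    def assert_netcdf_equal(result: Path, reference: Path) -> None:
        # Variable-by-variable comparison against the um2netcdf4.py reference.
        with netCDF4.Dataset(result) as res, netCDF4.Dataset(reference) as ref:
            assert set(res.variables) == set(ref.variables)
            for name in ref.variables:
                np.testing.assert_array_equal(
                    res[name][:], ref[name][:], err_msg=f"variable {name} differs"
                )


    def test_nomask_matches_reference(tmp_path):
        out = tmp_path / "out.nc"
        convert(DATA / "aiihca.paa1jan.subset", out, "--nomask", "--nohist")
        assert_netcdf_equal(out, DATA / "aiihca.paa1jan.subset.nomask.orig.nc")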

I need to update the tests to allow for different versions of HDF5; however, they seem like a good candidate for implementing as part of the CI – e.g. I think it would be really valuable if we could run them automatically for each PR into main.

From speaking with @aidanheerdegen, this sounds doable! The main decisions to make are where the fields files + netCDF files should be stored, and where the regression tests should be run. Options mentioned by @aidanheerdegen include storing the data and running on Gadi, as with the model config reproducibility tests, or running on GitHub directly with the data stored elsewhere.

Another decision is around the fields file to use – the aiihca.paa1jan.subset file is small, but it only contains two variables (pr and ta_plev), and so some important parts of the code aren't run. The heaviside variables are missing, and as a result the masking step is skipped. This part of the code is not covered by the unit tests, as it involves hard-to-test iris methods.

If it's viable, I'd be in favour of using a slightly bigger file that requires more of the code to run. The whole aiihca.paa1jan file that aiihca.paa1jan.subset is derived from is 511M, which is perhaps too big. I'm wondering whether there's a possible middle ground, and whether the choice of file determines where the CI tests should run?
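
On the possible middle ground: a mid-sized subset could in principle be cut from the full fields file with the mule library, keeping the heaviside fields plus a handful of variables. A rough sketch, with placeholder STASH codes:

    # Rough sketch: cut a mid-sized subset from the full fields file using
    # mule. The STASH codes are placeholders – the real list would need to
    # include the heaviside fields plus the variables they mask.
    import mule

    KEEP_STASH = {4204, 30301}  # hypothetical selection

    ff = mule.FieldsFile.from_file("aiihca.paa1jan")
    # lbuser4 holds the STASH code in each field's lookup header.
    ff.fields = [f for f in ff.fields if f.lbuser4 in KEEP_STASH]
    ff.to_file("aiihca.paa1jan.medium-subset")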

For some extra information, the following is the /usr/bin/time -l output from running the binary_diff.sh script on my MacBook:

       10.00 real         2.00 user         1.10 sys
           157335552  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               35988  page reclaims
                 887  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                1633  voluntary context switches
                4233  involuntary context switches
            18911038  instructions retired
            20490635  cycles elapsed
             2098112  peak memory footprint

The following shows the resource usage when swapping the small aiihca.paa1jan.subset file for the bigger aiihca.paa1jan file.

 16.96 real        15.02 user         1.16 sys
          1171406848  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              160969  page reclaims
                2497  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                5265  voluntary context switches
              109062  involuntary context switches
            19046728  instructions retired
            21810650  cycles elapsed
             2032576  peak memory footprint

@aidanheerdegen, @CodeGat, @atteggiani – your thoughts on the best way to set up the CI tests would be valuable!

@atteggiani
Collaborator

atteggiani commented Nov 12, 2024

Hi @blimlim,

Thank you for raising this issue.

Yes, what you describe is definitely doable.
I had a chat with @CodeGat earlier today to better understand what has already been implemented for the model config reproducibility tests, and how much of that code can be applied to our case here.
For the moment, I will have a look at that and at how we can implement this.

Where to run the tests

The main decisions to make are where the fields files + netCDF files should be stored, and where the regression tests should be run. Options mentioned by @aidanheerdegen include storing the data and running on Gadi, as with the model config reproducibility tests, or running on GitHub directly with the data stored elsewhere.

I think the best option is to run the tests (and also store the data) on Gadi (either through a PBS job or on the login node, depending on the "workload" of the tests).
There are several reasons:

  • The main reason is to set up a "regression-testing" infrastructure that could be applied to other repos (for example, we need to do something similar for the replace_landsurface and era5_grib_parallel repos).
  • The input/output files for testing could get pretty big, making it slow (or impossible) to copy them somewhere else (for example, to a GitHub runner).
  • If the tests require more computational resources (number of CPUs, memory, etc.) than are available on the GitHub runners, we can access them on Gadi through PBS jobs.

I think a good option for storing the data is within the vk83 project, maybe in a folder called regression-tests or similar, subdivided by project (based on the repo name). In the case of this repo, the whole path could be /g/data/vk83/regression-tests/um2nc-standalone/.
I don't think the number or size of the files would be too big in general, so the inode and size quotas shouldn't be an issue. In any case, we could think about ways to optimize that if problems arise.
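
As a concrete sketch of how the tests could locate that data (the default path is from above; the UM2NC_TEST_DATA variable name is just an illustration, not an agreed convention):

    # Hypothetical helper for locating the regression-test data on Gadi.
    # The default path and the UM2NC_TEST_DATA variable are illustrative only.
    import os
    from pathlib import Path

    import pytest

    DEFAULT_DATA_DIR = Path("/g/data/vk83/regression-tests/um2nc-standalone")


    @pytest.fixture(scope="session")
    def regression_data_dir() -> Path:
        data_dir = Path(os.environ.get("UM2NC_TEST_DATA", DEFAULT_DATA_DIR))
        if not data_dir.is_dir():
            # Skip (rather than fail) when run somewhere without the data,
            # e.g. on a developer laptop or a plain GitHub-hosted runner.
            pytest.skip(f"regression test data not found at {data_dir}")
        return data_dir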

Which tests to run (and which fieldsfile to use)

Another decision is around the fields file to use – the aiihca.paa1jan.subset file is small, but it only contains two variables (pr and ta_plev), and so some important parts of the code aren't run. The heaviside variables are missing, and as a result the masking step is skipped. This part of the code is not covered by the unit tests, as it involves hard-to-test iris methods.
If it's viable, I'd be in favour of using a slightly bigger file that requires more of the code to run. The whole aiihca.paa1jan file that aiihca.paa1jan.subset is derived from is 511M, which is perhaps too big. I'm wondering whether there's a possible middle ground, and whether the choice of file determines where the CI tests should run?

I think the main objective of the regression tests is to make sure the code's functionality is as expected.
These are especially important when unit-test coverage is incomplete (or not present at all).
I think in general we should perform a set of tests that exercise different options/entry points, based on the specific project.
For this project I think it's important to test the heaviside masking, along with the conversion_driver_esm1p5.py script.
Given the rather quick elapsed time for the tests (around 15s from the output you shared), we could perform 3–4 different tests for um2netcdf.py and maybe 1–2 tests for conversion_driver_esm1p5.py. This way we could test different things while keeping the total run time to ~2 minutes.
For the fieldsfile to use, I think the best option is to prepare "ad hoc" fieldsfiles (unless we already have suitable ones) that are not too big (I would try not to go above 20M each) and that are suited to the particular options we want to test (for example, the heaviside masking).
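
For illustration, that matrix could be written as a parametrised test – the flag combinations and reference file names here are invented, and it reuses the convert()/assert_netcdf_equal() helpers and the regression_data_dir fixture sketched earlier in the thread:

    # Sketch of a parametrised regression matrix; flags and reference file
    # names are invented for illustration. Reuses convert() and
    # assert_netcdf_equal() from the sketch earlier in the thread.
    import pytest

    CASES = [
        pytest.param(("--nohist",), "masked.orig.nc", id="default-masking"),
        pytest.param(("--nomask", "--nohist"), "nomask.orig.nc", id="nomask"),
    ]


    @pytest.mark.parametrize("flags, reference", CASES)
    def test_um2nc_regression(regression_data_dir, tmp_path, flags, reference):
        out = tmp_path / "out.nc"
        convert(regression_data_dir / "fields.subset", out, *flags)
        assert_netcdf_equal(out, regression_data_dir / reference)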
