Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update conda environments, conda package recipe and documentation #146

Merged
merged 6 commits into from
Nov 7, 2024

Conversation

atteggiani
Copy link
Collaborator

@atteggiani atteggiani commented Oct 9, 2024

Fixes #106.
Fixes #129.
Fixes #60.

Superseeds #62.

  • Set up build and dev conda environment files.
  • Modified pyproject.toml to reflect modifications in the conda environment files. Modified meta.yaml for conda packaging.
  • Modified CI.yml for conda-build testing.
  • Modified README.md with instructions on how to install (Gadi, locally and dev)

The conda environments and the steps in the documentation have been all tested locally.

@atteggiani atteggiani marked this pull request as draft October 9, 2024 21:51
@atteggiani atteggiani force-pushed the davide/create_development_dependencies branch 4 times, most recently from 92bac2d to f8abe7d Compare October 10, 2024 03:00
@atteggiani atteggiani marked this pull request as ready for review October 10, 2024 03:06
@atteggiani atteggiani changed the title Create environments and development dependencies Create conda environments and development dependencies + documentation Oct 10, 2024
@atteggiani atteggiani changed the title Create conda environments and development dependencies + documentation Update conda environments, conda package recipe and documentation Oct 10, 2024
@atteggiani atteggiani force-pushed the davide/create_development_dependencies branch from f8abe7d to 8b39008 Compare October 10, 2024 03:08
@atteggiani atteggiani force-pushed the davide/create_development_dependencies branch from 8b39008 to 2bf7501 Compare October 10, 2024 03:22
README.md Outdated Show resolved Hide resolved
.github/workflows/CI.yml Outdated Show resolved Hide resolved
@atteggiani atteggiani force-pushed the davide/create_development_dependencies branch 2 times, most recently from d4b424e to 2cfc4ba Compare October 13, 2024 21:38
.github/workflows/CI.yml Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
Copy link

@marc-white marc-white left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty sensible & workable to me.

@blimlim
Copy link
Collaborator

blimlim commented Oct 30, 2024

I've just been testing the dev environment out and ran into a couple of things:

Last week iris added support for numpy version 2, and as a result creating an environment from env_dev.yml uses numpy v2. mule uses methods that are removed in numpy v2 and so converting a fields file using the development environment produces the following error:

  File "/Users/u1161001/miniconda3/envs/um2nc_base_PR/lib/python3.12/site-packages/numpy/__init__.py", line 414, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'product

On my laptop, conda installed mule 2019.01.01 which doesn't restrict the numpy version, hence leading to the above error. The payu environment on gadi uses mule 2022.07.01 which does restrict numpy >=1.21.6,<2.0a0 but I don't think this is available on all platforms.

I'm wondering, will we need to restrict the numpy version to numpy<2 in our env_...yaml files to ensure mule works?


Installing using env_dev.yml also installed hdf5 1.14.4 for me, while the version in the payu 1.1.5 environment is 1.14.3. This change in hdf5 version breaks the bitwise reproducibility integration tests.

Running the integration tests:

  • Using an environment with the new hdf5 1.14.4:
    $ diff mask.nc ../../aiihca.paa1jan.subset.orig.nc 
    > Binary files mask.nc and ../../aiihca.paa1jan.subset.orig.nc differ
    
  • Using an environment were we restrict hdf5 1.14.3:
    $ diff mask.nc ../../aiihca.paa1jan.subset.orig.nc
    $
    

In the above aiihca.paa1jan.subset.orig.nc was produced by converting /g/data/tm70/esm15_testing/sample_output with the original um2netcdf4 script.

There's no clear differences in the resulting netCDF files though. Using nccmp to compare the data, global attributes, encodings and metadata:

$ nccmp -Bdgehs mask.nc ../../aiihca.paa1jan.subset.orig.nc mask.nc
> Files "mask.nc" and "../../aiihca.paa1jan.subset.orig.nc" are identical.

I'll make a separate issue for discussion on this, as I'm not sure how strictly we want to adhere to bitwise reproducibility.

@CodeGat CodeGat removed their request for review October 30, 2024 05:08
@atteggiani
Copy link
Collaborator Author

Hi @blimlim,
Thank you for your comments!

The payu environment on gadi uses mule 2022.07.01 which does restrict numpy >=1.21.6,<2.0a0 but I don't think this is available on all platforms.

Why you think it is not available on all platforms?

I think the best solution would be to restrict mule dependency to use version 2022.07.01.
In case this version is not available (or generates issues), then your suggestion to restrict the numpy version to numpy<2 makes sense.

I'll make a separate issue for discussion on this, as I'm not sure how strictly we want to adhere to bitwise reproducibility.

Your point is very interesting, and I think a separate issue is the best place to discuss that.
On a general case, I think we might want to think about keeping bitwise reproducibility in the first release of the um2nc conda package, to ensure there is at least a version which is fully backward-compatible with the old script, and therefore can fully reproduce its results.
However, for all subsequent releases, I would not restrict the hdf5 dependency.
We might think about doing 2 releases of the um2nc conda package one after the other:

  • the first version could be called something like 1.0-backcomp, where we ensure backward compatibility by restricting hdf5 to use version 1.14.3
  • the second version (1.0) will instead have a non-restricted hdf5 dependency, and would be suitable for any future usage.

@blimlim
Copy link
Collaborator

blimlim commented Oct 30, 2024

Why you think it is not available on all platforms?
I think the best solution would be to restrict mule dependency to use version 2022.07.01.

Definitely agree – since it's mule that requires numpy<2 I think it would make the most sense for that dependency to be handled by mule. When I try and build the environment on my mac with mule=2022.07.1 (I had a typo in my original message, it should be 2022.07.1 instead of 2022.07.01), I get the error:

PackagesNotFoundError: The following packages are not available from current channels:

  - mule=2022.07.1*

Building on gadi with the same restriction works without any issues. On the coecms Anaconda page for mule, does the linux-64 prefix before the 2022.07.01 version mean that it's not available on all architectures?

The 2020.01.1 version also doesn't restrict numpy, and I had a separate error trying to install that one:

└─ mule 2020.01.1**  is not installable because it requires
   └─ python_abi 3.6.* *_cp36m, which requires
      └─ python 3.6.* , which does not exist (perhaps a missing channel).

On a general case, I think we might want to think about keeping bitwise reproducibility in the first release of the um2nc conda package, to ensure there is at least a version which is fully backward-compatible with the old script, and therefore can fully reproduce its results.

I think this sounds like a great idea. I've made an issue #149 to continue discussion.

@atteggiani
Copy link
Collaborator Author

On the coecms Anaconda page for mule, does the linux-64 prefix before the 2022.07.01 version mean that it's not available on all architectures?

Yeah, I had not seen that.
The 2022.07.01 from the coecms conda channel is only available on Linux systems.

This is a problem at the moment.
I am not sure why there is no mule 2022.07.01 withnoarch specification.

For the moment probably we will have to restrict numpy to numpy<2.

@atteggiani
Copy link
Collaborator Author

Added mule<2 restriction in pyproject.toml in b5aa60b

@atteggiani atteggiani force-pushed the davide/create_development_dependencies branch from b5aa60b to 8038a70 Compare November 3, 2024 23:12
@blimlim
Copy link
Collaborator

blimlim commented Nov 5, 2024

Just noticed with the version implementation, we need to add __version__ to the history attributes in um2netcdf.py:

def add_global_history(infile, iris_out):
version = -1 # FIXME: determine version
t = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
um2nc_path = os.path.abspath(__file__)
history = f"File {infile} converted with {um2nc_path} {version} at {t}"
iris_out.update_global_attributes({'history': history})
warnings.warn("um2nc version number not specified!")

Would the best way to do this be importing umpost and then changing the above to:

 def add_global_history(infile, iris_out): 
     version = umpost.__version__
     t = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S") 
     um2nc_path = os.path.abspath(__file__) 
     history = f"File {infile} converted with {um2nc_path} {version} at {t}" 
  
     iris_out.update_global_attributes({'history': history}) 

(I was hoping to add a review code suggestion for this, but don't think it lets you review unchanged files!)

@atteggiani
Copy link
Collaborator Author

atteggiani commented Nov 5, 2024

Just noticed with the version implementation, we need to add __version__ to the history attributes in um2netcdf.py:

def add_global_history(infile, iris_out):
version = -1 # FIXME: determine version
t = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
um2nc_path = os.path.abspath(__file__)
history = f"File {infile} converted with {um2nc_path} {version} at {t}"
iris_out.update_global_attributes({'history': history})
warnings.warn("um2nc version number not specified!")

Would the best way to do this be importing umpost and then changing the above to:

 def add_global_history(infile, iris_out): 
     version = umpost.__version__
     t = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S") 
     um2nc_path = os.path.abspath(__file__) 
     history = f"File {infile} converted with {um2nc_path} {version} at {t}" 
  
     iris_out.update_global_attributes({'history': history}) 

(I was hoping to add a review code suggestion for this, but don't think it lets you review unchanged files!)

Good point!

I added your suggestion in 0512700d8fdbd1aa0bb85d129118b1539d15aa23

(There are a few formatting differences in the file that have been introduced by my VSCode automatic Ruff formatting on Save).

blimlim
blimlim previously approved these changes Nov 6, 2024
Copy link
Collaborator

@blimlim blimlim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look great! Thanks for putting this together!

Copy link
Collaborator

@blimlim blimlim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good!

@atteggiani atteggiani force-pushed the davide/create_development_dependencies branch from c098818 to 7e8e556 Compare November 7, 2024 00:39
Modified pyproject.toml to reflect modifications in the conda environment files.
Modified meta.yaml for conda packaging.
Modified CI.yml for conda-build testing.
Modified README.md with instructions on how to install (Gadi, locally and dev).

Removed 'develop' as a branch that triggers the CI.

Removed external install of 'udunits' from README

switch tag reference with hash in codecov/codecov-action

small fix to CI.yml

Modified itignore

Included url and summary in conda recipe from pyproject.toml

Removed useless env variable from CI

Bumped deployment py version from 3.10 to 3.11

Fixed name for conda environment in CD.yml
@atteggiani atteggiani force-pushed the davide/create_development_dependencies branch from 7e8e556 to 588d070 Compare November 7, 2024 00:43
@atteggiani atteggiani merged commit 495aaa5 into main Nov 7, 2024
5 checks passed
@atteggiani
Copy link
Collaborator Author

Thank you @blimlim and @marc-white for your reviews.
Merged.

@atteggiani atteggiani deleted the davide/create_development_dependencies branch November 7, 2024 00:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants