Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resilliency measures to prevent table hacks #575

Open
sashakames opened this issue Jan 16, 2020 · 14 comments
Open

Resilliency measures to prevent table hacks #575

sashakames opened this issue Jan 16, 2020 · 14 comments
Milestone

Comments

@sashakames
Copy link
Collaborator

Example file: CMIP6/PMIP/NUIST/NESM3/lig127k/r1i1p1f1/Ofx/areacelli/gn/v20190924/areacelli_Ofx_NESM3_lig127k_r1i1p1f1_gn.nc
CMOR should not produce this because it doesn't match the tables.

The file will pass PrePARE if I add the same made-up variable with the "sea" realm.

Possibilities: (1) CMOR/PrePARE only uses source repos with "tag" checkouts without modified files (2) gets Table data from a server.

@durack1
Copy link
Contributor

durack1 commented Jan 16, 2020

@sashakames I think the way that @doutriaux1 did this in the past was to compare the hash of the table file used with the version they identified they used. If the hash wasn't identical, then report a "user error"

@doutriaux1
Copy link
Collaborator

@sashakames @durack1 is right. In the tables repo I stored the md5 of the tables see: https://github.com/PCMDI/cmip5-cmor-tables/blob/master/Lib/gen_table_md5s.py

@doutriaux1
Copy link
Collaborator

@durack1
Copy link
Contributor

durack1 commented Jan 16, 2020

While we're on this subject, for the CMIP6_CVs we embed some version information in the files, so e.g.:

    "version_metadata":{
        "CV_collection_modified":"Wed Jan 15 16:44:36 2020 -0800",
        "CV_collection_version":"6.2.45.7",
        "author":"Paul J. Durack <[email protected]>",
        "institution_id":"PCMDI",
        "mip_era_CV_modified":"Thu Aug 25 17:21:00 2016 -0700",
        "mip_era_CV_note":"Fix #36 - Add CV name to json structure",
        "previous_commit":"f07f21c4a1fa5b87a39d1fa51106fd7fad1583b8",
        "specs_doc":"v6.2.7 (10th September 2018; https://goo.gl/v1drZl)"

https://github.com/WCRP-CMIP/CMIP6_CVs/blob/master/mip_era.json#L9-L17

I note this is a little fiddly, as you can't embed the commit hash in the file that corresponds with that particular version as the change to the hash means you're always one step behind yourself, but this could be implemented external to the table files just like @doutriaux1 did for cmip5.

@sashakames
Copy link
Collaborator Author

Reasonable solution and PrePARE should do so too, and we check that CMOR still will enforce this as it must have been disabled, unless NUIST created their own hashes to circumvent the check.

@durack1
Copy link
Contributor

durack1 commented Jan 16, 2020

@sashakames @doutriaux1 it would seem the CMOR already provides this info as netcdf file metadata, see:
:table_info = "Creation Date:(09 May 2019) MD5:fa1fbd5904d4b890fb319f821b9ab99a"
We'd just need to validate this is an "official" hash/date pair

@doutriaux1
Copy link
Collaborator

yep that's what I implemented for CMIP5. I think there's even tools for it (attributes name may have changed in CMOR3), but essentially go fetch md5s file from github directly and validate it matches.

@sashakames
Copy link
Collaborator Author

sashakames commented Jan 16, 2020

I double checked and the hash is there in the file that should be rejected!

@sashakames
Copy link
Collaborator Author

Sorry for all the edits. Clearly there must be a bug somewhere that the check isn't enforced even though the hash gets written out.

@doutriaux1
Copy link
Collaborator

no we never checked the hash! Sadly. I added the tag in CMOR2 with the hope that people would use to check. I created the tools in the cmip5-cmor-tables repo to check if the hash matches. But nobody ever cared to use this in production...

@durack1
Copy link
Contributor

durack1 commented Aug 16, 2022

@sashakames just circling around on some of these issues, there has been conversations about creating conda packages for the cmip6-cmor-tables, if we did that, it would facilitate the automated testing far more easily - it would also allow us to embed md5 hashes in the package tar ball

@sashakames
Copy link
Collaborator Author

@durack1 That sounds promising, so the embedded tables in the conda package would be much more difficult to tamper with. You just need to engineer the conda build so its not simple enough to download from source and install yourself.

@durack1
Copy link
Contributor

durack1 commented Nov 28, 2022

@matthew-mizielinski thinking that this issue should be moved across to the https://github.com/PCMDI/mip-cmor-tables/ repo?

@durack1 durack1 added this to the 4.0/Future milestone Apr 7, 2024
@durack1
Copy link
Contributor

durack1 commented Apr 7, 2024

A very good idea for CMOR4 planning, along with "shipping" CMOR with the mip-cmor-tables - somewhat of a dupe for #676

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants