Where can I see changes in the standard name table? #7

Open
rsignell-usgs opened this issue Aug 7, 2014 · 12 comments

Comments

@rsignell-usgs
Member

Where can I see changes in the standard name table?

I would expect that the xml file is being revision controlled somewhere in github, with releases corresponding to the version number.

On this page:

https://github.com/cf-convention/repository-cf/tree/master/cf-standard-names/trunk

I see only an old 1.0 xml (this is sort of a template, right?)

While on this page:

https://github.com/cf-convention/cf-convention.github.io/tree/master/Data/cf-standard-names

I see folders that contain the different versions of XML within them, like:

https://github.com/cf-convention/cf-convention.github.io/tree/master/Data/cf-standard-names/27/src

(BTW, Shouldn't all these folders be turned into a single folder with tagged releases? Perhaps this should be a separate issue -- not sure)

@graybeal

graybeal commented Aug 8, 2014

I endorse Rich's sentiment here. Now that the information is on github, I think the correct model is that it is a single document in each case, with the version number both internal and tagged.

This is especially appropriate if we do not map the old CF site links to their exact page in the new site, which is something I also think is important, so as not to break the countless links to CF already on the web. Real URIs Never Change, as Tim Berners-Lee says.

So yes, it would be nice to be able to see change histories and content changes. Though realistically, textual diffs may not be useful at all, because of the different ordering of elements which may occur; and it will be hard to get the dates metadata correct.

This may be much more usefully done in a repository, where the changes are expressed per term, rather than per line. I will enter a request in our ORR repository for this change, though some fidelity (in aliases) is already not present there. Someday we should have this capability.
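A per-term comparison of the kind described above could be sketched in Python roughly as follows. This is only an illustration, not an existing CF or ORR tool. It assumes each name is an `<entry id="...">` element with `<canonical_units>` and `<description>` children; the function names `load_entries` and `compare_tables` and the toy tables are hypothetical.

```python
import xml.etree.ElementTree as ET


def load_entries(xml_text):
    """Map each entry id to (canonical_units, description); layout is assumed."""
    root = ET.fromstring(xml_text)
    entries = {}
    for entry in root.iter("entry"):
        units = (entry.findtext("canonical_units") or "").strip()
        description = (entry.findtext("description") or "").strip()
        entries[entry.get("id")] = (units, description)
    return entries


def compare_tables(old_xml, new_xml):
    """Return (added, removed, changed) entry ids, ignoring element order."""
    old, new = load_entries(old_xml), load_entries(new_xml)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


# Toy tables, made up for illustration: the entries are reordered,
# one description changes, and one entry is new.
OLD = """<standard_name_table>
  <entry id="air_temperature"><canonical_units>K</canonical_units><description>Old text.</description></entry>
  <entry id="air_pressure"><canonical_units>Pa</canonical_units><description>Pressure.</description></entry>
</standard_name_table>"""

NEW = """<standard_name_table>
  <entry id="air_pressure"><canonical_units>Pa</canonical_units><description>Pressure.</description></entry>
  <entry id="sea_water_salinity"><canonical_units>1e-3</canonical_units><description>Salinity.</description></entry>
  <entry id="air_temperature"><canonical_units>K</canonical_units><description>New text.</description></entry>
</standard_name_table>"""

print(compare_tables(OLD, NEW))
```

Because the comparison is keyed on the entry id, reordering the entries produces no differences, which is the main thing a line-based diff cannot offer.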

John


@mattben
Contributor

mattben commented Aug 11, 2014

From my understanding, the community chooses what changes will be made to the standard-names file. @japamment (Alison) is the one who posts the updated standard names XML files once the community has voted on and vetted all new changes. At that point she will push the new files to the repo.

@mattben mattben closed this as completed Aug 11, 2014
@graybeal

Matt, I don't think this should be closed quite yet. The original question was not who makes changes, but how one can easily see the changes.

A big advantage of using a version control system is that it is easy to track changes from one version of the document to the next, and do comparisons between those versions, using the tools of the version control system. To explore this possibility, I volunteer to put up an example of this using a few historical versions of the CF Standard Names XML, to see if it is a good fit for Rich's needs.

@mattben mattben reopened this Aug 12, 2014
@mattben
Contributor

mattben commented Aug 12, 2014

I agree that version control could be used in the way you are suggesting. The updaters of both files have always copied version X into version Y and then made the changes, which does keep a full history. The change you are suggesting is more a matter for cf-convention team discussion on the email list, since not all the members are on GitHub. I'd recommend sending @painter1 (Jeff Painter) or @japamment (Alison Pamment) an email with your suggestion. The data all lives inside the same repo (site, standard names, cf conventions, documents), so I'm not sure how you could tag just the new standard names and not everything. We could make it more complex by creating a branch for standard names and one for the cf-conventions (...) which could be updated and then pushed into master. Can you tag a branch?

@graybeal

Thanks for the comments Matt, and for re-opening this issue. Yes, branches can be tagged. The act of releasing a new version of a file automatically generates a new release; you can also tag that particular release. (Incidentally, you don't have to create a branch to make updates to a file, but branching is a nicer workflow to let people see and optionally work on those updates before they are released.)

I will do an example in my own fork of the repository which shows how this could work, and then put something out to the mail list we were using before for discussing the github repository.

@graybeal

I have created the promised example; it can be viewed at
https://github.com/graybealski/cf-conventions-work
The specific file can be 'viewed' at
https://github.com/graybealski/cf-conventions-work/blob/master/cf-standard-name-table.xml
though, because the file is so large, you get to look at the 'raw' file here (or by clicking on RAW):
https://raw.githubusercontent.com/graybealski/cf-conventions-work/master/cf-standard-name-table.xml
and meanwhile the history of commits can be viewed at
https://github.com/graybealski/cf-conventions-work/commits/master/cf-standard-name-table.xml

The one nice thing is that I could set the authored date to the actual last_modified timestamp from within the file, so those match (except for the first version!).

You can't really view the specific diffs on github, because it just doesn't handle viewing big files well. (Similarly its blame capability, which tracks who made which changes, just can't deal.)

You can see those things in your local repository, if you have forked this repository. For one of a gazillion examples,
git diff HEAD HEAD^ compares the most recent commit to the one before it. I grabbed a screen shot of the beginning of this output, it is below; the diff command has a ton of variants.

A difficulty is that the comparison is between the commits in the repository, not in this file. So unless you make a single repository just for the standard name XMLs, users would have to be git-familiar to get what they want.
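One way to narrow the comparison to a single file is to pass its path to git after `--`. A minimal sketch, wrapped in Python so it can be reused from a script; the helper names are hypothetical, and the file name is the one from the example repository above. Listing the older commit first (`HEAD^ HEAD`) makes additions in the newest version show up as `+` lines.

```python
import subprocess


def file_diff_command(path, old="HEAD^", new="HEAD"):
    """Build a `git diff` command restricted to a single file."""
    # Everything after "--" is treated as a path, not a revision.
    return ["git", "diff", old, new, "--", path]


def file_diff(path, old="HEAD^", new="HEAD", repo="."):
    """Run the diff inside a local clone at `repo` and return its text."""
    result = subprocess.run(
        file_diff_command(path, old, new),
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return result.stdout


# Show the command that would be run for the example file.
print(" ".join(file_diff_command("cf-standard-name-table.xml")))
# prints: git diff HEAD^ HEAD -- cf-standard-name-table.xml
```

The `old` and `new` arguments can be any revisions git understands, so tags could be used in place of `HEAD^` and `HEAD` if releases were tagged.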

That's enough info for now. I will be interested in comments.

[Screenshot: beginning of the git diff output, taken 2014-08-12 at 10:31:05 pm]

@rsignell-usgs
Member Author

@graybeal , this is great!

If you click on the standard name table, then on the History tab, and then select a release number, you can see the differences very nicely on github, for example:

graybealski/cf-conventions-work@bb04b24

This is definitely the way that the CF releases should be managed on GitHub!

@graybeal

oh hey, didn't realize the commit view showed diffs so nicely. cool!

john


@rhattersley
Member

This is definitely the way that the CF releases should be managed on GitHub!

👍 ... it's a big step in the right direction.

For future changes it'd be even nicer if the process allowed multiple commits between versions - i.e. each related group of changes had its own commit. (Perhaps by having the document in its own repo though and using tagged versions.)

@rsignell-usgs
Member Author

@rhattersley , are you on the cf mailing list?
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
Would you be willing to chime in there on this issue as well?

@sadielbartholomew
Member

Hi all, I thought I would add a comment here after hearing the summary of discussions on Standard Names at the breakout session of this week's workshop, & then reading the summary notes (I was attending a parallel session). I was nearly finished my comment before realising how old this Issue is! Though it is still open, so I assume not resolved satisfactorily...

The original question was not who makes changes, but how one can easily see the changes.

It seems the discussion thus far has focused on diffs on the canonical XML, but thinking more generally, the change information one would want to view is (surely) the set of names &/or the data about them (canonical units, description) that exist in one version that were not there in some previous version. I.e. to extract the new names, perhaps with info about them, between two versions, not necessarily in a diff-command form that indicates strict changes in the XML file e.g. ordering? Is that correct @rsignell-usgs? Or is there something else about a diff that is important generally &/or to yourself?

In that case I do not think a GitHub version control solution, as discussed here, is the only possibility; some lightweight tool that could extract the desired information would work, & then the reader would not have to pick out the information from within the XML syntax. And such a tool could be hooked up to the website so that anyone can search on the differences they are interested in.

I am asking partly because I already have some utility functions in a Python script that can pull out the standard names that exist in one version but not in another, by walking the repo hierarchy (specifically the Data/cf-standard-names dir) as it is to find the relevant version's XML & pulling out the names via regular expression pattern. It would be simple to extend it (by adding extra regex patterns to apply) to extract the descriptions or the canonical units too.

To demonstrate, I am using those functions to extract the names so I can plot various data on the standard names, as covered in the issue here.

I've got this all stored in a branch, but as an example, this is a function I have that prints out, to the command line, all the names in version X but not in version Y from the XML in the repo.

E.g, on that branch, if I run print_version_comparison(65, 64) within the script, I get all the names (underscores changed to whitespace ATM to put into a word cloud, but ignore that for now) in v.65 and not v.64:

 --------------- New to v65 --------------- 
mole fraction of nitrogen trifluoride in air
mole fraction of chloroform in air
mole fraction of hcfc124 in air
mole fraction of hfc245fa in air
mole fraction of pfc218 in air
mole fraction of hfc32 in air
mole fraction of hfc365mfc in air
mole fraction of pfc116 in air
mole fraction of hfc125 in air
mole fraction of perchloroethene in air
mole fraction of hfc152a in air
mole fraction of hfc143a in air
mole fraction of hfc236fa in air
mole fraction of hfc134a in air
mole fraction of pfc318 in air
mole fraction of hfc4310mee in air
mole fraction of dichloromethane in air
mole fraction of sulfuryl fluoride in air
mole fraction of hfc227ea in air
mole fraction of hfc23 in air

and I can input any arbitrary X & Y versions to print_version_comparison(X, Y) (assuming they both exist as a published version, i.e. a dir under Data/cf-standard-names/ ).
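For readers without access to that branch, a comparison along these lines might look like the sketch below. It is not the script from the branch: the regex, the helper names other than `print_version_comparison`, and the file name `cf-standard-name-table.xml` inside each version's `src` directory are assumptions. Unlike the output above, it prints the names with their underscores intact.

```python
import re
from pathlib import Path

# Assumed form of an entry in the table: <entry id="some_standard_name">
ENTRY_ID = re.compile(r'<entry\s+id="([^"]+)"')


def table_path(version, root="Data/cf-standard-names"):
    """Path to one version's XML; the file name is an assumption."""
    return Path(root) / str(version) / "src" / "cf-standard-name-table.xml"


def names_in(xml_text):
    """Set of standard names found in the XML text."""
    return set(ENTRY_ID.findall(xml_text))


def new_names(newer_xml, older_xml):
    """Names present in the newer table but not in the older one, sorted."""
    return sorted(names_in(newer_xml) - names_in(older_xml))


def print_version_comparison(x, y, root="Data/cf-standard-names"):
    """Print every name that is in version x but not in version y."""
    newer = table_path(x, root).read_text(encoding="utf-8")
    older = table_path(y, root).read_text(encoding="utf-8")
    print(f" --------------- New to v{x} --------------- ")
    for name in new_names(newer, older):
        print(name)
```

Run from the root of a checkout of the website repository, `print_version_comparison(65, 64)` would then read the two published tables and list the difference.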

Overall, could an extraction tool such as that in the functions above be preferable to a version control diff solution? And if so, how could we provide that to people?

One idea I had is that there could be a dedicated page on the website to query on changes to the table, similar to the 'search' functionality on each HTML table but for querying differences across all versions. For example, someone could ask via the query form such questions as:

  • what are the names, with their description & units, new to version X?
  • in what version was a given name added?
  • how many names were added in version X?
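
Given the set of names for each version, questions like these reduce to simple set operations. A rough sketch, using made-up version data and hypothetical helper names:

```python
def first_version_with(name, names_by_version):
    """Return the earliest version containing `name`, or None if absent."""
    for version in sorted(names_by_version):
        if name in names_by_version[version]:
            return version
    return None


def names_added_in(version, names_by_version):
    """Names present in `version` but not in the preceding known version."""
    versions = sorted(names_by_version)
    index = versions.index(version)
    previous = names_by_version[versions[index - 1]] if index > 0 else set()
    return sorted(names_by_version[version] - previous)


# Made-up data for illustration; real sets would come from the XML tables.
NAMES_BY_VERSION = {
    63: {"air_temperature"},
    64: {"air_temperature", "air_pressure"},
    65: {"air_temperature", "air_pressure",
         "mole_fraction_of_hfc32_in_air", "mole_fraction_of_hfc23_in_air"},
}

print(first_version_with("air_pressure", NAMES_BY_VERSION))   # prints 64
print(len(names_added_in(65, NAMES_BY_VERSION)))              # prints 2
```

A query page could precompute a mapping like `NAMES_BY_VERSION` once per release and answer all three questions from it.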

What do people think?

@jesusff
Contributor

jesusff commented Sep 19, 2024

Hi Sadie, these tools are really helpful!

Regarding the OP question, I think it was solved already in Sep 2014 (fcc4b29), but never mentioned here. Just for the record, since the current/ folder was created, differences between versions can be displayed on GitHub and also nicely with local tools (e.g. git difftool). Git blame is also helpful for finding out when a given entry was last modified.
