Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NX docstring as attribute #415

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

lukaspie
Copy link
Collaborator

@lukaspie lukaspie commented Aug 29, 2024

@rettigl this is a way to add the NX docstrings to the HDF5 files as attributes. The idea is that you can pass the write-docs flag to the dataconverter and then docstrings are added. By default, this is turned off so as to not change any existing workflows. We can discuss if we want this to happen by default.

Here's an example using the xps reader:
output.nxs.zip

The implementation is relatively trivial. There are however two open questions for me

  • How to handle inherited docs
    Usually, our appdefs have very sparse documentation since all the docstrings are in the base classes or in appdefs that are further up the chain. My understanding is that the docs are also extended (not overwritten) when you inherit from a class. So technically, one should concatenate the docs of all inherited nodes. However, this gets quite unwieldy and messy as the docs in the files get rather large and sometimes they might even contradict each other. My suggestion (implemented here) was that as we travel up the inherited nodes, as soon as there is a docstring we only use that one (and don't add any docs coming from even earlier nodes). But this is probably not strictly correct. Maybe @sanbrock can comment.

  • Docs for NeXus attributes
    NeXus attributes are written as HDF5 attributes already. Since HDF5 attributes cannot have attributes themselves, the question is where to place the docs for these attributes? My solution here: write another attribute <attribute>__docs (e.g. entry/definition/version__docs) to the HDF5 file.

@lukaspie lukaspie linked an issue Aug 29, 2024 that may be closed by this pull request
.gitmodules Outdated
@@ -1,3 +1,4 @@
[submodule "src/pynxtools/definitions"]
path = src/pynxtools/definitions
url = https://github.com/FAIRmat-NFDI/nexus_definitions.git
branch = nxmpes-unit-doc-test
Copy link
Collaborator Author

@lukaspie lukaspie Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reminder to remove before merging

@lukaspie lukaspie marked this pull request as ready for review August 30, 2024 10:49
@sherjeelshabih
Copy link
Collaborator

  • Docs for NeXus attributes
    NeXus attributes are written as HDF5 attributes already. Since HDF5 attributes cannot have attributes themselves, the question is where to place the docs for these attributes? My solution here: write another attribute <attribute>__docs (e.g. entry/definition/version__docs) to the HDF5 file.

Any practical solution works in my opinion. The only issue will be how to report/add this in the NXDL structure. It's a bit meta over the already meta attributes we have in the NXDL. Will something like adding this renameable field, FIELDNAME__docs, in NXobject work out nicely in the NXDL framework?

@lukaspie
Copy link
Collaborator Author

  • Docs for NeXus attributes
    NeXus attributes are written as HDF5 attributes already. Since HDF5 attributes cannot have attributes themselves, the question is where to place the docs for these attributes? My solution here: write another attribute <attribute>__docs (e.g. entry/definition/version__docs) to the HDF5 file.

Any practical solution works in my opinion. The only issue will be how to report/add this in the NXDL structure. It's a bit meta over the already meta attributes we have in the NXDL. Will something like adding this renameable field, FIELDNAME__docs, in NXobject work out nicely in the NXDL framework?

I didn't go this far. For groups and fields, I basically added an attribute @docs, whereas for the attributes, I used an additional attribute <attribute>__docs. I guess the problem would be that all of these are undocumented..

@sherjeelshabih
Copy link
Collaborator

Ah alright. So it's just for the attributes that you add a suffix.

You're right. They will remain undocumented. Let's say to see how it goes in use for us we can leave it undocumented.

It will make it practically easier to understand Nexus files like this. It makes the Nexus files more self sufficient too. And it seems this is the best we can do without overcomplicating it.

@lukaspie
Copy link
Collaborator Author

lukaspie commented Sep 3, 2024

Notes from TF meeting:

  • Use single underscore for attributes, like for example data and data_units
  • Can we render the docs a bit nicer? HTML syntax?
  • Don't write the documentation to every single instance of the same concept, use internal / softlink
  • External links to documentation? docs and docs_url
  • @sanbrock: HDF5 contains data, not information. What happens when we want to reuse the data somewhere else? We could prevent this by writing a parallel structure with the docs and linking to that place.
    • option 1: add a warning if links are used and the docstrings are not actually correct?
    • option 2 (by @rettigl): if we write the docs, we are not using links, but rather we copy the data directly

@lukaspie lukaspie force-pushed the 403-improvement-add-nx-doc-string-as-attribute branch from 7369779 to 81edf90 Compare September 17, 2024 15:34
"""
Process individual XML element to generate the NeXus concept path.

Output is e.g. "NXexperiment:/NXentry/NXinstrument/NXdetector".
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is called classpath, but this can be ambiguous if two concept of the same type exists next two one another. This is why the NeXus vocabulary follows the path we call html name which is unique for a given concept. E.g. NXexperiment:/ENTRY/INSTRUMENT/mydetector. Note that this does not hold the information of the nexus classes referened along the path, so if it is needed, a classpath needs to be returned, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement] Add NX doc string as attribute
3 participants