Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Base class inheritance #158

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

Base class inheritance #158

wants to merge 1 commit into from

Conversation

domna
Copy link
Collaborator

@domna domna commented Sep 13, 2023

This is an example implementation for base class inheritance.
It is related to FAIRmat-NFDI/nexus_definitions#66

What we need to do:

  • Add support for inheritance in nyaml2nxdl
  • Support for resolving inherited base class, i.e., construct the full field list from a bc and its parents
  • The conversion understands that a sub-class inherits from its parent and such can be replaced, e.g., that a NXbeam group can be filled with, e.g., the path /ENTRY[entry]/BEAM_OPTICAL[beam].

@domna domna marked this pull request as draft September 13, 2023 07:33
@coveralls
Copy link

coveralls commented Sep 13, 2023

Pull Request Test Coverage Report for Build 6169389406

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.005%) to 51.117%

Totals Coverage Status
Change from base Build 6160432682: 0.005%
Covered Lines: 5996
Relevant Lines: 11730

💛 - Coveralls

@sherjeelshabih
Copy link
Collaborator

  • Support using base classes in the path, e.g., we want to be able to write /ENTRY[entry]/NXBEAM_OPTICAL[beam] if we want to use an NXbeam_optical if an NXbeam is demanded, but the converter should check all the fields of NXbeam_optical

The converter already picks up on this. It's just that it will nexus.get_node_at_nxdl_path with the following NXDL path:
/ENTRY/BEAM_OPTICAL.

We need to make sure nexus.py understands that NXbeam_optical inherits from NXbeam and such can be replaced.

But is this what we want? Do we want to allow specialized beams be used in AppDefs that list the parent beam? Also, then do we allow using a different specialized beam or just the parent?

@domna
Copy link
Collaborator Author

domna commented Sep 13, 2023

We need to make sure nexus.py understands that NXbeam_optical inherits from NXbeam and such can be replaced.

Yes, I think this is the better formulation of this point. I will update it.

But is this what we want? Do we want to allow specialized beams be used in AppDefs that list the parent beam? Also, then do we allow using a different specialized beam or just the parent?

I would say yes. It should be possible to use any sub-class for a parent class (e.g., for NXbeam any of NXbeam_optical, NXbeam_electron, NXbeam_neutron, ...). IMO this inheritance should also always be compatible, e.g. NXbeam_optical has all the same fields required and recommended as NXbeam but just adds some more. If we demand a specalized class only this specialized class should be allowed (e.g., NXbeam_optical).

@sherjeelshabih
Copy link
Collaborator

IMO this inheritance should also always be compatible, e.g. NXbeam_optical has all the same fields required and recommended as NXbeam but just adds some more.

We will then need scripts to make sure we set up such compatibilty at all times. This can be hectic especially if I want to have a more specialized NXbeam_optical that switches around a bunch of stuff. The framework then should be able to either reject such changes and explain why or provide it as an incompatible to an NXbeam entry during writing.

I also want to bring another point which always puzzles me with NeXus: Why don't we tell what base class/type we exactly used in the data instance aka the h5 file. Currently, we only provide an AppDef and the framework has to resolve the rest of the dataset types using the AppDef as a source of truth. This means whenever I would like to add another group that is used in another technique or so, I have to go and make a new AppDef. This seems to have been designed for applications to know exactly what to expect. But my argument for that is that we can easily write an application that looks through the h5 file with its pointers to what type/concept every dataset is. And then act upon it if it knows how to deal with this concept. If it doesn't, it just ignores it for specialized functionality but still provides access to the user if they know what to do with it. The entanglement of data instance/storage structure dictating what concept we are dealing with is quite a hassle.

With the above strategy, one can easily use NXbeam_optical or whatever they like. If the app understands it aka finds compatible fields in this group, it works with it. But if it doesn't, then it ignores it.

@domna
Copy link
Collaborator Author

domna commented Sep 14, 2023

IMO this inheritance should also always be compatible, e.g. NXbeam_optical has all the same fields required and recommended as NXbeam but just adds some more.

We will then need scripts to make sure we set up such compatibilty at all times. This can be hectic especially if I want to have a more specialized NXbeam_optical that switches around a bunch of stuff. The framework then should be able to either reject such changes and explain why or provide it as an incompatible to an NXbeam entry during writing.

That's why I would mainly use inheritance for adding additional fields to the parent class and not renaming or moving anything (because this changes our notion of the information contained in this class). The question is when we write it as NXbeam_optical and it does not conform to it, do we allow to set it to NXbeam (if allowed by the appdef) if this is fulfilled and show a warning or do we rather abort the whole process?

I also want to bring another point which always puzzles me with NeXus: Why don't we tell what base class/type we exactly used in the data instance aka the h5 file.

This information is written, no? There is the NX_class attribute which collects which class we write. Consuming applications could use this field to understand which class it is dealing with.

Currently, we only provide an AppDef and the framework has to resolve the rest of the dataset types using the AppDef as a source of truth. This means whenever I would like to add another group that is used in another technique or so, I have to go and make a new AppDef.

This is one of my main reasons to have base class inheritance. With this appdefs can be as easy as just stating which combination of classes I expect. We even could go towards a generic experiment which we manually compose. However, we just offload the specifics into the base classes then (i.e., we would still want to require certain fields for an, e.g., mpes experiment with a collection of fields we don't want to (re-)use in any other experiment). I would also argue that our current process is removing the differences between appdefs and base classes, except that we still name them such.

This seems to have been designed for applications to know exactly what to expect. But my argument for that is that we can easily write an application that looks through the h5 file with its pointers to what type/concept every dataset is. And then act upon it if it knows how to deal with this concept. If it doesn't, it just ignores it for specialized functionality but still provides access to the user if they know what to do with it. The entanglement of data instance/storage structure dictating what concept we are dealing with is quite a hassle.

We really can push towards trying also having a generic representation without relying on an appdef (or only a generic appdef, or not using the terms of appdefs and base classes anymore and just consider them using the same modeling principles). And I tend to like this idea as it gets closer to what is done in owl, that is, only relating some properties to an experiment. But again, I think we just offload the variability into the base class taxonomy.

With the above strategy, one can easily use NXbeam_optical or whatever they like. If the app understands it aka finds compatible fields in this group, it works with it. But if it doesn't, then it ignores it.

I think these are all good points. I think we should just discuss/test some of them and see which one works best for us. I'm currently reading also a bit in owl/ontologies modelling so maybe I can draw some inspiration from this as well.

@sherjeelshabih
Copy link
Collaborator

That's why I would mainly use inheritance for adding additional fields to the parent class and not renaming or moving anything (because this changes our notion of the information contained in this class). The question is when we write it as NXbeam_optical and it does not conform to it, do we allow to set it to NXbeam (if allowed by the appdef) if this is fulfilled and show a warning or do we rather abort the whole process?

So you mean that an AppDef asks for an NXbeam_optical but a user provides an NXbeam in the data? I think we should discuss this over a meeting when we have a consensus on the examples you've setup on the definitions repo. I would say if NXbeam_optical is incompatible with NXbeam then we cannot just accept an NXbeam as the data. But I'm not too confident with this. I need more clarification.

This information is written, no? There is the NX_class attribute which collects which class we write. Consuming applications could use this field to understand which class it is dealing with.

The resolution depends on how we modify that NX_class in our AppDef. If we change the doc string etc or we add fields to it this becomes a separate entity in the AppDef than what it is in the base class. I think with this base_class inheritance this will become simpler as we can take the specialized base class and list it in the AppDef rather than modifying it there. The only limitation would be that some fields could be named the same in different groups. Their meaning comes from their hierarchy. That is much easier to handle if we have direct pointers for every single field. So I could mix match my data and still point to the right concept.

This is one of my main reasons to have base class inheritance. With this appdefs can be as easy as just stating which combination of classes I expect. We even could go towards a generic experiment which we manually compose. However, we just offload the specifics into the base classes then (i.e., we would still want to require certain fields for an, e.g., mpes experiment with a collection of fields we don't want to (re-)use in any other experiment). I would also argue that our current process is removing the differences between appdefs and base classes, except that we still name them such.

I agree with you. The current AppDefs seem to be an unnecessary limitation. If we have good base classes, any combination of them can be read correctly and understood. AppDefs would just be our h5 files using these base classes.

We really can push towards trying also having a generic representation without relying on an appdef (or only a generic appdef, or not using the terms of appdefs and base classes anymore and just consider them using the same modeling principles). And I tend to like this idea as it gets closer to what is done in owl, that is, only relating some properties to an experiment. But again, I think we just offload the variability into the base class taxonomy.

With the above strategy, one can easily use NXbeam_optical or whatever they like. If the app understands it aka finds compatible fields in this group, it works with it. But if it doesn't, then it ignores it.

I think these are all good points. I think we should just discuss/test some of them and see which one works best for us. I'm currently reading also a bit in owl/ontologies modelling so maybe I can draw some inspiration from this as well.

It will be nice to experiment with this. Please share some resources if you find something interesting. Thanks!

@domna
Copy link
Collaborator Author

domna commented Sep 14, 2023

That's why I would mainly use inheritance for adding additional fields to the parent class and not renaming or moving anything (because this changes our notion of the information contained in this class). The question is when we write it as NXbeam_optical and it does not conform to it, do we allow to set it to NXbeam (if allowed by the appdef) if this is fulfilled and show a warning or do we rather abort the whole process?

So you mean that an AppDef asks for an NXbeam_optical but a user provides an NXbeam in the data? I think we should discuss this over a meeting when we have a consensus on the examples you've setup on the definitions repo. I would say if NXbeam_optical is incompatible with NXbeam then we cannot just accept an NXbeam as the data. But I'm not too confident with this. I need more clarification.

No. My thought was, that the appdef requires an NXbeam and we might enter some data as /ENTRY[entry]/BEAM_OPTICAL[beam] to denote, that we want to enter the more specific form of NXbeam_optical into an NXbeam group. What happens now if the /entry/beam group does not contain all the fields required by NXbeam_optical, but all fields required by NXbeam. Do we downgrade to NXbeam and warn the user about the process or do we abort the conversion process?

IMO an NXbeam should never be allowed to replace NXbeam_optical if it is required in the appdef. Except for when NXbeam_optical is empty and thus equivalent to NXbeam. But we might want to carry some information from using an empty subclass (e.g., like the empty NXbeam_electron in this PR). In the ontology world this would make sense, but I don't know if we want such thing in nexus.

This information is written, no? There is the NX_class attribute which collects which class we write. Consuming applications could use this field to understand which class it is dealing with.

The resolution depends on how we modify that NX_class in our AppDef. If we change the doc string etc or we add fields to it this becomes a separate entity in the AppDef than what it is in the base class. I think with this base_class inheritance this will become simpler as we can take the specialized base class and list it in the AppDef rather than modifying it there. The only limitation would be that some fields could be named the same in different groups. Their meaning comes from their hierarchy. That is much easier to handle if we have direct pointers for every single field. So I could mix match my data and still point to the right concept.

The easiest would be to tell the writer which classname to write to NX_class, like in the example above.
It's another benefit of having base class taxonomies. We can just simply build a general concept and reuse it without recreating some anonymous classes (as I would call what is happening rn when we extend the base classes in appdefs).

This is one of my main reasons to have base class inheritance. With this appdefs can be as easy as just stating which combination of classes I expect. We even could go towards a generic experiment which we manually compose. However, we just offload the specifics into the base classes then (i.e., we would still want to require certain fields for an, e.g., mpes experiment with a collection of fields we don't want to (re-)use in any other experiment). I would also argue that our current process is removing the differences between appdefs and base classes, except that we still name them such.

I agree with you. The current AppDefs seem to be an unnecessary limitation. If we have good base classes, any combination of them can be read correctly and understood. AppDefs would just be our h5 files using these base classes.

Yes, we still might want to have some notion of an experiment, i.e., as a collection of N scan-axis with M sensors at each scan point + sample + operator + instrument. But we could combine it as we like with what we scan and what instrument we use.

We really can push towards trying also having a generic representation without relying on an appdef (or only a generic appdef, or not using the terms of appdefs and base classes anymore and just consider them using the same modeling principles). And I tend to like this idea as it gets closer to what is done in owl, that is, only relating some properties to an experiment. But again, I think we just offload the variability into the base class taxonomy.

With the above strategy, one can easily use NXbeam_optical or whatever they like. If the app understands it aka finds compatible fields in this group, it works with it. But if it doesn't, then it ignores it.

I think these are all good points. I think we should just discuss/test some of them and see which one works best for us. I'm currently reading also a bit in owl/ontologies modelling so maybe I can draw some inspiration from this as well.

It will be nice to experiment with this. Please share some resources if you find something interesting. Thanks!

I think the W3C primers on rdf, rdfs and owl are very extensive. Currently I'm reading this book which is very nice to get an overview.

@rettigl
Copy link
Collaborator

rettigl commented Sep 16, 2023

I did not follow all the discussions so far, but I somehow have the feeling that we should separate two things in the discussion:

  • the base class inheritance (which I support very much)
  • the introduction of required fields into base classes (which I am not so sure we should do)

The reason for doing the second is not so eminentily clear to me, as indeed it somewhat mixes the purpose of application definition and base classes, and I think there are very good reasons so have the two. We should think very carefully about why we want to introduce requirements in the first place, because any requirement very severely limits the usability of everything we crease, as we have seen in the discussion at the mpes-workshop.

@domna
Copy link
Collaborator Author

domna commented Sep 18, 2023

I did not follow all the discussions so far, but I somehow have the feeling that we should separate two things in the discussion:

  • the base class inheritance (which I support very much)
  • the introduction of required fields into base classes (which I am not so sure we should do)

My feeling is that having a base class inheritance without having required fields does not solve too much. Essentially, we simply add some additional vocabulary in an additional file. While I see that this can declutter our glossary a little it does not give us the notion of something representing an entity (e.g., a detector). We could very well use a class and its subclass interchangeably and just requiring the same fields in an appdef. For me it feels just adding some more files and complicating where a person has to look for glossary terms when writing or reading an appdef without adding any logical abstraction.

The reason for doing the second is not so eminentily clear to me, as indeed it somewhat mixes the purpose of application definition and base classes, and I think there are very good reasons so have the two. We should think very carefully about why we want to introduce requirements in the first place, because any requirement very severely limits the usability of everything we crease, as we have seen in the discussion at the mpes-workshop.

I agree that it is good to have a glossary of terms as we currently have it in the base classes. And I'm not fully arguing that we should get rid of the base classes entirely (even though we can discuss whether we can collect glossary terms also in a hierarchical way). Besides this fact, it is not so evident to me that its good to have the distinction of base classes and appdefs. We had several occurrences in the past where we had problems due to their nature, here are some of the problems we already had (which is not a complete list):

  • Reusing something from an appdef, that is, reusing something which requires some fields to be present, is not really possible. We typically have to copy everything by hand, which is error-prone, slow and hard to maintain.
  • We cannot give the notion of something as a part of, e.g., an instrument, as we cannot really build an NXdetector without an application definition (I know we can do it when we collect the specific "anonymous" group from the appdef - but how would we write this in a file?). However, I think we need such a feature to be able to properly write parts of an instrument, e.g., an instrument just collecting an NXdetector part.
  • For people new to nexus the different required/optional defaults are often confusing.
  • We cannot really interchange something more general with something more specific if we have additional information at runtime. Like inserting a more specific NXccd for an NXdetector.

I think the last point is a crucial one for the requirements you mention. I think we agree that we have to require some fields, but not too much to not be too restrictive. With a taxonomy we can build a very general top-level class where we don't require much and then add more and more specifics into the sub-classes. This is the point where a general glossary might be useful, because we might want to add only one specific field to the parent class, but we don't have everything available for a specific sub-class. We still want to write this field properly defined and show that its part of this taxonomy.

We could also keep the current base classes entirely but simply add this taxonomy features we are currently thinking about as an additional layer in between appdefs and base classes. In this case the base classes would give us the vocabulary for a taxonomy and an appdef would basically just be a specific class in the taxonomy which we attach the notion of being a standardised data collection. This would also allow to be compatible with the current way things are in nexus and we could seamlessly add taxonomies where we need it.

@rettigl
Copy link
Collaborator

rettigl commented Sep 18, 2023

  • Reusing something from an appdef, that is, reusing something which requires some fields to be present, is not really possible. We typically have to copy everything by hand, which is error-prone, slow and hard to maintain.

It is possible by appdef inheritance, but I agree it's probably not elegant, if it only applies to a small part of the tree

  • We cannot give the notion of something as a part of, e.g., an instrument, as we cannot really build an NXdetector without an application definition (I know we can do it when we collect the specific "anonymous" group from the appdef - but how would we write this in a file?). However, I think we need such a feature to be able to properly write parts of an instrument, e.g., an instrument just collecting an NXdetector part.

Well, I think we do that e.g. with NXelectronanalyzer, which has NXdetectioncolumn, NXenergydispersion, etc. as parts, no? That's all defined in the base classes, it would be sufficient to even only cite NXintrument.

  • For people new to nexus the different required/optional defaults are often confusing.

One could just write optionality everywhere explicitly, that can easily be done with the yaml2nxdl

  • We cannot really interchange something more general with something more specific if we have additional information at runtime. Like inserting a more specific NXccd for an NXdetector.

Okay, I think I understand your point now. You want to cite NXbeam, and have every subclass require a few fields to specify as e.g. NXbeam_poalized, and only if these fields are present, you are allowed to replace NXbeam by NXbeam_polarized, right? In principle, that is not a bad idea, as it would allow to build very generic applications, that could work based on the existence of these specialized versions of the generic base classes. Still, we need to make sure to not restrict the file format too much, and not require unnecessary fields in the "base" version of our taxonomy.

@domna
Copy link
Collaborator Author

domna commented Sep 18, 2023

  • Reusing something from an appdef, that is, reusing something which requires some fields to be present, is not really possible. We typically have to copy everything by hand, which is error-prone, slow and hard to maintain.

It is possible by appdef inheritance, but I agree it's probably not elegant, if it only applies to a small part of the tree

It's also relevant between different methods, i.e., most experiments have some form of detector or sensor without having an inheritance between themselves (even though we have to ask whether different detectors in different fields have some commonality or should remain in their specific domain - but there are other examples, e.g., NXsample).

  • We cannot give the notion of something as a part of, e.g., an instrument, as we cannot really build an NXdetector without an application definition (I know we can do it when we collect the specific "anonymous" group from the appdef - but how would we write this in a file?). However, I think we need such a feature to be able to properly write parts of an instrument, e.g., an instrument just collecting an NXdetector part.

Well, I think we do that e.g. with NXelectronanalyzer, which has NXdetectioncolumn, NXenergydispersion, etc. as parts, no? That's all defined in the base classes, it would be sufficient to even only cite NXintrument.

I mean it slightly different here. We have this different classes, yes. However, one could not just build an NXenergydispersion without getting the requirements from the appdef. With what I have in mind we would have a base class which automatically requires the fields to be a proper NXenergydispersion and we would just have to state in the appdef that we want one of such NXenergydispersions. The benefit is that an application could just write an NXenergydispersion which is in compliance with NXmpes without ever needing to know about NXmpes (I know we could just require additional fields in the base class. This is admittedly still a problem here, because we always need to build the union between a class or one of it's possible subclasses and the required fields in the appdef). Currently, we can only do it by using the NXmpes/NXinstrument/NXenergydispersion class somewhere.

  • For people new to nexus the different required/optional defaults are often confusing.

One could just write optionality everywhere explicitly, that can easily be done with the yaml2nxdl

That's true. I also have to admit that part of the confusion is also about how the web view of our definitions is structured and that everything is nested and one has to know that there is relevant information somewhere else. So this might rather be a point of documentation.

  • We cannot really interchange something more general with something more specific if we have additional information at runtime. Like inserting a more specific NXccd for an NXdetector.

Okay, I think I understand your point now. You want to cite NXbeam, and have every subclass require a few fields to specify as e.g. NXbeam_poalized, and only if these fields are present, you are allowed to replace NXbeam by NXbeam_polarized, right? In principle, that is not a bad idea, as it would allow to build very generic applications, that could work based on the existence of these specialized versions of the generic base classes. Still, we need to make sure to not restrict the file format too much, and not require unnecessary fields in the "base" version of our taxonomy.

Exactly. Yes, we have to be careful to not require to much if we are far up in the tree. But I think when we carefully design this, it can be extremely helpful. I also would like this glossary+taxonomy approach as this would allow us to test and change stuff without breaking any of our current definitions. Because I think we need to do a few iterations to get a good solution which balances requiredness and optionality.

There is also an current example in NXpeem which could benefit from this: there are certain fields only required/set for the dispersive mode of the NXenergydispersion. With inheritance there this is very easily solvable. Currently, we just keep things optional and state the situation in the docstrings (there are also other examples where we did it in this fashion already).

@sherjeelshabih
Copy link
Collaborator

The benefit is that an application could just write an NXenergydispersion which is in compliance with NXmpes without ever needing to know about NXmpes

I very much want this. It will also help us with comparing/searching across data. We know that no matter where we find this entity, it's supposed to be read and written in the same way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants