ENH: add schema for "projection" entry in start document #130

tacaswell · 2019-12-10T03:47:33Z

Provide a semantic mapping of the keys in the collected data to a known set of keys drawn from an externally owned vocabulary.

danielballan · 2020-01-07T14:22:35Z

This PR could use a description or a link to some meeting notes if there are any.

danielballan · 2020-02-07T19:35:33Z

On a Pilot call @dylanmcreynolds nudged us to move forward with this. I am personally happy with it, but given the importance and the cost of any future changes, I think we should have at least one more meeting to pick apart the structure and the names and consider alternatives.

stuartcampbell

I would like to see how this maps onto a more developed schema such as the NeXus SAS definition (https://manual.nexusformat.org/classes/applications/NXsas.html).

danielballan · 2020-02-11T19:02:22Z

event_model/schemas/run_start.json

+            "type": "object",
+            "properties" : {
+                "stream": {"type": "string"},
+                "location": {"enum" : ["event", "configuration"]},


What if it's in the stop document? Or elsewhere in the EventDescriptor, such as the source or shape? Would a dotted object representation be simpler and more comprehensive?

Being in the start document is a far more compelling argument.

I'm skeptical of embracing the dot-ness (as we have previously agreed that dot access on dicts is not great), so the code to munge that back to something we can actually use will be annoying, but on the other hand we can write the function once and stuff it in databroker.
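A minimal sketch of the "write the function once" idea, assuming a pointer is spelled as a dotted string and resolved against plain nested dicts. The name `resolve_dotted`, the separator, and the example document fragment are all hypothetical; no pointer syntax has been agreed on in this thread, and nothing here is an existing event-model or databroker function.

```python
from functools import reduce


def resolve_dotted(doc, path, sep="."):
    """Walk a nested dict along a dotted path and return the value found.

    Hypothetical helper: raises KeyError naming the segment that is missing.
    """
    def step(node, key):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"{key!r} not found while resolving {path!r}")
        return node[key]

    return reduce(step, path.split(sep), doc)


# Illustrative document fragment, not taken from a real run.
descriptor = {
    "configuration": {"det": {"data": {"det_exposure": 0.1}}},
    "data_keys": {"det_image": {"shape": [512, 512]}},
}

exposure = resolve_dotted(descriptor, "configuration.det.data.det_exposure")
shape = resolve_dotted(descriptor, "data_keys.det_image.shape")
```

One limitation of this spelling is that a key which itself contains the separator cannot be addressed, which is one argument for the explicit per-key form in the diff above.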

danielballan · 2020-02-11T19:05:10Z

event_model/schemas/run_start.json

+            "required" : ["stream", "location", "field"],
+            "additionalProperties": false
+        },
+        "technique": {


The term "technique" might be too limiting. This provides a generic mechanism for mapping any externally-defined metadata schema to the contents of the documents to come. Those schemas might be broken up by experimental technique, by downstream analysis process (applicable to more than one technique such as "scattering" and "diffraction"), by institution, by domain, etc.

"data_remapping", "datamap", "DataMap", "Application", "application_definitions"?

I think we have a bunch of helper functions like list_applications(h), iter_applications(h, name=None) that yield dicts (?) full of {base types or xarrays}?
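A rough sketch of what those two helpers might look like, under several assumptions: `h` is treated as a plain mapping with a `"start"` entry, the mappings live in a list under a hypothetical `"applications"` key, and each entry has a `"name"`. This version only yields the mapping entries themselves; filling them with actual values (base types or xarrays) is left out.

```python
def list_applications(h, key="applications"):
    """Return the names of the mappings recorded under `key` in the start document."""
    return [entry["name"] for entry in h["start"].get(key, [])]


def iter_applications(h, name=None, key="applications"):
    """Yield each mapping dict, optionally restricted to those called `name`."""
    for entry in h["start"].get(key, []):
        if name is None or entry["name"] == name:
            yield entry


# Stand-in for a header: only the start document is needed here.
h = {
    "start": {
        "uid": "placeholder-uid",
        "applications": [
            {"name": "NXsas", "fields": {}},
            {"name": "NXtomo", "fields": {}},
            {"name": "NXsas", "fields": {}},
        ],
    }
}

names = list_applications(h)
sas_entries = list(iter_applications(h, name="NXsas"))
```

Keeping the entries in a list means the same name can legitimately appear more than once, so `iter_applications` filters rather than looks up by key.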

danielballan · 2020-02-11T19:08:28Z

👍 for @stuartcampbell's request to use this on a fully-worked example or three before we commit to it.

danielballan · 2020-03-11T20:55:14Z

@tacaswell on Slack:

move "techniques" out of start document and into [its] own document class

danielballan · 2020-04-22T01:37:14Z

I am in favor of proceeding on this but doing so in a separate experimental document type that can evolve quickly and make breaking changes as needed. I think @tacaswell suggested this in passing on Slack, as I tersely documented at the time, above.

danielballan · 2020-06-02T23:46:44Z

There was a call on this subject today.

Things there seemed to be strong consensus on:

There will be a list of dicts, with each dict describing how to project the content of the documents on to some externally-defined schema. (In some situations there may be multiple ways to map the documents' contents onto the same schema; the same schema name might appear more than once. That is why this is a list and not a dict keyed on schema name.) At top level, the dicts will describe the schema itself: a name ("NXtomo"), a version, an external URL to a schema definition, and finally a dict embodying the layout (e.g. NeXus definition).
Within this layout, the "leaves" will be either pointers into a document (with the document name) or a literal value (with "literal" or something like that).
This list of dicts will be included in the document stream, either within the Run Start document itself or in a temporary experimental document type (see below).
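To make the three points above concrete, here is one possible shape for the list, written as a Python literal. Every key spelling (`"name"`, `"version"`, `"url"`, `"layout"`, `"document"`, `"field"`, `"literal"`) is a placeholder, since the spellings are still open, and the version, URL, and field names are invented for illustration. The `iter_leaves` helper is likewise hypothetical.

```python
projections = [
    {
        "name": "NXtomo",
        "version": "placeholder-version",
        "url": "https://example.org/placeholder-schema-definition",
        "layout": {
            "entry": {
                # A literal leaf carries its value directly.
                "title": {"literal": "example scan"},
                "data": {
                    # A pointer leaf names a document and a field within it.
                    "rotation_angle": {"document": "event", "field": "theta"},
                },
            },
        },
    },
]


def iter_leaves(layout, prefix=()):
    """Yield (path, leaf) pairs, where a leaf is a pointer or a literal."""
    for key, value in layout.items():
        path = prefix + (key,)
        if "literal" in value or "document" in value:
            yield path, value
        else:
            yield from iter_leaves(value, path)


leaves = dict(iter_leaves(projections[0]["layout"]))
```

Detecting leaves by the presence of a `"literal"` or `"document"` key means a group in the layout could not use either of those names, which is one of the spelling details that would need to be settled.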

Things that might need more thought or discussion to build strong consensus:

What to call the key: "techniques", "projections", or something else
How the top-level keys (the name, version, url, etc.) are spelled and which are required
How to structure and spell the pointers into documents
Whether to add the notion of "experimental document type" to event-model, bluesky, and databroker so as to prototype this idea there, or just add this to the Run Start document from the get-go.

Things that need investigation:

How to encode an XML attribute in JSON. @stuartcampbell reported that there are ~5 different standards for this. At some level we'd be happy to just pick one, but it's worth doing some due diligence regarding whether any of them have clear technical advantages or seem to have higher adoption than the others.
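For reference while comparing the options, this is a sketch of one of those conventions, in the style of BadgerFish: attributes become keys prefixed with `@` and element text goes under `$`. It is shown only as an illustration, not as a recommendation, and it simplifies the convention by always collecting child elements in lists. The XML snippet and the function name are made up for the example.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Encode one element: '@name' keys hold attributes, '$' holds text."""
    node = {f"@{name}": value for name, value in elem.attrib.items()}
    text = (elem.text or "").strip()
    if text:
        node["$"] = text
    # Simplification: children are always grouped into lists by tag.
    for child in elem:
        node.setdefault(child.tag, []).append(element_to_dict(child))
    return node


xml_source = '<field name="title" type="NX_CHAR">example</field>'
root = ET.fromstring(xml_source)
encoded = {root.tag: element_to_dict(root)}
as_json = json.dumps(encoded, sort_keys=True)
```

Because XML element names cannot start with `@` or `$`, the prefixed keys cannot collide with child element names in this encoding.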

prjemian · 2020-06-03T14:04:04Z

First off, can someone please edit the top box here and describe clearly the intent of this PR, as previously requested? Without this focus stated, the discussion is not focused.

dylanmcreynolds · 2020-06-03T18:21:20Z

I have heard that NeXus has deprecated XML backends. This leads to the question of how much XML you plan to support transforming to. Namespacing is great, but there seems to be a huge debate on how or if to represent that in JSON. It seems like you could punt on some complexity if you don't intend to output XML.

prjemian · 2020-06-04T18:31:11Z

That's right. HDF5 is now the only supported on-disk format for NeXus data files. The decision to drop the XML backend was made between 2012 and 2014-08. (Can't find the specific decision in the notes yet.)

ENH: add schema for technique entry in start document

5a5824e

ronpandolfi approved these changes Feb 7, 2020


stuartcampbell self-requested a review February 11, 2020 19:04

stuartcampbell requested changes Feb 11, 2020


danielballan reviewed Feb 11, 2020


tacaswell changed the title ~~ENH: add schema for technique entry in start document~~ ENH: add schema for "projection" entry in start document Jun 4, 2020

This was referenced Jul 3, 2020

Add projections schema to start doc #179

Merged

create NeXus file from specific dataset aps-8id-dys/ipython-8idiuser#176

Closed


danielballan Feb 11, 2020

tacaswell Apr 21, 2020

danielballan Feb 11, 2020

tacaswell Apr 22, 2020
