This repository contains code to help manage the schemas and data workflows of the MWET project. Several workflows are supported:
- Maintain and publish a list of the metadata fields that are collected for various techniques. For this, LinkML and its expressive schema definition format are used. The LinkML schema is managed in this repository.
- Using LinkML, produce JSON Schema files from the LinkML schemas. These published JSON Schemas will be maintained in GitHub and rebuilt whenever the accompanying LinkML schema changes in this repository.
- The MWET repository uses SciCat to manage datasets. SciCat has a very flexible "Scientific Metadata" dictionary for each collected dataset. The ingestion code can use the JSON Schemas to validate incoming Scientific Metadata on ingest.
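As a rough illustration of validating Scientific Metadata on ingest, the sketch below checks a metadata dictionary against two JSON Schema keywords only (`required` and the top-level `type` of each property). The schema fragment and its field names (`sample_name`, `temperature_k`) are hypothetical and are not taken from this repository's schema. This is not a full JSON Schema validator; real ingestion code would more likely rely on a dedicated validation library.

```python
# Hypothetical, heavily simplified schema fragment. The real schema is
# generated from the LinkML source and will contain different fields.
EXAMPLE_SCHEMA = {
    "type": "object",
    "required": ["sample_name", "temperature_k"],
    "properties": {
        "sample_name": {"type": "string"},
        "temperature_k": {"type": "number"},
    },
}

# Map a subset of JSON Schema type names to Python types.
_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_metadata(metadata, schema):
    """Return a list of problems; an empty list means the checks passed.

    Only the "required" keyword and simple per-property "type" keywords
    are checked. Everything else in the schema is ignored.
    """
    problems = []
    for name in schema.get("required", []):
        if name not in metadata:
            problems.append(f"missing required field: {name}")

    for name, rules in schema.get("properties", {}).items():
        type_name = rules.get("type")
        # Skip absent fields and type declarations this sketch cannot handle.
        if name not in metadata or not isinstance(type_name, str):
            continue
        expected = _TYPES.get(type_name)
        if expected is None:
            continue
        value = metadata[name]
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and type_name in ("number", "integer")
        if is_bool_as_number or not isinstance(value, expected):
            problems.append(f"field {name!r} should be of type {type_name}")
    return problems
```

An ingestor could call `check_metadata(scientific_metadata, schema)` before sending a dataset to SciCat and refuse to ingest when the returned list is not empty.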
Install LinkML if you wish to work directly with these files. The Quick Install Guide page gives more detailed information.
- Create a new conda environment called "linkml"
conda create -n linkml python=3.10
conda activate linkml
- Install packages as specified in pyproject.toml
python -m pip install -e .
- If you want to ingest the example data into the SciCat database, also install the ingest extras:
python -m pip install -e ".[ingest]"
This command installs the packages required for ingestion, as specified in the pyproject.toml file.
Schemasheets is part of the LinkML toolset that allows a LinkML data description to be converted to a spreadsheet and vice versa. See the install guide here.
Generate a spreadsheet template with Schemasheets:
linkml2schemasheets-template -i src/nmr_schema/nmr_schema.yaml -o nmr_concise.tsv -s concise
or
linkml2schemasheets-template -i src/nmr_schema/nmr_schema.yaml -o nmr_exhaustive.tsv -s exhaustive
The -s option selects the report style (exhaustive or concise).
Generate the corresponding JSON Schema definition (nmr.schema.json) for the NMR datasets:
gen-json-schema --closed src/nmr_schema/nmr_schema.yaml >nmr.schema.json
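To get a quick overview of a generated schema file, a short script can list the properties and required fields of each definition. The sketch below assumes the definitions sit under a top-level `$defs` key (or `definitions` in older JSON Schema drafts); check the generated nmr.schema.json to confirm its layout. The function names are illustrative and not part of this repository.

```python
import json
from pathlib import Path


def load_schema(path):
    """Read a JSON Schema file from disk and return it as a dictionary."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def summarize_definitions(schema):
    """Map each definition name to its sorted property and required names.

    Assumes definitions live under "$defs" or "definitions". Definitions
    without properties (for example enumerations) yield empty lists.
    """
    definitions = schema.get("$defs") or schema.get("definitions") or {}
    summary = {}
    for name, definition in definitions.items():
        summary[name] = {
            "properties": sorted(definition.get("properties", {})),
            "required": sorted(definition.get("required", [])),
        }
    return summary
```

For example, `summarize_definitions(load_schema("nmr.schema.json"))` returns a dictionary that can be printed or compared between schema versions.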
Check ("validate") that the example CSV metadata (src/example_data/example-nmr-metadata.csv) complies with the NMR metadata definition (nmr_schema.yaml):
linkml-validate -s src/nmr_schema/nmr_schema.yaml src/example_data/example-nmr-metadata.csv
Convert the example CSV metadata to JSON by specifying the input CSV file and the output JSON file:
linkml-convert -s src/nmr_schema/nmr_schema.yaml -o src/nmr_schema/metadata.json --index-slot datasets src/example_data/example-nmr-metadata.csv
For more information on converting between different representations, visit this LinkML documentation.
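To make the role of the `--index-slot datasets` option more concrete, the sketch below shows one plausible shape of such a conversion: each CSV row becomes an object, and the rows are collected in a list under the index slot name. This is an assumption about the resulting structure, not the linkml-convert implementation. In particular, no schema is consulted here, so every value stays a string. The column names in the usage note are hypothetical.

```python
import csv
import io


def rows_to_container(csv_text, index_slot="datasets"):
    """Wrap the rows of a CSV document in a container keyed by index_slot.

    Empty cells are dropped so that optional fields are simply absent.
    Values are kept as strings because no schema is applied in this sketch.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        rows.append(
            {
                key: value
                for key, value in row.items()
                if key is not None and value not in ("", None)
            }
        )
    return {index_slot: rows}
```

Calling `rows_to_container("name,field\nA,400\n")` gives a dictionary with a single `datasets` key holding one row object, which can then be written out with `json.dumps`.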
An example metadata file named example-nmr-metadata.csv is provided, as well as an example data file named example-nmr-metadata.csv, for ingestion into the SciCat database. The data were acquired at the University of California, Santa Barbara by Leo Gordon and Raphaële Clément.
To ingest the NMR data and metadata into a local SciCat database:
- Install the ingest packages as described in the installation steps:
python -m pip install -e ".[ingest]"
- Create a .env file in the linkml_nmr folder, following the pattern in the .env.example file.
- Run the following commands in a terminal to ingest the example data:
cd src/nmr_schema
python ingest_nmr.py
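The .env file holds KEY=VALUE settings for the ingestion script. How ingest_nmr.py actually reads it is not shown here. The sketch below is a minimal standard-library parser for that file format, and the variable names in the usage note (`SCICAT_URL`, `SCICAT_USER`) are hypothetical; use the names given in .env.example.

```python
def parse_env(text):
    """Parse KEY=VALUE lines into a dictionary.

    Blank lines, comment lines starting with "#", and lines without "="
    are skipped. Surrounding single or double quotes are removed.
    """
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

For instance, `parse_env(open(".env", encoding="utf-8").read())` returns the settings as a dictionary, which makes it easy to check that every key listed in .env.example is present before running the ingestion.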
- Schema for 2D NMR data
- GitHub Action that generates the JSON Schema and publishes it to a public location (GitHub Pages)
If you are developing this library, there are a few things to note.
- Install development dependencies:
python -m pip install ".[dev]"
- Install pre-commit. This step sets up the pre-commit hooks. After this, each commit will be checked with flake8, black, and isort.
pre-commit install
- (Optional) If you want to check what pre-commit would do before committing, you can run:
pre-commit run --all-files