serializers: add datapackage serializer #1742

Open
wants to merge 1 commit into
base: master
from

@roll roll commented May 6, 2024

❤️ Thank you for your contribution!

Description

Data Package is a standard, developed as a European Commission-funded project (via NLnet), that consists of a set of lightweight specifications for describing datasets and individual data files. It is widely used for data interoperability in other data portal systems such as CKAN, Dryad, and Our World in Data. The main benefit of adopting the Data Package Standard is a rich set of software for reading and validating datasets: https://datapackage.org/standard/software/ (the most advanced implementations are in Python and R, and there is also a desktop application).
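
For readers unfamiliar with the format, here is a minimal sketch of what a `datapackage.json` descriptor could look like when built in Python. The dataset name, title, and file path are hypothetical, and the helper `to_datapackage_json` is illustrative only; it is not the serializer added by this PR.

```python
import json

# Hypothetical dataset; every name and path below is illustrative only.
descriptor = {
    "name": "example-dataset",
    "title": "Example dataset",
    "resources": [
        # Each resource describes one data file of the dataset.
        {"name": "measurements", "path": "measurements.csv", "format": "csv"},
    ],
}


def to_datapackage_json(descriptor):
    """Serialize a descriptor dict to datapackage.json text.

    Assumption: the only check done here is that at least one
    resource is present, which the Data Package standard requires.
    """
    if not descriptor.get("resources"):
        raise ValueError("a data package needs at least one resource")
    return json.dumps(descriptor, indent=2, sort_keys=True)


text = to_datapackage_json(descriptor)
print(text)
```

The output is plain JSON, so any JSON-aware tool can read it back; validating it against the full standard is left to the software listed at the link above.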

With this pull request, we would like to start a discussion on whether Invenio RDM (and Zenodo) would consider accepting the datapackage.json format as one of the export targets for datasets' metadata (we have also prepared a PR for invenio-app-rdm in case this PR is accepted). Also tagging our friends @sdiggs from the University of California and @sapetti9 from Open Knowledge Foundation.

Thanks a lot in advance!

Checklist

Ticks in all boxes and 🟢 on all GitHub actions status checks are required to merge:

Frontend

Reminder

By using GitHub, you have already agreed to GitHub’s Terms of Service, including that:

  1. You license your contribution under the same terms as the current repository’s license.
  2. You agree that you have the right to license your contribution under the current repository’s license.

Contributor

github-actions bot commented Jul 6, 2024

This PR was automatically marked as stale.

@github-actions github-actions bot added the stale label Jul 6, 2024
Member

slint commented Sep 20, 2024

Hi @roll, thank you for opening this PR. The only thing missing to integrate this is hooking it up to our REST API responses, so that a record can be retrieved in the Data Package serialization format via content negotiation.

Do you know what MIME type would be appropriate for data packages? E.g. for RO-Crate we're using application/ld+json;profile="https://w3id.org/ro/crate/1.1", so would something like application/ld+json;profile="https://datapackage.org/profiles/2.0/datapackage.json" make sense?
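
For illustration, the two media types above share the same base type and differ only in the `profile` parameter, which a client would send in its `Accept` header. The following is a minimal sketch, using only the Python standard library, of how such a value splits into a media type and its parameters. The helper name `parse_media_type` is hypothetical, and this is not how InvenioRDM implements content negotiation.

```python
from email.message import Message


def parse_media_type(value):
    """Split a media type string into (type, parameters).

    Hypothetical helper: it reuses the stdlib header parser, which
    lowercases the media type and strips quotes from parameter values.
    """
    msg = Message()
    msg["Content-Type"] = value
    # The first tuple returned by get_params() is the media type itself.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params


ro_crate = parse_media_type(
    'application/ld+json;profile="https://w3id.org/ro/crate/1.1"'
)
data_package = parse_media_type(
    'application/ld+json;profile="https://datapackage.org/profiles/2.0/datapackage.json"'
)
print(ro_crate)
print(data_package)
```

Under this scheme a server would have to compare the `profile` parameter, not just the base type, to tell the two serializations apart.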

If that's fine I can push a commit on top of this PR to finalize this and merge.

@slint slint removed the stale label Sep 20, 2024
Author

roll commented Sep 21, 2024

Thanks @slint!

Data Package is closer to the DataCite format, but it doesn't yet have its own MIME type like application/vnd.datacite.datacite+json, so I think plain application/json would work best (similar to the platform's native JSON format).

Please feel free to update the PR and let me know if you need help with anything else 🤝

Author

roll commented Oct 9, 2024

Hi @slint, do you need any help?

Regarding the exporter: I think I didn't get it from the initial reply, but I have the PR ready for invenio-app-rdm as well (if you meant this one):

diff --git a/invenio_app_rdm/config.py b/invenio_app_rdm/config.py
index af244486..b354e441 100644
--- a/invenio_app_rdm/config.py
+++ b/invenio_app_rdm/config.py
@@ -751,6 +751,16 @@ APP_RDM_RECORD_EXPORTERS = {
         "content-type": "application/vnd.datacite.datacite+xml",
         "filename": "{id}.xml",
     },
+    # TODO: it requires an `invenio-rdm-records` version update
+    "datapackage.json": {
+        "name": _("Data Package"),
+        "serializer": (
+            "invenio_rdm_records.resources.serializers:DataPackageSerializer"
+        ),
+        "params": {"options": {"indent": 2, "sort_keys": True}},
+        "content-type": "application/json",
+        "filename": "{id}.datapackage.json",
+    },
     "dublincore": {
         "name": _("Dublin Core XML"),
         "serializer": (
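
As a rough sketch of what the new exporter entry expresses, the snippet below applies the `filename` template and the JSON `options` from the diff to a payload. The `export` function, the record id, and the payload are hypothetical; how invenio-app-rdm actually consumes `APP_RDM_RECORD_EXPORTERS` is not shown in this thread and is not assumed here.

```python
import json

# Values copied from the diff above; the "name" and "serializer" keys are
# left out because this sketch does not import Invenio.
exporter = {
    "params": {"options": {"indent": 2, "sort_keys": True}},
    "content-type": "application/json",
    "filename": "{id}.datapackage.json",
}


def export(record_id, payload, exporter):
    """Hypothetical helper: build the download name, type, and body."""
    filename = exporter["filename"].format(id=record_id)
    body = json.dumps(payload, **exporter["params"]["options"])
    return filename, exporter["content-type"], body


# Illustrative record id and payload only.
filename, content_type, body = export("abc123", {"b": 1, "a": 2}, exporter)
print(filename, content_type)
print(body)
```

With `sort_keys` enabled the exported keys appear in a stable order, which keeps repeated exports of the same record easy to diff.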

cc @sapetti9
