
Reimplemented the export machinery #84

Merged
merged 87 commits into from
Apr 24, 2024

jmfernandez (Member)

This extensive work has involved both adding support for real deposition sites, such as Zenodo or B2SHARE, and redesigning and reimplementing the APIs.

Also, the version has been bumped to 0.99.0.

… SPDX.

In this way, other licences' short names can now be recognized.
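As an illustration of short-name recognition, here is a minimal sketch that maps case-insensitive short names to canonical SPDX identifiers and their `https://spdx.org/licenses/<id>` URLs. The hard-coded `SPDX_IDS` excerpt and the `resolve_licence` helper are hypothetical; a real implementation would load the complete SPDX licence list.

```python
from typing import Optional

# Hypothetical excerpt of the SPDX licence list; a real implementation
# would load the complete list instead of hard-coding a few entries.
SPDX_IDS = ["Apache-2.0", "MIT", "CC-BY-4.0", "CC0-1.0", "GPL-3.0-only"]

# Case-insensitive index from short name to canonical SPDX identifier.
_SHORT_NAME_INDEX = {spdx_id.lower(): spdx_id for spdx_id in SPDX_IDS}


def resolve_licence(short_name: str) -> Optional[str]:
    """Return the SPDX licence URL for a short name, or None if unknown."""
    spdx_id = _SHORT_NAME_INDEX.get(short_name.strip().lower())
    if spdx_id is None:
        return None
    return f"https://spdx.org/licenses/{spdx_id}"
```

For instance, `resolve_licence("cc-by-4.0")` yields the URL for `CC-BY-4.0`, while an unknown name yields `None` so the caller can decide how to treat it.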
… are included as payloads in the RO-Crates to be generated from an export action.
…ment variables (or similar parameterization)
* Added priority.
* Added stateful fetcher description.
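A minimal sketch of priority-based fetcher selection, assuming a hypothetical `FetcherRegistry` (not WfExS's actual API) in which several fetchers may claim the same URI scheme and the one with the highest priority wins:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(order=True)
class FetcherEntry:
    # Only 'priority' takes part in ordering; name and callable are ignored.
    priority: int
    name: str = field(compare=False)
    fetch: Callable[[str], bytes] = field(compare=False)


class FetcherRegistry:
    """Hypothetical registry: several fetchers may claim the same URI scheme."""

    def __init__(self) -> None:
        self._by_scheme: Dict[str, List[FetcherEntry]] = {}

    def register(self, scheme: str, name: str,
                 fetch: Callable[[str], bytes], priority: int = 0) -> None:
        entry = FetcherEntry(priority, name, fetch)
        self._by_scheme.setdefault(scheme.lower(), []).append(entry)

    def pick(self, uri: str) -> FetcherEntry:
        """Return the highest-priority fetcher for the URI's scheme."""
        scheme = uri.split(":", 1)[0].lower()
        candidates = self._by_scheme.get(scheme)
        if not candidates:
            raise LookupError(f"No fetcher registered for scheme {scheme!r}")
        # max() returns the first registered entry when priorities tie.
        return max(candidates)
```

Keeping the priority on the entry, rather than in registration order, is what makes it possible to report it later, as the next commit does.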
…port plugins system.

* Now the priorities are properly reported.
* Export plugin initialisation has been moved.
With this, adding new fetchers should be easier.

It also opens the door to future customizations, where the local configuration file can name custom modules that implement new fetchers.
These features are needed in order to ease the development of new fetchers and export plugins.
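The configuration-driven loading described above could look like the following minimal sketch, built on the standard `importlib.import_module`. The `fetcher_modules` configuration key and the `load_fetcher_modules` helper are hypothetical, not WfExS's actual configuration schema.

```python
import importlib
from types import ModuleType
from typing import Any, List, Mapping


def load_fetcher_modules(config: Mapping[str, Any]) -> List[ModuleType]:
    """Import each module named under the hypothetical 'fetcher_modules' key."""
    modules = []
    for module_name in config.get("fetcher_modules", []):
        # import_module raises ModuleNotFoundError for unknown names, so a
        # misspelled entry in the configuration fails loudly at start-up.
        modules.append(importlib.import_module(module_name))
    return modules
```

Failing at start-up, rather than on first use, keeps a misconfigured local file from surfacing only in the middle of a run.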
… order to run the tests in their dependency order
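The truncated message does not say which mechanism orders the tests. As a hedged illustration of the idea only, this sketch applies Kahn's topological sort to a hypothetical mapping from each test name to the tests it depends on:

```python
from collections import deque
from typing import Dict, List


def dependency_order(deps: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm: order tests so each runs after its dependencies."""
    # Every test mentioned anywhere becomes a node.
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    remaining = {n: len(set(deps.get(n, []))) for n in nodes}
    dependents: Dict[str, List[str]] = {n: [] for n in nodes}
    for test, ds in deps.items():
        for d in set(ds):
            dependents[d].append(test)
    # Sorting makes the order deterministic among independent tests.
    ready = deque(sorted(n for n, c in remaining.items() if c == 0))
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for m in sorted(dependents[node]):
            remaining[m] -= 1
            if remaining[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("Dependency cycle among tests")
    return order
```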
…ration for Zenodo and B2SHARE export plugins
…n problem of common (or abstracted) metadata to be passed to both plugins.
…) to pytest

These parameters enable the tests which involve remote services. They are used to pass the configuration files containing the connection tokens used to test these systems.
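A minimal `conftest.py`-style sketch of such parameters. The `--zenodo-config` and `--b2share-config` option names are hypothetical, since the commit message does not show the real ones. `pytest_addoption` and `parser.addoption` are pytest's own hook and API.

```python
import os
from typing import Dict, List, Optional, Tuple

# Hypothetical option names: the commit message does not show the real ones.
REMOTE_SERVICES = ("zenodo", "b2share")


def option_specs() -> List[Tuple[str, Dict[str, object]]]:
    """One '--<service>-config' option per remote service."""
    return [
        (
            f"--{service}-config",
            {
                "action": "store",
                "default": None,
                "help": f"Configuration file holding the {service} connection token",
            },
        )
        for service in REMOTE_SERVICES
    ]


def pytest_addoption(parser) -> None:
    """pytest hook (placed in conftest.py) registering the options above."""
    for name, kwargs in option_specs():
        parser.addoption(name, **kwargs)


def usable_config(path: Optional[str]) -> Optional[str]:
    """Return path if it names an existing file, else None (caller skips)."""
    if path is not None and os.path.isfile(path):
        return path
    return None
```

Inside a test, `usable_config(request.config.getoption("--zenodo-config"))` returning `None` would be the cue to call `pytest.skip(...)`, so remote tests only run when a token file is actually supplied.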
… export.

Also, added the code needed to propagate it on export.
It is now properly defined where and when ORCIDs are resolved, in order to avoid unneeded resolutions when no internet access is available.

RO-Crate generation was already resolving ORCIDs, and the same machinery is now also used by WF when an export is performed.
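A minimal sketch of deferred, offline-aware ORCID resolution, assuming a hypothetical `OrcidResolver` with an injected `lookup` callable standing in for the real remote query. The identifier is normalised, results are cached so each ORCID is resolved at most once, and no lookup happens in offline mode.

```python
import re
from typing import Callable, Dict, Optional

# ORCID iD: four blocks of four characters; the final one may be 'X' (checksum).
ORCID_RE = re.compile(r"^(?:https://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


class OrcidResolver:
    """Hypothetical resolver: only queries when asked, never while offline."""

    def __init__(self, lookup: Callable[[str], dict], offline: bool = False) -> None:
        self._lookup = lookup  # injected stand-in for the remote query
        self._offline = offline
        self._cache: Dict[str, dict] = {}

    def resolve(self, orcid: str) -> Optional[dict]:
        match = ORCID_RE.match(orcid.strip())
        if match is None:
            raise ValueError(f"Not an ORCID: {orcid!r}")
        orcid_id = match.group(1)
        if orcid_id in self._cache:
            return self._cache[orcid_id]
        if self._offline:
            # Without internet access, skip resolution instead of failing.
            return None
        record = self._lookup(orcid_id)
        self._cache[orcid_id] = record
        return record
```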
… abstracted.

Still, WorkflowRunROCrate keeps some licence resolution work, inherited from unresolved licence URIs associated with materialized contents.
As Dataverse distinguishes between supported and custom licences, uses its own terminology, and accepts only a single licence, some additional code was needed to circumvent these limitations.
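One possible way to fit several licences into a single-licence model is sketched below. It is a hypothetical approach, not necessarily the one taken in this PR, and the `license` / `customTerms` keys are illustrative rather than Dataverse's actual JSON field names. A lone supported licence is passed through, and anything else falls back to custom terms text.

```python
from typing import Dict, List, Set


def dataverse_licence_fields(licences: List[str], supported: Set[str]) -> Dict[str, str]:
    """Map a list of licence identifiers onto a single-licence model.

    A lone licence known to the server is sent as-is; anything else
    (several licences, or an unsupported one) becomes custom terms text.
    """
    if not licences:
        return {}
    if len(licences) == 1 and licences[0] in supported:
        return {"license": licences[0]}
    return {"customTerms": "Licensed under: " + ", ".join(licences)}
```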
B2SHARE limits the number of provided licences to one,
and it does not have an alternate mechanism for custom licences.
So some fallback code has been added to partially circumvent these
limitations (although not in the most correct way).
Zenodo is able to represent more than one licence, BUT its native
REST API limits the number of provided licences to one,
and it does not have an alternate mechanism for custom licences.
So only the first licence is preserved.

Upstream has been notified about this limitation, and about the potential
loss of associated licences if a metadata update occurs on entries with
more than one associated licence.
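For single-licence services such as B2SHARE and Zenodo's REST API, "only the first licence is preserved" can be made explicit instead of silent. This is a minimal sketch with a hypothetical `first_licence_only` helper that keeps the first licence and logs the ones being dropped:

```python
import logging
from typing import List, Optional, Tuple

logger = logging.getLogger(__name__)


def first_licence_only(licences: List[str], service: str) -> Tuple[Optional[str], List[str]]:
    """Keep the first licence for a single-licence service and report the rest."""
    if not licences:
        return None, []
    kept, dropped = licences[0], licences[1:]
    if dropped:
        # Surface the loss instead of silently discarding licences.
        logger.warning(
            "%s accepts only one licence: keeping %s, dropping %s",
            service, kept, ", ".join(dropped),
        )
    return kept, dropped
```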
…ble.

Both title and description now accept a limited set of placeholders, which allow minimal customization of what appears in the published entry.
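A limited placeholder set maps naturally onto the standard `string.Template`. The sketch below is a hedged illustration: the `${...}` syntax and the placeholder names in `ALLOWED_PLACEHOLDERS` are hypothetical, since the commit does not list the real ones. Only allowed keys are substituted, and everything else is left verbatim.

```python
from string import Template
from typing import Mapping

# Hypothetical placeholder names: the commit does not list the real ones.
ALLOWED_PLACEHOLDERS = frozenset({"workflow_id", "instance_id", "date"})


def render_entry_text(template: str, values: Mapping[str, str]) -> str:
    """Fill only allowed ${...} placeholders and leave all others untouched."""
    allowed = {k: v for k, v in values.items() if k in ALLOWED_PLACEHOLDERS}
    # safe_substitute() never raises on missing keys; unknown placeholders
    # stay in the text verbatim.
    return Template(template).safe_substitute(allowed)
```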
crypt4gh 1.6 has a side effect on key generation: it changes the umask of the process without restoring it later. Code has been added to restore the umask after the call.
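The umask fix follows a save-and-restore pattern. This minimal sketch wraps it in a context manager built on the standard `os.umask`. The `preserved_umask` name is hypothetical, and the crypt4gh key-generation call itself is not reproduced here.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def preserved_umask() -> Iterator[None]:
    """Restore the process umask after the wrapped block, even if it raises."""
    # os.umask() sets a new mask and returns the previous one, so reading
    # the current value requires setting it and immediately putting it back.
    current = os.umask(0o022)
    os.umask(current)
    try:
        yield
    finally:
        os.umask(current)
```

Any call that may tamper with the umask, such as the key generation mentioned above, would then run inside `with preserved_umask(): ...`.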
…hods are only required by exporters which support drafted uploads.
When a dataset was exported to the input cache with a synthetic URI, the caching code registered not the destination URI but the intermediate one used for the injection. So the synthetic URI was never properly registered, leading to a later cache miss.

So now the cache code looks at the returned metadata and trusts the returned URI instead of the one used for the request. This is also needed in other cases, such as when the URI encodes credentials and the fetch plugin strips them from the URI embedded in the returned metadata.
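A minimal sketch of that keying rule, assuming a hypothetical `UriKeyedCache` and `FetchResult` rather than WfExS's real cache classes. The entry is registered under the URI reported by the fetcher, so a credential-bearing request URI never becomes a cache key.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class FetchResult:
    local_path: str
    # URI reported back by the fetcher; it may differ from the requested one
    # (e.g. a synthetic URI, or the same URI with credentials stripped).
    uri: str


class UriKeyedCache:
    """Hypothetical cache that keys entries by the fetcher-reported URI."""

    def __init__(self, fetch: Callable[[str], FetchResult]) -> None:
        self._fetch = fetch
        self._entries: Dict[str, str] = {}

    def lookup(self, uri: str) -> Optional[str]:
        return self._entries.get(uri)

    def fetch(self, request_uri: str) -> str:
        cached = self.lookup(request_uri)
        if cached is not None:
            return cached
        result = self._fetch(request_uri)
        # Register the returned URI, not the request URI, so later lookups
        # by the canonical (or synthetic) URI hit the cache.
        self._entries[result.uri] = result.local_path
        return result.local_path
```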
Contexted export plugins now use WfExSBackend instances instead of WF ones. This eases the building of test cases.

Also, the development of these test cases has already helped to catch several corner cases, both in the initialization code and in the cache management system.
This method was not honoring already existing shares, as it was creating a new one each time.
Also, the custom `push` method has been removed in favour of the ancestor implementation.

The test case that checks this last change helped to uncover a misbehaviour in `push` when two remote files with the same base name are in different directories.
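A minimal sketch of both behaviours described above. `ShareRegistry` is a hypothetical in-memory stand-in for the real remote API, and the method whose shares were not honoured is not named in the truncated text. Shares are looked up before being created, and pushed files keep their relative paths so equal base names in different directories do not collide.

```python
import posixpath
from typing import Dict, Iterable, List, Optional


class ShareRegistry:
    """Hypothetical in-memory stand-in for a remote service's list of shares."""

    def __init__(self) -> None:
        self._shares: Dict[str, str] = {}
        self._created = 0

    def find(self, path: str) -> Optional[str]:
        return self._shares.get(path)

    def create(self, path: str) -> str:
        self._created += 1
        token = f"share-{self._created}"
        self._shares[path] = token
        return token


def get_or_create_share(registry: ShareRegistry, path: str) -> str:
    """Reuse an existing share for path; create one only if none exists."""
    existing = registry.find(path)
    return existing if existing is not None else registry.create(path)


def remote_names(local_files: Iterable[str], root: str) -> List[str]:
    """Name remote files by their path relative to root, not by base name.

    With base names only, 'a/data.txt' and 'b/data.txt' would both
    become 'data.txt' and one upload would overwrite the other.
    """
    return [posixpath.relpath(path, root) for path in local_files]
```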
As types-defusedxml is not available for Python 3.7, it was not possible to validate type correctness with that Python version.
jmfernandez merged commit 35a72b9 into main on Apr 24, 2024
6 of 8 checks passed