Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

idutils: Restructure module to be configurable and readable #104

Merged

Conversation

sakshamarora1
Copy link
Contributor

Closes: #103

Copy link
Contributor

@ntarocco ntarocco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extension points are not yet implemented, right? Otherwise, can you point where?

idutils/__init__.py Show resolved Hide resolved
idutils/utils.py Show resolved Hide resolved
Copy link
Contributor

@ntarocco ntarocco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, we want to have a non-configurable core, and add extension points to implement your own IDs checks.
For example, keep PID_SCHEME as they are (the core ones), and also add a current_app.config["IDUTILS_CUSTOME_PID_SCHEMES"] to that we can implement our own. Similar way for the other config vars.

idutils/__init__.py Show resolved Hide resolved
idutils/__init__.py Outdated Show resolved Hide resolved
idutils/ext.py Outdated Show resolved Hide resolved
@sakshamarora1
Copy link
Contributor Author

sakshamarora1 commented Oct 2, 2024

The config on the overlay would look like this:

IDUTILS_CUSTOM_PID_SCHEMES = [
    ("test", lambda x: True),
]

from invenio_rdm_records.config import RDM_RECORDS_IDENTIFIERS_SCHEMES, RDM_RECORDS_PERSONORG_SCHEMES
RDM_RECORDS_IDENTIFIERS_SCHEMES = {
    **RDM_RECORDS_IDENTIFIERS_SCHEMES,
    "test": {"label": _("Test"), "validator": lambda x: True, "datacite": "Test"},
}

RDM_RECORDS_PERSONORG_SCHEMES = {
    **RDM_RECORDS_PERSONORG_SCHEMES,
    "test": {"label": _("Test"), "validator": lambda x: True, "datacite": "Test"},
}

@sakshamarora1
Copy link
Contributor Author

Affected modules that would need a minor release:
(bottom to top)

cds-rdm
invenio-app-rdm
invenio-rdm-records
invenio-banners
invenio-communities
invenio-administration
invenio-drafts-resources
invenio-github
invenio-jobs
invenio-pages
invenio-requests
invenio-users-resources
invenio-vocabularies
invenio-records-resources
commonmeta-py
datacite
marshmallow-utils

idutils/config.py Outdated Show resolved Hide resolved
tests/test_idutils.py Outdated Show resolved Hide resolved
idutils/config.py Outdated Show resolved Hide resolved
idutils/config.py Outdated Show resolved Hide resolved
idutils/config.py Outdated Show resolved Hide resolved
idutils/config.py Outdated Show resolved Hide resolved
.. note:: Some schemes like PMID are very generic.
"""
schemes = []
pid_schemes = PID_SCHEMES + current_app.config["IDUTILS_CUSTOM_PID_SCHEMES"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use entrypoints here?

idutils/ext.py Outdated
def init_idutils_registry(self):
"""Initialize identifier registries."""
# For registering detect_identifier_schemes, etc. functions
self.identifier_detectors_registry = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks ok to me, but how is it used?

Copy link

@anikachurilova anikachurilova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed with @jrcastro2 , looks good to us 🚀
One question: is it possible to override it with the entrypoints, or only extend?

Copy link
Member

@zzacharo zzacharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the new entrypoints, is there a place in the module that we can document (apart from potentially the docs website) the available entrypoints? See here for example.

idutils/ext.py Outdated
"""Initialize identifier registries."""
# For registering is_* checks (functions)
self.identifier_checks_registry = {}
self.register_entry_point(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: @sakshamarora1 was this a convention you followed? If not, what about:

Suggested change
self.register_entry_point(
self._load_entry_point(


import isbnlib

doi_regexp = re.compile(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maybe move the relevant regex patterns to a separate file e.g. regex.py? We can leave here actual util functions otherwise this will continue becoming a catch-all place....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes 👍
More options to consider: patterns.py or rules.py?

setup.cfg Outdated
idutils = idutils.ext:finalize_app
invenio_base.api_finalize_app =
idutils = idutils.ext:api_finalize_app
# Custom entrypoints for extending functionality
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be removed?

@pytest.fixture(scope="session")
def create_app():
"""Application factory fixture."""
return create_api
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we miss some tests about the new entrypoint functionalities. Basically, we should test the different entrypoints registration and showcase examples of callable functions that are expected there.

idutils/ext.py Outdated
self.configs_registry = {}
self.register_entry_point(self.configs_registry, "idutils.configs")

def register_entry_point(self, registry, ep_name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it look for the end user when they will want to use one of registered functions? I imagine something like identifier_checks_registry.get('my_function')? What about if we expose a property so the end user would do idutils.[checks, normalizers,configs].{my_function}? I am not sure we need to expose proxies to the registries...What do you think?

from flask import current_app
from werkzeug.local import LocalProxy

current_idutils = LocalProxy(lambda: current_app.extensions["idutils"])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think renaming this to something that includes scheme and registry might be better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


import isbnlib

doi_regexp = re.compile(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes 👍
More options to consider: patterns.py or rules.py?

* Adds an entrypoint to register new custom custom schemes
* Ensures that the new schemes are not overriding existing ones
* Adds invenio-app as test dependency required by pytest-invenio

Co-authored-by: Saksham Arora <[email protected]>
@zzacharo zzacharo force-pushed the feature/configurable_idutils branch 3 times, most recently from 4dbe004 to 2a0efea Compare October 15, 2024 08:24
@zzacharo
Copy link
Member

#104 (comment)

That's good but maybe better to attack this when we work on the new major release?

"""Utility file containing ID validators."""


from six.moves.urllib.parse import urlparse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we still using six? should we keep it ?

@zzacharo zzacharo merged commit aded7b9 into inveniosoftware:master Oct 15, 2024
4 checks passed
@tmorrell
Copy link
Contributor

I'm supportive of making the module extensible....but the current implementation breaks usage of idutils outside of flask #105

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

make idutils schema list extensible
7 participants