Capture affiliation ID data for all parsers when available #104

Open
seasidesparrow opened this issue May 3, 2024 · 0 comments
Labels: enhancement (New feature or request)

Is your feature request related to a problem? Please describe.
Currently, parsers capture affiliation data as text, which is added to "affPubRaw" in the ingest data model Affil object. However, affiliation data may also be provided as an affiliation identifier from a registry such as ROR, ISNI, or GRID, either alongside or in place of the text. For example, Crossref XML includes the tag <institution_id type="TYPE"> as a possible return field (see https://www.crossref.org/documentation/schema-library/markup-guide-metadata-segments/affiliations/). The ADS Ingest_Data_Model Affil object already has fields for affPubID and affPubIDType, but they are not yet implemented in base.py or any of the parsers.

Describe the solution you'd like
We should add logic to each of the content parsers to detect and correctly handle institution identifiers, and store them in the ingest_data_model.affils.affPubID and affPubIDType fields for each contributor that has them.

Additional context
As an example, the input test file jats_springer_EPJC_s10052-023-11699-1.xml has <institution-id> tags for both GRID and ISNI:

[...]
                                <aff id="Aff154">
                                        <label>154</label>
                                        <institution-wrap>
                                                <institution-id institution-id-type="GRID">grid.470046.1</institution-id>
                                                <institution-id institution-id-type="ISNI">0000 0004 0452 0652</institution-id>
                                                <institution content-type="org-name">CPPM, Aix-Marseille Université, CNRS/IN2P3</institution>
                                        </institution-wrap>
                                        <addr-line content-type="city">Marseille</addr-line>                            
                                        <country country="FR">France</country>
                                </aff>
[...]
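
A minimal sketch of how the identifiers in a fragment like the one above could be pulled out. It uses only the standard-library `xml.etree.ElementTree`, which is an assumption made for illustration and not necessarily what the existing parsers use. The function name `extract_institution_ids` and the trimmed-down `AFF_XML` sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <aff> element from the test file, for illustration only.
AFF_XML = """
<aff id="Aff154">
  <label>154</label>
  <institution-wrap>
    <institution-id institution-id-type="GRID">grid.470046.1</institution-id>
    <institution-id institution-id-type="ISNI">0000 0004 0452 0652</institution-id>
    <institution content-type="org-name">CPPM, Aix-Marseille Université, CNRS/IN2P3</institution>
  </institution-wrap>
  <addr-line content-type="city">Marseille</addr-line>
  <country country="FR">France</country>
</aff>
"""


def extract_institution_ids(aff_xml):
    """Return (id_type, id_value) tuples for every <institution-id> in one <aff>."""
    aff = ET.fromstring(aff_xml.strip())
    ids = []
    for tag in aff.iter("institution-id"):
        id_type = (tag.get("institution-id-type") or "").strip()
        id_value = (tag.text or "").strip()
        if id_value:  # skip empty tags rather than storing blank identifiers
            ids.append((id_type, id_value))
    return ids


pairs = extract_institution_ids(AFF_XML)
for id_type, id_value in pairs:
    print(id_type, id_value)
```

Each tuple would then map onto affPubIDType and affPubID for the corresponding contributor's affiliation.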

This example carries two identifiers, GRID and ISNI, but the ingest_data_model currently expects a single value here. We could either update the data model to support a list of id-type objects, or merge multiple values into a single string with a join.
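
To make the two options concrete, here is a rough sketch of what each representation might look like for the GRID/ISNI pair. The helper names `as_id_list` and `as_joined_strings` and the `"; "` separator are assumptions for illustration. Only the field names affPubID and affPubIDType come from the existing data model.

```python
# (id_type, id_value) pairs as they might come out of a parser; values are
# taken from the Springer example in this issue.
pairs = [("GRID", "grid.470046.1"), ("ISNI", "0000 0004 0452 0652")]


def as_id_list(pairs):
    """Option 1: one object per identifier (would require a data model change)."""
    return [{"affPubIDType": t, "affPubID": v} for t, v in pairs]


def as_joined_strings(pairs, sep="; "):
    """Option 2: keep single string fields and join multiple values.

    Types and values are joined in the same order so they can be split and
    re-paired later. The separator is a hypothetical choice and must not
    occur inside any identifier.
    """
    return {
        "affPubIDType": sep.join(t for t, _ in pairs),
        "affPubID": sep.join(v for _, v in pairs),
    }


id_list = as_id_list(pairs)
joined = as_joined_strings(pairs)
print(id_list)
print(joined)
```

The list form keeps each type tied to its value explicitly. The joined form avoids a schema change but relies on the separator and ordering to keep types and values aligned.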
