Capture affiliation ID data for all parsers when available #104

seasidesparrow · 2024-05-03T18:43:14Z

Is your feature request related to a problem? Please describe.
Currently, parsers capture affiliation data in text format, and this text is added to "affPubRaw" in the ingest data model Affil object. However, affiliation data may also be provided as an affiliation identifier from various systems, e.g. ROR, ISNI or GRID, either with or in place of text data. As an example, Crossref XML includes the tag <institution_id type="TYPE"> as a possible return field. (See https://www.crossref.org/documentation/schema-library/markup-guide-metadata-segments/affiliations/). The ADS Ingest_Data_Model Affil object already has space for affPubID and affPubIDType, but they are not yet populated in base.py or in any of the other parsers.

Describe the solution you'd like
We should add logic to each of the content parsers to detect and capture institution identifiers, and store them in the ingest_data_model.affils.affPubID and affPubIDType fields for each contributor that has them.
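As a rough illustration of the idea, here is a minimal sketch of pulling an identifier out of the Crossref `<institution_id type="TYPE">` tag mentioned above. The sample XML is a hypothetical, namespace-free fragment with a placeholder identifier, and `extract_affils` is an illustrative helper, not an existing function in base.py or any parser; real Crossref records are namespaced and would need the namespace handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free Crossref-style fragment for illustration only;
# the institution name and identifier are placeholders, not real data.
SAMPLE = """
<person_name contributor_role="author" sequence="first">
  <given_name>A.</given_name>
  <surname>Example</surname>
  <affiliations>
    <institution>
      <institution_name>Example University</institution_name>
      <institution_id type="ror">https://ror.org/00example0</institution_id>
    </institution>
  </affiliations>
</person_name>
"""


def extract_affils(person_xml):
    """Return one dict per <institution>, keyed with Affil-style field names."""
    person = ET.fromstring(person_xml)
    affils = []
    for inst in person.iter("institution"):
        name = inst.findtext("institution_name")
        affil = {"affPubRaw": name.strip() if name else None}
        id_el = inst.find("institution_id")
        # Only set the ID fields when an identifier is actually present.
        if id_el is not None and id_el.text and id_el.text.strip():
            affil["affPubID"] = id_el.text.strip()
            affil["affPubIDType"] = id_el.get("type")
        affils.append(affil)
    return affils


affils = extract_affils(SAMPLE)
```

Affiliations without an identifier simply come back with "affPubRaw" only, so existing text-based behavior would be unchanged.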

Additional context
As an example, the input test file jats_springer_EPJC_s10052-023-11699-1.xml has <institution-id> tags for both GRID and ISNI:

[...]
                                <aff id="Aff154">
                                        <label>154</label>
                                        <institution-wrap>
                                                <institution-id institution-id-type="GRID">grid.470046.1</institution-id>
                                                <institution-id institution-id-type="ISNI">0000 0004 0452 0652</institution-id>
                                                <institution content-type="org-name">CPPM, Aix-Marseille Université, CNRS/IN2P3</institution>
                                        </institution-wrap>
                                        <addr-line content-type="city">Marseille</addr-line>                            
                                        <country country="FR">France</country>
                                </aff>
[...]

In this particular example, we see two identifiers, GRID and ISNI. The ingest_data_model currently expects a single value here, so we might consider either updating the data model to support a list of id-type objects, or merging multiple values into a single string via a join statement.
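To make the two options concrete, here is a minimal sketch run against the `<aff>` fragment above. `collect_ids` and `join_ids` are hypothetical helper names, and the "; " separator is an assumption; neither reflects a decision about the data model.

```python
import xml.etree.ElementTree as ET

# The <aff> element from the Springer JATS test file quoted above.
AFF = """<aff id="Aff154">
  <label>154</label>
  <institution-wrap>
    <institution-id institution-id-type="GRID">grid.470046.1</institution-id>
    <institution-id institution-id-type="ISNI">0000 0004 0452 0652</institution-id>
    <institution content-type="org-name">CPPM, Aix-Marseille Université, CNRS/IN2P3</institution>
  </institution-wrap>
  <addr-line content-type="city">Marseille</addr-line>
  <country country="FR">France</country>
</aff>"""


def collect_ids(aff_xml):
    """Option 1: keep every identifier as its own id/type object."""
    aff = ET.fromstring(aff_xml)
    return [
        {
            "affPubIDType": el.get("institution-id-type", ""),
            "affPubID": el.text.strip(),
        }
        for el in aff.iter("institution-id")
        if el.text and el.text.strip()
    ]


def join_ids(ids, sep="; "):
    """Option 2: merge the list into single strings, keeping the order aligned."""
    return {
        "affPubID": sep.join(i["affPubID"] for i in ids),
        "affPubIDType": sep.join(i["affPubIDType"] for i in ids),
    }


ids = collect_ids(AFF)
joined = join_ids(ids)
```

One thing the join option has to account for is that ISNI values contain spaces, so a whitespace separator would make the merged string ambiguous; the list option avoids that problem entirely.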

seasidesparrow added the enhancement New feature or request label May 3, 2024

seasidesparrow self-assigned this May 3, 2024

This was referenced May 3, 2024

Xref affil.20240312 #97

Closed

Improves affiliation capture from Crossref #105

Merged

seasidesparrow mentioned this issue Jul 22, 2024

Squashed commit of the following: #117

Merged
