Metadata requirements #14
Looking at some real data, I'm wondering about "department" for affiliations. Department doesn't map neatly to many of the organizational structures that we see in articles. This is likely to lead to many errors in guessing at the department. (I also suspect that users will be frustrated if they're not able to figure out what is meant by "department".)
In the above, preferred citation is for the SDR resource? (e.g., it has a link to purl). Can the user replace it with their own citation, e.g., for the related resource?
What is the preferred format / style for the related resource citation? APA? What appears in the paper?
The preferred citation in the main record should be for the SDR resource. I think that it shouldn't be editable, only the one for the related resource/version of record, but Amy and I didn't discuss that explicitly. Amy, what is the preferred format for the citation?
We see all sorts of affiliations in the Open Access dataset. How do we mark things like this:
Of these, the metadata to extract from the document (or look up from DOI/Crossref/ROR/etc) if available is: Abstract and keywords should be extracted if provided as part of the document or DOI metadata. They do not need to be generated locally if not available from either source. If DOI is provided, also construct a preferred citation from the DOI-sourced metadata and include as relatedResource (i.e., link to version of record). (per meeting with @amyhodge, @vivnwong, and @RochelleLundy)
@arcadiafalcone we often see multiple affiliations per author. Presumably we want to capture them all, right? Does anyone have examples of ROR and/or ORCIDs in a preprint?
@jcoyne That is correct. And any duplicates from "rounding up" to the parent institution should be removed so that the institution name appears only once.
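To make the "rounding up" rule concrete, here is a minimal sketch. It assumes some lookup from a sub-unit name to its parent institution already exists; the `parent_of` mapping and the function name are hypothetical and not part of any existing tool.

```python
def dedupe_affiliations(affiliations, parent_of):
    """Round each affiliation up to its parent institution and drop repeats.

    `parent_of` is a hypothetical mapping from sub-unit name to parent
    institution name; names that are not in it are kept unchanged.
    """
    seen = set()
    result = []
    for name in affiliations:
        parent = parent_of.get(name, name)
        key = parent.casefold()  # compare case-insensitively
        if key not in seen:
            seen.add(key)
            result.append(parent)  # keep first-seen order
    return result
```

For example, an author listing both a school and its parent university would end up with the university appearing once, with any unrelated institutions kept in their original order.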
Clarification on DOI handling after consultation with Amy:
-For the purpose of AI tool evaluation: success is extracting a DOI from the PDF that identifies a version of the deposited document.
-For the purpose of creating specific metadata for the PDF: success is extracting the title, author first and last names, abstract (if present in PDF/DOI), and keywords (if present in PDF/DOI). Either the PDF itself or DOI may be the source of any or all of this information.
-For the purpose of creating a Cocina record: success is integrating specific document information with default values and representing it in the Cocina schema. This includes the extracted DOI being mapped to
-For the purpose of a user interface: the DOI is presented to the user as a related resource with type "is version of." The user may change this type to "is version of record of".
ETA: If a DOI is not extracted from the PDF, no related resource is created.
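As a rough illustration of the DOI handling above, the sketch below pulls the first DOI-shaped string out of already-extracted PDF text and only builds a related resource when one is found. The regular expression follows the pattern Crossref commonly suggests for modern DOIs; the function names and dictionary keys are assumptions for illustration and are not Cocina field names.

```python
import re

# Pattern commonly suggested for modern Crossref DOIs; it will miss some
# older or unusual DOIs, so treat it as a starting point only.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_doi(text):
    """Return the first DOI-like string in `text`, or None if there is none."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trim sentence punctuation that the pattern may have swallowed.
    # Caveat: this would also trim a DOI that genuinely ends in ")".
    return match.group(0).rstrip(".,;)")


def related_resource_for(text):
    """Build a related resource only when a DOI was extracted."""
    doi = extract_doi(text)
    if doi is None:
        return None  # no DOI extracted, so no related resource
    # Keys are hypothetical; the default type follows the UI rule above.
    return {"type": "is version of", "doi": doi}
```

Whether the extracted DOI actually identifies a version of the deposited document would still need to be checked separately, for instance against the DOI's own metadata.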
Sorry, I missed this question earlier. I have no preference for the format of the citation. The preferred citation is a touch tricky. We currently do allow users to edit this in H2, including for OA articles. Many folks want all citations to go to a single DOI (eventually the version of record) and not to the open access version, so that their citation counts aren't getting diluted. This is not how the system is intended to work, i.e. if someone read the OA version they really should cite that version. It's a balance between what the users want and what we believe to be the "correct" thing to do from a library/schol comms perspective.
I met with Amy yesterday and we determined that the easy deposit metadata should for the most part follow H2. The main record description should describe the SDR resource, with the SDR as publisher and deposit date as publication date. If a DOI is supplied for the version of record, that version is described as a related resource with the journal publication information (if applicable). The DOI is an attribute of the related resource, not the deposited file; per Amy, the deposited file will not have its own DOI.
User uploads file, does not provide DOI for version of record; metadata is extracted from document
Title
Authors
-first name
-last name
-ORCID
-affiliations - organization, department
-type = person (not editable by user)
-role = author (not editable by user)
Publication date = date of deposit (not editable by user)
Publisher = Stanford Digital Repository (not editable by user)
Form (following H2 mapping, not editable by user except possible exception below)
-H2 type = Text
-H2 subtype = Article (maybe option to add Preprint also - check with Amy)
-MODS resource type - text
-DataCite type - Text
-if Preprint, also include "grey literature" as AAT genre
Abstract
Keywords
Preferred citation (same model as H2)
Purl
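The no-DOI case above could be assembled roughly as follows. This is only a sketch of how extracted values and fixed defaults might be combined: the dictionary keys are hypothetical and do not follow the Cocina schema, and the preferred citation and purl are left out because they depend on H2 behavior not spelled out here.

```python
from datetime import date


def build_main_record(extracted, deposit_date, preprint=False):
    """Merge extracted metadata with the non-editable defaults listed above.

    `extracted` is a hypothetical dict of values pulled from the document.
    """
    record = {
        "title": extracted.get("title"),
        # Every contributor is forced to type=person, role=author.
        "authors": [
            {**author, "type": "person", "role": "author"}
            for author in extracted.get("authors", [])
        ],
        "publication_date": deposit_date.isoformat(),  # date of deposit
        "publisher": "Stanford Digital Repository",
        "form": {
            "h2_type": "Text",
            "h2_subtype": ["Article"],
            "mods_resource_type": "text",
            "datacite_type": "Text",
        },
    }
    if preprint:
        # Possible Preprint option, still to be confirmed per the list above.
        record["form"]["h2_subtype"].append("Preprint")
        record["form"]["aat_genre"] = "grey literature"
    # Abstract and keywords are included only when they were found.
    for field in ("abstract", "keywords"):
        if extracted.get(field):
            record[field] = extracted[field]
    return record
```

Keeping the defaults in one place makes it easy to see which values come from the document and which are fixed by policy.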
User uploads file, provides DOI for version of record
Main record: same as above (may fall back on DOI as metadata source for user-editable fields)
Related resource (metadata derived from DOI), type = has version of record (need to add to cocina type list)
Preferred citation (check with Amy on format) - title, contributors, publisher/journal info, publication date
DOI
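For the related resource, a citation string could be put together from DOI-sourced metadata along these lines. The citation format is still undecided, so the ordering and punctuation here are placeholders, and the function name, the `meta` keys, and the output keys are all hypothetical. The relation type is a parameter because the label itself is still under discussion.

```python
def build_related_resource(doi, meta, relation_type="has version of record"):
    """Build a related-resource entry with a simple citation string.

    `meta` is a hypothetical dict of DOI-sourced values with optional keys:
    contributors (list of str), publication_date, title, journal.
    """
    parts = [
        ", ".join(meta.get("contributors", [])),
        f"({meta['publication_date']})" if meta.get("publication_date") else "",
        meta.get("title", ""),
        meta.get("journal", ""),
    ]
    # Drop empty parts and trailing periods so the separators do not double up.
    cleaned = [part.rstrip(".") for part in parts if part]
    citation = ". ".join(cleaned + [f"https://doi.org/{doi}"])
    return {
        "type": relation_type,
        "doi": doi,
        "preferred_citation": citation,
    }
```

If the decision goes toward structured metadata, or just a title plus DOI link, only the `preferred_citation` part of this sketch would need to change.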
To be discussed further
-Whether this is the best way to represent the version of record - if it makes sense to capture structured metadata for related resource instead of citation, or alternatively to just have the title and link to DOI
-Including form space with additional optional fields for user to fill in, such as links to other related resources like datasets (not automatically derived information)
-ETA: look up ROR for organizational affiliation