The PI_NAME field is currently free-text and unconstrained #6
We should probably provide guidance on how to fill the PI_NAME variable:
Should Argo manage a vocabulary for PI_NAME? Probably not. We could add an additional variable: PI_NAME_ID
I agree with the guidance on how to fill the PI_NAME variable. This is simple and easily readable. Would the PI_NAME_ID also be 'unrestricted' in the sense that any ID could be used? Is this more useful because each person should have only one ID rather than multiple spellings of a name? In that case, we should probably discourage PIs from changing the ID over time. I imagine we would not manage a vocab of PI_NAME_ID if we don't manage a vocab of PI_NAME. I'm just wondering about the usefulness of adding yet another uncontrolled PI variable, even if it is supposed to be unique.
The PI_NAME_ID could be restricted to accepted IDs.
I agree that guidance on how to fill PI_NAME is good. This can be provided quite easily under the "Comment" column in 2.2.4, 2.3.4, 2.4.4, etc., in the Argo Users Manual. I'm not convinced that a new variable, such as PI_NAME_ID, is needed. I would like to see a stronger argument about why it's needed, aside from the general need for a controlled vocabulary. Without knowing why it's needed specifically, it's difficult to design a new variable. Also, the variable name "PI_NAME_ID" does not make sense, since it implies a duplication of information (name + id). If such a new variable is genuinely needed, I would advocate for the use of ORCID, which is already being used to uniquely identify delayed-mode operators in the global attributes of the D and BD files. I would also suggest the variable name "PI_ORCID", and that it be a new variable in the meta files only. This does not need to be in every profile file.
In the review of this metadata field that my colleague @lucarduini did this summer (see PI_NAME_2021.xlsx, with the existing PI_NAME values on the GDAC and possible new values to 1. harmonize different existing spellings for the same PI_NAME and 2. harmonize lower/upper case), we chose the guidance to be FirstName LASTNAME, but FirstName LastName is equally fine. If we do not manage a vocabulary of the PI names, how can we prevent new entries from being populated in a way that does not follow the guidance or does not match the existing PI_NAME value for a specific PI? Could the presence of a PI_(ORC)ID in the meta file automatically update the corresponding PI_NAME value? Other points I see are:
I don't have a strong feeling between FirstName LASTNAME vs. FirstName LastName. It is tricky to prevent new entries of PI_NAME that do not follow this guidance if we do not manage a list. Would this be improved at all by using an ORCID? Would we ask the FileChecker to somehow check that the URL is resolvable? Or would we manage a list of acceptable ORCIDs? I agree with the suggestion of PI_ORCID. I'm not sure I understand what Romain means by 'the presence of PI_ORCID automatically updating the PI_NAME value'. Does that mean that the PI_NAME entry would be replaced with the PI_ORCID text? Would this be done at the DAC level? As to Romain's first bullet point, I agree that it could be useful to track floats from a certain institution, and sometimes the PI_NAME changes over time. Is there another place in the files where the institution is recorded? I think it is important to keep the various PI_NAMEs even if they refer to floats from the same group/institution. Perhaps it would be good to discourage the use of accents and symbols, as they are not always handled well.
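On the FileChecker question: resolving each ORCID URL would need a network call, but a cheaper offline check is possible, because the last character of an ORCID iD is an ISO 7064 MOD 11-2 check digit. Below is a minimal Python sketch, assuming a hypothetical PI_ORCID value stored either as a bare iD (e.g. 0000-0002-1825-0097, ORCID's documented example) or as an https://orcid.org/ URI. It only catches typos; it cannot tell whether an iD belongs to the right person, so a list of accepted iDs would still be needed for that.

```python
import re

# ORCID iD layout: four blocks of four characters; only the final
# character (the check digit) may be "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """Return True if value is a syntactically valid ORCID iD (bare or as a URI)."""
    # Accept the orcid.org URI form as well as the bare identifier.
    for prefix in ("https://orcid.org/", "http://orcid.org/"):
        if value.startswith(prefix):
            value = value[len(prefix):]
            break
    if not ORCID_PATTERN.match(value):
        return False
    digits = value.replace("-", "")
    # Compare the computed check character with the last character given.
    return orcid_check_digit(digits[:15]) == digits[15]
```

A check like this says nothing about resolvability; it only rejects strings that cannot be ORCID iDs.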
The point of this discussion is to explore how to have a controlled vocabulary for the variable PI_NAME, so that the person can be identified uniquely. @tcarval did not think we should manage a vocabulary for PI_NAME, and so suggested a new parallel ID variable. @RomainCancouet suggested that we should, in fact, manage a controlled vocabulary for PI_NAME. In my opinion, PI_NAME is an existing variable, and should never be replaced. Therefore we should manage a controlled vocabulary for it to make sure it is useful. I suggest a new reference table for PI_NAME. Its entries will be char strings of the form "Firstname LASTNAME". Multiple entries for the same person are not allowed, thereby ensuring uniqueness. Multiple unique PIs can be concatenated into one char string, separated by commas, e.g. "Firstname1 LASTNAME1, Firstname2 LASTNAME2, ...". In that case, a new parallel ID variable, such as PI_ORCID, is not needed. If ORCID is somehow necessary, then it can be an extra column in the PI_NAME reference table. But we still won't have to create a new parallel ID variable. It is undesirable to create new variables unnecessarily, especially when an existing variable can do the job.
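A reference table of this kind makes the check mechanical: split the value on commas and look up each name. The following is a minimal Python sketch, assuming the table has already been loaded into a set of allowed strings; `check_pi_name` and the example names are illustrative only, not actual FileChecker code or actual table content.

```python
def check_pi_name(pi_name: str, allowed_names: set[str]) -> list[str]:
    """Return the entries of a PI_NAME value that are not in the reference table.

    An empty list means every comma-separated name is an allowed entry.
    """
    # Multiple PIs are concatenated with commas; surrounding spaces are ignored.
    entries = [part.strip() for part in pi_name.split(",")]
    return [entry for entry in entries if entry not in allowed_names]


# Hypothetical table content, for illustration only.
allowed = {"Breck OWENS", "Steven JAYNE", "Virginie THIERRY"}
problems = check_pi_name("Breck OWENS, STEVEN JAYNE", allowed)
```

Here `problems` is `["STEVEN JAYNE"]`: the lookup is exact, so a casing variant is reported rather than silently accepted, which is what keeps one spelling per person.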
Dear All, a couple of practical points regarding this issue. If you decide that creating a new vocab for the Argo variable PI_NAME is the most efficient way forward, then do you have a feel for how homonyms will be managed? Also bear in mind that ORCIDs have unique URIs. These can either be referenced in a field in the data files or linked to the PI_NAMEs in the new vocabulary, so the relationship between the two could be managed as mappings.
Gwen, I did not envisage a reference table for individual names, having in mind the GDPR sensitivity of personal data.
In case of homonyms, well... let's hope we won't have any, as the table would contain hundreds, not thousands, of entries.
After a meeting with Vi and reading the discussion on this ticket, we have decided that a controlled vocabulary (ideally with a mapping to ORCID) is the best solution.
GDPR
Format
Many thanks,
@roswri I agree that we need to control the entries in the PI_NAME variable. This variable currently has a length limit of 64 characters, so there is sufficient space for all the names. Regarding the PI_NAME xlsx, thanks for drafting it. However, I don't understand the purpose of the 3 columns. Could you explain them please? Is 'Identifier' what you propose we use to fill PI_NAME? And why is there a 'Preferred label' and an 'Alternative label'? If the difference is only in setting the last name to upper case or lower case, then we should simply decide on one, and not leave both in the xlsx. Lastly, some PI names are missing from the xlsx. Thanks, Annie
Hi @apswong, the columns in the Excel file relate to the standard columns on the NVS (e.g. https://vocab.nerc.ac.uk/search_nvs/R04/).
Thanks for letting me know that some PI names are missing, I will try to get a more complete list! Could you give me an example of a PI that is missing from the list so I can check something? Thanks,
Hello @roswri. Examples of some PIs who are missing from the xlsx are 'KENNETH JOHNSON' and 'STEVEN JAYNE'. They are in PI_NAME with multiple PIs concatenated into one char string, separated by commas.
I reckon the issue might come from the version of the file provided following Luca's analysis. Only one name was kept for multiple PIs, whereas it is indeed better to keep the different PI names, separated by commas.
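Rebuilding the list so that every person in a multi-PI value is kept can be sketched as follows, assuming the existing PI_NAME values have already been read from the GDAC meta files into a Python list. `unique_pi_names` is a hypothetical helper; it removes exact duplicates only, so variant spellings of the same person still need manual review.

```python
def unique_pi_names(pi_name_values: list[str]) -> list[str]:
    """Split existing PI_NAME values on commas and return each name once.

    Order of first appearance is preserved so the result can be reviewed
    side by side with the source files.
    """
    seen: dict[str, None] = {}
    for value in pi_name_values:
        for part in value.split(","):
            name = part.strip()
            if name:  # skip empty pieces left by stray commas
                seen.setdefault(name, None)
    return list(seen)


names = unique_pi_names(
    ["BRECK OWENS, STEVEN JAYNE, P.E. ROBBINS", "KENNETH JOHNSON", "STEVEN JAYNE"]
)
```

With this input, `names` keeps 'STEVEN JAYNE' once, instead of dropping him because he only appears alongside other PIs.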
Thanks @apswong and @RomainCancouet! I have updated the list with the missing names: PI_NAME_2021_NVS_v2.xlsx
For the purposes of the controlled vocabulary I think each name should be a separate entry, but the guidance should state that multiple terms can be given if separated by commas. Many thanks,
Hi @roswri.
I agree each name should be a separate entry, and that multiple names can be used to fill PI_NAME if separated by commas.
Hi @apswong, Thanks for clearing that up, I should have worked out they were the same person since they have the same email address! I will add Pelle E Robbins and David Nicholson to the list. Many thanks,
Hello @roswri, once the vocab is in place, do you think we can update the content of the meta files on the GDAC with the updated values for PI names?
Hi @RomainCancouet, I intend to check for any additional missing PI names in the recent GDAC metadata files. I'm having some technical difficulties at the moment, but I will do that when I can and add any additional PI names to the list. Regarding your question about updating the content of the meta files on the GDAC with the values from the new controlled vocabulary, do you mean from the individual DAC perspective, or is it something the GDAC can do for all meta files across the board? I'm not sure how the process for updating the meta files works, but I imagine that as long as there's a mapping between the current terms in use and the new controlled vocabulary terms, the files can be updated with the new terms. The next thing to make a decision on is the table metadata, e.g. I'm wondering whether it should be PI-specific, or whether there's potential for the table to be used for other fields as well, like FLOAT_OWNER, in which case we may want to rethink the table metadata to make it more generic. Any thoughts on this? Thanks,
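The mapping-based update described here could look roughly like this. It is a minimal Python sketch, assuming a mapping from legacy PI_NAME strings to the new controlled-vocabulary labels (for example built from the alternative labels in the table); `remap_pi_name` and the labels in the usage lines are hypothetical. Names without a mapping are kept unchanged and reported, so nothing is silently lost.

```python
def remap_pi_name(pi_name: str, mapping: dict[str, str]) -> tuple[str, list[str]]:
    """Rewrite a legacy PI_NAME value using a legacy -> controlled-term mapping.

    Returns the rewritten value and the list of names that had no mapping
    (these are left unchanged and need manual review).
    """
    new_names: list[str] = []
    unmapped: list[str] = []
    for part in pi_name.split(","):
        name = part.strip()
        if not name:
            continue
        if name in mapping:
            new_names.append(mapping[name])
        else:
            new_names.append(name)
            unmapped.append(name)
    return ", ".join(new_names), unmapped


# Hypothetical mapping, for illustration only.
mapping = {"B. Klein": "Birgit KLEIN", "Dr. Birgit Klein": "Birgit KLEIN"}
value, to_review = remap_pi_name("Dr. Birgit Klein, STEVEN JAYNE", mapping)
```

Here `value` becomes "Birgit KLEIN, STEVEN JAYNE" and `to_review` is `["STEVEN JAYNE"]`, which would be the list to send back to the DAC before the file is rewritten.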
Hi @roswri, OK, thanks for the future check of missing PI names. Yes, my question regarding the meta files was: are we willing, and are DACs able, to update the content of the meta files once some entries (PI_NAME, etc.) are constrained and new values suggested? I acknowledge that this will require effort, and maybe it could be done only once other metadata fields have been revisited by this task team (e.g. #5, #2, etc.)? As for your question about FLOAT_OWNER, since it is presently populated with a mixture of people, institution or DAC names, I do not know.
@tcarval @nvs-vocabs/oceanops @RomainCancouet Who should be granted editing permissions for this vocabulary for PI_NAME? I can add extra people if anyone else decides they want or need editing permissions later on. Thanks,
Hi @tcarval and @nvs-vocabs/oceanops, it seems we were able to reach a decision on the specifics of the PI_NAME NVS table at ADMT, which is great. Many thanks,
Hello Roseanna (@roswri), I agree to be an editor of this new collection.
Thanks to BODC, the (constrained) list of available PI names is available in the NVS (https://vocab.nerc.ac.uk/search_nvs/R40/). It is now a matter of updating the list if entries are missing (people can contact the Vocabs Editors as described on the ADMT webpage), and of using this table with the FileChecker (nvs-vocabs/ArgoVocabs_Meetings#1) to constrain the allowed entries in netCDF meta files.
Thanks to all for working on PI_NAME. I would like to add that when a float has multiple PIs, the unique entries in R40 can be concatenated into one character string to fill the PI_NAME variable in the various Argo data files. The current practice is to use commas to separate the multiple unique names. This should be explained clearly in the Users Manual and reflected in the GDAC file checker.
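Building a PI_NAME value from several R40 entries could be sketched as below, assuming ", " as the separator (as in the "Firstname1 LASTNAME1, Firstname2 LASTNAME2" example earlier in the thread) and the 64-character length limit mentioned for PI_NAME. `build_pi_name` is a hypothetical helper, not part of any Argo tool.

```python
PI_NAME_MAX_LENGTH = 64  # PI_NAME length limit quoted in this thread


def build_pi_name(names: list[str]) -> str:
    """Concatenate controlled-vocabulary names into one PI_NAME string."""
    # Drop duplicates while keeping the order in which PIs were listed.
    unique = list(dict.fromkeys(name.strip() for name in names))
    value = ", ".join(unique)
    if len(value) > PI_NAME_MAX_LENGTH:
        raise ValueError(
            f"PI_NAME would be {len(value)} characters; limit is {PI_NAME_MAX_LENGTH}"
        )
    return value


pi_name = build_pi_name(["Breck OWENS", "Steven JAYNE", "Breck OWENS"])
```

Raising an error rather than truncating is a deliberate choice in this sketch: a silently cut name would no longer match any R40 entry.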
From ADMT-24, there is an agreement on having a controlled list of PIs.
Hi Thierry,
Aren't these same PI names going to appear in open access data in the same form anyway? If so, the same privacy concerns surely apply to the dataset entries as well as to a vocab entry.
Matt
On Thu, 26 Oct 2023, 07:14 tcarval wrote:
From ADMT-24, there is an agreement on having a controlled list of PIs. There are privacy concerns about exposing a PI_NAME in a public list on the Internet. Can NVS restrict access to this list? Should the list be managed outside the NVS?
@tcarval we cannot hide the content of a collection published on the NVS. Have you considered the use of ORCIDs for the PIs? You could capture people's ORCIDs instead of their names if privacy is required.
@matdon17 Hello Matt,
Yes, ORCID was proposed, but did not receive strong support. As a next step, we may manage the link to DOI from R40.
@matdon17 has a point: once information is in an open access data file, there is nothing stopping any tool from extracting the information, collating it and re-exposing it under a category. For example, OceanOPS has a list of people's names under its 'Contacts' filter, and the Euro-Argo fleet monitoring tool has a 'PI_NAME' tick box that can be selected in its search results filter so that names are displayed next to the corresponding float numbers. I thus think that removing or replacing people's names in the NVS R40 collection would only partly address the underlying concern, though it would also be the quickest way to start.
Issue resolved post ADMT-24. The PI_NAME collection is live: https://vocab.nerc.ac.uk/search_nvs/R40/ or https://vocab.nerc.ac.uk/collection/R40/current/
@tcarval Now that PI_NAME entries are officially a controlled vocabulary in R40, we should update the "Comment" column in the Users Manual for PI_NAME with this new information, and explain that multiple names can be concatenated, separated by commas, to fill PI_NAME.
I updated the "comment" column for PI_NAME.
@tcarval Thank you, Thierry. We also need to update the PI_NAME comment column in Sections 2.3.3 and 2.4.4.
Oops, yes, I just did it, and noted it in the Users Manual history section.
PI_NAME is a free-text field with no links to an external resource, and as a result may not clearly identify a person uniquely.
PI_NAME is populated in all sorts of variant ways, including differences in letter case, initials, honorifics, etc. Examples include:
Virginie THIERRY
B. Klein
Dr. Birgit Klein
BRECK OWENS, STEVEN JAYNE, P.E. ROBBINS
Pierre-Marie Poulain
M Ravichandran
DEAN ROEMMICH
GREGORY C. JOHNSON
We have options, e.g. ORCiD, which we could reference: http://www.orcid.org/
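To illustrate why string cleanup alone does not solve this, here is a minimal Python sketch of a naive normalization key applied to the examples listed above; `naive_key` and its honorific list are assumptions made for illustration. Even after lower-casing and dropping punctuation and honorifics, "B. Klein" and "Dr. Birgit Klein" still produce different keys, so deciding whether they are the same person needs a curated list or an identifier such as ORCiD.

```python
import re

# Illustrative, incomplete list of honorifics to ignore.
HONORIFICS = {"dr", "prof", "mr", "mrs", "ms"}


def naive_key(raw: str) -> str:
    """Build a crude comparison key for one free-text PI name.

    Lower-cases the text, keeps only runs of letters (accented letters
    included), drops honorifics, and joins the words with single spaces.
    """
    words = re.findall(r"[^\W\d_]+", raw.lower())
    return " ".join(w for w in words if w not in HONORIFICS)


keys = [naive_key(n) for n in ("B. Klein", "Dr. Birgit Klein", "Pierre-Marie Poulain")]
```

`keys` comes out as `["b klein", "birgit klein", "pierre marie poulain"]`: casing and titles are handled, but initials versus full first names are not.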