Add package 2024_Barquera_ChichenItza #211

Open
wants to merge 1 commit into
base: master
from


@RodrigoBarquera RodrigoBarquera commented Sep 5, 2024

PR Checklist for a new package submission

  • The package does not exist already in the community archive, also not with a different name.
  • The package title in the POSEIDON.yml conforms to the general title structure suggested here: <Year>_<Last name of first author>_<Region, time period or special feature of the paper>, e.g. 2021_Zegarac_SoutheasternEurope, 2021_SeguinOrlando_BellBeaker or 2021_Kivisild_MedievalEstonia.
  • The package is stored in a directory that is named like the package title.

  • The package is complete and features the following elements:
    • Genotype data in binary PLINK format (not EIGENSTRAT format).
    • A POSEIDON.yml file with not just the file-referencing fields, but also the following meta-information fields present and filled: poseidonVersion, title, description, contributor, packageVersion, lastModified (see here for their definition)
    • A reasonably filled .janno file (for a list of available fields look here and here for more detailed documentation about them).
    • A .bib file with the necessary literature references for each sample in the .janno file.
  • Every file in the submission is correctly referenced in the POSEIDON.yml file and there are no additional, supplementary files in the submission that are not documented there.
  • Genotype data, .janno and .bib file are all named after the package title and only differ in the file extension.
  • The package version in the POSEIDON.yml file is 1.0.0.
  • The poseidonVersion of the package in the POSEIDON.yml file is set to the latest version of the Poseidon schema.
  • The POSEIDON.yml file contains the corresponding checksums for the fields genoFile, snpFile, indFile, jannoFile and bibFile.
  • There is either no CHANGELOG file or one with a single entry for version 1.0.0.

  • The Publication column in the .janno file is filled and the respective .bib file has complete entries for all listed keys.
  • The .janno file does not include any empty columns or columns only filled with n/a.
  • The order of columns in the .janno file adheres to the standard order as defined in the Poseidon schema here.
  • The .janno and the .ssf files are not fully quoted, so they only use single- or double quotes ("...", '...') to enclose text fields where it is strictly necessary (i.e. their entry includes a TAB).

  • The package passes a validation with trident validate --fullGeno.

  • Large genotype data files are properly tracked with Git LFS and not directly pushed to the repository. For instructions on how to set up Git LFS, please look here. If you accidentally pushed the files the wrong way, you can fix it with git lfs migrate import --no-rewrite path/to/file.bed (see here).
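
Several of the checklist items above can be pre-checked locally before running trident validate. The following is a minimal sketch, not part of the Poseidon tooling: the regular expression is an assumed reading of the suggested title structure, the helper names are hypothetical, and it assumes the checksum fields in POSEIDON.yml hold MD5 digests.

```python
import hashlib
import re
from pathlib import Path

# Assumed pattern for <Year>_<Last name>_<Feature>; the checklist only gives examples.
TITLE_PATTERN = re.compile(r"\d{4}_[A-Za-z]+_[A-Za-z0-9]+")

# Binary PLINK files plus .janno and .bib, all named after the package title.
EXPECTED_EXTENSIONS = (".bed", ".bim", ".fam", ".janno", ".bib")


def title_looks_valid(title: str) -> bool:
    """Check a package title against the assumed naming pattern."""
    return TITLE_PATTERN.fullmatch(title) is not None


def missing_package_files(package_dir, title):
    """List expected file names (title + extension) that are not in package_dir."""
    package_dir = Path(package_dir)
    return [
        title + ext
        for ext in EXPECTED_EXTENSIONS
        if not (package_dir / (title + ext)).is_file()
    ]


def md5_of_file(path, chunk_size=1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The digests can then be compared by hand with the checksum entries in POSEIDON.yml. This does not replace trident validate --fullGeno, which remains the authoritative check.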

@nevrome changed the title from "added new package named 2024_Barquera_ChichenItza" to "Add package 2024_Barquera_ChichenItza" on Sep 6, 2024
@stschiff stschiff self-assigned this Sep 9, 2024
Member

stschiff commented Oct 9, 2024

Hi @RodrigoBarquera, this is great. Super that you even entered the relationship columns, which I know is a lot of work.

Sorry for taking so long to give feedback, but I have some points:

  • We actually would like the Collection_ID column to reflect the ID from the actual collection. I see that you've used the column Alternative_IDs for that. I suggest that you simply rename the Alternative_IDs column to Collection_ID and remove the empty Collection_ID column.
  • You have only given date information for the few samples that you've C14-dated. But I'm sure you can also give dates for all samples that have no C14 date, right? The Date_Type column has the value contextual for that, and it would be good to fill it. We generally aspire to have at least contextual dates for every single sample, to facilitate meta-analyses through space and time. Note that with contextual dates, you should only fill the columns Date_BC_AD_Start, Date_BC_AD_Median and Date_BC_AD_End, where the median can just be the mid-point of the interval.
  • I see that you've left the columns Endogenous, Nr_SNPs, Coverage_on_Target_SNPs, Damage, Contamination, Contamination_Err, Contamination_Meas and Contamination_Note empty. I'm sure this information is available in your paper, right? Do you need help with these? We have three student assistants now who can help with this. Let us know! I would be willing to leave these empty for now, but if it's just about needing help, let us help.
  • The Genetic_Source_Accession_IDs should be filled. They can all have the exact same Project Accession ID entry from the ENA.
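
The first two points could be applied to the .janno file with a small script. This is only a sketch under assumptions: the function name and the contextual_ranges mapping (Poseidon_ID to a start and end year) are hypothetical, the date columns are assumed to exist already in the header, and the actual date ranges have to come from the archaeological context of each sample.

```python
import csv
import io

EMPTY = {"", "n/a"}


def midpoint(start: int, end: int) -> int:
    """Mid-point of a date interval, used as the contextual median."""
    return (start + end) // 2


def apply_review_fixes(janno_text, contextual_ranges):
    """Move Alternative_IDs into Collection_ID and fill contextual dates."""
    reader = csv.DictReader(io.StringIO(janno_text), delimiter="\t")
    columns = list(reader.fieldnames or [])
    rows = list(reader)

    if "Alternative_IDs" in columns:
        # Refuse to overwrite a Collection_ID column that already has content.
        if any((row.get("Collection_ID") or "") not in EMPTY for row in rows):
            raise ValueError("Collection_ID is not empty")
        for row in rows:
            row["Collection_ID"] = row.pop("Alternative_IDs")
        if "Collection_ID" in columns:
            # Keep Collection_ID where it already sits in the header.
            columns.remove("Alternative_IDs")
        else:
            # Renamed in place; check the column order against the schema afterwards.
            columns[columns.index("Alternative_IDs")] = "Collection_ID"

    for row in rows:
        sample = row["Poseidon_ID"]
        # Only touch samples without a date type, so C14 dates stay untouched.
        if sample in contextual_ranges and (row.get("Date_Type") or "") in EMPTY:
            start, end = contextual_ranges[sample]
            row["Date_Type"] = "contextual"
            row["Date_BC_AD_Start"] = str(start)
            row["Date_BC_AD_Median"] = str(midpoint(start, end))
            row["Date_BC_AD_End"] = str(end)

    out = io.StringIO()
    # Default minimal quoting: fields are quoted only where strictly necessary.
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

The result should still be checked with trident validate, since the script does not know the schema's column order or allowed values.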

Again, let us know if you need help with this and we can ask someone from our team.


2 participants