Citation and References
When using grenepipe, please cite:
grenepipe: A flexible, scalable, and reproducible pipeline
to automate variant calling from sequence reads.
Lucas Czech and Moises Exposito-Alonso. Bioinformatics. 2022.
doi:10.1093/bioinformatics/btac600
Please also cite every tool that you selected to run in your analysis. Their references are listed below.
AdapterRemoval
AdapterRemoval: Easy cleaning of next-generation sequencing reads.
Lindgreen S.
BMC Res Notes. 2012.
doi:10.1186/1756-0500-5-337
AdapterRemoval v2: Rapid adapter trimming, identification, and read merging.
Schubert M, Lindgreen S, Orlando L.
BMC Res Notes. 2016.
doi:10.1186/s13104-016-1900-2
Cutadapt
Cutadapt removes adapter sequences from high-throughput sequencing reads.
Martin M.
EMBnet journal. 2011.
doi:10.14806/ej.17.1.200
fastp
fastp: an ultra-fast all-in-one FASTQ preprocessor.
Chen S, Zhou Y, Chen Y, Gu J.
Bioinformatics. 2018.
doi:10.1093/bioinformatics/bty560
SeqPrep
SeqPrep: Tool for stripping adaptors and/or merging paired reads with overlap into single reads.
John, JS.
https://github.com/jstjohn/SeqPrep
skewer
Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads.
Jiang H, Lei R, Ding S-W, Zhu S.
BMC Bioinformatics. 2014.
doi:10.1186/1471-2105-15-182
trimmomatic
Trimmomatic: A flexible trimmer for Illumina sequence data.
Bolger AM, Lohse M, Usadel B.
Bioinformatics. 2014.
doi:10.1093/bioinformatics/btu170
Bowtie 2
Fast gapped-read alignment with Bowtie 2.
Langmead B, Salzberg SL.
Nat Methods. 2012.
doi:10.1038/nmeth.1923
bwa mem and bwa aln
Fast and accurate short read alignment with Burrows-Wheeler transform.
Li H, Durbin R.
Bioinformatics. 2009.
doi:10.1093/bioinformatics/btp324
bwa mem2
Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems.
Vasimuddin M, Misra S, Li H, Aluru S.
2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 2019.
doi:10.1109/IPDPS.2019.00041
BamUtil clipOverlap
An efficient and scalable analysis framework for variant extraction and refinement from population scale DNA sequence data.
Jun G, Wing MK, Abecasis GR, Kang HM.
Genome Research, 25(6), gr.176552.114. 2015.
doi:10.1101/GR.176552.114
Picard MarkDuplicates
Picard toolkit.
Broad Institute; 2018.
GitHub repository, online: http://broadinstitute.github.io/picard/
DeDup
EAGER: efficient ancient genome reconstruction.
Peltzer A, Jäger G, Herbig A, Seitz A, Kniep C, Krause J, et al.
Genome Biol. 2016.
doi:10.1186/s13059-016-0918-z
GATK BaseRecalibrator
The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al.
Genome Res. 2010.
doi:10.1101/GR.107524.110
samtools merge and samtools mpileup
The Sequence Alignment/Map format and SAMtools.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al.
Bioinformatics. 2009.
doi:10.1093/bioinformatics/btp352
mapDamage
mapDamage: testing for damage patterns in ancient DNA sequences.
Ginolhac A, Rasmussen M, Gilbert MTP, Willerslev E, Orlando L.
Bioinformatics. 2011.
doi:10.1093/bioinformatics/btr347
mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters.
Jónsson H, Ginolhac A, Schubert M, Johnson PLF, Orlando L.
Bioinformatics. 2013.
doi:10.1093/bioinformatics/btt193
DamageProfiler
DamageProfiler: Fast damage pattern calculation for ancient DNA.
Neukamm J, Peltzer A, Nieselt K.
bioRxiv. 2020.
doi:10.1101/2020.10.01.322206
bcftools call
A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.
Li H.
Bioinformatics. 2011.
doi:10.1093/bioinformatics/btr509
freebayes
Haplotype-based variant detection from short-read sequencing.
Garrison E, Marth G.
arXiv. 2012.
arxiv:1207.3907
BEDOPS: high-performance genomic feature operations.
Neph S, Kuehn MS, Reynolds AP, Haugen E, Thurman RE, Johnson AK, et al.
Bioinformatics. 2012;28.
doi:10.1093/bioinformatics/bts277
GATK HaplotypeCaller, GATK SelectVariants, GATK VariantFiltration, GATK VariantRecalibrator
The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al.
Genome Res. 2010.
doi:10.1101/GR.107524.110
HAF-pipe
Accurate Allele Frequencies from Ultra-low Coverage Pool-Seq Samples in Evolve-and-Resequence Experiments.
Tilk S, Bergland A, Goodman A, Schmidt P, Petrov D, Greenblum S.
G3: Genes|Genomes|Genetics. 2019.
doi:10.1534/g3.119.400755
Maximum Likelihood Estimation of Frequencies of Known Haplotypes from Pooled Sequence Data.
Kessner D, Turner T, Novembre J.
Molecular Biology and Evolution. 2013.
doi:10.1093/molbev/mst016
FastQC
FastQC: a quality control tool for high throughput sequence data.
Andrews S.
Babraham Bioinformatics, Babraham Institute, Cambridge, United Kingdom; 2010.
Online: https://www.bioinformatics.babraham.ac.uk/projects/fastqc
samtools stats and samtools flagstat
The Sequence Alignment/Map format and SAMtools.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al.
Bioinformatics. 2009.
doi:10.1093/bioinformatics/btp352
QualiMap
Qualimap 2: Advanced multi-sample quality control for high-throughput sequencing data.
Okonechnikov K, Conesa A, García-Alcalde F.
Bioinformatics. 2016.
doi:10.1093/bioinformatics/btv566
Picard CollectMultipleMetrics
Picard toolkit.
Broad Institute; 2018.
GitHub repository, online: http://broadinstitute.github.io/picard/
snpEff
A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.
Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al.
Fly. 2012.
doi:10.4161/fly.19695
VEP
The Ensembl Variant Effect Predictor.
McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, Flicek P, Cunningham F.
Genome Biology. 2016.
doi:10.1186/s13059-016-0974-4
SeqKit
SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation.
Shen W, Le S, Li Y, Hu F.
PLOS ONE 11(10), e0163962. 2016.
doi:10.1371/journal.pone.0163962
bcftools stats
A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.
Li H.
Bioinformatics. 2011.
doi:10.1093/bioinformatics/btr509
MultiQC
MultiQC: Summarize analysis results for multiple tools and samples in a single report.
Ewels P, Magnusson M, Lundin S, Käller M.
Bioinformatics. 2016.
doi:10.1093/bioinformatics/btw354
Snakemake
Snakemake--a scalable bioinformatics workflow engine.
Köster J, Rahmann S.
Bioinformatics. 2012.
doi:10.1093/bioinformatics/bts480
Sustainable data analysis with Snakemake.
Mölder F, Jablonski KP, Letcher B, Hall MB, Tomkins-Tinch CH, Sochat V, Forster J, Lee S, Twardziok SO, Kanitz A, Wilm A, Holtgrewe M, Rahmann S, Nahnsen S, Köster J.
F1000Res 10, 33. 2021.
doi:10.12688/f1000research.29032.2
Bioconda
Bioconda: A sustainable and comprehensive software distribution for the life sciences.
Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, et al.
Nat Methods. 2018.
doi:10.1038/s41592-018-0046-7
FASTQ file format
The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants.
Cock PJA, Fields CJ, Goto N, Heuer ML, Rice PM.
Nucleic Acids Res. 2009.
doi:10.1093/nar/gkp1137
FASTA file format
Improved tools for biological sequence comparison.
Pearson WR, Lipman DJ.
Proceedings of the National Academy of Sciences. 1988.
doi:10.1073/pnas.85.8.2444
SAM/BAM file format
The Sequence Alignment/Map format and SAMtools.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup.
Bioinformatics. 2009.
doi:10.1093/bioinformatics/btp352
VCF file format
The variant call format and VCFtools.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al.
Bioinformatics. 2011.
doi:10.1093/bioinformatics/btr330
GrENE-net
Genomics of rapid Evolution in Novel Environments network (GrENE-net).
Online: https://grenenet.org/