aDNA-analysis

About this repository

This repository has been created to complement my final-year thesis from my BSc in Genetics at the Universitat Autònoma de Barcelona (UAB). Here are some bash scripts and pipelines for the analysis of aDNA aimed at people who are beginners or who are not used to the use of these bioinformatics tools. It has been created taking into account the problems that I have faced myself, so I comment on them to help the community.

Before starting the analysis

When performing an analysis of any kind with aDNA, it is necessary to take into account some particularities:

The samples will always have very low coverage. This implies that for most procedures we will need to impute our variants with the reference genome
The reliability of the results will depend on the quality of the imputation. Obtaining a very large coverage from very few variants has the consequence that a proportion of the added variants will not correspond to those that our sample would hypothetically have.
For some analyzes, the damage that the DNA has suffered overtime must be taken into account

Sample preparation: aDNAprep.sh

Required packages

samtools (for indexing: .bai)
freebayes (for the calling variant: .vcf)
beagle (for the imputed: .vcf.gz <- compressed file. If the provided script is used, it will be designated like .imputed.vcf.gz)

All these applications can be installed through the Linux terminal quickly:

apt-get install samtools
apt-get install freebayes
apt-get install beagle

To keep in mind when using the aDNAprep.sh script

A bash script is provided with a basic pipeline to get, from .bam files, .vcf files with the unimputed variants and with the imputed variants. To consider:

It is recommended to modify the script by defining the variable $ id (line 5) with the user's preferences. In my case, I wanted to keep the first 5 letters of the original file name.
It is important to bear in mind that the nomenclature of the chromosomes in the .bam must coincide with that of the reference genome (.fa) that is used. To know the nomenclature, you can show the header of the indexed bam. You can download the reference .fa from the Google cloud repository: https://console.cloud.google.com/storage/browser/genomics-public-data/references;tab=objects?prefix=&forceOnObjectsSortingFiltering=false

head *.bai

At the time of making the variant calling, by my preference and to illustrate the example, it has only been done on chromosome 1 (line 11). Remove the -r command if you want the variant calling of the entire genome.
The imputation with Beagle requires a genetic map and a reference vcf of the chromosomes to be analyzed. They can be obtained from the Beagle resources: https://faculty.washington.edu/browning/beagle/beagle.html
The script has been designed keeping in mind that it is located in the same place as the .bam files. It is highly recomended that all the resource files are in a different directory (in my cas, a directory called source inside the the directory where the script is). Also, all files are generated in the same folder. However, it is possible to change the source of the inputs and the destination of the outputs by adding the Linux path of the directory where it is located. To run the script, it is simply necessary to either be in the same directory where the script is saved, have the applications it uses installed and write:

.\aDNAprep.sh

If you want to modify the script, after modifying the script and before executing it, it is necessary to change the notation to Linux. It can be done as follows:

awk '{sub ("\ r $", ""); print} 'aDNAprep.sh > DNAprep2.sh
mv aDNAprep2.sh aDNAprep.sh
.\aDNAprep.sh

Steps to follow: script breakdown

1. Indexing the .bam file

samtools index your.bam

You get a .bai file

2. Variant Calling

freebayes -f your_ref_genome.fa -r chromosome your.bam > output_name_you_want.vcf

For more variant calling options it is recommended to consult the freebayes manual (--help). In the proposed script aDNAprep.sh only chromosome 1 is analyzed. It is important that when referencing the chromosome, it is compulsory it has the same annotation as the reference .fa which, in turn, must have the same annotation as the .bam. Check that as above.

3. Imputation

beagle gp=true impute=true gt=your.vcf ref=chromosome_ref_from_beagle.vcf.gz map=your_chr_geneticmap_from_beagle.map out=name_and_path_you_want

For more options, consult the Beagle manual. All the files have been obtained from the resources on the Beagle page itself. It is also possible to use minimac4, which is usually recommended for aDNA, for samples in which the coverage is not very low. Otherwise, the restrictive and quality criteria will be impossible to assume. With Beagle, it is possible to impute whatever the number of variants that are going to be added and, in short, even though the quality of the imputation is not good in some windows.

4. Unzip the .vcf (optional)

gunzip *.vcf.gz

You can leave it as a .gz if your further analysis dont need you to unzip it.

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
figuresANDplots		figuresANDplots
Plotting-aDNA-ROH.md		Plotting-aDNA-ROH.md
README.md		README.md
READMEspanish.md		READMEspanish.md
aDNAprep.sh		aDNAprep.sh
getRohData.py		getRohData.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

aDNA-analysis

About this repository

Before starting the analysis

Sample preparation: aDNAprep.sh

Required packages

To keep in mind when using the aDNAprep.sh script

Steps to follow: script breakdown

1. Indexing the .bam file

2. Variant Calling

3. Imputation

4. Unzip the .vcf (optional)

About

Releases

Packages

Languages

albanieto/aDNA-analysis

Folders and files

Latest commit

History

Repository files navigation

aDNA-analysis

About this repository

Before starting the analysis

Sample preparation: aDNAprep.sh

Required packages

To keep in mind when using the aDNAprep.sh script

Steps to follow: script breakdown

1. Indexing the .bam file

2. Variant Calling

3. Imputation

4. Unzip the .vcf (optional)

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages