One line scripts for bioinformatics

Extracting fasta records

First index the fasta file

samtools faidx mysequences.fasta

Then retrieve sequences by ID

samtools faidx mysequences.fasta id1 id2 id3

If you have a file with lots of ID's or you have some other command (eg awk) that produces IDs with one on each line you can use xargs to pass these IDs to samtools faidx. For example if all the IDs are in a file called ids.txt;

cat ids.txt | xargs samtools faidx mysequences.fasta

Reformatting fasta records

A great tool for this is bioawk .

For example to add a fasta compatible prefix like this

>comp12345_c0_seq1
to
>lcl|comp12345_c0_seq1

can be done with the following bioawk command

bioawk -c fastx '{printf ">lcl|%s\n%s\n", $name, $seq}' original.fasta > reformatted.fasta

Counting the number of sequences in a fasta file

cat mysequences.fasta | grep -c ">"

Untaring multiple files in a folder

for file in *.gz; do tar -xvfz $file;done

Annotating entries in a fasta file ( not quite one line)

Assuming we have an indexed blast database built as follows

makeblastdb -in uniprot_sprot.fasta -input_type 'fasta' -dbtype 'prot' -parse_seqids

Find homologs with blastp

blastp -query contigs.fasta -db uniprot_sprot.fasta -outfmt '6 qseqid sacc evalue stitle' -max_target_seqs 1 > contigs.blastp

Join blastp results to the fasta file. This just prints in tabular format

join <(bioawk -c fastx '{print $name,$seq}' contigs.fasta) <(awk -F '\t' '{print $1,$2,$3,$4}' contigs.blastp)

And finally reformat output to fasta with annotations

join <(bioawk -c fastx '{print $name,$seq}' contigs.fasta) <(awk -F '\t' '{print $1,$2,$3,$4}' contigs.blastp) | awk -F '\t' {printf(">%s %s\n%s\n",$1,$4,$2)}'

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

One line scripts for bioinformatics

Extracting fasta records

Reformatting fasta records

Counting the number of sequences in a fasta file

Untaring multiple files in a folder

Annotating entries in a fasta file ( not quite one line)

About

Releases

Packages

License

tauqeer9/bioinformatics-one-liners-1

Folders and files

Latest commit

History

Repository files navigation

One line scripts for bioinformatics

Extracting fasta records

Reformatting fasta records

Counting the number of sequences in a fasta file

Untaring multiple files in a folder

Annotating entries in a fasta file ( not quite one line)

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Packages