See overview and documentation:
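- To simplify file handling, combine all the forward reads into one file and all the reverse reads into another. Redirecting with > rather than >> avoids appending duplicate reads if the commands are run a second time.
cat *_1.fastq.gz > forward_reads.fq.gz
cat *_2.fastq.gz > reverse_reads.fq.gz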
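Concatenating gzip files with cat produces a valid multi-member gzip stream, so the combined files can be read directly. As a quick sanity check before assembly, the sketch below counts the FASTQ records in each combined file and confirms that the forward and reverse totals agree. The function names count_reads and check_pairs are illustrative (not part of this pipeline), and the count assumes standard four-line FASTQ records.

```shell
#!/usr/bin/env bash
# Count FASTQ records in a gzipped file (assumes 4 lines per record).
count_reads() {
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Compare forward and reverse read counts; returns non-zero on a mismatch.
check_pairs() {
    local fwd rev
    fwd=$(count_reads "$1")
    rev=$(count_reads "$2")
    echo "forward=${fwd} reverse=${rev}"
    [ "$fwd" -eq "$rev" ]
}

# Example: check_pairs forward_reads.fq.gz reverse_reads.fq.gz
```

A mismatch usually means a sample is missing one of its two files, which is worth resolving before starting a long assembly.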
- Run Trinity to predict transcripts and their potential proteins from the RNA-Seq reads:
- Run Trinity for de novo transcriptome assembly:
./01_runTrinity.sh forward_reads.fq.gz reverse_reads.fq.gz
Note: The assembled transcripts FASTA file is written to the trinity_run folder.
- Predict CDSs from the transcriptome:
./02_runTransDecoder.sh trinity.fasta
Note: The predicted protein sequences (trinity.fasta.transdecoder.pep) are written to the working directory.
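Before handing these files to MAKER, it can help to confirm that both are non-empty and to see how many records each holds. The following is a minimal sketch; count_fasta and report_transdecoder are hypothetical helper names, and the counts simply reflect the number of FASTA header lines. The two numbers need not match, since TransDecoder may report several ORFs for one transcript and none for another.

```shell
# Count sequences in a FASTA file by counting header lines.
count_fasta() {
    grep -c '^>' "$1" || true   # grep exits non-zero when the count is 0
}

# Print transcript and peptide counts side by side.
report_transdecoder() {
    local tx pep
    tx=$(count_fasta "$1")
    pep=$(count_fasta "$2")
    echo "transcripts=${tx} peptides=${pep}"
}

# Example: report_transdecoder trinity.fasta trinity.fasta.transdecoder.pep
```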
- Run MAKER in five steps:
- Generate the CTL files:
module load GIF/maker
module rm perl/5.22.1
maker -CTL
This will generate 3 CTL files (maker_opts.ctl, maker_bopts.ctl and maker_exe.ctl); you will need to edit them to make changes to the MAKER run. For the first round, change these lines in the maker_opts.ctl file:
genome=TAIR10_chr_all.fas
est=trinity.fasta
protein=trinity-swissprot-pep.fasta
est2genome=1
protein2genome=1
TMP=/dev/shm
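These edits can be made in a text editor, but scripting them makes the run easier to repeat. The sketch below is one possible approach, not part of the pipeline scripts: set_ctl_opt is a hypothetical helper that rewrites a single key in a control file with sed, assuming the "key=value #comment" layout that maker -CTL writes. It keeps a .bak copy of the file and will misbehave if the value contains the characters |, & or \.

```shell
# Set key=value in a MAKER control file, keeping the trailing "#comment".
# Assumes each option line looks like: key=value #comment
set_ctl_opt() {
    local key value file
    key=$1
    value=$2
    file=$3
    # Replace everything between "key=" and the comment marker.
    sed -i.bak -E "s|^(${key})=[^#]*|\1=${value} |" "$file"
}

# Example, first-round settings:
#   set_ctl_opt genome TAIR10_chr_all.fas maker_opts.ctl
#   set_ctl_opt est trinity.fasta maker_opts.ctl
#   set_ctl_opt protein trinity-swissprot-pep.fasta maker_opts.ctl
#   set_ctl_opt est2genome 1 maker_opts.ctl
#   set_ctl_opt protein2genome 1 maker_opts.ctl
#   set_ctl_opt TMP /dev/shm maker_opts.ctl
```

Because the pattern requires "=" immediately after the key, setting est does not touch the est2genome line.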
- Execute MAKER (03_maker_start.sh) from a Slurm job script. To run this efficiently, it is essential to request more than one node with multiple processors.
# Define a base name for the MAKER output folder as the first argument.
./03_maker_start.sh maker_case
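For readers unfamiliar with Slurm, the following sketch writes a minimal job script that wraps the command above. The file name maker_round1.sub and the node, task and time values are placeholders chosen for illustration, not settings taken from this pipeline; adjust them to your cluster and to however 03_maker_start.sh launches MAKER.

```shell
# Write a minimal Slurm job script that wraps the MAKER launcher.
# Node, task, and time values are placeholders, not recommendations.
cat > maker_round1.sub <<'EOF'
#!/bin/bash
#SBATCH --job-name=maker_case
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=48:00:00
#SBATCH --output=maker_case.%j.log

./03_maker_start.sh maker_case
EOF

# Submit with: sbatch maker_round1.sub
```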
- Upon completion, train SNAP and AUGUSTUS:
# Use the same base name as in the previous step for the first argument.
./04_maker_process.sh maker_case
- Train GeneMark with the genome sequence:
./05_runGeneMark.sh TAIR10_chr_all.fas
- Once complete, modify the following lines in the maker_opts.ctl file:
snaphmm=maker.snap.hmm
gmhmm=gmhmm.mod
# Choose any species name, provided it does not already exist in the augustus/config/species folder.
augustus_species=maker_20171103
Then, run 03_maker_start.sh again:
# Use the same base name as in the previous step for the first argument.
./03_maker_start.sh maker_case
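The requirement that the species name be unused can be checked before editing the control file. This is a small sketch under the assumption that the AUGUSTUS_CONFIG_PATH environment variable points at the augustus/config directory; species_name_free is a hypothetical helper name.

```shell
# Return 0 if no AUGUSTUS species folder with this name exists yet,
# 1 if the name is taken, 2 if AUGUSTUS_CONFIG_PATH is not set.
species_name_free() {
    if [ -z "${AUGUSTUS_CONFIG_PATH:-}" ]; then
        echo "AUGUSTUS_CONFIG_PATH is not set" >&2
        return 2
    fi
    [ ! -e "${AUGUSTUS_CONFIG_PATH}/species/$1" ]
}

# Example:
#   species_name_free maker_20171103 && echo "name is available"
```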
- Finalize predictions:
./06_maker_finalize.sh maker_case
You will get the predicted gene models (maker_case.gff), protein sequences (maker_case.maker.proteins.fasta) and transcript sequences (maker_case.maker.transcripts.fasta) in the working directory.
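A quick way to inspect the result is to count how many features of each type the GFF file contains. The sketch below is illustrative only; count_gff_features is a hypothetical helper that matches the feature type in the third tab-separated column and skips comment lines.

```shell
# Count features of a given type (column 3) in a GFF3 file, skipping comments.
count_gff_features() {
    awk -F '\t' -v type="$2" '!/^#/ && $3 == type { n++ } END { print n + 0 }' "$1"
}

# Example:
#   count_gff_features maker_case.gff gene
#   count_gff_features maker_case.gff mRNA
```

Comparing the gene count with the number of records in the protein FASTA file gives a rough consistency check on the finalized output.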