See overview and documentation:
- To simplify file handling, combine all the forward reads into one file and all the reverse reads into another.
cat *_1.fastq.gz > forward_reads.fq.gz
cat *_2.fastq.gz > reverse_reads.fq.gz
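Before assembly, it can help to confirm that the two combined files contain the same number of reads. The following is a minimal sketch, assuming standard four-line FASTQ records; the function names `count_reads` and `check_pairs` are illustrative and not part of the original workflow.

```shell
# Count reads in a gzipped FASTQ file (assumes 4 lines per record).
# Gzip files joined with `cat` decompress as a single stream.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Print both counts and return non-zero if they differ.
check_pairs() {
    fwd=$(count_reads "$1")
    rev=$(count_reads "$2")
    echo "forward=$fwd reverse=$rev"
    [ "$fwd" = "$rev" ]
}
```

Usage: `check_pairs forward_reads.fq.gz reverse_reads.fq.gz`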
Run Trinity to predict transcripts and their potential proteins from the RNA-Seq reads:
Run Trinity for de novo transcriptome assembly:
./ forward_reads.fq.gz reverse_reads.fq.gz
Note: The transcripts FASTA file will be written to the trinity_run folder.
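For orientation, a direct Trinity call for this step might look like the sketch below. The helper names `run` and `assemble_transcripts`, the `DRY_RUN`, `THREADS`, and `MAX_MEM` variables, and the CPU and memory defaults are assumptions made for illustration; the original wrapper script may use different settings.

```shell
# Print the command when DRY_RUN=1; otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assemble paired-end reads de novo with Trinity.
# CPU and memory defaults are placeholders; match them to your allocation.
assemble_transcripts() {
    run Trinity --seqType fq \
        --left "$1" --right "$2" \
        --CPU "${THREADS:-16}" --max_memory "${MAX_MEM:-50G}" \
        --output trinity_run
}
```

Setting `DRY_RUN=1` before calling `assemble_transcripts forward_reads.fq.gz reverse_reads.fq.gz` prints the command without running it, which is convenient for checking the arguments before submitting a long job.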
Predict CDSs from the transcriptome:
./ trinity.fasta
Note: The protein sequences (trinity.fasta.transdecoder.pep) will be written to the working directory.
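The output name trinity.fasta.transdecoder.pep matches what TransDecoder writes for an input called trinity.fasta, so this step can be sketched as the usual two TransDecoder calls. This assumes TransDecoder is on the PATH and that trinity.fasta has been copied into the working directory; `predict_cds` is an illustrative name.

```shell
# Predict coding regions in a transcript FASTA with TransDecoder.
# Step 1 extracts long open reading frames; step 2 selects likely CDSs.
predict_cds() {
    TransDecoder.LongOrfs -t "$1" &&
    TransDecoder.Predict -t "$1"
}
```

Usage: `predict_cds trinity.fasta`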
Run MAKER in five steps:
Generate the CTL files:
module load GIF/maker
module rm perl/5.22.1
maker -CTL
This will generate 3 CTL files; you will need to edit them to configure the MAKER run. For the first round, change these lines in the maker_opts.ctl file:
genome=TAIR10_chr_all.fas
est=trinity.fasta
protein=trinity-swissprot-pep.fasta
est2genome=1
protein2genome=1
TMP=/dev/shm
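The first-round settings above can also be applied from the command line rather than in an editor. The helper below is a sketch, not part of MAKER: `set_ctl_opt` is a hypothetical function that rewrites a `key=value` line and keeps any trailing `#comment` that the generated file carries.

```shell
# Set KEY=VALUE in a MAKER control file, preserving a trailing "#comment".
# Matches only lines that start with "KEY=", so "est" does not touch "est2genome".
set_ctl_opt() {
    key=$1
    value=$2
    file=$3
    awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 {
            comment = ""
            pos = index($0, "#")
            if (pos > 0) comment = " " substr($0, pos)
            print k "=" v comment
            next
        }
        { print }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Usage: `set_ctl_opt genome TAIR10_chr_all.fas maker_opts.ctl`, repeated for each of the options listed above. Values containing backslashes would need extra care because awk interprets escape sequences in `-v` assignments.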
Execute MAKER from a Slurm job script. To run this efficiently, it is essential to request more than one node with multiple processors.
# Define a base name for the MAKER output folder as the first argument.
./ maker_case
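As a rough illustration of such a Slurm job, the sketch below generates a submission script for a given base name. The node count, task count, and time limit are placeholder values, `write_maker_job` is a hypothetical helper, and the `mpiexec` line assumes the MAKER installation was built with MPI support.

```shell
# Write a Slurm submission script for MAKER to standard output.
# Resource values are placeholders; adjust them to your cluster.
write_maker_job() {
    base=$1
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=$base
#SBATCH --nodes=2
#SBATCH --ntasks=32
#SBATCH --time=48:00:00

module load GIF/maker
module rm perl/5.22.1

# Assumes an MPI-enabled MAKER build; -base sets the output folder prefix.
mpiexec -n \$SLURM_NTASKS maker -base $base
EOF
}
```

Usage: `write_maker_job maker_case > maker_case.sub`, then `sbatch maker_case.sub` from the directory that holds the CTL files.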
Upon completion, train SNAP and AUGUSTUS:
# Use the same base name as the previous step for the first argument.
./ maker_case
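For the SNAP half of this training step, the commonly used sequence of MAKER and SNAP utilities can be sketched as follows. This is an assumption about what the training script does, not a copy of it; AUGUSTUS training is not covered here, and `train_snap` is an illustrative name. The thresholds of 1000 bp are the values typically shown in MAKER tutorials.

```shell
# Train SNAP from first-round MAKER gene models.
# Writes intermediate files and maker.snap.hmm to the current directory.
train_snap() {
    base=$1
    # Merge per-contig results into a single GFF3 (<base>.all.gff).
    gff3_merge -d "$base.maker.output/${base}_master_datastore_index.log" &&
    # Convert the gene models to ZFF (genome.ann, genome.dna).
    maker2zff "$base.all.gff" &&
    # Keep unique genes with 1000 bp of flanking sequence.
    fathom genome.ann genome.dna -categorize 1000 &&
    fathom uni.ann uni.dna -export 1000 -plus &&
    # Estimate parameters and assemble the HMM.
    forge export.ann export.dna &&
    hmm-assembler.pl "$base" . > maker.snap.hmm
}
```

Usage: `train_snap maker_case`, ideally inside a dedicated subdirectory, since `forge` writes many parameter files.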
Train GeneMark with the genome sequence: TAIR10_chr_all.fas
Once complete, modify the following lines in the CTL file:
snaphmm=maker.snap.hmm
gmhmm=gmhmm.mod
# Choose any species name, but it must not already exist in the augustus/config/species folder.
augustus_species=maker_20171103
Then run MAKER again:
# Use the same base name as the previous step for the first argument.
./ maker_case
Finalize predictions: maker_case
You will get the predicted gene models, protein sequences (maker_case.maker.proteins.fasta), and transcript sequences (maker_case.maker.transcripts.fasta) in the working directory.
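A quick way to confirm that the final step produced usable output is to count the FASTA records in each file. The sketch below assumes the output names given above; `count_fasta` and `report_outputs` are illustrative helper names.

```shell
# Count sequences in a FASTA file by counting header lines.
count_fasta() {
    grep -c '^>' "$1"
}

# Report record counts for the MAKER protein and transcript files.
report_outputs() {
    base=$1
    for kind in proteins transcripts; do
        file="$base.maker.$kind.fasta"
        if [ -s "$file" ]; then
            echo "$kind: $(count_fasta "$file")"
        else
            echo "$kind: missing or empty ($file)"
        fi
    done
}
```

Usage: `report_outputs maker_case`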