Skip to content
Hande Topa edited this page Jan 28, 2016 · 10 revisions

Welcome to the diffsplicing wiki!

Here you can find a brief manual for using our R and Matlab codes in order to reproduce the results in our paper titled "Analysis of differential splicing suggests different modes of short-term splicing regulation".

Creating bowtie reference index files:

bowtie-build -f --ntoa ref_transcriptome.fa ref_transcriptome

Aligning the reads in fastq files against index:

bowtie -q -v 3 --trim3 0 --trim5 0 --all -m 100 --threads 4 --sam ref_transcriptome *.fastq *.sam

Convert sam files into bam files with samtools (optional):

samtools view -hSb -o *.bam *.sam

Expression level estimation by BitSeq:

parseAlignment *.bam --format=BAM -o *.prob --trSeqFile ref_transcriptome.fa --trInfoFile *.tr --verbose

estimateExpression *.prob -o * -t *.tr --outType RPKM -p parameters1.txt -P 2

Above process produces the *.rpkm file which contains 500 MCMC samples of the expression level estimates for each transcript included in the reference transcriptome file.

Computing overall gene expression levels:

indexFile="tr_indices"

mcmcFileName="t1_r1.rpkm"

noSkip=7

start_line=1

end_line=15530

get_genelevels(indexFile,mcmcFileName,noSkip,start_line,end_line,sep="\t")

Output: t1_r1.rpkm_gene

Computing scaling factors:

mcmc_filenames_in=paste("t",rep(1:10),"_r1.rpkm_gene",sep="")

noSkip=0

noLines=3811

medianNorm(mcmc_filenames_in,noLines,noSkip)

Output: scaling_factors

Scaling the overall gene expression levels, relative and absolute transcript expression levels:

indexFile="tr_indices"

mcmcFileName="t1_r1.rpkm"

noSkip=0

start_line=1

end_line=15530

scaling_ind=1 # change according to the mcmcFileName

getgene_trratios(indexFile,mcmcFileName,noSkip,start_line,end_line,scaling_ind,'scaling_factors',"\t")

Outputs: t1_r1.rpkm_gene_scaled, t1_r1.rpkm_reltr_scaled, t1_r1.rpkm_abstr_scaled

Computing BitSeq means and variances:

mcmc_filenames_in=list("t1_r1.rpkm_gene_scaled") # apply the same also for *_reltr_scaled and *_abstr_scaled

noLines=3811 # noLines=15530

noSkip=0

getMeanTecVar(mcmc_filenames_in,noLines,noSkip)

Outputs: t1_r1.rpkm_gene_scaled_MeanTecVar, t1_r1.rpkm_reltr_scaled_MeanTecVar, t1_r1.rpkm_abstr_scaled_MeanTecVar.