cziegenhain edited this page Jul 11, 2020 · 25 revisions

Run the example dataset

To get to know zUMIs, you can use our example dataset of 1 million reads generated with the SCRB-seq protocol:

wget https://github.com/sdparekh/zUMIs/raw/zUMIs-version1/ExampleData/barcoderead_HEK.1mio.fq.gz
wget https://github.com/sdparekh/zUMIs/raw/zUMIs-version1/ExampleData/cDNAread_HEK.1mio.fq.gz
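Before running the pipeline, a quick sanity check is to confirm that both downloads are complete and contain the same number of reads. The sketch below is a hypothetical helper, count_fastq_reads, which is not part of zUMIs; it assumes standard FASTQ records of exactly four lines each.

```shell
#!/usr/bin/env bash
# count_fastq_reads (hypothetical helper, not part of zUMIs): prints the number
# of reads in a gzipped FASTQ, assuming the standard 4 lines per record.
count_fastq_reads() {
  local lines
  [ -r "$1" ] || return 1
  lines=$(gzip -dc "$1" | wc -l)
  echo $(( lines / 4 ))
}
```

For example, count_fastq_reads cDNAread_HEK.1mio.fq.gz should report the same number as count_fastq_reads barcoderead_HEK.1mio.fq.gz, roughly 1 million for this dataset.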

If you do not have a STAR index yet, you can download a dummy index of chromosome 22, built with STAR 2.7.3a, from Google Drive:

https://drive.google.com/file/d/1PcaU3uaiaYYivOCLgn0VCmU0gMeyc9_M/view?usp=sharing

Extract the files after downloading:

tar -xvjf chr22_reference.tar.bz2

Now run zUMIs, passing a YAML config file that points to the example reads and the extracted reference:

bash <path-to-zUMIs>/zUMIs.sh -y <test>.yaml 
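The run above needs a YAML config that points to the downloaded reads and the chr22 reference. The sketch below writes one with a hypothetical helper, write_example_yaml. The key names are our recollection of the annotated zUMIs preset, and the read layout (cell barcode in bases 1-6 and UMI in bases 7-16 of the barcode read, 50 cDNA bases) is an assumption for this SCRB-seq example; check both against the preset for your zUMIs version before use. The names of the STAR index directory and GTF file inside the extracted reference are not given here, so they are passed in as arguments.

```shell
#!/usr/bin/env bash
# write_example_yaml (hypothetical helper, not part of zUMIs): writes a minimal
# config for the SCRB-seq example data. Key names and the read layout are
# assumptions; compare them with the annotated preset for your zUMIs version.
# Usage: write_example_yaml OUT_YAML DATA_DIR STAR_INDEX_DIR GTF_FILE OUT_DIR
write_example_yaml() {
  if [ "$#" -ne 5 ]; then
    echo "usage: write_example_yaml OUT_YAML DATA_DIR STAR_INDEX_DIR GTF_FILE OUT_DIR" >&2
    return 2
  fi
  local out_yaml=$1 data_dir star_index gtf_dir gtf out_dir
  # zUMIs wants absolute paths, so resolve every input to one first.
  data_dir=$(cd "$2" && pwd) || return 1
  star_index=$(cd "$3" && pwd) || return 1
  gtf_dir=$(cd "$(dirname "$4")" && pwd) || return 1
  gtf="$gtf_dir/$(basename "$4")"
  mkdir -p "$5" || return 1
  out_dir=$(cd "$5" && pwd) || return 1

  # Unquoted EOF: the ${...} variables below are expanded into the file.
  cat > "$out_yaml" <<EOF
project: HEK_example
sequence_files:
  file1:
    name: ${data_dir}/barcoderead_HEK.1mio.fq.gz
    base_definition:
      - BC(1-6)
      - UMI(7-16)
  file2:
    name: ${data_dir}/cDNAread_HEK.1mio.fq.gz
    base_definition:
      - cDNA(1-50)
reference:
  STAR_index: ${star_index}
  GTF_file: ${gtf}
  additional_STAR_params: ''
out_dir: ${out_dir}
num_threads: 4
barcodes:
  barcode_num: null
  barcode_file: null
  automatic: yes
counting_opts:
  twoPass: yes
which_Stage: Filtering
samtools_exec: samtools
pigz_exec: pigz
STAR_exec: STAR
Rscript_exec: Rscript
EOF
}
```

For example, write_example_yaml test.yaml . chr22_reference/<STAR index dir> chr22_reference/<GTF file> zUMIs_output writes test.yaml in the current directory, which can then be passed to zUMIs.sh with -y.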

Setup using the YAML config file

Starting with zUMIs 2.0, a run is configured with a single YAML config file, which simplifies starting zUMIs. Have a look at this annotated preset. If you are not familiar with YAML files or prefer a graphical user interface, we provide an R Shiny application for creating YAML files. Run it in your local RStudio...

shiny::runApp('zUMIs/zUMIs-config_shiny.R')

...or use the convenient online version of the Shiny app.

Note that you need to provide absolute paths to all files and directories. Relative paths and the use of ~ are discouraged.
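Because the YAML file is easy to edit by hand, it can help to scan it for values that look like paths but are not absolute before starting a run. The sketch below is a hypothetical helper, check_yaml_paths, not part of zUMIs. It is a heuristic on the raw text rather than a YAML parser: it flags values and list items that contain a / but do not start with one, such as ~/data or ./out.

```shell
#!/usr/bin/env bash
# check_yaml_paths (hypothetical helper, not part of zUMIs): a plain-text
# heuristic, not a YAML parser. It reports "key: value" lines and "- item"
# list entries whose value contains a "/" but does not start with one
# (e.g. ~/data, ./out, ../ref), and returns 1 if any are found.
check_yaml_paths() {
  local yaml=$1 bad
  [ -r "$yaml" ] || return 1
  bad=$(grep -nE '(:|^[[:space:]]*-)[[:space:]]+"?[^/"[:space:]][^[:space:]]*/' "$yaml" || true)
  if [ -n "$bad" ]; then
    printf 'Possible relative or ~ paths in %s:\n%s\n' "$yaml" "$bad" >&2
    return 1
  fi
}
```

For example, run check_yaml_paths myRun.yaml before calling zUMIs.sh; any line it prints should be rewritten with an absolute path.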

Once you have created your config file, start the run by calling the main zUMIs script:

zUMIs.sh -y <myRun.yaml>

All arguments accepted by this script are listed below. Note that the STAR, samtools, pigz and Rscript executables used to be passed on the command line, but are now defined in the YAML file.

  USAGE: zUMIs.sh [options]
	-h  Print the usage info.

### Required parameters ##

	-y  <YAML config file> : Path to the YAML config file. Required.

### Program paths ##
	-d  <zUMIs-dir>   	 : Directory containing zUMIs scripts.  Default: path to this script.

Config-file generation using Rshiny

For details on configuring the analysis with the Shiny app, see the explanations of the Mandatory parameters and Optional parameters.

Preparing STAR index for mapping

Please refer to the STAR manual!

It is not necessary to build the genome index with a specific overhang or splice-junction annotation: zUMIs passes the GTF file to STAR at mapping time, and STAR inserts the junctions on the fly. If your dataset contains spike-ins, you can either add them to the genome before indexing, or add them on the fly at mapping time by giving the path to the corresponding FASTA file as an additional reference sequence in the YAML config file.

Here is an example:

STAR --runMode genomeGenerate --runThreadN 16 --genomeDir mm10_STAR5idx_noGTF --limitGenomeGenerateRAM 111000000000 --genomeFastaFiles mm10.fa
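If you prefer to put spike-ins into the genome rather than adding them at mapping time, one way is to append their sequences to the genome FASTA before running genomeGenerate. The sketch below is a hypothetical helper, add_spikeins, not part of zUMIs, and the spike-in file name ERCC92.fa is likewise hypothetical. It stops if a sequence name occurs in both files, since duplicate names would make the combined reference ambiguous.

```shell
#!/usr/bin/env bash
# add_spikeins (hypothetical helper, not part of zUMIs): appends spike-in
# sequences to a genome FASTA and refuses to continue if any sequence name
# occurs more than once across the two files.
# Usage: add_spikeins GENOME_FA SPIKEIN_FA OUT_FA
add_spikeins() {
  local genome=$1 spikes=$2 out=$3 dups
  [ -r "$genome" ] && [ -r "$spikes" ] || return 1
  # A sequence name is the first word of a ">" header line.
  dups=$(grep -h '^>' "$genome" "$spikes" | awk '{ print $1 }' | sort | uniq -d)
  if [ -n "$dups" ]; then
    printf 'Duplicate sequence names:\n%s\n' "$dups" >&2
    return 1
  fi
  # "awk 1" prints every line with a trailing newline, so a file that lacks a
  # final newline cannot fuse its last line with the next file's first header.
  awk 1 "$genome" "$spikes" > "$out"
}

# The combined FASTA is then indexed as in the example above (file and
# directory names here are hypothetical):
#   add_spikeins mm10.fa ERCC92.fa mm10_ERCC.fa
#   STAR --runMode genomeGenerate --runThreadN 16 \
#        --genomeDir mm10_ERCC_STARidx --genomeFastaFiles mm10_ERCC.fa
```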

Customizing mapping parameters

By default, zUMIs maps reads with STAR using the parameters below. Mapping can be conventional (single-pass) or two-pass, as set in the YAML config file; --twopassMode Basic is used for two-pass mapping.

STAR --genomeDir <STARidx> --runThreadN <threads> --readFilesCommand samtools view --sjdbGTFfile <gtf> --outFileNamePrefix <sample>. --outSAMtype BAM Unsorted --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --quantMode TranscriptomeSAM --sjdbOverhang <readlength - 1> --twopassMode Basic --readFilesIn <reads>

Note that the read length is automatically detected by zUMIs.
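zUMIs derives the --sjdbOverhang value (read length minus 1) for you. To illustrate the arithmetic, or to compare it with your own cDNA reads, the sketch below takes the longest read among the first records of a gzipped FASTQ and subtracts 1. The helper name sjdb_overhang and the default sample of 10000 reads are illustrative assumptions; this is not how zUMIs implements the detection.

```shell
#!/usr/bin/env bash
# sjdb_overhang (hypothetical helper, not how zUMIs implements it): prints the
# longest read length among the first N records of a gzipped FASTQ minus 1,
# the value STAR's --sjdbOverhang is usually set to.
# Usage: sjdb_overhang READS.fq.gz [N]   (N defaults to 10000 reads)
sjdb_overhang() {
  local fq=$1 n=${2:-10000}
  [ -r "$fq" ] || return 1
  # Line 2 of every 4-line FASTQ record holds the sequence.
  gzip -dc "$fq" | head -n $(( 4 * n )) |
    awk 'NR % 4 == 2 && length($0) > max { max = length($0) }
         END { print max - 1 }'
}
```

Sampling only the first reads keeps the check fast on large files; if reads have been trimmed to varying lengths, the longest sampled read may be shorter than the longest read in the whole file.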

For optimal results, it may help to adjust the mapping parameters to the data and reference at hand. For example, data with many splice junctions (e.g. at sequencing depths above 500M reads) may exceed STAR's default limits on splice junctions. In that case, add the following line to your zUMIs config file:

additional_STAR_params: --limitOutSJcollapsed 2000000 --limitSjdbInsertNsj 2000000