For the installation of this pipeline, any Conda installation compatible with Python is required.
The pipeline itself runs on Python 3.8.5 and R 3.6.3. For the exact dependencies, see environment.yaml and r-dependencies.R.
For easy installation you need (Mini)Conda.
Install Miniconda from the folder where you want it to be installed:
cd </path/to/files/dir/>
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
Follow the instructions of the installation process, provide the location where you want Miniconda to be installed, and answer YES to add Miniconda to your PATH.
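After installation you can verify that Conda is found on your PATH by opening a new shell, or by sourcing your shell profile; ~/.bashrc is assumed here and may differ on your system:
source ~/.bashrc
conda --version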
Go to the directory where the analysis needs to be performed and clone this repository:
cd </path/to/analysis/dir>
git clone https://github.com/tgac-vumc/QDNAseq.snakemake/
cd QDNAseq.snakemake
Install Mamba as a drop-in replacement for Conda, to benefit from its improved installation performance:
conda install -c conda-forge mamba
Create the environment using Mamba:
mamba env create --name QDNAseq-snakemake --file environment.yaml
Activate the environment:
conda activate QDNAseq-snakemake
Then run the R script r-dependencies.R in the terminal to install the non-Conda R dependencies into the environment:
Rscript r-dependencies.R
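To verify that the R dependencies installed correctly, you can try loading a core package from within the activated environment; QDNAseq is used here as an example and is assumed to be among the installed packages:
Rscript -e 'library(QDNAseq); sessionInfo()'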
Under development
Go to the analysis directory and prepare the analysis by copying or creating links to the fastq.gz files:
cd </path/to/analysis/dir>
mkdir fastq
cd fastq
To link a single file:
ln -s <path/to/file>
To link all files from a folder:
for file in <path/to/fastq/files>/*.fastq.gz
do ln -s $file
done
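To check that the links were created correctly, list the contents of the fastq directory:
ls -l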
Open the configuration file config.yaml to check the settings that Snakemake will use, and change them according to your needs.
For providing service analysis, set setting to 'service'. For research purposes, set setting to 'research'. To run all settings, set setting to 'all'.
One of the options in the config file is dewaving; if set to 'true', QDNAseq objects will be dewaved before segmentation.
These options change the rules performed in the pipeline; see the rule graph in the next section.
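A minimal sketch of how these entries might look in config.yaml (the exact key names and quoting should be checked against the file shipped with the repository; the values below are only illustrative):
setting: 'research'
dewaving: 'true'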
Make sure that Snakemake is able to find the Snakefile by performing a dry run:
cd ../QDNAseq.snakemake
snakemake -n
Check the rules that are planned to be performed against the rule graph.
A visualization of the order in which rules will be performed can be generated by running the following command and opening the resulting DAG file:
snakemake --forceall --rulegraph | dot -Tsvg > DAG.svg
Rule graphs for the initial settings 'service', 'research' and 'all' are committed to this repository in the files DAG_<setting>.svg.
When ready, run the analysis:
snakemake
Useful Snakemake options (an example combining some of them follows this list):
-j, --cores, --jobs: Use at most N cores in parallel (default: 1). If N is omitted, the limit is set to the number of available cores.
-n, --dryrun: Do not execute anything, but show the rules that are planned to be performed.
-k, --keep-going: Go on with independent jobs if a job fails.
-f, --force: Force the execution of the selected target or the first rule regardless of already created output.
-R, --forcerun: Force the re-execution or creation of the given rules or files. Use this option if you changed a rule and want all its output in your workflow updated.
-U, --until: Runs the pipeline until it reaches the specified rules or files. Only runs jobs that are dependencies of the specified rules or files; does not run sibling DAGs.
For all options, see https://snakemake.readthedocs.io/en/v5.31.1/executing/cli.html#all-options