Hi-C data Analysis and Processing PIpeline (HAPPY). This software outputs contact matrices at several bin sizes (resolutions): 1 kb, 5 kb, 10 kb, 50 kb and 100 kb.
- conda
- snakemake
- BWA-MEM
- pairtools
- cooler
- SAMtools
conda and snakemake need to be installed manually; the rest will be installed automatically during the first launch of the program.
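Before the first launch it can help to confirm that the two manually installed tools are on the PATH. The following is a minimal sketch and not part of HAPPY itself; the helper name `missing_tools` is hypothetical.

```shell
# Hypothetical helper (not part of HAPPY): print every tool named in the
# arguments that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name.
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
    return 0
}

# conda and snakemake are the two prerequisites that need manual installation.
missing=$(missing_tools conda snakemake)
if [ -n "$missing" ]; then
    echo "Please install first:" $missing
else
    echo "conda and snakemake found"
fi
```

The remaining tools in the list are not checked here because the pipeline is expected to install them itself on first launch.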
The output pairs file can be easily filtered based on user-provided conditions.
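As an illustration of what such a condition might look like, here is a small sketch that uses plain awk instead of the pipeline's own filtering mechanism (pairtools also offers a `select` subcommand for this purpose). It assumes the standard .pairs column order (readID, chrom1, pos1, chrom2, pos2, strand1, strand2, pair_type); the helper name `filter_pairs`, the distance threshold and the toy records are all made up for the example.

```shell
# Hypothetical helper: read a .pairs stream on stdin and keep header lines
# plus uniquely mapped (UU) cis pairs whose ends are at least min_dist bp apart.
filter_pairs() {
    min_dist=$1
    awk -F'\t' -v min_dist="$min_dist" '
        /^#/ { print; next }                 # pass header lines through
        $8 == "UU" && $2 == $4 {             # both ends unique, same chromosome
            d = $5 - $3
            if (d < 0) d = -d                # absolute genomic distance
            if (d >= min_dist) print
        }
    '
}

# Toy input (made-up records) to show the filter in action.
{
    printf '## pairs format v1.0\n'
    printf 'r1\tchr1\t100\tchr1\t5000\t+\t-\tUU\n'   # kept: cis, 4900 bp apart
    printf 'r2\tchr1\t100\tchr1\t600\t+\t-\tUU\n'    # dropped: only 500 bp apart
    printf 'r3\tchr1\t100\tchr2\t9000\t+\t-\tUU\n'   # dropped: trans pair
    printf 'r4\tchr1\t100\tchr1\t9000\t+\t-\tUR\n'   # dropped: not UU
} | filter_pairs 1000
```

Only the header line and record `r1` pass the filter in this toy input.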
git clone [email protected]:dawnmy/HAPPY.git
- Adapt the config file for the pipeline: modify the config/config.yaml file in the program folder to match your data location.
- Download the reference genome and create the BWA index and FASTA index (.fai). For instance, GRCh38 for Homo sapiens:
wget https://www.encodeproject.org/files/GRCh38_no_alt_analysis_set_GCA_000001405.15/@@download/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz -O GRCh38.fasta.gz
pigz -d GRCh38.fasta.gz
bwa index GRCh38.fasta
samtools faidx GRCh38.fasta
- Make the chromosome sizes file based on the .fai index:
cut -f1,2 GRCh38.fasta.fai > chrom.all.sizes
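The chromosome sizes file also shows how large each contact matrix will be: a chromosome of length L contributes ceil(L / bin size) bins at a given resolution. The sketch below works this out for a two-column sizes file. It assumes that the resolutions are given in base pairs (for example 1000 for 1k); the helper name `bins_per_chrom` and the toy contig are hypothetical.

```shell
# Hypothetical helper: for a two-column chromosome sizes file, print
# chromosome, bin size and number of bins, i.e. ceil(length / bin size).
bins_per_chrom() {
    sizes_file=$1
    shift
    for res in "$@"; do
        awk -v res="$res" 'BEGIN { OFS = "\t" }
            { print $1, res, int(($2 + res - 1) / res) }' "$sizes_file"
    done
}

# Toy sizes file with a made-up contig so the sketch runs without GRCh38.
toy_sizes=$(mktemp)
printf 'toy1\t2500\n' > "$toy_sizes"
bins_per_chrom "$toy_sizes" 1000 5000
rm -f "$toy_sizes"
```

For the 2,500 bp toy contig this reports 3 bins at 1,000 bp and 1 bin at 5,000 bp; running it as `bins_per_chrom chrom.all.sizes 1000 5000 10000 50000 100000` would give the same summary for the real genome.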
- Launch the pipeline with 20 threads:
snakemake -s runHiC.smk --use-conda -j 20
If you use the SGE submission system:
snakemake -s runHiC.smk --use-conda --cluster "qsub -cwd -pe multislot {threads} -i /dev/null -v PATH" -j 2