NJC12/missing_link_association_function

This code is associated with the following manuscript. If you use any part of the source code, please cite us:

"Noah Connally, Sumaiya Nazeen, Daniel Lee, Huwenbo Shi, John Stamatoyannopoulos, Sung Chun, Chris Cotsapas, Christopher Cassa, and Shamil Sunyaev. The missing link between genetic association and regulatory function. medRxiv (2021). https://www.medrxiv.org/content/10.1101/2021.06.08.21258515v2"

Requirements

Python >= 3.5.3
Python packages:
	numpy 1.18.5
	scipy 1.4.1
R 3.4.3
R packages: 
	susieR 0.9.0
	matrixcalc 1.0-3
	Matrix 1.2-12
	reticulate 1.16 
bedtools v2.26.0

This code has been run with GCC 6.3.0.

Directory Structure

Data: contains auxiliary data files needed to run the scripts.
Code/
	Fine-mapping: contains the R script for fine-mapping GWAS variants using the SuSiE algorithm.
	Chromatin-Analysis: contains the scripts for computing activity-by-distance (ABD) scores for gene-feature pairs per tissue type.

Instructions

i) Fine-mapping:
	```
	Rscript susie-finemapping.R <z_file> <ld_matrix_npz_file> <N> <out_file>
	```
ii) Computing ABD scores:
	a. Find candidate peaks per chromatin mark per tissue type:
	   ```
	   python3 find_candidate_peaks.py input.narrowPeak causativeGenes.bed step1.candidatePeak
	   ```
	b. Recenter and overlap candidate peaks per chromatin mark per tissue type:
	   ```
	   ./recenterNoverlap.sh step1.candidatePeak chr_sizes blacklist-hg19.bed step2.recenteredPeak
	   ```
	c. Find common peaks among the H3K27ac, H3K4me1, and H3K4me3 peak sets per tissue type:
	   ```
	   ./findCommonPeaks.sh ac.recenteredPeak me1.recenteredPeak me3.recenteredPeak step3.commonPeak
	   ```
	d. Compute activity by distance for gene-feature pairs per tissue type:
	   ```
	   python3 abd-compute.py step3.commonPeak reFlat.gencode.v19 output
	   ```
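
Steps a and b above run once per chromatin mark, step c combines the three marks, and step d runs once per tissue. The following is a minimal sketch of a driver that chains the four commands for one tissue type. The script names and argument order are taken from the instructions above; the file naming scheme (`<tissue>.<mark>.narrowPeak`, `<tissue>.abd`), the function names, and the example tissue `liver` are hypothetical and not part of the repository.

```python
#!/usr/bin/env python3
"""Sketch: chain steps a-d of the ABD workflow for one tissue type."""
import subprocess

# Short labels for H3K27ac, H3K4me1, and H3K4me3, in the order expected by step c.
MARKS = ("ac", "me1", "me3")


def build_commands(tissue, genes_bed="causativeGenes.bed", chr_sizes="chr_sizes",
                   blacklist="blacklist-hg19.bed", annotation="reFlat.gencode.v19"):
    """Return the ordered commands (as argument lists) for one tissue.

    Assumption: input peak files are named <tissue>.<mark>.narrowPeak.
    This naming scheme is illustrative only.
    """
    commands = []
    recentered = []
    for mark in MARKS:
        prefix = "{}.{}".format(tissue, mark)
        candidate = prefix + ".candidatePeak"
        recentered_peak = prefix + ".recenteredPeak"
        # Step a: candidate peaks for this mark.
        commands.append(["python3", "find_candidate_peaks.py",
                         prefix + ".narrowPeak", genes_bed, candidate])
        # Step b: recenter and overlap the candidate peaks.
        commands.append(["./recenterNoverlap.sh", candidate,
                         chr_sizes, blacklist, recentered_peak])
        recentered.append(recentered_peak)
    # Step c: peaks common to all three marks.
    common = tissue + ".commonPeak"
    commands.append(["./findCommonPeaks.sh"] + recentered + [common])
    # Step d: ABD scores for gene-feature pairs.
    commands.append(["python3", "abd-compute.py", common, annotation,
                     tissue + ".abd"])
    return commands


def run_pipeline(tissue, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(tissue):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run_pipeline("liver", dry_run=True)
```

With `dry_run=True` the driver only prints the eight commands, which makes it easy to check file names before anything is executed. Setting `dry_run=False` assumes the scripts are in the current directory and the shell scripts are executable.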