-
Notifications
You must be signed in to change notification settings - Fork 0
/
Ideas.Rmd
57 lines (36 loc) · 3.83 KB
/
Ideas.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
---
title: "Ideas for 2D Anlysis"
output: html_notebook
---
# Objective:
Hello Keshav,
here is the little explanation for the “2D-analysis” that we want to do. We have 2 datasets that we want to compare.
The first is the present/absent-effector list for all EPEC strains. So it tells us which effectors are present in each strain (they all have different sets of effectors). EPEC binds to the host (mouse) cell in the small intestine and ejects these effectors through the T3SS into the host cell. Inside the host cell, they alter epithelial cell function by blocking or inducing certain pathways.
The second is the 3’RNA-Seq dataset that you have been analyzing the last months. Here we have the gene expression of the infected intestinal epithelial cells of the host. So some genes are up- and some are downregulated inside the host after EPEC infection.
What we want to investigate is: **is there a correlation between the presence/absence of effectors that EPEC injects and the gene expression in the host? Possible would be a certain combination of effectors that leads to a certain host response in the intestinal cells of the mice**.I hope it helps. Please let me know if anything is unclear.
attached you find the present/absent-effector list for all strains that we discussed yesterday.“0” means this effector is not present in this strain (=absent), “1” means this effector is present once in this strain, and 2 means the effector is present twice.In case it’s too complicated to work with “three conditions” (“0”, “1”, “2”), you can just consider the “2”s as “1”s (so then you can consider that both “1” and “2” means “present”).
### 1. Correlation via matcor
1. Correlation between L2fc and Homology Numbers
2. Obtain homology numbers of effectors but as per the strains vs effector names.
3. We already have the l2fc table.
4. Filter it based on Differential Genes (keep only rows (genes) that are differential genes and also significant )
5. Perform two matrix correlation between l2fc matrix and homology of effectors matrix and use network plots and heatmaps to visualise anddetermine associations.
### 2. Splitting based on effector presence
1. From the net_eff column, can see that only 10 effectors are mainly present. Divide the effectors into 3 groups - High , Medium and Low.
2. For every effector, say Effector_A, can we split the strains into two groups (with Effector_A, without Effector_A),then use wilcoxon test to test, which gene is highly related to the presence of a Effector_A.
3. This mainly needs to be done for High value effectors --> most likley to contribute to Gene Expression.
4. Not sure the log2foldchange is better or the original gene expression matrix.
### 3. CCA - Canonical Correlation Analysis
1. X numeric matrix (n * p), containing the X coordinates. Y numeric matrix (n * q), containing the Y coordinates.
2. The canonical correlation analysis seeks linear combinations of the ’X’ variables which are the most
correlated with linear combinations of the ’Y’ variables
### 3a. Regularised CCA
### 4. Use Effector Presence in DESeq2 Metadata
1. One can do this directly in DESeq2 by including the effectors as a variable: ~ effector.A.status + strain + 0 (just cbind the effector matrix to metadata should be enough).
2. Since the effector status doesn't vary within strain, this will basically "pull out" an estimate of the group average of effector-positive and effector-negative strains, allowing for a direct comparison.
3. Not sure if this is what we are going for in this 2D analysis.
### 5. Clustering
1. To look for more complicated patterns, one could select the differentially expressed genes across all strains.
2. cluster the expression matrix on those genes.
3. Overlay the effector status (use a heatmap for such things).
### 6. GSVD - Generalised Singular Value Decomposition