-
Notifications
You must be signed in to change notification settings - Fork 5
/
README.caps_verbose
60 lines (47 loc) · 2.5 KB
/
README.caps_verbose
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
# caps_verbose.patch
patch for CAPS v2 to produce more verbose output
http://bioinf.gen.tcd.ie/~faresm/software/files/caps2_src.zip
We provide a patch (caps_verbose.patch) to CAPS source code (caps.cpp),
that makes the program output its generated trees, as well as the
p-value for each correlated amino acid pair. For the moment, the patch
introduces these changes only for inter-protein analyses:
CAPS simulates number (-r) of random MSA, where each simulated alignment
has the same number of columns (length) as the real data and is tested
by the same method. This ensures that the data are compared to a null
distribution without coevolution pressures. Correlation information from
the simulated data is stored and sorted into vector totaltemp:
The correlation threshold for each protein pair is based on the
simulated data in totaltemp, found at a certain index, value. It is
determined by the alpha (-a) run-time option, referred here as
threshval:
[1] int value = floor(((totaltemp.size())*(1-(threshval))))+1;
threshold = totaltemp[value];
Correlation between residues R1 and R2 is determined as the mean of
correlation R1→R2 and correlation R2→R1. Each must be higher than the
threshold, whereas thresholdR is always 0.01:
if((fabs(Correl1[cor])>=threshold && fabs(Correl1[cor])>=thresholdR)
&& (fabs(Correl2[cor])>=threshold && fabs(Correl2[cor])>=thresholdR))
We seek to calculate the p-value of each correlation using the formula
from [1], where value is the correlation closest index within the
totaltemp vector. We introduce a new function getIndex, which ranks a
value K within a vector v, defines its lower bound value as it, then
determines the index of it within the vector.
Here, v = totaltemp and K = Correl_[cor] and the p-value of Correl_[cor]
is returned as alphathresh:
double getIndex(std::vector<double> const& v, double K) {
auto const it = std::lower_bound(v.begin(), v.end(), fabs(K));
if (it != v.end()) {
int index = distance(v.begin(), it);
alphathresh = (((int)1+(double)v.size()-(int)index)/(double)v.size());
return alphathresh;
}
else {
cerr << "ELEMENT NOT FOUND!" << endl;
}
}
We use the index of lower bound it, since an exact match of the
correlation value Correl_[cor] is unlikely to be found within totaltemp:
double Pvalue1 = getIndex(totaltemp, Correl1[cor]);
double Pvalue2 = getIndex(totaltemp, Correl2[cor]);
The patched executable of CAPS is installed as "vCAPS", so it can be
installed along the the official binary "caps" provided by upstream.