-
Notifications
You must be signed in to change notification settings - Fork 1
/
_results.qmd
382 lines (337 loc) · 15.5 KB
/
_results.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
<!-- detected compounds -->
Multiple compounds were detected in the dental calculus samples. Compounds detected
at a lower concentration than the lower limit of quantitation (LLOQ) were considered
not present. Not all the compounds detected in the first batch could be replicated
in the second batch (@tbl-compound-detect).
For a full list of targeted compounds, see Supplementary Material.
```{r}
#| label: detection-tbl-setup
uhplc_calculus_detect <- uhplc_calculus_long %>%
#uhplc_calculus_detect <- uhplc_calculus_demography %>%
#remove_missing() %>%
ungroup() %>%
filter(quant > 0) %>%
distinct(compound, batch) %>%
pivot_wider(names_from = batch, values_from = batch) %>%
mutate(across(!compound, ~ if_else(!is.na(.x), TRUE, FALSE)))
corr_weight_quant <- uhplc_calculus_long %>%
ungroup() %>%
filter(batch == "batch2",
quant > 0) %>%
select(id, weight, compound, quant) %>%
pivot_wider(names_from = compound, values_from = quant) %>%
select(where(is.numeric)) %>%
cor(use = "pairwise.complete.obs") %>%
.[1,]
max_quant <- uhplc_calculus_long %>%
filter(batch == "batch2",
id != "MB329"
) %>%
group_by(compound) %>%
summarise(
max_quant = max(quant, na.rm = T),
pos_y = 0.8 * max_quant
)
corr_labels <- tibble(
compound = names(corr_weight_quant)[-1],
corr = signif(corr_weight_quant[-1], 2),
) %>%
inner_join(max_quant) %>%
arrange(compound)
#%>%
# mutate(corr = as.character(corr))
#corr_labels$compounds <- prettified_names[corr_labels$compounds]
```
<!-- May need to limit the list to compounds that were detected in either batch and keep the full list in supplementary materials -->
```{r}
#| label: tbl-compound-detect
#| tbl-cap: "Target compound including whether it was detected (TRUE) or not (FALSE) in each batch, as well as the lower limit of quantitation (LLOQ) in ng. CBD = cannabidiol; CBN = cannabinol; THC = tetrahydrocannabinol; THCA-A = tetrahydrocannabinolic acid A; THCVA = tetrahydrocannabivarin acid."
uhplc_calculus_detect %>%
left_join(compound_name_abbrev, by = c("compound" = "abbrev")) %>%
left_join(lloq, by = c("name" = "compound")) %>%
mutate(across(batch1:batch2, ~ replace_na(.x, FALSE))) %>%
filter(!(batch1 == FALSE & batch2 == FALSE)) %>%
select(name, batch1, batch2, lloq) %>%
arrange(name) %>%
knitr::kable(col.names = c("Compound", "Batch 1", "Batch 2", "LLOQ"))
```
The pattern we expect to see in authentic compounds representing compounds
trapped within the dental calculus,
is a reduction in the quantity from wash 1 to wash 3 as potential surface contaminants
are washed off, and then a
spike in the final extraction when entrapped compounds are released and detected.
Most plots show a large increase in extracted mass of a compound between the
calculus wash extracts (wash 1-3) and the dissolved calculus (calc). Most samples
containing theophylline and caffeine had the largest quantity of the compound
extracted from the first wash, then decreasing in washes 2 and 3. There is
an increase between wash 3 and the dissolved calculus in all samples.
The patterns are consistent across batches 1 and 2.
Nicotine and cotinine have the same relative quantities in the samples, i.e., the
sample with the highest extracted quantity of nicotine also had the highest extracted
quantity of cotinine (@fig-auth-plot-batch2).
```{r}
#| label: fig-auth-plot-batch2
#| fig-cap: "(A) Number of samples in which each compound was detected in the first and second batch. (B) Quantity (ng) of each compound extracted from each sample in batch 2. The plot displays the extracted quantity across the three washes and final calculus extraction (calc). Each coloured line represents a different calculus sample. CBD = cannabidiol; CBN = cannabinol; THC = tetrahydrocannabinol; THCA-A = tetrahydrocannabinolic acid A; THCVA = tetrahydrocannabivarin acid."
#| fig-width: 8
#| fig-height: 6
auth_plot <- uhplc_data_long %>%
filter(
batch == "batch2",
!compound %in% c("cbd", "cbn", "cocaine", "thc", "thca-a", "thcva") # remove compounds not detected in batch 2
) %>%
semi_join(quant_filter, by = c("sample", "compound")) %>% # remove compounds not detected in each sample
mutate(
extraction = factor(extraction, levels = c("wash1", "wash2", "wash3", "calc")),
#sample = as.factor(sample)
) %>%
ggplot(aes(x = extraction, y = quant, group = sample, colour = sample)) +
geom_line() +
geom_point(size = 0.2) +
#facet_wrap(~ compound, scales = "free_y", labeller = compound_name_repair) +
facet_wrap(~ compound, scales = "free_y", labeller = labeller(compound = compound_names), dir = "v") +
theme_bw() +
scale_colour_viridis_d() +
scale_y_continuous(position = "right") +
labs(y = "Quantity (ng)") +
theme(
legend.position = "none",
axis.title.x = element_blank(),
axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
panel.grid.major.x = element_blank(),
panel.grid.minor.y = element_blank()
)
compound_detect_plot <- uhplc_calculus_long %>%
#remove_missing() %>%
filter(quant > 0) %>%
ggplot(aes(x = compound, fill = batch)) +
#geom_jitter(width = 0.2) +
#geom_boxplot() +
geom_bar(position = "dodge") +
#facet_wrap(~ batch) +
theme_bw() +
#scale_fill_viridis_d() +
scale_x_discrete(labels = compound_names) +
scale_fill_viridis_d(labels = c(1,2), end = 0.6) +
theme(
axis.title.x = element_blank(),
axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
legend.position = "left",
panel.grid.major.x = element_blank()
) +
labs(y = "Count", fill = "Batch")
compound_detect_plot + auth_plot + plot_layout(widths = c(1,1.8)) + plot_annotation(tag_levels = "A")
```
<!-- detection vs. preservation -->
To see if preservation of the skeletal remains had any effect on the detection of
compounds, we compare extracted quantities of compounds to the various levels of
skeletal preservation. Our results from batch 2 suggest that detection of a compound
may be linked to the preservation of the skeleton, with better preservation leading
to increased extraction quantity ([@fig-detection-preservation]A). We also
find a weak positive correlation between the weight of the calculus sample and the
quantity of compound extracted from the calculus ([@fig-detection-preservation]B).
```{r}
#| label: fig-detection-preservation
#| fig-cap: "(A) Violin plot with overlaid box plots depicting the distribution of extracted quantities of each compound from batch 2 separated by state of preservation of the skeleton. (B) Extracted quantity (ng) of compound plotted against weights of the calculus samples from batch 2. r = Pearson correlation coefficient."
#| fig-width: 4.8
#| fig-asp: 1.6
preservation_plot <- uhplc_calculus_long %>%
filter(
!is.na(preservation),
quant > 0,
batch == "batch2",
#id != "MB329" # unusually high nicotine and cotinine quantity in batch 1
) %>%
ggplot(aes(x = preservation, y = quant)) +
geom_violin(aes(fill = preservation), alpha = 0.6) +
#geom_jitter(width = 0.2) +
geom_boxplot(width = 0.2) +
facet_wrap(~ compound, scales = "free_y", labeller = labeller(compound = compound_names), dir = "v", ncol = 1) +
theme_bw() +
theme(
legend.position = "none",
panel.grid.major.x = element_blank(),
axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)
) +
labs(x = "Preservation", y = "Quantity (ng)") +
scale_fill_viridis_d()
weight_quant_plot <- uhplc_calculus_long %>%
filter(
batch == "batch2",
quant > 0,
#id != "MB329" # unusually high nicotine and cotinine quantity in batch 1
) %>%
ggplot(aes(x = weight, y = quant)) +
geom_smooth(method = "lm", se = F, col = viridisLite::viridis(1, begin = 0.6)) +
geom_point(col = viridisLite::viridis(1)) +
geom_text(
data = corr_labels,
aes(x = 24, y = pos_y, label = paste("r =", corr)),
#vjust = -1, hjust = -0.1) +
) +
facet_wrap(
~ compound,
scales = "free_y",
labeller = labeller(compound = compound_names),
dir = "v",
ncol = 1
) +
theme_bw() +
theme(
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank()
) +
labs(y = "Quantity (ng)", x = "Weight (mg)")
preservation_plot + weight_quant_plot + plot_annotation(tag_levels = "A")
```
<!-- detection of tobacco as accuracy -->
```{r}
#| label: tobacco-accuracy-setup
#tobacco <- uhplc_calculus_demography %>%
# should only be in males
tobacco <- uhplc_calculus_long %>%
filter(
batch == "batch2",
compound %in% c("nicotine", "cotinine"),
sex == "m" | sex == "pm"
) %>%
mutate(detection = if_else(quant == 0, 0, 1)) # 0 = not detected; NA = not included in batch 2.
tobacco_accuracy <- tobacco %>%
remove_missing(vars = "quant") %>%
group_by(sample, id, .drop = F) %>%
summarise(
detection = sum(detection),
) %>%
left_join(select(demography, id, pipe_notch, preservation, age), by = "id") %>%
mutate(
pipe_notch = if_else(pipe_notch > 0, "Y", "N"),
correct = case_when(
detection == 0 & pipe_notch == "N" ~ 1,
detection > 0 & pipe_notch == "Y" ~ 1,
detection > 0 & pipe_notch == "N" ~ NaN, # no way of knowing macroscopically if the person smoked without pipe
TRUE ~ 0
)
)
accuracy_age <- tobacco_accuracy %>% # move to supplementary material
group_by(age) %>%
summarise(mean = mean(correct, na.rm = T))
# accuracy in replicated samples
replicated_accuracy <- tobacco %>%
group_by(sample,id) %>%
summarise(detection = sum(detection)) %>%
remove_missing() %>%
left_join(select(demography, id, pipe_notch, preservation, age), by = "id") %>%
mutate(
pipe_notch = if_else(pipe_notch > 0, "Y", "N"),
correct = case_when(
detection == 0 & pipe_notch == "N" ~ 1,
detection > 0 & pipe_notch == "Y" ~ 1,
detection > 0 & pipe_notch == "N" ~ NaN, # no way of knowing macroscopically if the person smoked without pipe
TRUE ~ 0
)
)
```
The presence of pipe notch(es) in an individual and concurrent detection of nicotine
and/or cotinine is used as a crude indicator of the accuracy of the method. Only
males were used in accuracy calculations, as pipe notches are ubiquitous in males,
but not in females.
In batch 2, the method was able to detect some form of tobacco in
`r nrow(filter(tobacco_accuracy, pipe_notch == "Y", detection > 0))`
of `r nrow(filter(tobacco_accuracy, pipe_notch == "Y"))`
individuals with a pipe notch
(`r scales::percent(nrow(filter(tobacco_accuracy, pipe_notch == "Y", detection > 0)) / nrow(filter(tobacco_accuracy, pipe_notch == "Y")), accuracy = 0.1)`).
When also considering correct the absence of a tobacco alkaloid together with the
absence of a pipe notch, the accuracy of the method is
`r scales::percent(mean(tobacco_accuracy$correct, na.rm = T), accuracy = 0.1)`.
Accuracy in the old adult age category is
`r scales::percent(filter(accuracy_age, age == "old")$mean, accuracy = 0.1)`,
but with only `r sum(tobacco_accuracy$age == "old")` individuals.
One individual---an old adult, probable female---was positive
for both nicotine and cotinine, and had no signs of a pipe notch.
<!-- missing data -->
### Correlations between detected alkaloids and diseases
For further statistical analyses, only the UHPLC-MS/MS results from batch 2
were used, as batch 1 had multiple compounds that were not detected in batch 2
and may have been contaminated.
```{r}
#| label: tbl-pearson
#| tbl-cap: "Pearson correlation (*r*) on dichotomous skeletal lesions and compound concentrations (ng/mg) from the second batch. Correlations between pairs of dichotomous variables are removed due to incompatibility with a Pearson correlation. Moderate and strong correlations in **bold**. OA = osteoarthritis; VOP = vertebral osteophytosis; SN = Schmorl's nodes; DDD = degenerative disc disease; CO = cribra orbitalia; CMS = chronic maxillary sinusitis; SA = salicylic acid; PN = pipe notches."
#| tbl-colwidths: [15,10,10,10,10,10,15,10,10]
corr_tib <- conc_cor %>%
as_tibble(rownames = "var") %>%
filter(var != "age",
var != "periodont_status")
corr_tib_long <- corr_tib %>%
pivot_longer(
-var,
names_to = "name",
values_to = "corr"
)
corr_tib_out <- corr_tib %>%
column_to_rownames("var") %>%
as.matrix() %>%
round(digits = 2)
cols <- 0
for(i in 7:nrow(corr_tib)){ # remove repeated values
cols <- cols + 1
corr_tib_out[i,1:cols] <- ""
}
corr_names <- c("Caries", "Nicotine", "SA", "Calculus", "PN", "Theophylline", "Caffeine", "Cotinine")
row.names(corr_tib_out) <- c(row.names(corr_tib_out)[1:6], corr_names)
corr_tib_out <- corr_tib_out[-nrow(corr_tib),] # remove last row because it has no values (all duplicate coefficients)
#mode(corr_tib_out) <- "numeric"
options(knitr.kable.NA = "")
corr_tib_out %>%
as_tibble(rownames = "var") %>%
mutate(across(!var, as.numeric)) %>%
tibble::column_to_rownames("var") %>%
mutate(across(everything(), ~kableExtra::cell_spec(.x, bold = if_else(.x >= 0.4, T, F)))) %>% mutate(across(everything(), ~ str_remove_all(.x, "NA"))) %>%
#kableExtra::cell_spec(bold = if_else(corr_tib_out >= 0.4, T, F)) #%>%
#kableExtra::spec_color(na_color = "fff") %>%
knitr::kable(col.names = corr_names)
#kableExtra::kbl(col.names = corr_names) %>%
```
```{r}
prettified_names_lower <- sapply(prettified_names, str_to_lower)
corr_tib_out <- corr_tib_long %>%
mutate(across(where(is.character), plyr::revalue, prettified_names_lower, warn_missing = F)) %>%
mutate(corr = signif(corr, 2))
polycorr_tib_out <- polycorr_tib %>%
mutate(across(where(is.character), plyr::revalue, prettified_names_lower, warn_missing = F)) %>%
mutate(corr = signif(corr, 2))
```
Point-biserial correlation was conducted on paired continuous and dichotomous
variables, to see if any relationships exist between extracted concentrations
and other variables.
The strongest point-biserial (Pearson) correlation correlations were a near-perfect
positive correlation between `r cor_statement(corr_tib_out, strength = "strong")`,
and moderate correlations between
`r cor_statement(corr_tib_out, strength = "moderate")` (@tbl-pearson).
Polychoric correlation was conducted on the dichotomised compounds and pathological
conditions, as well as the discretised dental diseases.
Salicylic acid was removed due to its ubiquitous presence in the sample, and is
likely to cause spurious correlations.
Strong correlations were found between `r cor_statement(polycorr_tib_out, strength = "strong")`.
Moderate correlations were found between `r cor_statement(polycorr_tib_out, strength = "moderate")`.
Remaining correlations were weak or absent (@fig-polycorr).
Correlations with age will be depressed because age was largely controlled for
in the sample selection.
```{r}
#| label: fig-polycorr
#| fig-cap: "Plot of the polychoric correlations (*rho*). Larger circles and increased opacity indicates a stronger correlation coefficient. OA = osteoarthritis; VOP = vertebral osteophytosis; SN = Schmorl’s nodes; DDD = degenerative disc disease; CO = cribra orbitalia; CMS = chronic maxillary sinusitis; SA = salicylic acid."
#| fig-width: 5
#| fig-height: 4
polycorr$rho %>%
ggcorrplot::ggcorrplot(
method = "circle", type = "lower",
outline.color = "grey40",
colors = c(viridisLite::plasma(1, begin = 0), "white", viridisLite::inferno(1, begin = 0.7)),
legend.title = expression(rho)
) +
scale_x_discrete(labels = prettified_names) +
scale_y_discrete(position = "right", labels = prettified_names) +
theme(
legend.position = "left",
legend.title = element_text(face = "italic")
)
```