This repository contains all data analysis files for the pre-processing and normalization of the (HILIC-based) untargeted metabolomics data from CHRIS.
- Marilyn De Graeve
- Johannes Rainer
- Vinicius Verri Hernandes
- Mar Garcia-Aloy
The MS data is provided as a self-contained SQL database that also contains
sample annotations (batch IDs etc). The data can be loaded as a MsExperiment
object through the MsBackendSql package.
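A minimal sketch of how such a database could be loaded, assuming it is an SQLite file. The helper name `load_chris_data()` is hypothetical and not part of this repository, and whether the sample annotations are restored automatically from the database is an assumption that depends on the installed MsExperiment version.

```r
# Hypothetical helper, not part of this repository.
# `db_file` is the path to the (assumed) SQLite database file.
load_chris_data <- function(db_file) {
    if (!file.exists(db_file))
        stop("Database file not found: ", db_file)
    # Open a connection to the database.
    con <- DBI::dbConnect(RSQLite::SQLite(), db_file)
    # Represent the MS data through the SQL backend; peak data stays in
    # the database and is retrieved on demand.
    sps <- Spectra::Spectra(con, source = MsBackendSql::MsBackendSql())
    # Wrap the spectra in a MsExperiment object. Assumption: sample
    # annotations stored in the database are picked up at this step.
    MsExperiment::MsExperiment(spectra = sps)
}
```

The connection is deliberately left open because the returned object needs it to retrieve peak data later.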
Analysis scripts are supposed to be run on the calculation cluster using the
respective shell script (e.g. peak_detection.sh for peak_detection.Rmd). The
script should then be executed with sbatch --mem-per-cpu=24000 -c 10 --partition=batch ./peak_detection.sh.
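The content of the shell scripts is not reproduced here. Assuming each of them ultimately renders the matching R Markdown document, the R side of such a wrapper could look like the sketch below. The helper name `render_analysis()` is hypothetical.

```r
# Hypothetical helper: renders one analysis document, e.g. when called
# from a shell script that was submitted with sbatch.
render_analysis <- function(rmd_file, output_dir = dirname(rmd_file)) {
    if (!file.exists(rmd_file))
        stop("Rmd file not found: ", rmd_file)
    # Evaluate the document in its own environment so that objects it
    # creates do not clutter the calling environment.
    rmarkdown::render(rmd_file, output_dir = output_dir,
                      envir = new.env(parent = globalenv()))
}
```

With this helper defined, `render_analysis("peak_detection.Rmd")` would write the rendered report next to the source file.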
- general_data_overview.Rmd: general overview and summary statistics as well as initial quality assessment of the whole data set.
-- NEEDS TO BE UPDATED AND FIXED --
- peak_detection.Rmd: peak detection and peak post-processing of both positive and negative polarity data.
- peak_detection_qa.Rmd: quality assessment and summaries for the peak detection step.
TODO:
- evaluate alignment on QC samples:
- check EICs before/after: are we able to alleviate the retention time shifts caused by the LC maintenance in November 2021?
- normalization.Rmd: implementation of various data normalization approaches (between-sample, within-batch and between-batch) for one specific polarity.
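To illustrate what a between-sample normalization does, the sketch below applies simple median scaling to a made-up abundance matrix. This is a generic textbook approach and not necessarily one of the methods implemented in normalization.Rmd. The function name `median_scale()` and all values are assumptions for illustration.

```r
# Toy abundance matrix: rows are features, columns are samples.
# The values are made up for illustration.
abund <- cbind(s1 = c(100, 200, 400),
               s2 = c(200, 400, 800),
               s3 = c(50, 100, 200))

# Between-sample median scaling: rescale each sample (column) so that
# its median abundance equals the mean of all per-sample medians.
median_scale <- function(x) {
    med <- apply(x, 2, median, na.rm = TRUE)
    sweep(x, 2, mean(med) / med, `*`)
}

abund_norm <- median_scale(abund)
# After scaling, all samples share the same median abundance.
apply(abund_norm, 2, median)
```

Within-batch and between-batch approaches additionally need injection order and batch information, which the sample annotations provide.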
The analysis requires a recent version of R (version >= 3.6.0) and a set of R packages that can be installed with the code below.
# Install BiocManager from CRAN only if it is not yet available.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("BiocStyle",
"xcms",
"RColorBrewer",
"pander",
"UpSetR",
"pheatmap",
"SummarizedExperiment",
"MsExperiment",
"Spectra",
"MetaboCoreUtils",
"MsBackendSql",
"MsQuality",
"writexl",
"ProtGenerics",
"MsCoreUtils",
"MetaboAnnotation"))
install.packages(c("tidyverse",
"readxl",
"readr",
"RSQLite",
"rmarkdown",
"kableExtra",
"magick",
"vioplot",
"pandoc",
"car"))
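After the installation it can be useful to verify the setup. The sketch below checks the R version against the minimum stated above and reports packages that are still missing. The helper name `missing_packages()` is hypothetical and the package vector is only a subset of the full list.

```r
# Return the names of the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
    setdiff(pkgs, rownames(installed.packages()))
}

# Subset of the required packages, for illustration.
required <- c("xcms", "MsExperiment", "Spectra", "MsBackendSql", "RSQLite")

# Compare the running R version with the stated minimum.
r_ok <- getRversion() >= "3.6.0"
not_installed <- missing_packages(required)

message("R version sufficient: ", r_ok)
message("Missing packages: ",
        if (length(not_installed)) paste(not_installed, collapse = ", ")
        else "none")
```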
mzML files for this data set are accessible from the calculation clusters at the base path /data/massspec/mzML/.
MS data was recorded in profile mode. Sciex wiff files were converted to mzML
file format using ProteoWizard. The profile mode mzML files were centroided in
R to generate the centroided mzML files. The scripts to perform the
wiff-to-mzML conversion and centroiding can be found here.
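The centroiding scripts themselves are not shown in this document. As a rough sketch of what centroiding a profile-mode mzML file in R can look like, the example below uses the Spectra package. The helper name `centroid_mzml()`, the smoothing method and the window size are assumptions, not the settings used for this data set.

```r
# Hypothetical helper; the parameter values are assumptions.
centroid_mzml <- function(in_file, out_file, half_window = 4L) {
    if (!file.exists(in_file))
        stop("mzML file not found: ", in_file)
    # Read the profile-mode data from the mzML file.
    sps <- Spectra::Spectra(in_file, source = Spectra::MsBackendMzR())
    # Smooth the profile signal, then keep local maxima as centroids.
    sps <- Spectra::smooth(sps, halfWindowSize = half_window,
                           method = "SavitzkyGolay")
    sps <- Spectra::pickPeaks(sps, halfWindowSize = half_window)
    # Write the centroided spectra to a new mzML file.
    Spectra::export(sps, backend = Spectra::MsBackendMzR(),
                    file = out_file)
}
```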