This tutorial shows how to use the HCA R package for single-cell RNA-seq data analysis, including immune cell annotation, tumor cell annotation, and data integration.
This package has several dependencies with version constraints:
- Seurat: version <5 (excluding 4.9); 4.3 recommended
- Matrix: version = 1.5-3
- scGate: version = 1.2.0
- future: version = 1.31.0
# scGate
library(remotes)
remotes::install_github("carmonalab/scGate", ref = "v1.2.0")
# Matrix
# download Matrix_1.5-3.tar.gz from https://cran.r-project.org/src/contrib/Archive/Matrix/
install.packages("Matrix_1.5-3.tar.gz", repos = NULL, type = "source") # Install from a local directory
# future
# download future_1.31.0.tar.gz from https://cran.r-project.org/src/contrib/Archive/future/
install.packages("future_1.31.0.tar.gz", repos = NULL, type = "source") # Install from a local directory
# install our package
remotes::install_github('Liuzhicheng048/iCNA')
remotes::install_github('YangJAT/HCA')
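Because several dependencies are pinned, it can help to confirm what is actually installed before loading the package. The following is a minimal sketch in base R, not part of HCA: `check_exact` and `seurat_version_ok` are hypothetical helper names, and the Seurat rule assumes that "excluding 4.9" means any 4.9.x build.

```r
# Exact versions copied from the dependency list above.
required <- c(Matrix = "1.5-3", scGate = "1.2.0", future = "1.31.0")

# Hypothetical helper: report whether one package matches its pinned version.
check_exact <- function(pkg, version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return("not installed")
  }
  installed <- utils::packageVersion(pkg)
  if (installed == package_version(version)) {
    "ok"
  } else {
    paste("found", format(installed))
  }
}

# Hypothetical helper: Seurat must be below 5 and must not be a 4.9.x build
# (assumed reading of "version <5 (excluding 4.9)").
seurat_version_ok <- function(version) {
  v <- package_version(version)
  v < "5.0.0" && !(v >= "4.9.0" && v < "4.10.0")
}

status <- vapply(names(required),
                 function(p) check_exact(p, required[[p]]),
                 character(1))
print(status)

if (requireNamespace("Seurat", quietly = TRUE)) {
  print(seurat_version_ok(as.character(utils::packageVersion("Seurat"))))
}
```

Any entry in `status` other than "ok" points to a package that should be reinstalled at the pinned version.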
We recommend running the analysis on a high-memory server.
library(HCA)
library(viridis)
library(Seurat)
library(iCNA)
library(stringr)
library(scGate)
library(future)
Load the data:
scGate_DB <- readRDS("data/scGate_DB.rds")
datafilt <- readRDS("data/sc_datafilt.rds")
"datafilt" is a Seurat object that requires only basic cell filtering and includes a column labeled "sample" to define the cell-to-sample correspondence, with no need for additional processing.
non_epi <- c("EPCAM-", "CDH1-", "KRT7-", "KRT18-", "KRT19-", "ALB-", "AFP-") # for human
# For mouse data, use this definition instead:
# non_epi <- c("Krt5-", "Krt14-", "Krt6a-", "Dsp-", "Krt17-", "Lgals7-")
dataimmu <- anno_immune(datafilt,
scGate_DB = scGate_DB,
organism = "human", # or mouse
non_epi = non_epi,
min_cell = 100,
ncore = 1) # Multi-core functionality is not available on Windows
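The comment on `ncore` says only that multi-core execution is unavailable on Windows. A minimal sketch for choosing the value automatically could look like this; `choose_ncore` is a hypothetical helper, and the cap of four cores is an arbitrary assumption rather than a recommendation from the package.

```r
# Hypothetical helper: pick a core count that is safe on every platform.
choose_ncore <- function(os_type = .Platform$OS.type, max_cores = 4) {
  if (os_type == "windows") {
    return(1L)
  }
  available <- parallel::detectCores()
  if (is.na(available)) {
    return(1L)
  }
  # Leave one core free and never exceed max_cores (arbitrary cap)
  as.integer(max(1, min(max_cores, available - 1)))
}

ncore <- choose_ncore()
print(ncore)
```

The result can then be passed as `ncore = ncore`. Each extra worker may hold its own copy of the data, so memory is worth watching when raising the cap.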
Note 1: Tumor cell annotation is optional. If your data has undergone CD45 sorting, run only the immune cell annotation; data integration can be skipped as well.
Note 2: The input Seurat object must include a column labeled "sample" to define the cell-to-sample correspondence.
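The "sample" requirement can be checked up front. The sketch below works on a plain data frame standing in for the object's cell metadata (`datafilt@meta.data` in Seurat); `check_sample_column` is a hypothetical helper, not an HCA function.

```r
# Hypothetical helper: validate the cell metadata before annotation.
check_sample_column <- function(meta) {
  if (!"sample" %in% colnames(meta)) {
    stop("metadata has no 'sample' column")
  }
  if (anyNA(meta$sample)) {
    stop("some cells have no sample assigned")
  }
  # Number of cells per sample
  table(meta$sample)
}

# Toy metadata standing in for datafilt@meta.data
meta <- data.frame(
  sample = c("S1", "S1", "S2"),
  nFeature_RNA = c(1200, 950, 1800),
  row.names = c("cell1", "cell2", "cell3")
)
cells_per_sample <- check_sample_column(meta)
print(cells_per_sample)

# With the real object: check_sample_column(datafilt@meta.data)
```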
datacanc <- anno_tumor(datafilt,
scGate_DB = scGate_DB,
organism = "human", # or mouse
thres_sig = 0.005, # Adjust this threshold based on scatter_plot.png
thres_cor = 0.5, # Adjust this threshold based on scatter_plot.png
ncore = 1, # Multi-core functionality is not available on Windows
isFilter = TRUE)
If the code runs successfully, an image (inferCNV/scatter_plot.png) is written to the current working directory. Choose thres_sig and thres_cor based on where the cells fall in this scatter plot.
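To get a feel for how the two thresholds interact, one can count how many cells pass for a few candidate values. The sketch below is illustrative only: the column names `cna_signal` and `cna_cor`, the simulated scores, and the rule that a cell is called tumor when both values exceed their thresholds are all assumptions, not behavior confirmed for `anno_tumor`.

```r
# Toy per-cell scores. ASSUMPTION: column names and values are made up.
set.seed(1)
scores <- data.frame(
  cna_signal = c(runif(50, 0.000, 0.004), runif(50, 0.006, 0.020)),
  cna_cor    = c(runif(50, 0.0, 0.4),     runif(50, 0.6, 1.0))
)

# ASSUMPTION: a cell counts as tumor when both scores exceed their thresholds.
count_tumor <- function(scores, thres_sig, thres_cor) {
  sum(scores$cna_signal > thres_sig & scores$cna_cor > thres_cor)
}

# Try a small grid of candidate thresholds
grid <- expand.grid(thres_sig = c(0.003, 0.005, 0.01),
                    thres_cor = c(0.3, 0.5, 0.7))
grid$n_tumor <- mapply(count_tumor,
                       thres_sig = grid$thres_sig,
                       thres_cor = grid$thres_cor,
                       MoreArgs = list(scores = scores))
print(grid)
```

A threshold pair sitting in the gap between the two groups of cells gives a count that barely changes when the thresholds move slightly, which is the kind of stability to look for in the real scatter plot.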
dataintg <- integrate(dataimmu, datacanc,
min_tumor = 50,
rm_doublet = FALSE,
prop_doublet = 0.075)
# If you skipped the annotation of tumor cells, please run
# dataintg <- dataimmu
After running, the Seurat object will contain a column labeled "celltype_sig2", which holds the annotation results.
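A quick way to inspect the annotation is to tabulate that column. This sketch uses a toy data frame in place of `dataintg@meta.data`; the cell type labels are hypothetical and need not match the names HCA produces.

```r
# Toy metadata standing in for dataintg@meta.data (labels are hypothetical)
meta <- data.frame(
  sample = c("S1", "S1", "S1", "S2", "S2", "S2"),
  celltype_sig2 = c("CD8T", "Bcell", "Tumor", "CD8T", "CD8T", "Macrophage")
)

# Cells per annotated type, most frequent first
counts <- sort(table(meta$celltype_sig2), decreasing = TRUE)

# The same counts as fractions of all cells
fractions <- round(prop.table(counts), 3)

print(counts)
print(fractions)

# With the real object: table(dataintg$celltype_sig2)
```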
source("data/Other functions.R")
# clustering
dataintg <- autocluster(dataintg, nfeatures = 2000,
ndim = 15, neigh = 20,
dist = 1, res = 3) # Set a higher resolution (res) to capture more clusters
# celltype visualization
dimplot_new(dataintg,
reduction = "umap",
pt.size = 0.2, label = TRUE,
group.by = c("celltype_sig2"))
# High-resolution clustering visualization
dimplot_new(dataintg,
reduction = "umap",
pt.size = 0.5, label = TRUE,
group.by = c("seurat_clusters"))
Simultaneously review the cell type annotations and Seurat clustering results, removing clusters that encompass cells from divergent lineages (e.g., myeloid and lymphoid lineages within a single cluster) or clusters with atypical spatial positioning on the UMAP plot (e.g., T cell subsets positioned in close proximity to myeloid cells).
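The visual review can be supported by a simple composition table. The sketch below flags clusters whose most common annotation covers less than a chosen fraction of the cluster. It uses toy metadata with hypothetical labels, and the 0.7 cutoff is an arbitrary assumption. It is a screening aid only: mixing of closely related subtypes is not necessarily a problem, and it does not replace checking positions on the UMAP.

```r
# Toy metadata standing in for dataintg@meta.data (labels are hypothetical)
meta <- data.frame(
  seurat_clusters = factor(c(rep("0", 10), rep("1", 10), rep("2", 10))),
  celltype_sig2 = c(rep("CD8T", 10),
                    rep("Macrophage", 9), "CD8T",
                    rep("CD8T", 5), rep("Macrophage", 5))
)

# Rows are clusters, columns are cell types, entries are within-cluster fractions
composition <- prop.table(table(meta$seurat_clusters, meta$celltype_sig2),
                          margin = 1)

# Fraction of each cluster taken by its most common label
purity <- apply(composition, 1, max)

# ASSUMPTION: 0.7 is an arbitrary cutoff chosen for illustration
suspect <- names(purity)[purity < 0.7]

print(round(purity, 2))
print(suspect)
```

Clusters listed in `suspect` are candidates for closer inspection before deciding whether to remove them.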
# exclude any problematic clusters (these IDs come from the example dataset; replace them with your own)
select <- c("31","35","39","40","51")
dataintg <- dataintg[,!(dataintg$seurat_clusters %in% select)]
# re-analyze
dataintg <- autocluster(dataintg, nfeatures = 2000,
ndim = 15, neigh = 20,
dist = 1, res = 3)
dimplot_new(dataintg,
reduction = "umap",
pt.size = 0.2, label = TRUE,
group.by = c("celltype_sig2"))
source("data/Other functions.R")
# dotplot visualization (with default gene set)
name = "marker_dotplot.pdf"
dotplot_marker(dataintg,
group.by = "celltype_sig2",
marker = NULL,
species = "human", # or mouse
output = name,
height = 6)
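# dotplot visualization (manually selected gene set)
# The symbols below are mouse genes; for human data use e.g. "CD3D", "CD3E", "CD8A", "CD8B"
Tcell = c("Cd3d", "Cd3e")
CD8T = c("Cd8a", "Cd8b1")
gene_list <- list(name1 = Tcell,
name2 = CD8T)
name = "marker_dotplot_custom.pdf" # separate file name so the default-gene-set plot is not overwritten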
dotplot_marker(dataintg,
group.by = "celltype_sig2",
marker = gene_list,
species = NULL,
output = name,
height = 6)
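Before plotting a manual gene set, it can be worth confirming that the symbols exist in the dataset, since human and mouse symbols differ in capitalization and sometimes in name. A minimal sketch, assuming a hypothetical `features` vector that stands in for `rownames(dataintg)`:

```r
gene_list <- list(name1 = c("Cd3d", "Cd3e"),
                  name2 = c("Cd8a", "Cd8b1"))

# Hypothetical stand-in for rownames(dataintg) in a human dataset
features <- c("CD3D", "CD3E", "CD8A", "CD8B", "EPCAM")

# Markers with no exact match in the dataset
missing_markers <- lapply(gene_list, function(g) setdiff(g, features))

# Markers that would match if capitalization were ignored
case_only <- lapply(gene_list, function(g) {
  g[toupper(g) %in% toupper(features) & !(g %in% features)]
})

print(missing_markers)
print(case_only)
```

Markers that appear in `case_only` usually indicate a species mismatch in the symbols; markers that are missing but not in `case_only` need a different gene name altogether.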
These visualizations compare the control and experimental groups.
source("data/Other functions.R")
# UMAP density plot
prop_density(datafilt = datafilt,
group = "group", # grouping information
coord = "umap")
# Back-to-back plot
prop_back2back(datafilt = datafilt,
group = "group", # grouping information
cluster = "seurat_clusters",
order = TRUE)
# Sample-level proportional distribution difference
input <- data.frame(table(dataimmu$sample, dataimmu$celltype_sig2))
prop_plot_hca(input, rotate = 45, decreasing = TRUE, species = "human")
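For reference, `data.frame(table(...))` yields a long table with the columns `Var1` (sample), `Var2` (cell type), and `Freq` (cell count). The sketch below shows that shape on toy data and adds per-sample proportions by hand. Whether `prop_plot_hca` computes proportions internally is not stated here, so the `prop` column is for inspection only; the labels are hypothetical.

```r
# Toy metadata standing in for dataimmu@meta.data (labels are hypothetical)
meta <- data.frame(
  sample = c("S1", "S1", "S1", "S2", "S2"),
  celltype_sig2 = c("CD8T", "CD8T", "Bcell", "CD8T", "Bcell")
)

# Long-format counts: Var1 = sample, Var2 = cell type, Freq = number of cells
input <- data.frame(table(meta$sample, meta$celltype_sig2))

# Proportion of each cell type within its sample
input$prop <- input$Freq / ave(input$Freq, input$Var1, FUN = sum)

print(input)
```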
More analysis and visualization capabilities will be introduced in upcoming updates.
The iCNA package is essentially an easier-to-install version of the infercna package (see https://github.com/jlaffy/infercna), created to address the installation problems often encountered with infercna across different environments. If you use our package, please cite both our study (https://doi.org/10.1016/j.ccell.2024.10.008) and the article for the infercna package (https://doi.org/10.1016/j.cell.2019.06.024).