This tutorial shows how to use the HCA R package for single-cell RNA-seq data analysis, including immune cell annotation, tumor cell annotation, and data integration.
This package has several dependencies with version constraints:
- Seurat: version <5 (excluding 4.9); 4.3 recommended
- Matrix: version = 1.5-3
- scGate: version = 1.2.0
- future: version = 1.31.0
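The constraints above can be checked from an R session before running the pipeline. The following is a minimal sketch using base R only; the helper names `seurat_ok` and `pinned_ok` are hypothetical and not part of the package, and the Seurat rule is one reading of "version <5 (excluding 4.9)".

```r
# Hypothetical helpers (not part of HCA) for checking the version constraints.

# Seurat: any release below 5, except the 4.9.x series.
seurat_ok <- function(version) {
  v <- package_version(version)
  parts <- unclass(v)[[1]]  # integer vector: major, minor, ...
  is_4_9 <- length(parts) >= 2 && parts[1] == 4 && parts[2] == 9
  v < "5.0.0" && !is_4_9
}

# Exact pins for the remaining dependencies.
pinned <- c(Matrix = "1.5-3", scGate = "1.2.0", future = "1.31.0")

# TRUE/FALSE if the package is installed, NA if it is missing.
pinned_ok <- function(pkg, version) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(NA)
  utils::packageVersion(pkg) == package_version(version)
}

status <- vapply(names(pinned),
                 function(p) pinned_ok(p, pinned[[p]]),
                 logical(1))
print(status)

if (requireNamespace("Seurat", quietly = TRUE)) {
  print(seurat_ok(as.character(utils::packageVersion("Seurat"))))
}
```

A `FALSE` or `NA` entry in `status` points at the dependency that still needs to be installed at the pinned version.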
# scGate
remotes::install_github("carmonalab/scGate", ref = "v1.2.0")
# Matrix
# download Matrix_1.5-3.tar.gz from
install.packages("Matrix_1.5-3.tar.gz", repos = NULL, type = "source") # Install from a local directory
# future
# download future_1.31.0.tar.gz from
install.packages("future_1.31.0.tar.gz", repos = NULL, type = "source") # Install from a local directory
# install our package
We recommend utilizing a high-memory server for optimal performance.
Load the data:
scGate_DB <- readRDS("data/scGate_DB.rds")
datafilt <- readRDS("data/sc_datafilt.rds")
"datafilt" is a Seurat object that needs only basic cell filtering and no further processing. It must include a metadata column labeled "sample" that maps each cell to its sample.
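Because the later steps rely on the "sample" column, it can help to verify it up front. The following is a minimal sketch on a toy metadata table; `check_sample_column` is a hypothetical helper, not a function of this package.

```r
# Hypothetical pre-flight check (not part of HCA): verify that the metadata
# table has a usable "sample" column before running the annotation steps.
check_sample_column <- function(meta) {
  if (!"sample" %in% colnames(meta)) {
    stop('metadata has no "sample" column')
  }
  if (anyNA(meta$sample)) {
    stop('"sample" column contains missing values')
  }
  # Return the number of cells per sample for a quick sanity check.
  table(meta$sample)
}

# Toy metadata standing in for datafilt@meta.data
toy_meta <- data.frame(
  sample = c("S1", "S1", "S2"),
  row.names = c("cell1", "cell2", "cell3")
)
cells_per_sample <- check_sample_column(toy_meta)
print(cells_per_sample)
```

With a real object, the same check would be called as `check_sample_column(datafilt@meta.data)`.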
non_epi <- c("EPCAM-", "CDH1-", "KRT7-", "KRT18-", "KRT19-", "ALB-", "AFP-") # for human
# non_epi <- c("Krt5-", "Krt14-", "Krt6a-", "Dsp-", "Krt17-", "Lgals7-") # for mouse (use instead of the line above)
dataimmu <- anno_immune(datafilt,
scGate_DB = scGate_DB,
organism = "human", # or mouse
non_epi = non_epi,
min_cell = 100,
ncore = 1) # Multi-core functionality is not available on Windows
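Since multi-core execution is not available on Windows, the value passed to `ncore` can be chosen per platform. This is a minimal sketch using base R and the `parallel` package; `pick_ncore` is a hypothetical helper, and leaving one core free is only a convention, not a requirement of the package.

```r
# Use a single core on Windows; elsewhere leave one core free.
# detectCores() can return NA, so fall back to 1 in that case.
pick_ncore <- function() {
  if (.Platform$OS.type == "windows") return(1L)
  n <- parallel::detectCores()
  if (is.na(n) || n < 2L) return(1L)
  as.integer(n - 1L)
}

ncore <- pick_ncore()
print(ncore)
```

The result can then be passed as `ncore = ncore` in the calls shown in this tutorial.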
Note 1: Tumor cell annotation is optional. If your data has undergone CD45 sorting, you only need to run immune cell annotation, and data integration can also be skipped.
Note 2: The input Seurat object must include a column labeled "sample" to define the cell-to-sample correspondence.
datacanc <- anno_tumor(datafilt,
scGate_DB = scGate_DB,
organism = "human", # or mouse
thres_sig = 0.005, # Adjust this threshold based on scatter_plot.png
thres_cor = 0.5, # Adjust this threshold based on scatter_plot.png
ncore = 1, # Multi-core functionality is not available on Windows
isFilter = TRUE)
If the code runs successfully, an image (inferCNV/scatter_plot.png) will be generated in the current working directory. Choose thres_sig and thres_cor based on where the cells fall in that scatter plot.
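To make the role of the two thresholds concrete, here is a minimal sketch on simulated values. It assumes, as in infercna-style analyses, that each cell has a CNA signal score and a CNA correlation score and that a cell is called malignant when both exceed their thresholds. The exact rule applied by `anno_tumor` is not stated here, and the column names `cna_signal` and `cna_cor` are hypothetical.

```r
set.seed(1)
n <- 200

# Simulated per-cell scores (illustrative only, not real inferCNV output).
cells <- data.frame(
  cna_signal = c(runif(n / 2, 0, 0.004), runif(n / 2, 0.006, 0.02)),
  cna_cor    = c(runif(n / 2, -0.2, 0.4), runif(n / 2, 0.6, 0.9))
)

thres_sig <- 0.005
thres_cor <- 0.5

# Assumed rule: malignant if above both thresholds, non-malignant if below
# both, otherwise unresolved.
above_sig <- cells$cna_signal > thres_sig
above_cor <- cells$cna_cor > thres_cor
cells$call <- ifelse(above_sig & above_cor, "malignant",
              ifelse(!above_sig & !above_cor, "non-malignant", "unresolved"))

print(table(cells$call))
```

Moving either threshold shifts cells between the groups, which is why the values should be read off the scatter plot rather than left at their defaults.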
dataintg <- integrate(dataimmu, datacanc,
min_tumor = 50,
rm_doublet = FALSE,
prop_doublet = 0.075)
# If you skipped the annotation of tumor cells, please run
# dataintg <- dataimmu
After running, the Seurat object will contain a metadata column labeled "celltype_sig2" that holds the annotation results.
source("data/Other functions.R")
# clustering
dataintg <- autocluster(dataintg, nfeatures = 2000,
ndim = 15, neigh = 20,
dist = 1, res = 3) # Set a higher resolution (res) to capture more clusters
# celltype visualization (assuming Seurat's DimPlot)
DimPlot(dataintg,
reduction = "umap",
pt.size = 0.2, label = TRUE, group.by = "celltype_sig2")
# High-resolution clustering visualization (assuming Seurat's DimPlot)
DimPlot(dataintg,
reduction = "umap",
pt.size = 0.5, label = TRUE, group.by = "seurat_clusters")
Review the cell type annotations and the Seurat clustering results side by side. Remove clusters that mix cells from divergent lineages (e.g., myeloid and lymphoid cells within a single cluster) and clusters that sit in an atypical position on the UMAP plot (e.g., T cell subsets located close to myeloid cells).
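Visual review can be complemented by a simple purity score per cluster. The following is a minimal sketch on toy metadata; the labels "T" and "Myeloid" and the 80% cutoff are illustrative assumptions, not values used by the package.

```r
# Toy metadata standing in for dataintg@meta.data
meta <- data.frame(
  seurat_clusters = c("0", "0", "0", "0", "1", "1", "1", "1"),
  celltype_sig2   = c("T", "T", "T", "T", "T", "T", "Myeloid", "Myeloid")
)

# Cross-tabulate clusters against annotations.
tab <- table(meta$seurat_clusters, meta$celltype_sig2)

# Purity = fraction of cells in a cluster carrying its most common label.
purity <- apply(tab, 1, max) / rowSums(tab)

# Hypothetical cutoff: flag clusters whose dominant label covers < 80% of cells.
flagged <- names(purity)[purity < 0.8]
print(round(purity, 2))
print(flagged)
```

A low score only marks a cluster for inspection. Closely related subtypes of one lineage also lower the score, so labels may need to be collapsed to lineage level before the flagged clusters are treated as candidates for removal.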
# exclude any problematic clusters
select = c("31","35","39","40","51")
dataintg <- dataintg[,!(dataintg$seurat_clusters %in% select)]
# re-analyze
dataintg <- autocluster(dataintg, nfeatures = 2000,
ndim = 15, neigh = 20,
dist = 1, res = 3)
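# celltype visualization after re-clustering (assuming Seurat's DimPlot)
DimPlot(dataintg,
reduction = "umap",
pt.size = 0.2, label = TRUE, group.by = "celltype_sig2")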
source("data/Other functions.R")
# dotplot visualization (with default gene set)
name = "marker_dotplot.pdf"
dotplot_marker(dataintg, group.by = "celltype_sig2", # argument name assumed to be group.by
marker = NULL,
species = "human", # or mouse
output = name,
height = 6)
# dotplot visualization (manually selected gene set)
Tcell = c("Cd3d", "Cd3e")
CD8T = c("Cd8a", "Cd8b1")
gene_list <- list(name1 = Tcell,
name2 = CD8T)
name = "marker_dotplot.pdf"
dotplot_marker(dataintg, group.by = "celltype_sig2", # argument name assumed to be group.by
marker = gene_list,
species = NULL,
output = name,
height = 6)
These visualizations compare the control and experimental groups.
source("data/Other functions.R")
# UMAP density plot
prop_density(datafilt = datafilt,
group = "group", # grouping information
coord = "umap")
# Back-to-back plot
prop_back2back(datafilt = datafilt,
group = "group", # grouping information
cluster = "seurat_clusters",
order = TRUE)
# Sample-level proportional distribution difference
input <- data.frame(table(dataimmu$sample, dataimmu$celltype_sig2))
prop_plot_hca(input, rotate = 45, decreasing = T, species = "human")
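To show what the frequency table built above contains, here is a minimal sketch on toy metadata that also derives within-sample proportions with base R. How `prop_plot_hca` processes the table internally is not documented here, so the `prop` column is only an illustration.

```r
# Toy metadata standing in for dataimmu@meta.data
meta <- data.frame(
  sample        = c("S1", "S1", "S1", "S1", "S2", "S2"),
  celltype_sig2 = c("T", "T", "T", "B", "T", "B")
)

# Same construction as above: columns Var1 (sample), Var2 (cell type), Freq.
input <- data.frame(table(meta$sample, meta$celltype_sig2))

# Within-sample proportion: each count divided by its sample total.
input$prop <- input$Freq / ave(input$Freq, input$Var1, FUN = sum)
print(input)
```

Proportions sum to 1 within each sample, which makes samples with different cell counts comparable.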
More analysis and visualization capabilities will be introduced in upcoming updates.
The iCNA package is essentially an easier-to-install version of the infercna package, created to address the difficulties often encountered when installing infercna across different environments. If you use our package, please cite both our study and the related article for the infercna package.