2022-01-17 多個樣本單細胞分析流程

讀數(shù)據(jù)

合并數(shù)據(jù)，創(chuàng)建seurat對象

合并樣本的數(shù)據(jù)可以使用cbind（）基因數(shù)相同時
或者用merge（）基因數(shù)不同時
創(chuàng)建時攀细，用min.cells,min.features進行簡單的過濾

QC

人類中MT開頭浊闪，小鼠中mt開頭

實際情況中遇到的問題

比如第二種情況如果選5%進行過濾刽沾，會過濾掉很多細胞读处，可以考慮用25%過濾
第五種情況大概率樣本不能用

不同的樣本可以用不同的cutoff尘分，可以參考下面這篇文章中各種組織的線粒體cutoff
Systematic determination of the mitochondrial proportion in human and mice tissues for single-cell RNA-sequencing data quality control - PubMed (nih.gov)

飽和曲線

過濾+標準化

dim(),顯示有幾行幾列

#計算表達量變化顯著的基因FindVariableFeatures
experiment.aggregate <- FindVariableFeatures(experiment.aggregate, 
                                             selection.method = "vst",
                                             nfeatures = 1000)

正常情況表達量越高的基因猜惋，差異越大，如果不是培愁，代表數(shù)據(jù)量太小了著摔，或者QC有問題。

均一化及PCA降維

TSNE聚類（當細胞量幾千時定续，用UMAP）

找marker

先初篩

p_value：該基因在自己的亞群其他細胞亞群中的差異
logFC：平均表達倍數(shù)
pct1：該基因在自己的亞群中百分之多少的細胞中表達
pct2：在其他亞群中百分之多少的細胞中表達
p_val_adj:一般看這個谍咆，就是FDR

細篩

3.在自己的亞群中至少50%的細胞表達，也可以PCT2<0.5
4.5.按情況選擇要不要做

注釋（代碼在cell_marker_annotation.R）

0亞群很多是T細胞的marker私股，可以初步注釋為T細胞
1亞群也像T或者NK細胞
0和1很相似摹察，可以計算這個亞群的差異基因，代碼在sub_analysis_scRNA

計算特定兩組細胞之間的差異基因倡鲸，決定要不要把兩個亞群合并

這里看的是0和3的差異

發(fā)現(xiàn)0和3差異大的都是線粒體基因供嚎，說明本身差異不大，那就可以合并
再看看0,3和1的差異

如果這些差異是有意義的峭状，那么1可以不合并

免疫細胞注釋可以用singleR

注釋小結(jié)

可以把每個亞群找到的marker用在線工具做功能富集克滴，初步看一下功能

代碼

####  Code Description              ####
#---  1. Written by WoLin @ 2019.03.15，last update 19.07.14 ---#
#---  2. Analysis for single sample    ---#
#---  3. Support 10X data & expression matrix ---#
#---  4. Need to change:sample data, sam.name, dims ---#

#### 1. 加載分析使用的工具包 ####
library(Seurat)
library(ggplot2)
library(cowplot)
library(Matrix)
library(dplyr)
library(ggsci)

#### 2. 讀入原始表達數(shù)據(jù) ####
# 以下兩種方式二選一
# 10X 數(shù)據(jù)
bm1 <- Read10X("./BoneMarrow/BM1/")#read10×函數(shù)只需要寫到文件夾宁炫，不需要寫文件名
bm2 <- Read10X("./BoneMarrow/BM2/")
# panglaoBD下載的Rdata可以直接用load讀出一個相同的矩陣

# 這里的列名就是barcode,代表一個細胞偿曙，行名是基因
colnames(bm1)[1:10]  # 看bm1中前10個細胞的名字
colnames(bm2)[1:10]
# 合并前在細胞上（列名）打上樣本標簽
colnames(bm1) <- paste(colnames(bm1),"BM1",sep = "_") # 把列名改成細胞名_樣本來源
colnames(bm2) <- paste(colnames(bm2),"BM2",sep = "_")

#將所有讀入的數(shù)據(jù)合并成一個大的矩陣，確保行數(shù)（基因數(shù)）相等羔巢，增加列
#合并時需注意行名一致
#既有10X的數(shù)據(jù)又有表達矩陣的數(shù)據(jù)望忆，全部轉(zhuǎn)換為表達矩陣再進行合并
#關(guān)于矩陣合并請見單獨的矩陣合并腳本“merge_matrix.R”（行數(shù)不一樣樣時用這個代碼）
experiment.data <- cbind(bm1,bm2) # 行數(shù)相同時可以用cbind（樣本1罩阵，樣本2）

#創(chuàng)建一個叫multi的文件夾用于存放分析結(jié)果
sam.name <- "multi"
if(!dir.exists(sam.name)){
  dir.create(sam.name)
}

#### 3. 創(chuàng)建Seurat分析對象 ####
experiment.aggregate <- CreateSeuratObject(
  experiment.data,
  project = "multi", 
  min.cells = 10,#基因至少在10個細胞里有表達，過濾基因
  min.features = 200,#細胞最少表達200個基因启摄，過濾細胞
  names.field = 2,
  names.delim = "_")
#將數(shù)據(jù)寫到文件中一邊后續(xù)分析使用
save(experiment.aggregate,file=paste0("./",sam.name,"/",sam.name,"_raw_SeuratObject.RData"))

#### 4. 數(shù)據(jù)概覽 & QC ####
#查看SeuratObject中的對象
slotNames(experiment.aggregate)
#assay
experiment.aggregate@assays
#細胞及細胞中基因與RNA數(shù)量
dim(experiment.aggregate@meta.data)#顯示有幾行幾列稿壁，行是細胞
View(experiment.aggregate@meta.data)
#第一列是樣本名，在創(chuàng)建seurat對象時定義下劃線后面的部分是樣本名
#第二列ncountRNA是UMI數(shù)歉备，第三列nfeature是基因數(shù)
table(experiment.aggregate@meta.data$orig.ident)#統(tǒng)計matadata的第一列元素的個數(shù)

#轉(zhuǎn)換成表達矩陣
experiment.aggregate.matrix <- as.matrix(experiment.aggregate@assays$RNA@counts)

##QC：統(tǒng)計線粒體基因在每個細胞中的占比
experiment.aggregate[["percent.mt"]] <- PercentageFeatureSet(experiment.aggregate, 
pattern = "^MT-")#MT開頭的是線粒體基因傅是，在小鼠中是mt開頭
#小提琴圖可視化
pdf(paste0("./",sam.name,"/QC-VlnPlot.pdf"),width = 8,height = 4.5)
VlnPlot(experiment.aggregate, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), 
        ncol = 3)
dev.off()

##QC：統(tǒng)計基因數(shù)，RNA蕾羊，線粒體基因分布
gene.freq <- do.call("cbind", tapply(experiment.aggregate@meta.data$nFeature_RNA,
experiment.aggregate@meta.data$orig.ident,quantile,probs=seq(0,1,0.05)))
rna.freq <- do.call("cbind", tapply(experiment.aggregate@meta.data$nCount_RNA,experiment.aggregate@meta.data$orig.ident,
quantile,probs=seq(0,1,0.05)))
mt.freq <- do.call("cbind", tapply(experiment.aggregate@meta.data$percent.mt,experiment.aggregate@meta.data$orig.ident,quantile,probs=seq(0,1,0.05)))
freq.combine <- as.data.frame(cbind(gene.freq,rna.freq,mt.freq))
colnames(freq.combine) <- c(paste(colnames(gene.freq),"Gene",sep = "_"),
                            paste(colnames(rna.freq),"RNA",sep = "_"),
                            paste(colnames(mt.freq),"MT",sep = "_"))
write.table(freq.combine,file = paste0(sam.name,"/QC-gene_frequency.txt"),quote = F,sep = "\t")
rm(gene.freq,rna.freq,mt.freq)
View(freq.combine)
#先看小提琴圖喧笔，如果細胞質(zhì)量還可以，保留80%的細胞龟再，如果比較差书闸，保留60-70%細胞

##QC：基因數(shù)與線粒體基因以及RNA數(shù)量的分布相關(guān)性
plot1 <- FeatureScatter(experiment.aggregate, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(experiment.aggregate, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
pdf(paste0("./",sam.name,"/QC-FeatureScatter.pdf"),width = 8,height = 4.5)
CombinePlots(plots = list(plot1, plot2),legend = "none")
dev.off()
rm(plot1,plot2)
#紅色點（BM1）已經(jīng)接近飽和曲線的拐點（測到umi越多，測到的基因越多利凑，到一定程度浆劲，即使增加umi，也不能增加很多測到的基因）
#藍色點（BM2）還沒到飽和曲線的拐點哀澈，這個樣本的細胞測到的基因量少

#### 5. 篩選細胞 ####
cat("Before filter :",nrow(experiment.aggregate@meta.data),"cells\n")
experiment.aggregate <- subset(experiment.aggregate, 
                               subset = 
                                 nFeature_RNA > 500 &  # 基因數(shù)>500
                                 nCount_RNA > 1000 &   # UMI>1000
                                 nCount_RNA < 20000 &  # UMI<20000（過濾雙細胞）
                                 percent.mt < 5)       # 線粒體基因百分比<5
cat("After filter :",nrow(experiment.aggregate@meta.data),"cells\n")
table(experiment.aggregate@meta.data$orig.ident)#看過濾完兩個樣本還有多少細胞

#### 6. 表達量標準化 ####
experiment.aggregate <- NormalizeData(experiment.aggregate, 
                                      normalization.method = "LogNormalize",
                                      scale.factor = 10000)

#計算表達量變化顯著的基因FindVariableFeatures
experiment.aggregate <- FindVariableFeatures(experiment.aggregate, 
                                             selection.method = "vst",
                                             nfeatures = 1000) 
#一般500-2500個feature（基因）牌借，細胞類型越復(fù)雜，需要的feature（基因）越多

#展示標準化之后的整體表達水平
top10 <- head(x = VariableFeatures(experiment.aggregate), 10)
plot1 <- VariableFeaturePlot(experiment.aggregate)
plot2 <- LabelPoints(plot = plot1, points = top10)
pdf(file = paste0(sam.name,"/Norm-feature_variable_plot.pdf"),width = 8,height = 5)
CombinePlots(plots = list(plot1, plot2),legend = "none")
dev.off()

#### 7. 均一化與PCA ####
#均一化（需要一點時間）
experiment.aggregate <- ScaleData(
  object = experiment.aggregate,
  do.scale = FALSE,
  do.center = FALSE,
  vars.to.regress = c("orig.ident","percent.mt"))#去批次的因素割按，這里選擇不同樣本來源和線粒體基因百分比
#任何批次效應(yīng)校正都會損失一些信息膨报，所以一開始不要進行太強的批次效應(yīng)校正

#PCA降維計算（兩個作用：1看批次效應(yīng)校正得怎么樣 2聚類）
experiment.aggregate <- RunPCA(object = experiment.aggregate, 
                               features = VariableFeatures(experiment.aggregate),
                               verbose = F,npcs = 50)

#PCA結(jié)果展示-1
pdf(paste0("./",sam.name,"/PCA-VizDimLoadings.pdf"),width = 7,height = 5)
VizDimLoadings(experiment.aggregate, dims = 1:2, reduction = "pca")
dev.off()

#PCA結(jié)果展示-2
pdf(paste0("./",sam.name,"/PCA-DimPlot.pdf"),width = 5,height = 4)
DimPlot(experiment.aggregate, reduction = "pca")
dev.off()

#PCA結(jié)果展示-3
pdf(paste0("./",sam.name,"/PCA-DimHeatmap.pdf"),width = 5,height = 4)
DimHeatmap(experiment.aggregate, dims = 1:6, cells = 500, balanced = TRUE)
dev.off()

#### 8. 確定細胞類群分析PC ####
#耗時較久，一般不用
experiment.aggregate <- JackStraw(experiment.aggregate, num.replicate = 100,dims = 40)
experiment.aggregate <- ScoreJackStraw(experiment.aggregate, dims = 1:40)
pdf(paste0("./",sam.name,"/PCA-JackStrawPlot_40.pdf"),width = 6,height = 5)
JackStrawPlot(object = experiment.aggregate, dims = 1:40)
dev.off()

#碎石圖
pdf(paste0("./",sam.name,"/PCA-ElbowPlot.pdf"),width = 6,height = 5)
ElbowPlot(experiment.aggregate,ndims = 40)
dev.off()
#一般拐點不超過20

#確定用于細胞分群的PC
dim.use <- 1:20

#### 9. 細胞分群TSNE算法 ####
#TSNE算法(細胞量比較少的時候（幾千）哲虾，用UMAP)
experiment.aggregate <- FindNeighbors(experiment.aggregate, dims = dim.use)#計算細胞相似性
experiment.aggregate <- FindClusters(experiment.aggregate, resolution = 0.5)#resolution越高丙躏，細胞分出來的類越多
experiment.aggregate <- RunTSNE(experiment.aggregate, dims = dim.use, 
                                do.fast = TRUE)
pdf(paste0("./",sam.name,"/CellCluster-TSNEPlot_res0.5_",max(dim.use),"PC.pdf"),width = 5,height = 4)
DimPlot(object = experiment.aggregate, pt.size=0.5,label = T)
dev.off()

#按照數(shù)據(jù)來源分組展示細胞異同--畫在一張圖中
pdf(paste0("./",sam.name,"/CellCluster-TSNEPlot_SamGroup_",max(dim.use),"PC.pdf"),width = 5,height = 4)
DimPlot(object = experiment.aggregate, 
        group.by="orig.ident", 
        pt.size=0.5,reduction = "tsne")
dev.off()

#按照數(shù)據(jù)來源分組展示細胞異同--畫在多張圖中
pdf(paste0("./",sam.name,"/CellCluster-TSNEPlot_SamGroup_slipt_",max(dim.use),"PC.pdf"),width = 8,height = 4)
DimPlot(object = experiment.aggregate, 
        split.by ="orig.ident", 
        pt.size=0.5,reduction = "tsne")
dev.off()

table(experiment.aggregate@meta.data$orig.ident)
View(experiment.aggregate@meta.data)#剛才的分群結(jié)果已經(jīng)加到每個細胞的最后一列
table(experiment.aggregate@meta.data$orig.ident,experiment.aggregate@meta.data$seurat_clusters)#每個樣本中每群細胞數(shù)量
#個性化畫圖代碼見Sub_analysis_scRNA

#### 10. 計算marker基因 ####
#這一步計算的時候可以把min.pct以及l(fā)ogfc.threshold調(diào)的比較低，然后再基于結(jié)果手動篩選
all.markers <- FindAllMarkers(experiment.aggregate, only.pos = TRUE, 
                              min.pct = 0.3, logfc.threshold = 0.25)
write.table(all.markers,
            file=paste0("./",sam.name,"/",sam.name,"_total_marker_genes_tsne_",max(dim.use),"PC.txt"),
            sep="\t",quote = F,row.names = F)

# 遍歷每一個cluster然后展示其中前4個基因
marker.sig <- all.markers %>% 
  mutate(Ratio = round(pct.1/pct.2,3)) %>%
  filter(p_val_adj <= 0.05)  # 本條件為過濾統(tǒng)計學(xué)不顯著的基因

for(cluster_id in unique(marker.sig$cluster)){
  # cluster.markers <- FindMarkers(experiment.aggregate, ident.1 = cluster, min.pct = 0.3)
  # cluster.markers <- as.data.frame(cluster.markers) %>% 
  #   mutate(Gene = rownames(cluster.markers))
  cl4.genes <- marker.sig %>% 
    filter(cluster == cluster_id) %>%
    arrange(desc(avg_log2FC))
  cl4.genes <- cl4.genes[1:min(nrow(cl4.genes),4),"gene"]
  
  #VlnPlot
  pvn <- VlnPlot(experiment.aggregate, features = cl4.genes,ncol = 2)
  pdf(paste0("./",sam.name,"/MarkerGene-VlnPlot_cluster",cluster_id,"_tsne_",max(dim.use),"PC.pdf"),width = 7,height = 6)
  print(pvn)
  dev.off()
  
  #Feather plot 
  pvn <- FeaturePlot(experiment.aggregate,features=cl4.genes,ncol = 2)
  pdf(paste0("./",sam.name,"/MarkerGene-FeaturePlot_cluster",cluster_id,"_tsne_",max(dim.use),"PC.pdf"),width = 7,height = 6)
  print(pvn)
  dev.off()
  
  #RidgePlot
  pvn<-RidgePlot(experiment.aggregate, features = cl4.genes, ncol = 2)
  pdf(paste0("./",sam.name,"/MarkerGene-RidgePlot_cluster",cluster_id,"_tsne_",max(dim.use),"PC.pdf"),width = 7,height = 6)
  print(pvn)
  dev.off()
}
rm(cl4.genes,cluster_id,pvn)

#熱圖展示Top marker基因
#篩選top5的marker基因束凑，可以通過參數(shù)改為其他數(shù)值
top5 <- marker.sig %>% group_by(cluster) %>% 
  top_n(n = 5, wt = avg_log2FC)

#top-marker基因熱圖
pdf(paste0("./",sam.name,"/MarkerGene-Heatmap_all_cluster_tsne_",max(dim.use),"PC.pdf"))
DoHeatmap(experiment.aggregate, features = top5$gene,size = 2) +
  theme(legend.position = "none", 
        axis.text.y = element_text(size = 6))
dev.off()

#top-marker基因dotplot
pdf(paste0("./",sam.name,"/MarkerGene-DotPlot_all_cluster_tsne_",max(dim.use),"PC.pdf"),width = 50,height = 5)
DotPlot(experiment.aggregate, features = unique(top5$gene))+
  RotatedAxis()
dev.off()

最后編輯于：2022.01.24 13:03:25

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者

人面猴
序言：七十年代末晒旅，一起剝皮案震驚了整個濱河市，隨后出現(xiàn)的幾起案子汪诉，更是在濱河造成了極大的恐慌废恋，老刑警劉巖，帶你破解...
沈念sama閱讀 206,126評論 6贊 481
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件扒寄，死亡現(xiàn)場離奇詭異鱼鼓，居然都是意外死亡，警方通過查閱死者的電腦和手機该编，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 88,254評論 2贊 382
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進店門迄本，熙熙樓的掌柜王于貴愁眉苦臉地迎上來，“玉大人课竣，你說我怎么就攤上這事嘉赎≈孟保” “怎么了？”我有些...
開封第一講書人閱讀 152,445評論 0贊 341
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵公条，是天一觀的道長拇囊。經(jīng)常有香客問我，道長靶橱，這世上最難降的妖魔是什么寥袭？我笑而不...
開封第一講書人閱讀 55,185評論 1贊 278
?港島之戀（遺憾婚禮）
正文為了忘掉前任，我火速辦了婚禮关霸，結(jié)果婚禮上传黄，老公的妹妹穿的比我還像新娘。我一直安慰自己队寇，他們只是感情好尝江，可當我...
茶點故事閱讀 64,178評論 5贊 371
惡毒庶女頂嫁案：這布局不是一般人想出來的
文/花漫我一把揭開白布。她就那樣靜靜地躺著英上，像睡著了一般。火紅的嫁衣襯著肌膚如雪啤覆。梳的紋絲不亂的頭發(fā)上苍日，一...
開封第一講書人閱讀 48,970評論 1贊 284
城市分裂傳說
那天，我揣著相機與錄音窗声，去河邊找鬼相恃。笑死，一個胖子當著我的面吹牛笨觅，可吹牛的內(nèi)容都是我干的拦耐。我是一名探鬼主播，決...
沈念sama閱讀 38,276評論 3贊 399
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開眼见剩，長吁一口氣：“原來是場噩夢啊……” “哼杀糯！你這毒婦竟也來了？” 一聲冷哼從身側(cè)響起苍苞，我...
開封第一講書人閱讀 36,927評論 0贊 259
萬榮殺人案實錄
序言：老撾萬榮一對情侶失蹤固翰，失蹤者是張志新（化名）和其女友劉穎，沒想到半個月后羹呵，有當?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體骂际，經(jīng)...
沈念sama閱讀 43,400評論 1贊 300
?護林員之死
正文獨居荒郊野嶺守林人離奇死亡，尸身上長有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點故事閱讀 35,883評論 2贊 323
?白月光啟示錄
正文我和宋清朗相戀三年冈欢，在試婚紗的時候發(fā)現(xiàn)自己被綠了歉铝。大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
茶點故事閱讀 37,997評論 1贊 333
活死人
序言：一個原本活蹦亂跳的男人離奇死亡凑耻，死狀恐怖太示，靈堂內(nèi)的尸體忽然破棺而出柠贤，到底是詐尸還是另有隱情，我是刑警寧澤先匪，帶...
沈念sama閱讀 33,646評論 4贊 322
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布种吸，位于F島的核電站，受9級特大地震影響呀非，放射性物質(zhì)發(fā)生泄漏坚俗。R本人自食惡果不足惜，卻給世界環(huán)境...
茶點故事閱讀 39,213評論 3贊 307
男人毒藥：我在死后第九天來索命
文/蒙蒙一岸裙、第九天我趴在偏房一處隱蔽的房頂上張望猖败。院中可真熱鬧，春花似錦降允、人聲如沸恩闻。這莊子的主人今日做“春日...
開封第一講書人閱讀 30,204評論 0贊 19
一樁弒父案剧董，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽幢尚。三九已至，卻和暖如春翅楼，著一層夾襖步出監(jiān)牢的瞬間尉剩，已是汗流浹背。一陣腳步聲響...
開封第一講書人閱讀 31,423評論 1贊 260
情欲美人皮
我被黑心中介騙來泰國打工毅臊，沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留理茎，地道東北人。一個月前我還...
沈念sama閱讀 45,423評論 2贊 352
代替公主和親
正文我出身青樓管嬉，卻偏偏與公主長得像皂林，于是被迫代替她去往敵國和親。傳聞我的和親對象是個殘疾皇子蚯撩，可洞房花燭夜當晚...
茶點故事閱讀 42,722評論 2贊 345