同系列文章:
- sc-RAN-seq 數(shù)據(jù)分析||Seurat新版教程:Guided Clustering Tutorial
- sc-RAN-seq 數(shù)據(jù)分析||Seurat新版教程: Integrating datasets to learn cell-type specific responses
- 單細胞轉(zhuǎn)錄組數(shù)據(jù)分析||Seurat新版教程: Integration and Label Transfer
- sc-RAN-seq 數(shù)據(jù)分析||Seurat新版教程: Using sctransform in Seurat
- 單細胞轉(zhuǎn)錄組數(shù)據(jù)分析||Seurat新版教程:Differential expression testing
- 單細胞轉(zhuǎn)錄組 數(shù)據(jù)分析||Seurat新版教程:New data visualization methods in v3.0
- 單細胞轉(zhuǎn)錄組數(shù)據(jù)分析||Seurat并行策略
- Seurat教程上新||Mixscape : 用多模態(tài)單細胞數(shù)據(jù)篩選免疫檢查點
- Seurat Weekly NO.0 || 開刊詞
- Seurat Weekly NO.1 || 到底分多少個群是合適的清钥?!
- Seurat Weekly NO.2 || 我該如何取子集
- 你到底想要什么樣的umap/tsne圖?
- scRNA-seq擬時分析 || Monocle2 踩坑教程
- scRNA-seq數(shù)據(jù)分析 || Monocle3
Seurat,一個單細胞數(shù)據(jù)分析工具箱
十大函數(shù)
支撐這個魚骨架的是是下面的十個函數(shù),細心的讀者也許已經(jīng)發(fā)現(xiàn)帮哈,大師已經(jīng)插上了小紅旗。在Seurat v2到v3的過程中渗饮,其實是有函數(shù)名變化的但汞,當然最主要的我認為是參數(shù)中gene到features的變化,這也看出Seurat強烈的求生欲——既然單細胞不止做轉(zhuǎn)錄組那我也就不能單純地叫做gene了互站,所有表征單細胞的features均可以用我Seurat來分析了私蕾。另外,相對于features而言gene只不過是一個實例而已胡桃。所以在升級Seurat的時候一個關(guān)鍵的地方就是函數(shù)名以及參數(shù)的更改踩叭。至于新的功能和算法其實并不多,如果用不到Seurat v3的新功能(如UMAP降維)其實不升級到v3做單細胞轉(zhuǎn)錄組是完全沒問題的翠胰。
據(jù)不完全統(tǒng)計Seurat包大約有130多個函數(shù)容贝,我們有必要問號一遍嗎?不必要之景,當你有需求的時候去查就好了斤富,但是人類很多時候并不知道自己需要的是什么,所以我建議還是把他的函數(shù)說明手冊拿出來瀏覽一遍锻狗,至少把目錄看一遍满力,大概知道他能做什么。
> library(Seurat)
> packageVersion("Seurat")
[1] ‘3.0.1’
pbmc.counts <- Read10X(data.dir = "D:\\Users\\Administrator\\Desktop\\RStudio\\single_cell\\filtered_gene_bc_matrices\\hg19")
pbmc <- CreateSeuratObject(counts = pbmc.counts)
pbmc <- NormalizeData(object = pbmc)
pbmc <- FindVariableFeatures(object = pbmc)
pbmc <- ScaleData(object = pbmc)
pbmc <- RunPCA(object = pbmc)
pbmc <- FindNeighbors(object = pbmc)
pbmc <- FindClusters(object = pbmc)
pbmc <- RunTSNE(object = pbmc)
數(shù)據(jù)讀入
library(Seurat)
packageVersion("Seurat")
[1] ‘3.0.1’
# https://satijalab.org/seurat/mca.html
library(dplyr)
library(ggsci)#我想改變一下配色
學習一個R包當然是把他的標準流程走一遍了轻纪,下載它的官網(wǎng)數(shù)據(jù)放在我的電腦上油额。
# Load the PBMC dataset
list.files("D:\\Users\\Administrator\\Desktop\\RStudio\\single_cell\\filtered_gene_bc_matrices\\hg19")
[1] "barcodes.tsv" "genes.tsv" "matrix.mtx"
?Read10X
## Not run:
# For output from CellRanger < 3.0
data_dir <- 'path/to/data/directory'
list.files(data_dir) # Should show barcodes.tsv, genes.tsv, and matrix.mtx
expression_matrix <- Read10X(data.dir = data_dir)
seurat_object = CreateSeuratObject(counts = expression_matrix)
# For output from CellRanger >= 3.0 with multiple data types
data_dir <- 'path/to/data/directory'
list.files(data_dir) # Should show barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz
data <- Read10X(data.dir = data_dir)
seurat_object = CreateSeuratObject(counts = data$`Gene Expression`)
seurat_object[['Protein']] = CreateAssayObject(counts = data$`Antibody Capture`)
也就是說Seurat可以直接用Read10X函數(shù)讀取cellrangerV2 和V3的數(shù)據(jù)了,省去了不少麻煩是不是刻帚?潦嘶!
創(chuàng)建Seurat對象。
pbmc.data <- Read10X(data.dir = "D:\\Users\\Administrator\\Desktop\\RStudio\\single_cell\\filtered_gene_bc_matrices\\hg19")
# Initialize the Seurat object with the raw (non-normalized data).
?CreateSeuratObject
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
> pbmc
An object of class Seurat
13714 features across 2700 samples within 1 assay
Active assay: RNA (13714 features)
如果不是10X的技術(shù),如是BD流程崇众,得到表達矩陣后,也是可以用Seurat分析的掂僵。需要注意的是矩陣要是要求矩陣行為基因,列為細胞編號顷歌。
library(Matrix)
matrix = read.delim(matrix.path, header = T, stringsAsFactors = FALSE)
pbmc <- CreateSeuratObject(counts = matrix, project = "project name", min.cells = 3, min.features = 200) # 參數(shù)自己寫的
> pbmc # 檢查一下是否讀對了看峻。
創(chuàng)建完其實是有兩個S4類數(shù)據(jù)集的,一個是Seurat對象一個是bgCMatrix衙吩,要知道每個對象中存儲的是什么數(shù)據(jù)互妓,可以通過或者 $來提取查看:
線粒體細胞和紅細胞比例。
# The [[ operator can add columns to object metadata. This is a great place to stash QC stats
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
?PercentageFeatureSet
HB.genes_total <- c("HBA1","HBA2","HBB","HBD","HBE1","HBG1","HBG2","HBM","HBQ1","HBZ") # 人類血液常見紅細胞基因
HB_m <- match(HB.genes_total,rownames(pbmc@assays$RNA))
HB.genes <- rownames(pbmc@assays$RNA)[HB_m]
HB.genes <- HB.genes[!is.na(HB.genes)]
pbmc[["percent.HB"]]<-PercentageFeatureSet(pbmc,features=HB.genes)
# write nGene_nUMI_mito_HB
head(pbmc@meta.data)[,c(2,3,4,5)]
nCount_RNA nFeature_RNA percent.mt percent.HB
AAACATACAACCAC 2419 779 3.0177759 0
AAACATTGAGCTAC 4903 1352 3.7935958 0
AAACATTGATCAGC 3147 1129 0.8897363 0
AAACCGTGCTTCCG 2639 960 1.7430845 0
AAACCGTGTATGCG 980 521 1.2244898 0
AAACGCACTGGTAC 2163 781 1.6643551 0
Feature、count冯勉、線粒體基因澈蚌、紅細胞基因占比可視化。
# Visualize QC metrics as a violin plot
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt","percent.HB"), ncol = 4)#+scale_color_npg() 不起作用
#p1_aaas = p1 + scale_color_aaas()
#p2_aaas = p2 + scale_fill_aaas()
注意到?jīng)]有灼狰,之前v2的nUMI和nGene分別改為了nCount_RNA宛瞄、nFeature_RNA。
這幾個指標之間的相關(guān)性交胚。
# FeatureScatter is typically used to visualize feature-feature relationships, but can be used
# for anything calculated by the object, i.e. columns in object metadata, PC scores etc.
?FeatureScatter
plot1 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.mt")+scale_color_npg()
plot2 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")+scale_color_npg()
plot3 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.HB")+scale_color_npg()
CombinePlots(plots = list(plot1, plot2,plot3),legend="none")
?CombinePlots
CombinePlots函數(shù)是不是很好用呢份汗,親測ggplot2繪制的圖即使不是Seurat繪圖函數(shù)繪制出來的圖也是可以用這個函數(shù)來組合的。在簡單的探索之后蝴簇,我們來做一下實質(zhì)性的QC和均一化工作杯活。
均一化與標準化
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5) #
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
Performing log-normalization
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
注意這里直接用了subset函數(shù),而不是之前的SubsetData熬词。
特征選擇:高變基因
#Identification of highly variable features (feature selection)
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
Calculating gene variances
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
# Identify the 10 most highly variable genes
top10 <- head(VariableFeatures(pbmc), 10)
# plot variable features with and without labels
plot1 <- VariableFeaturePlot(pbmc)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
CombinePlots(plots = list(plot1, plot2),legend="bottom")
標準化
# Scaling the data
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
Centering and scaling data matrix
|====================================================================================================================================| 100%
>
特征提扰跃:PCA降維
#Perform linear dimensional reductionPerform linear dimensional reduction
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
PC_ 1
Positive: CST3, TYROBP, LST1, AIF1, FTL, FTH1, LYZ, FCN1, S100A9, TYMP
FCER1G, CFD, LGALS1, S100A8, CTSS, LGALS2, SERPINA1, IFITM3, SPI1, CFP
PSAP, IFI30, SAT1, COTL1, S100A11, NPC2, GRN, LGALS3, GSTP1, PYCARD
Negative: MALAT1, LTB, IL32, IL7R, CD2, B2M, ACAP1, CD27, STK17A, CTSW
CD247, GIMAP5, AQP3, CCL5, SELL, TRAF3IP3, GZMA, MAL, CST7, ITM2A
MYC, GIMAP7, HOPX, BEX2, LDLRAP1, GZMK, ETS1, ZAP70, TNFAIP8, RIC3
PC_ 2
Positive: CD79A, MS4A1, TCL1A, HLA-DQA1, HLA-DQB1, HLA-DRA, LINC00926, CD79B, HLA-DRB1, CD74
HLA-DMA, HLA-DPB1, HLA-DQA2, CD37, HLA-DRB5, HLA-DMB, HLA-DPA1, FCRLA, HVCN1, LTB
BLNK, P2RX5, IGLL5, IRF8, SWAP70, ARHGAP24, FCGR2B, SMIM14, PPP1R14A, C16orf74
Negative: NKG7, PRF1, CST7, GZMB, GZMA, FGFBP2, CTSW, GNLY, B2M, SPON2
CCL4, GZMH, FCGR3A, CCL5, CD247, XCL2, CLIC3, AKR1C3, SRGN, HOPX
TTC38, APMAP, CTSC, S100A4, IGFBP7, ANXA1, ID2, IL32, XCL1, RHOC
PC_ 3
Positive: HLA-DQA1, CD79A, CD79B, HLA-DQB1, HLA-DPB1, HLA-DPA1, CD74, MS4A1, HLA-DRB1, HLA-DRA
HLA-DRB5, HLA-DQA2, TCL1A, LINC00926, HLA-DMB, HLA-DMA, CD37, HVCN1, FCRLA, IRF8
PLAC8, BLNK, MALAT1, SMIM14, PLD4, LAT2, IGLL5, P2RX5, SWAP70, FCGR2B
Negative: PPBP, PF4, SDPR, SPARC, GNG11, NRGN, GP9, RGS18, TUBB1, CLU
HIST1H2AC, AP001189.4, ITGA2B, CD9, TMEM40, PTCRA, CA2, ACRBP, MMD, TREML1
NGFRAP1, F13A1, SEPT5, RUFY1, TSC22D1, MPP1, CMTM5, RP11-367G6.3, MYL9, GP1BA
PC_ 4
Positive: HLA-DQA1, CD79B, CD79A, MS4A1, HLA-DQB1, CD74, HLA-DPB1, HIST1H2AC, PF4, TCL1A
SDPR, HLA-DPA1, HLA-DRB1, HLA-DQA2, HLA-DRA, PPBP, LINC00926, GNG11, HLA-DRB5, SPARC
GP9, AP001189.4, CA2, PTCRA, CD9, NRGN, RGS18, GZMB, CLU, TUBB1
Negative: VIM, IL7R, S100A6, IL32, S100A8, S100A4, GIMAP7, S100A10, S100A9, MAL
AQP3, CD2, CD14, FYB, LGALS2, GIMAP4, ANXA1, CD27, FCN1, RBP7
LYZ, S100A11, GIMAP5, MS4A6A, S100A12, FOLR3, TRABD2A, AIF1, IL8, IFI6
PC_ 5
Positive: GZMB, NKG7, S100A8, FGFBP2, GNLY, CCL4, CST7, PRF1, GZMA, SPON2
GZMH, S100A9, LGALS2, CCL3, CTSW, XCL2, CD14, CLIC3, S100A12, CCL5
RBP7, MS4A6A, GSTP1, FOLR3, IGFBP7, TYROBP, TTC38, AKR1C3, XCL1, HOPX
Negative: LTB, IL7R, CKB, VIM, MS4A7, AQP3, CYTIP, RP11-290F20.3, SIGLEC10, HMOX1
PTGES3, LILRB2, MAL, CD27, HN1, CD2, GDI2, ANXA5, CORO1B, TUBA1B
FAM110A, ATP1A1, TRADD, PPA1, CCDC109B, ABRACL, CTD-2006K23.1, WARS, VMO1, FYB
讓我們看一下PCA都有哪些結(jié)果。
- 每個細胞在PC軸上的坐標
head(pbmc@reductions$pca@cell.embeddings)
PC_1 PC_2 PC_3 PC_4 PC_5 PC_6 PC_7 PC_8 PC_9 PC_10 PC_11
AAACATACAACCAC -4.7296855 -0.5184265 -0.7623220 -2.3156790 -0.07160006 0.1317334 1.6301191 -1.1830836 1.2492647 1.9031544 -0.6730318
AAACATTGAGCTAC -0.5174029 4.5918957 5.9091921 6.9118856 -1.96243034 2.7866168 1.5096526 -0.3590951 -0.8247207 0.5923264 0.5552679
AAACATTGATCAGC -3.1891063 -3.4695154 -0.8313710 -2.0019985 -5.10442765 2.1216390 0.3480312 3.7184168 -1.0047037 1.1045846 2.1435963
AAACCGTGCTTCCG 12.7933021 0.1007166 0.6310221 -0.3687338 0.21838204 -2.8403815 0.8120182 0.1296084 -0.6425828 -3.4380616 1.9194359
AAACCGTGTATGCG -3.1288078 -6.3481412 1.2507776 3.0191026 7.84739502 -1.3017150 -2.3968660 -0.4419815 -2.9410710 -1.1924961 3.5800224
AAACGCACTGGTAC -3.1088963 0.9262125 -0.6482331 -2.3244378 -2.00526763 1.4851268 0.2660897 -0.4260322 -0.2045553 -1.5314854 1.3273948
PC_12 PC_13 PC_14 PC_15 PC_16 PC_17 PC_18 PC_19 PC_20 PC_21 PC_22
AAACATACAACCAC 3.6826462 0.8619755 -0.9450621 0.44368696 -0.09859108 -1.278487 1.3471705 0.46616434 -3.32957838 0.6047326 -1.28799604
AAACATTGAGCTAC 0.7339369 -2.2206068 2.8265252 0.05269715 -3.02766520 1.475587 2.4833998 0.94279874 -1.31018534 2.8866347 3.14488549
AAACATTGATCAGC 2.4368587 0.4965275 0.2784826 1.02114017 -0.82800163 1.149522 -0.6589667 1.97244531 0.46454913 -1.1989118 -4.00419302
AAACCGTGCTTCCG 0.5003049 0.6798730 -2.1597502 0.12053353 -1.27353626 -0.175932 2.6215382 -0.23445861 0.01131591 1.0903436 0.03797187
AAACCGTGTATGCG -0.6615918 -0.8076382 -0.9672315 0.74307188 2.00702431 2.270533 -2.7461609 0.06917022 -0.06584539 -0.7148572 2.84307560
AAACGCACTGGTAC -1.4131061 -0.7284717 1.0847490 0.69347069 1.50328633 2.899008 -1.7733431 -2.17298340 -0.93092390 1.4390144 -1.37939596
PC_23 PC_24 PC_25 PC_26 PC_27 PC_28 PC_29 PC_30 PC_31 PC_32 PC_33
AAACATACAACCAC -0.7179464 -1.4319395 0.83214783 -0.1679234 -0.7161114 1.84440151 -3.05781318 -1.6074727 -2.1517497 0.8687982 -0.41061522
AAACATTGAGCTAC -1.8457533 -1.3159421 -0.08628477 1.0921087 1.8536483 -0.85897429 -1.54691012 -2.0578432 -0.3853023 1.2088837 0.38597220
AAACATTGATCAGC 0.8930977 0.1537439 2.16359773 0.3598690 -2.7626285 -2.11701966 -2.29029854 -0.6499149 -1.8554231 -2.4971457 0.33549022
AAACCGTGCTTCCG 0.8687202 -2.4761498 0.67236429 -0.5985153 -0.1156671 -1.43199736 -1.55234852 -1.8298758 1.2738518 -0.6358930 -0.53630017
AAACCGTGTATGCG -0.3910890 -0.4052759 -1.84733460 -0.5151174 0.5780494 -0.99341983 0.06359586 -2.1070341 -3.1590629 -0.5086174 0.02705711
AAACGCACTGGTAC -1.5354609 -0.1891822 -1.39375039 -0.2486056 0.6756537 0.06547818 -1.61450533 1.2957103 -2.4891537 -0.3063821 -1.12912898
PC_34 PC_35 PC_36 PC_37 PC_38 PC_39 PC_40 PC_41 PC_42 PC_43 PC_44
AAACATACAACCAC -1.2894803 -1.3787980 0.6540711 0.5840483 -2.3962943 -0.08226692 -0.7486769 -2.1045346 0.21276227 -1.5445694 -0.09575044
AAACATTGAGCTAC -1.1530093 0.3536307 0.4779224 -0.8652969 0.7381044 -0.38738971 0.7657008 -1.2475574 -0.03955515 1.9097925 -2.12198023
AAACATTGATCAGC 1.4541669 1.3478909 -0.2352528 -3.1292462 0.2888443 0.10283533 -5.2083218 -0.8155861 -0.54565799 0.4010832 -1.21380173
AAACCGTGCTTCCG 0.8386651 1.2911685 1.2571691 -0.5025198 1.3793049 2.06967998 -2.2411401 -1.9964884 -1.13684041 1.9122370 0.66098397
AAACCGTGTATGCG -1.6797643 -1.1789580 -1.8789147 1.3673211 1.0768095 1.59383195 -1.5163564 0.9922202 1.59552318 -0.7380092 -0.19908409
AAACGCACTGGTAC -0.1862429 -3.8669912 0.4143763 0.9287853 1.3974358 2.55985354 -3.5595247 1.6952044 1.62933600 1.5996360 -0.16145240
PC_45 PC_46 PC_47 PC_48 PC_49 PC_50
AAACATACAACCAC -1.5215216 1.2413559 -1.6210075 -0.4655044 1.2091308 -1.8230275
AAACATTGAGCTAC -0.3750793 0.3240155 -0.3722397 1.3962012 -0.2224967 -0.9573797
AAACATTGATCAGC 0.7932551 2.8606879 -1.1490237 2.1686602 -1.0267464 -1.7466499
AAACCGTGCTTCCG -1.4980962 -2.3318838 -0.1933483 -0.7501199 0.3104423 -1.3381283
AAACCGTGTATGCG 0.6442588 -1.1071806 0.7092294 2.4249639 1.1466671 -1.3304418
AAACGCACTGGTAC 1.6184979 2.1078466 -2.2183316 -2.3865239 3.8680262 2.6023484
- 每個基因?qū)γ總€PC軸的貢獻度(loading值)
head(pbmc@reductions$pca@feature.loadings)
PC_1 PC_2 PC_3 PC_4 PC_5 PC_6 PC_7 PC_8 PC_9 PC_10
PPBP 0.010990202 0.01148426 -0.15176092 0.10403737 0.003299077 0.005342462 0.021318113 -0.008489070 -0.044269116 0.001400899
LYZ 0.116231706 0.01472515 -0.01280613 -0.04414540 0.049906881 0.065502380 -0.013871113 0.006693022 0.002358130 0.016020633
S100A9 0.115414362 0.01895146 -0.02368853 -0.05787777 0.085382309 0.045792197 -0.003264718 0.063778055 -0.017075914 0.029052363
IGLL5 -0.007987473 0.05454239 0.04901533 0.06694722 0.004603231 0.006223648 0.015035064 0.045562328 -0.014507167 0.023561387
GNLY -0.015238762 -0.13375626 0.04101340 0.06912322 0.104558611 -0.001310612 -0.082875350 0.015749055 0.035273464 0.041223421
FTL 0.118292572 0.01871142 -0.00984755 -0.01555269 0.038743505 -0.046662904 0.009732039 0.024969600 -0.003831722 0.019832614
PC_11 PC_12 PC_13 PC_14 PC_15 PC_16 PC_17 PC_18 PC_19 PC_20
PPBP 0.044863274 0.03016378 0.047134363 0.029237266 -0.0007364288 -0.0203226407 -0.039660967 -0.015647566 0.005726442 0.0094676092
LYZ -0.006755543 -0.01058222 0.010219384 -0.003357793 -0.0047166530 0.0021186397 -0.005522048 -0.014290198 -0.001907910 -0.0102060486
S100A9 -0.011657440 -0.01258808 -0.013741596 0.004428299 -0.0055191933 0.0008913828 -0.002733916 -0.002231082 0.010547068 -0.0008216471
IGLL5 0.015524704 0.01130811 -0.007237628 -0.012830423 0.0041744575 0.0335608964 -0.021776281 -0.010757306 0.022308563 0.0032278043
GNLY 0.048009355 0.06699642 0.010345031 -0.011239154 -0.0098440849 0.0203211722 0.020958052 0.007021271 0.002015113 -0.0159543322
FTL -0.001943726 0.01027191 -0.008383042 -0.003565683 0.0114076594 -0.0001738238 0.002335748 0.005825073 -0.001105492 -0.0088204939
PC_21 PC_22 PC_23 PC_24 PC_25 PC_26 PC_27 PC_28 PC_29 PC_30
PPBP 0.0029964810 -0.018582366 -0.0151428472 -0.011897098 0.0151225125 -0.001076671 0.0053611374 -0.000841606 0.0006853714 0.005872398
LYZ 0.0001300122 -0.004764486 -0.0002154396 0.002548603 -0.0004441571 0.003031771 0.0113930218 -0.001396761 -0.0103970973 -0.008782445
S100A9 0.0009857512 -0.001117424 -0.0008416341 -0.001567696 -0.0048855904 -0.001991376 -0.0054592020 -0.002967175 0.0059151260 0.011957343
IGLL5 -0.0351949335 -0.009983501 0.0241007907 -0.034169906 -0.0407579048 -0.019900995 -0.0208135294 -0.020108972 -0.0162282086 -0.002920229
GNLY -0.0054222188 0.009659472 0.0022483863 0.000215567 0.0048624598 0.014696711 0.0005396567 0.015794414 0.0092743862 -0.012203335
FTL 0.0047372372 -0.005878347 -0.0023076679 -0.015293995 0.0054536363 -0.001035276 0.0149072047 0.005597638 0.0063626914 0.009166101
PC_31 PC_32 PC_33 PC_34 PC_35 PC_36 PC_37 PC_38 PC_39 PC_40
PPBP 0.013086886 0.0008607900 0.012128703 -0.0009461223 -1.922878e-02 0.004541394 0.011542981 0.011429708 -1.217387e-05 -0.007032663
LYZ 0.001540172 -0.0003404526 0.001920834 -0.0010591244 2.964347e-05 0.005862608 0.008998625 -0.009836389 -5.602974e-03 -0.009227851
S100A9 0.009530345 -0.0151782583 0.003747623 -0.0064760799 2.406278e-03 -0.002071706 0.006369340 -0.016186976 -1.076186e-02 -0.002632939
IGLL5 0.007157640 -0.0004105958 -0.014691475 0.0436436450 3.019391e-03 -0.009648881 -0.014490356 0.012896714 3.714680e-03 -0.038642695
GNLY -0.018267660 -0.0065138955 0.007815716 -0.0048936380 9.212739e-04 -0.018786034 0.002183408 -0.012800079 -1.574655e-02 0.001831823
FTL 0.008757715 -0.0110614625 -0.011354104 0.0097377813 5.344130e-03 0.010270291 -0.002307617 -0.007765635 1.552409e-03 0.003765866
PC_41 PC_42 PC_43 PC_44 PC_45 PC_46 PC_47 PC_48 PC_49 PC_50
PPBP -0.0008612626 -0.011266674 0.015925649 0.012209582 0.0132620332 -0.0168464149 -0.002926878 0.0166900128 -3.071535e-05 -0.006571050
LYZ -0.0139353433 0.015835626 0.008829158 0.009923354 -0.0005053747 0.0005763965 -0.003656557 -0.0060199063 2.377456e-03 -0.002022636
S100A9 0.0047445877 0.011376942 0.001033356 -0.015291229 0.0001170734 -0.0011615944 -0.001255734 -0.0018189781 1.341860e-02 -0.004627139
IGLL5 -0.0231096888 -0.001424071 0.009561528 -0.026310314 0.0240009713 -0.0026671000 -0.001782701 -0.0188853751 1.604000e-03 -0.026632961
GNLY 0.0012187715 -0.016853940 0.002993127 0.023204504 -0.0110674164 -0.0240057148 -0.021747836 -0.0006874352 9.393104e-03 -0.002300090
FTL 0.0025105942 0.017913309 -0.002983271 -0.004176980 -0.0131060686 0.0001666315 0.006107927 -0.0076046483 7.912781e-03 0.008658740
···
head(pbmc@reductionspca@stdev)
[1] 7.098420 4.495493 3.872592 3.748859 3.171755 2.545292
head(pbmc@reductions$pca@key)
[1] "PC_"
···
對Loading值一番研究互拾。
> # Get the feature loadings for a given DimReduc
> t(Loadings(object = pbmc[["pca"]])[1:5,1:5])
PPBP LYZ S100A9 IGLL5 GNLY
PC_1 0.010990202 0.11623171 0.11541436 -0.007987473 -0.01523876
PC_2 0.011484260 0.01472515 0.01895146 0.054542385 -0.13375626
PC_3 -0.151760919 -0.01280613 -0.02368853 0.049015330 0.04101340
PC_4 0.104037367 -0.04414540 -0.05787777 0.066947217 0.06912322
PC_5 0.003299077 0.04990688 0.08538231 0.004603231 0.10455861
> # Get the feature loadings for a specified DimReduc in a Seurat object
> t(Loadings(object = pbmc, reduction = "pca") [1:5,1:5])
PPBP LYZ S100A9 IGLL5 GNLY
PC_1 0.010990202 0.11623171 0.11541436 -0.007987473 -0.01523876
PC_2 0.011484260 0.01472515 0.01895146 0.054542385 -0.13375626
PC_3 -0.151760919 -0.01280613 -0.02368853 0.049015330 0.04101340
PC_4 0.104037367 -0.04414540 -0.05787777 0.066947217 0.06912322
PC_5 0.003299077 0.04990688 0.08538231 0.004603231 0.10455861
> # Set the feature loadings for a given DimReduc
> new.loadings <- Loadings(object = pbmc[["pca"]])
> new.loadings <- new.loadings + 0.01
> Loadings(object = pbmc[["pca"]]) <- new.loadings
> VizDimLoadings(pbmc)
# Examine and visualize PCA results a few different ways
print(pbmc[["pca"]], dims = 1:5, nfeatures = 5)
PC_ 1
Positive: CST3, TYROBP, LST1, AIF1, FTL
Negative: MALAT1, LTB, IL32, IL7R, CD2
PC_ 2
Positive: CD79A, MS4A1, TCL1A, HLA-DQA1, HLA-DQB1
Negative: NKG7, PRF1, CST7, GZMB, GZMA
PC_ 3
Positive: HLA-DQA1, CD79A, CD79B, HLA-DQB1, HLA-DPB1
Negative: PPBP, PF4, SDPR, SPARC, GNG11
PC_ 4
Positive: HLA-DQA1, CD79B, CD79A, MS4A1, HLA-DQB1
Negative: VIM, IL7R, S100A6, IL32, S100A8
PC_ 5
Positive: GZMB, NKG7, S100A8, FGFBP2, GNLY
Negative: LTB, IL7R, CKB, VIM, MS4A7
DimPlot(pbmc, reduction = "pca")
#DimHeatmap(pbmc, dims = 1, cells = 500, balanced = TRUE)
DimHeatmap(pbmc, dims = 1:15, cells = 500, balanced = TRUE)
選擇多少個維度進行下一階段的分析呢歪今?基于PAC有以下幾種方法可以探索。
# Determine the ‘dimensionality’ of the dataset
# NOTE: This process can take a long time for big datasets, comment out for expediency. More
# approximate techniques such as those implemented in ElbowPlot() can be used to reduce
# computation time
pbmc <- JackStraw(pbmc, num.replicate = 100)
pbmc <- ScoreJackStraw(pbmc, dims = 1:20)
plot1<-JackStrawPlot(pbmc, dims = 1:15)
plot2<-ElbowPlot(pbmc)
CombinePlots(plots = list(plot1, plot2),legend="bottom")
可以看出大概在PC為十的時候颜矿,每個軸是有區(qū)分意義的寄猩。
每個軸顯著相關(guān)的基因,這個也可以作為后面分析選擇基因的一個參考骑疆。
#Returns a set of genes, based on the JackStraw analysis, that have statistically significant associations with a set of PCs.
?PCASigGenes
head(PCASigGenes(pbmc,pcs.use=2,pval.cut = 0.7))
[1] "PPBP" "LYZ" "S100A9" "IGLL5" "GNLY" "FTL"
好了田篇,最重要的一步來了,聚類分析封断。Seurat采用的是graph-based聚類方法,k-means方法在V3中已經(jīng)不存在了舶担。
聚類
# Cluster the cells
#Identify clusters of cells by a shared nearest neighbor (SNN) modularity optimization based clustering algorithm. First calculate k-nearest neighbors and construct the SNN graph. Then optimize the modularity function to determine clusters. For a full description of the algorithms, see Waltman and van Eck (2013) The European Physical Journal B. Thanks to Nigel Delaney (evolvedmicrobe@github) for the rewrite of the Java modularity optimizer code in Rcpp!
pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, resolution = 0.5)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 2638
Number of edges: 96033
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.8720
Number of communities: 9
Elapsed time: 0 seconds
> table(pbmc@active.ident) # 查看每一類有多少個細胞
0 1 2 3 4 5 6 7 8
697 483 480 344 271 162 155 32 14
# 提取某一類細胞坡疼。
> head(subset(as.data.frame(pbmc@active.ident),pbmc@active.ident=="2"))
pbmc@active.ident
AAACCGTGCTTCCG 2
AAAGAGACGCGAGA 2
AAAGCAGATATCGG 2
AAAGTTTGTAGCGT 2
AAATGTTGAACGAA 2
AAATGTTGTGGCAT 2
提取某一cluster細胞。
> pbmc
An object of class Seurat
13714 features across 2638 samples within 1 assay
Active assay: RNA (13714 features)
1 dimensional reduction calculated: pca
> subpbmc<-subset(x = pbmc,idents="2")
> subpbmc
An object of class Seurat
13714 features across 480 samples within 1 assay
Active assay: RNA (13714 features)
1 dimensional reduction calculated: pca
?WhichCells
head(WhichCells(pbmc,idents="2"))
[1] "AAACCGTGCTTCCG" "AAAGAGACGCGAGA" "AAAGCAGATATCGG" "AAAGTTTGTAGCGT" "AAATGTTGAACGAA" "AAATGTTGTGGCAT"
# Look at cluster IDs of the first 5 cells
head(Idents(pbmc), 5)
AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC AAACCGTGCTTCCG AAACCGTGTATGCG
1 3 1 2 6
Levels: 0 1 2 3 4 5 6 7 8
#提取部分細胞
> pbmc
An object of class Seurat
32738 features across 2700 samples within 1 assay
Active assay: RNA (32738 features)
> head(colnames(pbmc@assays$RNA@counts)[1:30])
[1] "AAACATACAACCAC" "AAACATTGAGCTAC" "AAACATTGATCAGC" "AAACCGTGCTTCCG" "AAACCGTGTATGCG" "AAACGCACTGGTAC"
> subbset<-subset(x=pbmc,cells=colnames(pbmc@assays$RNA@counts)[1:30])
> subbset
An object of class Seurat
32738 features across 30 samples within 1 assay
Active assay: RNA (32738 features)
當然衣陶,我們用的基本都是默認參數(shù)柄瑰,建議?FindClusters一下剪况,看看具體的參數(shù)設(shè)置教沾,比如雖然是圖聚類,但是卻有不同的算法译断,這個要看相應(yīng)的文獻了授翻。
Algorithm for modularity optimization (1 = original Louvain algorithm; 2 = Louvain algorithm with multilevel refinement; 3 = SLM algorithm; 4 = Leiden algorithm). Leiden requires the leidenalg python.
系統(tǒng)發(fā)育分析(Phylogenetic Analysis of Identity Classes)
#Constructs a phylogenetic tree relating the 'average' cell from each identity class.
# Tree is estimated based on a distance matrix constructed in either gene expression space or PCA spac
?BuildClusterTree
pbmc<-BuildClusterTree(pbmc)
Tool(object = pbmc, slot = 'BuildClusterTree')
Phylogenetic tree with 9 tips and 8 internal nodes.
Tip labels:
0, 1, 2, 3, 4, 5, ...
Rooted; includes branch lengths.
PlotClusterTree(pbmc)
識別低質(zhì)量樣本。
#Calculate the Barcode Distribution Inflection
?CalculateBarcodeInflections
pbmc<-CalculateBarcodeInflections(pbmc)
SubsetByBarcodeInflections(pbmc)
可視化降維
非線性降維——這個目的是為了可視化,而不是特征提瓤疤啤(PCA)巡语,雖然它也可以用來做特征提取。
- UMAP
#Run non-linear dimensional reduction (UMAP/tSNE)
# If you haven't installed UMAP, you can do so via reticulate::py_install(packages =
# 'umap-learn')
pbmc <- RunUMAP(pbmc, dims = 1:10)
UMAP(a=None, angular_rp_forest=False, b=None, init='spectral',
learning_rate=1.0, local_connectivity=1, metric='correlation',
metric_kwds=None, min_dist=0.3, n_components=2, n_epochs=None,
n_neighbors=30, negative_sample_rate=5, random_state=None,
repulsion_strength=1.0, set_op_mix_ratio=1.0, spread=1.0,
target_metric='categorical', target_metric_kwds=None,
target_n_neighbors=-1, target_weight=0.5, transform_queue_size=4.0,
transform_seed=42, verbose=True)
Construct fuzzy simplicial set
Construct embedding
completed 0 / 500 epochs
completed 50 / 500 epochs
completed 100 / 500 epochs
completed 150 / 500 epochs
completed 200 / 500 epochs
completed 250 / 500 epochs
completed 300 / 500 epochs
completed 350 / 500 epochs
completed 400 / 500 epochs
completed 450 / 500 epochs
> head(pbmc@reductions$umap@cell.embeddings) # 提取UMAP坐標值淮菠。
UMAP_1 UMAP_2
AAACATACAACCAC 1.7449461 -2.770269
AAACATTGAGCTAC 4.6293025 12.092374
AAACATTGATCAGC 5.2499804 -2.011440
AAACCGTGCTTCCG -11.9875174 -1.568554
AAACCGTGTATGCG 0.1626114 -8.743275
AAACGCACTGGTAC 2.9192858 -1.487868
- t-SNE
pbmc <- RunTSNE(pbmc, dims = 1:10)
head(pbmc@reductions$tsne@cell.embeddings)
tSNE_1 tSNE_2
AAACATACAACCAC -11.701490 7.120466
AAACATTGAGCTAC -21.981401 -21.330793
AAACATTGATCAGC -1.292978 23.822324
AAACCGTGCTTCCG 30.877776 -9.926240
AAACCGTGTATGCG -34.619197 8.773753
AAACGCACTGGTAC -3.046157 13.013471
比較一下兩個可視化的結(jié)果。
# note that you can set `label = TRUE` or use the LabelClusters function to help label
# individual clusters
plot1<-DimPlot(pbmc, reduction = "umap",label = TRUE)+scale_color_npg()
plot2<-DimPlot(pbmc, reduction = "tsne",label = TRUE)+scale_color_npg()
CombinePlots(plots = list(plot1, plot2),legend="bottom")
可以看出,兩者的降維可視化的結(jié)構(gòu)是一致的顺少,UMAP方法更加緊湊——在降維圖上桅咆,同一cluster離得更近,不同cluster離得更遠拥知,作為一種后來的算法有一定的優(yōu)點踏拜,但是t-SNE結(jié)構(gòu)也能很好地反映cluster的空間結(jié)構(gòu)。
差異分析
# Finding differentially expressed features (cluster biomarkers)
# find all markers of cluster 1
cluster1.markers <- FindMarkers(pbmc, ident.1 = 1, min.pct = 0.25)
head(cluster1.markers, n = 5)
p_val avg_logFC pct.1 pct.2 p_val_adj
IL32 1.894810e-92 0.8373872 0.948 0.464 2.598542e-88
LTB 7.953303e-89 0.8921170 0.981 0.642 1.090716e-84
CD3D 1.655937e-70 0.6436286 0.919 0.431 2.270951e-66
IL7R 3.688893e-68 0.8147082 0.747 0.325 5.058947e-64
LDHB 2.292819e-67 0.6253110 0.950 0.613 3.144372e-63
這是一種one-others的差異分析方法举庶,就是cluster1與其余的cluster來做比較执隧,當然這個是可以指定的,參數(shù)就是ident.2。
# find all markers distinguishing cluster 5 from clusters 0 and 3
cluster5.markers <- FindMarkers(pbmc, ident.1 = 5, ident.2 = c(0, 3), min.pct = 0.25)
head(cluster5.markers, n = 5)
p_val avg_logFC pct.1 pct.2 p_val_adj
FCGR3A 7.583625e-209 2.963144 0.975 0.037 1.040018e-204
IFITM3 2.500844e-199 2.698187 0.975 0.046 3.429657e-195
CFD 1.763722e-195 2.362381 0.938 0.037 2.418768e-191
CD68 4.612171e-192 2.087366 0.926 0.036 6.325132e-188
RP11-290F20.3 1.846215e-188 1.886288 0.840 0.016 2.531900e-184
看看輸出結(jié)果都是什么户侥。
?FindMarkers
data.frame with a ranked list of putative markers as rows,
and associated statistics as columns (p-values, ROC score, etc., depending on the test used (test.use)).
The following columns are always present:
avg_logFC: log fold-chage of the average expression between the two groups. Positive values indicate that the gene is more highly expressed in the first group
pct.1: The percentage of cells where the gene is detected in the first group
pct.2: The percentage of cells where the gene is detected in the second group
p_val_adj: Adjusted p-value, based on bonferroni correction using all genes in the dataset
我們還可以輸出一個總表镀琉。
# find markers for every cluster compared to all remaining cells, report only the positive ones
> pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
Calculating cluster 0
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 08s
Calculating cluster 1
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 06s
Calculating cluster 2
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 21s
Calculating cluster 3
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 07s
Calculating cluster 4
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 08s
Calculating cluster 5
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 27s
Calculating cluster 6
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 21s
Calculating cluster 7
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 24s
Calculating cluster 8
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 19s
> head(pbmc.markers)
p_val avg_logFC pct.1 pct.2 p_val_adj cluster gene
RPS12 2.008629e-140 0.5029988 1.000 0.991 2.754633e-136 0 RPS12
RPS27 2.624075e-140 0.5020359 0.999 0.992 3.598656e-136 0 RPS27
RPS6 1.280169e-138 0.4673635 1.000 0.995 1.755623e-134 0 RPS6
RPL32 4.358823e-135 0.4242773 0.999 0.995 5.977689e-131 0 RPL32
RPS14 3.618793e-128 0.4283480 1.000 0.994 4.962812e-124 0 RPS14
RPS25 5.612007e-124 0.5203411 0.997 0.975 7.696306e-120 0 RPS25
> pbmc.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)
# A tibble: 18 x 7
# Groups: cluster [9]
p_val avg_logFC pct.1 pct.2 p_val_adj cluster gene
<dbl> <dbl> <dbl> <dbl> <dbl> <fct> <chr>
1 1.96e-107 0.730 0.901 0.594 2.69e-103 0 LDHB
2 1.61e- 82 0.922 0.436 0.11 2.20e- 78 0 CCR7
3 7.95e- 89 0.892 0.981 0.642 1.09e- 84 1 LTB
4 1.85e- 60 0.859 0.422 0.11 2.54e- 56 1 AQP3
5 0. 3.86 0.996 0.215 0. 2 S100A9
6 0. 3.80 0.975 0.121 0. 2 S100A8
7 0. 2.99 0.936 0.041 0. 3 CD79A
8 9.48e-271 2.49 0.622 0.022 1.30e-266 3 TCL1A
9 2.96e-189 2.12 0.985 0.24 4.06e-185 4 CCL5
10 2.57e-158 2.05 0.587 0.059 3.52e-154 4 GZMK
11 3.51e-184 2.30 0.975 0.134 4.82e-180 5 FCGR3A
12 2.03e-125 2.14 1 0.315 2.78e-121 5 LST1
13 7.95e-269 3.35 0.961 0.068 1.09e-264 6 GZMB
14 3.13e-191 3.69 0.961 0.131 4.30e-187 6 GNLY
15 1.48e-220 2.68 0.812 0.011 2.03e-216 7 FCER1A
16 1.67e- 21 1.99 1 0.513 2.28e- 17 7 HLA-DPB1
17 7.73e-200 5.02 1 0.01 1.06e-195 8 PF4
18 3.68e-110 5.94 1 0.024 5.05e-106 8 PPBP
這里可以注意一下only.pos 參數(shù),可以指定返回positive markers 基因蕊唐。test.use可以指定檢驗方法屋摔,可選擇的有:wilcox,bimod替梨,roc钓试,t,negbinom副瀑,poisson弓熏,LR,MAST糠睡,DESeq2挽鞠。
cluster間保守conserved marker基因.
#Finds markers that are conserved between the groups
#構(gòu)建一個分組方式:
pbmc[['groups']] <- sample(x = c('g1', 'g2'), size = ncol(x = pbmc), replace = TRUE)
head(FindConservedMarkers(pbmc, ident.1 = 0, ident.2 = 1, grouping.var = "groups"))
Testing group g2: (0) vs (1)
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 04s
Testing group g1: (0) vs (1)
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 05s
g2_p_val g2_avg_logFC g2_pct.1 g2_pct.2 g2_p_val_adj g1_p_val g1_avg_logFC g1_pct.1 g1_pct.2
S100A4 6.687228e-33 -0.8410733 0.678 0.959 9.170864e-29 2.583278e-35 -0.9569347 0.687 0.945
MALAT1 2.380860e-16 0.2957331 1.000 1.000 3.265112e-12 7.501490e-20 0.3201783 1.000 1.000
VIM 1.192054e-14 -0.4921326 0.816 0.951 1.634783e-10 1.193930e-19 -0.4945881 0.798 0.945
ANXA2 3.115304e-13 -0.6406856 0.169 0.461 4.272327e-09 2.186333e-18 -0.7695240 0.164 0.504
ANXA1 5.226901e-18 -0.7544607 0.451 0.800 7.168173e-14 1.413468e-17 -0.6660324 0.507 0.803
S100A11 1.832303e-16 -0.6955104 0.288 0.665 2.512820e-12 1.208765e-17 -0.7757913 0.256 0.584
g1_p_val_adj max_pval minimump_p_val
S100A4 3.542707e-31 6.687228e-33 5.166555e-35
MALAT1 1.028754e-15 2.380860e-16 1.500298e-19
VIM 1.637356e-15 1.192054e-14 2.387860e-19
ANXA2 2.998337e-14 3.115304e-13 4.372665e-18
ANXA1 1.938430e-13 1.413468e-17 1.045380e-17
S100A11 1.657700e-13 1.832303e-16 2.417530e-17
繪制基因小提琴圖,其中右邊的圖使用原始的count取log繪制的狈孔,其實趨勢還是蠻一致的信认。
plot1<-VlnPlot(pbmc, features = c("MS4A1", "CD79A"))+scale_color_npg()
# you can plot raw counts as well
plot2<- VlnPlot(pbmc, features = c("MS4A1", "CD79A"),ncol=1, same.y.lims=T,slot = "counts", log = TRUE)+scale_color_npg()
CombinePlots(plots = list(plot1, plot2))
plot1<-FeaturePlot(pbmc, features = c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A", "FCGR3A", "LYZ", "PPBP",
"CD8A"),min.cutoff = 0, max.cutoff = 4)
top10 <- pbmc.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
plot2<-DoHeatmap(pbmc, features = top10$gene) + NoLegend()+scale_color_npg()
#CombinePlots(plots = list(plot1, plot2))
library(gridExtra)
grid.arrange(plot1,plot2,ncol = 2, nrow = 1)
gridExtra 也是可以用來組合seurat繪制的圖片的,不足為奇均抽,seurat本身用的ggplot2語法嫁赏。
細胞周期分析
與細胞周期相關(guān)的基因。
> cc.genes
$`s.genes`
[1] "MCM5" "PCNA" "TYMS" "FEN1" "MCM2" "MCM4" "RRM1" "UNG" "GINS2" "MCM6"
[11] "CDCA7" "DTL" "PRIM1" "UHRF1" "MLF1IP" "HELLS" "RFC2" "RPA2" "NASP" "RAD51AP1"
[21] "GMNN" "WDR76" "SLBP" "CCNE2" "UBR7" "POLD3" "MSH2" "ATAD2" "RAD51" "RRM2"
[31] "CDC45" "CDC6" "EXO1" "TIPIN" "DSCC1" "BLM" "CASP8AP2" "USP1" "CLSPN" "POLA1"
[41] "CHAF1B" "BRIP1" "E2F8"
$g2m.genes
[1] "HMGB2" "CDK1" "NUSAP1" "UBE2C" "BIRC5" "TPX2" "TOP2A" "NDC80" "CKS2" "NUF2" "CKS1B"
[12] "MKI67" "TMPO" "CENPF" "TACC3" "FAM64A" "SMC4" "CCNB2" "CKAP2L" "CKAP2" "AURKB" "BUB1"
[23] "KIF11" "ANP32E" "TUBB4B" "GTSE1" "KIF20B" "HJURP" "CDCA3" "HN1" "CDC20" "TTK" "CDC25C"
[34] "KIF2C" "RANGAP1" "NCAPD2" "DLGAP5" "CDCA2" "CDCA8" "ECT2" "KIF23" "HMMR" "AURKA" "PSRC1"
[45] "ANLN" "LBR" "CKAP5" "CENPE" "CTCF" "NEK2" "G2E3" "GAS2L3" "CBX5" "CENPA"
?CellCycleScoring
## Not run:
# pbmc_small doesn't have any cell-cycle genes
# To run CellCycleScoring, please use a dataset with cell-cycle genes
# An example is available at http://satijalab.org/seurat/cell_cycle_vignette.html
pbmc <- CellCycleScoring(
object = pbmc,
g2m.features = cc.genes$g2m.genes,
s.features = cc.genes$s.genes
)
head(x = pbmc@meta.data)[,c(7,8,9,10,11)] # 我是為了好看取了幾列來看油挥,你大可不必潦蝇。
seurat_clusters groups S.Score G2M.Score Phase
AAACATACAACCAC 1 g1 0.10502807 -0.04507596 S
AAACATTGAGCTAC 3 g1 -0.02680010 -0.04986215 G1
AAACATTGATCAGC 1 g2 -0.01387173 0.07792135 G2M
AAACCGTGCTTCCG 2 g2 0.04595348 0.01140204 S
AAACCGTGTATGCG 6 g1 -0.02602771 0.04413069 G2M
AAACGCACTGGTAC 1 g2 -0.03692587 -0.08341568 G1
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt","percent.HB","G2M.Score","S.Score"), ncol = 6)#+scale_color_npg()
在UMAP空間中繪制細胞周期信息款熬。
umapem<-pbmc@reductions$umap@cell.embeddings
umapem<-as.data.frame(umapem)
metada= pbmc@meta.data
dim(umapem);dim(metada)
metada$bar<-rownames(metada)
umapem$bar<-rownames(umapem)
ccdata<-merge(umapem,metada,by="bar")
head(ccdata)
library(ggplot2)
plot<-ggplot(ccdata, aes(UMAP_1, UMAP_2,label=Phase))+geom_point(aes(colour = factor(Phase)))+
#plot<-plot+scale_colour_manual(values=c("#CC33FF","Peru","#660000","#660099","#990033","black","red", "#666600", "green","#6699CC","#339900","#0000FF","#FFFF00","#808080"))+
labs("@yunlai",x = "", y="")
plot=plot+scale_color_aaas() +
theme_bw()+theme(panel.grid=element_blank(),legend.title=element_blank(),legend.text = element_text(color="black", size = 10, face = "bold"))
plot<-plot+guides(colour = guide_legend(override.aes = list(size=5))) +theme(plot.title = element_text(hjust = 0.5))
plot
探索
我們可以用SingleR或者celaref來定義每一個細胞的類型,而不只是cluster某某某了护蝶。其中SingleR與seurat是無縫銜接的华烟,seurat對象可以讀到這個里面。這里先不寫持灰,假定我們已經(jīng)知道了各個類群:
# singleR
#Assigning cell type identity to clusters
new.cluster.ids <- c("Naive CD4 T", "Memory CD4 T", "CD14+ Mono", "B", "CD8 T", "FCGR3A+ Mono",
"NK", "DC", "Platelet")
names(new.cluster.ids) <- levels(pbmc)
pbmc <- RenameIdents(pbmc, new.cluster.ids)
plot1<-DimPlot(pbmc, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()
plot2<-DimPlot(pbmc, reduction = "tsne", label = TRUE, pt.size = 0.5) + NoLegend()
grid.arrange(plot1,plot2,ncol = 2, nrow = 1)
平均表達譜函數(shù)AverageExpression
?AverageExpression
AverageExp<-AverageExpression(pbmc,features=unique(top10$gene))
Finished averaging RNA for cluster Naive CD4 T
Finished averaging RNA for cluster Memory CD4 T
Finished averaging RNA for cluster CD14+ Mono
Finished averaging RNA for cluster B
Finished averaging RNA for cluster CD8 T
Finished averaging RNA for cluster FCGR3A+ Mono
Finished averaging RNA for cluster NK
Finished averaging RNA for cluster DC
Finished averaging RNA for cluster Platelet
> typeof(AverageExp)
[1] "list"
> head(AverageExp$RNA)
Naive CD4 T Memory CD4 T CD14+ Mono B CD8 T FCGR3A+ Mono NK DC Platelet
RPS3A 80.173486 61.905672 24.1161656 65.4588054 53.2413307 26.3218430 38.9542301 39.4926725 8.2843982
LDHB 13.697489 13.663320 2.5790368 2.8923463 7.1405019 1.3868494 4.4117033 3.1508971 2.0904628
CCR7 2.984692 1.293789 0.1020109 1.0788038 0.1631841 0.1413904 0.1196927 0.1473077 0.0000000
CD3D 10.724878 11.342980 0.4632153 0.3310177 13.9581981 0.2767605 1.1144081 0.2506665 0.0000000
CD3E 7.320622 7.807142 0.4356602 0.5741410 7.6701063 0.4992164 3.5389591 0.4322447 1.6960081
NOSIP 5.912196 5.196203 1.2965789 0.8373659 2.4063675 2.0503487 2.1314856 1.0916285 0.0804829
library(psych)
library(pheatmap)
coorda<-corr.test(AverageExp$RNA,AverageExp$RNA,method="spearman")
pheatmap(coorda$r)
記得保存數(shù)據(jù)以便下次使用盔夜。
saveRDS(pbmc, file = "D:\\Users\\Administrator\\Desktop\\RStudio\\single_cell\\filtered_gene_bc_matrices\\hg19pbmc_tutorial.rds")
富集分析
有了基因列表其實就可以做富集分析了借助enrichplot等R包可以做出很多好的探索。
require(org.Hs.eg.db)
library(topGO)
library(DOSE)
x=as.list(org.Hs.egALIAS2EG)
geneList<-rep(0,nrow(pbmc))
names(geneList)<-row.names(pbmc)
geneList<-geneList[intersect(names(geneList),names(x))]
newwallgenes=names(geneList)
for (ii in 1:length(geneList)){
names(geneList)[ii]<-x[[names(geneList)[ii]]][1]
}
gene_erichment_results=list()
for (c1 in as.character(unique(levels(pbmc.markers$cluster)))){
print(paste0("RUN ", c1))
testgenes<-subset(pbmc.markers,cluster==c1)$gene
gene_erichment_results[[c1]]=list()
testgeneList=geneList
testgeneList[which(newwallgenes %in% testgenes)]= 1
#gene_erichment_results=list()
tab1=c()
for(ont in c("BP","MF","CC")){
sampleGOdata<-suppressMessages(new("topGOdata",description="Simple session",ontology=ont,allGenes=as.factor(testgeneList),
nodeSize=10,annot=annFUN.org,mapping="org.Hs.eg.db",ID="entrez"))
resultTopGO.elim<-suppressMessages(runTest(sampleGOdata,algorithm="elim",statistic="Fisher"))
resultTopGO.classic<-suppressMessages(runTest(sampleGOdata,algorithm="classic",statistic="Fisher"))
tab1<-rbind(tab1,GenTable(sampleGOdata,Fisher.elim=resultTopGO.elim,Fisher.classic=resultTopGO.classic,orderBy="Fisher.elim",
topNodes=200))
}
gene_erichment_results[[c1]][["topGO"]]=tab1
x<-suppressMessages(enrichDO(gene=names(testgeneList)[testgeneList==1],ont="DO",pvalueCutoff=1,pAdjustMethod="BH",universe=names(testgeneList),
minGSSize=5,maxGSSize=500,readable=T))
gene_erichment_results[[c1]][["DO"]]=x
dgn<-suppressMessages(enrichDGN(names(testgeneList)[testgeneList==1]))
gene_erichment_results[[c1]][["DGN"]]=dgn
}
gene_erichment_results[["8"]][["topGO"]][1:5,]
gene_erichment_results[["1"]][["topGO"]][1:5,]
GO.ID Term Annotated Significant Expected Rank in Fisher.classic Fisher.elim Fisher.classic
1 GO:0019221 cytokine-mediated signaling pathway 516 22 4.15 3 4.5e-05 1.4e-10
2 GO:0045059 positive thymic T cell selection 11 3 0.09 55 7.9e-05 8.7e-05
3 GO:0050850 positive regulation of calcium-mediated ... 30 4 0.24 61 9.1e-05 0.00010
4 GO:0033209 tumor necrosis factor-mediated signaling... 98 6 0.79 70 0.00013 0.00016
5 GO:0002250 adaptive immune response 301 11 2.42 45 0.00040 3.8e-05
可視化
library(enrichplot)
dotplot(gene_erichment_results[[1]][["DGN"]], showCategory=30)
## categorySize can be scaled by 'pvalue' or 'geneNum'
p1<-cnetplot(gene_erichment_results[[1]][["DGN"]], categorySize="pvalue", foldChange=geneList)
p2<-cnetplot(gene_erichment_results[[1]][["DGN"]], foldChange=geneList, circular = TRUE, colorEdge = TRUE)
plot_grid(p1, p2, ncol=2)
seurat v3
SeqGeq Plugins
essential_commands
R語言中S4類對象的輸出方法
Salmon
Seurat - Guided Clustering Tutorial
第六章 scRNA-seq數(shù)據(jù)分析 · 生物信息學生R入門教程
computational-methods-for-single-cell-data-analysis
enrichplot