The two mainstream methods for estimating immune infiltration today are CIBERSORT and ssGSEA. This post covers CIBERSORT.
1. What the input data must look like
The following passage is quoted from the CIBERSORT introduction:
Importantly, all expression data should be non-negative, devoid of missing values, and represented in non-log linear space.
For Affymetrix microarrays, a custom chip definition file (CDF) is recommended (see Subheading 3.2.2) and should be normalized with MAS5 or RMA.
Illumina Beadchip and single color Agilent arrays should be processed as described in the limma package.
Standard RNA-Seq expression quantification metrics, such as fragments per kilobase per million (FPKM) and transcripts per kilobase million (TPM), are suitable for use with CIBERSORT. (From "Profiling Tumor Infiltrating Immune Cells with CIBERSORT")
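This spells out the input requirements very clearly:
1. No negative values and no missing values.
2. Do not log-transform the data.
3. For microarray data: normalize Affymetrix arrays with MAS5 or RMA, and process Illumina BeadChip and single-color Agilent arrays with limma.
4. For RNA-Seq expression values, both FPKM and TPM are suitable.
The microarray requirements may look intimidating, but the routine expression matrices on GEO are produced exactly this way, so you can download and use them directly. Note that some matrices have already been log-transformed when you download them and need to be converted back to linear scale. Others have been standardized or contain negative values, in which case you have to start from the raw data; I covered that in an earlier post.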
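Before going further, it can help to test a downloaded matrix against the requirements above. The following is a minimal sketch, not part of CIBERSORT: `check_cibersort_input` is a hypothetical helper, the threshold of 50 is only a rule of thumb for spotting log-scale data, and the back-transformation assumes the matrix was stored as log2(x + 1), which you need to confirm from the GEO series description.

```r
# Hypothetical helper: report the three problems the quoted passage warns about.
check_cibersort_input <- function(mat) {
  mat <- as.matrix(mat)
  c(has_na       = anyNA(mat),
    has_negative = any(mat < 0, na.rm = TRUE),
    # Heuristic only: linear-scale expression usually has values well above 50
    looks_logged = max(mat, na.rm = TRUE) < 50)
}

# Toy matrix standing in for a GEO download that was stored as log2(x + 1)
linear <- matrix(c(0, 10, 250, 4000, 3, 75, 1200, 30),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2")))
logged <- log2(linear + 1)

check_cibersort_input(logged)

# Undo the transform; only valid if the data really are log2(x + 1)
restored <- 2^logged - 1
check_cibersort_input(restored)
```

If the matrix was logged with a different base or offset, the last step has to be adapted accordingly.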
2. A worked example
I picked an RNA-seq FPKM dataset.
a = read.delim("GSE201050_4_genes_fpkm_expression.txt.gz",check.names = F)
library(stringr)
exp = a[,str_starts(colnames(a),"FPKM")]
k = !duplicated(a$gene_name);table(k)
## k
## FALSE TRUE
## 7089 43030
exp = exp[k,]
rownames(exp) = unique(a$gene_name)
colnames(exp) = str_remove(colnames(exp),"FPKM.")
exp[1:4,1:4]
## CK_1 CK_2 CK_3 D_2_1
## TSPAN6 1.931311 2.10492200 2.6443970 3.6337456
## TNMD 0.000000 0.03177236 0.0000000 0.0000000
## DPM1 61.199326 60.15538342 63.9972350 67.9992979
## SCYL3 1.218088 1.09913176 0.8558017 0.9523167
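The quoted requirements accept either FPKM or TPM, so no conversion is needed for this dataset. If you would rather feed TPM, a minimal sketch of the usual rescaling follows. `fpkm_to_tpm` is a hypothetical helper and the toy values are made up; note also that rescaling after removing duplicated gene names changes the column sums slightly compared with rescaling the full table.

```r
# Hypothetical helper: rescale each sample (column) so its values sum to one million
fpkm_to_tpm <- function(fpkm) {
  fpkm <- as.matrix(fpkm)
  sweep(fpkm, 2, colSums(fpkm), "/") * 1e6
}

# Made-up FPKM values for four genes in two samples
toy_fpkm <- matrix(c(2, 0, 60, 1, 4, 1, 64, 1),
                   nrow = 4,
                   dimnames = list(c("TSPAN6", "TNMD", "DPM1", "SCYL3"),
                                   c("s1", "s2")))
toy_tpm <- fpkm_to_tpm(toy_fpkm)
colSums(toy_tpm)  # each column sums to 1e6, up to floating-point error
```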
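2.3. Build the input files CIBERSORT requires
Two input files are needed:
One is the expression matrix file.
The other is LM22.txt, a dataset bundled with the R package, which records the gene expression signatures of 22 immune cell types.
The package's file-reading code is rather crude. To accommodate it, the row names have to be turned into a regular column before the file is exported; otherwise an error is thrown later on.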
library(tidyverse)
exp2 = as.data.frame(exp)
exp2 = rownames_to_column(exp2)
write.table(exp2,file = "exp.txt",row.names = F,quote = F,sep = "\t")
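To see what the exported file looks like to a simple tab-delimited reader, here is a small round-trip sketch in base R. The toy data frame and its names are made up, and `cbind()` is used as a base-R stand-in for `rownames_to_column()`; how the CIBERSORT package itself parses the file is not shown here.

```r
# Toy stand-in for `exp`; the sample names are made up
toy <- data.frame(s1 = c(1.9, 0, 61.2), s2 = c(2.1, 0.03, 60.2),
                  row.names = c("TSPAN6", "TNMD", "DPM1"))

# Same idea as rownames_to_column(), written in base R
toy2 <- cbind(rowname = rownames(toy), toy)

tmp <- tempfile(fileext = ".txt")
write.table(toy2, file = tmp, row.names = FALSE, quote = FALSE, sep = "\t")

# Read it back: one header line, tab-separated, gene names in the first column
back <- read.delim(tmp, check.names = FALSE)
str(back)
```

With `row.names = FALSE` and the gene names stored as an ordinary column, the header line has exactly as many fields as every data line, which is the layout a naive reader expects.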
2.4. Run CIBERSORT
f = "ciber_GSE201050.Rdata"
if(!file.exists(f)){
# devtools::install_github("Moonerss/CIBERSORT")
library(CIBERSORT)
lm22f = system.file("extdata", "LM22.txt", package = "CIBERSORT")
TME.results = cibersort(lm22f,
"exp.txt" ,
perm = 1000,
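QN = T) # note: the CIBERSORT documentation recommends disabling quantile normalization (QN = FALSE) for RNA-Seq input; T is kept here to match the output shown below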
save(TME.results,file = f)
}
load(f)
TME.results[1:4,1:4]
## B cells naive B cells memory Plasma cells T cells CD8
## CK_1 0.047478089 0.000000000 0.013099049 0.035400078
## CK_2 0.004488084 0.000000000 0.004812423 0.025224919
## CK_3 0.000000000 0.003064788 0.001037858 0.013915937
## D_2_1 0.003727559 0.000000000 0.000000000 0.001672263
re <- TME.results[,-(23:25)]
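The run is somewhat slow. The result contains the estimated abundances of the 22 immune cell types plus three columns of other statistics (P-value, Correlation and RMSE), which we ignore here.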
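Dropping columns 23 to 25 by position works as long as the layout is exactly 22 cell types followed by three statistics. A slightly more defensive sketch selects by name instead. The statistic column names used below are an assumption, so check `colnames(TME.results)` first; the toy matrix is made up and has only three cell types.

```r
# Toy result with 3 cell-type columns plus the statistics columns.
# The statistic column names are an assumption; check colnames(TME.results).
toy_res <- cbind(
  "B cells naive" = c(0.5, 0.2),
  "T cells CD8"   = c(0.3, 0.3),
  "Neutrophils"   = c(0.2, 0.5),
  "P-value"       = c(0.01, 0.20),
  "Correlation"   = c(0.80, 0.35),
  "RMSE"          = c(0.60, 0.95)
)
rownames(toy_res) <- c("CK_1", "CK_2")

stat_cols <- c("P-value", "Correlation", "RMSE")
fractions <- toy_res[, !colnames(toy_res) %in% stat_cols, drop = FALSE]

# The cell-type fractions of each sample should add up to 1
rowSums(fractions)
```

The row-sum check is a cheap sanity test before plotting stacked proportions.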
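2.5. The classic immune cell abundance heatmap
Immune cell types whose abundance is 0 in half or more of the samples are left out of the heatmap. Looking at the resulting heatmap, the groups are not well separated, judging from the clustering.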
library(pheatmap)
k <- apply(re,2,function(x) {sum(x == 0) < nrow(TME.results)/2})
table(k)
## k
## FALSE TRUE
## 14 8
re2 <- as.data.frame(t(re[,k]))
Group = str_sub(colnames(exp),1,str_length(colnames(exp))-2)
table(Group)
## Group
## CK D_2 D_3 DP_3 DP_3_1 YS
## 3 3 3 3 3 3
an = data.frame(group = Group,
row.names = colnames(exp))
pheatmap(re2,scale = "row",
show_colnames = F,
cluster_cols = F,
annotation_col = an,
color = colorRampPalette(c("navy", "white", "firebrick3"))(50))
2.6. The classic stacked bar chart
This shows the immune cell proportions in each sample.
library(RColorBrewer)
mypalette <- colorRampPalette(brewer.pal(8,"Set1"))
dat <- re %>%
as.data.frame() %>%
rownames_to_column("Sample") %>%
mutate(group = Group) %>%
gather(key = Cell_type,value = Proportion,-Sample,-group) %>%
arrange(group)
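dat$Sample = factor(dat$Sample,ordered = T,levels = unique(dat$Sample)) # fix the x-axis order
# dat was sorted by group first, and Sample was then made a factor whose levels follow that sorted order, so the two plots line up.
dat2 = data.frame(a = 1:ncol(exp),
b = 1,
# take each sample's group in plotting order, so the strip cannot drift from the bars if sort() and arrange() collate differently
group = dat$group[match(levels(dat$Sample), dat$Sample)])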
p1 = ggplot(dat2,aes(x = a, y = b)) +
geom_tile(aes(fill = group)) +
scale_fill_manual(values = mypalette(22)[1:length(unique(Group))]) +
theme(panel.grid = element_blank(),
panel.background = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
axis.text = element_blank(),
axis.title = element_blank()) +
scale_x_continuous(expand = c(0, 0)) +
labs(fill = "Group")
p2 = ggplot(dat,aes(Sample, Proportion,fill = Cell_type)) +
geom_bar(stat = "identity") +
labs(fill = "Cell Type",x = "",y = "Estimated Proportion") +
theme_bw() +
theme(#axis.text.x = element_blank(),
axis.ticks.x = element_blank()
) +
scale_y_continuous(expand = c(0.01,0)) +
scale_fill_manual(values = mypalette(22))
library(patchwork)
p1 / p2 + plot_layout(heights = c(1,10),guides = "collect" ) &
theme(legend.position = "bottom")
2.7. Box plots
This compares the immune cell types against one another.
ggplot(dat,aes(Cell_type,Proportion,fill = Cell_type)) +
geom_boxplot() +
theme_bw() +
labs(x = "Cell Type", y = "Estimated Proportion") +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
legend.position = "bottom") +
scale_fill_manual(values = mypalette(22))
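Next, compare the groups within each immune cell type:
Plot all the cell types that differ significantly between groups
# drop cell types (columns of re) that are 0 in every sample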
k = colSums(re)>0;table(k)
## k
## FALSE TRUE
## 2 20
re = re[,k]
library(tinyarray)
draw_boxplot(t(re),factor(Group),
drop = T,
color = mypalette(length(unique(Group))))+
labs(x = "Cell Type", y = "Estimated Proportion")
Plot a single immune cell type of interest
dat2 = dat[dat$Cell_type=="Eosinophils",]
library(ggpubr)
ggplot(dat2,aes(group,Proportion,fill = group)) +
geom_boxplot() +
theme_bw() +
labs(x = "Group", y = "Estimated Proportion", fill = "Group") +
theme(legend.position = "top") +
scale_fill_manual(values = mypalette(length(unique(Group)))) +
# dat2 stores the grouping in its `group` column; after_stat() replaces the deprecated ..p.signif.. notation
stat_compare_means(aes(group = group,label = after_stat(p.signif)),method = "kruskal.test")
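The significance label on that plot comes from a Kruskal-Wallis test across the groups. To get the p-value itself rather than the stars, the same test can be run in base R. This is a sketch on made-up proportions for three groups of three samples; with the real data you would pass the `dat2` subset and its `group` column.

```r
# Made-up proportions for one cell type in three groups of three samples
toy <- data.frame(
  group      = rep(c("CK", "D_2", "YS"), each = 3),
  Proportion = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09)
)

# Rank-based test of whether the groups share the same distribution
kw <- kruskal.test(Proportion ~ group, data = toy)
kw$statistic
kw$p.value
```

With only three samples per group the test has little power, so a non-significant result here says more about sample size than about biology.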
All done~