免疫組庫分析實戰(zhàn)/mixcr+vdjtools+R實現(xiàn)

使用《OncoImmunology》一篇paper中的數(shù)據(jù)進行實戰(zhàn)


1.下載數(shù)據(jù)

1.SRR_Acc_List.txt (所有測序數(shù)據(jù)的樣本號魁淳,直接下載得到)
2.SraRunTable.txt (樣本的分組信息低散,后面數(shù)據(jù)分析會用到)

#linux
prefetch -O ./rawdata --option-file ./SRR_Acc_List.txt #批量下載數(shù)據(jù)
cat SRR_Acc_List.txt | while read id; do fastq-dump --split-files --gzip -O ./rawdata/ ./rawdata/${id}/${id}.sra;done # sra to fastq
fastqc -o ./qc ./rawdata/*gz #質控

詳細方法參考:RNA-seq轉錄組差異分析及可視化

2.使用MIXCR構建免疫組庫

#只分析TRB
cat SRR_Acc_List.txt|while read name
do 
mixcr analyze amplicon \
-s hsa \
--starting-material dna \
--5-end v-primers \
--3-end c-primers \
--adapters adapters-present \
--only-productive \
--receptor-type trb \
--report ./reports/${name}.report \
./rawdata/${name}_1.fastq.gz ./rawdata/${name}_2.fastq.gz ./results/${name}
done

詳細方法參考:使用mixcr構建免疫組庫及下游分析

3.VDJTOOLS分析

這一步需要用到之前下載的SraRunTable.txt蜒程,處理得到想要的分組文件级零。

分組文件

前兩列必須為“file_name”,"sample_id",后面列可根據(jù)需求進行分組
第一列是MIXCR恢復出來數(shù)據(jù)的路徑囤踩,第二列為樣本名稱

vdjtools=~/project/biosoft/vdjtools/vdjtools-1.2.1/vdjtools-1.2.1.jar
mkdir ./results/TRB/VDJconvert/
mkdir ./results/vdjtools-results/
java -jar $vdjtools Convert -S mixcr -m ./results/TRB/metadata.txt ./results/TRB/VDJconvert/ 
java -jar $vdjtools CalcBasicStats -m ./results/TRB/VDJconvert/metadata.txt ./results/vdjtools-results/
java -jar $vdjtools CalcSegmentUsage -m ./results/TRB/VDJconvert/metadata.txt ./results/vdjtools-results/
java -jar $vdjtools RarefactionPlot -m ./results/TRB/VDJconvert/metadata.txt ./results/vdjtools-results/
java -jar $vdjtools CalcDiversityStats -m ./results/TRB/VDJconvert/metadata.txt ./results/vdjtools-results/

轉換格式后在目標文件夾會生成一個新的metadata.txt珍特,后續(xù)分析使用此分組文件祝峻。
共得到6個結果文件分別為basicstats.txtdiversity.strict.exact.txt扎筒、diversity.strict.resampled.txt呼猪、rarefaction.strict.txtsegments.wt.J.txt砸琅、segments.wt.V.txt

4.使用R包進行可視化處理

4.1 clonality diversity richness

inputpath <- "./results/vdjtools-results/"
outputpath <- "./R-results/"
library(ggplot2)
library(reshape2)
library(ggpubr)
df <- read.csv(paste0(inputpath,"diversity.strict.exact.txt"), sep = "\t",stringsAsFactors = FALSE)
df$real_diversity <- log(df$shannonWienerIndex_mean) 
df$clonality <- 1 - df$normalizedShannonWienerIndex_mean 
work_df <- df
sample_name <- work_df$sample_id
Group <- work_df$sample_type
richness<- work_df$observedDiversity_mean
diversity <- work_df$real_diversity
clonality <- work_df$clonality
work_df <- data.frame(Group,richness,diversity,clonality)
work_df$Group <- factor(work_df$Group)  
work_res<- melt(work_df, id.vars = "Group")
work_richness <- work_res[which(work_res$variable=="richness"),]
work_diversity <- work_res[which(work_res$variable=="diversity"),]
work_clonality <- work_res[which(work_res$variable=="clonality"),]
p1 <- ggboxplot(work_diversity, x = "Group", y = "value",
                legend = "right",color = "black", palette = "jama",
                add = "jitter",add.params=list(color = "Group",size=3),
                title = "Diversity",  xlab = "",size = 0.5,width =0.5,
                ylab = "Shannon Diversity Index of \n TCR repertoire") +
  theme(axis.text.x =element_text(size=13), axis.title.y=element_text(size=16,face="bold"))
p1 + stat_compare_means(aes(group = work_df$Group), 
                        label.x = 2,label = "p.signif",
                        method = 'wilcox.test',hide.ns = TRUE)
ggsave(paste0(outputpath,"Diversity.png"))
dev.off()
p2 <- ggboxplot(work_richness, x = "Group", y = "value",
                legend = "right",color = "black", palette = "jama",
                add = "jitter",add.params=list(color = "Group",size=3),
                title = "Richness",  xlab = "",size = 0.5,width =0.5,
                ylab = "Richness of \n TCR repertoire")+
  theme(axis.text.x =element_text(size=13), axis.title.y=element_text(size=16,face="bold"))
p2 + stat_compare_means(aes(group =work_df$Group), 
                        label.x = 2,label = "p.signif",
                        method = 'wilcox.test',hide.ns = TRUE)
ggsave(paste0(outputpath,"Richness.png"))
dev.off()
p3 <- ggboxplot(work_clonality, x = "Group", y = "value",
                legend = "right",color = "black", palette = "jama",
                add = "jitter",add.params=list(color = "Group",size=3),
                title = "Clonality",  xlab = "",size = 0.5,width =0.5,
                ylab = "Clonality Index of \n TCR repertoire")+
  theme(axis.text.x =element_text(size=13), axis.title.y=element_text(size=16,face="bold"))
p3 + stat_compare_means(aes(group =work_df$Group),  
                        label = "p.signif",label.x = 2,
                        method = 'wilcox.test',hide.ns = TRUE)
ggsave(paste0(outputpath,"Clonality.png"))
dev.off()
Clonality

Diversity

Richness

4.2Gene Usage熱圖繪制

inputpath <- "./results/vdjtools-results/"
outputpath <- "./R-results/"
library(pheatmap)
file_input <- c(paste0(inputpath,"segments.wt.V.txt"),
                paste0(inputpath,"segments.wt.J.txt"))
file_output <- c(paste0(outputpath,"V_heatmap.png"),paste0(outputpath,"J_heatmap.png")) 
for (i in 1:length(file_input)){
  df <- read.csv(file_input[i],sep="\t",row.names = 1)
  annotation_col <- data.frame(df$sample_type)
  colnames(annotation_col) <- "Type"
  rownames(annotation_col) <- rownames(df)
  work_df <- df
  work_df <- work_df[,-1:-2] # check this and change
  work_df <- t(work_df)
  work_df <- work_df[rowSums(work_df)!=0,]
  pheatmap(work_df,filename = file_output[i],
           annotation_col = annotation_col,
           cluster_cols = FALSE, #height = 10,
           display_numbers = FALSE,labels_col ="" )# check this and change
}
V_heatmap

J_heatmap

4.3 VJ基因的差異分析及火山圖繪制

mymetadata <- read.csv("./results/TRB/VDJconvert/metadata.txt",sep="\t",stringsAsFactors = F)
sample_name <- mymetadata$sample_id
vgene_matrix <- function(df){
  vgene_name <- c()
  for (i in 1:nrow(df)){
    if (df[i,"v"] %in% names(vgene_name)){
      vgene_name[df[i,"v"]] <- vgene_name[df[i,"v"]] + df[i,"count"]
    } else{
      vgene_name[df[i,"v"]] <- df[i,"count"]
    }
  }
  return(vgene_name)
}
jgene_matrix <- function(df){
  jgene_name <- c()
  for (i in 1:nrow(df)){
    if (df[i,"j"] %in% names(jgene_name)){
      jgene_name[df[i,"j"]] <- jgene_name[df[i,"j"]] + df[i,"count"]
    } else{
      jgene_name[df[i,"j"]] <- df[i,"count"]
    }
  }
  return(jgene_name)
}
VGeneMatrix <- data.frame()
JGeneMatrix <- data.frame()
matrix_V <- function(df,V,sample_name){
  for (i in 1:nrow(df)){
    if (df[i,"v"] %in% colnames(V)){
      V[is.na(V)] <- 0
      V[sample_name[n],df[i,"v"]] <- V[sample_name[n],df[i,"v"]] + df[i,"count"]
    } else{
      V[sample_name[n],df[i,"v"]] <- df[i,"count"]
    }
  }
  return(V)
}
matrix_J <- function(df,J,sample_name){
  for (i in 1:nrow(df)){
    if (df[i,"j"] %in% colnames(J)){
      J[is.na(J)] <- 0
      J[sample_name[n],df[i,"j"]] <- J[sample_name[n],df[i,"j"]] + df[i,"count"]
    } else{
      J[sample_name[n],df[i,"j"]] <- df[i,"count"]
    }
  }
  return(J)
}

for (n in 1:length(sample_name)){
  df <- read.csv(paste0("./results/TRB/VDJconvert/",sample_name[n],".txt"),sep="\t",stringsAsFactors = F)
  # assign(paste0(sample_name[n],"_Vgene"),vgene_matrix(df))
  # assign(paste0(sample_name[n],"_Jgene"),jgene_matrix(df)) 每個樣本的表達情況
  VGeneMatrix <- matrix_V(df,VGeneMatrix,sample_name)
  JGeneMatrix <- matrix_J(df,JGeneMatrix,sample_name)
}
write.table(VGeneMatrix,file = paste0(outputpath,"matrix_V.txt"),sep="\t",quote = F)
write.table(JGeneMatrix,file = paste0(outputpath,"matrix_J.txt"),sep="\t",quote = F)

#########DESeq2#######

library(DESeq2)
library(ggpubr)
library(ggthemes)
mycounts <- t(JGeneMatrix)
mycounts <- mycounts[rowSums(mycounts)!=0,]
mymetas <- mymetadata[,2:3]
colnames(mycounts) == mymetas$sample_id
mymetas$sample_type <- factor(mymetas$sample_type)

dds <- DESeqDataSetFromMatrix(countData = mycounts, colData = mymetas,desig=~sample_type)
dds <- DESeq(dds)
res <- results(dds,contrast = c("sample_type","healthy","hematological_cancer"))
res <- data.frame(res)
DEG <- res[order(res$padj),] 


DEG$change =as.factor(ifelse(DEG$pvalue < 0.05 & DEG$log2FoldChange > 0.5,'UP',
                             
                             ifelse(DEG$pvalue < 0.05 & DEG$log2FoldChange < -0.5,'DOWN',
                                    
                                    "NOT")))
DEG$logp = -log10(DEG$pvalue)


table(DEG$change)

DEG <- DEG[order(DEG$pvalue),]
up_genes <- head(row.names(DEG)[which(DEG$change == "UP")],10)
down_genes <- head(row.names(DEG)[which(DEG$change =="DOWN")],10)
DEG_top <- c(as.character(up_genes),as.character(down_genes))
DEG$label  <- ""
DEG$label[match(DEG_top,row.names(DEG))] <- DEG_top

ggscatter(DEG, x = "log2FoldChange", y = "logp",
          color = "change",
          palette = c("#2f5688","#BBBBBB","#CC0000"),
          size = 6,
          label = DEG$label,
          font.label = 10,
          repel = T,
          xlab = "log2FoldChange",
          ylab ="-log10(p value)") + theme_base() +
  geom_hline(yintercept = -log10(0.05),linetype="dashed")+
  geom_vline(xintercept = c(-0.5,0.5),linetype="dashed")

ggsave("./results/J_gene_valcano.png",dpi=600)
dev.off()
V_gene_valcano

J_gene_valcano
最后編輯于
?著作權歸作者所有,轉載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末轴踱,一起剝皮案震驚了整個濱河市症脂,隨后出現(xiàn)的幾起案子,更是在濱河造成了極大的恐慌,老刑警劉巖诱篷,帶你破解...
    沈念sama閱讀 217,277評論 6 503
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件壶唤,死亡現(xiàn)場離奇詭異,居然都是意外死亡棕所,警方通過查閱死者的電腦和手機闸盔,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 92,689評論 3 393
  • 文/潘曉璐 我一進店門,熙熙樓的掌柜王于貴愁眉苦臉地迎上來琳省,“玉大人迎吵,你說我怎么就攤上這事≌氡幔” “怎么了击费?”我有些...
    開封第一講書人閱讀 163,624評論 0 353
  • 文/不壞的土叔 我叫張陵,是天一觀的道長桦他。 經(jīng)常有香客問我蔫巩,道長,這世上最難降的妖魔是什么快压? 我笑而不...
    開封第一講書人閱讀 58,356評論 1 293
  • 正文 為了忘掉前任圆仔,我火速辦了婚禮,結果婚禮上蔫劣,老公的妹妹穿的比我還像新娘坪郭。我一直安慰自己,他們只是感情好拦宣,可當我...
    茶點故事閱讀 67,402評論 6 392
  • 文/花漫 我一把揭開白布截粗。 她就那樣靜靜地躺著,像睡著了一般鸵隧。 火紅的嫁衣襯著肌膚如雪绸罗。 梳的紋絲不亂的頭發(fā)上,一...
    開封第一講書人閱讀 51,292評論 1 301
  • 那天豆瘫,我揣著相機與錄音珊蟀,去河邊找鬼。 笑死外驱,一個胖子當著我的面吹牛育灸,可吹牛的內(nèi)容都是我干的。 我是一名探鬼主播昵宇,決...
    沈念sama閱讀 40,135評論 3 418
  • 文/蒼蘭香墨 我猛地睜開眼磅崭,長吁一口氣:“原來是場噩夢啊……” “哼!你這毒婦竟也來了瓦哎?” 一聲冷哼從身側響起砸喻,我...
    開封第一講書人閱讀 38,992評論 0 275
  • 序言:老撾萬榮一對情侶失蹤柔逼,失蹤者是張志新(化名)和其女友劉穎,沒想到半個月后割岛,有當?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體愉适,經(jīng)...
    沈念sama閱讀 45,429評論 1 314
  • 正文 獨居荒郊野嶺守林人離奇死亡,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點故事閱讀 37,636評論 3 334
  • 正文 我和宋清朗相戀三年癣漆,在試婚紗的時候發(fā)現(xiàn)自己被綠了维咸。 大學時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
    茶點故事閱讀 39,785評論 1 348
  • 序言:一個原本活蹦亂跳的男人離奇死亡惠爽,死狀恐怖癌蓖,靈堂內(nèi)的尸體忽然破棺而出,到底是詐尸還是另有隱情疆股,我是刑警寧澤费坊,帶...
    沈念sama閱讀 35,492評論 5 345
  • 正文 年R本政府宣布,位于F島的核電站旬痹,受9級特大地震影響附井,放射性物質發(fā)生泄漏。R本人自食惡果不足惜两残,卻給世界環(huán)境...
    茶點故事閱讀 41,092評論 3 328
  • 文/蒙蒙 一永毅、第九天 我趴在偏房一處隱蔽的房頂上張望。 院中可真熱鬧人弓,春花似錦沼死、人聲如沸。這莊子的主人今日做“春日...
    開封第一講書人閱讀 31,723評論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽。三九已至健芭,卻和暖如春县钥,著一層夾襖步出監(jiān)牢的瞬間,已是汗流浹背慈迈。 一陣腳步聲響...
    開封第一講書人閱讀 32,858評論 1 269
  • 我被黑心中介騙來泰國打工若贮, 沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留,地道東北人痒留。 一個月前我還...
    沈念sama閱讀 47,891評論 2 370
  • 正文 我出身青樓谴麦,卻偏偏與公主長得像,于是被迫代替她去往敵國和親伸头。 傳聞我的和親對象是個殘疾皇子匾效,可洞房花燭夜當晚...
    茶點故事閱讀 44,713評論 2 354

推薦閱讀更多精彩內(nèi)容