復(fù)現(xiàn)1:AD的GWAS位點關(guān)聯(lián)基因挖掘

  • Title:Mapping Alzheimer's Disease Variants to Their Target Genes Using Computational Analysis of Chromatin Configuration
  • PMID:31984958
  • Data/Code availability:Yes
  • IF:1.163
  • 其它:paper來自JOVE的視頻期刊,第一次了解,感覺挺有意思的
  • 分析思路:如下屁倔,思路較為清晰明了,簡單易行庆聘。首先根據(jù)其它文章里的AD GWAS數(shù)據(jù)找到關(guān)聯(lián)基因(外顯子、啟動子勺卢、hiC的enhancer)伙判,其中通過hiC的enhancer,尋找關(guān)聯(lián)基因是本文的亮點所在黑忱。然后就是根據(jù)關(guān)聯(lián)基因進(jìn)行富集分析宴抚、不同AD分組差異表達(dá)分析、細(xì)胞類型差異表達(dá)分析甫煞。


    paper思路
  • 代碼實操(R version 4.0.3 )

0菇曲、安裝包、下載數(shù)據(jù)

(1) 相關(guān)R包

BiocManager::install("GenomicRanges")
BiocManager::install("biomaRt")
BiocManager::install("WGCNA")
install.packages("reshape")
install.packages("ggplot2")
install.packages("corrplot")
install.packages("gProfileR")
install.packages("tidyverse")
install.packages("ggpubr")

(2)所有數(shù)據(jù)均可下載

1、制作GRanges對象(SNP萧朝、exon岔留、promoter、HiC-enhancer)

  • GenomicRanges包的GRanges()函數(shù)創(chuàng)建
library(GenomicRanges) 
#SNP
credSNP = read.delim("Supplementary_Table_8_Jansen.txt", header=T,encoding = "UTF-8") 
#2357   19
credSNP = credSNP[credSNP$Credible.Causal=="Yes",]
#800    19
credranges = GRanges(credSNP$Chr, IRanges(credSNP$bp, credSNP$bp), rsid=credSNP$SNP, P=credSNP$P) 

#promoter and exonic region
exon = read.table("Gencode19_exon.bed")
exonranges = GRanges(exon[,1],IRanges(exon[,2],exon[,3]),gene=exon[,4])
promoter = read.table("Gencode19_promoter.bed")
promoterranges = GRanges(promoter[,1], IRanges(promoter[,2], promoter[,3]), gene=promoter[,4])

#HiC region
hic = read.table("Promoter-anchored_chromatin_loops.bed", skip=1)
colnames(hic) = c("chr", "TSS_start", "TSS_end", "Enhancer_start", "Enhancer_end")
hicranges = GRanges(hic$chr, IRanges(hic$TSS_start, hic$TSS_end), enhancer=hic$Enhancer_start)

2检柬、尋找SNP與其它三種feature的Overlap

  • GenomicRanges包的findOverlaps()函數(shù)找overlap
  • findOverlaps(A, B)献联。其中A對應(yīng)結(jié)果的queryHits,B對應(yīng)結(jié)果的subjectHits
#(1)Overlap credible SNPs with exonic regions
olap = findOverlaps(credranges, exonranges)
credexon = credranges[queryHits(olap)]
#add gene info
mcols(credexon) = cbind(mcols(credexon), mcols(exonranges[subjectHits(olap)]))
#GRanges object with 42 ranges and 3 metadata columns:
#       seqnames    ranges strand |        rsid         P            gene
#          <Rle> <IRanges>  <Rle> | <character> <numeric>     <character>
#   [1]        1 161155392      * |   rs4575098  2.05e-10 ENSG00000158859
#   [2]        1 161156033      * |  rs11585858  5.58e-10 ENSG00000158859
#   [3]        1 207795320      * |   rs2296160  1.66e-17 ENSG00000203710


#(2)Overlap credible SNPs with promoter regions
olap = findOverlaps(credranges, promoterranges)
credpromoter = credranges[queryHits(olap)]
# add gene info
mcols(credpromoter) = cbind(mcols(credpromoter), 
mcols(promoterranges[subjectHits(olap)]))
#GRanges object with 236 ranges and 3 metadata columns:
#        seqnames    ranges strand |        rsid         P            gene
#           <Rle> <IRanges>  <Rle> | <character> <numeric>     <character>
#    [1]        2 234068476      * |  rs35349669  4.85e-09 ENSG00000168918
#    [2]        2 234068476      * |  rs35349669  4.85e-09 ENSG00000168918
#    [3]        2 234068705      * |  rs28669088  8.84e-09 ENSG00000168918

#(3)Overlap credible SNPs with hic-enhancer
## 先找hic interaction里與promoter存在overlap的loop
olap = findOverlaps(hicranges, promoterranges) 
hicpromoter = hicranges[queryHits(olap)]
mcols(hicpromoter) = cbind(mcols(hicpromoter), mcols(promoterranges[subjectHits(olap)]))
hicenhancer = GRanges(seqnames(hicpromoter), IRanges(hicpromoter$enhancer, hicpromoter$enhancer+10000), gene=hicpromoter$gene)
## 再基于上一步結(jié)果尋找與含promoter的hic的另一端"enhancer"存在重疊的variant
olap = findOverlaps(credranges, hicenhancer)
credhic = credranges[queryHits(olap)]
#add gene info
mcols(credhic) = cbind(mcols(credhic), mcols(hicenhancer[subjectHits(olap)]))
  • Integrate the 3 types GWAS targeted gene
ADgenes = Reduce(union, list(credhic$gene, credexon$gene, credpromoter$gene))
### to convert Ensembl Gene ID to HGNC symbol load("geneAnno.rda")
load("geneAnno2.rda")
ADhgnc = geneAnno1[match(ADgenes, geneAnno1$ensembl_gene_id), "hgnc_symbol"]
ADhgnc = ADhgnc[ADhgnc!=""]
save(ADgenes, ADhgnc, file="ADgenes.rda")
write.table(ADhgnc, file="ADgenes.txt", row.names=F, col.names=F, quote=F, sep="\t")
112 genes
library(reshape2)
library(ggplot2)
library(GenomicRanges)
library(biomaRt) 
library(WGCNA)

3何址、112個AD risk基因在Bulk RNA-seq的分組比較

#(1)整理表達(dá)矩陣
datExpr = read.csv("expression_matrix.csv", header = FALSE) 
datExpr = datExpr[,-1]  #17604  492
datMeta = read.csv("columns_metadata.csv") 
datProbes = read.csv("rows_metadata.csv")
datExpr = datExpr[datProbes$ensembl_gene_id!="",]   #17085  492
datProbes = datProbes[datProbes$ensembl_gene_id!="",]

#利用WGCGA包的collapseRows函數(shù)酱固,針對重復(fù)基因的測序數(shù)據(jù),取代表性的一個
datExpr.cr = collapseRows(datExpr, rowGroup = datProbes$ensembl_gene_id, rowID= rownames(datExpr))
class(datExpr.cr)
datExpr = datExpr.cr$datETcollapsed
gename = data.frame(datExpr.cr$group2row)
rownames(datExpr) = gename$group
#16279  492

# (2)分組信息
#Specify developmental stages
datMeta$Unit = "Postnatal"
idx = grep("pcw", datMeta$age) #postconceptional weeks
datMeta$Unit[idx] = "Prenatal" #產(chǎn)前

idx = grep("yrs", datMeta$age)
datMeta$Unit[idx] = "Postnatal" #產(chǎn)后
datMeta$Unit = factor(datMeta$Unit, levels=c("Prenatal", "Postnatal"))

# (3)選擇大腦皮層區(qū)域的測序樣本
datMeta$Region = "SubCTX"
r = c("A1C", "STC", "ITC", "TCx", "OFC", "DFC", "VFC", "MFC", "M1C", "S1C", "IPC", "M1C-S1C", "PCx", "V1C", "Ocx")
datMeta$Region[datMeta$structure_acronym %in% r] = "CTX"
datExpr = datExpr[,which(datMeta$Region=="CTX")]
#16279  345
datMeta = datMeta[which(datMeta$Region=="CTX"),]
#345  10

# (4)觀察112個risk gene在兩組的平均表達(dá)差異
load("ADgenes.rda")
#計算兩組的均值(postnatal cortex與prenatal cortex)
exprdat = apply(datExpr[match(ADgenes, rownames(datExpr)),],2,mean,na.rm=T)
dat = data.frame(Region=datMeta$Region, Unit=datMeta$Unit, Expr=exprdat)

ggplot(dat,aes(x=Unit, y=Expr, fill=Unit, alpha=Unit)) + 
    geom_boxplot(outlier.size = NA) + 
    ggtitle("Brain Expression") + ylab("Normalized expression") + 
    xlab("") + scale_alpha_manual(values=c(0.2, 1)) + 
    theme_classic() + theme(legend.position="na")

4头朱、觀察112個risk gene在不同細(xì)胞類型的表達(dá)差異

#scRNA-seq 表達(dá)矩陣整理
cellexp = read.table("DER-20_Single_cell_expression_processed_TPM_backup.tsv",header=T,fill=T) 
cellexp[1121,1] = cellexp[1120,1] 
cellexp = cellexp[-1120,] 
rownames(cellexp) = cellexp[,1] 
cellexp = cellexp[,-1] 
datExpr = scale(cellexp,center=T, scale=F) 
datExpr = datExpr[,789:ncol(datExpr)] #15086    3461

#抽取112個gene的表達(dá)信息
targetname = "AD"
targetgene = ADhgnc
exprdat = apply(datExpr[match(targetgene, rownames(datExpr)),],2,mean,na.rm=T)
dat = data.frame(Group=targetname, cell=names(exprdat), Expr=exprdat)

#整理細(xì)胞類型
dat = dat[-grep("Ex|In",dat$celltype),] #不要神經(jīng)元細(xì)胞
dat$celltype = gsub("Dev","Fetal",dat$celltype) #Dev替換為Fetal
dat$celltype = factor(dat$celltype, levels=c("Neurons","Astrocytes","Microglia","Endothelial", "Oligodendrocytes","OPC","Fetal"))

ggplot(dat,aes(x=celltype, y=Expr, fill=celltype)) +
    ylab("Normalized expression") + xlab("") + 
    geom_violin() + 
    theme(axis.text.x=element_text(angle = 90, hjust=1)) + 
    theme(legend.position="none") +
    ggtitle(paste0("Cellular expression profiles of AD risk genes"))
#AD risk genes are highly expressed in microglia
  • 如下圖結(jié)果展示,有多種多樣的基因富集思路可供選擇龄减。


4项钮、112個AD risk基因的富集分析(homer軟件)

  • 4.1 安裝、下載、分析
#用conda安裝
conda install -c bioconda homer
#下載背景data
perl /home/shensuo/miniconda3/envs/tmp/share/homer/.//configureHomer.pl -install
perl /home/shensuo/miniconda3/envs/tmp/share/homer/.//configureHomer.pl -install human-p 
perl /home/shensuo/miniconda3/envs/tmp/share/homer/.//configureHomer.pl -install human-o
#富集分析
findGO.pl ~/work/ADgenes.txt human ./go/ -cpu 5
富集分析結(jié)果
  • 4.2 可視化
library(ggpubr)
plot_barplot = function(dbname,name,color){
    input = read.delim(paste0(dbname,".txt"),header=T) 
    input = input[,c(-1,-10,-11)] 
    input = unique(input)
    input$FDR = p.adjust(exp(input$logP)) 
    input_sig = input[input$FDR < 0.1,] 
    input_sig$FDR = -log10(input_sig$FDR) 
    input_sig = input_sig[order(input_sig$FDR),]
    p = ggbarplot(input_sig, x = "Term", y = "FDR", fill = color, 
        color = "white", sort.val = "asc", 
        ylab = expression(-log[10](italic(FDR))), xlab = paste0(name," Terms"), 
        rotate = TRUE, label = paste0(input_sig$Target.Genes.in.Term,"/",input_sig$Genes.in.Term), 
        font.label = list(color = "white", size = 9), lab.vjust = 0.5, lab.hjust = 1)
    p = p+geom_hline(yintercept = -log10(0.05), linetype = 2, color = "lightgray")
    return(p)
}
#以GO結(jié)果為例
plot_barplot("biological_process","GO Biological Process","#00AFBB") 
GO enrichment

以上就是本篇paper全部的分析流程烁巫,的確很簡單署隘,對于我來說的收獲有

  • 1、GWAS的SNP與HiC的聯(lián)合分析思路
  • 2亚隙、Homer軟件進(jìn)行基因富集分析的強大之處(進(jìn)一步學(xué)習(xí))
  • 3磁餐、WGCNA包里的collapseRows()函數(shù)處理重復(fù)基因測序的方法
  • 4、GenomicRanges包找重疊區(qū)域的方法
    .......
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末阿弃,一起剝皮案震驚了整個濱河市诊霹,隨后出現(xiàn)的幾起案子,更是在濱河造成了極大的恐慌渣淳,老刑警劉巖脾还,帶你破解...
    沈念sama閱讀 206,013評論 6 481
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現(xiàn)場離奇詭異入愧,居然都是意外死亡鄙漏,警方通過查閱死者的電腦和手機,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 88,205評論 2 382
  • 文/潘曉璐 我一進(jìn)店門棺蛛,熙熙樓的掌柜王于貴愁眉苦臉地迎上來怔蚌,“玉大人,你說我怎么就攤上這事旁赊¤胗唬” “怎么了?”我有些...
    開封第一講書人閱讀 152,370評論 0 342
  • 文/不壞的土叔 我叫張陵彤恶,是天一觀的道長钞钙。 經(jīng)常有香客問我,道長声离,這世上最難降的妖魔是什么芒炼? 我笑而不...
    開封第一講書人閱讀 55,168評論 1 278
  • 正文 為了忘掉前任,我火速辦了婚禮术徊,結(jié)果婚禮上本刽,老公的妹妹穿的比我還像新娘。我一直安慰自己赠涮,他們只是感情好子寓,可當(dāng)我...
    茶點故事閱讀 64,153評論 5 371
  • 文/花漫 我一把揭開白布。 她就那樣靜靜地躺著笋除,像睡著了一般斜友。 火紅的嫁衣襯著肌膚如雪。 梳的紋絲不亂的頭發(fā)上垃它,一...
    開封第一講書人閱讀 48,954評論 1 283
  • 那天鲜屏,我揣著相機與錄音烹看,去河邊找鬼。 笑死洛史,一個胖子當(dāng)著我的面吹牛惯殊,可吹牛的內(nèi)容都是我干的。 我是一名探鬼主播也殖,決...
    沈念sama閱讀 38,271評論 3 399
  • 文/蒼蘭香墨 我猛地睜開眼土思,長吁一口氣:“原來是場噩夢啊……” “哼!你這毒婦竟也來了忆嗜?” 一聲冷哼從身側(cè)響起己儒,我...
    開封第一講書人閱讀 36,916評論 0 259
  • 序言:老撾萬榮一對情侶失蹤,失蹤者是張志新(化名)和其女友劉穎霎褐,沒想到半個月后址愿,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體,經(jīng)...
    沈念sama閱讀 43,382評論 1 300
  • 正文 獨居荒郊野嶺守林人離奇死亡冻璃,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點故事閱讀 35,877評論 2 323
  • 正文 我和宋清朗相戀三年响谓,在試婚紗的時候發(fā)現(xiàn)自己被綠了。 大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片省艳。...
    茶點故事閱讀 37,989評論 1 333
  • 序言:一個原本活蹦亂跳的男人離奇死亡娘纷,死狀恐怖,靈堂內(nèi)的尸體忽然破棺而出跋炕,到底是詐尸還是另有隱情赖晶,我是刑警寧澤,帶...
    沈念sama閱讀 33,624評論 4 322
  • 正文 年R本政府宣布辐烂,位于F島的核電站遏插,受9級特大地震影響,放射性物質(zhì)發(fā)生泄漏纠修。R本人自食惡果不足惜胳嘲,卻給世界環(huán)境...
    茶點故事閱讀 39,209評論 3 307
  • 文/蒙蒙 一、第九天 我趴在偏房一處隱蔽的房頂上張望扣草。 院中可真熱鬧了牛,春花似錦、人聲如沸辰妙。這莊子的主人今日做“春日...
    開封第一講書人閱讀 30,199評論 0 19
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽密浑。三九已至蛙婴,卻和暖如春,著一層夾襖步出監(jiān)牢的瞬間尔破,已是汗流浹背街图。 一陣腳步聲響...
    開封第一講書人閱讀 31,418評論 1 260
  • 我被黑心中介騙來泰國打工背传, 沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留,地道東北人台夺。 一個月前我還...
    沈念sama閱讀 45,401評論 2 352
  • 正文 我出身青樓,卻偏偏與公主長得像痴脾,于是被迫代替她去往敵國和親颤介。 傳聞我的和親對象是個殘疾皇子,可洞房花燭夜當(dāng)晚...
    茶點故事閱讀 42,700評論 2 345

推薦閱讀更多精彩內(nèi)容