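The HPA (Human Protein Atlas) database holds a large collection of tissue section data and immunohistochemistry (IHC) slides. You can download and save the images from the website one at a time, record the matching information, rename the files, and then cross-check everything again.
However, doing this by hand is time-consuming and laborious, and mistakes can slip in during the cross-checking.
Scraping the images with R is far more convenient. The code for the whole workflow follows.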
Install and load the required packages
Load the packages below. If any of them is missing, install it first.
library(BiocStyle)
library(HPAanalyze)
library(dplyr)
library(tibble)
library(readr)
library(tidyr)
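If you are not sure which of these packages are already installed, a check along the following lines can list the missing ones before you call library(). This is only a minimal sketch: `find_missing_packages` is a hypothetical helper name, not part of any package, and the commented-out installation lines assume the usual setup in which HPAanalyze and BiocStyle are installed from Bioconductor through BiocManager.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("BiocStyle", "HPAanalyze", "dplyr", "tibble", "readr", "tidyr")
missing_pkgs <- find_missing_packages(needed)
print(missing_pkgs)

# Installation is left commented out because it needs network access:
# install.packages("BiocManager")
# BiocManager::install(missing_pkgs)
```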
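Create the storage location and choose the gene and tissue to download
# recursive = TRUE also creates ./output_data if it does not exist yet
dir.create("./output_data/step11_img/", recursive = TRUE, showWarnings = FALSE)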
gene="IGF2BP2"
tissue="Kidney"
Retrieve the relevant information
# Fetch the XML file for this gene from the HPA website
hpa_target_gene<-hpaXmlGet(gene)
# Extract the tissue staining information from the XML
hpa_target_gene_fig_url<-hpaXmlTissueExpr(hpa_target_gene)
hpa_target_gene_fig_url_1<-as.data.frame(hpa_target_gene_fig_url[[1]])
hpa_target_gene_fig_url_1[1:6,1:18]
hpa_target_gene_fig_url_2<-as.data.frame(hpa_target_gene_fig_url[[2]])
hpa_target_gene_fig_url_2[1:6,1:18]
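The fixed index `[1:6,1:18]` assumes each table has at least 18 columns; if a table is narrower, R stops with an "undefined columns selected" error. A bounds-safe preview could look like the sketch below. `peek` is a hypothetical helper name, and the toy data frame only stands in for one of the antibody tables.

```r
# Hypothetical helper: show at most n_rows rows and n_cols columns,
# without failing when the table is smaller than requested.
peek <- function(df, n_rows = 6, n_cols = 18) {
  rows <- seq_len(min(n_rows, nrow(df)))
  cols <- seq_len(min(n_cols, ncol(df)))
  df[rows, cols, drop = FALSE]
}

# Toy stand-in for one antibody table (values are made up)
toy <- data.frame(patientId = as.character(1:10),
                  tissueDescription2 = rep("Kidney", 10),
                  stringsAsFactors = FALSE)
preview <- peek(toy)
print(dim(preview))
```

With the real data you would call, for example, peek(hpa_target_gene_fig_url_1).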
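Select the target content to download
# Select the tissue of interest. Keep only ONE of the two assignments below:
# the first uses the table for antibody 1, the second the table for antibody 2.
# If both are run, the second simply overwrites the first. The output folder
# below is named "IHC-2", so antibody 2 is the one used here.
# hpa_target_gene_fig_url_tissue<-hpa_target_gene_fig_url_1[hpa_target_gene_fig_url_1$tissueDescription2==tissue,]
hpa_target_gene_fig_url_tissue<-hpa_target_gene_fig_url_2[hpa_target_gene_fig_url_2$tissueDescription2==tissue,]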
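One thing to watch with `==` filtering: if the tissueDescription2 column contains any NA values, the comparison returns NA for those rows and R adds an all-NA row to the result, which would later break the download loop. Whether the HPA tables actually contain NA there is not something this post checks, so treat the following as a defensive sketch on made-up data. Wrapping the comparison in which() drops such rows.

```r
# Toy stand-in for one antibody table; column names follow the ones used
# above, the values are made up for illustration.
toy <- data.frame(
  patientId = c("1", "2", "3"),
  tissueDescription2 = c("Kidney", NA, "Liver"),
  stringsAsFactors = FALSE
)
tissue <- "Kidney"

# Plain comparison: the NA entry produces an extra all-NA row
by_equals <- toy[toy$tissueDescription2 == tissue, ]

# which() keeps only the positions where the comparison is TRUE
by_which <- toy[which(toy$tissueDescription2 == tissue), ]

print(nrow(by_equals))
print(nrow(by_which))
```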
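Create the download folder and save the images
# Create a separate folder for this gene and tissue
picDir <- paste0("./output_data/step11_img/", paste(gene, tissue, "IHC-2", sep = "_"), "/")
if (!dir.exists(picDir)) {
  dir.create(picDir, recursive = TRUE)
}
# seq_len() skips the loop cleanly when no rows matched the tissue
for (i in seq_len(nrow(hpa_target_gene_fig_url_tissue))) {
  file_url <- hpa_target_gene_fig_url_tissue$imageUrl[i]
  # File name: gene_tissue_patientId_tissueDescription1_tissueDescription2.tiff
  file_name <- paste0(paste(gene, tissue, hpa_target_gene_fig_url_tissue$patientId[i], hpa_target_gene_fig_url_tissue$tissueDescription1[i], hpa_target_gene_fig_url_tissue$tissueDescription2[i], sep = "_"), ".tiff")
  # mode = "wb" writes the image as binary so it is not corrupted on Windows
  download.file(url = file_url, destfile = paste0(picDir, file_name), mode = "wb")
}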
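The loop above always saves files with a ".tiff" extension, whatever format the server actually returns. If you would rather keep the extension that appears in the image URL and avoid spaces or commas in file names, a helper like the one below could be used. It is a sketch under assumptions: `build_image_name` is a hypothetical name, the example URL is invented, and the "jpg" fallback is a guess for URLs that carry no extension.

```r
# Hypothetical helper: join the labels with "_" and reuse the URL's extension.
build_image_name <- function(url, ...) {
  ext <- tools::file_ext(url)
  if (!nzchar(ext)) ext <- "jpg"  # assumed fallback when the URL has no extension
  # Replace runs of characters other than letters, digits, "." and "-" with "-"
  parts <- gsub("[^A-Za-z0-9.-]+", "-", c(...))
  paste0(paste(parts, collapse = "_"), ".", ext)
}

# Made-up URL and labels, for illustration only
example_name <- build_image_name(
  "https://example.org/images/1234/5678_A_1_2.jpg",
  "IGF2BP2", "Kidney", "2530", "Normal tissue, NOS"
)
print(example_name)
```

Inside the loop, the result would take the place of the hand-built file name before it is appended to picDir.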
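Finally, save all the information about the images
# picDir ends with "/", so the table is written inside the image folder
write.csv(hpa_target_gene_fig_url_tissue, paste0(picDir, gene, "_IHC-2_result_tab.csv"))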