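KEGG is a database resource for understanding high-level functions and utilities of biological systems (such as the cell, the organism and the ecosystem) from molecular-level information, especially the large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies. It was established in 1995 by the Kanehisa Laboratory at the Bioinformatics Center of Kyoto University, Japan. It is one of the most widely used bioinformatics databases in the world, known as "a resource for understanding high-level functions and utilities of the biological system".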
Exercise: how do we get the names of all genes in the KEGG pathway hsa04650, Natural killer cell mediated cytotoxicity? (In hsa04650, hsa stands for Homo sapiens.)
There are two ways: first, search Google and read the answer off the web page; second, use R packages and code.
Method 1: browsing the web page
1. Search Google directly for: hsa04650
2. Open this URL (https://www.genome.jp/dbget-bin/www_bget?hsa04650)
3. Scroll down to the Gene entry to see the answer.
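Method 2: using R packages and code:
Idea: looking at the answer on the web page, our goal is to turn the Gene entry into a matrix and extract the second column, which holds the gene symbols.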
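As a minimal sketch of that idea, assume the Gene entry arrives as a flat character vector that alternates gene ID and gene description. The HLA-A values are copied from the output shown later in this post; the other descriptions are placeholders, and `gene_field`, `gene_mat` and `symbols` are illustrative names.

```r
# Sample of the flat Gene entry: ID, description, ID, description, ...
gene_field <- c(
  "3105", "HLA-A; major histocompatibility complex, class I, A [KO:K06751]",
  "3106", "HLA-B; (description omitted)",
  "3107", "HLA-C; (description omitted)"
)

# Fill a two-column matrix row by row: column 1 = gene ID, column 2 = description.
gene_mat <- matrix(gene_field, ncol = 2, byrow = TRUE,
                   dimnames = list(NULL, c("id", "description")))

# The gene symbol is the text before the first ';' in the second column.
symbols <- sub(";.*$", "", gene_mat[, "description"])
print(symbols)
```

`byrow = TRUE` matters here: without it R fills the matrix column by column and the IDs and descriptions would be interleaved in both columns.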
Reference article: http://www.bio-info-trainee.com/3533.html
Here is the code from that article:
library(clusterProfiler) # load this package; what does it do?
# https://www.kegg.jp/dbget-bin/www_bget?pathway+hsa05169
# library(KEGG.db) library(KEGGREST) # what are these two packages for?
kg=download_KEGG('hsa') # fetches the data directly; the article gives no hint about where this command comes from.
head(kg[[1]])
head(kg[[2]])
ps=c('hsa04660','hsa04659',
'hsa04658','hsa04657','hsa04662',
'hsa04650')
- clusterProfiler: This package implements methods to analyze and visualize functional profiles (GO and KEGG) of gene and gene clusters.
- KEGGREST: A package that provides a client interface to the KEGG REST server.
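The article's snippet stops after defining `ps`. One plausible next step is to keep only the rows of `kg[[1]]` that belong to those pathways. The sketch below runs on a hand-made stand-in and assumes that `kg[[1]]` is a two-column table mapping pathway ID to gene ID; that layout is an assumption and should be checked with `head(kg[[1]])`. The name `path2gene` and the `id_a`/`id_b` values are hypothetical.

```r
# Hypothetical stand-in for kg[[1]]: column 1 = pathway ID, column 2 = gene ID.
path2gene <- data.frame(
  from = c("hsa04650", "hsa04650", "hsa04650", "hsa04660", "hsa00010"),
  to   = c("3105", "3106", "3107", "id_a", "id_b"),
  stringsAsFactors = FALSE
)

# The pathways of interest, as listed in the article.
ps <- c("hsa04660", "hsa04659", "hsa04658", "hsa04657", "hsa04662", "hsa04650")

# Keep rows whose pathway ID is in ps, then group the gene IDs by pathway.
hits <- path2gene[path2gene[[1]] %in% ps, ]
genes_by_pathway <- split(hits[[2]], hits[[1]])
print(genes_by_pathway[["hsa04650"]])
```

Columns are addressed by position rather than by name so that the sketch does not depend on the exact column names the real table uses.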
With the direction clear, install the package first:
The usual three steps for installing a Bioconductor package:
1. source("http://bioconductor.org/biocLite.R")
Installs BiocInstaller.
2. options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")
Switches the mirror.
3. BiocInstaller::biocLite('KEGGREST')
Installs the Bioconductor package (KEGGREST is a Bioconductor package).
> source("http://bioconductor.org/biocLite.R")
Bioconductor version 3.7 (BiocInstaller 1.30.0), ?biocLite for help
A newer version of Bioconductor is available for this version of R, ?BiocUpgrade for
help
> options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")
> BiocInstaller::biocLite('KEGGREST')
BioC_mirror: http://mirrors.ustc.edu.cn/bioc/
Using Bioconductor 3.7 (BiocInstaller 1.30.0), R 3.5.2 (2018-12-20).
Installing package(s) ‘KEGGREST’
also installing the dependency ‘png’
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.5/png_0.1-7.zip'
Content type 'application/zip' length 292639 bytes (285 KB)
downloaded 285 KB
trying URL 'http://mirrors.ustc.edu.cn/bioc//packages/3.7/bioc/bin/windows/contrib/3.5/KEGGREST_1.20.2.zip'
Content type 'application/zip' length 124626 bytes (121 KB)
downloaded 121 KB
package ‘png’ successfully unpacked and MD5 sums checked
package ‘KEGGREST’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\300S\AppData\Local\Temp\Rtmp4wKPRV\downloaded_packages
Old packages: 'gplots', 'purrr'
Update all/some/none? [a/s/n]:
a
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.5/gplots_3.0.1.1.zip'
Content type 'application/zip' length 657011 bytes (641 KB)
downloaded 641 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.5/purrr_0.3.0.zip'
Content type 'application/zip' length 413820 bytes (404 KB)
downloaded 404 KB
package ‘gplots’ successfully unpacked and MD5 sums checked
package ‘purrr’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\300S\AppData\Local\Temp\Rtmp4wKPRV\downloaded_packages
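The session above uses `biocLite()`, which was the standard installer up to Bioconductor 3.7. From Bioconductor 3.8 onward it was replaced by the BiocManager package from CRAN. Here is a minimal sketch of the equivalent installation, wrapped in a helper so that nothing is installed until it is called (`install_bioc_if_missing` is an illustrative name, not part of any package):

```r
install_bioc_if_missing <- function(pkg) {
  # Nothing to do when the package is already available.
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(TRUE))
  }
  # BiocManager itself is installed from CRAN.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(pkg)
  # Report whether the package can be loaded after the install attempt.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Usage (needs network access):
# install_bioc_if_missing("KEGGREST")
```

To the best of my knowledge the `BioC_mirror` option from step 2 is still honored by BiocManager, so it can be set before calling the helper.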
Learning how to use the package:
Commands:
> ?KEGGREST
No documentation for ‘KEGGREST’ in specified packages and libraries:
you could try ‘??KEGGREST’
> ??KEGGREST
Click through to the help page and learn the basic commands:
- KEGG exposes a number of databases. To get an idea of what is available, run listDatabases(). This shows the databases that KEGGREST can query.
- You can obtain the list of organisms available in KEGG with the keggList() function. This returns the available organisms.
> gs<-keggGet('hsa04650')
> View(gs)
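The fields are the same as on the web page, but gs is clearly a list, not a matrix. We need to turn the relevant field into a matrix and then extract what we want.
Hover the cursor beside a field in the viewer and an icon appears; click it and a line of code is placed in the console; press Enter to run it and get the contents of that field.
Comparing with the web page, the result is correct: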
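The commands further down operate on a vector named `a`, but the step that creates it is not shown. A plausible reconstruction, assuming `a` is the GENE field of the first (and only) record returned by `keggGet()`, is sketched here. The helper name `get_gene_field` and its `fetch` argument are illustrative, added so the logic can be tried without network access.

```r
get_gene_field <- function(pathway_id, fetch = KEGGREST::keggGet) {
  records <- fetch(pathway_id)   # a list with one element per requested entry
  records[[1]]$GENE              # flat character vector: ID, description, ...
}

# Offline check with a stand-in for keggGet() that mimics the shape of its result.
fake_fetch <- function(id) {
  list(list(
    ENTRY = id,
    GENE = c("3105",
             "HLA-A; major histocompatibility complex, class I, A [KO:K06751]")
  ))
}
a_demo <- get_gene_field("hsa04650", fetch = fake_fetch)
print(a_demo)

# With KEGGREST installed and network access:
# a <- get_gene_field("hsa04650")
```

Because default arguments are evaluated lazily, KEGGREST is only needed when `fetch` is left at its default.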
- strsplit(x, split, fixed = FALSE, perl= FALSE, useBytes = FALSE)
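The x argument is the character vector to be processed.
The split argument is the delimiter to split on.
When fixed is TRUE, split is matched literally.
When perl is TRUE, Perl-compatible regular expressions are used.
When both fixed and perl are FALSE, POSIX 1003.2 extended regular expressions are used.
When useBytes is TRUE, matching is done byte by byte.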
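A short sketch of how these arguments behave, using the HLA-A description string from this post plus the classic `"a.b.c"` case (variable names are illustrative):

```r
gene_entry <- "HLA-A; major histocompatibility complex, class I, A [KO:K06751]"

# fixed = TRUE: ';' is matched as a literal character.
parts <- strsplit(gene_entry, ";", fixed = TRUE)[[1]]
symbol <- parts[1]
description <- trimws(parts[2])  # drop the space that followed the ';'

# split is a regular expression by default, so "." matches every character
# and only empty strings are left.
regex_split <- strsplit("a.b.c", ".")[[1]]

# With fixed = TRUE the dot is an ordinary character.
literal_split <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]

print(symbol)
print(literal_split)
```

For a delimiter such as `;` the default regular-expression mode happens to give the same result, but `fixed = TRUE` avoids surprises with characters that have a special meaning in regular expressions.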
lapply(X, FUN, ...)
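lapply returns a list of the same length as X, in which each element is the result of applying FUN to the corresponding element of X. X is a vector or a list; other kinds of object are coerced to a list by R through as.list(). unlist(x) returns a vector containing all the elements of x, that is, it flattens the list. Here it turns the list of first fields (gene IDs and gene symbols alternating) into a single character vector.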
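The difference between the list returned by `lapply` and the flattened vector can be seen on a four-element sample of `a`. The HLA-B description is a placeholder, and the `vapply` variant is an alternative that was not used in the original post.

```r
a <- c("3105",
       "HLA-A; major histocompatibility complex, class I, A [KO:K06751]",
       "3106",
       "HLA-B; (description omitted)")

# lapply always returns a list, one element per element of the input.
first_fields <- lapply(a, function(x) strsplit(x, ";", fixed = TRUE)[[1]][1])

# unlist flattens that list into a single character vector.
b <- unlist(first_fields)

# vapply does both steps at once and checks that each result is one string.
b_checked <- vapply(a, function(x) strsplit(x, ";", fixed = TRUE)[[1]][1],
                    character(1), USE.NAMES = FALSE)

print(b)
```

`vapply` stops with an error if any element produces something other than a single string, which makes it a safer choice when the input format is not guaranteed.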
> lapply(a,function(x) strsplit(x,';'))
[[1]]
[[1]][[1]]
[1] "3105"
[[2]]
[[2]][[1]]
[1] "HLA-A"
[2] " major histocompatibility complex, class I, A [KO:K06751]"
...
> unlist(lapply(a,function(x) strsplit(x,';')[[1]][1]))
[1] "3105" "HLA-A" "3106" "HLA-B" "3107" "HLA-C"
[7] "3135" "HLA-G" "3133" "HLA-E" "3812" "KIR3DL2"
[13] "3811" "KIR3DL1" "3803" "KIR2DL2" "3802" "KIR2DL1"
> b<- unlist(lapply(a,function(x) strsplit(x,';')[[1]][1]))
> b[1:length(b)%%2 ==0] # 1:length(b) gives the positions in b; the elements at even positions are the gene names
[1] "HLA-A" "HLA-B" "HLA-C" "HLA-G" "HLA-E" "KIR3DL2"
[7] "KIR3DL1" "KIR2DL2" "KIR2DL1" "KIR2DL3" "KIR2DL4" "KIR2DL5A"
[13] "KLRC1" "KLRC2" "KLRC3" "KLRD1" "PTPN6" "PTPN11"
[19] "ICAM1" "ICAM2" "ITGAL" "ITGB2" "PTK2B" "VAV3"
[25] "VAV1" "VAV2" "RAC1" "RAC2" "RAC3" "PAK1"
[31] "MAP2K1" "MAP2K2" "MAPK1" "MAPK3" "TNF" "CSF2"
[37] "IFNG" "KIR2DS1" "KIR2DS3" "KIR2DS4" "KIR2DS5" "KIR2DS2"
[43] "NCR2" "TYROBP" "LCK" "IGH" "FCGR3A" "FCGR3B"
[49] "NCR1" "NCR3" "FCER1G" "CD247" "ZAP70" "SYK"
[55] "LCP2" "LAT" "PLCG1" "PLCG2" "SH3BP2" "PIK3CA"
[61] "PIK3CD" "PIK3CB" "PIK3R1" "PIK3R2" "PIK3R3" "FYN"
[67] "SHC1" "SHC2" "SHC3" "SHC4" "GRB2" "SOS1"
[73] "SOS2" "HRAS" "KRAS" "NRAS" "ARAF" "BRAF"
[79] "RAF1" "MICB" "MICA" "ULBP1" "ULBP2" "ULBP3"
[85] "RAET1G" "RAET1L" "RAET1E" "KLRK1" "KLRC4-KLRK1" "HCST"
[91] "CD48" "CD244" "PPP3CA" "PPP3CB" "PPP3CC" "PPP3R1"
[97] "PPP3R2" "NFATC1" "NFATC2" "PRKCA" "PRKCB" "PRKCG"
[103] "SH2D1B" "SH2D1A" "IFNGR1" "IFNGR2" "IFNA1" "IFNA2"
[109] "IFNA4" "IFNA5" "IFNA6" "IFNA7" "IFNA8" "IFNA10"
[115] "IFNA13" "IFNA14" "IFNA16" "IFNA17" "IFNA21" "IFNB1"
[121] "IFNAR1" "IFNAR2" "TNFSF10" "TNFRSF10A" "TNFRSF10B" "FASLG"
[127] "FAS" "GZMB" "PRF1" "CASP3" "BID"
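The even-position selection can be written in a few equivalent ways. The sketch below compares them on the first six elements of `b` from the output above; the two alternatives are suggestions, not part of the original post.

```r
b <- c("3105", "HLA-A", "3106", "HLA-B", "3107", "HLA-C")

# Original approach: keep the positions whose index is even.
by_modulo <- b[1:length(b) %% 2 == 0]

# seq_along(b) is safe even when b is empty (1:length(b) would give c(1, 0)).
by_seq_along <- b[seq_along(b) %% 2 == 0]

# A logical pattern is recycled along b: drop, keep, drop, keep, ...
by_recycling <- b[c(FALSE, TRUE)]

print(by_recycling)
```

All three rely on the Gene entry strictly alternating ID and description, so it is worth confirming that `length(b)` is even before trusting the result.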