title: GDAS009 - Genome annotation in Bioconductor
date: 2019-09-06 12:0:00
type: "tags"
tags:
- Bioconductor
- Genome annotation
- KEGG
- GO
categories:
- Genomics Data Analysis Series
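Preface
This section covers working in R with human genes, introns, exons and transcripts, together with AnnotationHub, genome annotation packages, GO analysis and KEGG analysis. The reference at the end of these notes is the original text.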
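Basic annotation resources and their discovery
In this section we review some of the Bioconductor tools for handling and annotating genomic sequence. We look at reference genomic sequence, transcripts and genes, and finish with gene pathways. The ultimate goal is to use annotation information to support reliable interpretation of genomic experiments. A basic objective of Bioconductor is to make it easy to bring information about genome structure and function into statistical analysis procedures.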
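A hierarchy of annotation concepts
Bioconductor includes many different types of genomic annotation. We can think of these annotation resources as a hierarchy.
- At the base is the reference genomic sequence for an organism. It is always arranged into chromosomes, each specified as a linear sequence of nucleotides.
- Above this is the organization of chromosomal sequence into regions of interest. The most prominent regions of interest are genes, but other structures such as SNPs or CpG sites are annotated as well. Genes have internal structure, with parts that are transcribed and parts that are not, and "gene models" define how these structures are labeled and laid out in genomic coordinates.
- Within the concept of regions of interest we also identify platform-oriented annotation. This type of annotation is usually provided first by the manufacturer, and is then refined as research confirms and updates what was initially ambiguous about the platform. The brainarray project at the University of Michigan illustrates this process for affymetrix array annotation. We address platform-oriented annotation at the end of this section.
- Above this is the organization of regions (most often genes or gene products) into groups with shared structural or functional properties. Examples are groups of genes found together in cells, or identified as cooperating in biological processes (as I understand it, this is the territory of GO and KEGG analysis).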
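Discovering available reference genomes
Bioconductor's collection of annotation packages brings every element of this hierarchy into a programmable environment. Reference genomic sequences are managed with the tools of the Biostrings and BSgenome packages, and the available.genomes function lists the reference genome builds available for humans and various model organisms, as shown below: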
library(BSgenome) # defines available.genomes(); attaching it also attaches Biostrings
ag = available.genomes()
length(ag)
## [1] 87
head(ag)
## [1] "BSgenome.Alyrata.JGI.v1"
## [2] "BSgenome.Amellifera.BeeBase.assembly4"
## [3] "BSgenome.Amellifera.UCSC.apiMel2"
## [4] "BSgenome.Amellifera.UCSC.apiMel2.masked"
## [5] "BSgenome.Athaliana.TAIR.04232008"
## [6] "BSgenome.Athaliana.TAIR.TAIR9"
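The package names follow a dotted pattern, BSgenome.<organism>.<provider>.<build>, so plain string handling is enough to filter them. The sketch below works on the six names printed above rather than on a live available.genomes() call, so it runs without BSgenome installed; the variable names are illustrative.

```r
# The six package names shown by head(ag) above, copied verbatim
ag_head <- c("BSgenome.Alyrata.JGI.v1",
             "BSgenome.Amellifera.BeeBase.assembly4",
             "BSgenome.Amellifera.UCSC.apiMel2",
             "BSgenome.Amellifera.UCSC.apiMel2.masked",
             "BSgenome.Athaliana.TAIR.04232008",
             "BSgenome.Athaliana.TAIR.TAIR9")

# Split each name on literal dots and pull out organism and provider
parts    <- strsplit(ag_head, ".", fixed = TRUE)
organism <- vapply(parts, `[`, character(1), 2)
provider <- vapply(parts, `[`, character(1), 3)

# Count builds per organism, then keep only the A. thaliana packages
table(organism)
athaliana <- grep("Athaliana", ag_head, value = TRUE, fixed = TRUE)
athaliana
```

On a full listing, grep("Hsapiens", ag, value = TRUE) would narrow the result to the human builds in the same way.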
Reference build versions are important
The reference build for an organism is created de novo and then refined as algorithms and sequencing data improve. For humans, the Genome Reference Consortium signed off on build 37 in 2009 and on build 38 in 2013.
Once a reference build is complete, it becomes easy to perform informative genomic sequence analysis on individuals, because one can focus on the regions known to harbor allelic diversity.
Note that genome sequence packages have long names that include the build version. This naming is meant to prevent confusion between different builds of a reference genome. In the liftOver video we show how to convert genomic coordinates between builds, using the UCSC liftOver utility through its interface in the rtracklayer package.
To help users avoid mixing data collected on coordinates from different reference builds, we provide a "genome" tag that can be filled in for most objects holding sequence information. We will see some examples shortly. Software for sequence comparison can check for compatible tags on the sequences being compared, which helps to ensure meaningful results.
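A minimal sketch of how the genome tag guards against mixed builds, assuming the GenomicRanges package is installed. The two ranges and their coordinates are made up for illustration; only the genome<- setter and findOverlaps() come from the package.

```r
library(GenomicRanges)

# Two toy ranges on the same chromosome, tagged with different builds
a <- GRanges("chr1", IRanges(start = 100, end = 200))
b <- GRanges("chr1", IRanges(start = 150, end = 250))
genome(a) <- "hg19"
genome(b) <- "hg38"

# Overlap operations merge the sequence information of both objects,
# so incompatible genome tags should stop the computation with an error
res <- tryCatch(findOverlaps(a, b),
                error = function(e) conditionMessage(e))
res

# With matching tags the same call goes through
genome(b) <- "hg19"
ok <- findOverlaps(a, b)
length(ok)
```

Catching the error with tryCatch() is only for demonstration; in an analysis script it is usually better to let the error stop the run.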
A reference genomic sequence for H. sapiens
The reference sequence for Homo sapiens is obtained by installing and attaching a single package. The package defines an object named Hsapiens, which is the source of chromosomal sequence. When it is evaluated on its own, it reports where the sequence data come from, as shown below:
library(BSgenome.Hsapiens.UCSC.hg19)
Hsapiens
## Human genome:
## # organism: Homo sapiens (Human)
## # provider: UCSC
## # provider version: hg19
## # release date: Feb. 2009
## # release name: Genome Reference Consortium GRCh37
## # 93 sequences:
## # chr1 chr2 chr3
## # chr4 chr5 chr6
## # chr7 chr8 chr9
## # chr10 chr11 chr12
## # chr13 chr14 chr15
## # ... ... ...
## # chrUn_gl000235 chrUn_gl000236 chrUn_gl000237
## # chrUn_gl000238 chrUn_gl000239 chrUn_gl000240
## # chrUn_gl000241 chrUn_gl000242 chrUn_gl000243
## # chrUn_gl000244 chrUn_gl000245 chrUn_gl000246
## # chrUn_gl000247 chrUn_gl000248 chrUn_gl000249
## # (use 'seqnames()' to see all the sequence names, use the '$' or '[['
## # operator to access a given sequence)
head(genome(Hsapiens)) # see the tag
## chr1 chr2 chr3 chr4 chr5 chr6
## "hg19" "hg19" "hg19" "hg19" "hg19" "hg19"
We use the $ operator to retrieve the sequence of chromosome 17, as shown below:
Hsapiens$chr17
## 81195210-letter "DNAString" instance
## seq: AAGCTTCTCACCCTGTTCCTGCATAGATAATTGC...GGTGTGGGTGTGGTGTGTGGGTGTGGGTGTGGT
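Transcripts and genes for a reference sequence
UCSC annotation
The TxDb family of packages and data objects manages information on transcripts and gene models. The one used here is derived from the annotation tables of the UCSC genome browser, as shown below: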
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb = TxDb.Hsapiens.UCSC.hg19.knownGene # abbreviate
txdb
## TxDb object:
## # Db type: TxDb
## # Supporting package: GenomicFeatures
## # Data source: UCSC
## # Genome: hg19
## # Organism: Homo sapiens
## # Taxonomy ID: 9606
## # UCSC Table: knownGene
## # Resource URL: http://genome.ucsc.edu/
## # Type of Gene ID: Entrez Gene ID
## # Full dataset: yes
## # miRBase build ID: GRCh37
## # transcript_nrow: 82960
## # exon_nrow: 289969
## # cds_nrow: 237533
## # Db created by: GenomicFeatures package from Bioconductor
## # Creation time: 2015-10-07 18:11:28 +0000 (Wed, 07 Oct 2015)
## # GenomicFeatures version at creation time: 1.21.30
## # RSQLite version at creation time: 1.0.0
## # DBSCHEMAVERSION: 1.1
We use genes() to obtain the genomic addresses of genes, labeled by Entrez Gene ID, as shown below:
ghs = genes(txdb)
ghs
## GRanges object with 23056 ranges and 1 metadata column:
## seqnames ranges strand | gene_id
## <Rle> <IRanges> <Rle> | <character>
## 1 chr19 [ 58858172, 58874214] - | 1
## 10 chr8 [ 18248755, 18258723] + | 10
## 100 chr20 [ 43248163, 43280376] - | 100
## 1000 chr18 [ 25530930, 25757445] - | 1000
## 10000 chr1 [243651535, 244006886] - | 10000
## ... ... ... ... . ...
## 9991 chr9 [114979995, 115095944] - | 9991
## 9992 chr21 [ 35736323, 35743440] + | 9992
## 9993 chr22 [ 19023795, 19109967] - | 9993
## 9994 chr6 [ 90539619, 90584155] + | 9994
## 9997 chr22 [ 50961997, 50964905] - | 9997
## -------
## seqinfo: 93 sequences (1 circular) from hg19 genome
Filtering is also possible with suitable identifiers. Here we extract the exons of two different genes, identified by their Entrez Gene IDs, as shown below:
exons(txdb, columns=c("EXONID", "TXNAME", "GENEID"),
filter=list(gene_id=c(100, 101)))
## GRanges object with 39 ranges and 3 metadata columns:
## seqnames ranges strand | EXONID
## <Rle> <IRanges> <Rle> | <integer>
## [1] chr10 [135075920, 135076737] - | 144421
## [2] chr10 [135077192, 135077269] - | 144422
## [3] chr10 [135080856, 135080921] - | 144423
## [4] chr10 [135081433, 135081570] - | 144424
## [5] chr10 [135081433, 135081622] - | 144425
## ... ... ... ... . ...
## [35] chr20 [43254210, 43254325] - | 256371
## [36] chr20 [43255097, 43255240] - | 256372
## [37] chr20 [43257688, 43257810] - | 256373
## [38] chr20 [43264868, 43264929] - | 256374
## [39] chr20 [43280216, 43280376] - | 256375
## TXNAME GENEID
## <CharacterList> <CharacterList>
## [1] uc009ybi.3,uc010qva.2,uc021qbe.1 101
## [2] uc009ybi.3,uc021qbe.1 101
## [3] uc009ybi.3,uc010qva.2,uc021qbe.1 101
## [4] uc009ybi.3 101
## [5] uc010qva.2,uc021qbe.1 101
## ... ... ...
## [35] uc002xmj.3 100
## [36] uc002xmj.3 100
## [37] uc002xmj.3 100
## [38] uc002xmj.3 100
## [39] uc002xmj.3 100
## -------
## seqinfo: 93 sequences (1 circular) from hg19 genome
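ENSEMBL annotation
The Ensembl home page states that Ensembl creates, integrates and distributes reference databases and tools for genomic research. The project is based at the European Molecular Biology Laboratory, which has supported the interoperability of its annotation resources with Bioconductor.
The ensembldb package includes a brief description, which reads as follows:
The ensembldb package provides functions to create and use transcript-centric annotation databases and packages. The annotation for the databases is fetched directly from Ensembl using its Perl API. The functionality and data are similar to those of the TxDb packages from the GenomicFeatures package. In addition to retrieving all gene and transcript models and annotations from the database, the ensembldb package provides a filter framework for retrieving annotations for specific entries, such as genes encoded on a chromosome region or transcript models of lincRNA genes. From version 1.7 on, EnsDb databases created by ensembldb also contain protein annotation data (see Section 11 of the vignette for the database layout and an overview of the available attributes/columns). For more information on the use of the protein annotations, refer to the proteins vignette. The tables in an EnsDb can be listed as shown below: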
library(ensembldb)
library(EnsDb.Hsapiens.v75)
names(listTables(EnsDb.Hsapiens.v75))
## [1] "gene" "tx" "tx2exon" "exon"
## [5] "chromosome" "protein" "uniprot" "protein_domain"
## [9] "entrezgene" "metadata"
An example:
edb = EnsDb.Hsapiens.v75 # abbreviate
txs <- transcripts(edb, filter = GenenameFilter("ZBTB16"),
columns = c("protein_id", "uniprot_id", "tx_biotype"))
txs
## GRanges object with 20 ranges and 5 metadata columns:
## seqnames ranges strand | protein_id
## <Rle> <IRanges> <Rle> | <character>
## ENST00000335953 11 [113930315, 114121398] + | ENSP00000338157
## ENST00000335953 11 [113930315, 114121398] + | ENSP00000338157
## ENST00000335953 11 [113930315, 114121398] + | ENSP00000338157
## ENST00000335953 11 [113930315, 114121398] + | ENSP00000338157
## ENST00000335953 11 [113930315, 114121398] + | ENSP00000338157
## ... ... ... ... . ...
## ENST00000392996 11 [113931229, 114121374] + | ENSP00000376721
## ENST00000539918 11 [113935134, 114118066] + | ENSP00000445047
## ENST00000545851 11 [114051488, 114118018] + | <NA>
## ENST00000535379 11 [114107929, 114121279] + | <NA>
## ENST00000535509 11 [114117512, 114121198] + | <NA>
## uniprot_id tx_biotype tx_id
## <character> <character> <character>
## ENST00000335953 ZBT16_HUMAN protein_coding ENST00000335953
## ENST00000335953 Q71UL7_HUMAN protein_coding ENST00000335953
## ENST00000335953 Q71UL6_HUMAN protein_coding ENST00000335953
## ENST00000335953 Q71UL5_HUMAN protein_coding ENST00000335953
## ENST00000335953 F5H6C3_HUMAN protein_coding ENST00000335953
## ... ... ... ...
## ENST00000392996 F5H5Y7_HUMAN protein_coding ENST00000392996
## ENST00000539918 <NA> nonsense_mediated_decay ENST00000539918
## ENST00000545851 <NA> processed_transcript ENST00000545851
## ENST00000535379 <NA> processed_transcript ENST00000535379
## ENST00000535509 <NA> retained_intron ENST00000535509
## gene_name
## <character>
## ENST00000335953 ZBTB16
## ENST00000335953 ZBTB16
## ENST00000335953 ZBTB16
## ENST00000335953 ZBTB16
## ENST00000335953 ZBTB16
## ... ...
## ENST00000392996 ZBTB16
## ENST00000539918 ZBTB16
## ENST00000545851 ZBTB16
## ENST00000535379 ZBTB16
## ENST00000535509 ZBTB16
## -------
## seqinfo: 1 sequence from GRCh37 genome
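Your data will be someone else's annotation: import/export
The ENCODE project illustrates well that today's experiment is tomorrow's annotation. You should think of your own experiments in the same way. (Of course, for an experiment to serve as reliable and durable annotation, it must address an important question about genomic structure or function, and it must use an appropriate, correctly executed protocol. ENCODE is noteworthy for linking protocols to data very explicitly.)
As an example, we consider estrogen receptor (ER) binding data published by ENCODE in narrowPeak format. At its base this is ASCII text, so it can easily be imported as a set of text lines. If the record fields are sufficiently regular, the file can be imported as a table.
We want more than a plain import, however: the imported data should be usable as a computable object. Recognizing the connection between the narrowPeak and bedGraph formats, we can import directly into a GRanges.
To illustrate this, we find the path to the raw narrowPeak data file in the ERBS package, as shown below: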
f1 = dir(system.file("extdata",package="ERBS"), full.names=TRUE)[1]
readLines(f1, 4) # look at a few lines
## [1] "chrX\t1509354\t1512462\t5\t0\t.\t157.92\t310\t32.000000\t1991"
## [2] "chrX\t26801421\t26802448\t6\t0\t.\t147.38\t310\t32.000000\t387"
## [3] "chr19\t11694101\t11695359\t1\t0\t.\t99.71\t311.66\t32.000000\t861"
## [4] "chr19\t4076892\t4079276\t4\t0\t.\t84.74\t310\t32.000000\t1508"
Using the import command is straightforward, as shown below:
library(rtracklayer)
imp = import(f1, format="bedGraph")
imp
## GRanges object with 1873 ranges and 7 metadata columns:
## seqnames ranges strand | score NA.
## <Rle> <IRanges> <Rle> | <numeric> <integer>
## [1] chrX [ 1509355, 1512462] * | 5 0
## [2] chrX [26801422, 26802448] * | 6 0
## [3] chr19 [11694102, 11695359] * | 1 0
## [4] chr19 [ 4076893, 4079276] * | 4 0
## [5] chr3 [53288568, 53290767] * | 9 0
## ... ... ... ... . ... ...
## [1869] chr19 [11201120, 11203985] * | 8701 0
## [1870] chr19 [ 2234920, 2237370] * | 990 0
## [1871] chr1 [94311336, 94313543] * | 4035 0
## [1872] chr19 [45690614, 45691210] * | 10688 0
## [1873] chr19 [ 6110100, 6111252] * | 2274 0
## NA.1 NA.2 NA.3 NA.4 NA.5
## <logical> <numeric> <numeric> <numeric> <integer>
## [1] <NA> 157.92 310 32 1991
## [2] <NA> 147.38 310 32 387
## [3] <NA> 99.71 311.66 32 861
## [4] <NA> 84.74 310 32 1508
## [5] <NA> 78.2 299.505 32 1772
## ... ... ... ... ... ...
## [1869] <NA> 8.65 7.281 0.26576 2496
## [1870] <NA> 8.65 26.258 1.995679 1478
## [1871] <NA> 8.65 12.511 1.47237 1848
## [1872] <NA> 8.65 6.205 0 298
## [1873] <NA> 8.65 17.356 2.013228 496
## -------
## seqinfo: 23 sequences from an unspecified genome; no seqlengths
genome(imp) # genome identifier tag not set, but you should set it
## chrX chr19 chr3 chr17 chr8 chr11 chr16 chr1 chr2 chr6 chr9 chr7
## NA NA NA NA NA NA NA NA NA NA NA NA
## chr5 chr12 chr20 chr21 chr22 chr18 chr10 chr14 chr15 chr4 chr13
## NA NA NA NA NA NA NA NA NA NA NA
We obtain a GRanges in one stroke. There are some additional fields in the metadata columns whose names should be specified, but if we are interested only in the ranges, we are done, with the exception of adding the genome metadata to protect against illegitimate combination with data recorded in an incompatible coordinate system.
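The sketch below shows one way to name those extra fields at import time and to set the genome tag. It assumes rtracklayer's BED importer with its extraCols argument, and it takes the column meanings from the ENCODE narrowPeak layout (name, score, strand, signalValue, pValue, qValue, peak). The four records are the ones printed by readLines() above; the temporary file and the hg19 tag are assumptions made for illustration, since the build is not stated in the file itself.

```r
library(rtracklayer)

# The four narrowPeak records shown earlier, written to a temporary file
np_lines <- c(
  "chrX\t1509354\t1512462\t5\t0\t.\t157.92\t310\t32.000000\t1991",
  "chrX\t26801421\t26802448\t6\t0\t.\t147.38\t310\t32.000000\t387",
  "chr19\t11694101\t11695359\t1\t0\t.\t99.71\t311.66\t32.000000\t861",
  "chr19\t4076892\t4079276\t4\t0\t.\t84.74\t310\t32.000000\t1508")
tmp <- tempfile(fileext = ".narrowPeak")
writeLines(np_lines, tmp)

# Columns 7-10 are not part of plain BED, so declare their names and types
extra <- c(signalValue = "numeric", pValue = "numeric",
           qValue = "numeric", peak = "integer")
np <- import(tmp, format = "BED", extraCols = extra)

# Assumed build: tag the ranges so that later comparisons can be checked
genome(np) <- "hg19"
np
```

Read this way, the fourth field becomes name and the fifth becomes score, which differs from the bedGraph import above, where the fourth field is labeled score.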
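To communicate with other scientists or systems we have two main options. We can save the GRanges as an RData object, which is easily passed to another R user. Or we can export it in another standard format. For example, if we are interested only in the interval addresses and the binding scores, saving in "bed" format is sufficient, as shown below: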
export(imp, "demoex.bed") # implicit format choice
cat(readLines("demoex.bed", n=5), sep="\n")
## chrX 1509354 1512462 . 5 .
## chrX 26801421 26802448 . 6 .
## chr19 11694101 11695359 . 1 .
## chr19 4076892 4079276 . 4 .
## chr3 53288567 53290767 . 9 .
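We have carried out a "round trip" of importing, modeling and exporting experimental data that can be integrated with other data to advance biological understanding.
We must be wary of the idea that annotation is somehow permanently correct, insulated from the uncertainty of research at the frontier of knowledge. We have seen that even the reference sequences of human chromosomes are subject to revision. In our use of the ERBS package, we treated the results of experiments as defining ER binding sites for potential biological interpretation. The uncertainty, in the form of variable quality of peak identification, has not been explicitly reckoned, but it should be.
Bioconductor has taken pains to address many facets of this situation. We maintain archives of prior versions of software and annotation, so that past work can be checked or revised. We update the central annotation resources twice a year, so that there is stability for ongoing work as well as access to new knowledge. And we have made it simple to import and to create representations of experimental and annotation data.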
AnnotationHub
The AnnotationHub package can be used to obtain GRanges or other suitably designed containers for institutionally curated annotation, as shown below:
library(AnnotationHub)
##
## Attaching package: 'AnnotationHub'
## The following object is masked from 'package:Biobase':
##
## cache
ah = AnnotationHub()
## snapshotDate(): 2017-10-27
ah
## AnnotationHub with 42282 records
## # snapshotDate(): 2017-10-27
## # $dataprovider: BroadInstitute, Ensembl, UCSC, ftp://ftp.ncbi.nlm.nih....
## # $species: Homo sapiens, Mus musculus, Drosophila melanogaster, Bos ta...
## # $rdataclass: GRanges, BigWigFile, FaFile, TwoBitFile, Rle, ChainFile,...
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass,
## # tags, rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["AH2"]]'
##
## title
## AH2 | Ailuropoda_melanoleuca.ailMel1.69.dna.toplevel.fa
## AH3 | Ailuropoda_melanoleuca.ailMel1.69.dna_rm.toplevel.fa
## AH4 | Ailuropoda_melanoleuca.ailMel1.69.dna_sm.toplevel.fa
## AH5 | Ailuropoda_melanoleuca.ailMel1.69.ncrna.fa
## AH6 | Ailuropoda_melanoleuca.ailMel1.69.pep.all.fa
## ... ...
## AH58988 | org.Flavobacterium_piscicida.eg.sqlite
## AH58989 | org.Bacteroides_fragilis_YCH46.eg.sqlite
## AH58990 | org.Pseudomonas_mendocina_ymp.eg.sqlite
## AH58991 | org.Salmonella_enterica_subsp._enterica_serovar_Typhimurium...
## AH58992 | org.Acinetobacter_baumannii.eg.sqlite
Through AnnotationHub we can obtain many experimental data objects related to the HepG2 cell line, as shown below:
query(ah, "HepG2")
## AnnotationHub with 440 records
## # snapshotDate(): 2017-10-27
## # $dataprovider: UCSC, BroadInstitute, Pazar
## # $species: Homo sapiens, NA
## # $rdataclass: GRanges, BigWigFile
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass,
## # tags, rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["AH22246"]]'
##
## title
## AH22246 | pazar_CEBPA_HEPG2_Schmidt_20120522.csv
## AH22249 | pazar_CTCF_HEPG2_Schmidt_20120522.csv
## AH22273 | pazar_HNF4A_HEPG2_Schmidt_20120522.csv
## AH22309 | pazar_STAG1_HEPG2_Schmidt_20120522.csv
## AH22348 | wgEncodeAffyRnaChipFiltTransfragsHepg2CytosolLongnonpolya.b...
## ... ...
## AH41564 | E118-H4K5ac.imputed.pval.signal.bigwig
## AH41691 | E118-H4K8ac.imputed.pval.signal.bigwig
## AH41818 | E118-H4K91ac.imputed.pval.signal.bigwig
## AH46971 | E118_15_coreMarks_mnemonics.bed.gz
## AH49484 | E118_RRBS_FractionalMethylation.bigwig
The query method can take a vector of filtering strings. To limit the response to annotation resources addressing the histone H4K5, simply add that tag, as shown below:
query(ah, c("HepG2", "H4K5"))
## AnnotationHub with 1 record
## # snapshotDate(): 2017-10-27
## # names(): AH41564
## # $dataprovider: BroadInstitute
## # $species: Homo sapiens
## # $rdataclass: BigWigFile
## # $rdatadateadded: 2015-05-08
## # $title: E118-H4K5ac.imputed.pval.signal.bigwig
## # $description: Bigwig File containing -log10(p-value) signal tracks fr...
## # $taxonomyid: 9606
## # $genome: hg19
## # $sourcetype: BigWig
## # $sourceurl: http://egg2.wustl.edu/roadmap/data/byFileType/signal/cons...
## # $sourcesize: 226630905
## # $tags: c("EpigenomeRoadMap", "signal", "consolidatedImputed",
## # "H4K5ac", "E118", "ENCODE2012", "LIV.HEPG2.CNCR", "HepG2
## # Hepatocellular Carcinoma Cell Line")
## # retrieve record with 'object[["AH41564"]]'
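The OrgDb gene annotation maps
Packages named org.*.eg.db collect gene-level information with links to location, protein product identifiers, KEGG pathways and GO terms, PMIDs, and identifiers for other annotation resources, as shown below: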
library(org.Hs.eg.db)
keytypes(org.Hs.eg.db) # columns() gives same answer
## [1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT"
## [5] "ENSEMBLTRANS" "ENTREZID" "ENZYME" "EVIDENCE"
## [9] "EVIDENCEALL" "GENENAME" "GO" "GOALL"
## [13] "IPI" "MAP" "OMIM" "ONTOLOGY"
## [17] "ONTOLOGYALL" "PATH" "PFAM" "PMID"
## [21] "PROSITE" "REFSEQ" "SYMBOL" "UCSCKG"
## [25] "UNIGENE" "UNIPROT"
head(select(org.Hs.eg.db, keys="ORMDL3", keytype="SYMBOL",
columns="PMID"))
## 'select()' returned 1:many mapping between keys and columns
## SYMBOL PMID
## 1 ORMDL3 11042152
## 2 ORMDL3 12093374
## 3 ORMDL3 12477932
## 4 ORMDL3 14702039
## 5 ORMDL3 15489334
## 6 ORMDL3 16169070
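When only one target column is needed, mapIds() from AnnotationDbi returns a named vector instead of a data frame. A small sketch, assuming org.Hs.eg.db is installed; BRCA2 is included because its Entrez ID (675) is used in the KEGG part of these notes.

```r
library(org.Hs.eg.db)

# Map gene symbols to Entrez Gene IDs; keep the first match if several exist
ids <- mapIds(org.Hs.eg.db,
              keys = c("ORMDL3", "BRCA2"),
              keytype = "SYMBOL",
              column = "ENTREZID",
              multiVals = "first")
ids

# Count the PubMed records linked to ORMDL3 instead of listing them
pm <- select(org.Hs.eg.db, keys = "ORMDL3", keytype = "SYMBOL",
             columns = "PMID")
nrow(pm)
```

The multiVals argument decides what happens with one-to-many mappings; "first" is a convenient default for a quick lookup, but it silently drops the other matches.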
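Resources for gene sets and pathways
Gene Ontology
Gene Ontology (GO) is a widely used structured vocabulary that organizes terms relevant to the roles of genes and gene products in:
- biological processes
- molecular functions
- cellular components.
The vocabulary itself is intended to be relevant to all organisms. It takes the form of a directed acyclic graph, with terms as nodes and with is-a and part-of relationships making up most of the links.
The annotation that links organism-specific genes to terms in the gene ontology is separate from the vocabulary itself, and involves different types of evidence. These links are recorded in Bioconductor annotation packages.
The GO.db package gives quick access to the GO vocabulary, as shown below: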
library(GO.db)
GO.db # metadata
## GODb object:
## | GOSOURCENAME: Gene Ontology
## | GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
## | GOSOURCEDATE: 2017-Nov01
## | Db type: GODb
## | package: AnnotationDbi
## | DBSCHEMA: GO_DB
## | GOEGSOURCEDATE: 2017-Nov6
## | GOEGSOURCENAME: Entrez Gene
## | GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
## | DBSCHEMAVERSION: 2.1
##
## Please see: help('select') for usage information
The keys, columns and select functions of the AnnotationDbi package make it easy to map between ids and terms, as shown below:
k5 = keys(GO.db)[1:5]
cgo = columns(GO.db)
select(GO.db, keys=k5, columns=cgo[1:3])
## 'select()' returned 1:1 mapping between keys and columns
## GOID
## 1 GO:0000001
## 2 GO:0000002
## 3 GO:0000003
## 4 GO:0000006
## 5 GO:0000007
## DEFINITION
## 1 The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton.
## 2 The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome.
## 3 The production of new individuals that contain some portion of genetic material inherited from one or more parent organisms.
## 4 Enables the transfer of zinc ions (Zn2+) from one side of a membrane to the other, probably powered by proton motive force. In high-affinity transport the transporter is able to bind the solute even if it is only present at very low concentrations.
## 5 Enables the transfer of a solute or solutes from one side of a membrane to the other according to the reaction: Zn2+ = Zn2+, probably powered by proton motive force. In low-affinity transport the transporter is able to bind the solute only if it is present at very high concentrations.
## ONTOLOGY
## 1 BP
## 2 BP
## 3 BP
## 4 MF
## 5 MF
The graph structure of the vocabulary is encoded in the tables of a SQLite database. We can query it through the RSQLite interface, as shown below:
library(DBI) # generic database interface; supplies dbListTables() and dbGetQuery()
con = GO_dbconn()
dbListTables(con)
## [1] "go_bp_offspring" "go_bp_parents" "go_cc_offspring"
## [4] "go_cc_parents" "go_mf_offspring" "go_mf_parents"
## [7] "go_obsolete" "go_ontology" "go_synonym"
## [10] "go_term" "map_counts" "map_metadata"
## [13] "metadata" "sqlite_stat1"
The following query reveals some internal identifiers:
dbGetQuery(con, "select _id, go_id, term from go_term limit 5")
## _id go_id term
## 1 30 GO:0000001 mitochondrion inheritance
## 2 32 GO:0000002 mitochondrial genome maintenance
## 3 33 GO:0000003 reproduction
## 4 37 GO:0042254 ribosome biogenesis
## 5 38 GO:0044183 protein binding involved in protein folding
We can trace the mitochondrion inheritance term to its parent and grandparent terms, as shown below:
dbGetQuery(con, "select * from go_bp_parents where _id=30")
## _id _parent_id relationship_type
## 1 30 26537 is_a
## 2 30 26540 is_a
dbGetQuery(con, "select _id, go_id, term from go_term where _id=26616")
## _id go_id
## 1 26616 GO:0048387
## term
## 1 negative regulation of retinoic acid receptor signaling pathway
dbGetQuery(con, "select * from go_bp_parents where _id=26616")
## _id _parent_id relationship_type
## 1 26616 8389 is_a
## 2 26616 26614 is_a
## 3 26616 26613 negatively_regulates
dbGetQuery(con, "select _id, go_id, term from go_term where _id=5932")
## _id go_id term
## 1 5932 GO:0019237 centromeric DNA binding
It makes sense to regard "mitochondrion inheritance" as a conceptual refinement of the processes "mitochondrion distribution" and "organelle inheritance", the two terms recorded as its parents in this database.
The entire database schema can be viewed with GO_dbschema().
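The numeric _id values are internal to a particular GO.db release, so looking parents up by a hard-coded _id can land on unrelated terms once the package is updated; the lookups above, for instance, follow 26616 although the parents reported for _id 30 are 26537 and 26540. The sketch below avoids this by joining go_bp_parents to go_term and keying on the stable GO identifier. It assumes the table and column names printed above and that the DBI package is installed.

```r
library(GO.db)
library(DBI)

con <- GO_dbconn()  # connection managed by GO.db; do not disconnect it

# Join the parent table to the term table twice: once for the child,
# once for the parent, so that only public GO ids appear in the query
parents <- dbGetQuery(con, "
  SELECT child.go_id  AS child_go_id,
         parent.go_id AS parent_go_id,
         parent.term  AS parent_term,
         p.relationship_type
  FROM go_bp_parents AS p
  JOIN go_term AS child  ON child._id  = p._id
  JOIN go_term AS parent ON parent._id = p._parent_id
  WHERE child.go_id = 'GO:0000001'")
parents
```

Repeating the query with each parent_go_id in the WHERE clause walks one level further up the graph.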
KEGG
KEGG annotation has been available in Bioconductor since the project began, but the licensing of the KEGG database has since changed. Attaching KEGG.db produces the following message:
library(KEGG.db)
KEGG.db contains mappings based on older data because the original
resource was removed from the the public domain before the most
recent update was produced. This package should now be considered
deprecated and future versions of Bioconductor may not have it
available. Users who want more current data are encouraged to look
at the KEGGREST or reactome.db packages
We can therefore turn to the KEGGREST package, which requires an internet connection. It is a very useful tool keyed on Entrez identifiers. Here we query the information on BRCA2 (Entrez ID 675), as shown below:
library(KEGGREST)
brca2K = keggGet("hsa:675")
names(brca2K[[1]])
## [1] "ENTRY" "NAME" "DEFINITION" "ORTHOLOGY" "ORGANISM"
## [6] "PATHWAY" "DISEASE" "BRITE" "POSITION" "MOTIF"
## [11] "DBLINKS" "STRUCTURE" "AASEQ" "NTSEQ"
We can also use the keggGet function to obtain the list of genes that make up a pathway model, as shown below:
brpat = keggGet("path:hsa05212")
names(brpat[[1]])
## [1] "ENTRY" "NAME" "DESCRIPTION" "CLASS" "PATHWAY_MAP"
## [6] "DISEASE" "DRUG" "ORGANISM" "GENE" "COMPOUND"
## [11] "KO_PATHWAY" "REFERENCE"
brpat[[1]]$GENE[seq(1,132,2)] # entrez gene ids
## [1] "3845" "5290" "5293" "5291" "5295" "5296" "8503" "9459"
## [9] "5879" "5880" "5881" "4790" "5970" "207" "208" "10000"
## [17] "1147" "3551" "8517" "572" "598" "842" "369" "673"
## [25] "5894" "5604" "5594" "5595" "5599" "5602" "5601" "5900"
## [33] "5898" "5899" "10928" "998" "7039" "1950" "1956" "2064"
## [41] "2475" "6198" "6199" "3716" "6774" "6772" "7422" "1029"
## [49] "1019" "1021" "595" "5925" "1869" "1870" "1871" "7157"
## [57] "1026" "1647" "4616" "10912" "581" "578" "1643" "51426"
## [65] "7040" "7042"
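The index seq(1,132,2) relies on the GENE field alternating between an Entrez ID and a description, and on the field having exactly 132 entries for this pathway. The sketch below derives the indices from the length instead, using a toy vector so that it runs offline. The alternating layout is assumed from the call above; the three IDs appear in the output above, while the description strings are illustrative stand-ins for what KEGG returns.

```r
# Toy stand-in for brpat[[1]]$GENE: ids and descriptions alternate
gene_field <- c("3845", "KRAS; KRAS proto-oncogene, GTPase",
                "5290", "PIK3CA; illustrative description",
                "7157", "TP53; tumor protein p53")

# Odd positions hold the Entrez ids, the following positions the text
odd    <- seq(1, length(gene_field), by = 2)
entrez <- gene_field[odd]

# The symbol is whatever precedes the first semicolon of the description
symbol <- sub(";.*$", "", gene_field[odd + 1])

pathway_genes <- data.frame(entrez, symbol, stringsAsFactors = FALSE)
pathway_genes
```

Applied to the real object, replacing gene_field with brpat[[1]]$GENE should give the same ids as the hard-coded call, provided the installed KEGGREST version still returns the alternating layout.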
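There is much more to explore in KEGGREST. For example, we can retrieve a static image of the human pancreatic cancer pathway, in which BRCA2 takes part, as shown below: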
library(png)
library(grid)
brpng = keggGet("hsa05212", "image")
grid.raster(brpng)
Other ontologies
The rols package provides an interface to the EMBL-EBI Ontology Lookup Service.
library(rols)
oo = Ontologies()
oo
## Object of class 'Ontologies' with 198 entries
## GENEPIO, MP ... SEPIO, SIBO
oo[[1]]
## Ontology: Genomic Epidemiology Ontology (genepio)
## The Genomic Epidemiology Ontology (GenEpiO) covers vocabulary
## necessary to identify, document and research foodborne pathogens
## and associated outbreaks.
## Loaded: 2017-04-10 Updated: 2017-10-20 Version: 2017-04-09
## 4351 terms 137 properties 38 individuals
To control the amount of network traffic involved in query retrieval, a search is carried out in stages, as shown below:
glis = OlsSearch("glioblastoma")
glis
## Object of class 'OlsSearch':
## query: glioblastoma
## requested: 20 (out of 502)
## response(s): 0
res = olsSearch(glis)
dim(res)
## NULL
resdf = as(res, "data.frame") # get content
resdf[1:4,1:4]
## id
## 1 ncit:class:http://purl.obolibrary.org/obo/NCIT_C3058
## 2 omit:http://purl.obolibrary.org/obo/OMIT_0007102
## 3 ordo:class:http://www.orpha.net/ORDO/Orphanet_360
## 4 hp:class:http://purl.obolibrary.org/obo/HP_0100843
## iri short_form label
## 1 http://purl.obolibrary.org/obo/NCIT_C3058 NCIT_C3058 Glioblastoma
## 2 http://purl.obolibrary.org/obo/OMIT_0007102 OMIT_0007102 Glioblastoma
## 3 http://www.orpha.net/ORDO/Orphanet_360 Orphanet_360 Glioblastoma
## 4 http://purl.obolibrary.org/obo/HP_0100843 HP_0100843 Glioblastoma
resdf[1,5] # full description for one instance
## [[1]]
## [1] "The most malignant astrocytic tumor (WHO grade IV). It is composed of poorly differentiated neoplastic astrocytes and it is characterized by the presence of cellular polymorphism, nuclear atypia, brisk mitotic activity, vascular thrombosis, microvascular proliferation and necrosis. It typically affects adults and is preferentially located in the cerebral hemispheres. It may develop from diffuse astrocytoma WHO grade II or anaplastic astrocytoma (secondary glioblastoma, IDH-mutant), but more frequently, it manifests after a short clinical history de novo, without evidence of a less malignant precursor lesion (primary glioblastoma, IDH- wildtype). (Adapted from WHO)"
The ontologyIndex package supports importing data in the Open Biological Ontologies (OBO) format and includes efficient tools for querying and visualizing ontology systems.
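General gene set management
The GSEABase package has excellent infrastructure for managing gene sets and collections of gene sets. We illustrate it by importing glioblastoma-related gene sets from MSigDb, as shown below: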
library(GSEABase)
glioG = getGmt(system.file("gmt/glioSets.gmt", package="ph525x"))
## Warning in readLines(con, ...): incomplete final line found on '/
## Library/Frameworks/R.framework/Versions/3.4/Resources/library/ph525x/gmt/
## glioSets.gmt'
glioG
## GeneSetCollection
## names: BALDWIN_PRKCI_TARGETS_UP, BEIER_GLIOMA_STEM_CELL_DN, ..., ZHENG_GLIOBLASTOMA_PLASTICITY_UP (47 total)
## unique identifiers: ADA, AQP9, ..., ZFP28 (3671 total)
## types in collection:
## geneIdType: NullIdentifier (1 total)
## collectionType: NullCollection (1 total)
head(geneIds(glioG[[1]]))
## [1] "ADA" "AQP9" "ATP2B4" "ATP6V1G1" "CBX6" "CCDC165"
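The GMT format read by getGmt() is plain tab-delimited text: each line holds a set name, a description field, and then the member genes. A base R sketch of that layout, assuming nothing beyond the format itself; the two set names and the example.org URLs are hypothetical, and the gene symbols are borrowed from the output above.

```r
# Two hypothetical GMT records: name, description, then gene symbols
gmt_lines <- c(
  "SET_A\thttp://example.org/SET_A\tADA\tAQP9\tATP2B4",
  "SET_B\thttp://example.org/SET_B\tCBX6\tADA")

# Split on tabs, drop the first two fields, and name each set
fields <- strsplit(gmt_lines, "\t", fixed = TRUE)
sets   <- lapply(fields, function(f) f[-(1:2)])
names(sets) <- vapply(fields, `[`, character(1), 1)

# Set sizes, and the sets that contain a given gene
set_sizes <- lengths(sets)
hits <- names(sets)[vapply(sets, function(g) "ADA" %in% g, logical(1))]
set_sizes
hits
```

GSEABase adds identifier types, collection types and set operations on top of this, which is why it is preferable to a hand-rolled parser for real work.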
A unified, self-describing approach for model organisms
The OrganismDb packages simplify access to annotation. Queries can also be resolved directly against TxDb and org.[Nn].eg.db, as shown below:
library(Homo.sapiens)
class(Homo.sapiens)
## [1] "OrganismDb"
## attr(,"package")
## [1] "OrganismDbi"
Homo.sapiens
## OrganismDb Object:
## # Includes GODb Object: GO.db
## # With data about: Gene Ontology
## # Includes OrgDb Object: org.Hs.eg.db
## # Gene data about: Homo sapiens
## # Taxonomy Id: 9606
## # Includes TxDb Object: TxDb.Hsapiens.UCSC.hg19.knownGene
## # Transcriptome data about: Homo sapiens
## # Based on genome: hg19
## # The OrgDb gene id ENTREZID is mapped to the TxDb gene id GENEID .
tx = transcripts(Homo.sapiens)
## 'select()' returned 1:1 mapping between keys and columns
keytypes(Homo.sapiens)
## [1] "ACCNUM" "ALIAS" "CDSID" "CDSNAME"
## [5] "DEFINITION" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
## [9] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL"
## [13] "EXONID" "EXONNAME" "GENEID" "GENENAME"
## [17] "GO" "GOALL" "GOID" "IPI"
## [21] "MAP" "OMIM" "ONTOLOGY" "ONTOLOGYALL"
## [25] "PATH" "PFAM" "PMID" "PROSITE"
## [29] "REFSEQ" "SYMBOL" "TERM" "TXID"
## [33] "TXNAME" "UCSCKG" "UNIGENE" "UNIPROT"
columns(Homo.sapiens)
## [1] "ACCNUM" "ALIAS" "CDSCHROM" "CDSEND"
## [5] "CDSID" "CDSNAME" "CDSSTART" "CDSSTRAND"
## [9] "DEFINITION" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
## [13] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL"
## [17] "EXONCHROM" "EXONEND" "EXONID" "EXONNAME"
## [21] "EXONRANK" "EXONSTART" "EXONSTRAND" "GENEID"
## [25] "GENENAME" "GO" "GOALL" "GOID"
## [29] "IPI" "MAP" "OMIM" "ONTOLOGY"
## [33] "ONTOLOGYALL" "PATH" "PFAM" "PMID"
## [37] "PROSITE" "REFSEQ" "SYMBOL" "TERM"
## [41] "TXCHROM" "TXEND" "TXID" "TXNAME"
## [45] "TXSTART" "TXSTRAND" "TXTYPE" "UCSCKG"
## [49] "UNIGENE" "UNIPROT"
Platform-oriented annotation
By sorting the entries on the GPL information page at NCBI GEO, we can see that the most commonly used oligonucleotide array platform (with 4760 series in the database) is the Affy Human Genome U133 plus 2.0 array (GPL 570). We can annotate data from this platform with hgu133plus2.db, as shown below:
library(hgu133plus2.db)
##
hgu133plus2.db
## ChipDb object:
## | DBSCHEMAVERSION: 2.1
## | Db type: ChipDb
## | Supporting package: AnnotationDbi
## | DBSCHEMA: HUMANCHIP_DB
## | ORGANISM: Homo sapiens
## | SPECIES: Human
## | MANUFACTURER: Affymetrix
## | CHIPNAME: Human Genome U133 Plus 2.0 Array
## | MANUFACTURERURL: http://www.affymetrix.com/support/technical/byproduct.affx?product=hg-u133-plus
## | EGSOURCEDATE: 2015-Sep27
## | EGSOURCENAME: Entrez Gene
## | EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
## | CENTRALID: ENTREZID
## | TAXID: 9606
## | GOSOURCENAME: Gene Ontology
## | GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest-lite/
## | GOSOURCEDATE: 20150919
## | GOEGSOURCEDATE: 2015-Sep27
## | GOEGSOURCENAME: Entrez Gene
## | GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
## | KEGGSOURCENAME: KEGG GENOME
## | KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
## | KEGGSOURCEDATE: 2011-Mar15
## | GPSOURCENAME: UCSC Genome Bioinformatics (Homo sapiens)
## | GPSOURCEURL: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19
## | GPSOURCEDATE: 2010-Mar22
## | ENSOURCEDATE: 2015-Jul16
## | ENSOURCENAME: Ensembl
## | ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
## | UPSOURCENAME: Uniprot
## | UPSOURCEURL: http://www.uniprot.org/
## | UPSOURCEDATE: Thu Oct 1 23:31:58 2015
##
## Please see: help('select') for usage information
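The basic purpose of this resource (and of all instances of the ChipDb class) is to map between probeset identifiers and higher-level genomic annotation.
Detailed information on the probes, the constituents of probesets, is provided by packages whose names carry the suffix probe, as shown below: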
library(hgu133plus2probe)
head(hgu133plus2probe)
## sequence x y Probe.Set.Name
## 1 CACCCAGCTGGTCCTGTGGATGGGA 718 317 1007_s_at
## 2 GCCCCACTGGACAACACTGATTCCT 1105 483 1007_s_at
## 3 TGGACCCCACTGGCTGAGAATCTGG 584 901 1007_s_at
## 4 AAATGTTTCCTTGTGCCTGCTCCTG 192 205 1007_s_at
## 5 TCCTTGTGCCTGCTCCTGTACTTGT 844 979 1007_s_at
## 6 TGCCTGCTCCTGTACTTGTCCTCAG 537 971 1007_s_at
## Probe.Interrogation.Position Target.Strandedness
## 1 3330 Antisense
## 2 3443 Antisense
## 3 3512 Antisense
## 4 3563 Antisense
## 5 3570 Antisense
## 6 3576 Antisense
dim(hgu133plus2probe)
## [1] 604258 6
Mapping probeset identifiers to gene-level information can reveal some interesting ambiguities, as shown below:
select(hgu133plus2.db, keytype="PROBEID",
columns=c("SYMBOL", "GENENAME", "PATH", "MAP"), keys="1007_s_at")
## 'select()' returned 1:many mapping between keys and columns
## PROBEID SYMBOL GENENAME PATH
## 1 1007_s_at DDR1 discoidin domain receptor tyrosine kinase 1 <NA>
## 2 1007_s_at MIR4640 microRNA 4640 <NA>
## MAP
## 1 6p21.33
## 2 6p21.33
Apparently this probeset could be used to quantify the abundance of both an mRNA and a miRNA. As a sanity check, we see that the distinct symbols map to the same cytoband.
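The same check can be run over many probesets at once. A base R sketch on a small table shaped like the select() result above; the first two rows are taken from that output, while the third probeset, its symbol GENE1 and its cytoband are hypothetical and only there to provide an unambiguous case.

```r
# A table shaped like the select() output, with one made-up extra row
ann <- data.frame(
  PROBEID = c("1007_s_at", "1007_s_at", "hypothetical_at"),
  SYMBOL  = c("DDR1", "MIR4640", "GENE1"),
  MAP     = c("6p21.33", "6p21.33", "1p36.1"),
  stringsAsFactors = FALSE)

# Number of distinct gene symbols and cytobands per probeset
n_symbols <- tapply(ann$SYMBOL, ann$PROBEID, function(s) length(unique(s)))
n_bands   <- tapply(ann$MAP, ann$PROBEID, function(m) length(unique(m)))

# Probesets mapping to more than one symbol are ambiguous; those whose
# symbols also fall in more than one cytoband deserve a closer look
ambiguous   <- names(n_symbols)[n_symbols > 1]
multi_bands <- names(n_bands)[n_bands > 1]
ambiguous
multi_bands
```

Here 1007_s_at is flagged as ambiguous but passes the cytoband check, which matches the reading given above.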
Summary
We now have access to a wealth of data spanning the nucleotide level to the pathway level. The existing resources can be browsed through the Views on Bioconductor.org.