GDAS008-IRanges和GRanges

title: GDAS008-IRanges和GRanges
date: 2019-09-078 12:0:00
type: "tags"
tags:

Bioconductor
categories:
Genomics Data Analysis Series

前言

原始的Rmd文檔可以參考Github上逻族。

IRanges 和 GRanges 對象是Bioconductor基礎(chǔ)數(shù)據(jù)類型的核心成員嚎卫，其中 IRanges 定義了 integer ranges 售担，它用于解決基因組上的位置問題逢勾，而 GRanges 則包含了染色體和DNA鏈的信息酷宵。這里我們簡介紹一下這兩個對象荆陆，以及如何操作 IRanges 和 GRanges 。

IRanges

第一步是加載IRanges包腐芍，如下所示：

library(IRanges)

IRanges 函數(shù)定義了區(qū)間范圍(interval ranges)差导。如果你輸入2個數(shù)據(jù)试躏，那么就表示一個區(qū)間猪勇，例如 [5,10]={5,6,7,8,9,10}，它的寬度是6颠蕴。另外泣刹，如果我們要查看一個 IRanges 定義的區(qū)間寬度，我們使用的width()函數(shù)犀被，而非 length()函數(shù)椅您，如下所示：

ir <- IRanges(5,10)
ir
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         5        10         6
start(ir)
## [1] 5
end(ir)
## [1] 10
width(ir)
## [1] 6
# for detailed information on the IRanges class:
# ?IRanges
length(ir)
# [1] 1

一個單獨(dú)的IRanges對象可以含有多個范圍。現(xiàn)在我們可以在一個向量中指定start寡键，在另外一個向量中指定end掀泳，如下所示：

IRanges(start=c(3,5,17), end=c(10,8,20))
## IRanges object with 3 ranges and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         3        10         8
##   [2]         5         8         4
##   [3]        17        20         4

內(nèi)部范圍(Intra-range)操作

現(xiàn)在我們繼續(xù)使用單個范圍[5,10]，我們查看內(nèi)部范圍(intra-range){…,?2,?1,0,1,2,…}的一個數(shù)西轩，如下所示：

# full details on the intra-range methods:
# ?"intra-range-methods"
ir
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         5        10         6
shift(ir, -2)
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         3         8         6

這里我們展示了應(yīng)用于ir的多個不同操作的結(jié)果员舵，結(jié)果如下所示：

ir
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         5        10         6
narrow(ir, start=2)
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         6        10         5
narrow(ir, end=5)
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         5         9         5
flank(ir, width=3, start=TRUE, both=FALSE)
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         2         4         3
flank(ir, width=3, start=FALSE, both=FALSE)
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]        11        13         3
flank(ir, width=3, start=TRUE, both=TRUE)
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         2         7         6
ir * 2
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         6         8         3
ir * -2
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         2        13        12
ir + 2
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         3        12        10
ir - 2
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         7         8         2
resize(ir, 1)
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         5         5         1

我們使用下圖展示了上面操作的結(jié)果。紅色柱狀部分顯示了最初的ir范圍藕畔。掌握這些操作的最好方法是在自己定義的范圍內(nèi)在控制臺中親自操作一下马僻。

plot of chunk unnamed-chunk-6

內(nèi)部范圍方法

針對intra-range有著一系列的方法。還有一些方法是針對一系列范圍的操作注服，它的輸出取決于所有的范圍韭邓，因此這些方法與intra-range方法不同，后者不會改變輸出∪艿埽現(xiàn)在我們一些案例來說明一下女淑。range()函數(shù)提供了從最左側(cè)到最右側(cè)的整數(shù)輸出，如下所示：

注：原文我不太理解辜御，翻譯的不精準(zhǔn)诗力，原文如下：

There are also a set of inter-range methods. These are
operations which work on a set of ranges, and the output depends on all
the ranges, thus distinguishes these methods from the intra-range methods, for which the other ranges in the set do not change the output. This is best explained with some examples. The range function gives the integer range from the start of the leftmost range to the end of the rightmost range:

# full details on the inter-range methods:
# ?"inter-range-methods"
(ir <- IRanges(start=c(3,5,17), end=c(10,8,20)))
## IRanges object with 3 ranges and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         3        10         8
##   [2]         5         8         4
##   [3]        17        20         4
range(ir)
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         3        20        18

reduce 函數(shù)能折疊這些范圍，以便于整個可以只被輸出范圍中的一個范圍覆蓋，如下所示：

reduce(ir)
## IRanges object with 2 ranges and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         3        10         8
##   [2]        17        20         4

注：ir原來有3個范圍苇本，其中[5,8]范圍是在[3,10]之間袜茧，因此前者被折疊到了后者之中。

gaps 函數(shù)能夠給出上面兩個范圍的區(qū)間瓣窄，但是它不覆蓋任何ir中的范圍笛厦，如下所示：

gaps(ir)
## IRanges object with 1 range and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]        11        16         6

disjoin 會打斷 ir的范圍，并它們分成不同的范圍俺夕，如下所示：

disjoin返回一個disjoint對象裳凸，它會查看參數(shù)中端點的并集，返回不相交的對象劝贸。換話句講姨谷，結(jié)果中由區(qū)間的最大長度范圍組成，在該范圍上映九，對象x中的重疊范圍集合是相同的梦湘，并且至少為1。

disjoin(ir)
## IRanges object with 4 ranges and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         3         4         2
##   [2]         5         8         4
##   [3]         9        10         2
##   [4]        17        20         4

現(xiàn)在我們使用示意圖來說明一下：

image

第一二行是原始的范圍件甥，第四行是經(jīng)過disjoin處理后的范圍捌议。

GRanges

GRanges對象包含了Iranges，它含有2個方面的重要信息：

染色體信息（Bioconductor稱為 seqnames）引有。
DNA鏈瓣颅。

鏈的信息可以指定為 + 號或 - ，或者是保留未指定的 * 譬正。正義鏈的特征則是在編號線上從左到右的生物學(xué)方面宫补，而負(fù)鏈特征則是從右到左。就IRanges而言曾我，正鏈特征是從start到 end ,負(fù)鏈則是從 end 到 start 粉怕。這初看起來比較混亂，但這是必需的您单，因此 width 被定義為 end - start + 1斋荞，并且不允許有負(fù)的寬度范圍。因為DNA有兩條鏈虐秦，這兩條鏈具有相反的方向性平酿，因此當(dāng)我們說DNA時，可以只指明其中的一條鏈悦陋。

通過IRnage蜈彼、染色體名稱和一條鏈，我們可以確定我們與另外一個研究人員所指的DNA分子具有相同的區(qū)域和鏈俺驶，因為我們使用的相同的基因組構(gòu)建的幸逆。GRanges對象中還可以包括其它部分的信息棍辕，不過上述的兩個信息則是最重要的。

library(GenomicRanges)

現(xiàn)在我們創(chuàng)建一個虛構(gòu)染色體chrZ 上的兩個范圍还绘。我們會說這兩個范圍指的基因組hg19上的范圍楚昭。因為我們沒有將我們的基因組鏈接到數(shù)據(jù)庫，因此我們可以指定一個在hg19中并不存在的染色體拍顷，如下所示：

gr <- GRanges("chrZ", IRanges(start=c(5,10),end=c(35,45)),
              strand="+", seqlengths=c(chrZ=100L))
gr
## GRanges object with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [ 5, 35]      +
##   [2]     chrZ  [10, 45]      +
##   -------
##   seqinfo: 1 sequence from an unspecified genome
genome(gr) <- "hg19"
gr
## GRanges object with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [ 5, 35]      +
##   [2]     chrZ  [10, 45]      +
##   -------
##   seqinfo: 1 sequence from hg19 genome

我們可以調(diào)用 seqnames 和 seqlengths 來查看上面的信息：

seqnames(gr)
## factor-Rle of length 2 with 1 run
##   Lengths:    2
##   Values : chrZ
## Levels(1): chrZ
seqlengths(gr)
## chrZ 
##  100

我們可以使用 shift 函數(shù)（就像在IRanges中使用的一樣）來處理GRanges抚太。但是，當(dāng)我們嘗試超過染色體長度之外的范圍時昔案，就會出現(xiàn)警告信息尿贫，如下所示：

shift(gr, 10)
## GRanges object with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [15, 45]      +
##   [2]     chrZ  [20, 55]      +
##   -------
##   seqinfo: 1 sequence from hg19 genome
shift(gr, 80)
## Warning in valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE): GRanges object contains 2 out-of-bound ranges located on sequence
##   chrZ. Note that only ranges located on a non-circular sequence
##   whose length is not NA can be considered out-of-bound (use
##   seqlengths() and isCircular() to get the lengths and circularity
##   flags of the underlying sequences). You can use trim() to trim
##   these ranges. See ?`trim,GenomicRanges-method` for more
##   information.
## GRanges object with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ [85, 115]      +
##   [2]     chrZ [90, 125]      +
##   -------
##   seqinfo: 1 sequence from hg19 genome

如果我們使用 trim 處理這個范圍，我們就能獲取剩下的范圍踏揣，而不考慮超出染色體長的部分：

trim(shift(gr, 80))
## Warning in valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE): GRanges object contains 2 out-of-bound ranges located on sequence
##   chrZ. Note that only ranges located on a non-circular sequence
##   whose length is not NA can be considered out-of-bound (use
##   seqlengths() and isCircular() to get the lengths and circularity
##   flags of the underlying sequences). You can use trim() to trim
##   these ranges. See ?`trim,GenomicRanges-method` for more
##   information.
## GRanges object with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ [85, 100]      +
##   [2]     chrZ [90, 100]      +
##   -------
##   seqinfo: 1 sequence from hg19 genome

我們可以使用 mcols 函數(shù)（鏈表示元數(shù)據(jù)列）向每個范圍添加列信息庆亡。需要注意的是，這個操作對IRanges也是可行的捞稿。我們還可以通過指定 NULL 來移除列信息又谋，如下所示：

mcols(gr)
## DataFrame with 2 rows and 0 columns
mcols(gr)$value <- c(-1,4)
gr
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames    ranges strand |     value
##          <Rle> <IRanges>  <Rle> | <numeric>
##   [1]     chrZ  [ 5, 35]      + |        -1
##   [2]     chrZ  [10, 45]      + |         4
##   -------
##   seqinfo: 1 sequence from hg19 genome
mcols(gr)$value <- NULL
> gr
# GRanges object with 2 ranges and 0 metadata columns:
#      seqnames    ranges strand
#         <Rle> <IRanges>  <Rle>
#  [1]     chrZ      5-35      +
#  [2]     chrZ     10-45      +
#  -------
#  seqinfo: 1 sequence from hg19 genome

GRangesList

當(dāng)我們涉及基因時，我們通過創(chuàng)建一個GRanges列表是很有用的括享，例如用于表示分組信息（比如每個基因的外顯子）搂根。該列表的元素是基因珍促，并且在每個元素中铃辖，外顯子的范圍被定義為GRanges。

gr2 <- GRanges("chrZ",IRanges(11:13,51:53))
grl <- GRangesList(gr, gr2)
grl
## GRangesList object of length 2:
## [[1]] 
## GRanges object with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [ 5, 35]      +
##   [2]     chrZ  [10, 45]      +
## 
## [[2]] 
## GRanges object with 3 ranges and 0 metadata columns:
##       seqnames   ranges strand
##   [1]     chrZ [11, 51]      *
##   [2]     chrZ [12, 52]      *
##   [3]     chrZ [13, 53]      *
## 
## -------
## seqinfo: 1 sequence from hg19 genome

GRangesList 對象中的長度就是其中的GRanges 對象個數(shù)猪叙。為了獲取每個GRanges的長度娇斩，我們可以調(diào)用 elementNROWS。我們可以通過兩個方括號的典型列表索引方法來建立索引穴翩，如下所示：

length(grl)
## [1] 2
elementNROWS(grl)
## [1] 2 3
grl[[1]]
## GRanges object with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [ 5, 35]      +
##   [2]     chrZ  [10, 45]      +
##   -------
##   seqinfo: 1 sequence from hg19 genome

width函數(shù)會返回一個 integerList犬第。如果使用sum，我們會得到列表中每個GRanges對象的寬度向量芒帕，如下所示：

width(grl)
## IntegerList of length 2
## [[1]] 31 36
## [[2]] 41 41 41
sum(width(grl))
## [1]  67 123

我們可以像以前那樣添加元數(shù)據(jù)(metadata)歉嗓，現(xiàn)在每個GRange對象都有一行元數(shù)據(jù)。當(dāng)我們輸出GRangesList時它們并不業(yè)顯示出來背蟆，但我們可以通過 mcols 進(jìn)行訪問鉴分，如下所示：

mcols(grl)$value <- c(5,7)
grl
## GRangesList object of length 2:
## [[1]] 
## GRanges object with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [ 5, 35]      +
##   [2]     chrZ  [10, 45]      +
## 
## [[2]] 
## GRanges object with 3 ranges and 0 metadata columns:
##       seqnames   ranges strand
##   [1]     chrZ [11, 51]      *
##   [2]     chrZ [12, 52]      *
##   [3]     chrZ [13, 53]      *
## 
## -------
## seqinfo: 1 sequence from hg19 genome
mcols(grl)
## DataFrame with 2 rows and 1 column
##       value
##   <numeric>
## 1         5
## 2         7

findOverlaps與%over%

現(xiàn)在我們來看兩個比較GRanges對象的常用方法，首先我們要創(chuàng)建兩組范圍如下所示：

(gr1 <- GRanges("chrZ",IRanges(c(1,11,21,31,41),width=5),strand="*"))
## GRanges object with 5 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [ 1,  5]      *
##   [2]     chrZ  [11, 15]      *
##   [3]     chrZ  [21, 25]      *
##   [4]     chrZ  [31, 35]      *
##   [5]     chrZ  [41, 45]      *
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths
(gr2 <- GRanges("chrZ",IRanges(c(19,33),c(38,35)),strand="*"))
## GRanges object with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [19, 38]      *
##   [2]     chrZ  [33, 35]      *
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

findOverlaps 會返回一個 Hits 對象带膀，這個對象含有關(guān)于檢索范圍的信息（第一個參數(shù)）志珍，這個范圍與主題（第二個參數(shù)）有哪些地方重疊。還有許多參數(shù)用于指定計算哪種類型的重疊垛叨，如下所示：

fo <- findOverlaps(gr1, gr2)
fo
## Hits object with 3 hits and 0 metadata columns:
##       queryHits subjectHits
##       <integer>   <integer>
##   [1]         3           1
##   [2]         4           1
##   [3]         4           2
##   -------
##   queryLength: 5 / subjectLength: 2
queryHits(fo)
## [1] 3 4 4
subjectHits(fo)
## [1] 1 1 2

得到重疊信息的另外一種方式就是使用 %over% 伦糯，它返回一個邏輯向量，這個向量表示了第一個參數(shù)中的范圍與第二個參數(shù)中任何范圍重疊的信息，如下所示：

gr1 %over% gr2
## [1] FALSE FALSE  TRUE  TRUE FALSE
gr1[gr1 %over% gr2]
## GRanges object with 2 ranges and 0 metadata columns:
##       seqnames    ranges strand
##          <Rle> <IRanges>  <Rle>
##   [1]     chrZ  [21, 25]      *
##   [2]     chrZ  [31, 35]      *
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

需要注意的是敛纲，這兩種操作方式都是鏈特異性的喂击，雖然 findOverlaps 有一個 ignore.strand 選擇參數(shù)，如下所示：

gr1 <- GRanges("chrZ",IRanges(1,10),strand="+")
gr2 <- GRanges("chrZ",IRanges(1,10),strand="-")
gr1 %over% gr2
## [1] FALSE

Rle 和Views

最后淤翔，我們來簡單了解一下由IRanges定義的兩個相關(guān)類惭等，即Rle類和Views類。Rle全稱為run-length encoding办铡，它表示的是重復(fù)數(shù)據(jù)的壓縮工辞做，用于代替類似于[1, 1, 1, 1]這樣的儲存。

我們只用儲存數(shù)據(jù)1寡具，以及重復(fù)次數(shù)4即可秤茅。數(shù)據(jù)越重復(fù)，使用Rle對象的壓縮效果就越明顯童叠。

我們使用 str 來查看 Rle的內(nèi)部結(jié)構(gòu)框喳，用于顯示它的儲存數(shù)據(jù)和重復(fù)次數(shù)，如下所示：

(r <- Rle(c(1,1,1,0,0,-2,-2,-2,rep(-1,20))))
## numeric-Rle of length 28 with 4 runs
##   Lengths:  3  2  3 20
##   Values :  1  0 -2 -1
str(r)
## Formal class 'Rle' [package "S4Vectors"] with 4 slots
##   ..@ values         : num [1:4] 1 0 -2 -1
##   ..@ lengths        : int [1:4] 3 2 3 20
##   ..@ elementMetadata: NULL
##   ..@ metadata       : list()
as.numeric(r)
##  [1]  1  1  1  0  0 -2 -2 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
## [24] -1 -1 -1 -1 -1

Views對象可以被為一個“窗口”用于查看一個序列信息厦坛，如下所示：

A Views object can be thought of as “windows” looking into a sequence.

(v <- Views(r, start=c(4,2), end=c(7,6)))
## Views on a 28-length Rle subject
## 
## views:
##     start end width
## [1]     4   7     4 [ 0  0 -2 -2]
## [2]     2   6     5 [ 1  1  0  0 -2]

需要注意的是五垮，Veiws對象的內(nèi)部結(jié)構(gòu)只是原始對象，以及指定窗口的IRangs杜秸。使用Views的最好好處就是放仗，當(dāng)原始對象沒有儲存在內(nèi)存中時，在這種情況下撬碟，View對象就是一個輕量級的類诞挨，它能幫助我們引用子序列，而不必將整個序列都加載到內(nèi)存中呢蛤，如下所示：

str(v)
## Formal class 'RleViews' [package "IRanges"] with 5 slots
##   ..@ subject        :Formal class 'Rle' [package "S4Vectors"] with 4 slots
##   .. .. ..@ values         : num [1:4] 1 0 -2 -1
##   .. .. ..@ lengths        : int [1:4] 3 2 3 20
##   .. .. ..@ elementMetadata: NULL
##   .. .. ..@ metadata       : list()
##   ..@ ranges         :Formal class 'IRanges' [package "IRanges"] with 6 slots
##   .. .. ..@ start          : int [1:2] 4 2
##   .. .. ..@ width          : int [1:2] 4 5
##   .. .. ..@ NAMES          : NULL
##   .. .. ..@ elementType    : chr "integer"
##   .. .. ..@ elementMetadata: NULL
##   .. .. ..@ metadata       : list()
##   ..@ elementType    : chr "ANY"
##   ..@ elementMetadata: NULL
##   ..@ metadata       : list()

基因組元件的應(yīng)用：鏈特異性操作

在這一部分惶傻，我們使用一個小型的范圍來說明一下基因的范圍內(nèi)(intra-range)操作，包括reduce, disjoin與gaps其障。我們添加鏈和seqname信息银室，并顯示如想調(diào)整大小和側(cè)翼(flank)用于識別TSS和啟動子區(qū)域，如下所示：

一組簡單的范圍

ir <- IRanges(c(3, 8, 14, 15, 19, 34, 40),
  width = c(12, 6, 6, 15, 6, 2, 7))

現(xiàn)在我們使用圖片形式查看一下ir励翼，以及內(nèi)部范圍(intra-range)操作蜈敢，如下所示：

par(mfrow=c(4,1), mar=c(4,2,2,2))
plotRanges(ir, xlim=c(0,60))
plotRanges(reduce(ir), xlim=c(0,60))
plotRanges(disjoin(ir), xlim=c(0,60))
plotRanges(gaps(ir), xlim=c(0,60))

plot of chunk lkir

注：plotRange()函數(shù)似乎不在這個包中，網(wǎng)上找到了它的原代碼抚笔，如下所示：

plotRanges <- function(x, xlim = x, main = deparse(substitute(x)),
                       col = "black", sep = 0.5, ...)
{
  height <- 1
  if (is(xlim, "Ranges"))
    xlim <- c(min(start(xlim)), max(end(xlim)))
  bins <- disjointBins(IRanges(start(x), end(x) + 1))
  plot.new()
  plot.window(xlim, c(0, max(bins)*(height + sep)))
  ybottom <- bins * (sep + height) - height
  rect(start(x)-0.5, ybottom, end(x)+0.5, ybottom + height, col = col, ...)
  title(main)
  axis(1)
}

reduce(x)函數(shù)生成一組不重疊的范圍扶认，它覆蓋了x中的所有位點。這種操作用于生了你許多轉(zhuǎn)錄本的基因模式的復(fù)雜性殊橙，其中我們可能只想要轉(zhuǎn)錄出已知區(qū)間的地址辐宾，不考慮transcript of residence狱从。

disjoin(x)生成一組覆蓋x的所有位置的范圍，使得分離輸出中的任何范圍都不會與x中的任何間隔的端點重疊叠纹。這為我們提供了連續(xù)間隔最大可能的集合季研，這些連續(xù)間隔在原始間隔集中具有端點的地方被分割開。

gaps(x)產(chǎn)生一組覆蓋[start(X)誉察，end(X)]中未被x中任何范圍覆蓋的位置的范圍与涡。它能給出編碼序列位置和外顯子間隔，這可用于枚舉內(nèi)含子持偏。

GRanges的擴(kuò)展

現(xiàn)在我們添加上染色體與鏈信息驼卖，如下所示：

library(GenomicRanges)
gir = GRanges(seqnames="chr1", ir, strand=c(rep("+", 4), rep("-",3)))

讓我們假設(shè)間隔代表基因。下圖說明了轉(zhuǎn)錄起始位點(綠色)鸿秆、上游啟動子區(qū)域(紫色)酌畜、下游啟動子區(qū)域(棕色)的位置，如下所示：

par(mfrow=c(4,1), mar=c(4,2,2,2))
plotGRanges(gir, xlim=c(0,60))
# 
plotGRanges(resize(gir,1), xlim=c(0,60),col="green")
plotGRanges(flank(gir,3), xlim=c(0,60), col="purple")
plotGRanges(flank(gir,2,start=FALSE), xlim=c(0,60), col="brown")

plot of chunk dopr

注：plotGRanges的源代碼如下所示：

plotGRanges = function (x, xlim = x, col = "black", sep = 0.5, xlimits = c(0, 
    60), ...) 
{
    main = deparse(substitute(x))
    ch = as.character(seqnames(x)[1])
    x = ranges(x)
    height <- 1
    if (is(xlim, "Ranges")) 
        xlim <- c(min(start(xlim)), max(end(xlim)))
    bins <- disjointBins(IRanges(start(x), end(x) + 1))
    plot.new()
    plot.window(xlim = xlimits, c(0, max(bins) * (height + sep)))
    ybottom <- bins * (sep + height) - height
    rect(start(x) - 0.5, ybottom, end(x) + 0.5, ybottom + height, 
        col = col, ...)
    title(main, xlab = ch)
    axis(1)
}

請注意卿叽，我們不需要采取特殊步驟來處理鏈中的差異

甲基化微陣列數(shù)據(jù)的可視化

在我們討論 SummarizedExperiment applications時桥胞，我們輸入了由Illumina 450k甲基化微陣列生成的數(shù)據(jù)。

在這一部分中考婴，我們將介紹如何使用GenomicRanges和Gviz來研究一下在基因水平注釋下的甲基化模型贩虾。這個思路非常簡單：只需從所有樣本中提取M值（甲基化與總DNA 比值的基因座特定的估計的log轉(zhuǎn)換），并在選定基因的基因模型中繪制出來沥阱，如下所示：

我們回顧一下數(shù)據(jù)是如何獲取和導(dǎo)入的：

library(ArrayExpress)
if (!file.exists("E-MTAB-5797.sdrf.txt")) nano = getAE("E-MTAB-5797")
library(minfi)
pref = unique(substr(dir(patt="idat"),1,17)) # find the prefix strings
raw = read.metharray(pref)
glioMeth = preprocessQuantile(raw) # generate SummarizedExperiment
## [preprocessQuantile] Mapping to genome.
## [preprocessQuantile] Fixing outliers.
## [preprocessQuantile] Quantile normalizing.

這些步驟需要網(wǎng)聯(lián)缎罢，并且需要花幾分鐘。

一旦我們的會話中有 glioMeth 喳钟，需要將以下代碼添加到會話中屁使。我們將會介紹它是如何工作的在岂。

MbyGene = function(mset, symbol="TP53", rad=5000) {
# phase 1: annotated GRanges for the gene
 require(erma)
 require(Gviz)
 gmod = suppressMessages(genemodel(symbol))     # erma utility
 gseq = as.character(seqnames(gmod)[1])
 gmod$transcript = symbol
# phase 2: filter down to the region of interest
 mlim = mset[which(seqnames(mset)==gseq),] # restrict to chromosome
 # now focus the methylation data to vicinity of gene
 d1 = subsetByOverlaps(GRanges(rowRanges(mlim),,, getM(mlim)), 
         range(gmod)+rad)
# phase 3: use Gviz
 plotTracks(list(DataTrack(d1), 
              GeneRegionTrack(gmod, 
               transcriptAnnotation="transcript", name=gseq), 
              GenomeAxisTrack(name=gseq, showTitle=TRUE)))
}

代碼表明了三個階段：獲取基因區(qū)域奔则，并添加轉(zhuǎn)錄本(transcript)注釋用于繪制所有外顯子的結(jié)合；將GenomicRatioSet（繼承自

對代碼的注釋表明了三個階段：獲取基因區(qū)并添加用于所有外顯子聯(lián)合的信息性繪圖的轉(zhuǎn)錄注釋蔽午；將GenomicRatioSet（繼承自RangedSummarizedExperiment）縮小到感興趣的間隔易茬，這個間隔是由基因模式和rad參數(shù)確定；使用Gviz來構(gòu)建可用于繪圖的對象及老，并繪制出來抽莱。

Gviz的詳細(xì)信息在該包的文檔中。現(xiàn)在我們繪制出以基因為中心的圖形骄恶，如下所示：

MbyGene(glioMeth, symbol="TERT")

image

從上圖中我們可以發(fā)現(xiàn)這些信息：

基因組背景（以兆(megabase)為單位的染色體和區(qū)域）食铐。
感興趣的基因在單行中的結(jié)構(gòu)，它們表示了表達(dá)間隔的聯(lián)合僧鲁。
450k針近的位置（藍(lán)色數(shù)據(jù)點的x坐標(biāo)）虐呻。
甲基化估計中的樣本間變異象泵。
所選基因組區(qū)域的甲基化變異。

在6x模塊中斟叼，我們將學(xué)習(xí)如何使用額外的軟件包來創(chuàng)建這種類型的交互式展示偶惠，允許我們選擇基因并隨意放大感興趣的區(qū)域。

腳注

更多的信息可以查看 GenomicRanges 包朗涩，以及查閱PLOS Comp Bio上的文獻(xiàn)忽孽，它是GenomicRanges的作者寫的：

http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118

使用vignettes也能查看這個包的眾多功能文檔，如下所示：

browseVignettes("GenomicRanges")

或者是使用help函數(shù)：

help(package="GenomicRanges", help_type="html")

對于Bed Tools的用戶來說谢床，HelloRanges包對于在BEd和GRanges概念框架之間轉(zhuǎn)換概念非常有用兄一。

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者

人面猴
序言：七十年代末，一起剝皮案震驚了整個濱河市识腿，隨后出現(xiàn)的幾起案子瘾腰，更是在濱河造成了極大的恐慌，老刑警劉巖覆履，帶你破解...
沈念sama閱讀 205,874評論 6贊 479
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件蹋盆，死亡現(xiàn)場離奇詭異，居然都是意外死亡硝全，警方通過查閱死者的電腦和手機(jī)栖雾，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 88,151評論 2贊 382
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進(jìn)店門，熙熙樓的掌柜王于貴愁眉苦臉地迎上來伟众，“玉大人析藕，你說我怎么就攤上這事〉氏幔” “怎么了账胧？”我有些...
開封第一講書人閱讀 152,270評論 0贊 342
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵，是天一觀的道長先紫。經(jīng)常有香客問我治泥，道長，這世上最難降的妖魔是什么遮精？我笑而不...
開封第一講書人閱讀 55,137評論 1贊 278
?港島之戀（遺憾婚禮）
正文為了忘掉前任居夹，我火速辦了婚禮，結(jié)果婚禮上本冲，老公的妹妹穿的比我還像新娘准脂。我一直安慰自己，他們只是感情好檬洞，可當(dāng)我...
茶點故事閱讀 64,116評論 5贊 370
惡毒庶女頂嫁案：這布局不是一般人想出來的
文/花漫我一把揭開白布狸膏。她就那樣靜靜地躺著，像睡著了一般添怔。火紅的嫁衣襯著肌膚如雪湾戳。梳的紋絲不亂的頭發(fā)上闷板，一...
開封第一講書人閱讀 48,935評論 1贊 283
城市分裂傳說
那天，我揣著相機(jī)與錄音院塞，去河邊找鬼遮晚。笑死，一個胖子當(dāng)著我的面吹牛拦止，可吹牛的內(nèi)容都是我干的县遣。我是一名探鬼主播，決...
沈念sama閱讀 38,261評論 3贊 399
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開眼汹族，長吁一口氣：“原來是場噩夢啊……” “哼萧求！你這毒婦竟也來了？” 一聲冷哼從身側(cè)響起顶瞒，我...
開封第一講書人閱讀 36,895評論 0贊 259
萬榮殺人案實錄
序言：老撾萬榮一對情侶失蹤夸政，失蹤者是張志新（化名）和其女友劉穎，沒想到半個月后榴徐，有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體守问，經(jīng)...
沈念sama閱讀 43,342評論 1贊 300
?護(hù)林員之死
正文獨(dú)居荒郊野嶺守林人離奇死亡，尸身上長有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點故事閱讀 35,854評論 2贊 323
?白月光啟示錄
正文我和宋清朗相戀三年坑资，在試婚紗的時候發(fā)現(xiàn)自己被綠了耗帕。大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片。...
茶點故事閱讀 37,978評論 1贊 333
活死人
序言：一個原本活蹦亂跳的男人離奇死亡袱贮，死狀恐怖仿便，靈堂內(nèi)的尸體忽然破棺而出，到底是詐尸還是另有隱情攒巍，我是刑警寧澤嗽仪，帶...
沈念sama閱讀 33,609評論 4贊 322
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布，位于F島的核電站柒莉，受9級特大地震影響闻坚，放射性物質(zhì)發(fā)生泄漏。R本人自食惡果不足惜常柄，卻給世界環(huán)境...
茶點故事閱讀 39,181評論 3贊 307
男人毒藥：我在死后第九天來索命
文/蒙蒙一鲤氢、第九天我趴在偏房一處隱蔽的房頂上張望。院中可真熱鬧西潘，春花似錦、人聲如沸哨颂。這莊子的主人今日做“春日...
開封第一講書人閱讀 30,182評論 0贊 19
一樁弒父案，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽威恼。三九已至品姓，卻和暖如春寝并，著一層夾襖步出監(jiān)牢的瞬間，已是汗流浹背腹备。一陣腳步聲響...
開封第一講書人閱讀 31,402評論 1贊 260
情欲美人皮
我被黑心中介騙來泰國打工衬潦，沒想到剛下飛機(jī)就差點兒被人妖公主榨干…… 1. 我叫王不留，地道東北人植酥。一個月前我還...
沈念sama閱讀 45,376評論 2贊 352
代替公主和親
正文我出身青樓镀岛，卻偏偏與公主長得像，于是被迫代替她去往敵國和親友驮。傳聞我的和親對象是個殘疾皇子漂羊，可洞房花燭夜當(dāng)晚...
茶點故事閱讀 42,677評論 2贊 344