A simple example
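library("BSgenome.Hsapiens.UCSC.hg19")  # also attaches Biostrings, IRanges and GenomicRanges
seq_str <- "ACGCCTCGAGCGTAGCGTAGCGT"
# Locate every "CG" dinucleotide; printing the result lists each match as a view
cg_views <- matchPattern("CG", seq_str)
cg_views
cat("\n")
# Start of each C; the paired G sits one base further on
pos <- start(cg_views)
grList <- list()  # one GRanges per chromosome (here only "chr1")
GRx <- GRanges(seqnames = Rle(rep("chr1", length(pos))),
               ranges = IRanges(start = pos, end = pos + 1),
               strand = Rle(rep("+", length(pos))))
grList[[1]] <- GRx
grList[[1]]
# Combine the per-chromosome GRanges into one object and save it to disk
cgGR <- unlist(GRangesList(grList))
save(cgGR, file = "hg19_CpG_sites.RData")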
The output is:
Views on a 23-letter BString subject
subject: ACGCCTCGAGCGTAGCGTAGCGT
views:
      start end width
  [1]     2   3     2 [CG]
  [2]     7   8     2 [CG]
  [3]    11  12     2 [CG]
  [4]    16  17     2 [CG]
  [5]    21  22     2 [CG]
GRanges object with 5 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1       2-3      +
  [2]     chr1       7-8      +
  [3]     chr1     11-12      +
  [4]     chr1     16-17      +
  [5]     chr1     21-22      +
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
Looping over the chromosomes in this way produces an RData file holding the CpG sites of the entire genome.
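A minimal sketch of that loop in base R follows. It assumes each chromosome is available as a plain character string. The two-chromosome toy_genome and the helper cpg_sites_one() are hypothetical, and gregexpr() stands in for matchPattern() so the sketch runs without downloading the genome package. With the real package you would instead take each sequence from Hsapiens[[chr]] (for example for chr in paste0("chr", c(1:22, "X", "Y"))), call matchPattern("CG", ...) on it, and build one GRanges per chromosome as in the example above.

```r
# Hypothetical toy "genome": one plain string per chromosome.
toy_genome <- c(chr1 = "ACGCCTCGAGCGTAGCGTAGCGT",
                chr2 = "TTCGAACGTT")

# CpG sites of one chromosome as a table (+ strand, as in the example above).
cpg_sites_one <- function(chr, dna) {
  # gregexpr() reports non-overlapping hits; "CG" cannot overlap itself, so none are missed
  hits <- gregexpr("CG", dna, fixed = TRUE)[[1]]
  if (hits[1] == -1L) {
    hits <- integer(0)  # gregexpr() returns -1 when there is no match
  }
  hits <- as.integer(hits)  # drop the match.length attributes
  data.frame(seqnames = rep(chr, length(hits)),
             start = hits,
             end = hits + 1L,
             strand = rep("+", length(hits)),
             stringsAsFactors = FALSE)
}

# Loop over the chromosomes, then stack the per-chromosome tables
cpg_list <- lapply(names(toy_genome),
                   function(chr) cpg_sites_one(chr, toy_genome[[chr]]))
cpg_sites <- do.call(rbind, cpg_list)

# Save to a temporary file so the sketch does not write into the working directory
save(cpg_sites, file = file.path(tempdir(), "toy_CpG_sites.RData"))
cpg_sites
```

A plain data.frame keeps the sketch dependency-free. In the Bioconductor workflow a GRanges object is the more natural container, because it carries sequence and strand information and supports overlap queries.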
Biostrings
The BSgenome.Hsapiens.UCSC.hg19 package is built on the IRanges, GenomeInfoDb, GenomicRanges, Biostrings and XVector packages.
Biostrings is one of the packages underlying BSgenome.Hsapiens.UCSC.hg19. Its matchPattern() function finds the start and end positions of a given pattern in a target string. Combined with the genome sequences in BSgenome.Hsapiens.UCSC.hg19, it can be used to build datasets such as a table of CpG sites, or to record the positions of other specific sequences.
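The start/end semantics of matchPattern() can be illustrated with a naive exact matcher in base R. match_positions() is a hypothetical helper written for illustration, not the Biostrings implementation (which is far more efficient). It slides a window of the pattern's width along the subject and records every window that equals the pattern, reporting 1-based start, end and width like the views printed above.

```r
# Hypothetical helper: naive exact matching of `pattern` in `subject`.
# Returns 1-based start, end and width of every occurrence, overlaps included.
match_positions <- function(pattern, subject) {
  m <- nchar(pattern)
  n <- nchar(subject)
  if (m == 0L || m > n) {
    # No window of the right width exists: return an empty table
    return(data.frame(start = integer(0), end = integer(0), width = integer(0)))
  }
  starts <- seq_len(n - m + 1L)                          # every possible start position
  windows <- substring(subject, starts, starts + m - 1L) # width-m window at each start
  hit <- starts[windows == pattern]                      # keep windows equal to the pattern
  data.frame(start = hit, end = hit + m - 1L, width = rep(m, length(hit)))
}

# start: 2 7 11 16 21; end: 3 8 12 17 22
match_positions("CG", "ACGCCTCGAGCGTAGCGTAGCGT")
```

Because every window is tested, overlapping occurrences are reported too (two hits for "AA" in "AAA"). For "CG" the distinction never matters, since one CG cannot overlap another.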