1头谜、ATCG含量計算
- 來源我在學(xué)習(xí)Modern Statistics for Modern Biology 此文中的《Modern Statistics for Modern Biology》Chapter 二: 統(tǒng)計建模(2.4-2.5)
讀取fasta序列文件
setwd("e://compute language/NGS/Modern Statistics for Modern Biology/data/")
> library(Biostrings)
> staph = readDNAStringSet("staphsequence.ffn.txt", "fasta") ## 讀取fasta文件
fastat文件格式如下
>a
ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGAAATTGCTCAAGAAAAATTATCAGCTGTAAGTTACT
CAACTTTCCTAAAAGATACTGAGCTTTACACGATCAAAGATGGTGAAGCTATCGTATTATCGAGTATTCC
TTTTAATGCAAATTGGTTAAATCAACAATATGCTGAAATTATCCAAGCAATCTTATTTGATGTTGTAGGC
TATGAAGTAAAACCTCACTTTATTACTACTGAAGAATTAGCAAATTATAGTAATAATGAAACTGCTACTC
CAAAAGAAGCAACAAAACCTTCTACTGAAACAACTGAGGATAATCATGTGCTTGGTAGAGAGCAATTCAA
單個基因計算
> staph[1]
A DNAStringSet instance of length 1
width seq names
[1] 1362 ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGAA...TAGAGAATCTTGAAAAAGAAATAAGAAATGTATAA lcl|NC_002952.2_c...
> letterFrequency(staph[[1]], letters = "ACTG", OR = 0)
A C T G
522 219 392 229
多個基因計算
- 這里活用了
apply
函數(shù),更多參考搜索或者參考這個 掌握R語言中的apply函數(shù)族
> letterFrq = vapply(staph, letterFrequency, FUN.VALUE = numeric(4), letters = "ACGT", OR = 0)
> colnames(letterFrq) = paste0("gene", seq(along = staph))
> computerProportions = function(x){x/sum(x)}
> prop10 = apply(tab10, 2, computerProportions) ## 2表示列
> round(prop10, digits = 2)
gene1 gene2 gene3 gene4 gene5 gene6 gene7 gene8 gene9 gene10
A 0.38 0.36 0.35 0.37 0.35 0.33 0.33 0.34 0.38 0.27
C 0.16 0.16 0.13 0.15 0.15 0.15 0.16 0.16 0.14 0.16
G 0.17 0.17 0.23 0.19 0.22 0.22 0.20 0.21 0.20 0.20
T 0.29 0.31 0.30 0.29 0.27 0.30 0.30 0.29 0.28 0.36
> p0 = rowMeans(prop10)
> p0
A C G T
0.3470531 0.1518313 0.2011442 0.2999714
2巧娱、motif繪制
- 來源我在學(xué)習(xí)Modern Statistics for Modern Biology 此文中的《Modern Statistics for Modern Biology》Chapter 二 統(tǒng)計建模(2.6 - 2.8)
> setwd("e://compute language/NGS/Modern Statistics for Modern Biology/data/")
> library("seqLogo")
載入需要的程輯包:grid
Warning message:
程輯包‘seqLogo’是用R版本3.5.1 來建造的
> load("kozak.RData")
> kozak
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
A 0.33 0.25 0.4 0.15 0.20 1 0 0 0.05
C 0.12 0.25 0.1 0.40 0.40 0 0 0 0.05
G 0.33 0.25 0.4 0.20 0.25 0 0 1 0.90
T 0.22 0.25 0.1 0.25 0.15 0 1 0 0.00
> pwm = makePWM(kozak)
> seqLogo(pwm, ic.scale = FALSE)