系統(tǒng)發(fā)育比較分析—R

系統(tǒng)發(fā)育樹是研究物種進(jìn)化歷史必不可少的信息衰伯，我們可以利用它得到一些重要?dú)v史線索绳锅，如：

生物多樣性（物種形成或滅絕）窟她；
物種性狀進(jìn)化與多樣化（Character evolution and diversification）码泞；
生物地理學(xué)（研究動植物的地理分布）；
測試復(fù)雜進(jìn)化模型刹泄；

Introduction to phylogenies in R

首先，安裝系統(tǒng)發(fā)育分析所需的軟件包

install CRAN Task View for phylogenetics to run, remove the comment character '#'
# install.packages("ctv")
# library("ctv")
# install.views("Phylogenetics")
# update.views("Phylogenetics")
# install.packages("ape")
# install.packages("geiger")
# install.packages("phytools")
# citation("geiger")

如何使用R進(jìn)行系統(tǒng)發(fā)育樹編輯

tree = "(((((((cow, pig),whale),(bat,(lemur,human))),(robin,iguana)),coelacanth),gold_fish),shark);"  # 從字符串中讀取樹文件
vert.tree <- read.tree(text=tree)
plot(vert.tree) # 普通可視化
plot(vert.tree,type="cladogram") 
plot(unroot(vert.tree),type="unrooted")
plot(vert.tree,type="fan")

普通可視化

cladogram

無根樹

環(huán)形圖

其實(shí)怎爵，此處的樹文件就是一個字符串列表（列表還可以是數(shù)字型）特石。

> str(vert.tree)
List of 3
 $ edge     : int [1:20, 1:2] 12 13 14 15 16 17 18 18 17 16 ...
 $ Nnode    : int 10
 $ tip.label: chr [1:11] "cow" "pig" "whale" "bat" ...
 - attr(*, "class")= chr "phylo"
 - attr(*, "order")= chr "cladewise"

接下來，主要是看一下這些對象是如何存儲在變量中的：

tree<-read.tree(text="(((A,B),(C,D)),E);")
plot(tree,type="cladogram",edge.width=2)
> tree$edge
     [,1] [,2]
[1,]    6    7
[2,]    7    8
[3,]    8    1
[4,]    8    2
[5,]    7    9
[6,]    9    3
[7,]    9    4
[8,]    6    5

> tree$tip.label
[1] "A" "B" "C" "D" "E"

> tree$Nnode
[1] 4

plot(tree,edge.width=2,label.offset=0.1,type="cladogram")
nodelabels()
tiplabels()

image.png

我們可以看到鳖链，樹中所有分類單元之間的關(guān)系信息都包含在每條邊的起始節(jié)點(diǎn)和結(jié)束節(jié)點(diǎn)中姆蘸，共享共同起始節(jié)點(diǎn)編號的邊是直接共同祖先的后代。

我們還可以測試樹文件芙委，編輯以及去除一些末端枝：

## (I'm going to first set the seed for repeatability)
set.seed(1)
## simulate a birth-death tree using phytools
tree<-pbtree(b=1,d=0.2,n=40)
## stopping criterion is 40 extant species, in this case
plotTree(tree)
nodelabels()

## ok, now extract the clade descended from node #62
tt62<-extract.clade(tree,62)
plotTree(tt62)

## now drop 10 tips from the tree (I'm going to pick them at random)
dtips<-sample(tree$tip.label,10)
dt<-drop.tip(tree,dtips)
plotTree(dt)

## we could also, say, drop all tips that go extinct before the present
## this is a fun way, but not the only way to do this:
plotTree(et<-drop.tip(tree,getExtinct(tree)),cex=0.7)

## rotating nodes
plotTree(tree,node.numbers=T)
## first, rotate about node #50
rt.50<-rotate(tree,50)
plotTree(rt.50)
## now rotate all nodes
rt.all<-rotateNodes(tree,"all")
plotTree(rt.all)
## ok, now let's re-root the tree at node #67
rr.67<-root(tree,node=67)
plotTree(rr.67)
## this creates a trifurcation at the root
## we could instead re-root at along an edge
rr.62<-reroot(tree,62,position=0.5*tree$edge.length[which(tree$edge[,2]==62)])
plotTree(rr.62)

布朗模型（Brownian Motion）

布朗模型：
（1）. 連續(xù)型性狀進(jìn)化的一個模型逞敷；
（2）. 性狀隨著時間不斷變化；
（3）. 經(jīng)過一段時間后灌侣，預(yù)期的狀態(tài)將服從正態(tài)分布推捐；

1. 什么是布朗運(yùn)動？

（1）. 有時稱為維納過程侧啼；
（2）. 連續(xù)時間隨機(jī)過程牛柒；
（3）. 描述連續(xù)型性狀的“隨機(jī)演化”；

布朗運(yùn)動

2. 布朗運(yùn)動在何時才會發(fā)生痊乾？

（1）. BM可以用來描述由大量獨(dú)立的弱力組合而產(chǎn)生的運(yùn)動皮壁；
（2）. 加入許多小的自變量后，無論原始分布如何(中心極限定理)哪审，都會產(chǎn)生正態(tài)分布蛾魄；

演化過程近似布朗運(yùn)動
Ⅰ. 遺傳漂變
Ⅱ. 隨機(jī)改變
Ⅲ. 相對于考慮的時間間隔弱的自然選擇
Ⅳ. 隨時間隨機(jī)變化的自然選擇

3. 模擬布朗運(yùn)動

模擬布朗運(yùn)動需要從正態(tài)分布中提取值
分布的方差取決于σ2和t
沿著相鄰分支的值從根添加到末端

系統(tǒng)發(fā)育獨(dú)立差（Phylogenetic Independent Contrasts）

PIC主要用來分析來自系統(tǒng)發(fā)育樹的數(shù)據(jù)
檢測性狀之間進(jìn)化相關(guān)性
我們可以將PIC看作是一種創(chuàng)建獨(dú)立數(shù)據(jù)點(diǎn)的統(tǒng)計轉(zhuǎn)換

系統(tǒng)發(fā)育獨(dú)立差（Phylogenetic Independent Contrast, PIC）是去除性狀分析中物種系統(tǒng)發(fā)育關(guān)系的一種方法，是美國進(jìn)化生物學(xué)家 J. Felsenstein于1985年提出的协饲。參考鏈接

當(dāng)時畏腕，有動物學(xué)家檢驗了若干哺乳動物腦容量和體重的相關(guān)性。然而茉稠，所用統(tǒng)計方法都假設(shè)各個種的性狀是相互獨(dú)立的描馅，采集自同一個正態(tài)分布的總體。由于不同種之間存在系統(tǒng)發(fā)育上的聯(lián)系而线，因此不能直接套用常規(guī)統(tǒng)計模型铭污，所以所得結(jié)果也就不可信了。在Felsenstein之前膀篮，一些學(xué)者已經(jīng)意識到這個問題嘹狞，但是都沒能給出理想的解決方案。

Felsenstain認(rèn)真思考了這個問題誓竿，并且認(rèn)為：研究多個不同性狀的相關(guān)性時磅网，必須考慮系統(tǒng)發(fā)育關(guān)系。他提出：假設(shè)物種性狀的進(jìn)化服從布朗運(yùn)動筷屡，那么用系統(tǒng)發(fā)育關(guān)系上最近的分類單元性狀的差值涧偷，再經(jīng)過枝長加權(quán)簸喂，就可以得到一組新的數(shù)據(jù)。這組新的數(shù)據(jù)去掉了進(jìn)化信息的干擾燎潮，因此可以用常規(guī)統(tǒng)計方法檢驗喻鳄。上述方法新獲得的數(shù)據(jù)就稱為系統(tǒng)發(fā)育獨(dú)立差（The Phylogenetic Independent Contrast, PIC）。

舉例來說确封，如果要研究哺乳動物腦容量和體重的關(guān)系除呵，就要先獲得這些種之間的進(jìn)化關(guān)系，也就是經(jīng)過分子鐘校訂的進(jìn)化樹爪喘。然后再計算腦容量的PIC和體重的PIC颜曾，再用腦容量的PIC和體重的PIC建立線性模型，進(jìn)行相關(guān)性檢驗腥放。

現(xiàn)在讓我們來模擬一下布朗運(yùn)動在樹枝上的移動泛啸。堅持離散時間，我們首先需要一個離散時間的系統(tǒng)演化秃症，我們可以用pbtree在phytools中進(jìn)行模擬。

library(phytools)
t<-100 # total time
n<-30 # total taxa
b<-(log(n)-log(2))/t
tree<-pbtree(b=b,n=n,t=t,type="discrete")
## simulating with both taxa-stop (n) and time-stop (t) is
## performed via rejection sampling & may be slow
## 
##   28 trees rejected before finding a tree
plotTree(tree)

現(xiàn)在吕粹，為了在樹的所有分支上進(jìn)行模擬种柑，我們只需要分別在每個分支上模擬，然后根據(jù)每個分支的父分支的結(jié)束狀態(tài)“移動”每個子分支匹耕。我們可以這樣做聚请，記住，因為布朗演化的結(jié)果在每個時間步驟是獨(dú)立于所有其他的時間步驟稳其。

## simulate evolution along each edge
X<-lapply(tree$edge.length,function(x) c(0,cumsum(rnorm(n=x,sd=sqrt(sig2)))))
## reorder the edges of the tree for pre-order traversal
cw<-reorder(tree)
## now simulate on the tree
ll<-tree$edge.length+1
for(i in 1:nrow(cw$edge)){
    pp<-which(cw$edge[,2]==cw$edge[i,1])
    if(length(pp)>0) X[[i]]<-X[[i]]+X[[pp]][ll[pp]]
    else X[[i]]<-X[[i]]+X[[1]][1]
}
## get the starting and ending points of each edge for plotting
H<-nodeHeights(tree)
## plot the simulation
plot(H[1,1],X[[1]][1],ylim=range(X),xlim=range(H),xlab="time",ylab="phenotype")
for(i in 1:length(X)) lines(H[i,1]:H[i,2],X[[i]])
## add tip labels if desired
yy<-sapply(1:length(tree$tip.label),function(x,y) which(x==y),y=tree$edge[,2])
yy<-sapply(yy,function(x,y) y[[x]][length(y[[x]])],y=X)
text(x=max(H),y=yy,tree$tip.label)

## simulate Brownian evolution on a tree with fastBM
x<-fastBM(tree,sig2=sig2,internal=TRUE)
## visualize Brownian evolution on a tree
phenogram(tree,x,spread.labels=TRUE,spread.cost=c(1,0))
x<-sim.rates(tree,setNames(c(1,20),1:2),internal=TRUE)
phenogram(tree,x,colors=setNames(c("blue","red"),1:2),spread.labels=TRUE,spread.cost=c(1,0))

set.seed(100)
## transition matrix for the discrete trait simulation
Q<-matrix(c(-1,1,1,-1),2,2)
## simulated tree
tree<-pbtree(n=30,scale=1)
## simulate discrete character history
tree<-sim.history(tree,Q,anc="1")

## plot discrete character history on the tree
plotSimmap(tree,setNames(c("blue","red"),1:2),pts=F)

## simulate 5 taxon tree
tree<-pbtree(n=5,tip.label=LETTERS[1:5])
## 500 BM simulations
X<-fastBM(tree,nsim=500)
## plot tree
plot(rotateNodes(tree,"all"),edge.width=2,no.margin=TRUE)

## plot distribution across tips from 500 simulations
library(car)
scatterplotMatrix(t(X))

t <- 0:100  # time
sig2 <- 0.01
nsim <- 1000
## we'll simulate the steps from a uniform distribution
## with limits set to have the same variance (0.01) as before
X<-matrix(runif(n=nsim*(length(t)-1),min=-sqrt(3*sig2),max=sqrt(3*sig2)),nsim,length(t)-1)
X <- cbind(rep(0, nsim), t(apply(X, 1, cumsum)))
plot(t, X[1, ], xlab = "time", ylab = "phenotype", ylim = c(-2, 2), type = "l")
apply(X[2:nsim, ], 1, function(x, t) lines(t, x), t = t)

var(X[, length(t)])
hist(X[,length(t)])
plot(density(X[,length(t)]))

nsim=100
X <- matrix(rnorm(mean=0.02,n = nsim * (length(t) - 1), sd = sqrt(sig2/4)), nsim, length(t) - 1)
X <- cbind(rep(0, nsim), t(apply(X, 1, cumsum)))
plot(t, X[1, ], xlab = "time", ylab = "phenotype", ylim = c(-1, 3), type = "l")
apply(X[2:nsim, ], 1, function(x, t) lines(t, x), t = t)

在本教程中驶赏，我們將學(xué)習(xí)應(yīng)用Felsenstein的獨(dú)立對比方法來估計性狀之間的進(jìn)化相關(guān)性。Felsenstein發(fā)現(xiàn)一個以前已經(jīng)認(rèn)識到的問題既鞠，但這一問題沒有得到充分的認(rèn)識：即從統(tǒng)計分析的角度來看煤傍，物種數(shù)據(jù)不能被視為獨(dú)立的數(shù)據(jù)點(diǎn)。

為了學(xué)習(xí)對比方法嘱蛋，我們首先需要學(xué)習(xí)一些用r語言擬合線性模型的基礎(chǔ)知識蚯姆。

## linear regression model is y = b0 + b*x + e
b0<-5.3
b<-0.75
x<-rnorm(n=100,sd=1)
y<-b0+b*x+rnorm(n=100,sd=0.2)
plot(x,y)
fit<-lm(y~x)
abline(fit)

> fit
Call: lm(formula = y ~ x)
Coefficients:
(Intercept)            x  
     5.3123       0.7412  
> summary(fit)

## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.52591 -0.12877 -0.01271  0.12295  0.57436 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.33205    0.02129  250.39   <2e-16 ***
## x            0.73766    0.02103   35.08   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2129 on 98 degrees of freedom
## Multiple R-squared:  0.9263, Adjusted R-squared:  0.9255 
## F-statistic:  1231 on 1 and 98 DF,  p-value: < 2.2e-16

anova(fit)
## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## x          1 55.804  55.804  1230.9 < 2.2e-16 ***
## Residuals 98  4.443   0.045                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## coefficients of the fitted model
fit$coefficients
## (Intercept)           x 
##   5.3320548   0.7376578

## p-value of the fitted model
anova(fit)[["Pr(>F)"]][1]
## [1] 2.758908e-57

foo<-function(){
    x<-rnorm(n=100)
    y<-rnorm(n=100)
    fit<-lm(y~x)
    setNames(c(fit$coefficients[2],anova(fit)[["Pr(>F)"]][1]),c("beta","p"))
}
X<-t(replicate(1000,foo()))
hist(X[,1],xlab="fitted beta",main="fitted beta")
hist(X[,2],xlab="p-value",main="p-value") 

## expected value is the nominal type I error rate, 0.05
mean(X[,2]<=0.05)
## [1] 0.056

這個模擬非常簡單地表明，當(dāng)滿足標(biāo)準(zhǔn)統(tǒng)計假設(shè)時洒敏，我們的參數(shù)估計是無偏的龄恋，并且第一類誤差在標(biāo)稱水平或附近。

系統(tǒng)發(fā)育數(shù)據(jù)的困難在于凶伙，對于y的觀測不再是獨(dú)立的和相同的分布:

## load phytools
library(phytools)
## simulate a coalescent shaped tree
tree<-rcoal(n=100)
plotTree(tree,ftype="off")
## simulate uncorrelated Brownian evolution
x<-fastBM(tree)
y<-fastBM(tree)
par(mar=c(5.1,4.1,2.1,2.1))
plot(x,y)
fit<-lm(y~x)
abline(fit)

## this is a projection of the tree into morphospace
phylomorphospace(tree,cbind(x,y),label="off",node.size=c(0.5,0.7))
abline(fit)

從這個例子中我們可以看出郭毕，對于系統(tǒng)發(fā)育來說披粟，誘發(fā)I類型的錯誤并不難淹仑。這是因為緊密相關(guān)的分類群具有高度相似的表型卿操。換句話說排霉，它們并不是關(guān)于樹上x和y的演化過程的獨(dú)立數(shù)據(jù)點(diǎn).

PGLS系統(tǒng)發(fā)育廣義最小二乘法

library(ape)
library(geiger)
library(nlme)
library(phytools)

anoleData <- read.csv("anolisDataAppended.csv", row.names = 1)
anoleTree <- read.tree("anolis.phy")
plot(anoleTree)
name.check(anoleTree, anoleData)

# Extract columns
host <- anoleData[, "hostility"]
awe <- anoleData[, "awesomeness"]

# Give them names
names(host) <- names(awe) <- rownames(anoleData)

# Calculate PICs
hPic <- pic(host, anoleTree)
aPic <- pic(awe, anoleTree)

# Make a model
picModel <- lm(hPic ~ aPic - 1)

# Yes, significant
summary(picModel)

##
## Call:
## lm(formula = hPic ~ aPic - 1)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -2.105 -0.419  0.010  0.314  4.999
##
## Coefficients:
##      Estimate Std. Error t value Pr(>|t|)
## aPic  -0.9776     0.0452   -21.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.897 on 98 degrees of freedom
## Multiple R-squared:  0.827,  Adjusted R-squared:  0.825
## F-statistic:  469 on 1 and 98 DF,  p-value: <2e-16

# plot results
plot(hPic ~ aPic)
abline(a = 0, b = coef(picModel))

# 我們可以直接使用PGLS
pglsModel <- gls(hostility ~ awesomeness, correlation = corBrownian(phy = anoleTree),
    data = anoleData, method = "ML")
summary(pglsModel)

## Generalized least squares fit by maximum likelihood
##   Model: hostility ~ awesomeness
##   Data: anoleData
##   AIC   BIC logLik
##   191 198.8 -92.49
##
## Correlation Structure: corBrownian
##  Formula: ~1
##  Parameter estimate(s):
## numeric(0)
##
## Coefficients:
##               Value Std.Error t-value p-value
## (Intercept)  0.1506   0.26263   0.573  0.5678
## awesomeness -0.9776   0.04516 -21.648  0.0000
##
##  Correlation:
##             (Intr)
## awesomeness -0.042
##
## Standardized residuals:
##      Min       Q1      Med       Q3      Max
## -0.76020 -0.39057 -0.04942  0.19597  1.07374
##
## Residual standard error: 0.8877
## Degrees of freedom: 100 total; 98 residual

coef(pglsModel)
## (Intercept) awesomeness
##      0.1506     -0.9776
plot(host ~ awe)
abline(a = coef(pglsModel)[1], b = coef(pglsModel)[2])

PGLS還可以使用多個預(yù)測變量：

pglsModel2 <- gls(hostility ~ ecomorph, correlation = corBrownian(phy = anoleTree),
    data = anoleData, method = "ML")
anova(pglsModel2)
## Denom. DF: 93
##             numDF F-value p-value
## (Intercept)     1 0.01847  0.8922
## ecomorph        6 0.21838  0.9700
coef(pglsModel2)
## (Intercept)  ecomorphGB   ecomorphT  ecomorphTC  ecomorphTG  ecomorphTW
##      0.4844     -0.6316     -1.0585     -0.8558     -0.4086     -0.4039

pglsModel3 <- gls(hostility ~ ecomorph * awesomeness, correlation = corBrownian(phy = anoleTree),
    data = anoleData, method = "ML")
anova(pglsModel3)
## Denom. DF: 86
##                      numDF F-value p-value
## (Intercept)              1     0.1  0.7280
## ecomorph                 6     1.4  0.2090
## awesomeness              1   472.9  <.0001
## ecomorph:awesomeness     6     3.9  0.0017

##還可以固定不同模型
# Does not converge - and this is difficult to fix!
pglsModelLambda <- gls(hostility ~ awesomeness, correlation = corPagel(1, phy = anoleTree,
    fixed = FALSE), data = anoleData, method = "ML")
# this is a problem with scale. We can do a quick fix by making the branch
# lengths longer. This will not affect the analysis other than rescaling a
# nuisance parameter
tempTree <- anoleTree
tempTree$edge.length <- tempTree$edge.length * 100
pglsModelLambda <- gls(hostility ~ awesomeness, correlation = corPagel(1, phy = tempTree,
    fixed = FALSE), data = anoleData, method = "ML")
summary(pglsModelLambda)
## Generalized least squares fit by maximum likelihood
##   Model: hostility ~ awesomeness
##   Data: anoleData
##     AIC   BIC logLik
##   72.56 82.98 -32.28
##
## Correlation Structure: corPagel
##  Formula: ~1
##  Parameter estimate(s):
##  lambda
## -0.1586
##
## Coefficients:
##               Value Std.Error t-value p-value
## (Intercept)  0.0612   0.01582   3.872   2e-04
## awesomeness -0.8777   0.03104 -28.273   0e+00
##
##  Correlation:
##             (Intr)
## awesomeness -1
##
## Standardized residuals:
##       Min        Q1       Med        Q3       Max
## -1.789463 -0.714775  0.003095  0.785093  2.232151
##
## Residual standard error: 0.371
## Degrees of freedom: 100 total; 98 residual

pglsModelOU <- gls(hostility ~ awesomeness, correlation = corMartins(1, phy = tempTree),
    data = anoleData, method = "ML")
summary(pglsModelOU)
## Generalized least squares fit by maximum likelihood
##   Model: hostility ~ awesomeness
##   Data: anoleData
##     AIC   BIC logLik
##   96.63 107.1 -44.32
##
## Correlation Structure: corMartins
##  Formula: ~1
##  Parameter estimate(s):
## alpha
## 4.442
##
## Coefficients:
##               Value Std.Error t-value p-value
## (Intercept)  0.1084   0.03953   2.743  0.0072
## awesomeness -0.8812   0.03658 -24.091  0.0000
##
##  Correlation:
##             (Intr)
## awesomeness -0.269
##
## Standardized residuals:
##     Min      Q1     Med      Q3     Max
## -1.8665 -0.8133 -0.1104  0.6475  2.0919
##
## Residual standard error: 0.3769
## Degrees of freedom: 100 total; 98 residual

系統(tǒng)發(fā)生信號

Blomberg’s K statistic
- K >1系統(tǒng)發(fā)生信號很強(qiáng)
- K <1系統(tǒng)發(fā)生信號很弱
Pagel’s lambda
- lambda =1 進(jìn)化方向按照布朗模型
- lambda =0 進(jìn)化方向沒有相關(guān)性

library(geiger)
library(picante)
library(phytools)

anoleData <- read.csv("anolisDataAppended.csv", row.names = 1)
anoleTree <- read.tree("anolis.phy")
plot(anoleTree)
name.check(anoleTree, anoleData)

用體型大小來做兩個主要的系統(tǒng)發(fā)育信號測試。第一個檢驗是Blomberg‘s K煮落，它將PIC的方差與布朗運(yùn)動模型下的方差進(jìn)行比較敞峭。K=1表示親屬在BM下的相似程度，K<1表示在BM下存在較少的“系統(tǒng)發(fā)育信號”蝉仇，而K>1表示存在更多的“系統(tǒng)發(fā)育信號”旋讹。從系統(tǒng)信號返回的顯著p值告訴您存在顯著的系統(tǒng)發(fā)育信號。也就是說轿衔，近親比隨機(jī)配對的物種更相似沉迹。
anoleSize <- anoleData[, 1]
names(anoleSize) <- rownames(anoleData)
phylosignal(anoleSize, anoleTree)
##       K PIC.variance.obs PIC.variance.rnd.mean PIC.variance.P
## 1 1.554           0.1389                 0.773          0.001
##   PIC.variance.Z
## 1         -3.913
phylosig(anoleTree, anoleSize, method = "K", test = T)
## $K
## [1] 1.554
##
## $P
## [1] 0.001

另一種檢測系統(tǒng)發(fā)育信號的方法是Pagel的lambda。Lambda是一種相對于內(nèi)部分支延伸尖端分支的樹變換害驹，使樹越來越像一個完整的多段切除法鞭呕。如果我們估計的λ=0，則推斷這些性狀沒有系統(tǒng)發(fā)育信號宛官。λ=1對應(yīng)于布朗運(yùn)動模型葫松，0<lambda<1介于兩者之間。

# First let's look at what lambda does
anoleTreeLambda0 <- rescale(anoleTree, model = "lambda", 0)
anoleTreeLambda5 <- rescale(anoleTree, model = "lambda", 0.5)

par(mfcol = c(1, 3))
plot(anoleTree)
plot(anoleTreeLambda5)
plot(anoleTreeLambda0)


phylosig(anoleTree, anoleSize, method = "lambda", test = T)
## $lambda
## [1] 1.017
##
## $logL
## [1] -3.81
##
## $logL0
## [1] -60.02
##
## $P
## [1] 2.893e-26

lambdaModel <- fitContinuous(anoleTree, anoleSize, model = "lambda")
brownianModel <- fitContinuous(anoleTree, anoleSize)
nosigModel <- fitContinuous(anoleTreeLambda0, anoleSize)

lambdaModel$opt$aicc
## [1] 15.65
brownianModel$opt$aicc
## [1] 13.52
nosigModel$opt$aicc
## [1] 124.2
# Conclusion: Brownian model is best, no signal model is terrible

brownianModel <- fitContinuous(anoleTree, anoleSize)
OUModel <- fitContinuous(anoleTree, anoleSize, model = "OU")
EBModel <- fitContinuous(anoleTree, anoleSize, model = "EB")

# inspect results
brownianModel
## GEIGER-fitted comparative model of continuous data
##  fitted 'BM' model parameters:
##  sigsq = 0.136160
##  z0 = 4.065918
##
##  model summary:
##  log-likelihood = -4.700404
##  AIC = 13.400807
##  AICc = 13.524519
##  free parameters = 2
##
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 1.00
##
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates
OUModel
## GEIGER-fitted comparative model of continuous data
##  fitted 'OU' model parameters:
##  alpha = 0.000000
##  sigsq = 0.136160
##  z0 = 4.065918
##
##  model summary:
##  log-likelihood = -4.700404
##  AIC = 15.400807
##  AICc = 15.650807
##  free parameters = 3
##
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 0.76
##
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates
EBModel
## GEIGER-fitted comparative model of continuous data
##  fitted 'EB' model parameters:
##  a = -0.736271
##  sigsq = 0.233528
##  z0 = 4.066519
##
##  model summary:
##  log-likelihood = -4.285970
##  AIC = 14.571939
##  AICc = 14.821939
##  free parameters = 3
##
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 0.41
##
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates

# calculate AIC weights
bmAICC <- brownianModel$opt$aicc
ouAICC <- OUModel$opt$aicc
ebAICC <- EBModel$opt$aicc

aicc <- c(bmAICC, ouAICC, ebAICC)
aiccD <- aicc - min(aicc)
aw <- exp(-0.5 * aiccD)
aiccW <- aw/sum(aw)
aiccW
## [1] 0.5353 0.1849 0.2798
It is important to realize that measurement error can bias your inferences with fitting these models towards OU. Fortunately, we can easily account for that in fitContinuous.


# We measured 20 anoles per species, and the standard deviation within each
# species was, on average, 0.05
seSize <- 0.05/sqrt(20)

# redo with measurement error
brownianModel <- fitContinuous(anoleTree, anoleSize, SE = seSize)
OUModel <- fitContinuous(anoleTree, anoleSize, model = "OU", SE = seSize)
EBModel <- fitContinuous(anoleTree, anoleSize, model = "EB", SE = seSize)


# calculate AIC weights
bmAICC <- brownianModel$opt$aicc
ouAICC <- OUModel$opt$aicc
ebAICC <- EBModel$opt$aicc

aicc <- c(bmAICC, ouAICC, ebAICC)
aiccD <- aicc - min(aicc)
aw <- exp(-0.5 * aiccD)
aiccW <- aw/sum(aw)
aiccW
## [1] 0.5346 0.1846 0.2808

重構(gòu)祖先狀態(tài)

在祖先特征重建中底洗，目標(biāo)是估算表型性狀的祖先狀態(tài)-通常在內(nèi)部節(jié)點(diǎn)
理想情況下腋么，還應(yīng)該獲得與祖先狀態(tài)估計有關(guān)的不確定性的測量結(jié)果

祖先特征估計的第一步是確定我們對分析感興趣的數(shù)據(jù)類型。例如亥揖，我們可以連續(xù)測量數(shù)據(jù)或離散型數(shù)據(jù)珊擂。
不僅需要考慮數(shù)據(jù)類型，還要考慮適合數(shù)據(jù)的模型费变。不管是連續(xù)測量數(shù)據(jù)或離散數(shù)據(jù)摧扇，大多數(shù)都是遵循布朗運(yùn)動模型（隨著時間上下波動）；或者是通過瞬間跳躍在狀態(tài)之間（Brownian motion is a continuous-time stochastic process）挚歧。

推斷祖先狀態(tài)一直是系統(tǒng)發(fā)育比較生物學(xué)中一個重要的目標(biāo)扛稽，本教程主要介紹如何在BM以及OU模型下重構(gòu)祖先性狀。第一個例子就是蜥蜴祖先體型大小的狀態(tài)重建昼激，這是一個連續(xù)型變量庇绽。

## load libraries
library(phytools)
## read tree from file
tree<-read.tree("anole.tre") ## or
tree<-read.tree("http://www.phytools.org/Ilhabela2015/data/anole.tre")
## plot tree
plotTree(tree,type="fan",ftype="i")
## setEnv=TRUE for this type is experimental. please be patient with bugs

## read data
svl<-read.csv("svl.csv",row.names=1) ## or
svl<-read.csv("http://www.phytools.org/Ilhabela2015/data/svl.csv",
    row.names=1)
## change this into a vector
svl<-as.matrix(svl)[,1]
svl

##            ahli         alayoni         alfaroi        aliniger
##        4.039125        3.815705        3.526655        4.036557
##        allisoni         allogus   altitudinalis         alumina
##        4.375390        4.040138        3.842994        3.588941
##       alutaceus     angusticeps     argenteolus     argillaceus
##        3.554891        3.788595        3.971307        3.757869
##         armouri   bahorucoensis        baleatus        baracoae
##        4.121684        3.827445        5.053056        5.042780
##       barahonae        barbatus        barbouri        bartschi
##        5.076958        5.003946        3.663932        4.280547
##         bremeri        breslini    brevirostris        caudalis
##        4.113371        4.051111        3.874155        3.911743
##       centralis  chamaeleonides    chlorocyanus     christophei
##        3.697941        5.042349        4.275448        3.884652
##       clivicola     coelestinus        confusus           cooki
##        3.758726        4.297965        3.938442        4.091535
##    cristatellus    cupeyalensis         cuvieri    cyanopleurus
##        4.189820        3.462014        4.875012        3.630161
##         cybotes     darlingtoni       distichus dolichocephalus
##        4.210982        4.302036        3.928796        3.908550
##       equestris      etheridgei   eugenegrahami       evermanni
##        5.113994        3.657991        4.128504        4.165605
##         fowleri         garmani         grahami           guafe
##        4.288780        4.769473        4.154274        3.877457
##       guamuhaya         guazuma       gundlachi       haetianus
##        5.036953        3.763884        4.188105        4.316542
##      hendersoni      homolechis           imias    inexpectatus
##        3.859835        4.032806        4.099687        3.537439
##       insolitus        isolepis           jubar           krugi
##        3.800471        3.657088        3.952605        3.886500
##      lineatopus   longitibialis        loysiana          lucius
##        4.128612        4.242103        3.701240        4.198915
##    luteogularis      macilentus        marcanoi          marron
##        5.101085        3.715765        4.079485        3.831810
##         mestrei       monticola          noblei        occultus
##        3.987147        3.770613        5.083473        3.663049
##         olssoni        opalinus      ophiolepis        oporinus
##        3.793899        3.838376        3.637962        3.845670
##        paternus        placidus       poncensis        porcatus
##        3.802961        3.773967        3.820378        4.258991
##          porcus      pulchellus         pumilis quadriocellifer
##        5.038034        3.799022        3.466860        3.901619
##      reconditus        ricordii     rubribarbus          sagrei
##        4.482607        5.013963        4.078469        4.067162
##    semilineatus        sheplani         shrevei      singularis
##        3.696631        3.682924        3.983003        4.057997
##      smallwoodi         strahmi       stratulus     valencienni
##        5.035096        4.274271        3.869881        4.321524
##       vanidicus    vermiculatus        websteri       whitemani
##        3.626206        4.802849        3.916546        4.097479

現(xiàn)在我們可以估計祖先的狀態(tài)，還將計算方差&每個節(jié)點(diǎn)的95%置信區(qū)間：

fit<-fastAnc(tree,svl,vars=TRUE,CI=TRUE)
fit

## $ace
##      101      102      103      104      105      106      107      108
## 4.065918 4.045601 4.047993 4.066923 4.036743 4.049514 4.054528 4.045218
##      109      110      111      112      113      114      115      116
## 3.979493 3.952440 3.926814 3.964414 3.995835 3.948034 3.955203 3.977842
##      117      118      119      120      121      122      123      124
## 4.213995 4.240867 4.248450 4.257574 4.239061 4.004120 4.007024 4.015249
##      125      126      127      128      129      130      131      132
## 4.006720 3.992617 3.925848 3.995759 4.049619 3.930793 3.908343 3.901518
##      133      134      135      136      137      138      139      140
## 3.885086 4.040356 4.060970 3.987789 3.951878 3.814014 3.707733 3.926147
##      141      142      143      144      145      146      147      148
## 3.988745 4.167983 4.157021 4.166146 4.146362 4.146132 4.159052 4.113267
##      149      150      151      152      153      154      155      156
## 4.133069 4.222223 4.461376 4.488496 4.437250 4.512071 4.953146 5.033253
##      157      158      159      160      161      162      163      164
## 4.962377 5.009025 5.018389 3.974900 3.880482 3.859268 3.859265 3.871895
##      165      166      167      168      169      170      171      172
## 3.899914 3.807943 3.826619 4.151598 3.760446 3.654324 3.676713 3.825559
##      173      174      175      176      177      178      179      180
## 3.747726 3.813974 3.803077 3.702349 3.566476 3.678236 3.675620 3.662452
##      181      182      183      184      185      186      187      188
## 3.563181 3.645426 4.017460 4.113689 4.229393 4.381609 4.243153 5.041281
##      189      190      191      192      193      194      195      196
## 5.051563 5.051719 5.057591 4.183653 4.151505 4.113279 3.969795 3.905122
##      197      198      199
## 4.191810 4.161419 4.092141
##
## $var
##         101         102         103         104         105         106
## 0.011775189 0.008409641 0.009452076 0.011870028 0.014459672 0.018112713
##         107         108         109         110         111         112
## 0.011685860 0.007268889 0.014487712 0.012814461 0.010852276 0.010048935
##         113         114         115         116         117         118
## 0.006240382 0.009816255 0.008377720 0.006250414 0.015406961 0.015306088
##         119         120         121         122         123         124
## 0.008914737 0.008485896 0.016998370 0.014850114 0.012880197 0.011964472
##         125         126         127         128         129         130
## 0.010525923 0.010528040 0.013248168 0.012789504 0.013568841 0.013971730
##         131         132         133         134         135         136
## 0.010048937 0.009535820 0.008387455 0.008359158 0.010126605 0.012608132
##         137         138         139         140         141         142
## 0.013951936 0.018502968 0.014105896 0.018182598 0.017745869 0.012861889
##         143         144         145         146         147         148
## 0.014828558 0.011891297 0.008732027 0.008604618 0.010009986 0.007660375
##         149         150         151         152         153         154
## 0.006817879 0.012378867 0.015540878 0.014154484 0.012761730 0.011952420
##         155         156         157         158         159         160
## 0.005064274 0.002463170 0.006924118 0.003953467 0.003509809 0.011102798
##         161         162         163         164         165         166
## 0.011378573 0.011045003 0.011682628 0.011560333 0.012242267 0.011462597
##         167         168         169         170         171         172
## 0.008888509 0.014206315 0.014405121 0.006290501 0.005434707 0.014714164
##         173         174         175         176         177         178
## 0.010884130 0.014842361 0.011333759 0.012932194 0.007449222 0.011094079
##         179         180         181         182         183         184
## 0.011015666 0.012850744 0.005326252 0.012876076 0.021615405 0.014608411
##         185         186         187         188         189         190
## 0.013155037 0.021043877 0.012658226 0.004540141 0.003540634 0.002698421
##         191         192         193         194         195         196
## 0.001277775 0.011837709 0.012449062 0.013536624 0.015848988 0.008788712
##         197         198         199
## 0.015789566 0.013529210 0.009534335
##
## $CI95
##         [,1]     [,2]
## 101 3.853231 4.278604
## 102 3.865861 4.225341
## 103 3.857439 4.238548
## 104 3.853382 4.280465
## 105 3.801056 4.272430
## 106 3.785731 4.313298
## 107 3.842649 4.266406
## 108 3.878112 4.212323
## 109 3.743577 4.215408
## 110 3.730566 4.174314
## 111 3.722632 4.130995
## 112 3.767935 4.160893
## 113 3.841003 4.150668
## 114 3.753844 4.142225
## 115 3.775804 4.134601
## 116 3.822885 4.132799
## 117 3.970710 4.457279
## 118 3.998380 4.483354
## 119 4.063391 4.433509
## 120 4.077021 4.438127
## 121 3.983520 4.494601
## 122 3.765272 4.242968
## 123 3.784582 4.229467
## 124 3.800860 4.229639
## 125 3.805632 4.207808
## 126 3.791509 4.193725
## 127 3.700250 4.151445
## 128 3.774102 4.217417
## 129 3.821307 4.277930
## 130 3.699117 4.162469
## 131 3.711864 4.104822
## 132 3.710121 4.092915
## 133 3.705584 4.064589
## 134 3.861156 4.219555
## 135 3.863733 4.258207
## 136 3.767708 4.207869
## 137 3.720366 4.183390
## 138 3.547403 4.080624
## 139 3.474947 3.940519
## 140 3.661855 4.190439
## 141 3.727646 4.249843
## 142 3.945699 4.390267
## 143 3.918347 4.395695
## 144 3.952414 4.379879
## 145 3.963210 4.329515
## 146 3.964320 4.327943
## 147 3.962955 4.355150
## 148 3.941721 4.284813
## 149 3.971230 4.294907
## 150 4.004152 4.440293
## 151 4.217036 4.705715
## 152 4.255309 4.721682
## 153 4.215833 4.658667
## 154 4.297789 4.726352
## 155 4.813665 5.092627
## 156 4.935977 5.130528
## 157 4.799283 5.125471
## 158 4.885787 5.132263
## 159 4.902272 5.134507
## 160 3.768375 4.181425
## 161 3.671408 4.089556
## 162 3.653282 4.065255
## 163 3.647416 4.071113
## 164 3.661158 4.082632
## 165 3.683050 4.116777
## 166 3.598098 4.017787
## 167 3.641833 4.011406
## 168 3.917985 4.385211
## 169 3.525204 3.995688
## 170 3.498871 3.809777
## 171 3.532221 3.821205
## 172 3.587807 4.063310
## 173 3.543245 3.952207
## 174 3.575189 4.052760
## 175 3.594415 4.011738
## 176 3.479458 3.925240
## 177 3.397311 3.735642
## 178 3.471792 3.884680
## 179 3.469907 3.881332
## 180 3.440264 3.884639
## 181 3.420138 3.706225
## 182 3.423020 3.867833
## 183 3.729297 4.305622
## 184 3.876793 4.350585
## 185 4.004590 4.454196
## 186 4.097282 4.665937
## 187 4.022636 4.463670
## 188 4.909215 5.173347
## 189 4.934937 5.168189
## 190 4.949904 5.153533
## 191 4.987529 5.127654
## 192 3.970403 4.396903
## 193 3.932817 4.370192
## 194 3.885239 4.341319
## 195 3.723046 4.216545
## 196 3.721376 4.088869
## 197 3.945523 4.438096
## 198 3.933441 4.389397
## 199 3.900759 4.283523

## as we discussed in class, 95% CIs can be broad
fit$CI[1,]
## [1] 3.853231 4.278604

range(svl)
## [1] 3.462014 5.113994

## projection of the reconstruction onto the edges of the tree
obj<-contMap(tree,svl,plot=FALSE)
plot(obj,type="fan",legend=0.7*max(nodeHeights(tree)))

## projection of the tree into phenotype space
phenogram(tree,svl,fsize=0.6,spread.costs=c(1,0))
## Optimizing the positions of the tip labels...

然后橙困，我們可以先根據(jù)布朗模型探討連續(xù)特征的祖先狀態(tài)重構(gòu)的一些性質(zhì)：

## simulate a tree & some data
tree<-pbtree(n=26,scale=1,tip.label=LETTERS[26:1])
## simulate with ancestral states
x<-fastBM(tree,internal=TRUE)
## ancestral states
a<-x[as.character(1:tree$Nnode+Ntip(tree))]
## tip data
x<-x[tree$tip.label]

fit<-fastAnc(tree,x,CI=TRUE)
fit

## $ace
##          27          28          29          30          31          32
## -0.01383743 -0.43466511 -0.78496842 -0.92652372 -0.97992735 -1.06606765
##          33          34          35          36          37          38
## -1.19242145 -0.77435425 -1.33427168 -0.77148395  0.37519774  0.35486511
##          39          40          41          42          43          44
##  0.78011325  0.63488391  1.19427496 -0.20165383 -0.17792873  1.07419593
##          45          46          47          48          49          50
##  1.45924828  1.55175294  1.75289132  2.16445039  1.81192553  0.68319366
##          51
##  0.81241807
##
## $CI95
##          [,1]       [,2]
## 27 -0.8879801  0.8603052
## 28 -1.2581222  0.3887920
## 29 -1.4652988 -0.1046380
## 30 -1.5538351 -0.2992123
## 31 -1.4449545 -0.5149002
## 32 -1.4966267 -0.6355086
## 33 -1.4920621 -0.8927808
## 34 -1.1230122 -0.4256963
## 35 -1.4863921 -1.1821513
## 36 -1.2746445 -0.2683234
## 37 -0.3797317  1.1301272
## 38 -0.4031915  1.1129217
## 39  0.1541249  1.4061016
## 40  0.5162435  0.7535243
## 41  1.0781703  1.3103796
## 42 -0.7364614  0.3331537
## 43 -0.6011442  0.2452867
## 44  0.4157610  1.7326308
## 45  0.8942088  2.0242878
## 46  0.9951790  2.1083269
## 47  1.2382347  2.2675479
## 48  1.9117253  2.4171755
## 49  1.3260228  2.2978283
## 50  0.3463241  1.0200632
## 51  0.5937687  1.0310674

我們可以很容易地將這些估計與用于模擬數(shù)據(jù)的(正常unknown)生成值進(jìn)行比較：

plot(a,fit$ace,xlab="true states",ylab="estimated states")
lines(range(c(x,a)),range(c(x,a)),lty="dashed",col="red") ## 1:1 line

##
對祖先狀態(tài)估計最常見的批評之一是瞧掺，圍繞祖先狀態(tài)的不確定性可能很大；而且凡傅，這種不確定性常常被忽略辟狈。讓我們通過在祖先值上獲得95%的CI來解決這一問題，然后讓我們表明，95%的CI在大約95%的時間內(nèi)包含生成值：
##
## first, let's go back to our previous dataset
print(fit)
mean(((a>=fit$CI95[,1]) + (a<=fit$CI95[,2]))==2)
## [1] 0.92

## custom function that conducts a simulation, estimates ancestral
## states, & returns the fraction on 95% CI
foo<-function(){
    tree<-pbtree(n=100)
    x<-fastBM(tree,internal=TRUE)
    fit<-fastAnc(tree,x[1:length(tree$tip.label)],CI=TRUE)
    mean(((x[1:tree$Nnode+length(tree$tip.label)]>=fit$CI95[,1]) +
        (x[1:tree$Nnode+length(tree$tip.label)]<=fit$CI95[,2]))==2)
}
## conduct 100 simulations
pp<-replicate(100,foo())
mean(pp)

## [1] 0.9448485

這表明哼转，雖然CI可以很大明未，但當(dāng)模型正確時，它們將包括生成值(1-α)%的時間

也可以使用布朗運(yùn)動以外的其他角色進(jìn)化模型來估計祖先的狀態(tài)壹蔓。例如趟妥，我們可以根據(jù)我們已經(jīng)學(xué)到的OU模型來估計祖先的狀態(tài)。需要注意的一點(diǎn)是：當(dāng)選擇參數(shù)(α)非常大時佣蓉，祖先狀態(tài)都會趨向于相同的值(擬合的最優(yōu)披摄，θ):

## simulate tree & data under the OU model
alpha<-2
tree<-pbtree(n=100,scale=1)
x<-fastBM(tree,internal=TRUE,model="OU",alpha=alpha,sig2=0.2)
## Warning: alpha but not theta specified in OU model, setting theta to a.
a<-x[as.character(1:tree$Nnode+Ntip(tree))]
x<-x[tree$tip.label]
phenogram(tree,c(x,a),ftype="off",spread.labels=FALSE)
title("Evolution under OU")

## fit the model (this may be slow)
fit<-anc.ML(tree,x,model="OU")
fit

## $sig2
## [1] 0.05573428
##
## $alpha
## [1] 1e-08
##
## $ace
##           101           102           103           104           105
##  9.963462e-03  2.244757e-02  2.240573e-02 -5.931035e-02 -4.329498e-02
##           106           107           108           109           110
## -3.480964e-02 -4.081294e-02 -1.143122e-01 -2.263273e-02  4.155330e-02
##           111           112           113           114           115
##  4.264017e-02 -1.810736e-02 -9.407912e-02 -1.363360e-01 -6.611306e-02
##           116           117           118           119           120
## -1.392603e-02 -3.488464e-02 -9.974915e-02 -1.471439e-01 -1.667432e-01
##           121           122           123           124           125
## -1.680456e-01  2.899822e-02  4.833528e-02  3.506143e-02  1.868645e-02
##           126           127           128           129           130
##  4.748435e-02  3.448893e-02  3.250233e-02 -1.538064e-03  5.460770e-02
##           131           132           133           134           135
##  1.287799e-01  1.331156e-01  1.469378e-01  8.486024e-03  6.439102e-03
##           136           137           138           139           140
## -5.997895e-02 -2.624823e-02  1.156061e-01  1.450449e-01  1.643474e-01
##           141           142           143           144           145
##  8.871516e-02 -1.500705e-01 -2.088446e-01 -1.485247e-01 -9.546454e-02
##           146           147           148           149           150
## -1.648921e-01 -1.003483e-01 -1.122847e-01 -9.953868e-02  2.319007e-01
##           151           152           153           154           155
##  4.114431e-02  3.261996e-02  5.558858e-02 -1.093121e-03 -2.343046e-03
##           156           157           158           159           160
## -9.063816e-05 -3.395186e-02 -7.084372e-02 -2.180441e-03 -9.891307e-02
##           161           162           163           164           165
## -3.951237e-02 -3.874053e-02 -2.391682e-02  9.737577e-02 -3.801571e-02
##           166           167           168           169           170
## -3.578902e-02 -2.162303e-02 -5.260689e-03 -1.308687e-02 -2.051804e-02
##           171           172           173           174           175
##  1.033658e-02  2.185436e-03  4.877736e-02  9.016846e-02 -6.244681e-03
##           176           177           178           179           180
##  2.600085e-03 -2.081362e-02 -6.850276e-02 -1.012200e-01 -9.815220e-02
##           181           182           183           184           185
## -9.432228e-02 -9.482774e-02 -1.266490e-01 -3.097925e-01 -1.008618e-01
##           186           187           188           189           190
## -1.202127e-02 -1.335903e-02 -3.088787e-04 -1.270292e-01  7.848848e-03
##           191           192           193           194           195
##  2.054886e-01  2.441548e-01 -4.855077e-02 -1.614209e-02 -3.053233e-02
##           196           197           198           199
## -5.844232e-02  2.570525e-02  2.987429e-02 -1.579888e-01
##
## $logLik
## [1] 263.5128
##
## $counts
## function gradient
##        8        8
##
## $convergence

## [1] 0
##
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
##
## $model
## [1] "OU"
##
## attr(,"class")
## [1] "anc.ML"

plot(a,fit$ace,xlab="true values",ylab="estimated under OU")
lines(range(x),range(x),lty="dashed",col="red")
title("ancestral states estimated under OU")

祖先狀態(tài)估計當(dāng)某些節(jié)點(diǎn)已知時。我們可以在祖先狀態(tài)的MLE過程中從理論上確定任何內(nèi)部節(jié)點(diǎn)的狀態(tài)勇凭。在實(shí)踐中疚膊，我們可以通過將長度為零的終端邊緣附加到要修復(fù)的內(nèi)部節(jié)點(diǎn)，然后將其作為分析中的另一個端值來處理虾标。另一種可能也考慮到祖先狀態(tài)不確定的可能性寓盗，即在內(nèi)部節(jié)點(diǎn)上使用一個或多個狀態(tài)上的信息先驗概率密度，然后使用貝葉斯MCMC來估計祖先狀態(tài)璧函。讓我們嘗試一下傀蚌，用一個非常糟糕的例子來估計祖先的特征，來自同一時期的尖端數(shù)據(jù)：有趨勢的布朗進(jìn)化蘸吓。請注意喳张，雖然理論上我們可以這樣做，但我們并不適合這里的趨勢模型美澳。

tree<-pbtree(n=100,scale=1)
## simulate data with a trend
x<-fastBM(tree,internal=TRUE,mu=3)
phenogram(tree,x,ftype="off",spread.labels=FALSE)

a<-x[as.character(1:tree$Nnode+Ntip(tree))]
x<-x[tree$tip.label]
## let's see how bad we do if we ignore the trend
plot(a,fastAnc(tree,x),xlab="true values",
    ylab="estimated states under BM")
lines(range(c(x,a)),range(c(x,a)),lty="dashed",col="red")
title("estimated without prior information")


## incorporate prior knowledge
pm<-setNames(c(1000,rep(0,tree$Nnode)),
    c("sig2",1:tree$Nnode+length(tree$tip.label)))
## the root & two randomly chosen nodes
nn<-as.character(c(length(tree$tip.label)+1,
    sample(2:tree$Nnode+length(tree$tip.label),2)))
pm[nn]<-a[as.character(nn)]
## prior variance
pv<-setNames(c(1000^2,rep(1000,length(pm)-1)),names(pm))
pv[as.character(nn)]<-1e-100
## run MCMC
mcmc<-anc.Bayes(tree,x,ngen=100000,
    control=list(pr.mean=pm,pr.var=pv,
    a=pm[as.character(length(tree$tip.label)+1)],
    y=pm[as.character(2:tree$Nnode+length(tree$tip.label))]))

anc.est<-colMeans(mcmc[201:1001,
    as.character(1:tree$Nnode+length(tree$tip.label))])
plot(a[names(anc.est)],anc.est,xlab="true values",
    ylab="estimated states using informative prior")
lines(range(c(x,a)),range(c(x,a)),lty="dashed",col="red")
title("estimated using informative prior")

接下來，重點(diǎn)介紹如何使用通常稱為Mk模型的連續(xù)時間馬爾可夫鏈模型來估計離散值性狀的祖先特征狀態(tài)摸航。讓我們從阿諾利樹中提取數(shù)據(jù)制跟，以便在這些分析中使用：

require(phytools)
data(anoletree)
## this is just to pull out the tip states from the
## data object - normally we would read this from file
x<-getStates(anoletree,"tips")
tree<-anoletree
rm(anoletree)
tree

##
## Phylogenetic tree with 82 tips and 81 internal nodes.
##
## Tip labels:
##  Anolis_ahli, Anolis_allogus, Anolis_rubribarbus, Anolis_imias, Anolis_sagrei, Anolis_bremeri, ...
##
## Rooted; includes branch lengths.
plotTree(tree,type="fan",fsize=0.8,ftype="i")

## estimate ancestral states under a ER model
fitER<-ace(x,tree,model="ER",type="discrete")
fitER

##
##     Ancestral Character Estimation
##
## Call: ace(x = x, phy = tree, type = "discrete", model = "ER")
##
##     Log-likelihood: -78.04604
##
## Rate index matrix:
##    CG GB TC TG Tr Tw
## CG  .  1  1  1  1  1
## GB  1  .  1  1  1  1
## TC  1  1  .  1  1  1
## TG  1  1  1  .  1  1
## Tr  1  1  1  1  .  1
## Tw  1  1  1  1  1  .
##
## Parameter estimates:
##  rate index estimate std-err
##           1   0.0231   0.004
##
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
##          CG          GB          TC          TG          Tr          Tw
## 0.018197565 0.202238628 0.042841575 0.428609607 0.004383532 0.303729093

round(fitER$lik.anc,3)

$lik.anc變量主要包含臨界祖先狀態(tài)，也被稱為“經(jīng)驗貝葉斯后驗概率”：

plotTree(tree,type="fan",fsize=0.8,ftype="i")
nodelabels(node=1:tree$Nnode+Ntip(tree),
    pie=fitER$lik.anc,piecol=cols,cex=0.5)
tiplabels(pie=to.matrix(x,sort(unique(x))),piecol=cols,cex=0.3)
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
    y=-max(nodeHeights(tree)),fsize=0.8)

以上概述的方法是使用MCMC從它們的后驗概率分布中對性狀歷史進(jìn)行采樣酱虎，這叫做隨機(jī)性狀映射雨膨。模型是相同的，但在這種情況下读串，我們得到一個明確的歷史樣本聊记，我們的離散型性狀在樹上的演變-而不是一個概率分布的性狀在節(jié)點(diǎn)。例如恢暖，考慮到上面模擬的數(shù)據(jù)排监，我們可以生成如下的隨機(jī)圖：

## simulate single stochastic character map using empirical Bayes method
mtree<-make.simmap(tree,x,model="ER")
plotSimmap(mtree,cols,type="fan",fsize=0.8,ftype="i")
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
    y=-max(nodeHeights(tree)),fsize=0.8)

單個隨機(jī)字符映射并不意味著完全孤立-我們需要從隨機(jī)映射的樣本中查看整個分布。這可能有點(diǎn)令人難以抗拒杰捂。例如舆床，以下代碼從我們的數(shù)據(jù)集中生成100個隨機(jī)字符映射，并在網(wǎng)格中繪制它們：

mtrees<-make.simmap(tree,x,model="ER",nsim=100)
par(mfrow=c(10,10))
null<-sapply(mtrees,plotSimmap,colors=cols,lwd=1,ftype="off")

用一種更有意義的方式概括一組隨機(jī)地圖是可能的。例如挨队，在我們的模型下谷暮，我們可以估計每種類型的變化數(shù)、在每種狀態(tài)下花費(fèi)的時間比例以及每個內(nèi)部節(jié)點(diǎn)在每個狀態(tài)下的后驗概率盛垦。例如：

pd<-describe.simmap(mtrees,plot=FALSE)
pd
plot(pd,fsize=0.6,ftype="i")

## now let's plot a random map, and overlay the posterior probabilities
plotSimmap(mtrees[[1]],cols,type="fan",fsize=0.8,ftype="i")
nodelabels(pie=pd$ace,piecol=cols,cex=0.5)
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
    y=-max(nodeHeights(tree)),fsize=0.8)

最后湿弦，由于我們在完全相同的模型下獲得了這些推論，讓我們將隨機(jī)映射的后驗概率與我們的邊緣祖先狀態(tài)進(jìn)行比較腾夯。在前一種情況下颊埃，我們的概率是通過對祖先的聯(lián)合(而不是邊緣)概率分布的抽樣來獲得的。

plot(fitER$lik.anc,pd$ace,xlab="marginal ancestral states",
    ylab="posterior probabilities from stochastic mapping")
lines(c(0,1),c(0,1),lty="dashed",col="red",lwd=2)

這告訴我們俯在，雖然聯(lián)合和邊緣祖先狀態(tài)重建是不一樣的竟秫，但聯(lián)合隨機(jī)抽樣的邊緣概率和邊緣祖先狀態(tài)是高度相關(guān)的-至少在這個案例研究中是這樣的。

實(shí)操演練

數(shù)據(jù)集中性狀1和性狀2之間是否存在相關(guān)性(使用標(biāo)準(zhǔn)相關(guān)性GLS)跷乐，然后再使用PGLS來確定數(shù)據(jù)集中的性狀1和2之間是否存在進(jìn)化相關(guān)性肥败，使用AIC來確定對PGLS使用OU還是Pagel的λ。

library(geiger)
gruntTree <- read.tree("grunts.phy")
gruntData <- read.csv("grunts.csv", row.names=1)
name.check(gruntTree, gruntData)

library(nlme)
plot(trait2~trait1, data=gruntData)
###run PGLS
pglsResult<-gls(trait1~trait2, cor=corBrownian(phy=gruntTree), data=gruntData, method="ML")
pglsResultOU<-gls(trait1~trait2, cor=corMartins(1, phy=gruntTree, fixed=F), data=gruntData, method="ML")
pglsResultPagel<-gls(trait1~trait2, cor=corPagel(1, phy=gruntTree, fixed=F), data=gruntData, method="ML")
anova(pglsResult, pglsResultOU, pglsResultPagel)

##                 Model df      AIC      BIC     logLik   Test  L.Ratio
## pglsResult          1  3 59.81018 65.54625 -26.905088
## pglsResultOU        2  4 30.42696 38.07506 -11.213482 1 vs 2 31.38321
## pglsResultPagel     3  4 22.83320 30.48129  -7.416599
##                 p-value
## pglsResult
## pglsResultOU     <.0001
## pglsResultPagel

# the main one is the lowest AIC, choose that
anova(pglsResultPagel)

## Denom. DF: 48
##             numDF  F-value p-value
## (Intercept)     1 0.004714  0.9455
## trait2          1 6.053490  0.0175

# yes there is a relationship.

測試連續(xù)型性狀進(jìn)化模型愕提，將BM馒稍、OU以及EB三種模型應(yīng)用于性狀1、性狀2以及性狀3的數(shù)據(jù)集中浅侨。

results<-matrix(nrow=3, ncol=3)

for(i in 3:5) {
  xx <- gruntData[,i]
  names(xx) <- rownames(gruntData)
  bm<-fitContinuous(gruntTree, xx)
  ou<-fitContinuous(gruntTree, xx, model="OU")
  eb<-fitContinuous(gruntTree, xx, model="EB")

  aicc<-c(bm$opt$aicc, ou$opt$aicc, eb$opt$aicc)
  aiccD <- aicc - min(aicc)
  aw <- exp(-0.5 * aiccD)
  aiccW <- aw/sum(aw)

  results[i-2,]<-aiccW
}

rownames(results)<-colnames(gruntData)[3:5]
colnames(results)<-c("bm", "ou", "eb")

# Table of AICc weights
results

##               bm        ou        eb
## trait1 0.4583683 0.3940379 0.1475937
## trait2 0.5292929 0.3002759 0.1704312
## trait3 0.5230986 0.3084647 0.1684367

# 所以EB模型較合適

重構(gòu)性狀1的祖先狀態(tài)纽谒。

library(phytools)
habitat<-gruntData[,2]
names(habitat)<-rownames(gruntData)
fitER<-ace(habitat,gruntTree,model="ER",type="discrete")
cols<-setNames(palette()[1:length(unique(habitat))],sort(unique(habitat)))
plotTree(gruntTree,type="fan",fsize=0.8,ftype="i")

nodelabels(node=1:gruntTree$Nnode+Ntip(gruntTree),
    pie=fitER$lik.anc,piecol=cols,cex=0.5)
tiplabels(pie=to.matrix(habitat,sort(unique(habitat))),piecol=cols,cex=0.3)
add.simmap.legend(colors=cols,prompt=FALSE,x=0.9*par()$usr[1],
    y=-max(nodeHeights(gruntTree)),fsize=0.8)

t1 <- gruntData[,"trait1"]
names(t1) <- rownames(gruntData)
fit <- fastAnc(gruntTree, t1, vars=T, CI=T)
obj<-contMap(gruntTree,t1,plot=FALSE)
plot(obj,type="fan",legend=0.7*max(nodeHeights(gruntTree)))

物種多樣化

使用RPANDA與BAMM軟件分析物種多樣性

BAMM（Bayesian Analysis of Macroevolutionary Mixtures）是一個利用rjMCMC模型對物種形成、滅絕和性狀進(jìn)化的復(fù)雜動態(tài)進(jìn)行建模的軟件如输。然而鼓黔，本教程將只集中于物種的形成和滅絕率的估計。運(yùn)行BAMM需要兩個主要輸入文件：控制文件和系統(tǒng)發(fā)生樹不见。

控制文件以及樹文件內(nèi)容等詳細(xì)信息可閱讀說明書澳化，下面是運(yùn)行分析結(jié)果：

install.packages("BAMMtools")
library(BAMMtools)

tree <- read.tree("whaletree.tre")
plot(tree, cex = 0.35)

events <- read.csv("event_data.txt")
data(whales, events.whales)
ed <- getEventData(whales, events.whales, burnin=0.25)
head(ed$eventData)

bamm.whales <- plot.bammdata(ed, lwd=2, labels = T, cex = 0.5)
addBAMMshifts(ed, cex=2)
addBAMMlegend(bamm.whales, location  = c(-1, -0.5, 40, 80), nTicks = 4, side = 4, las = 1)

#We can also plot the rate through time estimated by BAMM
plot.new()
plotRateThroughTime(ed, intervalCol="red", avgCol="red", ylim=c(0,1), cex.axis=2)
text(x=30, y= 0.8, label="All whales", font=4, cex=2.0, pos=4)

#plot the rate through time estimated for a given clade
plotRateThroughTime(ed, intervalCol="blue", avgCol="blue", node=141, ylim=c(0,1),cex.axis=1.5)
text(x=30, y= 0.8, label="Dolphins only", font=4, cex=2.0, pos=4)

#exclude this given clade and thus plot the rate through time for the “background clade”
plotRateThroughTime(ed, intervalCol="darkgreen", avgCol="darkgreen", node=141, nodetype = "exclude", ylim=c(0,1), cex.axis=1.5)
text(x=30, y= 0.8, label="Non-dolphins", font=4, cex=2.0, pos=4)

##Bamm estimates speciation and extinction rates for each branch on the tree. 
##Here we will extract the rates estimated for each tip and plot in different ways.
tip.rates <- getTipRates(ed)
str(tip.rates)
hist(tip.rates$lambda.avg,xlab="average lambda"
hist(tip.rates$mu.avg,xlab="average mu")

##you can separate different groups of taxa to compare their tip rates
boxplot(tip.rates$lambda.avg[53:87], tip.rates$lambda.avg[1:52], col = c("red", "blue"), names = c("dolphins", "other cetaceans"))

RPANDA利用最大似然分析，可以擬合不同的速率隨時間變化的模型稳吮，并選擇最優(yōu)的擬合模型缎谷。RPANDA與BAMM主要有兩個區(qū)別：

RPANDA軟件必須指定測試的模型，而BAMM會計算每一分支的速率灶似；
RPANDA軟件列林，用戶必須事先知道哪一類可能會顯示特定的多樣化，而BAMM將使用rjmcmc算法估計速率偏移的位置酪惭；

未完待續(xù)

最后編輯于：2020.04.17 10:46:45

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者

禁止轉(zhuǎn)載希痴，如需轉(zhuǎn)載請通過簡信或評論聯(lián)系作者。

人面猴
序言：七十年代末撞蚕，一起剝皮案震驚了整個濱河市润梯，隨后出現(xiàn)的幾起案子，更是在濱河造成了極大的恐慌，老刑警劉巖纺铭，帶你破解...
沈念sama閱讀 206,723評論 6贊 481
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件寇钉，死亡現(xiàn)場離奇詭異，居然都是意外死亡舶赔，警方通過查閱死者的電腦和手機(jī)扫倡，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 88,485評論 2贊 382
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進(jìn)店門，熙熙樓的掌柜王于貴愁眉苦臉地迎上來竟纳，“玉大人撵溃，你說我怎么就攤上這事∽独郏” “怎么了缘挑？”我有些...
開封第一講書人閱讀 152,998評論 0贊 344
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵，是天一觀的道長桶略。經(jīng)常有香客問我语淘，道長，這世上最難降的妖魔是什么际歼？我笑而不...
開封第一講書人閱讀 55,323評論 1贊 279
?港島之戀（遺憾婚禮）
正文為了忘掉前任惶翻，我火速辦了婚禮，結(jié)果婚禮上鹅心，老公的妹妹穿的比我還像新娘吕粗。我一直安慰自己，他們只是感情好旭愧，可當(dāng)我...
茶點(diǎn)故事閱讀 64,355評論 5贊 374
惡毒庶女頂嫁案：這布局不是一般人想出來的
文/花漫我一把揭開白布颅筋。她就那樣靜靜地躺著，像睡著了一般输枯。火紅的嫁衣襯著肌膚如雪垃沦。梳的紋絲不亂的頭發(fā)上，一...
開封第一講書人閱讀 49,079評論 1贊 285
城市分裂傳說
那天用押，我揣著相機(jī)與錄音，去河邊找鬼靶剑。笑死蜻拨，一個胖子當(dāng)著我的面吹牛，可吹牛的內(nèi)容都是我干的桩引。我是一名探鬼主播缎讼，決...
沈念sama閱讀 38,389評論 3贊 400
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開眼，長吁一口氣：“原來是場噩夢啊……” “哼坑匠！你這毒婦竟也來了血崭？” 一聲冷哼從身側(cè)響起，我...
開封第一講書人閱讀 37,019評論 0贊 259
萬榮殺人案實(shí)錄
序言：老撾萬榮一對情侶失蹤，失蹤者是張志新（化名）和其女友劉穎夹纫，沒想到半個月后咽瓷，有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體，經(jīng)...
沈念sama閱讀 43,519評論 1贊 300
?護(hù)林員之死
正文獨(dú)居荒郊野嶺守林人離奇死亡舰讹，尸身上長有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點(diǎn)故事閱讀 35,971評論 2贊 325
?白月光啟示錄
正文我和宋清朗相戀三年茅姜，在試婚紗的時候發(fā)現(xiàn)自己被綠了。大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片月匣。...
茶點(diǎn)故事閱讀 38,100評論 1贊 333
活死人
序言：一個原本活蹦亂跳的男人離奇死亡钻洒，死狀恐怖，靈堂內(nèi)的尸體忽然破棺而出锄开，到底是詐尸還是另有隱情素标，我是刑警寧澤，帶...
沈念sama閱讀 33,738評論 4贊 324
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布萍悴，位于F島的核電站头遭，受9級特大地震影響，放射性物質(zhì)發(fā)生泄漏退腥。R本人自食惡果不足惜任岸，卻給世界環(huán)境...
茶點(diǎn)故事閱讀 39,293評論 3贊 307
男人毒藥：我在死后第九天來索命
文/蒙蒙一、第九天我趴在偏房一處隱蔽的房頂上張望狡刘。院中可真熱鬧享潜，春花似錦、人聲如沸嗅蔬。這莊子的主人今日做“春日...
開封第一講書人閱讀 30,289評論 0贊 19
一樁弒父案，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽澜术。三九已至艺蝴，卻和暖如春，著一層夾襖步出監(jiān)牢的瞬間鸟废，已是汗流浹背猜敢。一陣腳步聲響...
開封第一講書人閱讀 31,517評論 1贊 262
情欲美人皮
我被黑心中介騙來泰國打工，沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留盒延，地道東北人缩擂。一個月前我還...
沈念sama閱讀 45,547評論 2贊 354
代替公主和親
正文我出身青樓，卻偏偏與公主長得像添寺，于是被迫代替她去往敵國和親胯盯。傳聞我的和親對象是個殘疾皇子，可洞房花燭夜當(dāng)晚...
茶點(diǎn)故事閱讀 42,834評論 2贊 345