筆記內(nèi)容:
- R vegan包envfit的output及其計(jì)算
- python實(shí)現(xiàn):見(jiàn)github
- 注意
R vegan包envfit的output及其計(jì)算
vegan::envfit
用法見(jiàn)R使用筆記: scatterplot with confidence ellipses;envfit的實(shí)現(xiàn)及釋意球昨,在R中其實(shí)只用envfit(mds_point, env, permutation=999)
就可以得到envfit的結(jié)果盏触。
library(vegan)
data(iris)
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
iris_data = iris[,1:4]
iris_env = iris[,1:5] # 包括一個(gè)factor,4個(gè)vector
ord = data.frame(cmdscale(dist(iris_data),k=2))
colnames(ord) = c('X1','X2') # ord為pcoa的坐標(biāo)
envfit_result = envfit(ord,iris_env,permutations = 100)
envfit_result # envfit的結(jié)果:
***VECTORS
X1 X2 r2 Pr(>r)
Sepal.Length 0.48219 0.87607 0.9579 0.009901 **
Sepal.Width -0.11499 0.99337 0.8400 0.009901 **
Petal.Length 0.98013 -0.19836 0.9981 0.009901 **
Petal.Width 0.97852 -0.20615 0.9366 0.009901 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Permutation: free
Number of permutations: 100
***FACTORS:
Centroids:
X1 X2
Speciessetosa -2.6424 0.1909
Speciesversicolor 0.5332 -0.2455
Speciesvirginica 2.1092 0.0547
Goodness of fit:
r2 Pr(>r)
Species 0.8868 0.009901 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Permutation: free
Number of permutations: 100
input包括每個(gè)樣本的降維坐標(biāo)(或者你把整個(gè)降維結(jié)果cmdscale()
傳給envfit也可以),環(huán)境變量表以及permutation的次數(shù)。permutation參數(shù)用于P值的計(jì)算。
結(jié)果分為vector和factor兩個(gè)部分通危,其中vector結(jié)果是對(duì)數(shù)值型的環(huán)境變量分析。在envfit_result中結(jié)構(gòu)如下所示灌曙。
$arrows
是各vector的坐標(biāo);
$r
為goodness of fit, 即該環(huán)境變量的樣本點(diǎn)在linear model上擬合的程度菊碟。
$pvals
為經(jīng)過(guò)n次permutation計(jì)算,得到n個(gè)r_shuffle值在刺,大于等于真實(shí)r值的個(gè)數(shù)/n.
arrows的計(jì)算:
r2(goodness of fit / coefficient of determination)的計(jì)算:
對(duì)factor而言:
$centroids
為各factor中各level的坐標(biāo)
$r
為goodness of fit, 各level只有一個(gè)r值
$pvals
permutation得到的P值
計(jì)算:
SSW和SSB的計(jì)算及ANOVA相關(guān)見(jiàn)這個(gè)連接
python實(shí)現(xiàn)
python代碼見(jiàn)github https://github.com/CS0000/envfit_py
要用到這些模塊:
import pandas as pd
from sklearn.datasets import load_iris
from skbio.stats.ordination import pcoa
from math import sqrt
from scipy.spatial.distance import squareform, pdist
from sklearn.preprocessing import scale
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import numpy as np
見(jiàn)github中的envfit2py.py
腳本逆害,在if __name__ == '__main__':
里面load進(jìn)數(shù)據(jù)头镊,調(diào)用函數(shù)就可以了。
注意
- anvova table的計(jì)算
- R envfit的源代碼索引:
envfit.default.R:envfit函數(shù)code,調(diào)用了vectorfit和factorfit魄幕。
vectorfit.R
factorfit.R
centroids.cca.R : 在factorfit.R中計(jì)算centroids
goffactor.c:用到的C代碼:可以通過(guò).Call
和vegan:::
來(lái)調(diào)用 - ...用python寫(xiě)一遍envfit的本意是希望節(jié)省時(shí)間相艇,減少內(nèi)存的占用。但是這個(gè)腳本就節(jié)省時(shí)間而言一點(diǎn)也不樂(lè)觀纯陨,甚至比R還慢...=_=坛芽。后面考慮用多線程,但如果多線程也只能勉強(qiáng)達(dá)到R的用的時(shí)間翼抠,那還有什么意義...? 優(yōu)化之路道阻且長(zhǎng)咙轩。
- Python和R的output還是存在一些差別,并不能保證完全一毛一樣阴颖。似乎因?yàn)镽對(duì)linear model的底層算法和python不太一樣活喊,所以會(huì)有差異。介于linear model在R扒到底都是C寫(xiě)的了量愧,我完全看不懂钾菊,所以...=_=后面如果要用到了,還會(huì)再優(yōu)化的偎肃。
- 這里只實(shí)現(xiàn)了envfit最最最基本的功能煞烫,envfit似乎還可以給croodination設(shè)置weight,我在這里沒(méi)有設(shè)置软棺。
-
vegan包文檔中對(duì)envfit的描述:
Function envfit finds vectors or factor averages of environmental variables... If X is a data.frame, envfit uses factorfit for factor variables and vectorfit for other variables. If X is a matrix or a vector, envfit uses only vectorfit. Alternatively, the model can be defined a simplified model formula, where the left hand side must be an ordination result object or a matrix of ordination scores, and right hand side lists the environmental variables.
Functions vectorfit and factorfit can be called directly. Function vectorfit finds directions in the ordination space towards which the environmental vectors change most rapidly and
to which they have maximal correlations with the ordination configuration. Function factorfit finds averages of ordination scores for factor levels. Function factorfit treats ordered and unordered factors similarly.
If permutations > 0, the ‘significance’ of fitted vectors or factors is assessed using permutation of environmental variables. The goodness of fit statistic is squared correlation coefficient (r2). For factors this is defined as r2 = 1 ? ssw/sst, where ssw and sst are within-group and total sums of squares.