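Many readers have left messages on the public account asking about variance decomposition (方差分解), but I had never heard of it before. Recently I saw someone share a public-account post titled 一種簡單易行的方差分解方法 ("A simple and practical method for variance decomposition"). After reading it, my current understanding is that the main purpose of variance decomposition is to quantify the relative contribution of x1, x2, x3, … to Y in a regression model Y = b0 + b1x1 + b2x2 + …, and also the contribution to Y of the factor categories the X variables belong to (for example biotic versus abiotic factors).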
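To make the idea concrete, here is a minimal sketch on simulated data. It assumes (this is my assumption, not something stated above) that "relative contribution" is measured as the absolute value of each standardized coefficient divided by the sum of all absolute values. The variable names and the biotic/abiotic grouping are made up for illustration.

```r
# Simulated data: three hypothetical predictors with different true effects
set.seed(1)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 0.6 * x1 + 0.3 * x2 + 0.1 * x3 + rnorm(n, sd = 0.5)

# Standardize everything so the coefficients are on a comparable scale
d   <- as.data.frame(scale(data.frame(y, x1, x2, x3)))
fit <- lm(y ~ x1 + x2 + x3, data = d)

# Relative contribution (assumed definition): |b_i| / sum(|b_j|), in percent
b   <- abs(coef(fit)[-1])          # drop the intercept
rel <- 100 * b / sum(b)
print(round(rel, 1))

# Contribution of hypothetical factor categories: sum within each group
group    <- c(x1 = "abiotic", x2 = "abiotic", x3 = "biotic")
by_group <- tapply(rel, group[names(rel)], sum)
print(round(by_group, 1))
```

The percentages add up to 100 by construction, so they describe how the explained effect is shared among predictors, not how much of the variance in Y the model explains overall.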
That post walks through the method using data from a published paper as an example (the paper was shown as a screenshot in the original post).
The data and code for the variance decomposition part of this paper are public. The download link is
https://figshare.com/s/053837c4fa852f035448
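I have read through the code. There are parts I still don't understand, but I was able to run the whole workflow with the data. I'm recording it today, will come back to it when I have time, and will write more if I understand anything new.
First, read in the data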
datatotal<-read.table("datasetmultifunctionality.txt", header=T, sep="\t")
colnames(datatotal)
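The later steps index columns by position (for example 12 to 19 and 20 to 24), so it helps to check which names sit at those positions before running them. A small sketch on a made-up tab-separated table follows; the column names here are hypothetical, not those of the real file.

```r
# A tiny made-up tab-separated table standing in for the real file
txt <- "SITE\tPH\tSAND\tBGL\n1\t7.2\t55\t0.8\n2\t6.9\t61\t1.1\n3\t8.0\t40\t0.5"
toy <- read.table(text = txt, header = TRUE, sep = "\t")

# Name -> position lookup, to verify hard-coded indices
pos <- setNames(seq_along(toy), colnames(toy))
print(pos)

# Selecting by name gives the same column as selecting by its position
stopifnot(identical(toy[, pos["PH"]], toy$PH))
str(toy)
```

Selecting by name is usually more robust than selecting by position, because it keeps working if columns are added or reordered in the input file.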
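The next block of code transforms the data.
Some variables get an ordinary standardization (z-score),
and some get a log transformation.
The post mentioned at the beginning explains that variance decomposition must use standardized data, but what is the purpose of the log transformation applied to some variables?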
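A possible answer, offered as my own reading and not something the authors state: `log()` is only defined for positive numbers and is often used to reduce right skew, so variables that can be zero or negative are first shifted by `x - min(x) + 1`, which makes the smallest value exactly 1 (and its log 0). A sketch on made-up numbers:

```r
# Made-up values including negatives (e.g. a skewness-like quantity)
x <- c(-1.5, -0.2, 0, 0.7, 3.1, 12.4)

# Plain log fails on non-positive values (NaN for negatives, -Inf for zero)
plain <- suppressWarnings(log(x))
print(plain)

# Shift so that the minimum becomes 1, then take the log
shifted <- log(x - min(x) + 1)
print(round(shifted, 3))

# The transform is monotone: the ordering of the observations is preserved
stopifnot(all(diff(shifted) > 0), shifted[1] == 0)
```

Because the shift depends on the minimum of the sample, the transformed values are specific to this data set and are not comparable across data sets with different minima.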
##### Log-transformation of moments
datatotal[,c(12,13,16,17)]<-log(datatotal[,c(12,13,16,17)])
datatotal[,14]<-log(datatotal[,14]-min(datatotal[,14])+1)
datatotal[,15]<-log(datatotal[,15]-min(datatotal[,15])+1)
datatotal[,18]<-log(datatotal[,18]-min(datatotal[,18])+1)
datatotal[,19]<-log(datatotal[,19]-min(datatotal[,19])+1)
##### Z-scoring environmental variables
datatotal$ELEVATION<-(datatotal$ELEVATION-mean(datatotal$ELEVATION))/sd(datatotal$ELEVATION)
datatotal$LAT<-(datatotal$LAT-mean(datatotal$LAT))/sd(datatotal$LAT)
datatotal$SINLONG<-(datatotal$SINLONG-mean(datatotal$SINLONG))/sd(datatotal$SINLONG)
datatotal$COSLONG<-(datatotal$COSLONG-mean(datatotal$COSLONG))/sd(datatotal$COSLONG)
datatotal$SLO<-(datatotal$SLO-mean(datatotal$SLO))/sd(datatotal$SLO)
datatotal$ARIDITY<-(datatotal$ARIDITY-mean(datatotal$ARIDITY))/sd(datatotal$ARIDITY)
datatotal$SAND<-(datatotal$SAND-mean(datatotal$SAND))/sd(datatotal$SAND)
datatotal$PH<-(datatotal$PH-mean(datatotal$PH))/sd(datatotal$PH)
datatotal$SR<-(datatotal$SR-mean(datatotal$SR))/sd(datatotal$SR)
##### Z-scoring moments
datatotal$CWM_logH<-(datatotal$CWM_logH-mean(datatotal$CWM_logH))/sd(datatotal$CWM_logH)
datatotal$CWV_logH<-(datatotal$CWV_logH-mean(datatotal$CWV_logH))/sd(datatotal$CWV_logH)
datatotal$CWS_logH<-(datatotal$CWS_logH-mean(datatotal$CWS_logH))/sd(datatotal$CWS_logH)
datatotal$CWK_logH<-(datatotal$CWK_logH-mean(datatotal$CWK_logH))/sd(datatotal$CWK_logH)
datatotal$CWM_logSLA<-(datatotal$CWM_logSLA-mean(datatotal$CWM_logSLA))/sd(datatotal$CWM_logSLA)
datatotal$CWV_logSLA<-(datatotal$CWV_logSLA-mean(datatotal$CWV_logSLA))/sd(datatotal$CWV_logSLA)
datatotal$CWS_logSLA<-(datatotal$CWS_logSLA-mean(datatotal$CWS_logSLA))/sd(datatotal$CWS_logSLA)
datatotal$CWK_logSLA<-(datatotal$CWK_logSLA-mean(datatotal$CWK_logSLA))/sd(datatotal$CWK_logSLA)
##### Z-scoring ecosystem functions
datatotal$BGL<-(datatotal$BGL-mean(datatotal$BGL))/sd(datatotal$BGL)
datatotal$FOS<-(datatotal$FOS-mean(datatotal$FOS))/sd(datatotal$FOS)
datatotal$AMP<-(datatotal$AMP-mean(datatotal$AMP))/sd(datatotal$AMP)
datatotal$NTR<-(datatotal$NTR-mean(datatotal$NTR))/sd(datatotal$NTR)
datatotal$I.NDVI<-(datatotal$I.NDVI-mean(datatotal$I.NDVI))/sd(datatotal$I.NDVI)
#####Calculating indices of multifunctionality (M5: 5 functions)
colnames(datatotal)
M5<-rowMeans(datatotal[,c(20,21,22,23,24)])
datatotal<-cbind(datatotal,M5)
##### Log-transformation of multifunctionality
logM5<-log(datatotal$M5-min(datatotal$M5)+1)
datatotal<-cbind(datatotal,logM5)
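The z-scoring lines above all follow the same pattern, `(x - mean(x)) / sd(x)`. As a sketch, the same result can be obtained with base R's `scale()` applied to several columns at once. The toy data frame below is made up and only borrows three of the column names.

```r
set.seed(42)
toy <- data.frame(BGL = rexp(10), FOS = rnorm(10, 5, 2), AMP = runif(10))

# Manual z-score, as in the original code
manual <- (toy$BGL - mean(toy$BGL)) / sd(toy$BGL)

# scale() centers each column and divides it by its sd()
vars <- c("BGL", "FOS", "AMP")
toy[vars] <- lapply(toy[vars], function(v) as.numeric(scale(v)))
stopifnot(isTRUE(all.equal(manual, toy$BGL)))

# Multifunctionality-style index: row mean of the standardized functions,
# followed by the same shifted log used for the moments
toy$M    <- rowMeans(toy[vars])
toy$logM <- log(toy$M - min(toy$M) + 1)
print(round(head(toy, 3), 3))
```

Looping over a vector of column names keeps the block short and avoids copy-paste slips when a variable name has to be repeated three times per line.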
Load the MuMIn package to do the model selection.
The code is
library(MuMIn)
# dredge() expects missing values to raise an error instead of being dropped silently,
# so set this option before fitting the global model
options(na.action = "na.fail")
mod12<-lm(logM5 ~ LAT + SINLONG + COSLONG +
ARIDITY + SLO + SAND + PH + I(PH^2) + ELEVATION+
CWM_logSLA + I(CWM_logSLA^2)+ CWV_logSLA + I(CWV_logSLA^2) + CWS_logSLA + CWK_logSLA + I(CWK_logSLA^2) +
CWM_logH + I(CWM_logH^2)+ CWV_logH + I(CWV_logH^2) + CWS_logH + CWK_logH + I(CWK_logH^2) +
SR
, data=datatotal)
# This step takes a long time
dd12<-dredge(mod12, subset = ~ LAT & SINLONG & COSLONG & ARIDITY & SLO & SAND & PH &SR & ELEVATION &
dc(CWM_logSLA,I(CWM_logSLA^2)) & dc(CWV_logSLA,I(CWV_logSLA^2)) & dc(CWK_logSLA,I(CWK_logSLA^2))
& dc(CWM_logH,I(CWM_logH^2)) & dc(CWV_logH,I(CWV_logH^2)) & dc(CWK_logH,I(CWK_logH^2)))
subset(dd12,delta<2)
de12<-model.avg(dd12, subset = delta < 2)
summary(de12)
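To see what `delta < 2` selects, here is a base-R sketch that does by hand what I understand `dredge()` to do by default: rank candidate models by AICc and report each model's difference (delta) from the best one. The data and the three candidate models are made up, and this is not the MuMIn implementation.

```r
set.seed(7)
n <- 60
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.8 * d$x1 + rnorm(n)

# Small-sample corrected AIC: AIC + 2k(k+1)/(n-k-1), k = number of parameters
aicc <- function(fit) {
  k <- attr(logLik(fit), "df")
  n <- nobs(fit)
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

cands <- list(
  null = lm(y ~ 1, data = d),
  x1   = lm(y ~ x1, data = d),
  x1x2 = lm(y ~ x1 + x2, data = d)
)
tab <- data.frame(AICc = sapply(cands, aicc))
tab$delta  <- tab$AICc - min(tab$AICc)
# Akaike weights: relative support for each model within the candidate set
tab$weight <- exp(-tab$delta / 2) / sum(exp(-tab$delta / 2))
tab <- tab[order(tab$delta), ]
print(round(tab, 3))

# Models within 2 AICc units of the best are the ones kept for averaging
top <- rownames(tab)[tab$delta < 2]
print(top)
```

In the real workflow, `model.avg()` then averages the coefficients over that top set, which is why the summary reports a single averaged estimate per predictor.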
The data obtained in this step is what Figure 4a in the paper is based on.
The next post will show how to plot the data obtained here.
The questions I ran into here are:
- 1. Some variables in the model are wrapped in the I() function. What does this function do?
- 2. The model selection step uses the dc() function. What does that function do?
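A sketch that may help with both questions. In an R formula, `^` means crossing terms up to a given order and not arithmetic power, so `y ~ x + x^2` is the same model as `y ~ x`; wrapping the term in `I()` tells R to evaluate `x^2` arithmetically and add a genuine quadratic term. As for `dc()`, the MuMIn documentation describes it as a "dependency chain": `dc(A, B)` lets `B` into a candidate model only when `A` is also there, which here keeps a squared term from appearing without its linear term. The `allowed()` helper below is my own stand-in for that rule, not MuMIn code, and the data are simulated.

```r
set.seed(3)
x <- runif(100, -2, 2)
y <- 1 + 0.5 * x - 0.7 * x^2 + rnorm(100, sd = 0.3)

# Without I(): `^` is formula syntax, so x^2 collapses to x
no_I <- lm(y ~ x + x^2)
# With I(): a real quadratic term is added
with_I <- lm(y ~ x + I(x^2))
print(names(coef(no_I)))    # "(Intercept)" "x"
print(names(coef(with_I)))  # "(Intercept)" "x" "I(x^2)"

# Hypothetical stand-in for the dc(A, B) rule: B requires A
allowed <- function(has_A, has_B) !has_B || has_A
combos <- expand.grid(has_A = c(FALSE, TRUE), has_B = c(FALSE, TRUE))
combos$allowed <- mapply(allowed, combos$has_A, combos$has_B)
print(combos)
```

Of the four combinations, only "B without A" is rejected, so the rule removes models that contain `I(x^2)` but not `x` and leaves every other candidate in place.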
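That's all for today.
Feel free to follow my public account,
小明的數據分析筆記本
The account mainly shares: 1. simple examples of data analysis and data visualization in R and Python; 2. reading notes on transcriptomics, genomics, and population genetics literature for horticultural plants; 3. introductory bioinformatics learning materials and my own study notes.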
Today's content mainly draws on
- the post 一種簡單易行的方差分解方法 from the public account 二傻統計