目錄
- 0.問題導(dǎo)入
- 1.示例數(shù)據(jù)
- 2.數(shù)據(jù)導(dǎo)入及訓(xùn)練組/驗證組拆分(70%/30%)
- 3.訓(xùn)練集NDVI~訓(xùn)練結(jié)果的相關(guān)性與時間演進分析
- 4.驗證集NDVI~模型預(yù)測結(jié)果的相關(guān)性與時間演進分析
- 5.模型訓(xùn)練及驗證期間殘差分析
- 6.總結(jié)
- 7.本文所用軟件包(沒有需要通過install.packages進行安裝)
- 8.致謝
0. 問題導(dǎo)入
今天我們通過建立歸一化植被指數(shù)(NDVI)與氣象及生態(tài)要素(包括長波輻射懈费,短波輻射耀找,溫度毛秘,降水,0-200cm土壤水傅事,根系處土壤水,蒸散發(fā)以及總初級生產(chǎn)力(GPP))的關(guān)系來詳細(xì)闡述廣義加性模型在面向地理及時間序列數(shù)據(jù)建模方面的應(yīng)用峡扩。
1. 示例數(shù)據(jù)
本數(shù)據(jù)為基于GLDAS再分析數(shù)據(jù)集與NDVI遙感影像在隨機點的時間序列蹭越,時間長度為2002-4 ~ 2015-12,時間分辨率為月教届。
點我下載示例數(shù)據(jù)
示例數(shù)據(jù)預(yù)覽:
head(df)
ndvi soil1 soil2 soil3 soil4 rain
1 0.237984169 -0.01210715 0.03579731 0.1269299 0.07318894 -0.01543584
2 0.370455335 0.38147139 0.31089661 0.2241396 0.10204067 0.20701857
3 0.331657733 0.41044975 0.48385978 0.4471074 0.25112199 0.62105802
4 0.216662956 0.32583872 0.41198999 0.4231082 0.42613716 0.37216417
5 0.054132382 0.24177292 0.20540345 0.2979310 0.43549429 0.06553887
6 -0.005636952 0.41268755 0.29207486 0.2508858 0.37087816 0.25502620
longwave shortwave root_sm evap temper gpp
1 0.059414987 0.215758745 0.06890271 -0.07747205 0.009909431 -0.04072053
2 0.009142641 0.244385277 0.31129426 0.23793998 0.172678808 0.18329118
3 -0.097150000 0.353491078 0.48706357 0.59985033 0.314583437 0.24478460
4 -0.031527285 0.355970841 0.42948289 0.51469995 0.348457057 0.47172457
5 -0.291598633 0.297255464 0.27643746 0.38420614 0.326270291 0.37802032
6 0.010729154 -0.009709685 0.32028271 0.31684306 0.119881673 0.15986512
1. 數(shù)據(jù)導(dǎo)入及訓(xùn)練組/驗證組拆分(70%/30%)
setwd('L:\\JianShu\\20191222')
df = read.csv('data.csv',header = T)
df = df[,-1]
df = as.data.frame(df)
train = df[1:115,]
test = df[116:165,]
2. 構(gòu)建并訓(xùn)練廣義加性模型(GAM)
注意:本例基于mgcv包的gam函數(shù)展開
通過模型訓(xùn)練結(jié)果响鹃,我們可以判斷在眾多因子中,在顯著性水平小于0.01 的情況下案训,soil2买置,gpp,rain與evaporation的時間演變過程對該點歸一化植被指數(shù)(NDVI)的變化規(guī)律會產(chǎn)生顯著影響萤衰。
library(mgcv)
fit = mgcv::gam(ndvi ~s(soil1)+s(soil2)+s(soil3)+s(soil4)+s(gpp)+
s(rain)+s(longwave)+s(shortwave)+s(root_sm)+s(evap)+s(temper),data = train,
trace = TRUE)
summary(fit)
Family: gaussian
Link function: identity
Formula:
ndvi ~ s(soil1) + s(soil2) + s(soil3) + s(soil4) + s(gpp) + s(rain) +
s(longwave) + s(shortwave) + s(root_sm) + s(evap) + s(temper)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.011837 0.006971 -1.698 0.0936 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(soil1) 3.916 4.885 2.146 0.07025 .
s(soil2) 8.391 8.813 3.212 0.00224 **
s(soil3) 1.000 1.000 6.147 0.01532 *
s(soil4) 4.255 5.079 2.601 0.03054 *
s(gpp) 6.819 7.762 3.107 0.00524 **
s(rain) 1.463 1.776 21.553 2.45e-06 ***
s(longwave) 1.000 1.000 0.008 0.92901
s(shortwave) 1.879 2.356 4.550 0.01028 *
s(root_sm) 2.679 3.510 3.750 0.01658 *
s(evap) 1.000 1.000 9.893 0.00234 **
s(temper) 5.103 6.322 1.751 0.12357
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.804 Deviance explained = 86.9%
GCV = 0.008401 Scale est. = 0.0055881 n = 115
3. 訓(xùn)練集NDVI~訓(xùn)練結(jié)果的相關(guān)性與時間演進分析
3.1 訓(xùn)練集NDVI~訓(xùn)練結(jié)果的相關(guān)性分析(圖1)
pl_df_cor = data.frame(SIMU =fit$fitted.values, REAL = train$ndvi)
label_df = data.frame(x = -0.2, y = 0.4, label = paste0('R: ',round(cor(pl_df_cor$SIMU,pl_df_cor$REAL),2)))
p1_cor = ggplot()+
geom_point(data = pl_df_cor,aes(x= SIMU,y = REAL),color = 'blue',size = 5,
shape = 1)+
geom_abline(intercept = 0,slope = 1,size = 1)+
geom_text(data = label_df,aes(x = x,y = y,label = label),size = 6,color = 'black')+
theme_bw()+
theme(
axis.text = element_text(face = 'bold',color = 'black',size = 12, hjust = .5),
axis.title = element_text(face = 'bold',color = 'black',size = 14, hjust = .5)
)+
xlab('SIMULATION RESULT')+
ylab('REAL NDVI')
png('plot2.png',
height = 10,
width = 20,
units = 'cm',
res = 800)
print(p1_cor)
dev.off()
3.2 訓(xùn)練集NDVI~訓(xùn)練結(jié)果的時間演進分析(圖2)
date = seq(as.Date('2002-04-01'),
as.Date('2015-12-01'),
'1 month')
date_train = date[1:115]
pl_df = data.frame(date = date_train, SIMU = fit$fitted.values, REAL = train$ndvi)
library(reshape2)
library(ggplot2)
pl_df = melt(pl_df,'date')
p1 = ggplot()+
geom_line(data = pl_df,aes(x = date, y = value,color = variable),size = 1)+
scale_color_manual(values= c('green','blue'))+
theme_bw()+
theme(
axis.text = element_text(face = 'bold',color = 'black',size = 12, hjust = .5),
axis.title = element_text(face = 'bold',color = 'black',size = 14, hjust = .5),
legend.position = 'bottom',
legend.direction = 'horizontal'
)+
xlab('Time (month)')+
ylab('NDVI (1)')
png('plot1.png',
height = 10,
width = 20,
units = 'cm',
res = 800)
print(p1)
dev.off()
4. 驗證集NDVI~模型預(yù)測結(jié)果的相關(guān)性與時間演進分析
4.1 模型預(yù)測結(jié)果計算
predict_test = predict(fit, test)
4.2 驗證集NDVI~模型預(yù)測結(jié)果的相關(guān)性分析(圖3)
通過對比圖1與圖3堕义, 我們可以發(fā)現(xiàn)模型在驗證期與訓(xùn)練期相關(guān)性均大于0.7,并未在驗證期出現(xiàn)模型預(yù)測能力顯著下降的問題脆栋,證明該模型未產(chǎn)生過擬合倦卖。
pl_df_cor = data.frame(SIMU =predict_test, REAL = test$ndvi)
label_df = data.frame(x = -0.2, y = 0.4, label = paste0('R: ',round(cor(pl_df_cor$SIMU,pl_df_cor$REAL),2)))
p2_cor = ggplot()+
geom_point(data = pl_df_cor,aes(x= SIMU,y = REAL),color = 'blue',size = 5,
shape = 1)+
geom_abline(intercept = 0,slope = 1,size = 1)+
geom_text(data = label_df,aes(x = x,y = y,label = label),size = 6,color = 'black')+
theme_bw()+
theme(
axis.text = element_text(face = 'bold',color = 'black',size = 12, hjust = .5),
axis.title = element_text(face = 'bold',color = 'black',size = 14, hjust = .5)
)+
xlab('SIMULATION RESULT')+
ylab('REAL NDVI')
png('plot3.png',
height = 20,
width = 20,
units = 'cm',
res = 800)
print(p2_cor)
dev.off()
4.3 驗證集NDVI~模型預(yù)測結(jié)果的時間演進分析(圖4)
date_test = date[116:165]
pl_df_test = data.frame(date = date_test, SIMU = predict_test, REAL = test$ndvi)
pl_df_test = melt(pl_df_test,'date')
p2 = ggplot()+
geom_line(data = pl_df_test,aes(x = date, y = value,color = variable),size = 1)+
scale_color_manual(values= c('green','blue'))+
theme_bw()+
theme(
axis.text = element_text(face = 'bold',color = 'black',size = 12, hjust = .5),
axis.title = element_text(face = 'bold',color = 'black',size = 14, hjust = .5),
legend.position = 'bottom',
legend.direction = 'horizontal'
)+
xlab('Time (month)')+
ylab('NDVI (1)')
png('plot4.png',
height = 10,
width = 20,
units = 'cm',
res = 800)
print(p2)
dev.off()
5. 模型訓(xùn)練及驗證期間殘差分析(圖5)
根據(jù)圖5,我們可以發(fā)現(xiàn)模型在訓(xùn)練及驗證期間殘差均大致服從與均值為1 的正太分布椿争,間接證明模型的實用性怕膛。
train_residuals = fit$residuals
test_residuals = test$ndvi - predict_test
residuals1 = data.frame(RESIDUALS = train_residuals,type = 'TRAIN')
residuals2 = data.frame(RESIDUALS = test_residuals,type = 'TEST')
residuals = rbind(residuals1,residuals2)
print(round(mean(residuals1$RESIDUALS),2))
[1] 0
print(round(mean(residuals2$RESIDUALS),2))
[1] 0.01
p3 = ggplot(data = residuals)+
geom_histogram(aes(x = RESIDUALS, stat(count),fill = type),
binwidth = 0.05)+
scale_fill_manual(values = c('green','blue'))+
theme_bw()+
theme(
axis.text = element_text(face = 'bold',color = 'black',size = 12, hjust = .5),
axis.title = element_text(face = 'bold',color = 'black',size = 14, hjust = .5),
legend.text = element_text(face = 'bold',color = 'black',size = 12, hjust = .5),
legend.title = element_text(face = 'bold',color = 'black',size = 14, hjust = .5),
legend.position = 'bottom',
legend.direction = 'horizontal'
)+
xlab('Residuals')+
ylab('Count')
png('plot4.png',
height = 20,
width = 20,
units = 'cm',
res = 800)
print(p3)
dev.off()
6. 總結(jié)
本文主要解決了以下問題:
- 如何利用廣義加性模型(GAM)面向時間序列建模?
- 如何評估多元模型是否發(fā)生過擬合及其可用性秦踪?
7. 本文所用軟件包(沒有需要通過install.packages進行安裝)
library(reshape2)
library(ggplot2)
library(mgcv)
8. 致謝
首先褐捻,感謝大家的持續(xù)關(guān)注,小編會繼續(xù)努力椅邓,持續(xù)更新下去的柠逞!
大家如果覺得有用,還麻煩大家轉(zhuǎn)發(fā)點贊加關(guān)注哈景馁,也可以擴散到朋友圈板壮,多謝大家啦~
大家如果在使用本文代碼的過程有遇到問題的,可以留言評論合住,也可以私信我哈~~