本文原始地址:
https://www.r-bloggers.com/how-to-fit-a-copula-model-in-r-heavily-revised-part-1-basic-tools/
本文只是自學(xué)copula的記錄文件,并非逐字逐句翻譯,詳情請(qǐng)見(jiàn)原文.
作者一年前發(fā)過(guò)個(gè)帖子,因?yàn)楫?dāng)時(shí)相關(guān)的文檔少,難度大,對(duì)新手非常之不友好.
現(xiàn)在要更新一下這個(gè)老帖子,原因有二:
- 對(duì)原帖不滿意
- 通過(guò)觀察原帖的訪問(wèn)者,似乎大家都想知道的是該如何入門(mén),如何生成一個(gè)copula模型.那好吧,這個(gè)可以有.
此新帖包含兩個(gè)部分:
給新手介紹使用R進(jìn)行copulas的基本工具和函數(shù).
介紹如何選擇copula,擬合過(guò)程,評(píng)估及其它問(wèn)題,最后是一個(gè)案例.
本文是介紹如何用R處理copulas,言下之意是對(duì)copulas
,假設(shè)你已經(jīng)知道起碼橢圓和阿基米德copulas了.如果你不知道,那推薦看看這本書(shū).此書(shū)對(duì)新手很合適,條理和解釋清晰.
橢圓copulas
如果定義$F$為橢圓分布的多元累積分布函數(shù),$F_i$為第i個(gè)邊緣分布,$F^{-1}$為其反函數(shù),那么就可以構(gòu)建如下橢圓copula:
$$ C(u_1, … , u_n) = F (F_1^{-1}(u_1) + … + F_1^{-1}(u_1)) $$
阿基米德copulas
阿基米德copula可以用$phi$函數(shù)(生成器函數(shù))來(lái)生成:
$$ C(u_1, … , u_n) = phi ^{-1} (phi(u_1) + … + phi(u_n)) $$
比如,可以用如下生成器函數(shù)來(lái)生成一個(gè)Gumbel copula:
$$-( ln x )^{theta}$$
其反函數(shù)$$ exp( – y^{frac{1}{theta}} ) $$
最常見(jiàn)的阿基米德copulas包括Frank,Gumbel和Clayton.這三個(gè)也是本文所要用到的.
加載必要的庫(kù)
我們首先加載需要用到的庫(kù)文件,設(shè)置好random seed以復(fù)用.
# Copula package
library(copula)
# Fancy 3D plain scatterplots
library(scatterplot3d)
# ggplot2
library(ggplot2)
# Useful package to set ggplot plots one next to the other
library(grid)
set.seed(235)
橢圓copulas和阿基米德copulas熱身--用已知參數(shù)生成copula
如你所知,橢圓與阿基米德couplas有很好的數(shù)學(xué)性質(zhì),形狀和公式.用它們來(lái)熱身再好不過(guò).假設(shè)你已經(jīng)知道了參數(shù),下面的步驟將告訴你如何生成二元的正態(tài)copula和t copula.
# Generate a bivariate normal copula with rho = 0.7
normal <- normalCopula(param = 0.7, dim = 2)
# Generate a bivariate t-copula with rho = 0.8 and df = 2
stc <- tCopula(param = 0.8, dim = 2, df = 2)
這里的參數(shù)已經(jīng)很明顯表達(dá)了含義.正態(tài)copula有兩個(gè)參數(shù):copula的維度,這里等于2,以及可以從數(shù)據(jù)中估計(jì)出來(lái)的$rho$參數(shù),關(guān)于它我會(huì)在第二部分講到.
t copula除了有上述的兩個(gè)參數(shù)以外,還包括一個(gè)df參數(shù),它能決定copula的形狀,其值和$rho$一樣也可從數(shù)據(jù)中估計(jì)出.copula包提供了相應(yīng)的函數(shù)來(lái)生成需要的copula模型:一般的形式為"name"+"Copula()",用于生成阿基米德copulas.阿基米德copulas只需要一個(gè)參數(shù)$theta$,具體如下:
# Build a Frank, a Gumbel and a Clayton copula
frank <- frankCopula(dim = 2, param = 8)
gumbel <- gumbelCopula(dim = 3, param = 5.6)
clayton <- claytonCopula(dim = 4, param = 19)
# Print information on the Frank copula
print(frank)
用mvdc在指定邊緣分布下生成copula
如你所知逆屡,copulas的吸引力在于紧帕,能將邊緣分布從相關(guān)性結(jié)構(gòu)中剝離出來(lái),獨(dú)立的形成模型澈魄。這一特性能夠極大地簡(jiǎn)化對(duì)復(fù)雜相關(guān)性行為進(jìn)行建模的過(guò)程采章,因?yàn)檫吘壏植急嚷?lián)合分布要簡(jiǎn)單得多运嗜。比如,你要對(duì)20多個(gè)對(duì)象進(jìn)行建模(比如對(duì)電器中的20多個(gè)電容)悯舟,你會(huì)發(fā)現(xiàn)直接建模很困難担租,而用邊緣分布就簡(jiǎn)單多了。
copula包提供了一系列好用的函數(shù)(mvdc抵怎,dMvdc奋救,pMvdc,rMvdc)用于建立多元分布的copula.如下:
# Select the copula
cp <- claytonCopula(param = c(3.4), dim = 2)
# Generate the multivariate distribution (in this case it just bivariate) with normal and t marginals)
multivariate_dist <- mvdc(copula = cp,
margins = c("norm","t"),
paramMargins = list(list(mean = 2, sd = 3),
list(df = 2)))
print(multivariate_dist)
從copula隨機(jī)抽樣
一旦生成了copula,就可以很容易的從中生成隨機(jī)數(shù).針對(duì)single copula用的是rCopula方法,針對(duì)多元分布用的是rMvdc方法,如下:
# Generate random samples
fr <- rCopula(2000,frank)
gu <- rCopula(2000,gumbel)
cl <- rCopula(2000,clayton)
# Plot the samples
p1 <- qplot(fr[,1], fr[,2], colour = fr[,1], main = "Frank copula random samples theta = 8", xlab = "u", ylab = "v")
p2 <- qplot(gu[,1], gu[,2], colour = gu[,1], main = "Gumbel copula random samples theta = 5.6", xlab = "u", ylab = "v")
p3 <- qplot(cl[,1], cl[,2], colour = cl[,1], main = "Clayton copula random samples theta = 19", xlab = "u", ylab = "v")
# Define grid layout to locate plots and print each graph^(1)
pushViewport(viewport(layout = grid.layout(1, 3)))
print(p1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(p2, vp = viewport(layout.pos.row = 1, layout.pos.col = 2))
print(p3, vp = viewport(layout.pos.row = 1, layout.pos.col = 3))
samples <- rMvdc(2000, multivariate_dist)
scatterplot3d(samples[,1], samples[,2], color = "blue", pch = ".")
PDF和CDF
計(jì)算copula的PDF(概率密度函數(shù))和CDF(累積分布函數(shù))可能對(duì)將來(lái)比較有用.R包提供了計(jì)算密度和CDF的函數(shù),命名方式也保持了一致性:pCopula計(jì)算累積分布,dCopula計(jì)算密度.注意rCopula是用來(lái)抽樣的.針對(duì)多元分布的也是類似命名習(xí)慣,pMvdc,dMvdc,rMvdc.如下:
# Generate the normal copula and sample some observations
coef_ <- 0.8
mycopula <- normalCopula(coef_, dim = 2)
u <- rCopula(2000, mycopula)
# Compute the density
pdf_ <- dCopula(u, mycopula)
# Compute the CDF
cdf <- pCopula(u, mycopula)
# Generate random sample ovservations from the multivariate distributions
v <- rMvdc(2000, multivariate_dist)
# Compute the density
pdf_mvd <- dMvdc(v, multivariate_dist)
# Compute the CDF
cdf_mvd <- pMvdc(v, multivariate_dist)
現(xiàn)在你只需要plot就能顯示密度和CDF了.
圖像
當(dāng)需要作演示時(shí),圖像就很重要了.不過(guò),如果涉及到高維的情況,要作圖是很有挑戰(zhàn)性的.而copula經(jīng)常就要與高維相關(guān)性模型打交道.假設(shè)你現(xiàn)在的copula是二維的,你可以用不同方法對(duì)密度和累積分布函數(shù)作圖.比如你可以畫(huà)一個(gè)密度散點(diǎn)圖或輪廓圖.
copula包包含了一些有用的繪制輪廓圖的方法,可以作為傳統(tǒng)3d繪圖的很好替代.尤其是在mvdc控件和自定義邊緣分布的條件下,輪廓圖可以很好的反映密度函數(shù)的形狀.如下:
\par(mfrow = c(1, 3))
# 3D plain scatterplot of the density, plot of the density and contour plot
scatterplot3d(u[,1], u[,2], pdf_, color="red", main="Density", xlab ="u1", ylab="u2", zlab="dCopula", pch=".")
persp(mycopula, dCopula, main ="Density")
contour(mycopula, dCopula, xlim = c(0, 1), ylim=c(0, 1), main = "Contour plot")
par(mfrow = c(1, 3))
# 3D plain scatterplot of the CDF, plot of the CDF and contour plot
scatterplot3d(u[,1], u[,2], cdf, color="red", main="CDF", xlab = "u1", ylab="u2", zlab="pCopula",pch=".")
persp(mycopula, pCopula, main = "CDF")
contour(mycopula, pCopula, xlim = c(0, 1), ylim=c(0, 1), main = "Contour plot")
# 3D plain scatterplot of the multivariate distribution
par(mfrow = c(1, 2))
scatterplot3d(v[,1],v[,2], pdf_mvd, color="red", main="Density", xlab = "u1", ylab="u2", zlab="pMvdc",pch=".")
scatterplot3d(v[,1],v[,2], cdf_mvd, color="red", main="CDF", xlab = "u1", ylab="u2", zlab="pMvdc",pch=".")
persp(multivariate_dist, dMvdc, xlim = c(-4, 4), ylim=c(0, 2), main = "Density")
contour(multivariate_dist, dMvdc, xlim = c(-4, 4), ylim=c(0, 2), main = "Contour plot")
persp(multivariate_dist, pMvdc, xlim = c(-4, 4), ylim=c(0, 2), main = "CDF")
contour(multivariate_dist, pMvdc, xlim = c(-4, 4), ylim=c(0, 2), main = "Contour plot")
對(duì)多元分布的密度和CDF作圖可能更有趣.注意散點(diǎn)圖和密度圖可能會(huì)比較誤導(dǎo).這也是為什么很多時(shí)候我寧可看輪廓圖.
常見(jiàn)阿基米德copulas的密度和CDF的圖像比較
把Frank反惕,Gumbel和Clayton的密度拿來(lái)比較看看應(yīng)該很有趣尝艘。通過(guò)copula包內(nèi)的方法可以很容易做到這一點(diǎn)。注意姿染,我對(duì)參數(shù)的選擇是隨機(jī)的背亥。
frank <- frankCopula(dim = 2, param = 3)
clayton <- claytonCopula(dim = 2, param = 1.2)
gumbel <- gumbelCopula(dim = 2, param = 1.5)
par(mfrow = c(1, 3))
# Density plot
persp(frank, dCopula, main ="Frank copula density")
persp(clayton, dCopula, main ="Clayton copula density")
persp(gumbel, dCopula, main ="Gumbel copula density")
# Contour plot of the densities
contour(frank, dCopula, xlim = c(0, 1), ylim=c(0, 1), main = "Contour plot Frank")
contour(clayton, dCopula, xlim = c(0, 1), ylim=c(0, 1), main = "Contour plot Clayton")
contour(gumbel, dCopula, xlim = c(0, 1), ylim=c(0, 1), main = "Contour plot Gumbel")