TaoYan
Computing the correlation matrix
The built-in R function cor() computes correlation coefficients: cor(x, method = c("pearson", "kendall", "spearman")). If the data contain missing values, use cor(x, method = "pearson", use = "complete.obs").
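As a minimal sketch of how the use argument changes the result, the example below injects a few missing values into a copy of mtcars (the injected NAs and the variable names r_default, r_complete and r_pairwise are illustrative assumptions, not part of the original tutorial).

```r
# Copy three columns and inject missing values purely for illustration
x <- mtcars[, c("mpg", "disp", "hp")]
x$mpg[c(2, 5)] <- NA
x$hp[7] <- NA

# Default (use = "everything"): any pair involving a column with NA yields NA
r_default <- cor(x)

# complete.obs: drop every row that has at least one NA, then correlate
r_complete <- cor(x, use = "complete.obs")

# pairwise.complete.obs: for each pair, use the rows complete for that pair
r_pairwise <- cor(x, use = "pairwise.complete.obs")

round(r_complete, 2)
round(r_pairwise, 2)
```

With complete.obs every coefficient is based on the same rows; with pairwise.complete.obs different cells of the matrix may be based on different numbers of observations.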
Importing data
If the data are in a tab-delimited .txt file, import them with
my_data <- read.delim(file.choose())
For a .csv file, use
my_data <- read.csv(file.choose())
Here we use mtcars, one of R's built-in datasets.
data(mtcars)  # load the dataset
mydata <- mtcars[, c(1,3,4,5,6,7)]
head(mydata, 6)  # show the first 6 rows
Computing the correlation coefficient matrix
res <- cor(mydata)
round(res, 2)  # round to two decimal places
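The call to cor() above uses the default method, Pearson. As a small sketch of how the three methods relate, the snippet below compares them on a single pair of variables and checks that Spearman's coefficient equals Pearson's coefficient computed on ranks. The choice of mpg and wt is only for illustration.

```r
x <- mtcars$mpg
y <- mtcars$wt

r_pearson  <- cor(x, y)                       # linear association
r_spearman <- cor(x, y, method = "spearman")  # rank-based, monotonic association
r_kendall  <- cor(x, y, method = "kendall")   # based on concordant/discordant pairs

# Spearman's rho is Pearson's r applied to the ranks of the data
r_rank <- cor(rank(x), rank(y))

c(pearson = r_pearson, spearman = r_spearman, kendall = r_kendall)
all.equal(r_spearman, r_rank)
```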
cor() only computes the correlation coefficients; it does not report significance levels (p-values). The rcorr() function in the Hmisc package returns both the correlation coefficients and their p-values: rcorr(x, type = c("pearson", "spearman")).
The output of rcorr() is a list containing the following elements:
- r: the correlation matrix
- n: the matrix of the number of observations used in analyzing each pair of variables
- P: the p-values corresponding to the significance levels of the correlations
library(Hmisc)  # load the package
res2 <- rcorr(as.matrix(mydata))
res2
# res2$r and res2$P extract the correlation coefficients and the p-values
res2$r
res2$P
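If Hmisc is not available, a similar p-value matrix can be built with base R alone. The sketch below loops over all pairs of columns with cor.test(); the name p_mat and the decision to leave the diagonal as NA are assumptions made for this illustration, and the values should be close to, though not guaranteed identical to, what rcorr() reports.

```r
mydata <- mtcars[, c(1, 3, 4, 5, 6, 7)]
vars <- colnames(mydata)

# Empty matrix of p-values; the diagonal is left as NA
p_mat <- matrix(NA_real_, nrow = length(vars), ncol = length(vars),
                dimnames = list(vars, vars))

for (i in seq_along(vars)) {
  for (j in seq_along(vars)) {
    if (i != j) {
      # Pearson correlation test for the pair (i, j)
      p_mat[i, j] <- cor.test(mydata[[i]], mydata[[j]])$p.value
    }
  }
}

signif(p_mat, 2)
```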
To combine the correlation coefficients and their p-values into a single table, define a custom function, flattenCorrMatrix.
# ++++++++++++++++++++++++++++
# flattenCorrMatrix
# ++++++++++++++++++++++++++++
# cormat : matrix of the correlation coefficients
# pmat : matrix of the correlation p-values
flattenCorrMatrix <- function(cormat, pmat) {
  ut <- upper.tri(cormat)  # logical mask selecting the upper triangle
  data.frame(
    row = rownames(cormat)[row(cormat)[ut]],
    column = rownames(cormat)[col(cormat)[ut]],
    cor = cormat[ut],
    p = pmat[ut]
  )
}
# Example
res3 <- rcorr(as.matrix(mtcars[,1:7]))
flattenCorrMatrix(res3$r, res3$P)
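To see why the upper.tri() indexing in flattenCorrMatrix yields one row per variable pair, here is a small sketch on a hand-made 3 x 3 matrix. The matrix m and the name pairs_df are made up for this illustration.

```r
# Toy symmetric "correlation" matrix with named rows and columns
m <- matrix(c(1.0, 0.2, 0.5,
              0.2, 1.0, 0.8,
              0.5, 0.8, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

ut <- upper.tri(m)  # TRUE strictly above the diagonal

# row(m) and col(m) give the row/column index of every cell;
# subsetting them with ut keeps only the cells above the diagonal
pairs_df <- data.frame(
  row    = rownames(m)[row(m)[ut]],
  column = rownames(m)[col(m)[ut]],
  cor    = m[ut]
)
pairs_df
```

A matrix with n variables produces n(n-1)/2 rows, so each pair appears exactly once and the diagonal is dropped.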
Visualizing the correlation matrix
There are several ways to visualize a correlation matrix. The four main ones are:
- symnum() function
- corrplot() function to plot a correlogram
- scatter plots
- heatmap
symnum() function
Basic usage:
symnum(x, cutpoints = c(0.3, 0.6, 0.8, 0.9, 0.95), symbols = c(" ", ".", ",", "+", "*", "B"),
abbr.colnames = TRUE) # absolute correlations in 0-0.3 are shown as a space, 0.3-0.6 as ".", and so on
Example:
symnum(res, abbr.colnames = FALSE)  # abbr.colnames controls whether column names are abbreviated
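The following sketch uses custom cutpoints on a smaller matrix. One detail worth knowing: symnum() treats its input as a correlation matrix by default only when cutpoints is not supplied, so corr = TRUE is set explicitly here. The three-column subset and the two cutpoints are arbitrary choices for illustration.

```r
res_small <- cor(mtcars[, c("mpg", "disp", "hp")])

# With corr = TRUE the cutpoints are extended to 0 and 1, absolute values
# are used, and the number of symbols must be one more than the cutpoints given
s <- symnum(res_small,
            cutpoints = c(0.5, 0.8),
            symbols = c(" ", "+", "*"),
            corr = TRUE,
            abbr.colnames = FALSE)
print(s)
```

Only the lower triangle is filled in, and the printed legend maps each symbol to its interval.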
corrplot() function to plot a correlogram
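This function comes from the corrplot package and shows the strength of each correlation through color intensity. The main arguments are:
- type: "upper", "lower", or "full"; display the upper triangle, the lower triangle, or the full matrix
- order: the method used to order the variables; here "hclust" (hierarchical clustering)
- tl.col (text label color) and tl.srt (text label string rotation): control the label color and rotation angle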
library(corrplot)  # load the package first
corrplot(res, type = "upper", order = "hclust", tl.col = "black", tl.srt = 45)
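To give a feel for what ordering by hierarchical clustering means, the sketch below reorders the correlation matrix with base R's hclust(), using 1 - correlation as the distance. This is an approximation of the idea and an assumption on my part; corrplot's internal ordering may differ in its details.

```r
mydata <- mtcars[, c(1, 3, 4, 5, 6, 7)]
res <- cor(mydata)

# Treat strongly positively correlated variables as "close"
d <- as.dist(1 - res)
hc <- hclust(d)

# hc$order is the left-to-right order of the leaves in the dendrogram
ord <- hc$order
res_ordered <- res[ord, ord]

round(res_ordered, 2)
```

After reordering, variables that correlate strongly with one another sit next to each other, which is what makes blocks of similar color visible in the correlogram.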
The plot can also take significance into account:
# Insignificant correlations are left blank
corrplot(res2$r, type="upper", order="hclust", p.mat = res2$P, sig.level = 0.01, insig = "blank")
Use chart.Correlation(): Draw scatter plots
chart.Correlation() comes from the PerformanceAnalytics package.
library(PerformanceAnalytics)  # load the package
chart.Correlation(mydata, histogram=TRUE, pch=19)
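How to read the resulting plot:
- The diagonal shows the distribution of each variable
- The panels below the diagonal show bivariate scatter plots with a fitted line
- The panels above the diagonal show the correlation coefficients and their significance levels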
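A rough base-R analogue of this layout can be put together with pairs(). The sketch below is not how chart.Correlation() is implemented; panel_cor is a hypothetical helper written for this example, and it prints only the coefficient, without significance stars or histograms.

```r
mydata <- mtcars[, c(1, 3, 4, 5, 6, 7)]

# Hypothetical helper: print the Pearson correlation in the middle of a panel
panel_cor <- function(x, y, ...) {
  old <- par(usr = c(0, 1, 0, 1))  # temporarily switch to unit coordinates
  on.exit(par(old))                # restore the previous coordinates on exit
  text(0.5, 0.5, format(cor(x, y), digits = 2))
}

# Scatter plots with a smooth line below the diagonal, coefficients above it
pairs(mydata, lower.panel = panel.smooth, upper.panel = panel_cor)
```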
heatmap()
col <- colorRampPalette(c("blue", "white", "red"))(20)  # build a custom 20-color palette
heatmap(x = res, col = col, symm = TRUE)  # symm: treat x as a symmetric matrix
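As a brief sketch of what colorRampPalette() produces: it returns a function, and calling that function with n gives n interpolated colors as hex strings. The names make_palette, col20 and col21 are illustrative.

```r
# colorRampPalette() returns a function that interpolates between the given colors
make_palette <- colorRampPalette(c("blue", "white", "red"))

col20 <- make_palette(20)  # 20 colors from blue through white to red
col21 <- make_palette(21)  # an odd count includes the exact midpoint color

head(col20, 3)
col21[11]  # middle of the 21-color palette
```

The first color is pure blue and the last pure red, so in the heatmap the lowest values are drawn in blue and the highest in red.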