R and Statistics 1: t-tests and rank-sum tests
R and Statistics 2: analysis of variance
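t-tests and analysis of variance are mainly for continuous variables; the chi-squared test is mainly for categorical variables.
1. Goodness-of-fit test
The goodness-of-fit test is one of the main uses of the chi-squared statistic in significance testing. Starting from the assumed population distribution, it computes the expected frequency of each category of a categorical variable, compares these with the observed frequencies, and judges whether the expected and observed frequencies differ significantly. That is how it analyzes a categorical variable.
Put simply, it tests whether the distribution of the sample data is consistent with a known population distribution.
# Build the data set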
men <- c(11,120,60,45)
women <- c(20,102,39,30)
df <- as.data.frame(rbind(men,women))
colnames(df) <- c('AB','O','A','B')
df
# AB O A B
# men 11 120 60 45
# women 20 102 39 30
The chisq.test() function
Test whether the four blood types are equally frequent within the male group
chisq.test(men)
# Chi-squared test for given probabilities
# data: men
# X-squared = 105.46, df = 3, p-value < 2.2e-16
## The p-value is far below 0.05, so the four blood types are not equally frequent among the men
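Suppose the population proportions of the four blood types are known to be 0.1, 0.5, 0.2 and 0.2, and we want to check whether the blood-type distribution of this male group matches the population. The argument p takes the known population proportions.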
chisq.test(men,p=c(0.1,0.5,0.2,0.2))
# Chi-squared test for given probabilities
# data: men
# X-squared = 10.335, df = 3, p-value = 0.01592
## The p-value is below 0.05, so the male group's distribution does not match the population
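To make the mechanics of the goodness-of-fit test concrete, the statistic can be rebuilt by hand from the expected counts. The following is a minimal sketch using the same counts and proportions as above; the variable names p0, expected, stat and p_value are illustrative and not part of the original code.

```r
# Observed blood-type counts for the male group (AB, O, A, B), from the text
men <- c(11, 120, 60, 45)
# Population proportions assumed in the text
p0 <- c(0.1, 0.5, 0.2, 0.2)

# Expected counts under the null hypothesis: sample size times proportion
expected <- sum(men) * p0

# Pearson statistic: sum of (observed - expected)^2 / expected
stat <- sum((men - expected)^2 / expected)

# Upper-tail p-value with (number of categories - 1) degrees of freedom
p_value <- pchisq(stat, df = length(men) - 1, lower.tail = FALSE)

# Compare the hand calculation with the built-in test
builtin <- chisq.test(men, p = p0)
print(c(manual = stat, builtin = unname(builtin$statistic)))
print(c(manual = p_value, builtin = builtin$p.value))
```

The two rows printed should agree, which shows that chisq.test() with p is doing nothing more than this comparison of observed and expected counts.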
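2. Chi-squared test of homogeneity and chi-squared test of independence
The two are written the same way in R; only the interpretation differs.
Chi-squared test of homogeneity: compares whether the proportions of each category are the same across the levels of a grouping variable.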
chisq.test(df)
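# Pearson's Chi-squared test
# data: df
# X-squared = 6.8607, df = 3, p-value = 0.07647
## p > 0.05: no significant difference between the blood-type distributions of men and women, i.e. blood type is unrelated to sex.
Chi-squared test of independence: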
chisq.test(df)
# Pearson's Chi-squared test
# data: df
# X-squared = 6.8607, df = 3, p-value = 0.07647
## p > 0.05: no significant association between the row variable (sex) and the column variable (blood type)
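Both readings rest on the same calculation: the expected count of each cell is its row total times its column total divided by the grand total. A minimal sketch of that calculation, assuming the same counts as above (tab, expected, stat and dof are illustrative names; a plain matrix is used instead of the data frame):

```r
# Same counts as in the text, kept as a matrix
men <- c(11, 120, 60, 45)
women <- c(20, 102, 39, 30)
tab <- rbind(men, women)
colnames(tab) <- c("AB", "O", "A", "B")

# Expected counts if sex and blood type are unrelated:
# row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic and its degrees of freedom: (rows - 1) * (columns - 1)
stat <- sum((tab - expected)^2 / expected)
dof <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_value <- pchisq(stat, df = dof, lower.tail = FALSE)

# Compare with the built-in test
builtin <- chisq.test(tab)
print(round(expected, 2))
print(c(manual = stat, builtin = unname(builtin$statistic)))
```

The expected counts are also available directly as chisq.test(tab)$expected, which is a quick way to check that no cell is too sparse for the chi-squared approximation.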
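3. CMH test: a stratified test, applied to data split into strata
When the row variable is an unordered category and the column variable is an ordered one, the ranking cannot be ignored, so the CMH test should be used rather than Pearson's chi-squared test.
# Build a data set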
Rabbits <- array(c(0,0,6,5,
3,0,3,6,
6,2,1,0,
5,6,1,0,
2,5,0,0),
dim=c(2,2,5),
dimnames = list(
Delay=c('None','1.5h'),
Response=c('Cured','Died'),
Penicillin.level=c('1/8','1/4','1/2','1','4')))
Rabbits
# , , Penicillin.level = 1/8
# Response
# Delay Cured Died
# None 0 6
# 1.5h 0 5
# , , Penicillin.level = 1/4
# Response
# Delay Cured Died
# None 3 3
# 1.5h 0 6
# , , Penicillin.level = 1/2
# Response
# Delay Cured Died
# None 6 1
# 1.5h 2 0
# , , Penicillin.level = 1
# Response
# Delay Cured Died
# None 5 1
# 1.5h 6 0
# , , Penicillin.level = 4
# Response
# Delay Cured Died
# None 2 0
# 1.5h 5 0
Use the CMH test to check whether delaying the injection affects the rabbits' outcome, stratified by penicillin level.
The mantelhaen.test() function
mantelhaen.test(Rabbits)
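# Mantel-Haenszel chi-squared test with continuity correction
# data: Rabbits
# Mantel-Haenszel X-squared = 0.074445, df = 1, p-value = 0.785
# alternative hypothesis: true common odds ratio is not equal to 1
# 95 percent confidence interval:
# 0.3111294 13.8643579
# sample estimates:
# common odds ratio
# 2.076923
The p-value is above 0.05, so the association is not statistically significant. Stratified over the five penicillin levels, the common odds ratio for immediate versus 1.5h-delayed injection is 2.076923, and its 95% confidence interval (0.31 to 13.86) includes 1.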
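The common odds ratio reported above is the Mantel-Haenszel estimate, which pools the 2x2 tables across strata instead of averaging per-stratum odds ratios. A minimal sketch of the calculation on the same data follows; num, den and or_mh are illustrative names, and the dimnames are omitted for brevity.

```r
# Same counts as in the text: 2 x 2 x 5 (Delay x Response x Penicillin level)
Rabbits <- array(c(0, 0, 6, 5,
                   3, 0, 3, 6,
                   6, 2, 1, 0,
                   5, 6, 1, 0,
                   2, 5, 0, 0),
                 dim = c(2, 2, 5))

# For each stratum with cells a, b (first row) and c, d (second row)
# and total n, accumulate a*d/n and b*c/n
num <- sum(apply(Rabbits, 3, function(t) t[1, 1] * t[2, 2] / sum(t)))
den <- sum(apply(Rabbits, 3, function(t) t[1, 2] * t[2, 1] / sum(t)))

# Mantel-Haenszel common odds ratio
or_mh <- num / den

# Compare with the estimate returned by the built-in test
fit <- mantelhaen.test(Rabbits)
print(c(manual = or_mh, builtin = unname(fit$estimate)))
```

Strata in which a whole row or column is zero contribute nothing to either sum, which is why this pooled estimate stays defined even though several per-stratum odds ratios are not.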
4. Chi-squared test for ordered categories
The mantelhaen.test() function
Satisfaction <-
as.table(array(c(1,2,0,0,3,3,1,2,
11,17,8,4,2,3,5,2,
1,0,0,0,1,3,0,1,
2,5,7,9,1,1,3,6),
dim=c(4,4,2),
dimnames=list(Income=c('<5000','5000-15000','15000-25000','>25000'),
'Job Satisfaction'=c('V_D','L_S','M_S','V_S'),
Gender=c('Female','Male'))))
Satisfaction
# , , Gender = Female
# Job Satisfaction
# Income V_D L_S M_S V_S
# <5000 1 3 11 2
# 5000-15000 2 3 17 3
# 15000-25000 0 1 8 5
# >25000 0 2 4 2
# , , Gender = Male
# Job Satisfaction
# Income V_D L_S M_S V_S
# <5000 1 1 2 1
# 5000-15000 0 3 5 1
# 15000-25000 0 0 7 3
# >25000 0 1 9 6
Income is an ordered categorical variable. The test shows no statistically significant relationship between income level and job satisfaction.
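The call that produces this conclusion can be sketched as follows, assuming the intended test is mantelhaen.test() applied to the three-way table with Gender as the stratum. For tables larger than 2x2 in each stratum, R runs the generalized CMH test; to the best of our understanding this version tests general association and does not assign scores to the ordered levels, so treat it as an approximation of an ordinal analysis.

```r
# Same table as in the text: Income x Job Satisfaction x Gender
Satisfaction <-
  as.table(array(c(1, 2, 0, 0, 3, 3, 1, 2,
                   11, 17, 8, 4, 2, 3, 5, 2,
                   1, 0, 0, 0, 1, 3, 0, 1,
                   2, 5, 7, 9, 1, 1, 3, 6),
                 dim = c(4, 4, 2),
                 dimnames = list(
                   Income = c("<5000", "5000-15000", "15000-25000", ">25000"),
                   "Job Satisfaction" = c("V_D", "L_S", "M_S", "V_S"),
                   Gender = c("Female", "Male"))))

# Generalized CMH test of Income vs. Job Satisfaction, stratified by Gender
fit <- mantelhaen.test(Satisfaction)
print(fit)

# Degrees of freedom are (rows - 1) * (columns - 1) for the generalized test
print(unname(fit$parameter))
```

A p-value above 0.05 here corresponds to the conclusion stated above.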
5. Chi-squared test for paired 2x2 tables (common in case-control studies and similar designs)
paired <- as.table(matrix(c(157,24,69,18),nrow = 2,dimnames = list(case=c('A','B'),control=c('A','B'))))
paired
# control
# case A B
# A 157 69
# B 24 18
The mcnemar.test() function
mcnemar.test(paired)
# McNemar's Chi-squared test with continuity correction
# data: paired
# McNemar's chi-squared = 20.817, df = 1, p-value = 5.053e-06
### p < 0.05: the proportions of A and B differ significantly between case and control (the discordant pairs, 69 vs 24, are not symmetric)
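McNemar's test uses only the discordant pairs, that is, the two off-diagonal cells. A minimal sketch of the continuity-corrected statistic computed by hand on the same table (n12, n21, stat and p_value are illustrative names):

```r
# Same paired table as in the text
paired <- as.table(matrix(c(157, 24, 69, 18), nrow = 2,
                          dimnames = list(case = c("A", "B"),
                                          control = c("A", "B"))))

# Discordant pairs: case A / control B, and case B / control A
n12 <- paired[1, 2]
n21 <- paired[2, 1]

# Continuity-corrected McNemar statistic: (|n12 - n21| - 1)^2 / (n12 + n21)
stat <- (abs(n12 - n21) - 1)^2 / (n12 + n21)

# Upper-tail p-value with 1 degree of freedom
p_value <- pchisq(stat, df = 1, lower.tail = FALSE)

# Compare with the built-in test
builtin <- mcnemar.test(paired)
print(c(manual = stat, builtin = unname(builtin$statistic)))
```

The concordant cells (157 and 18) never enter the formula, so the test says nothing about how often case and control agree, only about whether disagreements lean in one direction.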