第2章 無序多分類和有序多分類

第2章

image.png

image.png
代碼案例

數(shù)據(jù)形式

image.png
#第 2章 代碼開始
> ## 利用 nnet 包中的函數(shù)multinom 翎卓,建立多元logistic回歸模型:
> library(foreign)
> library(nnet)
> library(ggplot2)
> library(reshape2)
> ml <- read.dta("hsbdemo.dta")#讀取.dta格式數(shù)據(jù)
> #View(ml)#查看數(shù)據(jù)的具體形式
> with(ml, table(ses, prog))#ses行和prog列的表格
        prog
ses      general academic vocation
  low         16       19       12
  middle      20       44       31
  high         9       42        7
> #with在數(shù)據(jù)框上的操作
> #do.call(操作泪掀,數(shù)據(jù))
> with(ml, do.call(rbind, tapply(write, prog, function(x) c(M = mean(x), SD = sd(x)))))
                M       SD
general  51.33333 9.397775
academic 56.25714 7.943343
vocation 46.76000 9.318754
> ml$prog2 <- relevel(ml$prog, ref = "academic")#以academic作為參考
> test <- multinom(prog2 ~ ses + write, data = ml)#回歸
# weights:  15 (8 variable)
initial  value 219.722458 
iter  10 value 179.982880
final  value 179.981726 
converged
> summary(test)
Call:
multinom(formula = prog2 ~ ses + write, data = ml)

Coefficients:
         (Intercept)  sesmiddle    seshigh      write
general     2.852198 -0.5332810 -1.1628226 -0.0579287
vocation    5.218260  0.2913859 -0.9826649 -0.1136037

#行g(shù)eneral 和vocation是指相對于academic的值徐鹤,列sesmiddle   seshigh是指相對于seslow的值

Std. Errors:
         (Intercept) sesmiddle   seshigh      write
general     1.166441 0.4437323 0.5142196 0.02141097
vocation    1.163552 0.4763739 0.5955665 0.02221996

Residual Deviance: 359.9635 
AIC: 375.9635 
> # 2-tailed z test
> z <- summary(test)$coefficients/summary(test)$standard.errors #Z值的計算
> z
         (Intercept)  sesmiddle   seshigh     write
general     2.445214 -1.2018081 -2.261334 -2.705562
vocation    4.484769  0.6116747 -1.649967 -5.112689
> p <- (1 - pnorm(abs(z), 0, 1)) * 2##P值的計算
> p
          (Intercept) sesmiddle    seshigh        write
general  0.0144766100 0.2294379 0.02373856 6.818902e-03
vocation 0.0000072993 0.5407530 0.09894976 3.176045e-07
#general 相對于academic,因素seshigh和write是有意義的
#vocation  相對于academic邀层,因素write是有意義的
> # extract the coefficients from the model and exponentiate
> exp(coef(test)) #OR值
         (Intercept) sesmiddle   seshigh     write
general     17.32582 0.5866769 0.3126026 0.9437172
vocation   184.61262 1.3382809 0.3743123 0.8926116
#解釋OR值返敬,write每提高一個單位,產(chǎn)生general是產(chǎn)生academic的0.9437172倍
#write每提高一個單位寥院,產(chǎn)生vocation是產(chǎn)生academic的0.8926116倍
#
> head(pp <- fitted(test))
   academic   general  vocation
1 0.1482764 0.3382454 0.5134781
2 0.1202017 0.1806283 0.6991700
3 0.4186747 0.2368082 0.3445171
4 0.1726885 0.3508384 0.4764731
5 0.1001231 0.1689374 0.7309395
6 0.3533566 0.2377976 0.4088458

> dses <- data.frame(ses = c("low", "middle", "high"), write = mean(ml$write))
#這里把寫作write取均值劲赠,即,每個人的寫作貢獻(xiàn)固定了秸谢,只有社會經(jīng)濟(jì)地位的變化了
> predict(test, newdata = dses, "probs")#預(yù)測
   academic   general  vocation
1 0.4396845 0.3581917 0.2021238
2 0.4777488 0.2283353 0.2939159
3 0.7009007 0.1784939 0.1206054
> dwrite <- data.frame(ses = rep(c("low", "middle", "high"), each = 41), write = rep(c(30:70),3))
> # store the predicted probabilities for each value of ses and write
> pp.write <- cbind(dwrite, predict(test, newdata = dwrite, type = "probs", se = TRUE))
> # calculate the mean probabilities within each level of ses
> by(pp.write[, 3:5], pp.write$ses, colMeans)
pp.write$ses: high
 academic   general  vocation 
0.6164315 0.1808037 0.2027648 
-------------------------------------------------------- 
pp.write$ses: low
 academic   general  vocation 
0.3972977 0.3278174 0.2748849 
-------------------------------------------------------- 
pp.write$ses: middle
 academic   general  vocation 
0.4256198 0.2010864 0.3732938 
> #melt data set to long for ggplot2
> lpp <- melt(pp.write, id.vars = c("ses", "write"), value.name = "probability")
> head(lpp)  # view first few rows
  ses write variable probability
1 low    30 academic  0.09843588
2 low    31 academic  0.10716868
3 low    32 academic  0.11650390
4 low    33 academic  0.12645834
5 low    34 academic  0.13704576
6 low    35 academic  0.14827643
> # plot predicted probabilities across write values for each level of ses
> # facetted by program type
> ggplot(lpp, aes(x = write, y = probability, colour = ses)) + geom_line() + facet_grid(variable ~., scales = "free")



> install.packages("mlogit")
> library(Formula)
> library(maxLik)
> library(miscTools)
> library(mlogit)
> data("Fishing", package = "mlogit")
> Fish <- mlogit.data(Fishing,shape = "wide",choice = "mode")
> summary(mlogit(mode ~ 0|income, data = Fish))

Call:
mlogit(formula = mode ~ 0 | income, data = Fish, method = "nr")

Frequencies of alternatives:choice
  beach    boat charter    pier 
0.11337 0.35364 0.38240 0.15059 

nr method
4 iterations, 0h:0m:0s 
g'(-H)^-1g = 8.32E-07 
gradient close to zero 

Coefficients :
                       Estimate  Std. Error z-value  Pr(>|z|)    
(Intercept):boat     7.3892e-01  1.9673e-01  3.7560 0.0001727 ***
(Intercept):charter  1.3413e+00  1.9452e-01  6.8955 5.367e-12 ***
(Intercept):pier     8.1415e-01  2.2863e-01  3.5610 0.0003695 ***
income:boat          9.1906e-05  4.0664e-05  2.2602 0.0238116 *  
income:charter      -3.1640e-05  4.1846e-05 -0.7561 0.4495908    
income:pier         -1.4340e-04  5.3288e-05 -2.6911 0.0071223 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-Likelihood: -1477.2
McFadden R^2:  0.013736 
Likelihood ratio test : chisq = 41.145 (p.value = 6.0931e-09)

2.2章 有序分類logistic回歸 即 等級資料

有序分類logistic回歸

案例

有可能把等級資料轉(zhuǎn)換成二分類資料就轉(zhuǎn)換凛澎,否則只能當(dāng)作等級資料或多分類資料;


image.png

是否從學(xué)校畢業(yè):不可能估蹄,有可能塑煎,極有可能;父母是否有學(xué)位:1表示有元媚,0表示無轧叽;公立還是私立學(xué)校:1公里,0私立

## 程序包MASS提供polr()函數(shù)可以進(jìn)行ordered logit或probit回歸

require(foreign)
require(ggplot2)
require(MASS)
require(Hmisc)
require(reshape2)

dat <- read.dta("ologit.dta")
#View(dat)
head(dat)
## one at a time, table apply, pared, and public
lapply(dat[, c("apply", "pared", "public")], table)
## three way cross tabs (xtabs) and flatten the table
ftable(xtabs(~ public + apply + pared, data = dat))
summary(dat$gpa)
sd(dat$gpa)

ggplot(dat, aes(x = apply, y = gpa)) +
  geom_boxplot(size = .75) +
  geom_jitter(alpha = .5) +
  facet_grid(pared ~ public, margins = TRUE) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))

## fit ordered logit model and store results 'm'
m <- polr(apply ~ pared + public + gpa, data = dat, Hess=TRUE)
## view a summary of the model
summary(m)

## store table
(ctable <- coef(summary(m)))
## calculate and store p values
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2
## combined table
(ctable <- cbind(ctable, "p value" = p))

(ci <- confint(m)) # default method gives profiled CIs
confint.default(m) # CIs assuming normality
## odds ratios
exp(coef(m))
## OR and CI
exp(cbind(OR = coef(m), ci))


sf <- function(y) {
  c('Y>=1' = qlogis(mean(y >= 1)),
    'Y>=2' = qlogis(mean(y >= 2)),
    'Y>=3' = qlogis(mean(y >= 3)))
}
(s <- with(dat, summary(as.numeric(apply) ~ pared + public + gpa, fun=sf)))

glm(I(as.numeric(apply) >= 2) ~ pared, family="binomial", data = dat)

glm(I(as.numeric(apply) >= 3) ~ pared, family="binomial", data = dat)

s[, 4] <- s[, 4] - s[, 3]
s[, 3] <- s[, 3] - s[, 3]
s # print

plot(s, which=1:3, pch=1:3, xlab='logit', main=' ', xlim=range(s[,3:4]))

newdat <- data.frame(
  pared = rep(0:1, 200),
  public = rep(0:1, each = 200),
  gpa = rep(seq(from = 1.9, to = 4, length.out = 100), 4))
newdat <- cbind(newdat, predict(m, newdat, type = "probs"))

##show first few rows
head(newdat)

lnewdat <- melt(newdat, id.vars = c("pared", "public", "gpa"),
                variable.name = "Level", value.name="Probability")
## view first few rows
head(lnewdat)

ggplot(lnewdat, aes(x = gpa, y = Probability, colour = Level)) +
  geom_line() + facet_grid(pared ~ public, labeller="label_both")
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末刊棕,一起剝皮案震驚了整個濱河市,隨后出現(xiàn)的幾起案子待逞,更是在濱河造成了極大的恐慌甥角,老刑警劉巖,帶你破解...
    沈念sama閱讀 219,427評論 6 508
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件识樱,死亡現(xiàn)場離奇詭異嗤无,居然都是意外死亡,警方通過查閱死者的電腦和手機(jī)怜庸,發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 93,551評論 3 395
  • 文/潘曉璐 我一進(jìn)店門当犯,熙熙樓的掌柜王于貴愁眉苦臉地迎上來,“玉大人割疾,你說我怎么就攤上這事嚎卫。” “怎么了宏榕?”我有些...
    開封第一講書人閱讀 165,747評論 0 356
  • 文/不壞的土叔 我叫張陵拓诸,是天一觀的道長。 經(jīng)常有香客問我麻昼,道長奠支,這世上最難降的妖魔是什么? 我笑而不...
    開封第一講書人閱讀 58,939評論 1 295
  • 正文 為了忘掉前任抚芦,我火速辦了婚禮倍谜,結(jié)果婚禮上迈螟,老公的妹妹穿的比我還像新娘。我一直安慰自己尔崔,他們只是感情好答毫,可當(dāng)我...
    茶點(diǎn)故事閱讀 67,955評論 6 392
  • 文/花漫 我一把揭開白布。 她就那樣靜靜地躺著您旁,像睡著了一般烙常。 火紅的嫁衣襯著肌膚如雪。 梳的紋絲不亂的頭發(fā)上鹤盒,一...
    開封第一講書人閱讀 51,737評論 1 305
  • 那天蚕脏,我揣著相機(jī)與錄音,去河邊找鬼侦锯。 笑死驼鞭,一個胖子當(dāng)著我的面吹牛,可吹牛的內(nèi)容都是我干的尺碰。 我是一名探鬼主播挣棕,決...
    沈念sama閱讀 40,448評論 3 420
  • 文/蒼蘭香墨 我猛地睜開眼,長吁一口氣:“原來是場噩夢啊……” “哼亲桥!你這毒婦竟也來了洛心?” 一聲冷哼從身側(cè)響起,我...
    開封第一講書人閱讀 39,352評論 0 276
  • 序言:老撾萬榮一對情侶失蹤题篷,失蹤者是張志新(化名)和其女友劉穎词身,沒想到半個月后,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體番枚,經(jīng)...
    沈念sama閱讀 45,834評論 1 317
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡法严,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 37,992評論 3 338
  • 正文 我和宋清朗相戀三年,在試婚紗的時候發(fā)現(xiàn)自己被綠了葫笼。 大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片深啤。...
    茶點(diǎn)故事閱讀 40,133評論 1 351
  • 序言:一個原本活蹦亂跳的男人離奇死亡,死狀恐怖路星,靈堂內(nèi)的尸體忽然破棺而出溯街,到底是詐尸還是另有隱情,我是刑警寧澤奥额,帶...
    沈念sama閱讀 35,815評論 5 346
  • 正文 年R本政府宣布苫幢,位于F島的核電站,受9級特大地震影響垫挨,放射性物質(zhì)發(fā)生泄漏韩肝。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 41,477評論 3 331
  • 文/蒙蒙 一九榔、第九天 我趴在偏房一處隱蔽的房頂上張望哀峻。 院中可真熱鬧涡相,春花似錦、人聲如沸剩蟀。這莊子的主人今日做“春日...
    開封第一講書人閱讀 32,022評論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽育特。三九已至丙号,卻和暖如春,著一層夾襖步出監(jiān)牢的瞬間缰冤,已是汗流浹背犬缨。 一陣腳步聲響...
    開封第一講書人閱讀 33,147評論 1 272
  • 我被黑心中介騙來泰國打工, 沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留棉浸,地道東北人怀薛。 一個月前我還...
    沈念sama閱讀 48,398評論 3 373
  • 正文 我出身青樓,卻偏偏與公主長得像迷郑,于是被迫代替她去往敵國和親枝恋。 傳聞我的和親對象是個殘疾皇子,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 45,077評論 2 355