AUC and ROC

AUC: Area Under the Curve

AUROC: Area Under the Receiver Operating Characteristic curve

1. Overview of the ROC curve

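The ROC curve is a visual tool for evaluating classification models. It is drawn with both axes limited to the range 0 to 1: the horizontal axis is the false positive rate (FPR, the probability that an actual negative is incorrectly judged positive) and the vertical axis is the true positive rate (TPR, the probability that an actual positive is correctly judged positive). In general, the more the curve bows toward the upper-left corner, the better the model performs; the closer it lies to the diagonal, the closer the model is to random guessing.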

2. AUC

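AUC is the area under the ROC curve and summarizes the curve as a single number. Because both axes of the ROC curve run from 0 to 1, the AUC is a portion of the 1x1 unit square, so its value lies between 0 and 1.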
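
As a rough illustration of the area described above, the sketch below applies the trapezoidal rule to a list of ROC points. The helper name `auc_trapezoid` and the sample points are assumptions made up for this example; they do not come from the original post.

```r
# Hypothetical helper (not part of any package): area under a
# piecewise-linear curve, computed with the trapezoidal rule.
auc_trapezoid <- function(fpr, tpr) {
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# The diagonal (random guessing) covers half of the unit square.
auc_diagonal <- auc_trapezoid(c(0, 1), c(0, 1))

# A curve that goes straight up and then across covers the whole square.
auc_perfect <- auc_trapezoid(c(0, 0, 1), c(0, 1, 1))

# Made-up ROC points for an imperfect classifier.
fpr <- c(0, 0,   0.25, 0.25, 0.5,  0.75, 0.75, 1)
tpr <- c(0, 0.5, 0.5,  0.75, 0.75, 0.75, 1,    1)
auc_toy <- auc_trapezoid(fpr, tpr)

print(c(diagonal = auc_diagonal, perfect = auc_perfect, toy = auc_toy))
```

The two extreme cases show why an AUC near 0.5 suggests a model close to random guessing, while an AUC near 1 suggests a model that separates the classes well.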

3. Plotting the ROC curve

3.1 Basic concepts

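Predicted probability and threshold:
The output of a classification model includes a probability between 0 and 1, which represents how likely the sample is to belong to a given class. A threshold then splits the samples: those with a probability above the threshold are classified as positive, and those with a probability below it are classified as negative.
TPR and FPR: The horizontal axis of the ROC curve is FPR and the vertical axis is TPR. FPR is the probability that a negative sample is incorrectly predicted as positive; TPR is the probability that a positive sample is correctly predicted as positive.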
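
To make these definitions concrete, here is a minimal sketch that applies one threshold to a handful of predicted probabilities and counts the resulting outcomes. The toy labels, the probabilities, and the helper name `rates_at` are all assumptions invented for illustration.

```r
# Made-up toy data: true labels (1 = positive, 0 = negative)
# and the probabilities a model might have predicted for them.
labels <- c(1, 1, 0, 1, 0, 0, 1, 0)
probs  <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)

# Hypothetical helper: TPR and FPR at a single threshold.
rates_at <- function(labels, probs, threshold) {
  predicted <- as.integer(probs > threshold)   # above threshold => positive
  tp <- sum(predicted == 1 & labels == 1)      # true positives
  fp <- sum(predicted == 1 & labels == 0)      # false positives
  fn <- sum(predicted == 0 & labels == 1)      # false negatives
  tn <- sum(predicted == 0 & labels == 0)      # true negatives
  c(TPR = tp / (tp + fn), FPR = fp / (fp + tn))
}

rates <- rates_at(labels, probs, threshold = 0.5)
print(rates)
```

With this toy data and a threshold of 0.5, three of the four positives and one of the four negatives are called positive, so TPR is 0.75 and FPR is 0.25.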

3.2 Steps for plotting the ROC curve

Sort all samples by predicted probability in descending order.
Move the threshold from 1 down to 0, computing the (FPR, TPR) pair at each threshold.
Plot these pairs in a Cartesian coordinate system.
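
The three steps above can be sketched in base R as follows. The toy data is invented for illustration, and the sketch assumes no two samples share the same probability; tied scores would need to be grouped before counting.

```r
# Made-up toy data: true labels (1 = positive) and predicted probabilities.
labels <- c(1, 1, 0, 1, 0, 0, 1, 0)
probs  <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)

# Step 1: sort the samples by predicted probability, highest first.
ord <- order(probs, decreasing = TRUE)
labels_sorted <- labels[ord]

# Step 2: lower the threshold past one sample at a time. Each cumulative
# count gives the number of positives/negatives currently called positive.
# The leading 0 is the point where the threshold is above every score.
tpr <- c(0, cumsum(labels_sorted == 1) / sum(labels == 1))
fpr <- c(0, cumsum(labels_sorted == 0) / sum(labels == 0))

# Step 3: plot the (FPR, TPR) pairs, with the diagonal for reference.
plot(fpr, tpr, type = "l", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "False Positive Rate", ylab = "True Positive Rate")
abline(a = 0, b = 1, lty = 2)
```

The curve starts at (0, 0), where nothing is called positive, and ends at (1, 1), where everything is.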

4. ROC and AUC in R

# install.packages("pROC")
# install.packages("randomForest")
library(pROC) 
library(randomForest) #Random Forest is a way to classify samples and we can change the threshold that we use to make those decisions.
set.seed(420) # this will make my results match yours
num.samples <- 100
weight <- sort(rnorm(n=num.samples, mean=172, sd=29))
obese <- ifelse(test=(runif(n=num.samples) < (rank(weight)/num.samples)), 
                yes=1, no=0)
obese
plot(x=weight, y=obese)

## fit a logistic regression to the data...
glm.fit <- glm(obese ~ weight, family=binomial)
lines(weight, glm.fit$fitted.values)

draw ROC and AUC using pROC

#######################################
##
## draw ROC and AUC using pROC
##
#######################################
## NOTE: By default, the graphs come out looking terrible
## The problem is that ROC graphs should be square, since the x and y axes
## both go from 0 to 1. However, the window in which I draw them isn't square
## so extra whitespace is added to pad the sides.
roc(obese, glm.fit$fitted.values, plot=TRUE)
## Now let's configure R so that it prints the graph as a square.
##
par(pty = "s") ## pty sets the aspect ratio of the plot region. Two options:
##                "s" - creates a square plotting region
##                "m" - (the default) creates a maximal plotting region
roc(obese, glm.fit$fitted.values, plot=TRUE)
## NOTE: By default, roc() uses specificity on the x-axis and the values range
## from 1 to 0. This makes the graph look like what we would expect, but the
## x-axis itself might induce a headache. To use 1-specificity (i.e. the 
## False Positive Rate) on the x-axis, set "legacy.axes" to TRUE.
roc(obese, glm.fit$fitted.values, plot=TRUE, legacy.axes=TRUE)
## If you want to rename the x and y axes...
roc(obese, glm.fit$fitted.values, plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Positive Percentage")
## We can also change the color of the ROC line, and make it wider...
roc(obese, glm.fit$fitted.values, plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Positive Percentage", col="#377eb8", lwd=4)
## If we want to find out the optimal threshold we can store the 
## data used to make the ROC graph in a variable...
roc.info <- roc(obese, glm.fit$fitted.values, legacy.axes=TRUE)
str(roc.info)
## and then extract just the information that we want from that variable.
roc.df <- data.frame(
  tpp=roc.info$sensitivities*100, ## tpp = true positive percentage
  fpp=(1 - roc.info$specificities)*100, ## fpp = false positive percentage
  thresholds=roc.info$thresholds)
head(roc.df) ## head() will show us the values for the upper right-hand corner
## of the ROC graph, when the threshold is so low 
## (negative infinity) that every single sample is called "obese".
## Thus TPP = 100% and FPP = 100%
tail(roc.df) ## tail() will show us the values for the lower left-hand corner
## of the ROC graph, when the threshold is so high (infinity) 
## that every single sample is called "not obese". 
## Thus, TPP = 0% and FPP = 0%
## now let's look at the thresholds between TPP 60% and 80%...
roc.df[roc.df$tpp > 60 & roc.df$tpp < 80,]
## We can calculate the area under the curve...
roc(obese, glm.fit$fitted.values, plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Positive Percentage", col="#377eb8", lwd=4, print.auc=TRUE)
## ...and the partial area under the curve.
roc(obese, glm.fit$fitted.values, plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Positive Percentage", col="#377eb8", lwd=4, print.auc=TRUE, print.auc.x=45, partial.auc=c(100, 90), auc.polygon = TRUE, auc.polygon.col = "#377eb822")
#######################################
##
## Now let's fit the data with a random forest...
##
#######################################
rf.model <- randomForest(factor(obese) ~ weight)
## ROC for random forest
roc(obese, rf.model$votes[,1], plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Positive Percentage", col="#4daf4a", lwd=4, print.auc=TRUE)
#######################################
##
## Now layer logistic regression and random forest ROC graphs..
##
#######################################
roc(obese, glm.fit$fitted.values, plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Positive Percentage", col="#377eb8", lwd=4, print.auc=TRUE)
plot.roc(obese, rf.model$votes[,1], percent=TRUE, col="#4daf4a", lwd=4, print.auc=TRUE, add=TRUE, print.auc.y=40)
legend("bottomright", legend=c("Logistic Regression", "Random Forest"), col=c("#377eb8", "#4daf4a"), lwd=4)
#######################################
##
## Now that we're done with our ROC fun, let's reset the par() variable
## we changed. Setting pty back to "m" restores the default maximal plotting region.
##
#######################################
par(pty = "m")
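
The script above stores the thresholds in roc.df so that a cut-off can be picked by eye. One common rule of thumb, which the original post does not use and which is shown here only as an assumption, is to take the threshold that maximizes Youden's J statistic (TPR minus FPR). The sketch below uses base R and a made-up toy table named `toy.df` that mimics the tpp/fpp/thresholds layout of roc.df, so it runs without pROC.

```r
# Made-up toy data standing in for the obese/weight example.
labels <- c(1, 1, 0, 1, 0, 0, 1, 0)
probs  <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)

# Candidate thresholds (an assumption of this sketch): midpoints between
# consecutive sorted scores, plus -Inf and Inf for the two extreme corners.
s <- sort(unique(probs))
thresholds <- c(-Inf, (head(s, -1) + tail(s, -1)) / 2, Inf)

# toy.df is a hypothetical table with the same columns as roc.df.
toy.df <- data.frame(
  tpp = sapply(thresholds, function(t) mean(probs[labels == 1] > t) * 100),
  fpp = sapply(thresholds, function(t) mean(probs[labels == 0] > t) * 100),
  thresholds = thresholds
)

# Youden's J = TPR - FPR, here computed from the percentages.
toy.df$youden <- (toy.df$tpp - toy.df$fpp) / 100

# which.max() returns the first row when several thresholds tie.
best <- toy.df[which.max(toy.df$youden), ]
print(best)
```

Youden's J weighs false positives and false negatives equally; when one kind of error is more costly, a different cut-off from the table may be the better choice.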

References:
https://www.bilibili.com/video/BV1SK4y1K7v3
https://www.youtube.com/watch?v=qcvAqAH60Yw

Last edited: 2022.07.01 10:15:04
