Learn Gradient Boosting Algorithm for better predictions (with codes in R)

by Tavish Srivastava

http://www.analyticsvidhya.com/blog/2015/09/complete-guide-boosting-methods/

Introduction

The accuracy of a predictive model can be boosted in two ways: either by embracing feature engineering or by applying boosting algorithms straight away. Having participated in lots of data science competitions, I've noticed that people often prefer boosting algorithms, as they take less time and produce similar results.

There are multiple boosting algorithms: Gradient Boosting, XGBoost, AdaBoost, Gentle Boost, etc. Each has its own underlying mathematics, with slight variations in how they are applied. If you are new to this, great! You can learn all of these concepts in a week's time.

In this article, I've explained the underlying concepts and complexities of the Gradient Boosting Algorithm. In addition, I've shared an example to learn its implementation in R.

Note: This guide is meant for beginners. Hence, if you've already mastered this concept, you may skip this article.

Quick Explanation

While working with boosting algorithms, you'll soon come across two frequently occurring buzzwords: bagging and boosting. So, how are they different? Here's a one-line explanation:

Bagging: An approach where you take random samples of the data (with replacement), build a learner on each sample, and then simply average the learners' predictions.

Boosting: Similar, but the sampling is done more intelligently: we give progressively more weight to observations that are hard to classify.

Okay! I understand you have questions sprouting up, like "What do you mean by hard? How do I know how much additional weight to give a mis-classified observation?" I'll answer these in the subsequent sections. Keep calm and proceed.
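To make the bagging half of that contrast concrete, here is a minimal R sketch under my own assumptions (the toy data frame `df`, the binary outcome, and logistic regression as the base learner are illustrative choices, not from the article):

```r
# Bagging sketch: bootstrap samples + a simple mean over the learners' predictions
set.seed(1)
n  <- 200
df <- data.frame(x = runif(n))
df$y <- as.numeric(df$x > 0.5)                       # toy binary outcome

bag_predict <- function(df, n_models = 25) {
  preds <- sapply(seq_len(n_models), function(i) {
    boot <- df[sample(nrow(df), replace = TRUE), ]   # random sample with replacement
    fit  <- suppressWarnings(glm(y ~ x, data = boot, family = binomial))
    predict(fit, newdata = df, type = "response")
  })
  rowMeans(preds)                                    # "take simple means" across learners
}

p <- bag_predict(df)
mean((p > 0.5) == df$y)                              # accuracy of the bagged ensemble
```

Boosting differs precisely in the sampling step: instead of uniform bootstrap draws, later learners concentrate on the observations the earlier ones got wrong.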

Let’s begin with an easy example

Assume you are given a previous model M to improve on. Currently, you observe that the model has an accuracy of 80% (by any metric). How do you go about improving it?

One simple way is to build an entirely different model using a new set of input variables and better ensemble learners. But I have a much simpler suggestion. It goes like this:

Y = M(x) + error

What if the error is not white noise but has some correlation with the outcome (Y)? What if we can develop a model on this error term? Something like:

error = G(x) + error2

You'll probably see the accuracy improve, say to 84%. Let's take another step and regress against error2:

error2 = H(x) + error3

Now we combine all these together :

Y = M(x) + G(x) + H(x) + error3

This will probably have an accuracy of even more than 84%. What if I can find optimal weights for each of the three learners?

Y = alpha * M(x) + beta * G(x) + gamma * H(x) + error4

If we find good weights, we have probably built an even better model. This is the underlying principle of a boosting learner. When I read the theory for the first time, I had two quick questions:

Do we really see non-white-noise errors in regression/classification models? If not, how can we even use this algorithm?

Wow, if this is possible, why not get near 100% accuracy?

I'll answer these questions here, briefly. Boosting is generally done on weak learners, which do not have the capacity to reduce the error all the way down to white noise, so there is still structure left for the next learner to model. Secondly, boosting can lead to overfitting, so we need to stop at the right point.
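The chain Y = M(x) + G(x) + H(x) + error3 described above can be sketched in a few lines of R. The sine-shaped toy data and the use of shallow rpart trees as M, G and H are my own illustrative assumptions:

```r
library(rpart)  # recursive-partitioning trees, shipped with R

set.seed(42)
d <- data.frame(x = runif(300, 0, 10))
d$y <- sin(d$x) + rnorm(300, sd = 0.3)

ctrl <- rpart.control(maxdepth = 2)        # deliberately weak learners

# M(x): first model on y, leaving behind 'error'
M <- rpart(y ~ x, data = d, control = ctrl)
d$e1 <- d$y - predict(M, d)

# G(x): second model fitted on the error, leaving 'error2'
G <- rpart(e1 ~ x, data = d, control = ctrl)
d$e2 <- d$e1 - predict(G, d)

# H(x): third model fitted on error2
H <- rpart(e2 ~ x, data = d, control = ctrl)

# Y ~ M(x) + G(x) + H(x): sum the three learners' predictions
yhat <- predict(M, d) + predict(G, d) + predict(H, d)

# the residual sum of squares should shrink at each stage
c(after_M = sum(d$e1^2),
  after_G = sum(d$e2^2),
  after_H = sum((d$y - yhat)^2))
```

Each stage fits only the part of the signal the previous stage left behind, which is exactly why the combined model outperforms any single weak learner.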

Let’s try to visualize a classification problem

Look at the diagram below:

We start with the first box. We see one vertical line, which becomes our first weak learner, leaving 3 of 10 observations mis-classified. We now give higher weights to these three mis-classified observations, so that classifying them correctly becomes very important. Hence the next vertical line, towards the right edge. We repeat this process and then combine the learners with appropriate weights.

Explaining the underlying mathematics

How do we assign weight to observations?

We always start with a uniform distribution assumption. Let's call it D1: a weight of 1/n for each of the n observations.

Step 1: Assume an alpha(t).

Step 2: Get a weak classifier h(t).

Step 3: Update the population distribution for the next step:

D_{t+1}(i) = D_t(i) * exp(-alpha_t * y_i * h_t(x_i)) / Z_t

where Z_t is a normalisation constant chosen so that the weights D_{t+1} sum to 1.

Step 4: Use the new population distribution to find the next learner.

Scared of the Step 3 mathematics? Let me break it down for you. Simply look at the argument of the exponent. Alpha is a kind of learning rate, y is the actual response (+1 or -1), and h(x) is the class predicted by the learner. If the learner is wrong, y * h(x) = -1 and the exponent becomes +alpha; otherwise it becomes -alpha. In other words, the weight of an observation increases if it was predicted wrongly in the last round. So, what's next?

Step 5: Iterate steps 1-4 until no hypothesis can be found that improves further.

Step 6: Take a weighted average of all the learners used so far. But what are the weights? The weights are simply the alpha values, calculated as:

alpha_t = (1/2) * ln((1 - epsilon_t) / epsilon_t)

where epsilon_t is the weighted error rate of the weak learner h_t: the lower the error, the larger the learner's say in the final vote.
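Steps 1-6 can be condensed into one numeric round in R. The five labels and predictions below are made up purely for illustration:

```r
# One round of the weight update from Step 3, plus the alpha from Step 6
y <- c(1, 1, -1, -1, 1)          # true labels (+1 / -1)
h <- c(1, -1, -1, 1, 1)          # weak learner's predictions; items 2 and 4 are wrong

n <- length(y)
D <- rep(1 / n, n)               # uniform starting distribution D1

eps   <- sum(D[y != h])                  # weighted error of the weak learner
alpha <- 0.5 * log((1 - eps) / eps)      # this learner's weight in the final vote

D_new <- D * exp(-alpha * y * h)         # wrong predictions get exp(+alpha)
D_new <- D_new / sum(D_new)              # normalise so the weights sum to 1

round(D_new, 3)
# the mis-classified observations (2 and 4) now carry more weight than the rest
```

The next weak learner would be trained against `D_new`, forcing it to concentrate on the two observations this round got wrong.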

Time to Practice – Example

I recently participated in an online hackathon organized by Analytics Vidhya. To make variable transformation easier, I combined the test and train data in the file complete_data. I started with a basic import and split the population into Development, ITV and Scoring.

library(caret)
library(Metrics)

rm(list = ls())
setwd("C:\\Users\\ts93856\\Desktop\\AV")

# complete_data.csv holds both train and score rows, flagged by the Train column
complete <- read.csv("complete_data.csv", stringsAsFactors = TRUE)

train <- complete[complete$Train == 1, ]
score <- complete[complete$Train != 1, ]

# 60/40 split of the train rows into Development and a holdout
set.seed(999)
ind <- sample(2, nrow(train), replace = TRUE, prob = c(0.60, 0.40))
trainData <- train[ind == 1, ]
testData  <- train[ind == 2, ]

# split the holdout 50/50 into two ITV samples
set.seed(999)
ind1 <- sample(2, nrow(testData), replace = TRUE, prob = c(0.50, 0.50))
trainData_ens1 <- testData[ind1 == 1, ]
testData_ens1  <- testData[ind1 == 2, ]

table(testData_ens1$Disbursed)[2] / nrow(testData_ens1)
# Response rate of 9.052%

Here is all you need to do to build a GBM model:

fitControl <- trainControl(method = "repeatedcv", number = 4, repeats = 4)

# caret's train() expects a factor outcome for classification
trainData$outcome1 <- ifelse(trainData$Disbursed == 1, "Yes", "No")

set.seed(33)
gbmFit1 <- train(as.factor(outcome1) ~ ., data = trainData[, -26],
                 method = "gbm", trControl = fitControl, verbose = FALSE)

# probability of the "Yes" class on each sample
gbm_dev  <- predict(gbmFit1, trainData, type = "prob")[, 2]
gbm_ITV1 <- predict(gbmFit1, trainData_ens1, type = "prob")[, 2]
gbm_ITV2 <- predict(gbmFit1, testData_ens1, type = "prob")[, 2]

auc(trainData$Disbursed, gbm_dev)
auc(trainData_ens1$Disbursed, gbm_ITV1)
auc(testData_ens1$Disbursed, gbm_ITV2)

As you will see after running this code, all AUCs come extremely close to 0.84. I will leave the feature engineering up to you, as the competition is still on; you are welcome to use this code to compete, though. GBM is one of the most widely used boosting algorithms. XGBoost is a faster variant, which I will cover in a future article.

End Notes

I have found boosting learners extremely quick and highly efficient. They have never disappointed me when it comes to getting high initial scores on Kaggle and other platforms. In the end, though, it all boils down to how well you do feature engineering.

Have you used Gradient Boosting before? How did the model perform? Have you used boosting learners in any other capacity? If yes, I would love to hear about your experiences in the comments section below.
