Machine Learning: A Summary of Supervised Learning Models V1.1

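While studying Andrew Ng's "Machine Learning" course and Hsuan-Tien Lin's "Machine Learning Foundations" and "Machine Learning Techniques" courses at National Taiwan University, I summarized the supervised learning models covered in those courses as follows.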


Classification refers to the task of predicting which category an input belongs to.



PLA = Perceptron Learning Algorithm, a classification method. PLA usually comes in two variants: Naive PLA and Pocket PLA. The perceptron is a binary linear classifier.





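The idea behind Naive PLA is simple: keep correcting the weight vector W until W classifies all of the data correctly. A major problem with Naive PLA is that if the data are noisy and cannot be separated perfectly, the algorithm never terminates. For noisy data, the best we can hope for is the weight vector with the fewest mistakes, and finding it is an NP-hard problem.

Pocket PLA is a greedy approximation that works much like Naive PLA. It visits examples in random order instead of sequentially and corrects W whenever it finds a mistake. Throughout the updates, it keeps (in its "pocket") the weight vector that has made the fewest mistakes so far.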
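The Pocket PLA loop described above can be sketched as follows. This is a minimal illustration, not the course's reference implementation: the function names, the toy data, and the `max_updates` cap are assumptions made for the example, and each input carries a leading 1.0 so the bias is part of W.

```python
import random

def predict(w, x):
    """Sign of the inner product; x carries a leading 1.0 for the bias term."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

def count_errors(w, data):
    """Number of examples that w misclassifies."""
    return sum(1 for x, y in data if predict(w, x) != y)

def pocket_pla(data, max_updates=100, seed=0):
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    pocket_w, pocket_errors = list(w), count_errors(w, data)
    for _ in range(max_updates):
        mistakes = [(x, y) for x, y in data if predict(w, x) != y]
        if not mistakes:
            break  # w classifies everything correctly (the Naive PLA stopping rule)
        x, y = rng.choice(mistakes)                 # pick a random mistake
        w = [wi + y * xi for wi, xi in zip(w, x)]   # PLA correction step
        errors = count_errors(w, data)
        if errors < pocket_errors:                  # keep the best vector seen so far
            pocket_w, pocket_errors = list(w), errors
    return pocket_w, pocket_errors

# Hypothetical toy data: four separable points, plus one contradictory (noisy) label.
clean = [((1.0, 2.0, 1.0), 1), ((1.0, 3.0, -1.0), 1),
         ((1.0, -2.0, 1.0), -1), ((1.0, -3.0, -2.0), -1)]
noisy = clean + [((1.0, 2.0, 1.0), -1)]
```

On the clean data the loop stops with zero mistakes. On the noisy data no W can be perfect, so the cap on updates ends the loop and the pocket holds a vector with a single mistake.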


Regression vs. Classification:
Classification trees have dependent variables that are categorical and unordered. Regression trees have dependent variables that are continuous values or ordered whole values. Regression means to predict the output value using training data. Classification means to group the output into a class.

When it comes to how to figure out which is a classification problem and which is a regression problem, an easy way to think about it is to ask yourself if you are trying to predict which class (or category) something belongs to or are you trying to predict a value.

Predicting a class is classification (ham/spam, image of a cat/not an image of a cat, etc.). Predicting a value (a number) is regression (housing prices, tomorrow's temperature, etc.). Classification can be built on top of regression.

Linear Regression



In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X. The case of one explanatory variable is called simple linear regression.


Use it when the data in the training set follow a roughly linear trend and the quantity to predict is also numeric.
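For the single-variable case, the least-squares line has a closed form. The sketch below assumes that setting; the function name and the toy data are made up for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
```

Because the toy points lie exactly on a line, the fit recovers a slope of 2 and an intercept of 1.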


Logistic Regression



Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, the logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.





Difference between linear regression and logistic regression:
In linear regression, the outcome (dependent variable) is continuous. It can have any one of an infinite number of possible values. In logistic regression, the outcome (dependent variable) has only a limited number of possible values. Logistic regression is used when the response variable is categorical in nature.
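To make the contrast concrete, here is a small sketch of logistic regression on one feature, trained by plain gradient descent on the log loss. The learning rate, step count, and data are assumptions chosen for the example rather than recommended settings.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Gradient descent on the average log loss for p(y=1|x) = sigmoid(w*x + b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy binary data: small x values are class 0, large x values are class 1.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)

def predict(x):
    """Threshold the predicted probability at 0.5 to get a class label."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0
```

The model outputs a probability between 0 and 1, and the class label comes from thresholding it, whereas a linear regression on the same data would return an unbounded number.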

Generative Learning algorithms

Consider a classification problem in which we want to learn to distinguish between elephants (y = 1) and dogs (y = 0), based on some features of an animal. Given a training set, an algorithm like logistic regression or the perceptron algorithm (basically) tries to find a straight line—that is, a decision boundary—that separates the elephants and dogs. Then, to classify a new animal as either an elephant or a dog, it checks on which side of the decision boundary it falls, and makes its prediction accordingly.

Gaussian Discriminant Analysis model(GDA)



GDA is a method for data classification commonly used when the data can be approximated with a Normal distribution. You will need a training set, i.e. a set of data that has already been classified. These data are used to train your classifier and obtain a discriminant function that tells you which class a data point most probably belongs to.


Training data can be approximated with a Normal distribution.
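A one-dimensional sketch of GDA follows, assuming two classes that share a single variance (the shared-covariance form used in Andrew Ng's notes). The function names and the toy data are illustrative assumptions.

```python
import math

def fit_gda_1d(xs, ys):
    """Estimate the class prior, the two class means, and the shared variance."""
    n = len(xs)
    phi = sum(ys) / n
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    mu0 = sum(x0) / len(x0)
    mu1 = sum(x1) / len(x1)
    var = sum((x - (mu1 if y == 1 else mu0)) ** 2 for x, y in zip(xs, ys)) / n
    return phi, mu0, mu1, var

def gaussian_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_y1(x, params):
    """p(y=1|x) by Bayes' rule from the fitted class-conditional Gaussians."""
    phi, mu0, mu1, var = params
    p1 = gaussian_pdf(x, mu1, var) * phi
    p0 = gaussian_pdf(x, mu0, var) * (1 - phi)
    return p1 / (p0 + p1)

xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 1, 1, 1]
params = fit_gda_1d(xs, ys)
```

With these symmetric toy data the posterior is exactly one half at the midpoint x = 5 and moves toward 1 as x approaches the mean of class 1.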



Difference between GDA and Logistic Regression:
GDA makes a strong modeling assumption, while logistic regression makes a weak one. See Andrew Ng's Notes 2, page 6 of 14.


Naive Bayes



It is a classification technique based on Bayes' Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.


Training set data xi are discrete-valued.
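As a small sketch of the independence assumption at work, the code below fits a Naive Bayes classifier on binary features with Laplace (add-one) smoothing. The feature meanings, data, and function names are hypothetical.

```python
import math

def fit_bernoulli_nb(X, y):
    """Per class: the prior and P(feature_j = 1 | class), with add-one smoothing."""
    prior, cond = {}, {}
    n_features = len(X[0])
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior[c] = len(rows) / len(X)
        cond[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                   for j in range(n_features)]
    return prior, cond

def predict_nb(x, prior, cond):
    """Pick the class with the largest log joint probability."""
    def log_joint(c):
        total = math.log(prior[c])
        for xj, pj in zip(x, cond[c]):
            # Features are treated as independent given the class.
            total += math.log(pj if xj == 1 else 1 - pj)
        return total
    return max(prior, key=log_joint)

# Hypothetical features: [contains "buy", contains "meeting"]; label 1 = spam.
X = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]]
y = [1, 1, 1, 0, 0, 0]
prior, cond = fit_bernoulli_nb(X, y)
```

Each feature contributes its own factor to the joint probability, which is exactly the "naive" assumption: the presence of one feature tells the model nothing about another once the class is known.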




  1. GDA vs. Naive Bayes
    In GDA, the feature vectors x are continuous, real-valued vectors. Naive Bayes is a different learning algorithm in which the xi's are discrete-valued.

  2. Logistic regression vs. Naive Bayes
    Logistic regression is a discriminative classifier: it models the posterior P(class|x) directly from the data, or learns a direct map from inputs x to the class labels.
    Naive Bayes, like Discriminant Analysis, is a generative classifier: it learns a model of the joint probability P(x,class) and makes its predictions using Bayes' rule.

Support Vector Machine (SVM)

SVM is a supervised machine learning algorithm which can be used for classification or regression problems. It uses a technique called the kernel trick to transform your data, and based on these transformations it finds an optimal boundary between the possible outputs. Simply put, it does some extremely complex data transformations, then figures out how to separate your data based on the labels or outputs you've defined.

When it comes to computing the SVM classifier, there are three approaches: primal, dual and kernel.

Linear SVM

Margin: If the training data are linearly separable, we can select two parallel hyperplanes that separate the two classes of data, so that the distance between them is as large as possible. The region bounded by these two hyperplanes is called the "margin", and the maximum-margin hyperplane is the hyperplane that lies halfway between them.
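For a separating hyperplane w·x + b = 0 scaled so that the closest points satisfy y(w·x + b) = 1, the two parallel hyperplanes are w·x + b = ±1 and the width of the margin is 2 / ||w||. The sketch below only checks that scaling and computes the width for a hand-picked hyperplane; it does not solve the SVM optimization, and the data and names are assumptions.

```python
import math

def margin_width(w):
    """Distance between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def is_canonical(w, b, points, labels, tol=1e-9):
    """True if every point has y(w.x + b) >= 1 and the closest point attains 1."""
    scores = [y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
              for x, y in zip(points, labels)]
    return abs(min(scores) - 1.0) <= tol

# Toy data: negatives sit on the line x1 = 1, positives on the line x1 = 3.
points = [(1.0, 0.0), (1.0, 2.0), (3.0, 1.0), (3.0, -1.0)]
labels = [-1, -1, 1, 1]
w, b = (1.0, 0.0), -2.0   # hand-picked hyperplane x1 = 2, halfway between them
width = margin_width(w)
```

Here the width is 2, the gap between the lines x1 = 1 and x1 = 3, and the hyperplane x1 = 2 lies halfway between them.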

Hard and soft margin:……

Non-linear SVM

The idea is to gain linear separability by mapping the data to a higher-dimensional space.
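A tiny illustration of that idea: the one-dimensional points below cannot be split by a single threshold, but after the assumed mapping x -> (x, x^2) a straight line separates them. The mapping and the weights are hand-picked for the example, not learned by an SVM.

```python
def feature_map(x):
    """Map a 1-D input into 2-D: (x, x squared)."""
    return (x, x * x)

def linear_classifier(z, w=(0.0, 1.0), b=-2.5):
    """A linear rule in the mapped space: predict +1 when x squared exceeds 2.5."""
    return 1 if w[0] * z[0] + w[1] * z[1] + b > 0 else -1

def separable_by_threshold(xs, ys):
    """Check whether any single threshold on x (in either direction) fits all labels."""
    cuts = sorted(xs)
    candidates = [cuts[0] - 1.0] + [(a + c) / 2 for a, c in zip(cuts, cuts[1:])]
    for t in candidates:
        for sign in (1, -1):
            if all((sign if x > t else -sign) == y for x, y in zip(xs, ys)):
                return True
    return False

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [1, -1, -1, 1]          # the outer points are positive, the inner ones negative
predictions = [linear_classifier(feature_map(x)) for x in xs]
```

In the original space no threshold works, while in the mapped space the line x^2 = 2.5 classifies every point correctly. A kernel lets an SVM work in such a mapped space without computing the mapping explicitly.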

AdaBoost(Adaptive Boosting)


See Hsuan-Tien Lin, Chapter 7 - 8.

AdaBoost, short for "Adaptive Boosting", is a machine learning algorithm. It can be used in conjunction with many other types of learning algorithms to improve their performance. The output of the other learning algorithms ("weak learners") is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers.

AdaBoost is sensitive to noisy data and outliers. In some problems it can be less susceptible to the overfitting problem than other learning algorithms. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing (e.g., their error rate is smaller than 0.5 for binary classification), the final model can be proven to converge to a strong learner.



AdaBoost is essentially a form of aggregation.
uniform blending or linear blending -> Bagging (Bootstrap Aggregation: resampling from the given D) -> boosting (focus on key examples, i.e. wrong predictions) -> re-weighting different g -> adaptive boosting algorithm (scale up incorrect examples -> different hypotheses)

blending: aggregate after getting g_t
learning: aggregate as well as getting g_t

Bagging (Bootstrap Aggregation): obtain different g from the same data set.
Bootstrapping: resampling from the given D, i.e. re-sample N examples from D uniformly with replacement (draw one example after another, putting each one back).
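Bootstrapping followed by uniform blending can be sketched as below. To keep the example short, each g_t is assumed to be nothing more than the mean of its resample; the names and data are illustrative.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) examples from data uniformly, with replacement."""
    return [rng.choice(data) for _ in data]

def bagged_mean(data, n_rounds=25, seed=0):
    """Fit one trivial 'model' (the sample mean) per resample, then average them."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_rounds):
        sample = bootstrap_sample(data, rng)
        estimates.append(sum(sample) / len(sample))   # this round's g_t
    return sum(estimates) / len(estimates)            # uniform blending

data = [2.0, 4.0, 6.0, 8.0, 10.0]
sample = bootstrap_sample(data, random.Random(1))
bagged = bagged_mean(data)
```

Every round sees a different resample of the same D, which is how a single data set yields many different g.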

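U = sqrt((1 - e) / e), where e is the weighted error rate of g_t. The weights of misclassified examples are multiplied by U and the weights of correctly classified examples are divided by U, so the examples that were predicted wrongly matter more in the next round. The smaller e is, the larger U is, and the more g_t counts when forming G.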
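One round of the AdaBoost re-weighting can be sketched as follows, assuming the scaling factor sqrt((1 - e) / e) from Lin's lectures, where e is the weighted error of the current g_t. The function name and the example weights are made up.

```python
import math

def adaboost_reweight(weights, is_wrong):
    """Scale up misclassified examples and scale down correct ones.

    Assumes 0 < e < 1, i.e. g_t is neither perfect nor always wrong.
    """
    total = sum(weights)
    e = sum(w for w, wrong in zip(weights, is_wrong) if wrong) / total
    scale = math.sqrt((1 - e) / e)
    new_weights = [w * scale if wrong else w / scale
                   for w, wrong in zip(weights, is_wrong)]
    alpha = math.log(scale)   # vote of g_t in the final weighted sum
    return new_weights, e, alpha

weights = [0.25, 0.25, 0.25, 0.25]
is_wrong = [True, False, False, False]   # g_t misclassifies only the first example
new_weights, e, alpha = adaboost_reweight(weights, is_wrong)
wrong_total = sum(w for w, wrong in zip(new_weights, is_wrong) if wrong)
right_total = sum(w for w, wrong in zip(new_weights, is_wrong) if not wrong)
```

After the update, the misclassified and correctly classified examples carry equal total weight, so the current g_t looks no better than random on the re-weighted data and the next round is pushed toward a different hypothesis.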

Decision Tree


Decision tree learning uses a decision tree as a predictive model to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves).
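The branch-and-leaf structure can be pictured with a hand-built tree. The features, values, and labels below are hypothetical, and the tree is written out by hand rather than learned.

```python
# Internal nodes test one feature; leaves hold the predicted target value.
tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {
            "feature": "windy",
            "branches": {True: {"leaf": "stay in"}, False: {"leaf": "play"}},
        },
        "rainy": {"leaf": "stay in"},
    },
}

def predict(node, item):
    """Follow the branch matching the item's feature value until a leaf is reached."""
    while "leaf" not in node:
        node = node["branches"][item[node["feature"]]]
    return node["leaf"]

decision = predict(tree, {"outlook": "sunny", "windy": False})
```

Learning algorithms such as C&RT choose the features and split points automatically; only the prediction step is shown here.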

Random Forest

See Hsuan-Tien Lin's Machine Learning Techniques, Chapter 10, for more.


Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks, that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set.

Random Forest = bagging + decision tree.
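The aggregation step of a random forest, the mode for classification and the mean for regression, can be sketched like this. The per-tree predictions are made-up inputs; training the individual trees is not shown.

```python
from collections import Counter

def forest_classify(tree_predictions):
    """Majority vote: the class predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

def forest_regress(tree_predictions):
    """Average of the individual trees' numeric predictions."""
    return sum(tree_predictions) / len(tree_predictions)

label = forest_classify(["cat", "dog", "cat"])
value = forest_regress([1.0, 2.0, 3.0])
```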



Out-of-bag (OOB) error
The out-of-bag error, also called the out-of-bag estimate, is a method of measuring the prediction error of random forests, boosted decision trees, and other machine learning models that use bootstrap aggregating to sub-sample the data used for training. E_oob serves as self-validation for bagging/RF.
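Which examples end up out-of-bag for one bootstrap round can be sketched as below. The sample size and the seed are arbitrary choices for the illustration.

```python
import random

def bootstrap_indices(n, rng):
    """Indices of n draws from range(n), uniformly with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def oob_indices(n, sampled):
    """Indices that were never drawn, i.e. the out-of-bag examples for this round."""
    chosen = set(sampled)
    return [i for i in range(n) if i not in chosen]

rng = random.Random(0)
n = 1000
sampled = bootstrap_indices(n, rng)
oob = oob_indices(n, sampled)
oob_fraction = len(oob) / n
expected_fraction = (1 - 1 / n) ** n   # chance that a given example is never drawn
```

For large N the expected fraction approaches 1/e, so roughly a third of the examples are unseen by any single tree. Those examples can be used to validate that tree without setting aside a separate validation set.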


Feature Selection

The permutation method shuffles the values of one feature, recombines the shuffled feature with the original, unshuffled values of the other features, and checks whether shuffling that feature has a major impact on overall performance. If it does, the feature is important.

In practice, for RF, feature selection is done with permutation + OOB.
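The permutation test can be sketched as follows. For simplicity this version scores a fixed model on a fixed data set instead of using the OOB examples of each tree, and the model, data, and function names are assumptions.

```python
import random

def accuracy(model, X, y):
    return sum(1 for x, label in zip(X, y) if model(x) == label) / len(y)

def permutation_importance(model, X, y, feature, seed=0):
    """Drop in accuracy after shuffling one feature column across the rows."""
    rng = random.Random(seed)
    column = [x[feature] for x in X]
    rng.shuffle(column)
    X_perm = [x[:feature] + [v] + x[feature + 1:] for x, v in zip(X, column)]
    return accuracy(model, X, y) - accuracy(model, X_perm, y)

# Hypothetical data: feature 0 determines the label, feature 1 is noise.
X = [[0, 5], [0, 3], [0, 8], [0, 1], [1, 4], [1, 7], [1, 2], [1, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

def model(x):
    """A fixed rule that looks only at feature 0."""
    return 1 if x[0] > 0.5 else 0

importance_0 = permutation_importance(model, X, y, feature=0)
importance_1 = permutation_importance(model, X, y, feature=1)
```

Shuffling the noise feature leaves the accuracy unchanged, so its importance is zero, while shuffling the informative feature can only hurt.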


Gradient boosted decision tree


Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.




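Gradient Boosted Decision Tree - GBDT
Overall idea: s_n is the value predicted from x_n by the g_t fitted so far, y_n is the true value, and y_n - s_n is the residual. We take the data x together with the residuals y - s as a new data set, fit a new g_t (still a decision tree) to partition the data and predict again, and repeat until the residuals approach 0, i.e. until the predictions are very close to the true values.

  1. A is the base regression algorithm, which is not fixed in advance. It minimizes squared error, and we choose a C&RT decision tree as our g_t. Put simply, A = g_t = C&RT.
  2. Once the tree has partitioned the data, a_t is the slope of a single-variable linear regression, which is where our regression algorithm shows up. Here g_t(x_n) is the output of the decision tree g_t for the partition that x_n falls in, and the regression target is the residual y_n - s_n.
  3. s (score) = s + a_t * g_t(x_n), where s is the predicted value built up from the linear regression and X.

Then take the difference between this predicted value and the true y_n.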
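The three steps above can be sketched on one-dimensional data. As a simplification, each g_t here is a depth-one regression stump rather than a full C&RT tree, and a_t is the slope of a one-variable least-squares fit (without intercept) of the residuals on g_t(x_n). All names and data are illustrative.

```python
def fit_stump(xs, residuals):
    """Regression stump: the threshold split on x that minimizes squared error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm

def fit_gbdt(xs, ys, n_rounds=20):
    scores = [0.0] * len(xs)                 # s_n starts at 0
    ensemble = []
    for _ in range(n_rounds):
        residuals = [y - s for y, s in zip(ys, scores)]        # y_n - s_n
        g = fit_stump(xs, residuals)                           # step 1: fit g_t
        outputs = [g(x) for x in xs]
        denom = sum(o * o for o in outputs)
        if denom == 0:
            break                                              # nothing left to fit
        a = sum(o * r for o, r in zip(outputs, residuals)) / denom   # step 2: a_t
        scores = [s + a * o for s, o in zip(scores, outputs)]        # step 3: update s
        ensemble.append((a, g))
    return ensemble

def predict(ensemble, x):
    return sum(a * g(x) for a, g in ensemble)

def mse(ensemble, xs, ys):
    return sum((predict(ensemble, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 3.0, 3.1, 5.0, 5.2]
model = fit_gbdt(xs, ys)
```

Each round fits the current residuals, so the training error shrinks as rounds are added. In practice the number of rounds is limited to avoid overfitting.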


This article does not yet cover the following topic: neural networks.


