These notes summarize the supervised learning models covered in Andrew Ng's Machine Learning course and in 林軒田's 機器學習基石 (Machine Learning Foundations) and 機器學習技法 (Machine Learning Techniques) from National Taiwan University.
Classification
Classification refers to problems of assigning inputs to discrete categories.
PLA
Definition
PLA stands for Perceptron Learning Algorithm, a classification method. PLA usually refers to one of two variants: Naive PLA and Pocket PLA. The perceptron itself is a binary linear classifier.
Applicable Conditions
Binary linear classification problems.
How to Use
Comparison and Extended Notes
The idea behind Naive PLA is simple: keep correcting the weight vector W until W classifies every example correctly. Its major weakness is that if the data are noisy and cannot be perfectly separated, the algorithm never terminates. For noisy data we can therefore only hope to find the weight vector that makes the fewest mistakes, which is an NP-hard problem.
Pocket PLA is a greedy approximation algorithm similar to Naive PLA. It visits examples in random order instead of sequentially, corrects the weights whenever it finds a mistake, and throughout the process keeps ("pockets") the weight vector that has made the fewest mistakes so far.
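A minimal sketch of both variants in Python/NumPy follows; the function names, the ±1 label convention, and the assumption that X already carries a bias column are mine, not the course's:

```python
import numpy as np

def naive_pla(X, y, max_iter=1000):
    """Naive PLA: cycle through the data, correcting w on every mistake.
    Assumes X includes a bias column and y is +1/-1.
    Terminates only if the data are linearly separable."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mistakes = 0
        for xi, yi in zip(X, y):
            if np.sign(xi @ w) != yi:   # misclassified example
                w = w + yi * xi         # rotate w toward the example
                mistakes += 1
        if mistakes == 0:               # all examples satisfied
            return w
    return w

def pocket_pla(X, y, max_updates=1000, seed=0):
    """Pocket PLA: random visits; keep ('pocket') the best w seen so far."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    best_w = w.copy()
    best_err = np.mean(np.sign(X @ w) != y)
    for _ in range(max_updates):
        i = rng.integers(len(X))
        if np.sign(X[i] @ w) != y[i]:
            w = w + y[i] * X[i]
            err = np.mean(np.sign(X @ w) != y)
            if err < best_err:          # found a better vector: pocket it
                best_w, best_err = w.copy(), err
    return best_w
```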
Regression
A comparison of Regression and Classification:
Classification trees have dependent variables that are categorical and unordered. Regression trees have dependent variables that are continuous values or ordered whole values. Regression means to predict the output value using training data. Classification means to group the output into a class.
When it comes to how to figure out which is a classification problem and which is a regression problem, an easy way to think about it is to ask yourself if you are trying to predict which class (or category) something belongs to or are you trying to predict a value.
Predicting a class is classification (ham/spam, image of a cat/not an image of a cat, etc.). Predicting a value (a number) is regression (housing prices, tomorrow's temperature, etc.). Classification can be built on top of regression.
Linear Regression
Definition
In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X. The case of one explanatory variable is called simple linear regression.
Applicable Conditions
The data in the training set follow an approximately linear relationship, and the quantity to be predicted is a numeric value.
How to Use
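As a minimal sketch on an invented toy data set, the weights can be obtained in closed form from the normal equation w = (XᵀX)⁻¹Xᵀy:

```python
import numpy as np

# Toy data: y is roughly 3x + 2 with noise (invented for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3 * x + 2 + rng.normal(0, 1, size=50)

X = np.column_stack([np.ones_like(x), x])   # add a bias column
# Normal equation: w = (X^T X)^{-1} X^T y
w = np.linalg.pinv(X.T @ X) @ X.T @ y
print("intercept, slope:", w)               # close to (2, 3)
```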
Logistic Regression
Definition
Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, the logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
Applicable Conditions
The quantity to be predicted is a discrete class label.
How to Use
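A minimal sketch, assuming batch gradient descent on the cross-entropy loss (the function names and the learning rate are my choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=1000):
    """Batch gradient descent; X includes a bias column, y is 0/1."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ w)               # predicted P(y = 1 | x)
        grad = X.T @ (p - y) / len(y)    # gradient of the average loss
        w -= lr * grad
    return w

def predict_logistic(X, w):
    # Predict class 1 whenever the estimated probability reaches 0.5.
    return (sigmoid(X @ w) >= 0.5).astype(int)
```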
Comparison and Extended Notes
The difference between linear regression and logistic regression:
In linear regression, the outcome (dependent variable) is continuous. It can have any one of an infinite number of possible values. In logistic regression, the outcome (dependent variable) has only a limited number of possible values. Logistic regression is used when the response variable is categorical in nature.
Generative Learning Algorithms
Consider a classification problem in which we want to learn to distinguish between elephants (y = 1) and dogs (y = 0), based on some features of an animal. Given a training set, an algorithm like logistic regression or the perceptron algorithm (basically) tries to find a straight line—that is, a decision boundary—that separates the elephants and dogs. Then, to classify a new animal as either an elephant or a dog, it checks on which side of the decision boundary it falls, and makes its prediction accordingly.
Gaussian Discriminant Analysis Model (GDA)
Definition
GDA is a method for data classification commonly used when the data can be approximated with a Normal distribution. You need a training set, i.e., a set of data that have already been classified. These data are used to train your classifier and obtain a discriminant function that tells you to which class a new data point most probably belongs.
Applicable Conditions
Training data can be approximated with a Normal distribution.
How to Use
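A minimal NumPy sketch under the usual setup from Ng's notes (binary labels, class-conditional Gaussians with a shared covariance matrix); the function names are mine:

```python
import numpy as np

def gda_fit(X, y):
    """Fit GDA with a shared covariance matrix; y is 0/1."""
    phi = y.mean()                                    # P(y = 1)
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    Xc = np.where(y[:, None] == 1, X - mu1, X - mu0)  # center by class mean
    sigma = Xc.T @ Xc / len(X)                        # shared covariance
    return phi, mu0, mu1, sigma

def gda_predict(X, phi, mu0, mu1, sigma):
    """Pick the class with the higher joint probability p(x, y)."""
    inv = np.linalg.pinv(sigma)
    def log_gauss(X, mu):
        d = X - mu
        return -0.5 * np.einsum('ij,jk,ik->i', d, inv, d)
    score1 = log_gauss(X, mu1) + np.log(phi)
    score0 = log_gauss(X, mu0) + np.log(1 - phi)
    return (score1 > score0).astype(int)
```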
Comparison and Extended Notes
The difference between GDA and Logistic Regression:
Gaussian discriminant analysis rests on a strong assumption (each class is Gaussian), while logistic regression rests on a weaker one. See Andrew Ng's Notes 2, page 6 of 14.
Regression models are discriminative models: they estimate the probability of the result directly from the features. For example, to determine whether an animal is a goat or a sheep, the discriminative approach learns a model from historical data and then, from the animal's features, predicts the probability that it is a goat and the probability that it is a sheep. The generative approach takes a different route: first learn a goat model from the features of goats and a sheep model from the features of sheep, then feed the animal's features into the goat model to get one probability, feed them into the sheep model to get another, and pick whichever class scores higher.
Naive Bayes
Definition
It is a classification technique based on Bayes' Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
Applicable Conditions
The training-set features x_i are discrete-valued.
How to Use
Assume keywords are independent of one another; Naive Bayes is commonly used for classification problems such as spam filtering.
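A minimal Bernoulli Naive Bayes sketch for a spam-style task, assuming a binary document-keyword matrix and using Laplace smoothing; the helper names are mine:

```python
import numpy as np

def bernoulli_nb_fit(X, y):
    """X is a binary document-keyword matrix; y is 0 (ham) / 1 (spam).
    Laplace smoothing keeps every probability away from 0 and 1."""
    phi_y = y.mean()
    phi_x1 = (X[y == 1].sum(axis=0) + 1) / (np.sum(y == 1) + 2)
    phi_x0 = (X[y == 0].sum(axis=0) + 1) / (np.sum(y == 0) + 2)
    return phi_y, phi_x0, phi_x1

def bernoulli_nb_predict(X, phi_y, phi_x0, phi_x1):
    """Multiply per-keyword likelihoods (in log space) as if independent."""
    log1 = X @ np.log(phi_x1) + (1 - X) @ np.log(1 - phi_x1) + np.log(phi_y)
    log0 = X @ np.log(phi_x0) + (1 - X) @ np.log(1 - phi_x0) + np.log(1 - phi_y)
    return (log1 > log0).astype(int)
```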
Comparison and Extended Notes
GDA vs. Naive Bayes
In GDA, the feature vectors x were continuous, real-valued vectors. Let's now talk about a different learning algorithm in which the x_i's are discrete-valued.
Logistic Regression vs. Naive Bayes
Logistic regression comes under the category of discriminative classifiers, which model the posterior P(class|x) directly from the data, or learn a direct map from inputs x to the class labels. Discriminant analysis, by contrast, is a generative classifier that learns a model of the joint probability P(x, class) and makes its predictions by Bayes' rule.
Support Vector Machine (SVM)
SVM is a supervised machine learning algorithm which can be used for classification or regression problems. It uses a technique called the kernel trick to transform your data and then, based on these transformations, it finds an optimal boundary between the possible outputs. Simply put, it does some extremely complex data transformations, then figures out how to separate your data based on the labels or outputs you've defined.
When it comes to computing the SVM classifier, there are three approaches: primal, dual, and kernel.
Linear SVM
Margin: If the training data are linearly separable, we can select two parallel hyperplanes that separate the two classes of data, so that the distance between them is as large as possible. The region bounded by these two hyperplanes is called the "margin", and the maximum-margin hyperplane is the hyperplane that lies halfway between them.
Hard and soft margin:……
Non-linear SVM
The idea is to gain linear separability by mapping the data to a higher-dimensional space.
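To illustrate, the sketch below uses scikit-learn's SVC on concentric circles, a data set that no straight line can separate in the original space; the RBF kernel implicitly performs the higher-dimensional mapping (the data set and parameters are chosen only for illustration):

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear = SVC(kernel="linear").fit(X_tr, y_tr)            # struggles here
rbf = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)   # kernel trick

print("linear SVM accuracy:", linear.score(X_te, y_te))
print("RBF SVM accuracy:   ", rbf.score(X_te, y_te))
```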
AdaBoost (Adaptive Boosting)
Definition
See 林軒田's 機器學習技法, Chapters 7-8.
AdaBoost, short for "Adaptive Boosting", is a machine learning algorithm. It can be used in conjunction with many other types of learning algorithms to improve their performance. **The output of the other learning algorithms ('weak learners') is combined into a weighted sum that represents the final output of the boosted classifier.** AdaBoost is adaptive in the sense that **subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers.**
AdaBoost is sensitive to noisy data and outliers. In some problems it can be less susceptible to the overfitting problem than other learning algorithms. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing (e.g., their error rate is smaller than 0.5 for binary classification), the final model can be proven to converge to a strong learner.
Usage
Derivation
AdaBoost is essentially a form of aggregation.
uniform blending or linear blending -> bagging (Bootstrap Aggregation: resampling from the given D) -> boosting (focus on key examples, i.e., the wrong predictions) -> re-weighting the different g -> adaptive boosting algorithm (scale up incorrect examples -> diverse hypotheses)
Explanation:
Blending: aggregate after the g_t have been obtained.
Learning: aggregate while obtaining the g_t.
Bagging (Bootstrap Aggregation): obtain different g from the same data set.
Bootstrapping: resampling from the given D; re-sample N examples from D uniformly with replacement (drawing records one after another, replacing each after it is drawn).
AdaBoost:
Scaling factor U = sqrt((1 - e)/e): the smaller the error e of a weak learner g_t, the more important g_t is for forming G, and the larger its vote weight alpha_t = ln(U).
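A minimal sketch of this re-weighting loop, using scikit-learn depth-1 trees ("decision stumps") as the weak learners; it assumes y takes values ±1, and the helper names are mine:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    """AdaBoost with decision stumps; y is +1/-1."""
    n = len(X)
    u = np.full(n, 1.0 / n)                    # example weights
    stumps, alphas = [], []
    for _ in range(T):
        g = DecisionTreeClassifier(max_depth=1)
        g.fit(X, y, sample_weight=u)
        pred = g.predict(X)
        eps = u[pred != y].sum() / u.sum()     # weighted error
        if eps == 0 or eps >= 0.5:             # perfect or too-weak stump
            break
        scale = np.sqrt((1 - eps) / eps)       # the factor U above
        u = np.where(pred != y, u * scale, u / scale)  # re-weight examples
        stumps.append(g)
        alphas.append(np.log(scale))           # alpha_t = ln(U)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    votes = sum(a * g.predict(X) for g, a in zip(stumps, alphas))
    return np.sign(votes)
```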
Decision Tree
Definition
Decision tree learning uses a decision tree as a predictive model to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves).
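A minimal scikit-learn sketch (the iris data set is chosen only for illustration); export_text prints the branch tests and leaf conclusions described above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", tree.score(X_te, y_te))
print(export_text(tree))   # branches = feature tests, leaves = classes
```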
Random Forest
See more in 林軒田's 機器學習技法, Chapter 10.
Definition
Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks, that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set.
Random Forest = bagging + decision tree.
Usage
Derivation
Out-of-bag (OOB) error
Also called the out-of-bag estimate, OOB error is a method of measuring the prediction error of random forests, boosted decision trees, and other machine learning models that use bootstrap aggregating to sub-sample the data used for training. E_oob is the self-validation of bagging/RF.
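In scikit-learn this self-validation can be requested directly when fitting a random forest; a minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# bootstrap=True is the default; oob_score=True asks for E_oob.
rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                            random_state=0).fit(X, y)
print("OOB score (self-validation):", rf.oob_score_)
```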
Feature Selection
The permutation method shuffles the values of one feature, recombines the shuffled column with the original values of the other features, and checks whether shuffling that feature significantly hurts overall performance. If it does, the feature is important.
In fact, for RF, feature selection is done through permutation + OOB.
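A hand-rolled sketch of the permutation idea; for simplicity it scores the shuffled data on a held-out validation split rather than on the OOB examples that the course pairs with permutation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8,
                           n_informative=3, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

base = rf.score(X_val, y_val)
rng = np.random.default_rng(0)
for j in range(X.shape[1]):
    X_perm = X_val.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])   # shuffle one feature
    drop = base - rf.score(X_perm, y_val)          # performance drop
    print(f"feature {j}: importance ~ {drop:.3f}")
```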
Gradient Boosted Decision Tree
Definition
Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.
How to Use
Derivation
Gradient Boosted Decision Tree - GBDT
Overall idea: s_n is the value predicted for x_n by g_t, y_n is the true value, and y_n - s_n is the residual. We take the split data set x together with the residuals y - s as a new data set, and use a new g_t (still a decision tree) to split and predict again, iterating until the residuals are arbitrarily close to 0, i.e., the predictions are very close to the true values.
- A is a regression algorithm, left unspecified, that minimizes squared error; we choose a C&RT decision tree as our g_t. You can simply think of it as A = g_t = C&RT.
- After the first cut of the data, a_t is the slope of a single-variable linear regression fitted on that portion of the data, which is where the regression algorithm comes in. Here g_t(x_n) is the decision tree g_t(x) applied to that portion of the data, and the quantity being fit is the residual y_n - s_n.
- Update the score: s <- s + a_t * g_t(x_n), where s is the running prediction built from the linear-regression steps on X. The difference between this prediction and the true y_n becomes the new residual.
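A minimal sketch of this loop with scikit-learn regression trees; instead of fitting a_t by single-variable linear regression as in the course, it uses a fixed learning rate eta as a simplification, and the helper names are mine:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_fit(X, y, T=100, eta=0.1):
    """Gradient boosting for squared error: each tree fits the residual y - s."""
    s = np.zeros(len(y))               # running scores s_n
    trees = []
    for _ in range(T):
        g = DecisionTreeRegressor(max_depth=3)
        g.fit(X, y - s)                # fit the current residuals
        s += eta * g.predict(X)        # s <- s + eta * g_t(x)
        trees.append(g)
    return trees

def gbdt_predict(X, trees, eta=0.1):
    return eta * sum(g.predict(X) for g in trees)
```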
Neural networks are not covered in this article for now.
References
Reference links are marked inline in the original text; other references and suggested further reading are listed below: