These notes summarize the supervised learning models covered in Andrew Ng's Machine Learning course and in 林軒田's 機器學習基石 (Machine Learning Foundations) and 機器學習技法 (Machine Learning Techniques) from National Taiwan University.
Classification
Classification refers to problems of assigning inputs to discrete categories.
PLA
Definition
PLA stands for Perceptron Learning Algorithm, a classification method. PLA usually refers to one of two variants: Naive PLA and Pocket PLA. The perceptron itself is a binary linear classifier.
Applicable Conditions
Binary linear classification problems.
How to Use
Comparison and Extended Notes
The idea behind Naive PLA is simple: keep correcting the weight vector W until W classifies every example correctly. Its major weakness is that if the data are noisy and cannot be perfectly separated, the algorithm never terminates. For noisy data we can therefore only hope to find the weight vector that makes the fewest mistakes, which is an NP-hard problem.
Pocket PLA is a greedy approximation algorithm similar to Naive PLA. It visits examples in random order instead of sequentially, corrects the weights whenever it finds a mistake, and throughout the process keeps ("pockets") the weight vector that has made the fewest mistakes so far.
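A minimal sketch of both variants in Python/NumPy follows; the function names, the ±1 label convention, and the assumption that X already carries a bias column are mine, not the course's:

```python
import numpy as np

def naive_pla(X, y, max_iter=1000):
    """Naive PLA: cycle through the data, correcting w on every mistake.
    Assumes X includes a bias column and y is +1/-1.
    Terminates only if the data are linearly separable."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mistakes = 0
        for xi, yi in zip(X, y):
            if np.sign(xi @ w) != yi:   # misclassified example
                w = w + yi * xi         # rotate w toward the example
                mistakes += 1
        if mistakes == 0:               # all examples satisfied
            return w
    return w

def pocket_pla(X, y, max_updates=1000, seed=0):
    """Pocket PLA: random visits; keep ('pocket') the best w seen so far."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    best_w = w.copy()
    best_err = np.mean(np.sign(X @ w) != y)
    for _ in range(max_updates):
        i = rng.integers(len(X))
        if np.sign(X[i] @ w) != y[i]:
            w = w + y[i] * X[i]
            err = np.mean(np.sign(X @ w) != y)
            if err < best_err:          # found a better vector: pocket it
                best_w, best_err = w.copy(), err
    return best_w
```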
Regression
A comparison of Regression and Classification:
Classification trees have dependent variables that are categorical and unordered. Regression trees have dependent variables that are continuous values or ordered whole values. Regression means to predict the output value using training data. Classification means to group the output into a class.
When it comes to how to figure out which is a classification problem and which is a regression problem, an easy way to think about it is to ask yourself if you are trying to predict which class (or category) something belongs to or are you trying to predict a value.
Predicting a class is classification (ham/spam, image of a cat/not an image of a cat, etc.). Predicting a value (a number) is regression (housing prices, tomorrow's temperature, etc.). Classification can be built on top of regression.
Linear Regression
Definition
In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X. The case of one explanatory variable is called simple linear regression.
Applicable Conditions
The data in the training set follow an approximately linear relationship, and the quantity to be predicted is a numeric value.
How to Use
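As a minimal sketch on an invented toy data set, the weights can be obtained in closed form from the normal equation w = (XᵀX)⁻¹Xᵀy:

```python
import numpy as np

# Toy data: y is roughly 3x + 2 with noise (invented for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3 * x + 2 + rng.normal(0, 1, size=50)

X = np.column_stack([np.ones_like(x), x])   # add a bias column
# Normal equation: w = (X^T X)^{-1} X^T y
w = np.linalg.pinv(X.T @ X) @ X.T @ y
print("intercept, slope:", w)               # close to (2, 3)
```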
Logistic Regression
Definition
Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, the logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
Applicable Conditions
The quantity to be predicted is a discrete class label.
How to Use
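A minimal sketch, assuming batch gradient descent on the cross-entropy loss (the function names and the learning rate are my choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=1000):
    """Batch gradient descent; X includes a bias column, y is 0/1."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ w)               # predicted P(y = 1 | x)
        grad = X.T @ (p - y) / len(y)    # gradient of the average loss
        w -= lr * grad
    return w

def predict_logistic(X, w):
    # Predict class 1 whenever the estimated probability reaches 0.5.
    return (sigmoid(X @ w) >= 0.5).astype(int)
```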
Comparison and Extended Notes
The difference between linear regression and logistic regression:
In linear regression, the outcome (dependent variable) is continuous. It can have any one of an infinite number of possible values. In logistic regression, the outcome (dependent variable) has only a limited number of possible values. Logistic regression is used when the response variable is categorical in nature.
Generative Learning Algorithms
Consider a classification problem in which we want to learn to distinguish between elephants (y = 1) and dogs (y = 0), based on some features of an animal. Given a training set, an algorithm like logistic regression or the perceptron algorithm (basically) tries to find a straight line—that is, a decision boundary—that separates the elephants and dogs. Then, to classify a new animal as either an elephant or a dog, it checks on which side of the decision boundary it falls, and makes its prediction accordingly.
Gaussian Discriminant Analysis Model (GDA)
Definition
GDA is a method for data classification commonly used when the data can be approximated with a Normal distribution. You need a training set, i.e., a set of data that have already been classified. These data are used to train your classifier and obtain a discriminant function that tells you to which class a new data point most probably belongs.
Applicable Conditions
Training data can be approximated with a Normal distribution.
How to Use
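A minimal NumPy sketch under the usual setup from Ng's notes (binary labels, class-conditional Gaussians with a shared covariance matrix); the function names are mine:

```python
import numpy as np

def gda_fit(X, y):
    """Fit GDA with a shared covariance matrix; y is 0/1."""
    phi = y.mean()                                    # P(y = 1)
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    Xc = np.where(y[:, None] == 1, X - mu1, X - mu0)  # center by class mean
    sigma = Xc.T @ Xc / len(X)                        # shared covariance
    return phi, mu0, mu1, sigma

def gda_predict(X, phi, mu0, mu1, sigma):
    """Pick the class with the higher joint probability p(x, y)."""
    inv = np.linalg.pinv(sigma)
    def log_gauss(X, mu):
        d = X - mu
        return -0.5 * np.einsum('ij,jk,ik->i', d, inv, d)
    score1 = log_gauss(X, mu1) + np.log(phi)
    score0 = log_gauss(X, mu0) + np.log(1 - phi)
    return (score1 > score0).astype(int)
```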
Comparison and Extended Notes
The difference between GDA and Logistic Regression:
Gaussian discriminant analysis rests on a strong assumption (each class is Gaussian), while logistic regression rests on a weaker one. See Andrew Ng's Notes 2, page 6 of 14.
Regression models are discriminative models: they estimate the probability of the result directly from the features. For example, to determine whether an animal is a goat or a sheep, the discriminative approach learns a model from historical data and then, from the animal's features, predicts the probability that it is a goat and the probability that it is a sheep. The generative approach takes a different route: first learn a goat model from the features of goats and a sheep model from the features of sheep, then feed the animal's features into the goat model to get one probability, feed them into the sheep model to get another, and pick whichever class scores higher.
Naive Bayes
Definition
It is a classification technique based on Bayes' Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
Applicable Conditions
The training-set features x_i are discrete-valued.
How to Use
Assume keywords are independent of one another; Naive Bayes is commonly used for classification problems such as spam filtering.
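A minimal Bernoulli Naive Bayes sketch for a spam-style task, assuming a binary document-keyword matrix and using Laplace smoothing; the helper names are mine:

```python
import numpy as np

def bernoulli_nb_fit(X, y):
    """X is a binary document-keyword matrix; y is 0 (ham) / 1 (spam).
    Laplace smoothing keeps every probability away from 0 and 1."""
    phi_y = y.mean()
    phi_x1 = (X[y == 1].sum(axis=0) + 1) / (np.sum(y == 1) + 2)
    phi_x0 = (X[y == 0].sum(axis=0) + 1) / (np.sum(y == 0) + 2)
    return phi_y, phi_x0, phi_x1

def bernoulli_nb_predict(X, phi_y, phi_x0, phi_x1):
    """Multiply per-keyword likelihoods (in log space) as if independent."""
    log1 = X @ np.log(phi_x1) + (1 - X) @ np.log(1 - phi_x1) + np.log(phi_y)
    log0 = X @ np.log(phi_x0) + (1 - X) @ np.log(1 - phi_x0) + np.log(1 - phi_y)
    return (log1 > log0).astype(int)
```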
Comparison and Extended Notes
GDA vs. Naive Bayes
In GDA, the feature vectors x were continuous, real-valued vectors. Let's now talk about a different learning algorithm in which the x_i's are discrete-valued.
Logistic Regression vs. Naive Bayes
Logistic regression comes under the category of discriminative classifiers, which model the posterior P(class|x) directly from the data, or learn a direct map from inputs x to the class labels. Discriminant analysis, by contrast, is a generative classifier that learns a model of the joint probability P(x, class) and makes its predictions by Bayes' rule.
Support Vector Machine (SVM)
SVM is a supervised machine learning algorithm which can be used for classification or regression problems. It uses a technique called the kernel trick to transform your data and then, based on these transformations, it finds an optimal boundary between the possible outputs. Simply put, it does some extremely complex data transformations, then figures out how to separate your data based on the labels or outputs you've defined.
When it comes to computing the SVM classifier, there are three approaches: primal, dual, and kernel.
Linear SVM
Margin: If the training data are linearly separable, we can select two parallel hyperplanes that separate the two classes of data, so that the distance between them is as large as possible. The region bounded by these two hyperplanes is called the "margin", and the maximum-margin hyperplane is the hyperplane that lies halfway between them.
Hard and soft margin:……
Non-linear SVM
The idea is to gain linear separability by mapping the data to a higher-dimensional space.
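To illustrate, the sketch below uses scikit-learn's SVC on concentric circles, a data set that no straight line can separate in the original space; the RBF kernel implicitly performs the higher-dimensional mapping (the data set and parameters are chosen only for illustration):

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear = SVC(kernel="linear").fit(X_tr, y_tr)            # struggles here
rbf = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)   # kernel trick

print("linear SVM accuracy:", linear.score(X_te, y_te))
print("RBF SVM accuracy:   ", rbf.score(X_te, y_te))
```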
AdaBoost (Adaptive Boosting)
Definition
See 林軒田's 機器學習技法, Chapters 7-8.
AdaBoost, short for "Adaptive Boosting", is a machine learning algorithm. It can be used in conjunction with many other types of learning algorithms to improve their performance. **The output of the other learning algorithms ('weak learners') is combined into a weighted sum that represents the final output of the boosted classifier.** AdaBoost is adaptive in the sense that **subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers.**
AdaBoost is sensitive to noisy data and outliers. In some problems it can be less susceptible to the overfitting problem than other learning algorithms. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing (e.g., their error rate is smaller than 0.5 for binary classification), the final model can be proven to converge to a strong learner.
Usage
Derivation
AdaBoost is essentially a form of aggregation.
uniform blending or linear blending -> bagging (Bootstrap Aggregation: resampling from the given D) -> boosting (focus on key examples, i.e., the wrong predictions) -> re-weighting the different g -> adaptive boosting algorithm (scale up incorrect examples -> diverse hypotheses)
Explanation:
Blending: aggregate after the g_t have been obtained.
Learning: aggregate while obtaining the g_t.
Bagging (Bootstrap Aggregation): obtain different g from the same data set.
Bootstrapping: resampling from the given D; re-sample N examples from D uniformly with replacement (drawing records one after another, replacing each after it is drawn).
AdaBoost:
Scaling factor U = sqrt((1 - e)/e): the smaller the error e of a weak learner g_t, the more important g_t is for forming G, and the larger its vote weight alpha_t = ln(U).
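A minimal sketch of this re-weighting loop, using scikit-learn depth-1 trees ("decision stumps") as the weak learners; it assumes y takes values ±1, and the helper names are mine:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    """AdaBoost with decision stumps; y is +1/-1."""
    n = len(X)
    u = np.full(n, 1.0 / n)                    # example weights
    stumps, alphas = [], []
    for _ in range(T):
        g = DecisionTreeClassifier(max_depth=1)
        g.fit(X, y, sample_weight=u)
        pred = g.predict(X)
        eps = u[pred != y].sum() / u.sum()     # weighted error
        if eps == 0 or eps >= 0.5:             # perfect or too-weak stump
            break
        scale = np.sqrt((1 - eps) / eps)       # the factor U above
        u = np.where(pred != y, u * scale, u / scale)  # re-weight examples
        stumps.append(g)
        alphas.append(np.log(scale))           # alpha_t = ln(U)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    votes = sum(a * g.predict(X) for g, a in zip(stumps, alphas))
    return np.sign(votes)
```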
Decision Tree
Definition
Decision tree learning uses a decision tree as a predictive model to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves).
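A minimal scikit-learn sketch (the iris data set is chosen only for illustration); export_text prints the branch tests and leaf conclusions described above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", tree.score(X_te, y_te))
print(export_text(tree))   # branches = feature tests, leaves = classes
```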
Random Forest
See more in 林軒田's 機器學習技法, Chapter 10.
Definition
Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks, that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set.
Random Forest = bagging + decision tree.
Usage
Derivation
Out-of-bag (OOB) error
Also called the out-of-bag estimate, OOB error is a method of measuring the prediction error of random forests, boosted decision trees, and other machine learning models that use bootstrap aggregating to sub-sample the data used for training. E_oob is the self-validation of bagging/RF.
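In scikit-learn this self-validation can be requested directly when fitting a random forest; a minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# bootstrap=True is the default; oob_score=True asks for E_oob.
rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                            random_state=0).fit(X, y)
print("OOB score (self-validation):", rf.oob_score_)
```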
Feature Selection
The permutation method shuffles the values of one feature, recombines the shuffled column with the original values of the other features, and checks whether shuffling that feature significantly hurts overall performance. If it does, the feature is important.
In fact, for RF, feature selection is done through permutation + OOB.
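A hand-rolled sketch of the permutation idea; for simplicity it scores the shuffled data on a held-out validation split rather than on the OOB examples that the course pairs with permutation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8,
                           n_informative=3, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

base = rf.score(X_val, y_val)
rng = np.random.default_rng(0)
for j in range(X.shape[1]):
    X_perm = X_val.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])   # shuffle one feature
    drop = base - rf.score(X_perm, y_val)          # performance drop
    print(f"feature {j}: importance ~ {drop:.3f}")
```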
Gradient Boosted Decision Tree
Definition
Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.
How to Use
Derivation
Gradient Boosted Decision Tree - GBDT
Overall idea: s_n is the value predicted for x_n by g_t, y_n is the true value, and y_n - s_n is the residual. We take the split data set x together with the residuals y - s as a new data set, and use a new g_t (still a decision tree) to split and predict again, iterating until the residuals are arbitrarily close to 0, i.e., the predictions are very close to the true values.
- A is a regression algorithm, left unspecified, that minimizes squared error; we choose a C&RT decision tree as our g_t. You can simply think of it as A = g_t = C&RT.
- After the first cut of the data, a_t is the slope of a single-variable linear regression fitted on that portion of the data, which is where the regression algorithm comes in. Here g_t(x_n) is the decision tree g_t(x) applied to that portion of the data, and the quantity being fit is the residual y_n - s_n.
- Update the score: s <- s + a_t * g_t(x_n), where s is the running prediction built from the linear-regression steps on X. The difference between this prediction and the true y_n becomes the new residual.
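A minimal sketch of this loop with scikit-learn regression trees; instead of fitting a_t by single-variable linear regression as in the course, it uses a fixed learning rate eta as a simplification, and the helper names are mine:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_fit(X, y, T=100, eta=0.1):
    """Gradient boosting for squared error: each tree fits the residual y - s."""
    s = np.zeros(len(y))               # running scores s_n
    trees = []
    for _ in range(T):
        g = DecisionTreeRegressor(max_depth=3)
        g.fit(X, y - s)                # fit the current residuals
        s += eta * g.predict(X)        # s <- s + eta * g_t(x)
        trees.append(g)
    return trees

def gbdt_predict(X, trees, eta=0.1):
    return eta * sum(g.predict(X) for g in trees)
```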
Neural networks are not covered in this article for now.
References
Reference links are marked inline in the original text; other references and suggested further reading are listed below: