What is Machine Learning?
Two definitions of Machine Learning are offered. Arthur Samuel described it as: "the field of study that gives computers the ability to learn without being explicitly programmed." This is an older, informal definition.
Tom Mitchell provides a more modern definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Example: playing checkers.
E = the experience of playing many games of checkers.
T = the task of playing checkers.
P = the probability that the program will win the next game.
In general, any machine learning problem can be assigned to one of two broad classifications:
Supervised learning and Unsupervised learning.
Supervised Learning
In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.
Supervised learning problems are categorized into "regression" and "classification" problems. In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.
Example 1:
Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.
We could turn this example into a classification problem by instead making our output about whether the house "sells for more or less than the asking price." Here we are classifying the houses based on price into two discrete categories.
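The difference is easy to see in code. The sketch below is not from the course; the numbers and the use of scikit-learn are assumptions for illustration. It fits a regression model and a classification model to the same toy house-size data:

```python
# Illustrative sketch only: invented house data, scikit-learn assumed available.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Input feature: house size in square feet (one column per feature).
sizes = np.array([[850], [1200], [1500], [2100], [2400]])

# Regression: predict a continuous output (sale price, in $1000s).
prices = np.array([180, 250, 310, 420, 460])
reg = LinearRegression().fit(sizes, prices)
print(reg.predict([[1800]]))        # estimated price of a 1800 sq ft house

# Classification: predict a discrete output
# (1 = sold above asking price, 0 = sold at or below it).
above_asking = np.array([0, 0, 1, 1, 1])
clf = LogisticRegression().fit(sizes, above_asking)
print(clf.predict([[1800]]))        # predicted category for the same house
```

The only thing that changes between the two problems is the type of output we ask the model to predict.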
Example 2:
(a) Regression - Given a picture of a person, we have to predict their age on the basis of the given picture.
(b) Classification - Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.
In supervised learning (Supervised Learning), the algorithm is given a data set in which the correct answers are already known. Take a housing-price data set: for every example, the algorithm knows the correct price, i.e., what the house actually sold for. The algorithm's job is then to produce more correct prices, for instance for the new house your friend wants to sell. In more technical terms, regression (Regression) is one kind of supervised learning problem: predicting a continuous-valued output, such as a price. Classification (Classification) is another kind: predicting a discrete-valued output from one or more features. In both cases the algorithm learns from examples whose correct answers are given and uses them to predict the results for new data.
An interesting class of learning algorithms can handle an infinite number of features: not just three or five, but arbitrarily many attributes. How do you deal with infinitely many features, and how could you even store them all in your computer without running out of memory? That is what an algorithm called the Support Vector Machine makes possible.
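One way to make this concrete (purely as an illustration, not something from the lecture) is an SVM with an RBF kernel, which implicitly works in an infinite-dimensional feature space through the kernel trick, so the features never have to be stored explicitly. The data below are invented and scikit-learn is assumed:

```python
# Illustrative sketch: the RBF kernel lets the SVM act as if it had
# infinitely many features without ever materialising them. Data invented.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                        # two raw input features
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)    # a non-linear labelling rule

model = SVC(kernel="rbf").fit(X, y)                  # kernel trick does the heavy lifting
print(model.score(X, y))                             # accuracy on the training data
```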
Summary: in supervised learning, every example in the data set (the training set) comes with its correct answer, and the algorithm makes predictions based on these examples. Regression and classification are both kinds of supervised learning: the former predicts a continuous-valued output, the latter a discrete-valued output.
Unsupervised Learning
Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables.
We can derive this structure by clustering the data based on relationships among the variables in the data.
With unsupervised learning there is no feedback based on the prediction results.
Example:
Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.
Non-clustering: The "Cocktail Party Algorithm", allows you to find structure in a chaotic environment. (i.e. identifying individual voices and music from a mesh of sounds at a cocktail party).
In unsupervised learning (Unsupervised Learning) there are no labels: all the data look alike, with nothing to distinguish one example from another. We simply say, "Here is a data set. Can you find some structure in it?" For example, given such a data set, a clustering algorithm (Clustering algorithm) might decide that the data fall into two different clusters and split them accordingly. We never gave the algorithm the correct answers, yet it grouped the data by itself; that is unsupervised learning.
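As a small added illustration (not part of the original notes), the sketch below runs k-means clustering with scikit-learn on invented, unlabeled points; the algorithm is given no correct answers yet still finds two groups on its own:

```python
# Illustrative clustering sketch: unlabeled, invented data; scikit-learn assumed.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Two blobs of points; the algorithm is never told which blob a point came from.
data = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=3.0, scale=0.5, size=(50, 2)),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(kmeans.labels_[:10])       # cluster assignments found for the first points
print(kmeans.cluster_centers_)   # the two cluster centres it discovered
```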
Q: Is there a prerequisite for this course?
A: Students are expected to have the following background:
0. Prior experience with Octave will make the course easier to pick up.
1. Basic knowledge of computer science principles and the ability to write simple, non-trivial programs.
2. Familiarity with basic probability theory.
3. Familiarity with basic linear algebra.