1. Introduction to Machine Learning

I. What Is Machine Learning?

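Machine learning is a branch of artificial intelligence (AI) that studies how computers can learn patterns from data;
Machine learning is not the whole of AI, nor is it the same thing as AI;
Artificial intelligence > machine learning > deep learning (each field contains the next one);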

[Figure: the scope of artificial intelligence]

Note: hand-crafted rules belong to AI, but not to ML;
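To make the note concrete, here is a minimal sketch contrasting a hand-written rule with a model that learns its decision logic from labeled examples, using spam detection (one of the classification tasks listed later). The rule function `rule_based_is_spam`, the toy texts, and the choice of scikit-learn's `CountVectorizer` plus `MultinomialNB` are illustrative assumptions, not the author's method.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hand-crafted rule: a person writes the decision logic (AI, but not ML).
# Hypothetical rule for illustration only.
def rule_based_is_spam(text):
    return "free" in text.lower()

# Machine learning: the decision logic is learned from labeled examples.
# Hypothetical toy data; 1 = spam, 0 = not spam.
texts = ["win free money now", "free prize claim now",
         "meeting at noon", "lunch tomorrow at noon"]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()  # turns each text into word counts
model = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

new_texts = ["claim your free money", "lunch meeting at noon"]
print([rule_based_is_spam(t) for t in new_texts])      # → [True, False]
print(model.predict(vectorizer.transform(new_texts)))  # → [1 0]
```

Both approaches classify these two texts the same way; the difference is that the rule only changes when someone edits it, while the model changes when it is given different training data.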

II. Machine Learning vs. Data Mining vs. Big Data

[Figure: machine learning vs. big data]

III. Understanding Machine Learning

A computer program uses data to optimize an evaluation metric;
It automatically discovers patterns in the data and uses those patterns to make predictions;
In short, it predicts the future from the past.
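The idea that a program "uses data to optimize an evaluation metric" can be made concrete with a minimal sketch: gradient descent that minimizes the mean squared error (MSE) of a line y = ax + b. The toy data, learning rate `lr`, and step count are illustrative assumptions. Gradient descent is only one way to do this; scikit-learn's `LinearRegression`, used in the examples below, solves the least-squares problem directly instead.

```python
import numpy as np

def mse(a, b, x, y):
    """Evaluation metric: mean squared error of the line y = a*x + b."""
    return np.mean((y - (a * x + b)) ** 2)

def fit_by_gradient_descent(x, y, lr=0.05, steps=5000):
    """Start from a = b = 0 and repeatedly step against the MSE gradient."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        residual = y - (a * x + b)
        # Partial derivatives of the MSE with respect to a and b
        grad_a = -2.0 / n * np.sum(x * residual)
        grad_b = -2.0 / n * np.sum(residual)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Hypothetical toy data generated exactly from y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
a, b = fit_by_gradient_descent(x, y)
print(round(a, 4), round(b, 4), round(mse(a, b, x, y), 8))  # → 2.0 1.0 0.0
```

Each step lowers the metric a little; "learning" here simply means ending up with the a and b that make the MSE as small as possible on the given data.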

Other names for machine learning:

Inference/estimation (the terms used in statistics)
Pattern recognition

IV. The Machine Learning Family

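Supervised learning: the training data contains both the inputs and the expected outputs
- Classification
  - Spam email/SMS detection
  - Automatic license plate recognition
  - Face recognition
  - Handwritten character recognition
  - Speech recognition
  - Disease diagnosis from medical images
  - ...
- Regression
  - Used-car price estimation
  - Stock price prediction
  - Temperature forecasting
  - Autonomous driving
  - ...
Unsupervised learning: the training data contains only inputs, with no expected outputs
- Clustering: splitting objects into different subsets so that members of the same subset share similar attributes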
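The difference between the two families shows up directly in the training call: a supervised model is fit on inputs and expected outputs, a clustering model on inputs alone. A minimal sketch on hypothetical 2-D points, assuming scikit-learn's `LogisticRegression` for classification and `KMeans` for clustering (other algorithms would do equally well):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated groups of 2-D points
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# Supervised learning (classification): inputs X come with expected outputs y
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.1, 0.9], [7.9, 8.2]]))  # → [0 1]

# Unsupervised learning (clustering): only inputs X, no expected outputs
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
# Cluster IDs are arbitrary; only the grouping is meaningful
print(km.labels_)
```

The classifier reproduces the given labels 0 and 1, while k-means only discovers that the points fall into two groups and numbers them itself.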

V. The Machine Learning Workflow

[Figure: the machine learning workflow]

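Data acquisition
Data cleaning
Feature engineering
Preprocessing
- Feature extraction
- Handling missing data
- Data scaling
  - Normalization
  - Standardization
- Data transformation
Choosing a machine learning model
Model training <==> model tuning (repeated until the results are satisfactory)
Model deployment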
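The preprocessing steps above (handling missing data, normalization, standardization) can be sketched with scikit-learn's `SimpleImputer`, `MinMaxScaler` and `StandardScaler`. The one-column feature is hypothetical, and mean imputation is only one of several ways to fill missing values.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical single feature with one missing value (np.nan)
x = np.array([[1.0], [2.0], [np.nan], [5.0]])

# Handling missing data: replace NaN with the column mean, (1 + 2 + 5) / 3 = 8/3
x_filled = SimpleImputer(strategy="mean").fit_transform(x)

# Normalization (min-max scaling): (x - min) / (max - min), maps values into [0, 1]
x_norm = MinMaxScaler().fit_transform(x_filled)

# Standardization: (x - mean) / std, gives zero mean and unit variance
x_std = StandardScaler().fit_transform(x_filled)

print(x_filled.ravel())
print(x_norm.ravel())
print(x_std.ravel())
```

Normalization fixes the range of the values; standardization fixes their mean and spread. Which one suits a model depends on the algorithm, so both are worth trying during the training/tuning loop.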

VI. Linear Regression Example 1

Question:

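Your company advertises its product on TV and has collected data on TV advertising spend x (in millions) and product sales y (in hundreds of millions). As the company's data scientist, you want to analyze these data to understand how TV advertising spend x relates to product sales y.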

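Assume the relationship between x and y is linear, i.e. y = ax + b. Linear regression lets us estimate the values of a and b. Then, when planning ahead, we can use the planned TV advertising spend x to predict sales y, and plan production, logistics and warehousing in advance, giving customers better service.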

data:
       TV   sales
0   230.1    22.1
1    44.5    10.4
2    17.2     9.3
3   151.5    18.5
4   180.8    12.9
5     8.7     7.2
6    57.5    11.8
7   120.2    13.2
8     8.6     4.8
9   199.8    10.6
10   66.1     8.6
...   ...     ...
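Linear regression chooses a and b to minimize the sum of squared errors. For a single input variable this has a closed-form solution: a = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², b = ȳ − a·x̄. A minimal NumPy sketch, run only on the 11 rows shown above (the full data set has more rows, so these a and b will differ from the result printed later); `fit_line` is a hypothetical helper name, and `np.polyfit` serves as a cross-check.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares for y = a*x + b, using the closed-form formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b = y_mean - a * x_mean
    return a, b

# Only the first 11 rows of the table above
tv = [230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5, 120.2, 8.6, 199.8, 66.1]
sales = [22.1, 10.4, 9.3, 18.5, 12.9, 7.2, 11.8, 13.2, 4.8, 10.6, 8.6]

a, b = fit_line(tv, sales)
# np.polyfit with degree 1 returns [slope, intercept] and should agree
print(a, b)
print(np.polyfit(tv, sales, 1))
```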

Script Demo:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

Load the data

# Load the data
data = pd.read_csv('data/Advertising.csv')
# Preview the first five rows
print(data.head())
# Inspect the column labels
print(data.columns)
# Inspect the row index
print(data.index)

# Explore the data visually
plt.figure(figsize=(16, 8))
plt.scatter(data['TV'], data['sales'], c='blue')
plt.xlabel('Money spent on TV ads (million)')
plt.ylabel('Sales (hundred million)')
plt.show()

Scatter plot:

[Figure: scatter plot of TV ad spend vs. sales]

Train the linear regression model

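# Train a linear regression model
# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features),
# so reshape the single TV column into one column
X = data['TV'].values.reshape(-1, 1)
y = data['sales'].values.reshape(-1, 1)

reg = LinearRegression()
reg.fit(X, y)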

Print the linear model

# y was passed as a 2-D array, so coef_ has shape (1, 1) and intercept_ has shape (1,)
a = reg.coef_[0][0]
b = reg.intercept_[0]
print('>>> a = {:.5}'.format(a))
print('>>> b = {:.5}'.format(b))
print('>>> Linear model: y = {:.5}X + {:.5}'.format(a, b))

Output:
>>> a = 0.047537
>>> b = 7.0326
>>> Linear model: y = 0.047537X + 7.0326

Visualize the trained linear regression model

# Plot the data points together with the fitted regression line
predictions = reg.predict(X)
plt.figure(figsize=(16, 8))
plt.scatter(data['TV'], data['sales'], c='black')
plt.plot(data['TV'], predictions, c='blue', linewidth=2)
plt.xlabel('Money spent on TV ads (million)')
plt.ylabel('Sales (hundred million)')
plt.show()

Scatter plot with the fitted line:

[Figure: fitted regression line over the TV ad spend vs. sales data]

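Prediction

# Predict sales for TV ad spends of 100, 200 and 300 (in millions)
test = [[100], [200], [300]]
predictions = reg.predict(test)
for investment, prediction in zip(test, predictions):
    print('>>> Ad spend: {} million yuan, predicted sales: {:.5} hundred million'.format(investment[0], prediction[0]))

Output:
>>> Ad spend: 100 million yuan, predicted sales: 11.786 hundred million
>>> Ad spend: 200 million yuan, predicted sales: 16.54 hundred million
>>> Ad spend: 300 million yuan, predicted sales: 21.294 hundred million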
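Because the trained model is just y = ax + b, these predictions can be reproduced by hand from the printed coefficients. A minimal sketch, plugging in the rounded values a = 0.047537 and b = 7.0326 from the output above (rounding means later digits may differ slightly from `reg.predict`); `predict_sales` is a hypothetical helper name.

```python
# Rounded coefficients copied from the printed output of Example 1
a = 0.047537
b = 7.0326

def predict_sales(tv_spend):
    """Apply the fitted linear model y = a*x + b to one ad-spend value (millions)."""
    return a * tv_spend + b

for spend in [100, 200, 300]:
    # e.g. 0.047537 * 100 + 7.0326 = 11.7863
    print('{} -> {:.5}'.format(spend, predict_sales(spend)))
```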

VII. Linear Regression Example 2

Question:

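Air temperature falls as altitude rises, so we can measure the temperature at different altitudes to model the relationship between altitude and temperature. We assume the relationship can be expressed as: y (temperature) = a * x (altitude) + b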

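In theory, measurements at just two different altitudes are enough to determine a and b in this formula. However, every instrument has measurement error, and using measurements from more altitudes makes the estimates more accurate. We provide temperatures measured at 9 different altitudes. Use the linear regression method covered today to estimate a and b, then use the resulting formula to predict the temperature at an altitude of 8,000 m.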

data:
    height  temperature
0      0.0    12.834044
1    500.0    10.190649
2   1000.0     5.500229
3   1500.0     2.854665
4   2000.0    -0.706488
5   2500.0    -4.065323
6   3000.0    -7.127480
7   3500.0   -10.058879
8   4000.0   -13.206465
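The claim that two altitudes suffice in theory, while more measurements help in practice, can be checked on the nine measurements in the table above. A minimal sketch: the two-point fit uses only the lowest and highest altitudes (an arbitrary illustrative choice), and `np.polyfit` performs the least-squares fit over all nine points.

```python
import numpy as np

heights = np.arange(0, 4001, 500, dtype=float)  # 0, 500, ..., 4000 m
temps = np.array([12.834044, 10.190649, 5.500229, 2.854665, -0.706488,
                  -4.065323, -7.127480, -10.058879, -13.206465])

# Two measurements determine y = a*x + b exactly, errors included
a2 = (temps[-1] - temps[0]) / (heights[-1] - heights[0])
b2 = temps[0] - a2 * heights[0]

# Least squares over all 9 points averages out individual measurement errors
a9, b9 = np.polyfit(heights, temps, 1)

print('two points: a = {:.6}, b = {:.6}'.format(a2, b2))
# Should match the scikit-learn result printed below (a ≈ -0.00656953, b ≈ 12.7185)
print('nine points: a = {:.6}, b = {:.6}'.format(a9, b9))
```

The two-point line passes exactly through its two measurements, so any error in those readings goes straight into a and b; the nine-point fit spreads that error across all measurements.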

Script Demo:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

Load the data

# Load the data
data = pd.read_csv('exercise/height.vs.temperature.csv')
# Preview the first five rows
print(data.head())
# Inspect the column labels
print(data.columns)
# Inspect the row index
print(data.index)

Explore the data visually

plt.figure(figsize=(16, 8))
plt.scatter(data['height'], data['temperature'], c='blue')
plt.xlabel('height (m)')
plt.ylabel('temperature (℃)')
plt.show()

Scatter plot:

[Figure: scatter plot of altitude vs. temperature]

Train the linear regression model

X = data['height'].values.reshape(-1, 1)
y = data['temperature'].values.reshape(-1, 1)
reg = LinearRegression()
reg.fit(X, y)

Print the linear model

# y was passed as a 2-D array, so coef_ has shape (1, 1) and intercept_ has shape (1,)
a = reg.coef_[0][0]
b = reg.intercept_[0]
print('>>> a = {:.6}'.format(a))
print('>>> b = {:.6}'.format(b))
print('>>> Linear model: y = {:.6}X + {:.6}'.format(a, b))

Output:
>>> a = -0.00656953
>>> b = 12.7185
>>> Linear model: y = -0.00656953X + 12.7185

Visualize the linear regression model

predictions = reg.predict(X)
plt.figure(figsize=(16, 8))
plt.scatter(data['height'], data['temperature'], c='black')
plt.plot(data['height'], predictions, c='blue', linewidth=2)
plt.show()

Scatter plot with the fitted line:

[Figure: fitted regression line over the altitude vs. temperature data]

Prediction

# Predict the temperature at 8000, 9000 and 10000 m
test = [[8000], [9000], [10000]]
predictions = reg.predict(test)
for height, temperature in zip(test, predictions):
    print('>>> Altitude {} m, temperature: {:.6}℃'.format(height[0], temperature[0]))

Output:
>>> Altitude 8000 m, temperature: -39.8378℃
>>> Altitude 9000 m, temperature: -46.4073℃
>>> Altitude 10000 m, temperature: -52.9768℃
