This post is my summary of week 3 of Udacity's Deep Learning Foundations course. The theme: before learning TensorFlow, build a small miniflow of your own. This week's work gave me a basic feel for how TensorFlow works. The project is on GitHub at https://github.com/zhuanxuhit/nd101 , and you are welcome to follow it.
We know that the usual steps for building a neural network are:
- normalization
- learning hyperparameters
- initializing weights
- forward propagation
- calculate error
- backpropagation
When implementing the steps above in TensorFlow, the workflow is generally:
- Define the graph of nodes and edges.
- Propagate values through the graph.
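For example, here is what those two steps look like in TensorFlow itself. This is a minimal sketch, assuming the TensorFlow 1.x graph-and-session API that was current when this course ran:

```python
import tensorflow as tf

# 1. Define the graph of nodes and edges.
x = tf.constant(4.0)
y = tf.constant(5.0)
f = x * y

# 2. Propagate values through the graph by running it in a session.
with tf.Session() as sess:
    print(sess.run(f))  # 20.0
```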
Next, to implement miniflow, we will first define nodes and the graph, and then implement forward propagation and backpropagation.
## 1. Node
Let's start with the concept of a node by looking at a simple neural network:
The network above is one big graph. Every node has inputs and outputs, and each node computes its output from its inputs, so we first define Node:
```python
class Node(object):
    def __init__(self, inbound_nodes=[]):
        self.inbound_nodes = inbound_nodes
        self.outbound_nodes = []
        for n in self.inbound_nodes:
            n.outbound_nodes.append(self)
        self.value = None
```
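A tiny, hypothetical illustration of how the constructor wires up edges: when a node lists another node as an input, it also registers itself as that node's output.

```python
a = Node()
b = Node([a])

print(b.inbound_nodes)   # contains a (declared explicitly)
print(a.outbound_nodes)  # contains b (filled in by b's constructor)
```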
With this minimal Node in place, the next step is to implement forward propagation.
Forward propagation
為了計(jì)算一個(gè)node婿屹,需要知道它的輸入灭美,而輸入又依賴于其他節(jié)點(diǎn)的輸出,這種為了計(jì)算當(dāng)前節(jié)點(diǎn)而求其所有前置節(jié)點(diǎn)的技術(shù)叫拓?fù)渑判騮opological sort
Shown as a diagram:
上面為了計(jì)算最后的Node F昂利,我們給出了一個(gè)可行的計(jì)算順序届腐,我們此處直接給出一個(gè)算法:Kahn's Algorithm,代碼如下:
```python
def topological_sort(feed_dict):
    input_nodes = [n for n in feed_dict.keys()]

    G = {}
    nodes = [n for n in input_nodes]
    while len(nodes) > 0:
        n = nodes.pop(0)
        if n not in G:
            G[n] = {'in': set(), 'out': set()}
        for m in n.outbound_nodes:
            if m not in G:
                G[m] = {'in': set(), 'out': set()}
            G[n]['out'].add(m)
            G[m]['in'].add(n)
            nodes.append(m)

    L = []
    S = set(input_nodes)
    while len(S) > 0:
        n = S.pop()
        if isinstance(n, Input):
            n.value = feed_dict[n]
        L.append(n)
        for m in n.outbound_nodes:
            G[n]['out'].remove(m)
            G[m]['in'].remove(n)
            # if no other incoming edges add to S
            if len(G[m]['in']) == 0:
                S.add(m)
    return L


def forward_pass(output_node, sorted_nodes):
    for n in sorted_nodes:
        n.forward()
    return output_node.value
```
下面我們來實(shí)現(xiàn)一些簡單的Node類型铁坎,第一個(gè)是Input類型:
```python
class Input(Node):
    def __init__(self):
        Node.__init__(self)

    def forward(self, value=None):
        if value is not None:
            self.value = value
```
Next is Mul:
```python
class Mul(Node):
    def __init__(self, *inputs):
        Node.__init__(self, inputs)

    def forward(self):
        # Multiply the values of all inbound nodes together.
        product = 1.0
        for n in self.inbound_nodes:
            product *= n.value
        self.value = product
```
It is used like this:
```python
x, y, z = Input(), Input(), Input()
f = Mul(x, y, z)

feed_dict = {x: 4, y: 5, z: 10}

graph = topological_sort(feed_dict)
output = forward_pass(f, graph)

# should output 200
print("{} * {} * {} = {} (according to miniflow)".format(feed_dict[x], feed_dict[y], feed_dict[z], output))
```
4 * 5 * 10 = 200.0 (according to miniflow)
下面我們來實(shí)現(xiàn)下稍微復(fù)雜點(diǎn)的Node類型:Linear Node
```python
class Linear(Node):
    def __init__(self, inputs, weights, bias):
        Node.__init__(self, [inputs, weights, bias])

    def forward(self):
        inputs = self.inbound_nodes[0].value
        weights = self.inbound_nodes[1].value
        bias = self.inbound_nodes[2].value
        total = 0
        for i in range(len(inputs)):
            total += inputs[i] * weights[i]
        self.value = total + bias
```
With the Linear node we can now run the following:
```python
inputs, weights, bias = Input(), Input(), Input()
f = Linear(inputs, weights, bias)

feed_dict = {
    inputs: [6, 20, 4],
    weights: [0.5, 0.25, 1.5],
    bias: 2
}

graph = topological_sort(feed_dict)
output = forward_pass(f, graph)
print(output)
```
16.0
Besides the Linear node, we can also define a Sigmoid node.
```python
import numpy as np


class Sigmoid(Node):
    def __init__(self, node):
        Node.__init__(self, [node])

    def _sigmoid(self, x):
        return 1. / (1. + np.exp(-x))

    def forward(self):
        input_value = self.inbound_nodes[0].value
        self.value = self._sigmoid(input_value)
```
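For reference, the function this node computes is the logistic sigmoid; its derivative, which we will need later for the backward pass, has a convenient closed form:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$$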
With the nodes defined, the next step is to decide how to measure how good the output is.
## 2. Defining the cost function
我們在訓(xùn)練神經(jīng)網(wǎng)絡(luò)的時(shí)候,需要有個(gè)目標(biāo)碌奉,就是盡可能的讓輸出準(zhǔn)確短曾,怎么衡量呢赐劣?我們可以通過均方誤差 (MSE)來衡量嫉拐,這也可以用一個(gè)MSENode來建模
```python
class MSE(Node):
    def __init__(self, y, a):
        Node.__init__(self, [y, a])

    def forward(self):
        y = self.inbound_nodes[0].value.reshape(-1, 1)
        a = self.inbound_nodes[1].value.reshape(-1, 1)
        m = len(y)
        total = 0.
        for (yi, ai) in zip(y, a):
            total += np.square(yi - ai)
        self.value = total / m
```
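A quick, hypothetical check of the node, reusing `topological_sort` and `forward_pass` from above:

```python
y, a = Input(), Input()
cost = MSE(y, a)

feed_dict = {y: np.array([1., 2., 3.]), a: np.array([1.5, 2., 3.])}

graph = topological_sort(feed_dict)
output = forward_pass(cost, graph)
print(output)  # (1.5 - 1)^2 / 3, roughly 0.0833
```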
## 3. Backpropagation
現(xiàn)在我們有了衡量輸出好壞的函數(shù)婉徘,我們需要的是怎么能快速的讓輸出盡可能的好,這就要引出Gradient Descent化撕,梯度即slope斜率植阴,我們通過它來定義我們優(yōu)化的方向掠手,更詳細(xì)的可以看文章停下來思考下神經(jīng)網(wǎng)絡(luò)
With the notion of a gradient in hand, let's look at a neural network graph:
上面我們?yōu)榱擞?jì)算MESE對于w1的梯度纯蛾,我們沿著圖中的紅色線走翻诉,給出了梯度的計(jì)算方式,這種計(jì)算方式就是微積分中的鏈?zhǔn)椒▌t蛾派,能讓我們計(jì)算任意一個(gè)變量的梯度洪乍,下面我們給出梯度的計(jì)算代碼,相比較之前的Node中,多了一個(gè)backward函數(shù)弃酌,看下面的實(shí)現(xiàn):
```python
import numpy as np


class Node(object):
    def __init__(self, inbound_nodes=[]):
        self.inbound_nodes = inbound_nodes
        self.value = None
        self.outbound_nodes = []
        # Keys are the input nodes, values are the partials of this node
        # with respect to that input.
        self.gradients = {}
        for node in inbound_nodes:
            node.outbound_nodes.append(self)

    def forward(self):
        raise NotImplementedError

    def backward(self):
        raise NotImplementedError


class Input(Node):
    def __init__(self):
        Node.__init__(self)

    def forward(self):
        pass

    def backward(self):
        self.gradients = {self: 0}
        # An Input node's gradient is the sum of the gradients
        # passed back from all of its output nodes.
        for n in self.outbound_nodes:
            grad_cost = n.gradients[self]
            self.gradients[self] += grad_cost * 1


class Linear(Node):
    def __init__(self, X, W, b):
        Node.__init__(self, [X, W, b])

    def forward(self):
        X = self.inbound_nodes[0].value
        W = self.inbound_nodes[1].value
        b = self.inbound_nodes[2].value
        self.value = np.dot(X, W) + b

    def backward(self):
        self.gradients = {n: np.zeros_like(n.value) for n in self.inbound_nodes}
        for n in self.outbound_nodes:
            grad_cost = n.gradients[self]
            # y = XW + b
            # Compute the gradient of y with respect to each input node.
            # dy/dX = W
            self.gradients[self.inbound_nodes[0]] += np.dot(grad_cost, self.inbound_nodes[1].value.T)
            # dy/dW = X
            self.gradients[self.inbound_nodes[1]] += np.dot(self.inbound_nodes[0].value.T, grad_cost)
            # dy/db = 1
            self.gradients[self.inbound_nodes[2]] += np.sum(grad_cost, axis=0, keepdims=False)


class Sigmoid(Node):
    def __init__(self, node):
        # The base class constructor.
        Node.__init__(self, [node])

    def _sigmoid(self, x):
        return 1. / (1. + np.exp(-x))

    def forward(self):
        input_value = self.inbound_nodes[0].value
        self.value = self._sigmoid(input_value)

    def backward(self):
        # Initialize the gradients to 0.
        self.gradients = {n: np.zeros_like(n.value) for n in self.inbound_nodes}
        for n in self.outbound_nodes:
            # Get the partial of the cost with respect to this node.
            grad_cost = n.gradients[self]
            sigmoid = self.value
            # d(sigmoid)/dx = sigmoid * (1 - sigmoid)
            self.gradients[self.inbound_nodes[0]] += sigmoid * (1 - sigmoid) * grad_cost


class MSE(Node):
    def __init__(self, y, a):
        # Call the base class' constructor.
        Node.__init__(self, [y, a])

    def forward(self):
        y = self.inbound_nodes[0].value.reshape(-1, 1)
        a = self.inbound_nodes[1].value.reshape(-1, 1)
        self.m = self.inbound_nodes[0].value.shape[0]
        self.diff = y - a
        self.value = np.mean(self.diff**2)

    def backward(self):
        self.gradients[self.inbound_nodes[0]] = (2 / self.m) * self.diff
        self.gradients[self.inbound_nodes[1]] = (-2 / self.m) * self.diff


def topological_sort(feed_dict):
    input_nodes = [n for n in feed_dict.keys()]

    G = {}
    nodes = [n for n in input_nodes]
    while len(nodes) > 0:
        n = nodes.pop(0)
        if n not in G:
            G[n] = {'in': set(), 'out': set()}
        for m in n.outbound_nodes:
            if m not in G:
                G[m] = {'in': set(), 'out': set()}
            G[n]['out'].add(m)
            G[m]['in'].add(n)
            nodes.append(m)

    L = []
    S = set(input_nodes)
    while len(S) > 0:
        n = S.pop()
        if isinstance(n, Input):
            n.value = feed_dict[n]
        L.append(n)
        for m in n.outbound_nodes:
            G[n]['out'].remove(m)
            G[m]['in'].remove(n)
            # if no other incoming edges add to S
            if len(G[m]['in']) == 0:
                S.add(m)
    return L


def forward_and_backward(graph):
    # Forward pass
    for n in graph:
        n.forward()
    # Backward pass
    # see: https://docs.python.org/2.3/whatsnew/section-slices.html
    for n in graph[::-1]:
        n.backward()
```
With all of the nodes and functions above defined, we can now run the following:
```python
X, W, b = Input(), Input(), Input()
y = Input()
f = Linear(X, W, b)
a = Sigmoid(f)
cost = MSE(y, a)

X_ = np.array([[-1., -2.], [-1, -2]])
W_ = np.array([[2.], [3.]])
b_ = np.array([-3.])
y_ = np.array([1, 2])

feed_dict = {
    X: X_,
    y: y_,
    W: W_,
    b: b_,
}

graph = topological_sort(feed_dict)
forward_and_backward(graph)

# return the gradients for each Input
gradients = [t.gradients[t] for t in [X, y, W, b]]

print(gradients)
```
```
[array([[ -3.34017280e-05,  -5.01025919e-05],
       [ -6.68040138e-05,  -1.00206021e-04]]), array([[ 0.9999833],
       [ 1.9999833]]), array([[  5.01028709e-05],
       [  1.00205742e-04]]), array([ -5.01028709e-05])]
```
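As a quick sanity check on the backward pass (not part of the original lesson), we can compare an analytic gradient against a finite-difference estimate. The helper below is a hypothetical sketch built on the classes above:

```python
def numerical_gradient(cost_node, input_node, graph, eps=1e-6):
    # Finite-difference estimate of d(cost)/d(input_node.value).
    grad = np.zeros_like(input_node.value)
    it = np.nditer(input_node.value, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        original = input_node.value[idx]
        # Evaluate the cost at value + eps and value - eps.
        input_node.value[idx] = original + eps
        for n in graph:
            n.forward()
        cost_plus = cost_node.value
        input_node.value[idx] = original - eps
        for n in graph:
            n.forward()
        cost_minus = cost_node.value
        input_node.value[idx] = original  # restore
        grad[idx] = (cost_plus - cost_minus) / (2 * eps)
        it.iternext()
    return grad

# e.g. compare against the analytic gradient for W computed above:
# print(numerical_gradient(cost, W, graph))
# print(W.gradients[W])
```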
## 4. Stochastic Gradient Descent (SGD)
I never quite understood what SGD was until recently.
If we compute the gradient over the full dataset before every parameter update, we can easily run out of memory. So the strategy is: pick a subset of the data, compute the gradients on just that batch, and update the parameters right away.
That gives us the following code:
```python
def sgd_update(trainables, learning_rate=1e-2):
    for n in trainables:
        # Move each trainable parameter a small step against its gradient.
        n.value -= learning_rate * n.gradients[n]
```
```python
from sklearn.datasets import load_boston
from sklearn.utils import shuffle, resample

# Load data
data = load_boston()
X_ = data['data']
y_ = data['target']

# Normalize data
X_ = (X_ - np.mean(X_, axis=0)) / np.std(X_, axis=0)

n_features = X_.shape[1]
n_hidden = 10
W1_ = np.random.randn(n_features, n_hidden)
b1_ = np.zeros(n_hidden)
W2_ = np.random.randn(n_hidden, 1)
b2_ = np.zeros(1)

# Neural network
X, y = Input(), Input()
W1, b1 = Input(), Input()
W2, b2 = Input(), Input()

l1 = Linear(X, W1, b1)
s1 = Sigmoid(l1)
l2 = Linear(s1, W2, b2)
cost = MSE(y, l2)

feed_dict = {
    X: X_,
    y: y_,
    W1: W1_,
    b1: b1_,
    W2: W2_,
    b2: b2_
}

epochs = 10
# Total number of examples
m = X_.shape[0]
batch_size = 11
steps_per_epoch = m // batch_size

graph = topological_sort(feed_dict)
trainables = [W1, b1, W2, b2]

print("Total number of examples = {}".format(m))

# Step 4
for i in range(epochs):
    loss = 0
    for j in range(steps_per_epoch):
        # Step 1
        # Randomly sample a batch of examples
        X_batch, y_batch = resample(X_, y_, n_samples=batch_size)

        # Reset value of X and y Inputs
        X.value = X_batch
        y.value = y_batch

        # Step 2
        forward_and_backward(graph)

        # Step 3
        sgd_update(trainables)

        loss += graph[-1].value

    print("Epoch: {}, Loss: {:.3f}".format(i+1, loss/steps_per_epoch))
```
Total number of examples = 506
Epoch: 1, Loss: 133.910
Epoch: 2, Loss: 36.332
Epoch: 3, Loss: 22.353
Epoch: 4, Loss: 26.704
Epoch: 5, Loss: 23.121
Epoch: 6, Loss: 23.491
Epoch: 7, Loss: 21.393
Epoch: 8, Loss: 15.300
Epoch: 9, Loss: 13.391
Epoch: 10, Loss: 15.651
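Once training is done, the same graph can be run forward-only to get predictions. A hypothetical sketch, reusing the variables defined above:

```python
# Run a forward-only pass over the whole (normalized) dataset.
X.value = X_
y.value = y_
for n in graph:
    n.forward()

predictions = l2.value  # outputs of the last Linear node
print(predictions[:5].ravel())
```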
## Summary
That's all of miniflow. We first defined Node, then defined the relations between nodes to get a graph, computed the output with forward propagation, measured how good the output is with MSE, used the chain rule to compute gradients and update the parameters so the cost keeps shrinking, and finally used SGD to make the computation faster.