- Deep learning
- Object detection
- Neural networks
New things learned
Put BN after ReLU
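A minimal Gluon sketch of that ordering (the layer sizes are arbitrary, only to show Conv → ReLU → BN):
from mxnet.gluon import nn

net = nn.Sequential()
with net.name_scope():
    net.add(nn.Conv2D(channels=32, kernel_size=3, activation='relu'))  # conv + relu
    net.add(nn.BatchNorm())                                            # BN after the activation
    net.add(nn.Dense(10))
net.initialize()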
MobileNetV2: a mobile network for classification, detection and segmentation
Number of convolution kernels
Convolutional neural networks — from scratch
When the input has multiple channels, each channel has its own weights; the convolution is computed per channel and the results are summed across channels. So when there is only one output channel, the kernel has as many channels as the input data.
When multiple output channels are needed, each output channel gets its own set of weights and performs its own convolution over all input channels. So with n input channels and h output channels, the kernel contains n * h channel slices, each output channel has one bias, and the weight tensor has shape (h, n, k_h, k_w), i.e. (output channels, input channels, kernel height, kernel width).
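A quick shape check (the sizes n=3, h=4 and the 3x3 kernel are made up for illustration):
from mxnet import nd

n, h = 3, 4                                    # input channels, output channels
data = nd.random.uniform(shape=(1, n, 8, 8))   # (batch, channel, height, width)
w = nd.random.normal(shape=(h, n, 3, 3))       # (out_channels, in_channels, k_h, k_w)
b = nd.zeros(h)                                # one bias per output channel
out = nd.Convolution(data, w, b, kernel=w.shape[2:], num_filter=h)
print(out.shape)                               # (1, 4, 6, 6)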
Gluon syntax
Nesting nn.Block and nn.Sequential
from mxnet import nd
from mxnet.gluon import nn

class RecMLP(nn.Block):
    def __init__(self, **kwargs):
        super(RecMLP, self).__init__(**kwargs)
        self.net = nn.Sequential()
        with self.name_scope():
            self.net.add(nn.Dense(256, activation="relu"))
            self.net.add(nn.Dense(128, activation="relu"))
            self.dense = nn.Dense(64)

    def forward(self, x):
        return nd.relu(self.dense(self.net(x)))

rec_mlp = nn.Sequential()
rec_mlp.add(RecMLP())
rec_mlp.add(nn.Dense(10))
print(rec_mlp)
Initialization and parameter access
from mxnet import init

params = net.collect_params()
params.initialize(init=init.Normal(sigma=0.02), force_reinit=True)
print(net[0].weight.data(), net[0].bias.data())
We can also use collect_params to access all parameters of a Block (including those of every child Block). It returns a dict mapping names to the corresponding Parameters.
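For example, continuing with the params dict created above (the exact parameter names depend on the name scope):
for name, param in params.items():
    print(name, param.shape)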
You can also customize the initializer of individual layers; layers without a custom initializer fall back to the method given in net.initialize.
from mxnet.gluon import nn
from mxnet import nd
from mxnet import init

def get_net():
    net = nn.Sequential()
    with net.name_scope():
        net.add(nn.Dense(4, activation="relu"))  # weight_initializer=init.Xavier() could also be passed here
        net.add(nn.Dense(2, weight_initializer=init.Zero(), bias_initializer=init.Zero()))
    return net

x = nd.random.uniform(shape=(3, 5))
net = get_net()
net.initialize(init.One())
net(x)
print(net[1].weight.data())
GPU access
- Uninstall the CPU version of MXNet
pip uninstall mxnet
- Install the GPU (CUDA 8.0) build of MXNet
pip install -U --pre mxnet-cu80
- Check the installed version
import pip
for pkg in ['mxnet', 'mxnet-cu75', 'mxnet-cu80']:
    pip.main(['show', pkg])
Jupyter plugins
notedown: lets you open and edit Markdown files as notebooks in Jupyter.
nb_conda: a conda plugin that lets you switch the Python kernel from inside Jupyter.
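Rough install sketch (commands from memory; check each plugin's docs for the exact syntax):
pip install notedown
jupyter notebook --NotebookApp.contents_manager_class='notedown.NotedownContentsManager'
conda install nb_conda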
Optimization methods
Momentum
The learning_rate attribute and the set_learning_rate method of gluon.Trainer let you adjust the learning rate at any time.
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': lr, 'momentum': mom})
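For example, to decay the learning rate mid-training (the factor 0.1 is arbitrary):
trainer.set_learning_rate(trainer.learning_rate * 0.1)
print(trainer.learning_rate)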
Adagrad
Adagrad is an optimization algorithm that keeps adjusting the learning rate during training and uses a separate learning rate for every element of the model parameters.
trainer = gluon.Trainer(net.collect_params(), 'adagrad',
                        {'learning_rate': lr})
Adam
trainer = gluon.Trainer(net.collect_params(), 'adam',
                        {'learning_rate': lr})
From the analysis above, in theory adaptive methods (Adagrad, Adadelta, RMSProp, Adam, etc.) work better when the data is sparse; with sparse data Adam also tends to converge to a better result than RMSProp. So, absent a better reason, I usually choose Adam to get a quick baseline estimate. In papers, however, most results still use plain mini-batch SGD. Because of saddle points and similar issues, SGD can be hard to get to converge, and it is also more demanding about parameter initialization. So if you want fast convergence, an adaptive method like Adam is the recommended choice.
Deferred execution
Deferred execution gives the system more room for performance optimization, but we recommend calling at least one synchronization function per batch (for example, evaluating the loss) so that too many tasks are not dumped into the backend at once.
from mxnet import autograd, nd

# get_mem() and get_data() are helper functions defined earlier in the notebook
mem = get_mem()
total_loss = 0
for x, y in get_data():
    with autograd.record():
        L = loss(y, net(x))
    total_loss += L.sum().asscalar()   # asscalar() acts as a synchronization point
    L.backward()
    trainer.step(x.shape[0])
nd.waitall()
print('Increased memory %f MB' % (get_mem() - mem))
Multi-GPU training
from mxnet import gluon, gpu

ctx = [gpu(i) for i in range(num_gpus)]
data_list = gluon.utils.split_and_load(data, ctx)
label_list = gluon.utils.split_and_load(label, ctx)
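A sketch of one training step on these slices, assuming net, loss and trainer already exist and net's parameters were initialized on every context in ctx:
from mxnet import autograd

with autograd.record():
    losses = [loss(net(X), y)                    # forward pass on each GPU slice
              for X, y in zip(data_list, label_list)]
for l in losses:
    l.backward()                                 # gradients stay on their own device
trainer.step(data.shape[0])                      # the trainer aggregates gradients across devices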
Finetuning
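A minimal sketch of the usual Gluon finetuning recipe, using a pretrained ResNet-18 from the model zoo and a new 2-class output layer (both choices are just for illustration):
from mxnet import init
from mxnet.gluon.model_zoo import vision

pretrained_net = vision.resnet18_v2(pretrained=True)  # downloads ImageNet weights
finetune_net = vision.resnet18_v2(classes=2)
finetune_net.features = pretrained_net.features       # reuse the pretrained feature blocks
finetune_net.output.initialize(init.Xavier())         # only the new output layer starts from scratch
# then train as usual, typically with a small learning rate so the features change slowly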
Some reusable code
Loading data
from mxnet import gluon
from mxnet import ndarray as nd

def transform(data, label):
    return data.astype('float32')/255, label.astype('float32')

mnist_train = gluon.data.vision.FashionMNIST(train=True, transform=transform)
mnist_test = gluon.data.vision.FashionMNIST(train=False, transform=transform)
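The datasets are then usually wrapped in DataLoaders (batch_size 256 is just an example):
batch_size = 256
train_data = gluon.data.DataLoader(mnist_train, batch_size, shuffle=True)
test_data = gluon.data.DataLoader(mnist_test, batch_size, shuffle=False)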
Computing accuracy
def accuracy(output, label):
    return nd.mean(output.argmax(axis=1)==label).asscalar()
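A simple single-device companion that averages accuracy over a data iterator (a sketch; the final train function further down uses a multi-GPU variant from utils.py):
def evaluate_accuracy(data_iterator, net):
    acc = 0.
    for data, label in data_iterator:
        output = net(data)
        acc += accuracy(output, label)
    return acc / len(data_iterator)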
We first use a Flatten layer to turn the input into a batch_size x ? matrix and then feed it into a fully connected layer with 10 output nodes. As usual we do not need to specify the input size of each layer; Gluon infers it automatically.
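A sketch of that model:
net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Flatten())
    net.add(gluon.nn.Dense(10))
net.initialize()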
Activation functions
softmax
from mxnet import nd

def softmax(X):
    exp = nd.exp(X)
    # Assuming exp is a matrix, sum over the rows and keep axis 1,
    # i.e. return a matrix of shape (nrows, 1)
    partition = exp.sum(axis=1, keepdims=True)
    return exp / partition
relu
def relu(X):
    return nd.maximum(X, 0)
Loss functions
Squared error
# Gluon version
square_loss = gluon.loss.L2Loss()

# scratch version
def square_loss(yhat, y):
    # reshape y to yhat's shape to avoid automatic broadcasting of the shapes
    return (yhat - y.reshape(yhat.shape)) ** 2
Cross-entropy loss
# scratch version
def cross_entropy(yhat, y):
    return - nd.pick(nd.log(yhat), y)

# Gluon version
softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
Fetching a mini-batch of data
Scratch version
import random

batch_size = 1

def data_iter(num_examples):
    # X and y are the full feature matrix and label vector defined earlier
    idx = list(range(num_examples))
    random.shuffle(idx)
    for i in range(0, num_examples, batch_size):
        j = nd.array(idx[i:min(i+batch_size, num_examples)])
        yield X.take(j), y.take(j)
Gluon version
batch_size = 1
dataset_train = gluon.data.ArrayDataset(X_train, y_train)
data_iter_train = gluon.data.DataLoader(dataset_train, batch_size, shuffle=True)
Initializing weights
Scratch version
def get_params():
    w = nd.random.normal(shape=(num_inputs, 1)) * 0.1
    b = nd.zeros((1,))
    for param in (w, b):
        param.attach_grad()
    return (w, b)
Gluon version
import mxnet as mx

net.initialize()                                           # default initialization
net.collect_params().initialize(mx.init.Normal(sigma=1))  # or specify an initializer explicitly
SGD
Scratch version
def SGD(params, lr):
    for param in params:
        param[:] = param - lr * param.grad
L2 regularization
def L2_penalty(w, b):
    return ((w**2).sum() + b**2) / 2
Gluon version
trainer = gluon.Trainer(net.collect_params(), 'sgd', {
    'learning_rate': learning_rate, 'wd': weight_decay})
Here weight_decay adds L2 regularization; the SGD update then becomes
w = w - lr * (grad + wd * w)
Training loop
Scratch version
for e in range(epochs):
    for data, label in data_iter(num_train):
        with autograd.record():
            output = net(data, *params)
            loss = square_loss(
                output, label) + lambd * L2_penalty(*params)
        loss.backward()
        SGD(params, learning_rate)
    train_loss.append(test(params, X_train, y_train))
    test_loss.append(test(params, X_test, y_test))
Gluon version
for e in range(epochs):
    for data, label in data_iter_train:
        with autograd.record():
            output = net(data)
            loss = square_loss(output, label)
        loss.backward()
        trainer.step(batch_size)
    train_loss.append(test(net, X_train, y_train))
    test_loss.append(test(net, X_test, y_test))
%matplotlib inline
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 120
import matplotlib.pyplot as plt
from mxnet import autograd, gluon

def train(X_train, X_test, y_train, y_test):
    # linear regression model
    net = gluon.nn.Sequential()
    with net.name_scope():
        net.add(gluon.nn.Dense(1))
    net.initialize()
    # some default hyperparameters
    learning_rate = 0.01
    epochs = 100
    batch_size = min(10, y_train.shape[0])
    dataset_train = gluon.data.ArrayDataset(X_train, y_train)
    data_iter_train = gluon.data.DataLoader(
        dataset_train, batch_size, shuffle=True)
    # default SGD and squared loss
    trainer = gluon.Trainer(net.collect_params(), 'sgd', {
        'learning_rate': learning_rate})
    square_loss = gluon.loss.L2Loss()
    # record training and test losses
    train_loss = []
    test_loss = []
    for e in range(epochs):
        for data, label in data_iter_train:
            with autograd.record():
                output = net(data)
                loss = square_loss(output, label)
            loss.backward()
            trainer.step(batch_size)
        train_loss.append(square_loss(
            net(X_train), y_train).mean().asscalar())
        test_loss.append(square_loss(
            net(X_test), y_test).mean().asscalar())
    # plot the results
    plt.plot(train_loss)
    plt.plot(test_loss)
    plt.legend(['train', 'test'])
    plt.show()
    return ('learned weight', net[0].weight.data(),
            'learned bias', net[0].bias.data())
Final version
from time import time
import mxnet as mx
from mxnet import autograd

def train(train_data, test_data, net, loss, trainer, ctx, num_epochs, print_batches=None):
    """Train a network (_get_batch and evaluate_accuracy are helpers from utils.py)"""
    print("Start training on ", ctx)
    if isinstance(ctx, mx.Context):
        ctx = [ctx]
    for epoch in range(num_epochs):
        train_loss, train_acc, n, m = 0.0, 0.0, 0.0, 0.0
        if isinstance(train_data, mx.io.MXDataIter):
            train_data.reset()
        start = time()
        for i, batch in enumerate(train_data):
            data, label, batch_size = _get_batch(batch, ctx)
            losses = []
            with autograd.record():
                outputs = [net(X) for X in data]
                losses = [loss(yhat, y) for yhat, y in zip(outputs, label)]
            for l in losses:
                l.backward()
            train_acc += sum([(yhat.argmax(axis=1)==y).sum().asscalar()
                              for yhat, y in zip(outputs, label)])
            train_loss += sum([l.sum().asscalar() for l in losses])
            trainer.step(batch_size)
            n += batch_size
            m += sum([y.size for y in label])
            if print_batches and (i+1) % print_batches == 0:
                print("Batch %d. Loss: %f, Train acc %f" % (
                    n, train_loss/n, train_acc/m
                ))
        test_acc = evaluate_accuracy(test_data, net, ctx)
        print("Epoch %d. Loss: %.3f, Train acc %.2f, Test acc %.2f, Time %.1f sec" % (
            epoch, train_loss/n, train_acc/m, test_acc, time() - start
        ))
References
Training a CNN model on your own image data with MXNet
Create a Dataset Using RecordIO
Fixing Python version conflicts between conda and IPython Notebook
How to count the parameters of a neural network