Hinton's Course, Programming Assignment 2 (PA2)
These are my notes on Hinton's course Neural Networks for Machine Learning. Since I rarely use MATLAB, there are quite a few MATLAB-related explanations along the way.
Background
The assignment uses a vocabulary of 250 words. Each record in the dataset contains 4 words, and the goal is to predict the 4th word given the first three as input.
The corresponding lecture is Lecture 4, which starts from a probabilistic representation of words and introduces backpropagation. The parts that are harder to grasp are how words are encoded and how they are represented in the embedding layer [1][2].
What the embedding layer means
The in-lecture quiz already hints at why each word should occupy its own element (one-hot encoding; a small sketch follows this list):
- The classes become linearly separable.
- The entries are independent of each other, so no prior knowledge is baked in.
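A minimal sketch of the one-hot idea (the variable names are illustrative, not from the assignment code): each of the 250 words gets its own slot, and only that slot is set to 1.
vocab_size = 250;              % size of the vocabulary used in the assignment
word_index = 42;               % hypothetical index of some word
one_hot = zeros(1, vocab_size);
one_hot(word_index) = 1;       % only the word's own position is 1
% Equivalent: take one row of the identity matrix -- the same trick the
% assignment code uses later via expansion_matrix = eye(vocab_size).
E = eye(vocab_size);
isequal(one_hot, E(word_index, :))   % -> 1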
Load Data and Structs
MATLAB structs
data.mat and load_data.m handle the data. Running load data.mat gives you a struct. A MATLAB struct is similar to a C struct and is one of the many supported data types. Plenty of functions work on structs; a particularly useful one is fieldnames, which returns the field names (see the docs for the rest). Typing data at the Octave command line prints a summary of the variable:
>> data
data =
scalar structure containing the fields:
testData =
Columns 1 through 25:
xx xx xx
Calling the fieldnames function returns the following fields:
>> fieldnames(data)
ans =
{
[1, 1] = testData
[2, 1] = trainData
[3, 1] = validData
[4, 1] = vocab
}
It takes some getting used to that everything in MATLAB is indexed like a matrix. Also, why does it say scalar struct? The tutorial on the MATLAB website shows how to create a struct:
patient(1).name = 'John Doe';
patient(1).billing = 127.00;
patient(1).test = [79, 75, 73; 180, 178, 177.5; 220, 210, 205];
patient
patient = scalar struct containing the fields:
name: 'John Doe'
billing: 127
test: [3×3 double]
Add a second struct to the array:
patient(2).name = 'Ann Lane';
patient(2).billing = 28.50;
patient(2).test = [68, 70, 68; 118, 118, 119; 172, 170, 169];
patient
patient = 1×2 struct array with fields:
name
billing
test
Now it has become a 1x2 struct array rather than a scalar. An interesting section of the docs is Cell vs. Struct Arrays: the difference is that a struct can be indexed by field name, while a cell array can only be indexed by position. See the example below:
temperature(1,:) = {'2009-12-31', [45, 49, 0]};
temperature(2,:) = {'2010-04-03', [54, 68, 21]};
temperature(3,:) = {'2010-06-20', [72, 85, 53]};
temperature(4,:) = {'2010-09-15', [63, 81, 56]};
temperature(5,:) = {'2010-12-09', [38, 54, 18]};
temperature
temperature = 5×2 cell array
'2009-12-31' [1×3 double]
'2010-04-03' [1×3 double]
'2010-06-20' [1×3 double]
'2010-09-15' [1×3 double]
'2010-12-09' [1×3 double]
As you can see, arrays are still created and addressed as (row, col), and indexing starts at 1. A cell array feels a bit like a Python tuple; you retrieve data by row and column. For example:
>> temperature(:, 1)
ans =
{
[1, 1] = 2009-12-31
[2, 1] = 2010-04-03
[3, 1] = 2010-06-20
[4, 1] = 2010-09-15
[5, 1] = 2010-12-09
}
>> temperature(1, :)
ans =
{
[1,1] = 2009-12-31
[1,2] = 45 49 0
}
By the way, at the command line clear removes variables you have defined, and whos or class shows a variable's size and type.
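For example, a quick session at the Octave prompt (the whos listing is left out here):
>> x = 1:5;          % define a variable
>> class(x)          % its type
ans = double
>> whos x            % name, size, bytes and class of x
>> clear x           % x is now gone from the workspace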
Code
function [train_input, train_target, valid_input, valid_target, test_input, test_target, vocab] = load_data(N)
% This method loads the training, validation and test set.
% It also divides the training set into mini-batches.
% Inputs:
% N: Mini-batch size.
% Outputs:
% train_input: An array of size D X N X M, where
% D: number of input dimensions (in this case, 3).
% N: size of each mini-batch (in this case, 100).
% M: number of minibatches.
% train_target: An array of size 1 X N X M.
% valid_input: An array of size D X number of points in the validation set.
% test: An array of size D X number of points in the test set.
% vocab: Vocabulary containing index to word mapping.
load data.mat;
numdims = size(data.trainData, 1);
D = numdims - 1;
M = floor(size(data.trainData, 2) / N);
train_input = reshape(data.trainData(1:D, 1:N * M), D, N, M);
train_target = reshape(data.trainData(D + 1, 1:N * M), 1, N, M);
valid_input = data.validData(1:D, :);
valid_target = data.validData(D + 1, :);
test_input = data.testData(1:D, :);
test_target = data.testData(D + 1, :);
vocab = data.vocab;
end
How the size function works:
sz = size(A) returns a row vector whose elements contain the length of the corresponding dimension of A. For example, if A is a 3-by-4 matrix, then size(A) returns the vector [3 4]. The length of sz is ndims(A).
If A is a table or timetable, then size(A) returns a two-element row vector consisting of the number of rows and the number of table variables.
szdim = size(A,dim) returns the length of dimension dim.
[m,n] = size(A) returns the number of rows and columns when A is a matrix.
[sz1,...,szN] = size(A) returns the length of each dimension of A separately.
So size returns a vector of dimension lengths, and with a second argument it returns the length of a single dimension.
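A quick check of these forms at the prompt (output spacing may differ slightly):
>> A = zeros(3, 4);
>> size(A)            % all dimension lengths
ans =
   3   4
>> size(A, 2)         % length of dimension 2 only
ans = 4
>> [m, n] = size(A)   % rows and columns separately
m = 3
n = 4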
(1:D, 1:N * M)
The first part selects rows 1 through D; the second selects the first N*M columns, i.e. enough columns for M mini-batches of N cases each.
data.trainData is a 4 x 372550 matrix. It is split into two parts: the first three rows go to train_input and the last row to train_target, and reshape then carves both into mini-batches (train_input becomes D x N x M, train_target becomes 1 x N x M). Here N = 100, D = 3, M = 3725. A toy version of this slicing appears after the transcript below.
>> data.trainData(1:4, 1:10)
ans =
28 184 183 117 223 42 242 223 74 42
26 44 32 247 190 74 32 32 32 192
90 249 76 201 249 26 223 158 221 91
144 117 122 186 6 32 32 144 32 68
>> train_input(1:3, 1:10)
ans =
28 184 183 117 223 42 242 223 74 42
26 44 32 247 190 74 32 32 32 192
90 249 76 201 249 26 223 158 221 91
>> train_target(1, 1:10)
ans =
144 117 122 186 6 32 32 144 32 68
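To see concretely what the (1:D, 1:N*M) slicing plus reshape does, here is a toy version with made-up numbers, using D = 3, N = 3, M = 2 instead of the real sizes:
toyData = [ 1  2  3  4  5  6;      % rows 1-3: the three input words of each 4-gram
            7  8  9 10 11 12;
           13 14 15 16 17 18;
           19 20 21 22 23 24];     % row 4: the target word of each 4-gram
D = 3; N = 3; M = 2;
toy_input  = reshape(toyData(1:D, 1:N*M), D, N, M);   % 3 x 3 x 2 array
toy_target = reshape(toyData(D+1, 1:N*M), 1, N, M);   % 1 x 3 x 2 array
toy_input(:, :, 2)     % second mini-batch = columns 4-6 of the first three rows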
validData and testData are both 4 x 46568, an order of magnitude smaller than the training set. Note that only the training data is split into mini-batches; the validation and test sets are not. Each input case is a column vector.
Training
Initialization and hyperparameters
% This function trains a neural network language model.
function [model] = train(epochs)
% Inputs:
% epochs: Number of epochs to run.
% Output:
% model: A struct containing the learned weights and biases and vocabulary.
% SET HYPERPARAMETERS HERE.
batchsize = 100; % Mini-batch size.
learning_rate = 0.1; % Learning rate; default = 0.1.
momentum = 0.9; % Momentum; default = 0.9.
numhid1 = 50; % Dimensionality of embedding space; default = 50.
numhid2 = 200; % Number of units in hidden layer; default = 200.
init_wt = 0.01; % Standard deviation of the normal distribution
% which is sampled to get the initial weights; default = 0.01
epochs is the number of passes over the training set; momentum means gradient descent with momentum is used (a toy sketch of the momentum update follows).
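As a reminder of what momentum does, here is a self-contained toy, not the assignment's train.m: it just minimises f(w) = 0.5 * w' * w, and the update keeps a decaying running sum of past gradients.
learning_rate = 0.1;            % same default as above
momentum = 0.9;                 % same default as above
w = [5; -3];                    % toy parameter vector
delta = zeros(size(w));         % running momentum term
for step = 1:100
  grad  = w;                            % gradient of 0.5 * w' * w is w itself
  delta = momentum * delta + grad;      % accumulate a decaying gradient history
  w     = w - learning_rate * delta;    % parameter update
end
w                               % converges towards [0; 0]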
% VARIABLES FOR TRACKING TRAINING PROGRESS.
show_training_CE_after = 100;
show_validation_CE_after = 1000;
CE stands for cross entropy. The average training cross entropy is reported every 100 mini-batches, and every 1000 mini-batches the model is run on the validation set to compute a validation cross entropy.
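Concretely, the quantity being averaged is -(1/N) times the sum of log(predicted probability of the true next word) over the mini-batch. A small sketch with made-up numbers (vocab_size = 4, a batch of 3 cases, predictions stored one column per case as in fprop's output):
probs = [0.7 0.1 0.2;           % each column: a predicted distribution for one case
         0.1 0.6 0.3;
         0.1 0.2 0.4;
         0.1 0.1 0.1];
targets = [1 2 3];              % true word index for each case
tiny = exp(-30);                % avoids log(0), same trick as the code below
N = numel(targets);
p_true = probs(sub2ind(size(probs), targets, 1:N));  % probability given to the true word
CE = -sum(log(p_true + tiny)) / N                    % about 0.59 here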
% LOAD DATA.
[train_input, train_target, valid_input, valid_target, ...
test_input, test_target, vocab] = load_data(batchsize);
[numwords, batchsize, numbatches] = size(train_input);
vocab_size = size(vocab, 2);
From the first section we already know numwords = 3, batchsize = 100, numbatches = 3725. In addition, vocab is laid out as a row with 250 entries, so vocab_size = 250.
% INITIALIZE WEIGHTS AND BIASES.
word_embedding_weights = init_wt * randn(vocab_size, numhid1);
embed_to_hid_weights = init_wt * randn(numwords * numhid1, numhid2);
hid_to_output_weights = init_wt * randn(numhid2, vocab_size);
hid_bias = zeros(numhid2, 1);
output_bias = zeros(vocab_size, 1);
word_embedding_weights_delta = zeros(vocab_size, numhid1);
word_embedding_weights_gradient = zeros(vocab_size, numhid1);
embed_to_hid_weights_delta = zeros(numwords * numhid1, numhid2);
hid_to_output_weights_delta = zeros(numhid2, vocab_size);
hid_bias_delta = zeros(numhid2, 1);
output_bias_delta = zeros(vocab_size, 1);
expansion_matrix = eye(vocab_size);
count = 0;
tiny = exp(-30);
randn creates normally distributed random numbers; its arguments are the numbers of rows and columns (related forms include randn() and randn(n)). zeros takes the same kind of arguments as randn, and eye is similar, producing an identity matrix. The network diagram:
(Network diagram image D:\dl\network.png is not available here; the weight shapes it shows are listed below.)
- word_embedding_weights = [250, 50]
- embed_to_hid_weights = [3 * 50, 200]
- hid_to_output_weight = [200, 250]
- hid_bias = [200, 1]
- output_bias = [250, 1]
In the diagram the inputs are the indices of three words and the output is the predicted index of the fourth word. The embedding layer has 50 units (per word) by default and the hidden layer has 200. So going from the input to the embedding layer just means looking up the corresponding index in word_embedding_weights.
Q: What does "50 units" mean here? Aren't there only three inputs?
A: The diagram is drawn very compactly and does not show individual units. In fact the embedding layer has 50 units per input word and is fully connected to the input; the hidden layer relates to the embedding layer in the same way. So every input shows up in every weight matrix.
Q: How is each word represented?
A: Each word is just an index, and each word is connected to 50 units, i.e. its own embedding row (see the lookup sketch below).
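A tiny sketch of that lookup (the index is illustrative, and the weights are freshly initialised in the same init_wt * randn form as above): a word's learned representation is simply its row of word_embedding_weights.
vocab_size = 250; numhid1 = 50;
word_embedding_weights = 0.01 * randn(vocab_size, numhid1);
word_index = 42;                                      % hypothetical word
word_vector = word_embedding_weights(word_index, :);  % that word's 1 x 50 embedding
size(word_vector)                                     % -> 1   50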
Train
Forward propagation
% TRAIN.
for epoch = 1:epochs
fprintf(1, 'Epoch %d\n', epoch);
this_chunk_CE = 0;
trainset_CE = 0;
% LOOP OVER MINI-BATCHES.
for m = 1:numbatches
input_batch = train_input(:, :, m);
target_batch = train_target(:, :, m);
% FORWARD PROPAGATE.
% Compute the state of each layer in the network given the input batch
% and all weights and biases
[embedding_layer_state, hidden_layer_state, output_layer_state] = ...
fprop(input_batch, ...
word_embedding_weights, embed_to_hid_weights, ...
hid_to_output_weights, hid_bias, output_bias);
input_batch is [3, 100] and target_batch is [1, 100]. In the fprop code, the state of the word embedding layer is computed first:
[numwords, batchsize] = size(input_batch);
[vocab_size, numhid1] = size(word_embedding_weights);
numhid2 = size(embed_to_hid_weights, 2);
%% COMPUTE STATE OF WORD EMBEDDING LAYER.
% Look up the inputs word indices in the word_embedding_weights matrix.
embedding_layer_state = reshape(...
word_embedding_weights(reshape(input_batch, 1, []),:)',...
numhid1 * numwords, []);
Conceptually this is still a matrix multiplication, but indexing cuts the computation down [1]. It is equivalent to:
inputs = expansion_matrix(:, reshape(input_batch, 1, []))';  % expansion_matrix = eye(vocab_size); one one-hot row per input word, size (numwords * batchsize) x vocab_size
inputs * word_embedding_weights                               % each one-hot row picks out the matching row of the weight matrix
The lookup itself produces a [300, 50] matrix; after the transpose and reshape, embedding_layer_state is [150, 100] (numhid1 * numwords rows, one column per training case).
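A quick check of that equivalence with toy sizes: multiplying one-hot rows by the weight matrix gives exactly the same result as indexing the rows directly.
vocab_size = 5; numhid1 = 2;            % toy sizes
W = randn(vocab_size, numhid1);
idx = [3 1 4];                          % three word indices
E = eye(vocab_size);
onehot = E(idx, :);                     % one one-hot row per word
isequal(onehot * W, W(idx, :))          % -> 1: identical, but indexing skips the big multiply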
Q: What does the [] in reshape(input_batch, 1, []) mean?
A: Without it the call errors out; as the API docs explain, [] tells MATLAB to work that dimension out by itself. So this line flattens the 3 x 100 input_batch into a 1 x 300 row vector.
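A quick check at the prompt; the order of the elements also already shows the column-major layout mentioned below:
>> M = [1 2 3; 4 5 6];
>> reshape(M, 1, [])     % [] lets MATLAB infer the 6
ans =
   1   4   2   5   3   6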
Q: How are the values in input_batch used to look up the weight matrix?
A: word_embedding_weights is a [250, 50] matrix. Indexing it with the 1 x 300 row vector is plain MATLAB vector indexing (the "Accessing Multiple Elements" behaviour linked below): each value is used as a row index, and the result expands to a [300, 50] matrix. Incidentally, MATLAB stores matrices column by column in memory (column-major), so column-wise access is the most efficient.
Q: That still doesn't spell the process out. Can you give an example?
A: Here is one; the values in input simply choose which rows are taken.
>> weights = magic(3)
weights =
8 1 6
3 5 7
4 9 2
>> input = [3, 2, 1, 3]' % values must not exceed 3, otherwise indexing errors out
>> weights(input, :)
ans =
4 9 2
3 5 7
8 1 6
4 9 2
Q: Why does the example use a column vector as input rather than a row vector?
A: I tried both and the result is the same; perhaps a column vector is marginally more efficient (memory layout)? Retested as follows:
>> input = [3, 2]'
>> weights(input, :)
ans =
4 9 2
3 5 7
>> input = [3, 2];
>> weights(input, :)
ans =
4 9 2
3 5 7
Exactly how the shapes change is covered in Accessing Multiple Elements in the MATLAB docs.
[1] 詞向量與Embedding究竟是怎么回事? http://spaces.ac.cn/archives/4122/
[2] YJango的Word Embedding--介紹 https://zhuanlan.zhihu.com/p/27830489