Let's combine deeplearnjs with an earlier article on neural networks,《小白成長記之神經(jīng)網(wǎng)絡(luò)》, look at how each applies in the simplest possible setting, and use that to dig up some further threads for learning.
Given some samples X, we want to fit a quadratic equation y = ax^2 + bx + c. This is really the process of fitting the true curve from the sample data: the coefficients of the equation are adjusted over and over until we settle on an optimal set a, b, c.
Getting started on that looks a little awkward. But if we rewrite the equation as a linear equation in two variables, y = a*x1 + b*x2 + c (with x1 = x^2 and x2 = x), the problem turns into a simple linear regression problem.
In neural network terms, this is a single-layer network with exactly one node (whose bias is c) and a linear activation function. Given the two inputs x1, x2 and their weights a, b, the node predicts the value y.
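To make the rewrite concrete, here is a tiny sketch (plain JS, the helper name is made up): each scalar sample x is expanded into the feature pair (x1, x2) = (x^2, x), after which the model is linear in its inputs.
// Hypothetical helper: expand one scalar sample into the two linear features.
const toFeatures = (x) => ({ x1: x * x, x2: x });
console.log(toFeatures(3)); // { x1: 9, x2: 3 }, so y = a*9 + b*3 + c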
Now let's see how this is programmed with the deeplearnjs framework; the code below in fact comes from its tutorial.
import * as dl from 'deeplearn';
/**
 * We want to learn the coefficients that give correct solutions to the
 * following quadratic equation:
 * y = a * x^2 + b * x + c
 * In other words we want to learn values for:
 * a
 * b
 * c
 * Such that this function produces 'desired outputs' for y when provided
 * with x. We will provide some examples of xs and ys to allow this model
 * to learn what we mean by desired outputs and then use it to produce new
 * values of y that fit the curve implied by our example.
 */
// Step 1. Set up variables, these are the things we want the model
// to learn in order to do prediction accurately. We will initialize
// them with random values.
const a = dl.variable(dl.scalar(Math.random()));
const b = dl.variable(dl.scalar(Math.random()));
const c = dl.variable(dl.scalar(Math.random()));
// Note: methods like dl.scalar, dl.mul and dl.square all create (or return) a Tensor.
// dl.variable creates a Variable; under the hood the Variable class extends Tensor.
// Step 2. Create an optimizer, we will use this later
const learningRate = 0.01;
const optimizer = dl.train.sgd(learningRate);
// Step 3. Write our training process functions.
/*
 * This function represents our 'model'. Given an input 'x' it will try and predict
 * the appropriate output 'y'.
 *
 * This could be as complicated a 'neural net' as we would like, but we can just
 * directly model the quadratic equation we are trying to model.
 *
 * It is also sometimes referred to as the 'forward' step of our training process.
 * Though we will use the same function for predictions later.
 *
 * @return number predicted y value
 */
function predict(input) { // y = a * x ^ 2 + b * x + c
  return dl.tidy(() => {
    const x = dl.scalar(input);
    const ax2 = a.mul(x.square());
    const bx = b.mul(x);
    const y = ax2.add(bx).add(c);
    return y;
  });
}
/*
 * This will tell us how good the 'prediction' is given what we actually expected.
 *
 * prediction is a tensor with our predicted y value.
 * actual number is a number with the y value the model should have predicted.
 */
function loss(prediction, actual) {
  // Having a good error metric is key for training a machine learning model
  const error = dl.scalar(actual).sub(prediction).square();
  return error;
}
/*
 * This will iteratively train our model. We test how well it is doing
 * after numIterations by calculating the mean error over all the given
 * samples after our training.
 *
 * xs - training data x values
 * ys - training data y values
 */
async function train(xs, ys, numIterations, done) {
  for (let iter = 0; iter < numIterations; iter++) {
    for (let i = 0; i < xs.length; i++) {
      // Minimize is where the magic happens: we must return a
      // numerical estimate (i.e. loss) of how well we are doing using the
      // current state of the variables we created at the start.
      // This optimizer does the 'backward' step of our training process,
      // updating variables defined previously in order to minimize the
      // loss.
      optimizer.minimize(() => {
        // Feed the examples into the model
        const pred = predict(xs[i]);
        const predLoss = loss(pred, ys[i]);
        return predLoss;
      });
    }
    // Use dl.nextFrame to not block the browser.
    await dl.nextFrame();
  }
  done();
}
/*
 * This function compares the expected results with the results predicted by
 * our model.
 */
function test(xs, ys) {
  dl.tidy(() => {
    const predictedYs = xs.map(predict);
    console.log('Expected', ys);
    console.log('Got', predictedYs.map((p) => p.dataSync()[0]));
  });
}
// Training samples
const data = {
  xs: [0, 1, 2, 3],
  ys: [1.1, 5.9, 16.8, 33.9]
};
// Let's see how it does before training.
console.log('Before training: using random coefficients');
test(data.xs, data.ys);
train(data.xs, data.ys, 50, () => {
  console.log(
    `After training: a=${a.dataSync()}, b=${b.dataSync()}, c=${c.dataSync()}`);
  test(data.xs, data.ys);
  console.log('Start to predict the output of 4 through the trained model:', predict(4).dataSync()[0]);
});
// Huzzah we have trained a simple machine learning model!
Let's walk through a few of the important steps in this code.
Variable initialization
We do not particularly care what the variables start at, so each one is initialized with a random scalar. Once the variables are initialized, the whole model is initialized; this initialization is the starting point of the iterations and cannot be skipped. And because it is random, the model at this point is still very far from the "true" model.
You can print the initial coefficients and compare them against the final coefficients after training:
a.print(); b.print(); c.print();
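Nothing here depends on the start being random, by the way: since this loss is a convex function of a, b and c, a fixed starting point works too. A minimal variation on the initialization above (a sketch, not part of the tutorial):
// Deterministic start (replacing the random initialization above); the optimizer
// will still walk the coefficients toward the fit, it may just need a different
// number of iterations to get there.
const a = dl.variable(dl.scalar(0));
const b = dl.variable(dl.scalar(0));
const c = dl.variable(dl.scalar(0));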
Prediction: predict
An input passes through the model and produces an output, and that output is a prediction. All data, whether it comes from the training set, the test set or the validation set, goes through the same model.
The predict function in the code is really a call into this implicit model. The model is nudged slightly every time it is trained on a sample, and all of those changes live in CPU or GPU memory. That is also why the coefficients a, b and c can be declared as const bindings even though the values behind them keep changing.
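A quick way to see this for yourself, using only calls that already appear in the listing (dataSync and train): read the value stored in a before training and again once training finishes.
// The const binding always points at the same Variable object; the number it
// stores lives in CPU/GPU memory and is overwritten by the optimizer.
const aBefore = a.dataSync()[0];
train(data.xs, data.ys, 50, () => {
  console.log('same binding, new stored value:', aBefore, '->', a.dataSync()[0]);
});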
Once the model is trained, we can feed it new inputs and predict new outputs.
console.log('Start to predict the output of 4 through the trained model:', predict(4).dataSync()[0])
Training: train
The training process itself is simple: the model is already initialized, so training just hands the sample data to the machine and lets it iterate. Every iteration produces a loss value, and in this code the loss is itself a tensor:
const error = dl.scalar(actual).sub(prediction).square();
Working backwards from the loss, the weights are adjusted again and again. That work is handed to the optimizer; for now we do not need to dig into how the optimizer pulls it off.
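To take some of the magic out of that backward step, here is what one stochastic gradient update on a single sample boils down to for y = a*x^2 + b*x + c, written out by hand in plain JS as a standalone sketch (deeplearn does this differentiation automatically; the derivatives below are just the hand-derived ones for the squared error).
// One hand-rolled SGD step for loss L = (y - (a*x^2 + b*x + c))^2, with err = y - prediction:
// dL/da = -2*err*x^2, dL/db = -2*err*x, dL/dc = -2*err.
let coefA = Math.random(), coefB = Math.random(), coefC = Math.random();
const lr = 0.01;

function sgdStep(x, y) {
  const err = y - (coefA * x * x + coefB * x + coefC); // same quantity loss() squares above
  coefA -= lr * (-2 * err * x * x); // step each coefficient against its gradient
  coefB -= lr * (-2 * err * x);
  coefC -= lr * (-2 * err);
}

sgdStep(2, 16.8); // one sample, one update; optimizer.minimize does this with autodiff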
Gradient Descent
Four samples, run through 50 training iterations, end up producing a fairly satisfying result (a small loss). Why is that? What role does gradient descent play?
First, gradient descent commonly comes in the variants listed below, and they are easy to derive from one another; the sketch after the list contrasts how they group samples per update. The example above uses stochastic gradient descent, and the rest of this article walks through the most basic variant, batch gradient descent.
- Batch gradient descent
- Stochastic gradient descent (dl.train.sgd)
- Mini-batch gradient descent
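Purely to pin down the terminology, the sketch below (plain JS, with updateOnce standing in for a real gradient step) shows that the three variants differ only in how many samples feed each parameter update.
const samples = [0, 1, 2, 3];
const updateOnce = (batch) => console.log('one update using samples:', batch);

// Batch GD: one update per pass over the whole training set.
updateOnce(samples);
// Stochastic GD (what the deeplearn code above does: one minimize() call per sample).
samples.forEach((s) => updateOnce([s]));
// Mini-batch GD: one update per small chunk, e.g. chunks of two samples.
for (let i = 0; i < samples.length; i += 2) updateOnce(samples.slice(i, i + 2));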
The batch gradient descent algorithm
Take a single node from a neural network model; its topology is simply multiple inputs feeding one output.
Take any sample x and feed it into the network; the output is a value h_w(x). The sample x carries several feature values x1, x2, ..., xj, and each feature has a weight attached to it, so the computation can be written precisely as a weighted sum (a bias term can be folded in as the weight on a constant input of 1):

h_w(x) = w1*x1 + w2*x2 + ... + wj*xj = Σ_j wj*xj
Assume the node also has a known true value y for that sample; the loss at this node (the usual squared error, halved for convenience) is then:

E = (1/2) * (y - h_w(x))^2
In practice a problem usually comes with a great many samples, say m of them, and every sample x(i) goes through the same computation and contributes its own loss. The total loss over all m samples is:

E(w) = (1/2) * Σ_{i=1..m} (y(i) - h_w(x(i)))^2
Because this is the total loss over the entire training set, it describes how well the model fits the data as a whole. Finding a minimum of it means finding the minimum loss, and for a minimization problem like this we can turn to derivatives.
Treat the weight vector w as the parameter. It likewise has several components w1, w2, ..., wj, and differentiating with respect to w means taking the partial derivative with respect to each component:

∂E/∂wj = -Σ_{i=1..m} (y(i) - h_w(x(i))) * xj(i)
To sit at a minimum, the derivative should tend to 0, but that does not happen in a single step; the weights have to be adjusted over repeated iterations. So how should w change on each iteration to help get there? It turns out the following update, with a learning rate η, is all that is needed (each weight steps against its own gradient):

wj ← wj - η * ∂E/∂wj = wj + η * Σ_{i=1..m} (y(i) - h_w(x(i))) * xj(i)
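To make the whole loop concrete, here is a small self-contained batch gradient descent sketch for such a node in plain JS; the two-feature samples, learning rate and iteration count are made up for illustration (the ys follow y = 2*x1 + 3*x2, so w should end up near [2, 3]).
// Batch gradient descent for h_w(x) = sum_j wj*xj with E = 1/2 * sum_i (y(i) - h_w(x(i)))^2.
const xsTrain = [[1, 0], [0, 1], [1, 1], [2, 1]];
const ysTrain = [2, 3, 5, 7]; // generated from y = 2*x1 + 3*x2
let w = [0, 0];
const eta = 0.05; // learning rate

const hw = (x) => x.reduce((sum, xj, j) => sum + w[j] * xj, 0);

for (let iter = 0; iter < 200; iter++) {
  // dE/dwj = -sum_i (y(i) - h_w(x(i))) * xj(i), accumulated over ALL samples
  const grad = w.map(() => 0);
  xsTrain.forEach((x, i) => {
    const err = ysTrain[i] - hw(x);
    x.forEach((xj, j) => { grad[j] += -err * xj; });
  });
  // one update per full pass: wj <- wj - eta * dE/dwj
  w = w.map((wj, j) => wj - eta * grad[j]);
}
console.log(w); // close to [2, 3]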
Postscript
Even a simple linear-regression example like this one turns up plenty of threads worth optimizing, branching out from, and digging deeper into. That is exactly where the fun of deep learning lies~