【Prerequisites】: UFLDL Tutorial - Backpropagation Algorithm
Rather than starting with the math, let's first get a picture of the network; if the diagram alone isn't clear, come back to the derivation below.
"BP" Math Principle
======================================================================
Example: consider the simple three-layer neural network model below, with one input layer, one hidden layer, and one output layer.
Note: the inputs are x1 and x2 (i1 and i2 in the figure), the target outputs are y1 and y2, and the logistic activation f is the sigmoid function:
![][40]
[40]:http://latex.codecogs.com/png.latex?y%20=%20f(x)=sigmoid(x)%20=\frac{1}{1%20+%20e^{-x}}
It is easy to verify that:
![][00]
[00]:http://latex.codecogs.com/png.latex?f%27(x)%20=%20f(x)%20*%20(1%20-%20f(x))
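These two facts are all the calculus we need. As a running companion to the derivation, here is a minimal Python sketch of the sigmoid and its derivative (the names `sigmoid` and `sigmoid_prime` are our own, not from the original):

```python
import math

def sigmoid(x):
    """f(x) = 1 / (1 + e^(-x))"""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """f'(x) = f(x) * (1 - f(x))"""
    fx = sigmoid(x)
    return fx * (1.0 - fx)
```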
Now let's begin the analysis proper.
======================================================================
Forward Propagation
First, consider neuron h1:
![][01]
[01]:http://latex.codecogs.com/png.latex?input_{(h1)}%20=%20w1%20%20x1%20+%20w2%20%20x2%20+%20b1
![][02]
[02]:http://latex.codecogs.com/png.latex?output_{(h1)}%20=%20f(input_{(h1)})%20=%20\frac{1}{1%20+%20e^{-(w1x1+w2x2+b1)}}
Similarly, for neuron h2:
![][03]
[03]:http://latex.codecogs.com/png.latex?input_{(h2)}%20=%20w3%20%20x1%20+%20w4%20%20x2%20+%20b1
![][04]
[04]:http://latex.codecogs.com/png.latex?output_{(h2)}%20=%20f(input_{(h2)})%20=%20\frac{1}{1%20+%20e^{-(w3x1+w4x2+b1)}}
對輸出層神經(jīng)元重復這個過程允华,使用隱藏層神經(jīng)元的輸出作為輸入。這樣就能給出o1,o2的輸入輸出:
![][05]
[05]:http://latex.codecogs.com/png.latex?input_{(o1)}%20=%20w5%20%20output_{(h1)}%20+%20w6%20%20output_{(h2)}%20+%20b2
![][06]
[06]:http://latex.codecogs.com/png.latex?output_{(o1)}%20=%20f(input_{(o1)})
![][07]
[07]:http://latex.codecogs.com/png.latex?input_{(o2)}%20=%20w7%20%20output_{(h1)}%20+%20w8%20%20output_{(h2)}%20+%20b2
![][08]
[08]:http://latex.codecogs.com/png.latex?output_{(o2)}%20=%20f(input_{(o2)})
Now collect all the errors, as follows:
![][09]
[09]:http://latex.codecogs.com/png.latex?J_{total}%20=%20\sum%20\frac{1}{2}(output%20-%20target)^2%20=%20J_{o1}+J_{o2}
![][10]
[10]:http://latex.codecogs.com/png.latex?J_{o1}%20=%20\frac{1}{2}(output(o1)-y1)^2
![][11]
[11]:http://latex.codecogs.com/png.latex?J_{o2}%20=%20\frac{1}{2}(output(o2)-y2)^2
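In code, the entire forward pass and the cost fit in a few lines. A sketch continuing the Python above (`w` is a list holding w1..w8 at indices 1..8, with `w[0]` an unused placeholder; this indexing is our convention, not from the original):

```python
def forward(x1, x2, w, b1, b2):
    # Hidden layer: weighted sum of the inputs, then sigmoid
    out_h1 = sigmoid(w[1] * x1 + w[2] * x2 + b1)
    out_h2 = sigmoid(w[3] * x1 + w[4] * x2 + b1)
    # Output layer: the hidden outputs serve as inputs
    out_o1 = sigmoid(w[5] * out_h1 + w[6] * out_h2 + b2)
    out_o2 = sigmoid(w[7] * out_h1 + w[8] * out_h2 + b2)
    return out_h1, out_h2, out_o1, out_o2

def total_error(out_o1, out_o2, y1, y2):
    # J_total = J_o1 + J_o2 = 1/2 (output - target)^2 summed over outputs
    return 0.5 * (out_o1 - y1) ** 2 + 0.5 * (out_o2 - y2) ** 2
```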
======================================================================
Backpropagation
【Output Layer】
For w5, we want to know how changing it affects the total error, so we take the partial derivative of Jtotal with respect to w5:
![][12]
[12]:http://latex.codecogs.com/png.latex?\frac{\partial%20J_{total}}{\partial%20w5}=\frac{\partial%20J_{total}}{\partial%20output_{(o1)}}\frac{\partial%20output_{(o1)}}{\partial%20input_{(o1)}}\frac{\partial%20input_{(o1)}}{\partial%20w5}
Evaluate each factor in turn:
![][13]
[13]:http://latex.codecogs.com/png.latex?\frac{\partial%20J_{total}}{\partial%20output_{(o1)}}=\frac{\partial%20J_{o1}}{\partial%20output_{(o1)}}=output_{(o1)}-y_1
![][14]
[14]:http://latex.codecogs.com/png.latex?\frac{\partial%20output_{(o1)}}{\partial%20input_{(o1)}}%20=%20f%27(input_{(o1)})=output_{(o1)}*(1%20-%20output_{(o1)})
![][15]
[15]:http://latex.codecogs.com/png.latex?\frac{\partial%20input_{(o1)}}{\partial%20w5}=\frac{\partial%20(w5%20%20output_{(h1)}%20+%20w6%20%20output_{(h2)}%20+%20b2)}{\partial%20w5}=output_{(h1)}
于是有Jtotal對w5的偏導數(shù):
![][16]
[16]:http://latex.codecogs.com/png.latex?\frac{\partial%20J_{total}}{\partial%20w5}=(output_{(o1)}-y1)[output_{(o1)}(1%20-%20output_{(o1)})]*output_{(h1)}
Using it, the updated weight w5 is:
![][17]
[17]:http://latex.codecogs.com/png.latex?w5^+%20=%20w5%20-%20\eta*\frac{\partial%20J_{total}}{\partial%20w5}
Similarly, we can update w6, w7, and w8.
Note that the actual updates are applied to the network only after the new weights leading into the hidden-layer neurons have been computed as well; that is, when we continue the backpropagation below, we use the original weights, not the updated ones.
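Continuing the sketch, the output-layer gradients for w5..w8 and the update rule might look like this (the `delta` shorthand and `eta` for the learning rate are common conventions, not notation from the original):

```python
def output_grads(out_h1, out_h2, out_o1, out_o2, y1, y2):
    # delta_o = (output - target) * output * (1 - output), i.e. dJ/d input_o
    delta_o1 = (out_o1 - y1) * out_o1 * (1.0 - out_o1)
    delta_o2 = (out_o2 - y2) * out_o2 * (1.0 - out_o2)
    # dJ_total/dw = delta_o * (the hidden output feeding that weight)
    return {5: delta_o1 * out_h1, 6: delta_o1 * out_h2,
            7: delta_o2 * out_h1, 8: delta_o2 * out_h2}

def update(w, grads, eta=0.5):
    # w+ = w - eta * dJ/dw, applied only to the indices present in grads
    return [wi - eta * grads.get(i, 0.0) for i, wi in enumerate(w)]
```

Calling `update(w, output_grads(...))` applies w+ = w - eta * dJ/dw to each output-layer weight while leaving w1..w4 untouched.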
【Hidden Layer】
For w1, we likewise want to know how changing it affects the total error, so we take the partial derivative of Jtotal with respect to w1:
![][18]
[18]:http://latex.codecogs.com/png.latex?\frac{\partial%20J_{total}}{\partial%20w1}=\frac{\partial%20J_{total}}{\partial%20output_{(h1)}}\frac{\partial%20output_{(h1)}}{\partial%20input_{(h1)}}\frac{\partial%20input_{(h1)}}{\partial%20w1}
Evaluate each factor in turn:
![][19]
[19]:http://latex.codecogs.com/png.latex?\frac{\partial%20J_{total}}{\partial%20output_{(h1)}}=\frac{\partial%20J_{o1}}{\partial%20output_{(h1)}}+\frac{\partial%20J_{o2}}{\partial%20output_{(h1)}}
![][20]
[20]:http://latex.codecogs.com/png.latex?\frac{\partial%20J_{o1}}{\partial%20output_{(h1)}}=\frac{\partial%20J_{o1}}{\partial%20output_{(o1)}}\frac{\partial%20output_{(o1)}}{\partial%20input_{(o1)}}\frac{\partial%20input_{(o1)}}{\partial%20output_{(h1)}}
![][21]
[21]:http://latex.codecogs.com/png.latex?=(output_{(o1)}-y1)[output_{(o1)}(1%20-%20output_{(o1)})]*w5
![][22]
[22]:http://latex.codecogs.com/png.latex?\frac{\partial%20J_{o2}}{\partial%20output_{(h1)}}=\frac{\partial%20J_{o2}}{\partial%20output_{(o2)}}\frac{\partial%20output_{(o2)}}{\partial%20input_{(o2)}}\frac{\partial%20input_{(o2)}}{\partial%20output_{(h1)}}
![][23]
[23]:http://latex.codecogs.com/png.latex?=(output_{(o2)}-y2)[output_{(o2)}(1%20-%20output_{(o2)})]*w7
![][24]
[24]:http://latex.codecogs.com/png.latex?\frac{\partial%20output_{(h1)}}{\partial%20input_{(h1)}}%20=%20f%27(input_{(h1)})=output_{(h1)}*(1%20-%20output_{(h1)})
![][25]
[25]:http://latex.codecogs.com/png.latex?\frac{\partial%20input_{(h1)}}{\partial%20w1}=\frac{\partial%20(w1%20%20x1%20+%20w2%20%20x2%20+%20b1)}{\partial%20w1}=x1
于是有Jtotal對w1的偏導數(shù):
![][26]
[26]:http://latex.codecogs.com/png.latex?\frac{\partial%20J_{total}}{\partial%20w1}=\{(output_{(o1)}-y1)[output_{(o1)}(1-output_{(o1)})]w5+(output_{(o2)}-y2)[output_{(o2)}(1-output_{(o2)})]w7\}[output_{(h1)}(1-output_{(h1)})]x1
Using it, the updated weight w1 is:
![][29]
[29]:http://latex.codecogs.com/png.latex?w1^+%20=%20w1%20-%20\eta*\frac{\partial%20J_{total}}{\partial%20w1}
Similarly, we can update w2, w3, and w4.
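Putting the hidden-layer chain rule into the same sketch (the `delta` shorthand and the `w` list indexing follow the earlier blocks; w5..w8 here are the original weights, per the remark in the output-layer section):

```python
def hidden_grads(x1, x2, w, out_h1, out_h2, out_o1, out_o2, y1, y2):
    delta_o1 = (out_o1 - y1) * out_o1 * (1.0 - out_o1)
    delta_o2 = (out_o2 - y2) * out_o2 * (1.0 - out_o2)
    # dJ_total/d out_h: each output neuron contributes delta_o * connecting weight
    d_out_h1 = delta_o1 * w[5] + delta_o2 * w[7]
    d_out_h2 = delta_o1 * w[6] + delta_o2 * w[8]
    # chain with f'(input_h), then with the input x feeding each weight
    delta_h1 = d_out_h1 * out_h1 * (1.0 - out_h1)
    delta_h2 = d_out_h2 * out_h2 * (1.0 - out_h2)
    return {1: delta_h1 * x1, 2: delta_h1 * x2,
            3: delta_h2 * x1, 4: delta_h2 * x2}
```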
======================================================================
Worked Example
Suppose the weights and biases of the simple three-layer network above are initialized with the values used throughout this section: w1 = 0.15, w2 = 0.20, w3 = 0.25, w4 = 0.30, w5 = 0.40, w6 = 0.45, w7 = 0.50, w8 = 0.55, b1 = 0.35, b2 = 0.60, with inputs x1 = 0.05, x2 = 0.10 and targets y1 = 0.01, y2 = 0.99.
Applying the formulas derived above: from
![][01]
we get:
input(h1) = 0.15 * 0.05 + 0.20 * 0.10 + 0.35 = 0.3775
output(h1) = f(input(h1)) = 1 / (1 + e^(-input(h1))) = 1 / (1 + e^-0.3775) = 0.593269992
Similarly:
input(h2) = 0.25 * 0.05 + 0.30 * 0.10 + 0.35 = 0.3925
output(h2) = f(input(h2)) = 1 / (1 + e^(-input(h2))) = 1 / (1 + e^-0.3925) = 0.596884378
對輸出層神經(jīng)元重復這個過程麻裁,使用隱藏層神經(jīng)元的輸出作為輸入箍镜。這樣就能給出o1的輸出:
input(o1) = w5 * output(h1) + w6 * (output(h2)) + b2 = 0.40 * 0.593269992 + 0.45 * 0.596884378 + 0.60 = 1.105905967
output(o1) = f(input(o1)) = 1 / (1 + e^-1.105905967) = 0.75136507
Similarly, output(o2) = 0.772928465.
Now compute the errors and evaluate the cost function:
Jo1 = 1/2 * (0.75136507 - 0.01)^2 = 0.274811083
Jo2 = 1/2 * (0.772928465 - 0.99)^2 = 0.023560026
Summing up, the total error is: Jtotal = Jo1 + Jo2 = 0.298371109
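Plugging these numbers into the sketches above reproduces the same values (up to floating-point rounding):

```python
w = [0.0, 0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]  # w[0] unused
out_h1, out_h2, out_o1, out_o2 = forward(0.05, 0.10, w, b1=0.35, b2=0.60)
print(out_o1, out_o2)                           # ≈ 0.75136507, 0.772928465
print(total_error(out_o1, out_o2, 0.01, 0.99))  # ≈ 0.298371109
```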
然后反向傳播,根據(jù)公式
![][16]
the partial derivative of Jtotal with respect to w5 is:
a = (0.75136507 - 0.01)*0.75136507*(1-0.75136507)*0.593269992 = 0.082167041
To reduce the error, we subtract this value, scaled by a learning rate eta (here set to 0.5), from the current weight:
w5+ = w5 - eta * a = 0.40 - 0.5 * 0.082167041 = 0.35891648
Similarly, we find:
w6+ = 0.408666186
w7+ = 0.511301270
w8+ = 0.561370121
對于隱藏層图张,更新w1锋拖,求Jtotal對w1的偏導數(shù):
![][26]
The partial derivative is:
b = (tmp1 + tmp2) * tmp3
tmp1 = (0.75136507 - 0.01) * [0.75136507 * (1 - 0.75136507)] * 0.40 = 0.74136507 * 0.186815602 * 0.40 = 0.055399425
tmp2 = (0.772928465 - 0.99) * [0.772928465 * (1 - 0.772928465)] * 0.50 = -0.019049119
tmp3 = 0.593269992 * (1 - 0.593269992) * 0.05 = 0.012065035
于是b = 0.000438568
Updating weight w1:
w1+ = w1 - eta * b = 0.15 - 0.5 * 0.000438568 = 0.149780716
Similarly, we obtain:
w2+ = 0.19956143
w3+ = 0.24975114
w4+ = 0.29950229
最后,更新了所有的權(quán)重祸轮! 當最初前饋傳播時輸入為0.05和0.1兽埃,網(wǎng)絡(luò)上的誤差是0.298371109。 在第一輪反向傳播之后适袜,總誤差現(xiàn)在下降到0.291027924柄错。 它可能看起來不太多,但是在重復此過程10,000次之后。例如售貌,錯誤傾斜到0.000035085给猾。
在這一點上,當前饋輸入為0.05和0.1時颂跨,兩個輸出神經(jīng)元產(chǎn)生0.015912196(相對于目標為0.01)和0.984065734(相對于目標為0.99)敢伸,已經(jīng)很接近了O(∩_∩)O~~
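For completeness, here is a self-contained sketch that strings all the pieces into a training loop on this single example; with the initialization above it should reproduce the error trajectory described (the last digits may differ by rounding):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initialization from the worked example (biases are kept fixed, as above)
w = [0.0, 0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]  # w[0] unused
b1, b2 = 0.35, 0.60
x1, x2 = 0.05, 0.10
y1, y2 = 0.01, 0.99
eta = 0.5

for step in range(10000):
    # forward pass
    out_h1 = sigmoid(w[1] * x1 + w[2] * x2 + b1)
    out_h2 = sigmoid(w[3] * x1 + w[4] * x2 + b1)
    out_o1 = sigmoid(w[5] * out_h1 + w[6] * out_h2 + b2)
    out_o2 = sigmoid(w[7] * out_h1 + w[8] * out_h2 + b2)
    # backward pass: deltas computed with the original (pre-update) weights
    d_o1 = (out_o1 - y1) * out_o1 * (1 - out_o1)
    d_o2 = (out_o2 - y2) * out_o2 * (1 - out_o2)
    d_h1 = (d_o1 * w[5] + d_o2 * w[7]) * out_h1 * (1 - out_h1)
    d_h2 = (d_o1 * w[6] + d_o2 * w[8]) * out_h2 * (1 - out_h2)
    # simultaneous update of all eight weights
    grads = [0.0, d_h1 * x1, d_h1 * x2, d_h2 * x1, d_h2 * x2,
             d_o1 * out_h1, d_o1 * out_h2, d_o2 * out_h1, d_o2 * out_h2]
    w = [wi - eta * gi for wi, gi in zip(w, grads)]

print(out_o1, out_o2)  # should be close to the targets 0.01 and 0.99
```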
References
- https://zhuanlan.zhihu.com/p/23270674
- Principles of training multi-layer neural network using backpropagation
- [RNN] Simple LSTM implementation & BPTT derivation
- How to edit LaTeX math formulas on 簡書 (Jianshu)