Deep Learning and Neural Networks [notes]

[toc]

I didn't read the earlier material closely; this is just a MathJax test: $\delta^4$

Perceptron

Perceptron: a neuron with a step-function activation.

| Tables        | Are           | Cool  |
| ------------- |:-------------:| -----:|
| col 3 is      | right-aligned | $1600 |
| col 2 is      | centered      | $12   |
| zebra stripes | are neat      | $1    |

Mathematical representation of the parameters: weights and biases

Matrices

Perceptrons can implement logical operations; a network of perceptrons can compute any logical function (see the sketch below).
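
A minimal sketch in Python/NumPy of a step-function perceptron computing NAND; the function name is my own, and the weight and bias values are hand-picked (the ones used in Nielsen's NAND example):

```python
import numpy as np

def perceptron(x, w, b):
    """Step-function perceptron: output 1 if w·x + b > 0, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# NAND gate with hand-chosen parameters: w = (-2, -2), b = 3.
w, b = np.array([-2.0, -2.0]), 3.0
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x), w, b))   # -> 1, 1, 1, 0
```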

They can learn automatically via learning algorithms.

The most important property: the parameters (weights and biases) are updated from the input data.

Sigmoid/Logistic neurons

Key property: small changes in the inputs to this function produce small changes in the output.

$\Delta \mathrm{output}$ is a linear function of the changes $\Delta w_j$ and $\Delta b$ in the weights and bias.

Algebraically convenient: the differentiation properties of the exponential make the derivative easy to work with.
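
A small sketch of the sigmoid and its derivative (function names are my own); the exponential form gives the simple identity $\sigma'(z) = \sigma(z)(1-\sigma(z))$:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    return sigmoid(z) * (1.0 - sigmoid(z))

# Small changes in the weighted input z produce small changes in the output.
z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), sigmoid_prime(z))
```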

The architecture of neural networks

MLP (Multilayer Perceptron)

There is no universal rule of thumb for choosing the hidden layers.

Feedforward: the inputs are never affected by the outputs (no feedback loops in the network).

Recurrent Neural Network: a network whose neurons fire for a limited duration, and in which outputs feed back to influence later inputs.

Gradient descent

Cost Function: Loss function, Objective function.

Quadratic cost function (mean squared error, or MSE):

$$
\begin{eqnarray} C(w,b) \equiv
\frac{1}{2n} \sum_x | y(x) - a|^2.
\end{eqnarray}
$$
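
A minimal NumPy sketch of this cost; the function name and the list-of-vectors representation of the data are my own assumptions:

```python
import numpy as np

def quadratic_cost(outputs, ys):
    """C(w, b) = 1/(2n) * sum_x ||y(x) - a||^2 over the n training examples.
    `outputs` holds the network outputs a, `ys` the desired outputs y(x)."""
    n = len(outputs)
    return sum(np.linalg.norm(y - a) ** 2 for a, y in zip(outputs, ys)) / (2.0 * n)
```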

When there are many variables, finding the minimum analytically via calculus becomes impractical.

Summing up, the way the gradient descent algorithm works is to repeatedly compute the gradient ?C, and then to move in the opposite direction, "falling down" the slope of the valley.

$$
\begin{eqnarray}
\Delta C \approx \nabla C \cdot \Delta v.
\tag{9}\end{eqnarray}
$$

$$
\begin{eqnarray}
\Delta v = -\eta \nabla C,
\tag{10}\end{eqnarray}
$$

The rule doesn't always work - several things can go wrong and prevent gradient descent from finding the global minimum of C, a point we'll return to explore in later chapters. But, in practice gradient descent often works extremely well, and in neural networks we'll find that it's a powerful way of minimizing the cost function, and so helping the net learn.

Update rule:

$$
\begin{eqnarray}
w_k & \rightarrow & w_k' = w_k-\eta \frac{\partial C}{\partial w_k} \tag{16}\end{eqnarray}
$$

$$
\begin{eqnarray}
b_l & \rightarrow & b_l' = b_l-\eta \frac{\partial C}{\partial b_l}.
\tag{17}\end{eqnarray}
$$
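
A sketch of one full-gradient update step applying equations (16)-(17); the list-of-arrays layout for the parameters and gradients is an assumption:

```python
def gradient_descent_step(weights, biases, grad_w, grad_b, eta):
    """Apply w_k -> w_k - eta * dC/dw_k and b_l -> b_l - eta * dC/db_l."""
    weights = [w - eta * gw for w, gw in zip(weights, grad_w)]
    biases = [b - eta * gb for b, gb in zip(biases, grad_b)]
    return weights, biases
```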

Stochastic gradient descent (SGD) with mini-batches:

$$
\begin{eqnarray}
\nabla C \approx \frac{1}{m} \sum_{j=1}^m \nabla C_{X_{j}},
\tag{19}\end{eqnarray}
$$

$$
\begin{eqnarray}
w_k & \rightarrow & w_k' = w_k-\frac{\eta}{m}
\sum_j \frac{\partial C_{X_j}}{\partial w_k} \tag{20}
\end{eqnarray}
$$

$$
\begin{eqnarray}
b_l & \rightarrow & b_l' = b_l-\frac{\eta}{m}
\sum_j \frac{\partial C_{X_j}}{\partial b_l},
\tag{21}\end{eqnarray}
$$
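
A sketch of the outer SGD loop: shuffle the data each epoch, estimate ?C from one mini-batch at a time (equation 19), and let a helper apply the updates (20)-(21). `update_mini_batch` is a hypothetical helper, sketched near the end of these notes:

```python
import random

def sgd(training_data, epochs, mini_batch_size, eta, update_mini_batch):
    """Epoch loop for stochastic gradient descent over mini-batches."""
    n = len(training_data)
    for epoch in range(epochs):
        random.shuffle(training_data)
        for k in range(0, n, mini_batch_size):
            update_mini_batch(training_data[k:k + mini_batch_size], eta)
```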

Online learning: the extreme case of a mini-batch size of 1, updating after every training example.

In general, debugging a neural network can be challenging. This is especially true when the initial choice of hyper-parameters produces results no better than random noise.

More generally, we need to develop heuristics for choosing good hyper-parameters and a good architecture

sophisticated algorithm ≤ simple learning algorithm + good training data. 

backpropagation

At the heart of backpropagation is an expression for the partial derivative ?C/?w of the cost function C with respect to any weight w (or bias b) in the network.

feedforward matrix-based computation

$$
\begin{eqnarray}
a^{l} = \sigma(w^l a^{l-1}+b^l).
\tag{25}\end{eqnarray}
$$
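
A minimal sketch of the matrix-based forward pass of equation (25); the list-of-matrices representation of the layers is an assumption, with `weights[l]` of shape (n_l, n_{l-1}) and `a` a column vector:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(a, weights, biases):
    """Layer-by-layer forward pass: a^l = sigma(w^l a^{l-1} + b^l)."""
    for w, b in zip(weights, biases):
        a = sigmoid(np.dot(w, a) + b)
    return a
```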

The goal of backpropagation is to compute the partial derivatives ?C/?w and ?C/?b of the cost function C with respect to any weight w or bias b in the network.

two assumptions

  1. The cost can be written as an average $C = \frac{1}{n}\sum_x C_x$ over costs for individual training examples. This is the case for the quadratic cost function, where the cost for a single training example is $C_x=\frac{1}{2}\|y-a^L\|^2$.

    What backpropagation actually lets us do is compute the partial derivatives ?Cx/?w and ?Cx/?b for a single training example.

  2. The second assumption we make about the cost is that it can be written as a function of the outputs from the neural network: $C = C(a^L)$.

Hadamard product or Schur product: element-wise multiply
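
In NumPy the Hadamard product is just the `*` operator on arrays, e.g.:

```python
import numpy as np

# Hadamard (Schur) product s ⊙ t: element-wise multiplication.
s = np.array([1, 2])
t = np.array([3, 4])
print(s * t)  # [3 8]
```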

The four fundamental equations of backpropagation

Define the error $\delta^l_j$ of neuron $j$ in layer $l$ as the rate of change of the cost with respect to the weighted input $z^l_j$:

$$
\begin{eqnarray}
\delta^l_j \equiv \frac{\partial C}{\partial z^l_j}.
\tag{29}\end{eqnarray}
$$

The reason for using this rather than $\frac{\partial{C}}{\partial{a^l_j}}$ is simply computational convenience.

An equation for the error in the output layer

$$
\begin{eqnarray}
\delta^L_j = \frac{\partial C}{\partial a^L_j} \sigma'(z^L_j).
\tag{BP1}\end{eqnarray}
$$

$$
\begin{eqnarray}
\delta^L = \nabla_a C \odot \sigma'(z^L).
\tag{BP1a}\end{eqnarray}
$$

If the cost function is the quadratic cost:

$$
\begin{eqnarray}
\delta^L = (a^L-y) \odot \sigma'(z^L).
\tag{30}\end{eqnarray}
$$
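
A sketch of BP1 for the quadratic cost, i.e. equation (30); function names are my own, and `a_L`, `y`, `z_L` are vectors for the output layer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1.0 - sigmoid(z))

def output_error(a_L, y, z_L):
    """BP1 with quadratic cost: delta^L = (a^L - y) ⊙ sigma'(z^L)."""
    return (a_L - y) * sigmoid_prime(z_L)
```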

An equation for the error $\delta^l$ **in terms of the error in the next layer** $\delta^{l+1}$

$$
\begin{eqnarray}
\delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l),
\tag{BP2}\end{eqnarray}
$$

By combining (BP2) with (BP1) we can compute the error $\delta^l$ for any layer in the network. We start by using (BP1) to compute $\delta^L$, then apply Equation (BP2) to compute $\delta^{L-1}$, then Equation (BP2) again to compute $\delta^{L-2}$, and so on, all the way back through the network.
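
A sketch of that backward sweep; the list layout follows the indexing above, with `weights = [w^2, ..., w^L]` and `zs = [z^2, ..., z^L]`, and `sigmoid_prime` reused from the earlier sketch:

```python
import numpy as np

def sigmoid_prime(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def backward_errors(delta_L, weights, zs):
    """Apply BP2 repeatedly: delta^l = ((w^{l+1})^T delta^{l+1}) ⊙ sigma'(z^l)."""
    deltas = [delta_L]                       # start from delta^L, given by BP1
    for w, z in zip(reversed(weights[1:]), reversed(zs[:-1])):
        deltas.insert(0, np.dot(w.T, deltas[0]) * sigmoid_prime(z))
    return deltas                            # [delta^2, ..., delta^L]
```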

An equation for the rate of change of the cost with respect to any bias in the network

$$
\begin{eqnarray} \frac{\partial C}{\partial b^l_j} =
\delta^l_j.
\tag{BP3}\end{eqnarray}
$$

$$
\begin{eqnarray}
\frac{\partial C}{\partial b} = \delta,
\tag{31}\end{eqnarray}
$$

An equation for the rate of change of the cost with respect to any weight in the network

$$
\begin{eqnarray}
\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \delta^l_j.
\tag{BP4}\end{eqnarray}
$$

$$
\begin{eqnarray} \frac{\partial
C}{\partial w} = a_{\rm in} \delta_{\rm out},
\tag{32}\end{eqnarray}
$$
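
Once the errors are known, BP3 and BP4 give the gradients directly. A sketch under the same assumptions, where `activations` holds the inputs $a^{l-1}$ feeding each layer and the vectors are column vectors:

```python
import numpy as np

def gradients(deltas, activations):
    """BP3: dC/db^l_j = delta^l_j.  BP4: dC/dw^l_{jk} = a^{l-1}_k * delta^l_j."""
    nabla_b = [d for d in deltas]
    nabla_w = [np.dot(d, a.T) for d, a in zip(deltas, activations)]
    return nabla_w, nabla_b
```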

Saturated: the neuron learns slowly, since ?C/?w will also tend to be small.

Summing up, we've learnt that a weight will learn slowly if either the input neuron is low-activation, or if the output neuron has saturated, i.e., is either high- or low-activation.

Summary: the equations of backpropagation

Input a set of training examples

Combine this with a vectorized mini-batch to update the weights and biases, as sketched below.
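
A sketch of the mini-batch update that ties the pieces together, averaging the per-example gradients and applying equations (20)-(21); `backprop(x, y)` is a hypothetical helper assumed to return `(nabla_w, nabla_b)` for one training example:

```python
import numpy as np

def update_mini_batch(mini_batch, weights, biases, eta, backprop):
    """Average the per-example gradients over the mini-batch, then take one step."""
    nabla_w = [np.zeros(w.shape) for w in weights]
    nabla_b = [np.zeros(b.shape) for b in biases]
    for x, y in mini_batch:
        delta_w, delta_b = backprop(x, y)
        nabla_w = [nw + dw for nw, dw in zip(nabla_w, delta_w)]
        nabla_b = [nb + db for nb, db in zip(nabla_b, delta_b)]
    m = len(mini_batch)
    weights = [w - (eta / m) * nw for w, nw in zip(weights, nabla_w)]
    biases = [b - (eta / m) * nb for b, nb in zip(biases, nabla_b)]
    return weights, biases
```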
