CS224n Assignment 1

Notes on the first assignment of https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1174/syllabus.html

Softmax

Invariance of softmax to constant shifts

softmax(x_i) = \frac{e^{x_i}}{\sum_j{e^{x_j}}}

\begin{align} (softmax(x + c))_{i}= \frac{e^{x_{i} + c}}{\sum_{j} e^{x_{j} + c}} = \frac{e^{x_{i}} \times e^{c}}{e^{c} \times \sum_{j} e^{x_{j}}} = \frac{e^{x_{i}}}{\sum_{j} e^{x_{j}}} = (softmax(x))_{i} \nonumber \end{align}

Since e^{x+y} = e^x \times e^y, the extra factor e^c cancels between numerator and denominator, giving:

softmax(x) = softmax(x + c)

This reveals a very useful property of softmax: even when both inputs are large, e.g. 1000 and 1001, the result is the same as for 1 and 2. Softmax only cares about the differences between the numbers, not about how large the numbers themselves are.

Python implementation

The reason for introducing the constant invariance of softmax is that the given test cases contain very large values, and computing e^x directly on them would overflow; subtracting the maximum first keeps the computation stable.
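
For example (values chosen only to illustrate the overflow):

import numpy as np

x = np.array([1000.0, 1001.0])
print(np.exp(x))                                   # [inf inf] -- overflows
shifted = x - np.max(x)                            # [-1.  0.]
print(np.exp(shifted) / np.sum(np.exp(shifted)))   # [0.26894142 0.73105858]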

import numpy as np


def softmax(x):
    orig_shape = x.shape

    if len(x.shape) > 1:
        # Matrix: subtract each row's max before exponentiating, for numerical stability
        ### YOUR CODE HERE
        x_max = np.max(x, axis=1).reshape(x.shape[0], 1)
        x -= x_max
        exp_sum = np.sum(np.exp(x), axis=1).reshape(x.shape[0], 1)
        x = np.exp(x) / exp_sum
        ### END YOUR CODE
    else:
        # Vector: same idea, subtract the max before exponentiating
        ### YOUR CODE HERE
        x_max = np.max(x)
        x -= x_max
        exp_sum = np.sum(np.exp(x))
        x = np.exp(x) / exp_sum
        ### END YOUR CODE
        # or, without the shift (overflows for large inputs): x = np.exp(x) / np.sum(np.exp(x))

    assert x.shape == orig_shape
    return x

def test_softmax_basic():
    """
    Some simple tests to get you started.
    Warning: these are not exhaustive.
    """
    print("Running basic tests...")
    test1 = softmax(np.array([1,2]))
    print(test1)
    ans1 = np.array([0.26894142,  0.73105858])
    assert np.allclose(test1, ans1, rtol=1e-05, atol=1e-06)

    test2 = softmax(np.array([[1001,1002],[3,4]]))
    print(test2)
    ans2 = np.array([
        [0.26894142, 0.73105858],
        [0.26894142, 0.73105858]])
    assert np.allclose(test2, ans2, rtol=1e-05, atol=1e-06)

    test3 = softmax(np.array([[-1001,-1002]]))
    print(test3)
    ans3 = np.array([0.73105858, 0.26894142])
    assert np.allclose(test3, ans3, rtol=1e-05, atol=1e-06)

    print("You should be able to verify these results by hand!\n")

if __name__ == "__main__":
    test_softmax_basic()

Neural network basics

Gradient check

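The check compares the analytic gradient against a centered-difference approximation, which is exactly what the gradcheck_naive code below computes:

\begin{align} \frac{\partial f}{\partial x_i} \approx \frac{f(x + h\,e_i) - f(x - h\,e_i)}{2h}, \quad h = 10^{-4} \nonumber \end{align}

where e_i is the unit vector along coordinate i.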

Derivative of the sigmoid

Define \sigma(x) as below and note that \sigma(x) + \sigma(-x) = 1:

\begin{align} \sigma(x) &= \frac{1}{1 + e^{-x}} = \frac{e^x}{e^x + 1} \nonumber \\ \sigma(-x) &= \frac{1}{1 + e^{x}} = \frac{e^x+1}{e^x + 1} - \frac{e^x}{e^x + 1} = 1-\sigma(x) \nonumber \end{align}

Differentiating directly:

\begin{align} \sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^{2}} = \frac{1}{1 + e^{-x}} \times \frac{e^{-x}}{1 + e^{-x}} \nonumber \end{align}

That is: \sigma'(x) = \sigma(x) \times (1-\sigma(x))

Definition of cross entropy

When cross entropy is used as the loss, we need its gradient:

  • Given: \hat{y} = softmax(\theta)
  • Cross entropy: CE(y,\hat{y}) = - \sum_i{y_i \times log(\hat{y_i})}

Here \boldsymbol{y} is an indicator variable: it is 1 if the class equals the sample's true class and 0 otherwise, since y is typically a one-hot vector.

\hat{y_i} is the predicted probability of each class, already computed by softmax.

There is more than one way to write the cross entropy, but they all mean the same thing. The common forms are:

Binary classification

\begin{align}J = -[y\cdot log(p)+(1-y)\cdot log(1-p)]\end{align}

  • y is the sample label: 1 for the positive class, 0 for the negative class
  • p is the predicted probability that the sample is positive

Multi-class classification

\begin{align}J = -\sum_{c=1}^My_{c}\log(p_{c})\end{align} \\

  • y_c is an indicator variable (0 or 1): it is 1 if class c is the sample's true class, 0 otherwise
  • p_c is the predicted probability that the observed sample belongs to class c

Both forms express the same idea: cross entropy measures how much probability the model assigns to the correct class.
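
As a concrete illustration (the numbers below are made up), for a one-hot label the multi-class form reduces to the negative log-probability of the correct class:

import numpy as np

y = np.array([0, 0, 1])          # one-hot label: the true class is the third one
p = np.array([0.2, 0.3, 0.5])    # predicted probabilities (sum to 1)
ce = -np.sum(y * np.log(p))      # only the correct class contributes
print(ce)                        # 0.693... = -log(0.5)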

Derivative of softmax

Solution:

  • First define S_i together with its numerator f_i and denominator g_i.

S_i = (softmax(\theta))_i = \frac{f_i}{g_i} = \frac{e^{\theta_i}}{\sum^{K}_{k=1}e^{\theta_k}}

  • Differentiate S_i with respect to \theta_j:

    \begin{align} \frac{\partial{S_i}}{\partial{\theta_j}} &= \frac{f_i'g_i - f_ig_i'}{g_i^2}\\ &=\frac{(e^{\theta_i})'\sum^{K}_{k=1}e^{\theta_k} - e^{\theta_i}(\sum^{K}_{k=1}e^{\theta_k})'}{(\sum^{K}_{k=1}e^{\theta_k})^2}\end{align}

Note: the numerator of S_i contains only \theta_i, the denominator contains all the \theta's, and the derivative is taken with respect to \theta_j.

  • Therefore there are two cases, depending on the relation between i and j:
  • When i = j:
$f_i' = e^{\theta_i}$, $g_i' = e^{\theta_j}$



$\begin{align} \frac{\partial{S_i}}{\partial{\theta_j}} &=\frac{e^{\theta_i}\sum^{K}_{k=1}e^{\theta_k} - e^{\theta_i}e^{\theta_j}}{(\sum^{K}_{k=1}e^{\theta_k})^2} \\ &= \frac{e^{\theta_{i}}}{\sum_{k} e^{\theta_{k}}} \times \frac{\sum_{k} e^{\theta_{k}} - e^{\theta_{j}}}{\sum_{k} e^{\theta_{k}}} \nonumber \\  &= S_{i} \times (1 - S_{i})   \end{align}$
  • When i \neq j:
$f'_{i} = 0$, $g'_{i} = e^{\theta_{j}}$



$\begin{align}  \frac{\partial{S_i}}{\partial{\theta_j}} &= \frac{0 - e^{\theta_{j}} e^{\theta_{i}}}{(\sum_{k} e^{\theta_{k}})^{2}}  \\&= - \frac{e^{\theta_{j}}}{\sum_{k} e^{\theta_{k}}} \times \frac{e^{\theta_{i}}}{\sum_{k} e^{\theta_{k}}} \\ &=-S_j \times S_i\end{align}$
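
Both cases can be written in one line using the Kronecker delta \delta_{ij} (1 if i = j, 0 otherwise):

\frac{\partial{S_i}}{\partial{\theta_j}} = S_i \times (\delta_{ij} - S_j)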

Gradient of the cross entropy

Compute \frac{\partial{CE}}{\partial{\theta_i}}. By the chain rule, \frac{\partial{CE}}{\partial{\theta_i}} = \sum_k \frac{\partial{CE}}{\partial{S_k}}\frac{\partial{S_k}}{\partial{\theta_i}}.

  • \hat{y} = softmax(\theta)

  • CE(y,\hat{y}) = - \sum_k{y_k log(\hat{y_k})}

$\begin{align}      \frac{\partial CE}{\partial \theta_{i}} &= - \sum_{k} y_{k} \frac{\partial log S_{k}}{\partial \theta_{i}}  \\&= - \sum_{k} y_{k} \frac{1}{S_{k}} \frac{\partial S_{k}}{\partial \theta_{i}} \\ &= - y_{i} (1 - S_{i}) - \sum_{k \ne i} y_{k} \frac{1}{S_{k}} (-S_{k} \times S_{i}) \\ &= - y_{i} (1 - S_{i}) + \sum_{k \ne i} y_{k} S_{i} \\ &= S_{i}(\sum_{k} y_{k}) - y_{i}\end{align}$

Since \sum_{k} y_{k}=1, it follows that \frac{\partial CE}{\partial \theta_{i}} = S_i - y_i, or in vector form \frac{\partial CE}{\partial \boldsymbol{\theta}} = \hat{y} - y.
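
A quick numerical sanity check of this result (toy values for \theta and y; softmax is restated inline so the snippet is self-contained):

import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / np.sum(e)

theta = np.array([0.5, -1.0, 2.0])
y = np.array([0.0, 1.0, 0.0])                     # one-hot target
ce = lambda t: -np.sum(y * np.log(softmax(t)))

analytic = softmax(theta) - y                     # the derived gradient
h = 1e-6
numeric = np.array([(ce(theta + h*np.eye(3)[i]) - ce(theta - h*np.eye(3)[i])) / (2*h)
                    for i in range(3)])
print(np.allclose(analytic, numeric, atol=1e-5))  # True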

Computing neural network gradients via backpropagation

Using the definitions given in the problem:

The loss is J = CE, with h = sigmoid(xW_1+b_1) and \hat{y} = softmax(hW_2+b_2).

Find \frac{\partial{J}}{\partial{x}}, \frac{\partial{J}}{\partial{W_2}}, \frac{\partial{J}}{\partial{W_1}}, \frac{\partial{J}}{\partial{b_2}}, \frac{\partial{J}}{\partial{b_1}}.

Solution:

For backpropagation, define z_2 = hW_2 + b_2 and z_1 = xW_1 + b_1.

For the output layer, the input to \hat{y} is z_2 = hW_2+b_2 and the output is \hat{y} = softmax(z_2).

The previous section showed that the gradient of CE(y,\hat{y}) = - \sum_k{y_k log(\hat{y_k})} is \frac{\partial CE}{\partial \boldsymbol{\theta}} = \hat{y} - y.

Substituting z_2 for \theta gives:

  • \delta_1 = \frac{\partial{CE}}{\partial{z_2}} = \hat{y} - y

  • \begin{align} \delta_2 = \frac{\partial{CE}}{\partial{h}} = \frac{\partial{CE}}{\partial{z_2}} \frac{\partial{z_2}}{\partial{h}} = \delta_1W_2^T \end{align}

  • \begin{align}\delta_3 = \frac{\partial{CE}}{\partial{z_1}} = \frac{\partial{CE}}{\partial{h}}\frac{\partial{h}}{\partial{z_1}} = \delta_2 \frac{\partial{h}}{\partial{z_1}}= \delta_2 \circ \sigma'(z_1)\end{align}  (the elementwise product \circ appears because the sigmoid acts elementwise on z_1, so its Jacobian is diagonal)

  • Hence: \frac{\partial{CE}}{\partial{x}}=\delta_3\frac{\partial{z_1}}{\partial{x}} = \delta_3W_1^T

In the same way as for \frac{\partial{CE}}{\partial{x}}, compute:

  • \frac{\partial{CE}}{\partial{W_2}} = \frac{\partial{CE}}{\partial{z_2}}\frac{\partial{z_2}}{\partial{W_2}}=h^T \delta_1
  • \frac{\partial{CE}}{\partial{b_2}} = \frac{\partial{CE}}{\partial{z_2}}\frac{\partial{z_2}}{\partial{b_2}}=\delta_1
  • \frac{\partial{CE}}{\partial{W_1}} = \frac{\partial{CE}}{\partial{z_1}}\frac{\partial{z_1}}{\partial{W_1}}=x^T \delta_3
  • \frac{\partial{CE}}{\partial{b_1}} = \frac{\partial{CE}}{\partial{z_1}}\frac{\partial{z_1}}{\partial{b_1}}=\delta_3

If backpropagation still feels unclear, it helps to re-derive the \delta terms above by hand.

Number of parameters

\begin{align} n_{W_{1}} &= D_{x} \times H \\ n_{b_{1}} &= H \\ n_{W_{2}} &= H \times D_{y} \\ n_{b_{2}} &= D_{y} \\ N &= (D_{x} \times H) + H + (H \times D_{y}) + D_{y} \\ &=(D_x+1)\times H+(H+1)\times D_y \end{align}
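
For example, with D_x = 10, H = 5 and D_y = 10, this gives N = (10+1)\times 5 + (5+1)\times 10 = 115 parameters.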

Implementation

  • sigmoid and its gradient
def sigmoid(x):
    s = 1 / (1 + np.exp(-x))
    return s

def sigmoid_grad(s):
    # note: the argument s is sigmoid(x), not x itself
    ds = s * (1-s)
    return ds
  • gradient check
import numpy as np
import random


# First implement a gradient checker by filling in the following functions
def gradcheck_naive(f, x):
    """ Gradient check for a function f.

    Arguments:
    f -- a function that takes a single argument and outputs the
         cost and its gradients
    x -- the point (numpy array) to check the gradient at
    """

    rndstate = random.getstate()
    random.setstate(rndstate)
    fx, grad = f(x) # Evaluate function value at original point
    h = 1e-4        # Do not change this!

    # Iterate over all indexes in x
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index
        print(ix)
        # Try modifying x[ix] with h defined above to compute
        # numerical gradients. Make sure you call random.setstate(rndstate)
        # before calling f(x) each time. This will make it possible
        # to test cost functions with built in randomness later.

        ### YOUR CODE HERE:
        x[ix] += h
        random.setstate(rndstate)
        new_f1 = f(x)[0]
        x[ix] -= 2*h
        random.setstate(rndstate)
        new_f2 = f(x)[0]
        x[ix] += h
        numgrad = (new_f1 - new_f2) / (2 * h)
        ### END YOUR CODE

        # Compare gradients
        reldiff = abs(numgrad - grad[ix]) / max(1, abs(numgrad), abs(grad[ix]))
        if reldiff > 1e-5:
            print("Gradient check failed.")
            print("First gradient error found at index %s" % str(ix))
            print("Your gradient: %f \t Numerical gradient: %f" % (
                grad[ix], numgrad))
            return

        it.iternext() # Step to next dimension

    print("Gradient check passed!")

  • forward and backward propagation
  def forward_backward_prop(data, labels, params, dimensions):
      """
      Forward and backward propagation for a two-layer sigmoidal network
      Compute the forward propagation and for the cross entropy cost,
      and backward propagation for the gradients for all parameters.
      Arguments:
      data -- M x Dx matrix, where each row is a training example.
      labels -- M x Dy matrix, where each row is a one-hot vector.
      params -- Model parameters, these are unpacked for you.
      dimensions -- A tuple of input dimension, number of hidden units
                    and output dimension
      """
      ### Unpack network parameters (do not modify)
      ofs = 0
      Dx, H, Dy = (dimensions[0], dimensions[1], dimensions[2])
      W1 = np.reshape(params[ofs:ofs+ Dx * H], (Dx, H))
      ofs += Dx * H
      b1 = np.reshape(params[ofs:ofs + H], (1, H))
      ofs += H
      W2 = np.reshape(params[ofs:ofs + H * Dy], (H, Dy))
      ofs += H * Dy
      b2 = np.reshape(params[ofs:ofs + Dy], (1, Dy))
      ### YOUR CODE HERE: forward propagation
      h = sigmoid(np.dot(data,W1) + b1)
      yhat = softmax(np.dot(h,W2) + b2)
      ### END YOUR CODE
      ### YOUR CODE HERE: backward propagation
      cost = np.sum(-np.log(yhat[labels==1])) 
      
      d1 = (yhat - labels)
      gradW2 = np.dot(h.T, d1)
      gradb2 = np.sum(d1,0,keepdims=True)
      
      d2 = np.dot(d1,W2.T)
      # h = sigmoid(z_1)
      d3 = sigmoid_grad(h) * d2
      gradW1 = np.dot(data.T,d3)
      gradb1 = np.sum(d3,0)
      
      ### END YOUR CODE
      ### Stack gradients (do not modify)
      grad = np.concatenate((gradW1.flatten(), gradb1.flatten(),
          gradW2.flatten(), gradb2.flatten()))
      return cost, grad
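
The whole thing can then be verified with the gradient checker from above; a sketch with toy dimensions and random one-hot labels (the specific numbers here are illustrative, in the spirit of the assignment's sanity check):

N = 20
dimensions = [10, 5, 10]
data = np.random.randn(N, dimensions[0])          # one training example per row
labels = np.zeros((N, dimensions[2]))
for i in range(N):
    labels[i, np.random.randint(dimensions[2])] = 1   # random one-hot rows

params = np.random.randn((dimensions[0] + 1) * dimensions[1] +
                         (dimensions[1] + 1) * dimensions[2])

gradcheck_naive(lambda p: forward_backward_prop(data, labels, p, dimensions), params)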

word2vec

Gradients with respect to the word vectors

In word2vec with softmax as the hypothesis function:

\begin{align} \hat{\boldsymbol{y}}_{o} = p(\boldsymbol{o} \vert \boldsymbol{c}) =\frac{exp(\boldsymbol{u}_{o}^{T} \boldsymbol{v}_{c})}{\sum\limits_{w=1}^{W} exp(\boldsymbol{u}_{w}^{T} \boldsymbol{v}_{c})} \end{align}

where \boldsymbol{v}_{c} is the word vector of the center word,

and \boldsymbol{u}_{w} (w = 1,\dots,W) is the word vector of the w-th word.

Assume cross entropy is used as the loss function and \boldsymbol{o} is the correct word (the \boldsymbol{o}-th entry of the one-hot vector is 1); derive the gradient of the loss with respect to \boldsymbol{v}_c.

Hint:

\begin{align}J_{softmax-CE}(\boldsymbol{o}, \boldsymbol{v}_{c}, \boldsymbol{U}) = CE(\boldsymbol{y}, \hat{\boldsymbol{y}})\end{align}

where \boldsymbol{U} = [\boldsymbol{u}_{1},\boldsymbol{u}_{2},\dots, \boldsymbol{u}_{W}] is the matrix made up of all the word vectors.

Solution:

First, note that the model in this problem is skip-gram: given the center word, it predicts the surrounding words.

Define z = U^T \cdot v_c, where U is the matrix of all word vectors and v_c is itself a single word vector.

Hint: the more similar two vectors are, the larger their inner product; thinking of the cosine of the angle between them makes this intuitive.

Every word vector in U is multiplied with v_c to obtain z.

What is z for? It contains W values, each measuring how similar the corresponding word is to v_c; softmax turns these scores into probabilities, which are then compared with the true label through the cross entropy.

Known: \frac{\partial z}{\partial v_c} = U and \frac{\partial J}{\partial \boldsymbol{z}} = \hat{\boldsymbol{y}} -\boldsymbol{y}

Therefore: \frac{\partial J}{\partial{v_c}} =\frac{\partial J}{\partial \boldsymbol{z}} \frac{\partial z}{\partial v_c} = U(\hat{\boldsymbol{y}} -\boldsymbol{y})


Besides the expression above, the gradient can also be obtained by expanding the loss directly.

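Writing the loss out explicitly and differentiating with respect to \boldsymbol{v}_{c}:

\begin{align} J &= -log \frac{exp(\boldsymbol{u}_{o}^{T} \boldsymbol{v}_{c})}{\sum\limits_{w=1}^{W} exp(\boldsymbol{u}_{w}^{T} \boldsymbol{v}_{c})} = -\boldsymbol{u}_{o}^{T} \boldsymbol{v}_{c} + log \sum\limits_{w=1}^{W} exp(\boldsymbol{u}_{w}^{T} \boldsymbol{v}_{c}) \nonumber \\ \frac{\partial J}{\partial \boldsymbol{v}_{c}} &= -\boldsymbol{u}_{o} + \sum\limits_{w=1}^{W} \frac{exp(\boldsymbol{u}_{w}^{T} \boldsymbol{v}_{c})}{\sum\limits_{x=1}^{W} exp(\boldsymbol{u}_{x}^{T} \boldsymbol{v}_{c})} \boldsymbol{u}_{w} \nonumber \end{align}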

Hence: \frac{\partial J}{\partial{v_c}} = -u_o + \sum_{w=1}^{W}\hat{y}_w u_w

Comparing the two forms, they are really the same thing: both are the difference between the observed and the expected distributions (\hat{y} - y).

Gradient of the lookup table

Analogous to the word-vector case:

\frac{\partial J}{\partial{U}} =\frac{\partial J}{\partial \boldsymbol{z}} \frac{\partial z}{\partial U} = v_c(\hat{\boldsymbol{y}} -\boldsymbol{y})^{T}
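
A minimal numpy sketch of these two gradients, assuming U stores the output vectors u_w as its columns (toy sizes, illustrative names):

import numpy as np

W, d = 5, 3                      # vocabulary size and embedding dimension (toy values)
U = np.random.randn(d, W)        # column w is u_w
v_c = np.random.randn(d)         # center word vector
y = np.zeros(W)
y[2] = 1                         # one-hot target: the correct word is o = 2

z = U.T.dot(v_c)                 # scores u_w^T v_c
y_hat = np.exp(z - np.max(z))
y_hat /= np.sum(y_hat)           # softmax

grad_vc = U.dot(y_hat - y)           # dJ/dv_c
grad_U = np.outer(v_c, y_hat - y)    # dJ/dU, same layout as U (d x W)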

Gradients with negative sampling

Suppose we draw K negative samples and the correct answer is o, so that o \notin \{1,\dots,K\}. The negative-sampling loss is defined as:

\begin{align}J_{neg-sample}(\boldsymbol{o}, \boldsymbol{v}_{c}, \boldsymbol{U}) = - log(\sigma(\boldsymbol{u}_{o}^{T} \boldsymbol{v}_{c})) - \sum\limits_{k=1}^{K} log(\sigma(-\boldsymbol{u}_{k}^{T} \boldsymbol{v}_{c})) \end{align}

where:

\begin{align}\sigma(x) = \frac{1}{1 + e^{-x}} \nonumber \\= \frac{e^{x}}{1 + e^{x}} \nonumber \end{align}

\frac{\partial}{\partial x} \sigma(x) = \sigma(x) \times (1 - \sigma(x))

Solution:

First, about where J_{neg-sample} comes from: see page 11 of note 1 of the course notes, which gives a very detailed explanation.

\begin{align} \frac{\partial J}{\partial v_c}&=\left(\sigma(u_o^Tv_c)-1\right)u_o-\sum_{k=1}^K\left(\sigma(-u_k^Tv_c)-1\right)u_k\\ \frac{\partial J}{\partial u_o}&=\left(\sigma(u_o^Tv_c)-1\right)v_c\\ \frac{\partial J}{\partial u_k}&=-\left(\sigma(-u_k^Tv_c)-1\right)v_c\\ \end{align}
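
A minimal numpy sketch of these formulas (toy sizes; u_o is the correct word's output vector and the rows of U_neg are the K sampled negative vectors; all names are illustrative):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d, K = 3, 4
v_c = np.random.randn(d)
u_o = np.random.randn(d)            # output vector of the correct word o
U_neg = np.random.randn(K, d)       # the K negative samples, one per row

pos = sigmoid(u_o.dot(v_c))         # sigma(u_o^T v_c)
neg = sigmoid(-U_neg.dot(v_c))      # sigma(-u_k^T v_c), shape (K,)

J = -np.log(pos) - np.sum(np.log(neg))
grad_vc = (pos - 1.0) * u_o - U_neg.T.dot(neg - 1.0)   # dJ/dv_c
grad_uo = (pos - 1.0) * v_c                            # dJ/du_o
grad_uk = -np.outer(neg - 1.0, v_c)                    # dJ/du_k, one row per k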

Gradients over a full window

Derive the gradients with respect to every word vector for skip-gram and CBOW, given a context window of radius m, [word_{c-m} ,...,word_{c-1} ,word_{c} ,word_{c+1} ,...,word_{c+m} ], where the loss F(\boldsymbol{o}, \boldsymbol{v}_{c}) (\boldsymbol{o} is the word vector of the correct answer) is either J_{softmax-CE}(\boldsymbol{o},\boldsymbol{v}_{c},\dots) or J_{neg-sample}(\boldsymbol{o},\boldsymbol{v}_{c},\dots).

For skip-gram, the loss for the context around position c is:

\begin{align} J_{skip-gram}(word_{c-m \dots c+m})= \sum\limits_{-m \leq j \leq m, j \ne 0} F(\boldsymbol{w}_{c+j}, \boldsymbol{v}_{c})\end{align}

Here \boldsymbol{w}_{c+j} is the word at offset j from the center word.

CBOW is slightly different: instead of the center word vector \boldsymbol{v}_{c}, it uses the sum of the context word vectors \hat{\boldsymbol{v}} as the input to predict the center word:

\begin{align} \hat{\boldsymbol{v}} = \sum\limits_{-m \leq j \leq m, j \ne 0} \boldsymbol{v}_{c+j}\end{align}

The CBOW loss is then:

\begin{align} J_{CBOW}(word_{c-m \dots c+m})= F(\boldsymbol{w}_{c}, \hat{\boldsymbol{v}})\end{align}

Solution:

From the earlier derivations we know how to obtain the gradients \begin{align} \frac{\partial J}{\partial \boldsymbol{v_c}} = \boldsymbol{U}^{T} (\hat{\boldsymbol{y}} - \boldsymbol{y}) \nonumber \end{align} and \begin{align} \frac{\partial J}{\partial \boldsymbol{U}} = \boldsymbol{v}_{c} (\hat{\boldsymbol{y}} - \boldsymbol{y})^{T} \nonumber \end{align}. The required gradients can therefore be written as:

skip-gram

\begin{align} \frac{\partial J_{skip-gram}(word_{c-m \dots c+m})}{\partial \boldsymbol{U}} &= \sum\limits_{-m \leq j \leq m, j \ne 0} \frac{\partial F(\boldsymbol{w}_{c+j}, \boldsymbol{v}_{c})}{\partial \boldsymbol{U}} \nonumber \\ \frac{\partial J_{skip-gram}(word_{c-m \dots c+m})}{\partial \boldsymbol{v}_{c}} &= \sum\limits_{-m \leq j \leq m, j \ne 0} \frac{\partial F(\boldsymbol{w}_{c+j}, \boldsymbol{v}_{c})}{\partial \boldsymbol{v}_{c}} \nonumber \\ \frac{\partial J_{skip-gram}(word_{c-m \dots c+m})}{\partial \boldsymbol{v}_{j}} &= 0, \forall j\ne c \nonumber\end{align}

CBOW

\begin{align} \frac{\partial J_{CBOW}(word_{c-m \dots c+m})}{\partial \boldsymbol{U}}& = \frac{\partial F(\boldsymbol{w}_{c}, \hat{\boldsymbol{v}})}{\partial \boldsymbol{U}} \nonumber \\ \frac{\partial J_{CBOW}(word_{c-m \dots c+m})}{\partial \boldsymbol{v}_{j}} &= \frac{\partial F(\boldsymbol{w}_{c}, \hat{\boldsymbol{v}})}{\partial \hat{\boldsymbol{v}}}, \forall (j \ne c) \in \{c-m \dots c+m\} \nonumber \\ \frac{\partial J_{CBOW}(word_{c-m \dots c+m})}{\partial \boldsymbol{v}_{j}} &= 0, \forall (j \ne c) \notin \{c-m \dots c+m\} \nonumber\end{align}

Additional parts:

  • Normalize each row vector of a matrix to unit length

     x = x/np.linalg.norm(x,axis=1,keepdims=True)
    
  • Train word vectors on the Stanford Sentiment Treebank

    Just run q3_run directly.

Sentiment analysis

Feature vector

The simplest choice of features is the average of all the word vectors in the sentence:

    # average the word vectors of all words in the sentence
    sentVector = np.zeros((wordVectors.shape[1],))
    sentence_index = [tokens[word] for word in sentence]
    for index in sentence_index:
        sentVector += wordVectors[index, :]

    sentVector /= len(sentence)

Regularization

values = np.logspace(-4, 2, num=100, base=10)

Hyperparameter tuning

bestResult = max(results, key= lambda x: x['dev'])
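
A sketch of how the sweep might be wired together, assuming a scikit-learn style logistic regression; the feature and label arrays below are random stand-ins for the real sentence features:

import numpy as np
from sklearn.linear_model import LogisticRegression

# hypothetical stand-ins for the averaged sentence features and labels
trainFeatures, trainLabels = np.random.randn(200, 10), np.random.randint(0, 5, 200)
devFeatures, devLabels = np.random.randn(50, 10), np.random.randint(0, 5, 50)

values = np.logspace(-4, 2, num=100, base=10)      # candidate regularization strengths

results = []
for reg in values:
    clf = LogisticRegression(C=1.0 / reg)          # larger reg -> stronger penalty
    clf.fit(trainFeatures, trainLabels)
    results.append({"reg": reg,
                    "dev": clf.score(devFeatures, devLabels)})   # dev accuracy

bestResult = max(results, key=lambda x: x["dev"])
print(bestResult)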

Effect of the regularization strength on performance

confusion matrix

A confusion matrix cross-tabulates the true class against the predicted class; the more mass lies on the diagonal, the more accurate the predictions.
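
A minimal sketch of how such a matrix is built (the labels are made up):

import numpy as np

num_classes = 5
true_labels = np.array([0, 1, 2, 2, 4, 3])     # ground-truth classes (illustrative)
pred_labels = np.array([0, 1, 1, 2, 4, 4])     # predicted classes

cm = np.zeros((num_classes, num_classes), dtype=int)
for t, p in zip(true_labels, pred_labels):
    cm[t, p] += 1        # row = true class, column = predicted class

print(cm)                # diagonal entries are the correctly classified examples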
