A Brief Look at Word2vec and a TensorFlow Implementation

Word2vec in Brief

Word2Vec is a word-vector model proposed by Mikolov and colleagues at Google.

  • Input: a large amount of tokenized text
  • Output: a dense vector representing each word

The importance of word vectors is that they turn natural language into vectors a computer can work with. Compared with bag-of-words or TF-IDF representations, word vectors capture a word's context and semantics and can measure similarity between words, which makes them valuable in many NLP tasks such as text classification and sentiment analysis.

The classic word-vector example: ![][01]
[01]:http://latex.codecogs.com/png.latex?\vec{man}-\vec{woman}\approx\vec{king}-\vec{queen}

gensim already ships a Python implementation of word2vec, so with a corpus in hand you can train directly; see, for example, the write-up on Word2Vec experiments with the Chinese and English Wikipedia corpora.
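
For reference, a minimal training sketch with gensim. The corpus file name here is just a placeholder, and in gensim 4.0 and later the size argument is called vector_size:

# A minimal gensim sketch. corpus.txt is a hypothetical file with one
# whitespace-tokenized sentence per line.
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

sentences = LineSentence('corpus.txt')
model = Word2Vec(sentences, size=300, window=5,   # size -> vector_size in gensim >= 4.0
                 min_count=5, sg=1, workers=4)    # sg=1: skip-gram, sg=0: CBOW

print(model.wv.most_similar('king', topn=5))      # nearest neighbours by cosine similarity
print(model.wv['king'])                           # the raw 300-dimensional vector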

Being able to train word vectors with gensim does not mean you really understand word2vec; it only means you can read documentation and call an API.

Word2vec in Detail

In short, word2vec is a three-layer neural network. The background needed to understand its implementation is basic neural networks and logistic regression.

Network Structure

[Figure: word2vec architecture overview]

The figure above sketches the Word2vec pipeline. Assume the vocabulary contains 10,000 words and the word vectors have length 300 (according to Stanford CS224d, word vectors usually range from 25 to 1000 dimensions, and 300 is a good choice). Taking a single training sample as an example, each part works as follows.

  1. Input layer: the input is the one-hot representation of a word, a vector of length 10,000. If the word is ants and its ID in the vocabulary is i, the i-th component of the input is 1 and all others are 0: [0, 0, ..., 0, 0, 1, 0, 0, ..., 0, 0].
  2. Hidden layer: the number of hidden units equals the word-vector length, and the hidden-layer parameters form a [10000, 300] matrix. This parameter matrix is exactly the word-vector table. Recall how matrix multiplication works: a one-hot row vector times a matrix simply picks out the i-th row of the matrix. So the hidden layer maps the 10,000-dimensional one-hot vector to the 300-dimensional word vector we ultimately want.
    [Figure: one-hot vector times the embedding matrix]
  3. Output layer: the output layer has as many units as there are words, 10,000, with a [300, 10000] parameter matrix. The word vector is multiplied by this matrix and passed through a softmax, producing a 10,000-dimensional vector again; each component is the probability that the corresponding vocabulary word appears in the same context as the input word (here, ants).


    [Figure: output layer]

    The figure above computes the probability that car co-occurs with ants; the 300-dimensional column vector for car is one column of the output-layer parameter matrix. Since that matrix is [300, 10000], the layer computes the co-occurrence probability of every vocabulary word with ants. The output-layer parameters are discarded once training is finished.

  4. Training: each training sample (x, y) has both an input and an output; we know which word actually co-occurred with ants, so y is also a 10,000-dimensional one-hot vector. As in logistic regression, the loss is the cross-entropy between the network's final output vector and y, and it is minimized with stochastic gradient descent. A small NumPy sketch follows after this list.


    [Figure: cross-entropy loss]
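
To make steps 1 to 4 concrete, here is a small NumPy sketch of one forward pass and its cross-entropy loss. The sizes are tiny so it runs instantly; the real model differs only in the 10,000 x 300 dimensions:

import numpy as np

V, D = 10, 4                            # vocabulary size and vector length (10000 and 300 above)
W_in = np.random.randn(V, D) * 0.01     # hidden-layer parameters = the word-vector table
W_out = np.random.randn(D, V) * 0.01    # output-layer parameters (discarded after training)

x = np.zeros(V); x[3] = 1.0             # one-hot input, e.g. "ants" with ID 3
h = x @ W_in                            # identical to W_in[3]: the one-hot product selects a row
logits = h @ W_out
p = np.exp(logits - logits.max())       # softmax over the whole vocabulary
p /= p.sum()

y = np.zeros(V); y[7] = 1.0             # one-hot label: the word actually seen in the context
loss = -np.sum(y * np.log(p))           # cross-entropy; SGD follows its gradient
print(loss)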

The steps above cover the case of one word as input and one context word as output, but reality is messier. What exactly counts as the context? Do we use one word to predict its neighbours, or several neighbouring words to predict one word? This is where the two training schemes, skip-gram and CBOW, come in.

Skip-gram and CBOW

  • Skip-gram: the core idea is to predict the surrounding words from the center word. Suppose the center word is cat and the window size is 2; then cat is used to predict the two words to its left and the two words to its right. Here cat is the network input and each predicted word is a label. The figure below gives an example (see also the code sketch after this list):


    [Figure: skip-gram training samples]

    With a window size of 2, the center word slides over the whole text one position at a time, and each position produces at most 4 training pairs (input, label).

  • CBOW (continuous bag of words): once you understand skip-gram, CBOW is simply the reverse: all of the surrounding words are used to predict the center word. Now each position of the center word yields only one training sample. Using the same example as above, CBOW produces the following 4 training samples:

    1. ([quick, brown], the)
    2. ([the, brown, fox], quick)
    3. ([the, quick, fox, jumps], brown)
    4. ([quick, brown, jumps, over], fox)

    Here the input may be 4 words while the label is a single word. What do we do? The fix is simple: average. After the hidden layer, the 4 input words have been mapped to four 300-dimensional vectors; averaging them gives a single vector that serves as the input to the next layer.
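
A small sketch of how the two schemes slice the same sentence into training pairs (pure Python, window size 2 as in the example above):

sentence = "the quick brown fox jumps over the lazy dog".split()
window = 2

skip_gram_pairs = []   # (input = center word, label = one context word)
cbow_pairs = []        # (input = all context words, label = center word)
for i, center in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    skip_gram_pairs.extend((center, c) for c in context)
    cbow_pairs.append((context, center))

print(skip_gram_pairs[:4])  # [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown')]
print(cbow_pairs[1])        # (['the', 'brown', 'fox'], 'quick')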

Comparing the two, skip-gram generates more training samples and captures finer-grained semantic detail between words; with a large enough, good enough corpus it tends to beat CBOW. With a smaller corpus there is not enough data to capture that detail, and CBOW's averaging can actually work better.

Negative Sampling

For the actual training, stick with a vocabulary of 10,000 words and 300-dimensional vectors: each layer then has 3,000,000 parameters, and the output layer amounts to a multi-class problem with 10,000 possible classes. The computational cost, as you can imagine, is enormous.
Mikolov and colleagues proposed several optimizations; here we focus on negative sampling.
The idea of negative sampling is almost embarrassingly simple. The network's softmax outputs a vector in which only one entry, the one with the largest probability, corresponds to the correct word; the rest are negative samples. If we keep only 5 negative samples, the output effectively becomes a 6-dimensional vector, and the number of parameters to consider drops from 3,000,000 to 1,800. It looks like a lazy shortcut, but it works remarkably well and speeds training up enormously.
Recall that each training step normally nudges all of a network's parameters. In word2vec a single training sample does not touch every parameter: if the input word is cat, the hidden layer has 3,000,000 parameters, but this step only updates the 300 that belong to cat, because the hidden-layer output depends on those 300 alone.
Negative sampling is effective, and not many negative samples are needed. Mikolov et al. report that 5-20 negative samples work well for small datasets and 2-5 for large ones.
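
As a rough NumPy sketch of the idea (a simplified per-sample negative-sampling objective: one true context word against k sampled wrong words, drawn uniformly here for brevity; the TensorFlow code below uses the closely related sampled-softmax loss instead):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

V, D, k = 10000, 300, 5
W_in = np.random.randn(V, D) * 0.01          # input-side vectors (the embeddings)
W_out = np.random.randn(V, D) * 0.01         # output-side vectors

center, positive = 42, 1337                  # IDs of the input word and the true context word
negatives = np.random.randint(0, V, size=k)  # k sampled "wrong" words

h = W_in[center]                             # only these 300 input parameters are touched
loss = -np.log(sigmoid(W_out[positive] @ h)) \
       - np.sum(np.log(sigmoid(-W_out[negatives] @ h)))  # 1 + 5 output rows instead of all 10000
print(loss)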

So how exactly are the negative samples chosen? The paper gives the following formula:

    P(w_i) = f(w_i)^(3/4) / Σ_j f(w_j)^(3/4)

where f(w) is the word's frequency. The choice of negative samples therefore depends only on word frequency: the more frequent a word, the more likely it is to be drawn, with the 3/4 exponent flattening the distribution a little so that rare words are not ignored entirely.
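
As a small sketch, building this distribution from toy frequency counts and drawing negatives from it:

import numpy as np

freqs = np.array([950.0, 300.0, 120.0, 40.0, 10.0])   # toy word frequencies f(w)
probs = freqs ** 0.75
probs /= probs.sum()              # P(w_i) = f(w_i)^(3/4) / sum_j f(w_j)^(3/4)
print(probs)                      # frequent words are still favoured, just less extremely

negatives = np.random.choice(len(freqs), size=3, replace=False, p=probs)
print(negatives)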

TensorFlow Implementation

最后用tensorflow動手實(shí)踐一下。參考Udacity Deep Learning的一次作業(yè)

Here we only train 128-dimensional word vectors and visualize them with t-SNE. It is good practice for understanding word2vec in depth; for real work, gensim is still the recommendation.

# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
%matplotlib inline
from __future__ import print_function
import collections
import math
import numpy as np
import os
import random
import tensorflow as tf
import zipfile
from matplotlib import pylab
from six.moves import range
from six.moves.urllib.request import urlretrieve
from sklearn.manifold import TSNE

Download the data from the source website if necessary.

url = 'http://mattmahoney.net/dc/'

def maybe_download(filename, expected_bytes):
  """Download a file if not present, and make sure it's the right size."""
  if not os.path.exists(filename):
    filename, _ = urlretrieve(url + filename, filename)
  statinfo = os.stat(filename)
  if statinfo.st_size == expected_bytes:
    print('Found and verified %s' % filename)
  else:
    print(statinfo.st_size)
    raise Exception(
      'Failed to verify ' + filename + '. Can you get to it with a browser?')
  return filename

filename = maybe_download('text8.zip', 31344016)
Found and verified text8.zip

Read the data into a string.

def read_data(filename):
  """Extract the first file enclosed in a zip file as a list of words"""
  with zipfile.ZipFile(filename) as f:
    data = tf.compat.as_str(f.read(f.namelist()[0])).split()
  return data
  
words = read_data(filename)
print('Data size %d' % len(words))
Data size 17005207

Build the dictionary and replace rare words with UNK token.

vocabulary_size = 50000

def build_dataset(words):
  count = [['UNK', -1]]
  count.extend(collections.Counter(words).most_common(vocabulary_size - 1))
  dictionary = dict()
  for word, _ in count:
    dictionary[word] = len(dictionary)
  data = list()
  unk_count = 0
  for word in words:
    if word in dictionary:
      index = dictionary[word]
    else:
      index = 0  # dictionary['UNK']
      unk_count = unk_count + 1
    data.append(index)
  count[0][1] = unk_count
  reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys())) 
  return data, count, dictionary, reverse_dictionary

data, count, dictionary, reverse_dictionary = build_dataset(words)
print('Most common words (+UNK)', count[:5])
print('Sample data', data[:10])
del words  # Hint to reduce memory.
Most common words (+UNK) [['UNK', 418391], ('the', 1061396), ('of', 593677), ('and', 416629), ('one', 411764)]
Sample data [5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156]

Function to generate a training batch for the skip-gram model.

data_index = 0

def generate_batch(batch_size, num_skips, skip_window):
    global data_index
    assert batch_size % num_skips == 0
    assert num_skips <= 2 * skip_window
    batch = np.ndarray(shape=(batch_size), dtype=np.int32)
    labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
    span = 2 * skip_window + 1 # [ skip_window target skip_window ]
    buffer = collections.deque(maxlen=span)
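    # buffer holds the current window of span word IDs; since the deque has
    # maxlen=span, appending a new ID drops the oldest one and slides the window right.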
    for _ in range(span):
        buffer.append(data[data_index])
        data_index = (data_index + 1) % len(data)
    for i in range(batch_size // num_skips):
        target = skip_window  # target label at the center of the buffer
        targets_to_avoid = [ skip_window ]
        for j in range(num_skips):
            while target in targets_to_avoid:
                target = random.randint(0, span - 1)
            targets_to_avoid.append(target)
            batch[i * num_skips + j] = buffer[skip_window]
            labels[i * num_skips + j, 0] = buffer[target]
        buffer.append(data[data_index])
        data_index = (data_index + 1) % len(data)
    return batch, labels

print('data:', [reverse_dictionary[di] for di in data[:8]])

for num_skips, skip_window in [(2, 1), (4, 2)]:
    data_index = 0
    batch, labels = generate_batch(batch_size=8, num_skips=num_skips, skip_window=skip_window)
    print('\nwith num_skips = %d and skip_window = %d:' % (num_skips, skip_window))
    print('    batch:', [reverse_dictionary[bi] for bi in batch])
    print('    labels:', [reverse_dictionary[li] for li in labels.reshape(8)])
data: ['anarchism', 'originated', 'as', 'a', 'term', 'of', 'abuse', 'first']

with num_skips = 2 and skip_window = 1:
    batch: ['originated', 'originated', 'as', 'as', 'a', 'a', 'term', 'term']
    labels: ['anarchism', 'as', 'originated', 'a', 'as', 'term', 'a', 'of']

with num_skips = 4 and skip_window = 2:
    batch: ['as', 'as', 'as', 'as', 'a', 'a', 'a', 'a']
    labels: ['originated', 'term', 'anarchism', 'a', 'of', 'as', 'originated', 'term']

Skip-Gram

Train a skip-gram model.

batch_size = 128
embedding_size = 128 # Dimension of the embedding vector.
skip_window = 1 # How many words to consider left and right.
num_skips = 2 # How many times to reuse an input to generate a label.
# We pick a random validation set to sample nearest neighbors. here we limit the
# validation samples to the words that have a low numeric ID, which by
# construction are also the most frequent. 
valid_size = 16 # Random set of words to evaluate similarity on.
valid_window = 100 # Only pick dev samples in the head of the distribution.
valid_examples = np.array(random.sample(range(valid_window), valid_size))

#######important#########
num_sampled = 64 # Number of negative examples to sample.

graph = tf.Graph()

with graph.as_default(), tf.device('/cpu:0'):

  # Input data.
  train_dataset = tf.placeholder(tf.int32, shape=[batch_size])
  train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
  valid_dataset = tf.constant(valid_examples, dtype=tf.int32)
  
  # Variables.
  embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
  softmax_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                         stddev=1.0 / math.sqrt(embedding_size)))
  softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))
  
  # Model.
  # Look up embeddings for inputs.
  embed = tf.nn.embedding_lookup(embeddings, train_dataset)
  # Compute the softmax loss, using a sample of the negative labels each time.
  loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(weights=softmax_weights, biases=softmax_biases, inputs=embed,
                               labels=train_labels, num_sampled=num_sampled, num_classes=vocabulary_size))

  # Optimizer.
  # Note: The optimizer will optimize the softmax_weights AND the embeddings.
  # This is because the embeddings are defined as a variable quantity and the
  # optimizer's `minimize` method will by default modify all variable quantities 
  # that contribute to the tensor it is passed.
  # See docs on `tf.train.Optimizer.minimize()` for more details.
  optimizer = tf.train.AdagradOptimizer(1.0).minimize(loss)
  
  # Compute the similarity between minibatch examples and all embeddings.
  # We use the cosine distance:
  norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
  normalized_embeddings = embeddings / norm
  valid_embeddings = tf.nn.embedding_lookup(
    normalized_embeddings, valid_dataset)
  similarity = tf.matmul(valid_embeddings, tf.transpose(normalized_embeddings))
num_steps = 100001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  average_loss = 0
  for step in range(num_steps):
    batch_data, batch_labels = generate_batch(
      batch_size, num_skips, skip_window)
    feed_dict = {train_dataset : batch_data, train_labels : batch_labels}
    _, l = session.run([optimizer, loss], feed_dict=feed_dict)
    average_loss += l
    if step % 2000 == 0:
      if step > 0:
        average_loss = average_loss / 2000
      # The average loss is an estimate of the loss over the last 2000 batches.
      print('Average loss at step %d: %f' % (step, average_loss))
      average_loss = 0
    # note that this is expensive (~20% slowdown if computed every 500 steps)
    if step % 10000 == 0:
      sim = similarity.eval()
      for i in range(valid_size):
        valid_word = reverse_dictionary[valid_examples[i]]
        top_k = 8 # number of nearest neighbors
        nearest = (-sim[i, :]).argsort()[1:top_k+1]
        log = 'Nearest to %s:' % valid_word
        for k in range(top_k):
          close_word = reverse_dictionary[nearest[k]]
          log = '%s %s,' % (log, close_word)
        print(log)
  final_embeddings = normalized_embeddings.eval()
num_points = 400

tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
two_d_embeddings = tsne.fit_transform(final_embeddings[1:num_points+1, :])
def plot(embeddings, labels):
  assert embeddings.shape[0] >= len(labels), 'More labels than embeddings'
  pylab.figure(figsize=(15,15))  # in inches
  for i, label in enumerate(labels):
    x, y = embeddings[i,:]
    pylab.scatter(x, y)
    pylab.annotate(label, xy=(x, y), xytext=(5, 2), textcoords='offset points',
                   ha='right', va='bottom')
  pylab.show()

words = [reverse_dictionary[i] for i in range(1, num_points+1)]
plot(two_d_embeddings, words)
[Figure: t-SNE visualization of the skip-gram embeddings]

CBOW


data_index_cbow = 0

def get_cbow_batch(batch_size, num_skips, skip_window):
    global data_index_cbow
    assert batch_size % num_skips == 0
    assert num_skips <= 2 * skip_window
    batch = np.ndarray(shape=(batch_size), dtype=np.int32)
    labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
    span = 2 * skip_window + 1 # [ skip_window target skip_window ]
    buffer = collections.deque(maxlen=span)
    for _ in range(span):
        buffer.append(data[data_index_cbow])
        data_index_cbow = (data_index_cbow + 1) % len(data)
    for i in range(batch_size // num_skips):
        target = skip_window  # target label at the center of the buffer
        targets_to_avoid = [ skip_window ]
        for j in range(num_skips):
            while target in targets_to_avoid:
                target = random.randint(0, span - 1)
            targets_to_avoid.append(target)
            batch[i * num_skips + j] = buffer[skip_window]
            labels[i * num_skips + j, 0] = buffer[target]
        buffer.append(data[data_index_cbow])
        data_index_cbow = (data_index_cbow + 1) % len(data)
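    # Re-pack the skip-gram style (input, label) pairs generated above into CBOW form:
    # the 2*skip_window context words sharing a center word become one input row,
    # and the center word itself becomes the label.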
    cbow_batch = np.ndarray(shape=(batch_size), dtype=np.int32)
    cbow_labels = np.ndarray(shape=(batch_size // (skip_window * 2), 1), dtype=np.int32)
    for i in range(batch_size):
        cbow_batch[i] = labels[i]
    cbow_batch = np.reshape(cbow_batch, [batch_size // (skip_window * 2), skip_window * 2])
    for i in range(batch_size // (skip_window * 2)):
        # center word
        cbow_labels[i] = batch[2 * skip_window * i]
    return cbow_batch, cbow_labels
# actual batch_size = batch_size // (2 * skip_window)
batch_size = 128
embedding_size = 128 # Dimension of the embedding vector.
skip_window = 1 # How many words to consider left and right.
num_skips = 2 # How many times to reuse an input to generate a label.
# We pick a random validation set to sample nearest neighbors. here we limit the
# validation samples to the words that have a low numeric ID, which by
# construction are also the most frequent. 
valid_size = 16 # Random set of words to evaluate similarity on.
valid_window = 100 # Only pick dev samples in the head of the distribution.
valid_examples = np.array(random.sample(range(valid_window), valid_size))

#######important#########
num_sampled = 64 # Number of negative examples to sample.

graph = tf.Graph()

with graph.as_default(), tf.device('/cpu:0'):

  # Input data.
    train_dataset = tf.placeholder(tf.int32, shape=[batch_size // (skip_window * 2), skip_window * 2])
    train_labels = tf.placeholder(tf.int32, shape=[batch_size // (skip_window * 2), 1])
    valid_dataset = tf.constant(valid_examples, dtype=tf.int32)
  
  # Variables.
    embeddings = tf.Variable(
      tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    softmax_weights = tf.Variable(
      tf.truncated_normal([vocabulary_size, embedding_size],
                         stddev=1.0 / math.sqrt(embedding_size)))
    softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))
  
  # Model.
  # Look up embeddings for inputs.
    embed = tf.nn.embedding_lookup(embeddings, train_dataset)
    
    # average the 2*skip_window context-word embeddings of each training sample:
    # [batch, skip_window*2, embedding_size] -> [batch, embedding_size]
    # (this is the CBOW averaging step described earlier)
    embed = tf.reduce_mean(embed, 1)
    
  # Compute the softmax loss, using a sample of the negative labels each time.
    loss = tf.reduce_mean(
      tf.nn.sampled_softmax_loss(weights=softmax_weights, biases=softmax_biases, inputs=embed,
                               labels=train_labels, num_sampled=num_sampled, num_classes=vocabulary_size))

  # Optimizer.
  # Note: The optimizer will optimize the softmax_weights AND the embeddings.
  # This is because the embeddings are defined as a variable quantity and the
  # optimizer's `minimize` method will by default modify all variable quantities 
  # that contribute to the tensor it is passed.
  # See docs on `tf.train.Optimizer.minimize()` for more details.
    optimizer = tf.train.AdagradOptimizer(1.0).minimize(loss)
  
  # Compute the similarity between minibatch examples and all embeddings.
  # We use the cosine distance:
    norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
    normalized_embeddings = embeddings / norm
    valid_embeddings = tf.nn.embedding_lookup(
      normalized_embeddings, valid_dataset)
    similarity = tf.matmul(valid_embeddings, tf.transpose(normalized_embeddings))
num_steps = 100001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  average_loss = 0
  for step in range(num_steps):
    batch_data, batch_labels = get_cbow_batch(
      batch_size, num_skips, skip_window)
    feed_dict = {train_dataset : batch_data, train_labels : batch_labels}
    _, l = session.run([optimizer, loss], feed_dict=feed_dict)
    average_loss += l
    if step % 2000 == 0:
      if step > 0:
        average_loss = average_loss / 2000
      # The average loss is an estimate of the loss over the last 2000 batches.
      print('Average loss at step %d: %f' % (step, average_loss))
      average_loss = 0
    # note that this is expensive (~20% slowdown if computed every 500 steps)
    if step % 10000 == 0:
      sim = similarity.eval()
      for i in range(valid_size):
        valid_word = reverse_dictionary[valid_examples[i]]
        top_k = 8 # number of nearest neighbors
        nearest = (-sim[i, :]).argsort()[1:top_k+1]
        log = 'Nearest to %s:' % valid_word
        for k in range(top_k):
          close_word = reverse_dictionary[nearest[k]]
          log = '%s %s,' % (log, close_word)
        print(log)
  final_embeddings = normalized_embeddings.eval()
num_points = 400

tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
two_d_embeddings = tsne.fit_transform(final_embeddings[1:num_points+1, :])
words = [reverse_dictionary[i] for i in range(1, num_points+1)]  # labels for rows of two_d_embeddings
plot(two_d_embeddings, words)
[Figure: t-SNE visualization of the CBOW embeddings]

References

  1. Le, Q. V., & Mikolov, T. Distributed Representations of Sentences and Documents. ICML, 2014.
  2. Mikolov, T., Sutskever, I., Chen, K., et al. Distributed Representations of Words and Phrases and their Compositionality. Advances in Neural Information Processing Systems 26, 2013: 3111-3119.
  3. Word2Vec Tutorial - The Skip-Gram Model
  4. Udacity Deep Learning
  5. Stanford CS224d, Lectures 2 and 3
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末巨税,一起剝皮案震驚了整個濱河市,隨后出現(xiàn)的幾起案子途凫,更是在濱河造成了極大的恐慌垢夹,老刑警劉巖,帶你破解...
    沈念sama閱讀 219,539評論 6 508
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件维费,死亡現(xiàn)場離奇詭異果元,居然都是意外死亡促王,警方通過查閱死者的電腦和手機(jī),發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 93,594評論 3 396
  • 文/潘曉璐 我一進(jìn)店門而晒,熙熙樓的掌柜王于貴愁眉苦臉地迎上來蝇狼,“玉大人,你說我怎么就攤上這事倡怎⊙冈牛” “怎么了?”我有些...
    開封第一講書人閱讀 165,871評論 0 356
  • 文/不壞的土叔 我叫張陵监署,是天一觀的道長颤专。 經(jīng)常有香客問我,道長钠乏,這世上最難降的妖魔是什么栖秕? 我笑而不...
    開封第一講書人閱讀 58,963評論 1 295
  • 正文 為了忘掉前任,我火速辦了婚禮晓避,結(jié)果婚禮上簇捍,老公的妹妹穿的比我還像新娘。我一直安慰自己俏拱,他們只是感情好暑塑,可當(dāng)我...
    茶點(diǎn)故事閱讀 67,984評論 6 393
  • 文/花漫 我一把揭開白布。 她就那樣靜靜地躺著锅必,像睡著了一般事格。 火紅的嫁衣襯著肌膚如雪。 梳的紋絲不亂的頭發(fā)上况毅,一...
    開封第一講書人閱讀 51,763評論 1 307
  • 那天分蓖,我揣著相機(jī)與錄音,去河邊找鬼尔许。 笑死,一個胖子當(dāng)著我的面吹牛终娃,可吹牛的內(nèi)容都是我干的味廊。 我是一名探鬼主播,決...
    沈念sama閱讀 40,468評論 3 420
  • 文/蒼蘭香墨 我猛地睜開眼棠耕,長吁一口氣:“原來是場噩夢啊……” “哼余佛!你這毒婦竟也來了?” 一聲冷哼從身側(cè)響起窍荧,我...
    開封第一講書人閱讀 39,357評論 0 276
  • 序言:老撾萬榮一對情侶失蹤辉巡,失蹤者是張志新(化名)和其女友劉穎,沒想到半個月后蕊退,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體郊楣,經(jīng)...
    沈念sama閱讀 45,850評論 1 317
  • 正文 獨(dú)居荒郊野嶺守林人離奇死亡憔恳,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點(diǎn)故事閱讀 38,002評論 3 338
  • 正文 我和宋清朗相戀三年,在試婚紗的時候發(fā)現(xiàn)自己被綠了净蚤。 大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片钥组。...
    茶點(diǎn)故事閱讀 40,144評論 1 351
  • 序言:一個原本活蹦亂跳的男人離奇死亡,死狀恐怖今瀑,靈堂內(nèi)的尸體忽然破棺而出程梦,到底是詐尸還是另有隱情,我是刑警寧澤橘荠,帶...
    沈念sama閱讀 35,823評論 5 346
  • 正文 年R本政府宣布屿附,位于F島的核電站,受9級特大地震影響哥童,放射性物質(zhì)發(fā)生泄漏挺份。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點(diǎn)故事閱讀 41,483評論 3 331
  • 文/蒙蒙 一如蚜、第九天 我趴在偏房一處隱蔽的房頂上張望压恒。 院中可真熱鬧,春花似錦错邦、人聲如沸探赫。這莊子的主人今日做“春日...
    開封第一講書人閱讀 32,026評論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽伦吠。三九已至,卻和暖如春魂拦,著一層夾襖步出監(jiān)牢的瞬間毛仪,已是汗流浹背。 一陣腳步聲響...
    開封第一講書人閱讀 33,150評論 1 272
  • 我被黑心中介騙來泰國打工芯勘, 沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留箱靴,地道東北人。 一個月前我還...
    沈念sama閱讀 48,415評論 3 373
  • 正文 我出身青樓荷愕,卻偏偏與公主長得像衡怀,于是被迫代替她去往敵國和親。 傳聞我的和親對象是個殘疾皇子安疗,可洞房花燭夜當(dāng)晚...
    茶點(diǎn)故事閱讀 45,092評論 2 355

推薦閱讀更多精彩內(nèi)容