The content introduced in this article is mainly used for NLP (Natural Language Processing). Deep Learning algorithms have achieved astonishing results in image and audio, but results that exciting have not yet been seen in NLP. Even so, Deep Learning research in NLP has already begun to lift the veil on the mysteries of human language. The layers covered here are mainly used for mapping and training word vectors.
1. Embedding
keras.layers.embeddings.Embedding(input_dim, output_dim, init='uniform', input_length=None, weights=None, W_regularizer=None, W_constraint=None, mask_zero=False)
Turns positive integers (indices) into dense vectors of a fixed size. For example: [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]
**input shape**: 2D tensor, shape (nb_samples, sequence_length)
**output shape**: 3D tensor, shape (nb_samples, sequence_length, output_dim)
**Parameters**:
input_dim: int >= 0. Size of the vocabulary, i.e. 1 + maximum integer index occurring in the input data.
output_dim: int >= 0. Dimension of the dense embedding.
init: name of an initialization function for the weights, or a Theano function. You can use one of Keras's built-in initializers (see the built-in initialization functions here), or pass your own Theano function. This parameter must be specified if you do not pass a weights argument.
weights: list of numpy arrays used to initialize the weights. The list should have at least 1 element, of shape (input_dim, output_dim).
W_regularizer: regularizer for the weights; an instance of WeightRegularizer must be passed (e.g. an L1 or L2 regularizer; see the built-in regularizers here).
mask_zero: Whether or not the input value 0 is a special "padding" value that should be masked out. This is useful for recurrent layers which may take variable length input. If this is True, then all subsequent layers in the model need to support masking or an exception will be raised.
input_length: Length of input sequences, when it is constant. This argument is required if you are going to connect Flatten then Dense layers upstream (without it, the shape of the dense outputs cannot be computed).
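To make the shapes concrete, here is a minimal usage sketch. It assumes a Keras version where the layer lives at the import path shown above; vocab_size, embed_dim and seq_len are made-up example values, and compile/predict details may differ slightly across Keras releases:

```python
import numpy as np
from keras.models import Sequential
from keras.layers.embeddings import Embedding

vocab_size = 1000   # assumed example vocabulary size (input_dim)
embed_dim = 64      # assumed embedding dimension (output_dim)
seq_len = 10        # assumed fixed sequence length (input_length)

model = Sequential()
# input:  2D integer tensor, shape (nb_samples, seq_len)
# output: 3D float tensor,   shape (nb_samples, seq_len, embed_dim)
model.add(Embedding(vocab_size, embed_dim, input_length=seq_len))
model.compile(optimizer='rmsprop', loss='mse')

# A batch of 32 integer-encoded sequences.
x = np.random.randint(0, vocab_size, size=(32, seq_len))
print(model.predict(x).shape)  # expected: (32, 10, 64)
```

Setting mask_zero=True would additionally treat index 0 as padding, so that downstream masking-aware layers (e.g. recurrent layers) can ignore those timesteps.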
2. WordContextProduct
keras.layers.embeddings.WordContextProduct(input_dim, proj_dim=128, init='uniform', activation='sigmoid', weights=None)
This layer mainly turns a pair of words into two vectors. Specifically, it turns a pair of words (a pivot word + a context word, i.e. a word from the same context as the pivot, or a random, out-of-context word), identified by their indices in a vocabulary, into two dense representations (word representation and context representation).
Then it returns activation(dot(pivot_embedding, context_embedding)), which can be trained to encode the probability of finding the context word in the context of the pivot word (or reciprocally, depending on your training procedure); a conceptual usage sketch follows the parameter list below.
For more information, see: Efficient Estimation of Word Representations in Vector Space
**input shape**: 2D tensor, shape (nb_samples, 2)
**output shape**: 2D tensor, shape (nb_samples, 1)
**Parameters**:
input_dim: int >= 0. Size of the vocabulary, i.e. 1 + maximum integer index occurring in the input data.
proj_dim: int >= 0. Dimension of the dense embedding used internally.
init: name of an initialization function for the weights, or a Theano function. You can use one of Keras's built-in initializers (see the built-in initialization functions here), or pass your own Theano function. This parameter must be specified if you do not pass a weights argument.
activation: name of an activation function, or a Theano function. You can use one of Keras's built-in activations (see the built-in activation functions here), or pass your own Theano function. If nothing is specified, no activation is applied.
weights: list of numpy arrays used to initialize the weights. The list should have 2 elements, both of shape (input_dim, proj_dim). The first element is the word embedding weights, the second one is the context embedding weights.
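For illustration, here is a conceptual training sketch built from the signature documented above. It assumes an old Keras release in which WordContextProduct is still available and in which keras.preprocessing.sequence.skipgrams can generate (pivot, context) pairs with 0/1 labels; the sentence and sizes are made-up example values:

```python
import numpy as np
from keras.models import Sequential
from keras.layers.embeddings import WordContextProduct
from keras.preprocessing.sequence import skipgrams

vocab_size = 1000   # assumed example vocabulary size (input_dim)
proj_dim = 128      # internal embedding dimension

model = Sequential()
# input:  (nb_samples, 2) pairs of word indices (pivot, context)
# output: (nb_samples, 1) sigmoid score for "context word appears near pivot"
model.add(WordContextProduct(vocab_size, proj_dim=proj_dim,
                             init='uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop')

# Toy integer-encoded sentence; skipgrams yields positive (in-window) and
# negative (randomly sampled) pairs together with their 1/0 labels.
sentence = [1, 5, 42, 7, 3, 9]
couples, labels = skipgrams(sentence, vocab_size, window_size=2, negative_samples=1.0)
x = np.array(couples, dtype='int32')
y = np.array(labels, dtype='int32')
model.fit(x, y, nb_epoch=1)  # keyword is nb_epoch in old Keras, epochs in newer ones
```

After training, the learned word vectors are the first array returned by model.layers[0].get_weights(), matching the weights layout described above (word embeddings first, context embeddings second).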