Introduction to BERT
BERT is a pre-trained language model released by Google in 2018. It broke the records on many NLP benchmarks, and its release was a milestone for the field. A pre-trained language model picks up a great deal of syntactic and semantic knowledge through unsupervised learning, which makes downstream NLP tasks much easier afterwards. When BERT is applied to a supervised downstream task, it is like a student who walks into class having done thorough preparation: the same effort goes much further. Earlier word-embedding techniques such as word2vec and GloVe also used unsupervised training to give a model some basic knowledge of language, but neither the capacity of their pre-trained models (roughly, how much they can learn) nor the difficulty of their unsupervised training tasks comes anywhere close to BERT's.
The model
BERT uses a stack of 12 or 24 bidirectional Transformer encoder layers as its feature extractor, as shown in the figure below. In NLP, feature-extraction power roughly ranks Transformer > RNN > CNN; readers unfamiliar with the Transformer can refer to my earlier article on it. Using 12 of these layers at once pushed NLP a real step in the direction of deep models.
Pre-training tasks
To give the model a solid grasp of natural language, BERT is pre-trained on the following two tasks:
1. Masked word prediction, as shown in the figure below:
15% of the tokens in the input text are masked at random, the text is fed to the model, and the model is asked to predict what the masked tokens were. This is a cloze test for the model, except that the set of possible answers is the entire vocabulary. Remember the cloze questions on exams: even with only four options we did not always get them right. This task forces the model to learn a lot of syntax and even some semantics. A minimal sketch of the masking recipe follows.
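The sketch below is only an illustration, not Google's or kashgari's actual code; the mask_tokens helper and the tiny vocabulary are made up for the example. It follows the recipe described in the BERT paper: about 15% of positions are selected, and of those, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged; the original tokens at the selected positions become the prediction targets.

import random

def mask_tokens(tokens, vocab, mask_rate=0.15):
    """Return (masked_tokens, targets); targets[i] is None where no prediction is required."""
    masked, targets = list(tokens), [None] * len(tokens)
    for i in range(len(tokens)):
        if random.random() < mask_rate:            # pick roughly 15% of the positions
            targets[i] = tokens[i]                 # the model must recover this token
            r = random.random()
            if r < 0.8:
                masked[i] = "[MASK]"               # 80% of the time: replace with [MASK]
            elif r < 0.9:
                masked[i] = random.choice(vocab)   # 10%: replace with a random token
            # remaining 10%: keep the original token unchanged
    return masked, targets

print(mask_tokens(list("我愛荊州"), vocab=["我", "愛", "荊", "州", "你"]))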
2. Next sentence prediction
The next sentence prediction task is illustrated in the figure below: the model is given two sentences A and B and must judge whether B is the sentence that follows A. The goal is for the model to learn relationships between sentences and thereby deepen its understanding of natural language. A sketch of how such training pairs are typically built is shown below.
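Here is a minimal, hypothetical sketch of the usual pair-construction recipe (make_nsp_pairs is an invented helper, not part of any library): half of the (A, B) pairs use the true next sentence with label 1, the other half pair A with a random sentence and label 0.

import random

def make_nsp_pairs(sentences):
    """sentences: one document as an ordered list of sentences."""
    pairs = []
    for i in range(len(sentences) - 1):
        if random.random() < 0.5:
            pairs.append((sentences[i], sentences[i + 1], 1))          # true next sentence
        else:
            # In the real setup the negative example is drawn from a different document,
            # so it cannot accidentally be the true next sentence.
            pairs.append((sentences[i], random.choice(sentences), 0))  # random sentence
    return pairs

doc = ["I went to the store.", "I bought some milk.", "Then I walked home."]
print(make_nsp_pairs(doc))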
These two demanding pre-training tasks give the model a fairly deep understanding of natural language before it ever sees a downstream task, and that knowledge is enormously helpful downstream. Of course, getting a model to learn this way takes a huge corpus, a lot of compute, and a lot of time. But the training only has to be done once and can be reused forever, so this once-and-for-all effort is well worth it.
NER with BERT in practice
First, a quick introduction to the kashgari framework (the GitHub link is here). Its author wants to make the fancier NLP techniques easy to call so that experiments can be run quickly. kashgari wraps a BERT embedding module, an LSTM-CRF named-entity-recognition model, and several classic text-classification networks. Using this framework, I finished BERT-based NER on my own dataset in about five minutes.
Reading the data
# The training file is CoNLL-style: one "character label" pair per line,
# with sentences separated by blank lines.
with open("train_data", "rb") as f:
    data = f.read().decode("utf-8")
train_data = data.split("\n\n")                                       # split into sentences
train_data = [sen.split("\n") for sen in train_data]                  # split each sentence into lines
train_data = [[line.split() for line in sen] for sen in train_data]   # split "character label"
train_data.pop()                                                      # drop the empty trailing entry
Preprocessing
train_x = [[token[0] for token in sen] for sen in train_data]  # character sequences
train_y = [[token[1] for token in sen] for sen in train_data]  # label sequences
Here train_x and train_y are both lists:
train_x: [char_seq1, char_seq2, char_seq3, ...]
train_y: [label_seq1, label_seq2, label_seq3, ...]
where char_seq1 is ["我", "愛", "荊", "州"]
and the corresponding label_seq1 is ["O", "O", "B_LOC", "I_LOC"].
That is all the preprocessing you need: one label per character. Quite convenient, isn't it? kashgari already wraps the numericalization and vectorization of both text and labels, so you do not have to turn the characters and labels into ids yourself. One thing to stress: the Chinese BERT model released by Google uses character-level input, so the preprocessing must split the text into individual characters; a minimal example follows.
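For instance, plain Python is enough to split a raw sentence into its characters (the variable names here are just for illustration):

sentence = "我愛荊州"
chars = list(sentence)   # ['我', '愛', '荊', '州'], the character-level input Chinese BERT expects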
Loading BERT
The BERT model can be loaded with just the three lines below.
from kashgari.embeddings import BERTEmbedding
from kashgari.tasks.seq_labeling import BLSTMCRFModel
embedding = BERTEmbedding("bert-base-chinese", 200)  # 200 is the sequence length
When this runs, the code automatically downloads the pre-trained weights from where the BERT model is hosted. Google has already pre-trained BERT for us, so all we have to do is use it for the downstream task.
Building the model and training
Feed the data to the LSTM+CRF model wrapped by kashgari, set the batch size, and training starts. The whole thing really does take less than five minutes (unless your connection is slow, in which case downloading the pre-trained BERT weights alone can take longer than that).
model = BLSTMCRFModel(embedding)
model.fit(train_x, train_y, epochs=1, batch_size=100)
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
Input-Token (InputLayer) (None, 200) 0
__________________________________________________________________________________________________
Input-Segment (InputLayer) (None, 200) 0
__________________________________________________________________________________________________
Embedding-Token (TokenEmbedding [(None, 200, 768), ( 16226304 Input-Token[0][0]
__________________________________________________________________________________________________
Embedding-Segment (Embedding) (None, 200, 768) 1536 Input-Segment[0][0]
__________________________________________________________________________________________________
Embedding-Token-Segment (Add) (None, 200, 768) 0 Embedding-Token[0][0]
Embedding-Segment[0][0]
__________________________________________________________________________________________________
Embedding-Position (PositionEmb (None, 200, 768) 153600 Embedding-Token-Segment[0][0]
__________________________________________________________________________________________________
Embedding-Dropout (Dropout) (None, 200, 768) 0 Embedding-Position[0][0]
__________________________________________________________________________________________________
Embedding-Norm (LayerNormalizat (None, 200, 768) 1536 Embedding-Dropout[0][0]
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 200, 768) 2362368 Embedding-Norm[0][0]
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 200, 768) 0 Encoder-1-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 200, 768) 0 Embedding-Norm[0][0]
Encoder-1-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-1-MultiHeadSelfAttentio (None, 200, 768) 1536 Encoder-1-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-1-FeedForward (FeedForw (None, 200, 768) 4722432 Encoder-1-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-1-FeedForward-Dropout ( (None, 200, 768) 0 Encoder-1-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-1-FeedForward-Add (Add) (None, 200, 768) 0 Encoder-1-MultiHeadSelfAttention-
Encoder-1-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-1-FeedForward-Norm (Lay (None, 200, 768) 1536 Encoder-1-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, 200, 768) 2362368 Encoder-1-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, 200, 768) 0 Encoder-2-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, 200, 768) 0 Encoder-1-FeedForward-Norm[0][0]
Encoder-2-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-2-MultiHeadSelfAttentio (None, 200, 768) 1536 Encoder-2-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-2-FeedForward (FeedForw (None, 200, 768) 4722432 Encoder-2-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-2-FeedForward-Dropout ( (None, 200, 768) 0 Encoder-2-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-2-FeedForward-Add (Add) (None, 200, 768) 0 Encoder-2-MultiHeadSelfAttention-
Encoder-2-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-2-FeedForward-Norm (Lay (None, 200, 768) 1536 Encoder-2-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, 200, 768) 2362368 Encoder-2-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, 200, 768) 0 Encoder-3-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, 200, 768) 0 Encoder-2-FeedForward-Norm[0][0]
Encoder-3-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-3-MultiHeadSelfAttentio (None, 200, 768) 1536 Encoder-3-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-3-FeedForward (FeedForw (None, 200, 768) 4722432 Encoder-3-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-3-FeedForward-Dropout ( (None, 200, 768) 0 Encoder-3-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-3-FeedForward-Add (Add) (None, 200, 768) 0 Encoder-3-MultiHeadSelfAttention-
Encoder-3-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-3-FeedForward-Norm (Lay (None, 200, 768) 1536 Encoder-3-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, 200, 768) 2362368 Encoder-3-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, 200, 768) 0 Encoder-4-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, 200, 768) 0 Encoder-3-FeedForward-Norm[0][0]
Encoder-4-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-4-MultiHeadSelfAttentio (None, 200, 768) 1536 Encoder-4-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-4-FeedForward (FeedForw (None, 200, 768) 4722432 Encoder-4-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-4-FeedForward-Dropout ( (None, 200, 768) 0 Encoder-4-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-4-FeedForward-Add (Add) (None, 200, 768) 0 Encoder-4-MultiHeadSelfAttention-
Encoder-4-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-4-FeedForward-Norm (Lay (None, 200, 768) 1536 Encoder-4-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-5-MultiHeadSelfAttentio (None, 200, 768) 2362368 Encoder-4-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-5-MultiHeadSelfAttentio (None, 200, 768) 0 Encoder-5-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-5-MultiHeadSelfAttentio (None, 200, 768) 0 Encoder-4-FeedForward-Norm[0][0]
Encoder-5-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-5-MultiHeadSelfAttentio (None, 200, 768) 1536 Encoder-5-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-5-FeedForward (FeedForw (None, 200, 768) 4722432 Encoder-5-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-5-FeedForward-Dropout ( (None, 200, 768) 0 Encoder-5-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-5-FeedForward-Add (Add) (None, 200, 768) 0 Encoder-5-MultiHeadSelfAttention-
Encoder-5-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-5-FeedForward-Norm (Lay (None, 200, 768) 1536 Encoder-5-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-6-MultiHeadSelfAttentio (None, 200, 768) 2362368 Encoder-5-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-6-MultiHeadSelfAttentio (None, 200, 768) 0 Encoder-6-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-6-MultiHeadSelfAttentio (None, 200, 768) 0 Encoder-5-FeedForward-Norm[0][0]
Encoder-6-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-6-MultiHeadSelfAttentio (None, 200, 768) 1536 Encoder-6-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-6-FeedForward (FeedForw (None, 200, 768) 4722432 Encoder-6-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-6-FeedForward-Dropout ( (None, 200, 768) 0 Encoder-6-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-6-FeedForward-Add (Add) (None, 200, 768) 0 Encoder-6-MultiHeadSelfAttention-
Encoder-6-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-6-FeedForward-Norm (Lay (None, 200, 768) 1536 Encoder-6-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-7-MultiHeadSelfAttentio (None, 200, 768) 2362368 Encoder-6-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-7-MultiHeadSelfAttentio (None, 200, 768) 0 Encoder-7-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-7-MultiHeadSelfAttentio (None, 200, 768) 0 Encoder-6-FeedForward-Norm[0][0]
Encoder-7-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-7-MultiHeadSelfAttentio (None, 200, 768) 1536 Encoder-7-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-7-FeedForward (FeedForw (None, 200, 768) 4722432 Encoder-7-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-7-FeedForward-Dropout ( (None, 200, 768) 0 Encoder-7-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-7-FeedForward-Add (Add) (None, 200, 768) 0 Encoder-7-MultiHeadSelfAttention-
Encoder-7-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-7-FeedForward-Norm (Lay (None, 200, 768) 1536 Encoder-7-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-8-MultiHeadSelfAttentio (None, 200, 768) 2362368 Encoder-7-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-8-MultiHeadSelfAttentio (None, 200, 768) 0 Encoder-8-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-8-MultiHeadSelfAttentio (None, 200, 768) 0 Encoder-7-FeedForward-Norm[0][0]
Encoder-8-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-8-MultiHeadSelfAttentio (None, 200, 768) 1536 Encoder-8-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-8-FeedForward (FeedForw (None, 200, 768) 4722432 Encoder-8-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-8-FeedForward-Dropout ( (None, 200, 768) 0 Encoder-8-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-8-FeedForward-Add (Add) (None, 200, 768) 0 Encoder-8-MultiHeadSelfAttention-
Encoder-8-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-8-FeedForward-Norm (Lay (None, 200, 768) 1536 Encoder-8-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-9-MultiHeadSelfAttentio (None, 200, 768) 2362368 Encoder-8-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-9-MultiHeadSelfAttentio (None, 200, 768) 0 Encoder-9-MultiHeadSelfAttention[
__________________________________________________________________________________________________
Encoder-9-MultiHeadSelfAttentio (None, 200, 768) 0 Encoder-8-FeedForward-Norm[0][0]
Encoder-9-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-9-MultiHeadSelfAttentio (None, 200, 768) 1536 Encoder-9-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-9-FeedForward (FeedForw (None, 200, 768) 4722432 Encoder-9-MultiHeadSelfAttention-
__________________________________________________________________________________________________
Encoder-9-FeedForward-Dropout ( (None, 200, 768) 0 Encoder-9-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-9-FeedForward-Add (Add) (None, 200, 768) 0 Encoder-9-MultiHeadSelfAttention-
Encoder-9-FeedForward-Dropout[0][
__________________________________________________________________________________________________
Encoder-9-FeedForward-Norm (Lay (None, 200, 768) 1536 Encoder-9-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-10-MultiHeadSelfAttenti (None, 200, 768) 2362368 Encoder-9-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-10-MultiHeadSelfAttenti (None, 200, 768) 0 Encoder-10-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-10-MultiHeadSelfAttenti (None, 200, 768) 0 Encoder-9-FeedForward-Norm[0][0]
Encoder-10-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-10-MultiHeadSelfAttenti (None, 200, 768) 1536 Encoder-10-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-10-FeedForward (FeedFor (None, 200, 768) 4722432 Encoder-10-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-10-FeedForward-Dropout (None, 200, 768) 0 Encoder-10-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-10-FeedForward-Add (Add (None, 200, 768) 0 Encoder-10-MultiHeadSelfAttention
Encoder-10-FeedForward-Dropout[0]
__________________________________________________________________________________________________
Encoder-10-FeedForward-Norm (La (None, 200, 768) 1536 Encoder-10-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-11-MultiHeadSelfAttenti (None, 200, 768) 2362368 Encoder-10-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-11-MultiHeadSelfAttenti (None, 200, 768) 0 Encoder-11-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-11-MultiHeadSelfAttenti (None, 200, 768) 0 Encoder-10-FeedForward-Norm[0][0]
Encoder-11-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-11-MultiHeadSelfAttenti (None, 200, 768) 1536 Encoder-11-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-11-FeedForward (FeedFor (None, 200, 768) 4722432 Encoder-11-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-11-FeedForward-Dropout (None, 200, 768) 0 Encoder-11-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-11-FeedForward-Add (Add (None, 200, 768) 0 Encoder-11-MultiHeadSelfAttention
Encoder-11-FeedForward-Dropout[0]
__________________________________________________________________________________________________
Encoder-11-FeedForward-Norm (La (None, 200, 768) 1536 Encoder-11-FeedForward-Add[0][0]
__________________________________________________________________________________________________
Encoder-12-MultiHeadSelfAttenti (None, 200, 768) 2362368 Encoder-11-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
Encoder-12-MultiHeadSelfAttenti (None, 200, 768) 0 Encoder-12-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-12-MultiHeadSelfAttenti (None, 200, 768) 0 Encoder-11-FeedForward-Norm[0][0]
Encoder-12-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-12-MultiHeadSelfAttenti (None, 200, 768) 1536 Encoder-12-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-12-FeedForward (FeedFor (None, 200, 768) 4722432 Encoder-12-MultiHeadSelfAttention
__________________________________________________________________________________________________
Encoder-12-FeedForward-Dropout (None, 200, 768) 0 Encoder-12-FeedForward[0][0]
__________________________________________________________________________________________________
Encoder-12-FeedForward-Add (Add (None, 200, 768) 0 Encoder-12-MultiHeadSelfAttention
Encoder-12-FeedForward-Dropout[0]
__________________________________________________________________________________________________
Encoder-12-FeedForward-Norm (La (None, 200, 768) 1536 Encoder-12-FeedForward-Add[0][0]
__________________________________________________________________________________________________
non_masking_layer_4 (NonMasking (None, 200, 768) 0 Encoder-12-FeedForward-Norm[0][0]
__________________________________________________________________________________________________
bidirectional_3 (Bidirectional) (None, 200, 512) 2099200 non_masking_layer_4[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) (None, 200, 128) 65664 bidirectional_3[0][0]
__________________________________________________________________________________________________
crf_3 (CRF) (None, 200, 10) 1410 dense_3[0][0]
==================================================================================================
Total params: 103,603,714
Trainable params: 2,166,274
Non-trainable params: 101,437,440
__________________________________________________________________________________________________
Epoch 1/1
506/506 [==============================] - 960s 2s/step - loss: 0.0377 - crf_accuracy: 0.9892 - acc: 0.7759
The model summary above clearly shows BERT's 12 Transformer layers and their parameter counts. The author of kashgari has compared BERT-based NER with other NER approaches and found that BERT does indeed outperform them; pre-trained language models really do show remarkable power.
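Once training finishes, the model can be used right away. The lines below are a hedged sketch based on the pre-rewrite (0.2.x) kashgari API used in this post, where sequence-labeling models expose predict() and save(); the output shown and the save path are illustrative only, so check the current documentation if you are on a newer release.

# Predict labels for new, character-level sentences.
new_x = [list("我愛荊州")]
print(model.predict(new_x))       # e.g. [['O', 'O', 'B_LOC', 'I_LOC']]

# Persist the fine-tuned model for later use.
model.save("./bert_ner_model")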
Conclusion
BERT plays the same role for NLP that ImageNet pre-training plays for computer vision: a strong network first learns domain knowledge through difficult pre-training tasks and then moves on to downstream tasks. Compared with throwing a naive model directly at a deep-learning task, BERT is like the kid in class who starts ahead of the starting line and ends up well ahead of everyone else. Now that you have seen what BERT can do, go try it out.
Note
The author of kashgari has since rewritten the whole framework on TensorFlow 2.0, so some of the interfaces used above no longer exist and the code above may fail. To run BERT-based NER end to end, please refer to https://github.com/BrikerMan/Kashgari.
References
https://jalammar.github.io/illustrated-bert/
https://eliyar.biz/nlp_chinese_bert_ner/
https://github.com/BrikerMan/Kashgari