In NLP, sequence labeling is a common kind of deep learning model, but are we really familiar with how sequence labeling models are evaluated?
In this article, I will walk through how the performance of a sequence labeling model is evaluated and how to use `seqeval`.
Evaluating the performance of sequence labeling models
In sequence labeling, the model output is typically a list of tags such as the following:
['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
Common tagging schemes for sequence labeling include `BIO`, `IOBES`, and `BMES`. In these schemes, an entity is a contiguous run of non-`O` tags of the same type (e.g. PER/LOC/ORG) that starts with a `B-` tag.
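To make this definition concrete, here is a minimal sketch (not part of seqeval; the helper name get_entities_bio is made up for illustration) that extracts (type, start, end) entity spans from a BIO-tagged sequence:
# -*- coding: utf-8 -*-
# Illustrative helper (not from seqeval): collect (type, start, end) spans
# from a BIO-tagged sequence. An entity starts at a B- tag and extends over
# the following I- tags of the same type.
def get_entities_bio(tags):
    entities, start, ent_type = [], None, None
    for i, tag in enumerate(tags + ['O']):  # the sentinel 'O' closes the last open entity
        if tag.startswith('B-') or tag == 'O' or (tag.startswith('I-') and tag[2:] != ent_type):
            if ent_type is not None:
                entities.append((ent_type, start, i - 1))  # close the open entity
                ent_type = None
            if tag.startswith('B-'):
                ent_type, start = tag[2:], i
        # an I- tag of the same type simply extends the current entity
    return entities

print(get_entities_bio(['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']))
# [('MISC', 2, 3), ('MISC', 4, 5), ('PER', 7, 8)]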
Common metrics for evaluating sequence labeling models are accuracy, precision, recall, and the F1 score, computed as follows:
- accuracy = number of correctly predicted tags / total number of tags
- precision = number of correctly predicted entities / total number of predicted entities
- recall = number of correctly predicted entities / total number of gold (annotated) entities
- F1 = 2 * precision * recall / (precision + recall)

Note that accuracy is computed at the tag level, whereas precision, recall, and F1 are computed at the entity level: a predicted entity only counts as correct when both its type and its full span match a gold entity exactly.
For example, suppose we have the following gold sequence `y_true` and predicted sequence `y_pred`:
y_true = ['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
y_pred = ['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
Each list has 9 elements in total, of which 6 tags are predicted correctly, so accuracy = 6/9 = 2/3. There are 2 gold entities and 3 predicted entities, of which 1 (the PER entity) is predicted correctly, so precision = 1/3, recall = 1/2, and F1 = 2 * (1/3) * (1/2) / (1/3 + 1/2) = 0.4.
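These numbers can be reproduced with a short, self-contained script; here the entity spans are written out by hand as (type, start, end) triples, following the definition above:
# -*- coding: utf-8 -*-
y_true = ['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
y_pred = ['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']

# Tag-level accuracy: 6 of the 9 positions carry the same tag.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Entity-level scores: entities are identified by (type, start, end),
# read off the two sequences by hand (or with a helper like the sketch above).
true_entities = {('MISC', 3, 5), ('PER', 7, 8)}
pred_entities = {('MISC', 2, 3), ('MISC', 4, 5), ('PER', 7, 8)}
correct = len(true_entities & pred_entities)        # only the PER entity matches exactly

precision = correct / len(pred_entities)            # 1/3
recall = correct / len(true_entities)               # 1/2
f1 = 2 * precision * recall / (precision + recall)  # 0.4

print(accuracy, precision, recall, f1)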
Using seqeval
Sequence labeling evaluation has traditionally been done with the conlleval.pl script, which is written in Perl. In Python there is a third-party module for the same purpose, `seqeval`, whose PyPI page is https://pypi.org/project/seqeval/0.0.3/ .
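Like any PyPI package, it can be installed with `pip install seqeval` (pin it as `pip install seqeval==0.0.3` if you want exactly the version referenced in this article).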
`seqeval` supports the `BIO` and `IOBES` tagging schemes and can be used to evaluate tasks such as named entity recognition, part-of-speech tagging, and semantic role labeling.
The official documentation gives two examples, which I have adapted slightly below.
Example 1:
# -*- coding: utf-8 -*-
from seqeval.metrics import f1_score
from seqeval.metrics import precision_score
from seqeval.metrics import accuracy_score
from seqeval.metrics import recall_score
from seqeval.metrics import classification_report
y_true = ['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
y_pred = ['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O', 'B-PER', 'I-PER']
print("accuary: ", accuracy_score(y_true, y_pred))
print("p: ", precision_score(y_true, y_pred))
print("r: ", recall_score(y_true, y_pred))
print("f1: ", f1_score(y_true, y_pred))
print("classification report: ")
print(classification_report(y_true, y_pred))
The output is as follows:
accuracy: 0.6666666666666666
p: 0.3333333333333333
r: 0.5
f1: 0.4
classification report:
           precision    recall  f1-score   support

     MISC       0.00      0.00      0.00         1
      PER       1.00      1.00      1.00         1

micro avg       0.33      0.50      0.40         2
macro avg       0.50      0.50      0.50         2
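Note that the MISC row is 0.00 even though several MISC tags are predicted correctly: neither predicted MISC span matches the gold MISC span exactly, so at the entity level the MISC entity is missed, while the PER entity is matched. This is consistent with the hand calculation above.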
Example 2:
# -*- coding: utf-8 -*-
from seqeval.metrics import f1_score
from seqeval.metrics import precision_score
from seqeval.metrics import accuracy_score
from seqeval.metrics import recall_score
from seqeval.metrics import classification_report
y_true = [['O', 'O', 'O', 'B-MISC', 'I-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER']]
y_pred = [['O', 'O', 'B-MISC', 'I-MISC', 'B-MISC', 'I-MISC', 'O'], ['B-PER', 'I-PER']]
print("accuary: ", accuracy_score(y_true, y_pred))
print("p: ", precision_score(y_true, y_pred))
print("r: ", recall_score(y_true, y_pred))
print("f1: ", f1_score(y_true, y_pred))
print("classification report: ")
print(classification_report(y_true, y_pred))
The output is the same as above. The only difference is that the tags are passed as a list of per-sentence sequences rather than as one flat list, which seqeval also accepts.
Using seqeval in Keras
Over a year ago I wrote the article 用深度學(xué)習(xí)實(shí)現(xiàn)命名實(shí)體識(shí)別(NER) (implementing named entity recognition with deep learning). Here we rework the model training part of that code so that it reports the F1 score during training.
Download the DL_4_NER project from GitHub at https://github.com/percent4/DL_4_NER . Adjust the folder paths in utils.py, and modify the model training code (DL_4_NER/Bi_LSTM_Model_training.py) as follows:
# -*- coding: utf-8 -*-
import pickle
import numpy as np
import pandas as pd
from utils import BASE_DIR, CONSTANTS, load_data
from data_processing import data_processing
from keras.utils import np_utils, plot_model
from keras.models import Sequential
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Bidirectional, LSTM, Dense, Embedding, TimeDistributed


# Prepare the model input data
def input_data_for_model(input_shape):
    # Load the data
    input_data = load_data()
    # Data preprocessing
    data_processing()
    # Load the dictionaries
    with open(CONSTANTS[1], 'rb') as f:
        word_dictionary = pickle.load(f)
    with open(CONSTANTS[2], 'rb') as f:
        inverse_word_dictionary = pickle.load(f)
    with open(CONSTANTS[3], 'rb') as f:
        label_dictionary = pickle.load(f)
    with open(CONSTANTS[4], 'rb') as f:
        output_dictionary = pickle.load(f)

    vocab_size = len(word_dictionary.keys())
    label_size = len(label_dictionary.keys())

    # Build the input sequences
    aggregate_function = lambda input: [(word, pos, label) for word, pos, label in
                                        zip(input['word'].values.tolist(),
                                            input['pos'].values.tolist(),
                                            input['tag'].values.tolist())]
    grouped_input_data = input_data.groupby('sent_no').apply(aggregate_function)
    sentences = [sentence for sentence in grouped_input_data]

    x = [[word_dictionary[word[0]] for word in sent] for sent in sentences]
    x = pad_sequences(maxlen=input_shape, sequences=x, padding='post', value=0)
    y = [[label_dictionary[word[2]] for word in sent] for sent in sentences]
    y = pad_sequences(maxlen=input_shape, sequences=y, padding='post', value=0)
    y = [np_utils.to_categorical(label, num_classes=label_size + 1) for label in y]

    return x, y, output_dictionary, vocab_size, label_size, inverse_word_dictionary


# Define the deep learning model: Bi-LSTM
def create_Bi_LSTM(vocab_size, label_size, input_shape, output_dim, n_units, out_act, activation):
    model = Sequential()
    model.add(Embedding(input_dim=vocab_size + 1, output_dim=output_dim,
                        input_length=input_shape, mask_zero=True))
    model.add(Bidirectional(LSTM(units=n_units, activation=activation,
                                 return_sequences=True)))
    model.add(TimeDistributed(Dense(label_size + 1, activation=out_act)))
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model


# Model training
def model_train():
    # Split the dataset into a training set and a test set with a 9:1 ratio
    input_shape = 60
    x, y, output_dictionary, vocab_size, label_size, inverse_word_dictionary = input_data_for_model(input_shape)
    train_end = int(len(x) * 0.9)
    train_x, train_y = x[0:train_end], np.array(y[0:train_end])
    test_x, test_y = x[train_end:], np.array(y[train_end:])

    # Model hyperparameters
    activation = 'selu'
    out_act = 'softmax'
    n_units = 100
    batch_size = 32
    epochs = 10
    output_dim = 20

    # Train the model
    lstm_model = create_Bi_LSTM(vocab_size, label_size, input_shape, output_dim, n_units, out_act, activation)
    lstm_model.fit(train_x, train_y, validation_data=(test_x, test_y), epochs=epochs, batch_size=batch_size, verbose=1)


model_train()
The training output looks like this (intermediate epochs omitted):
......
12598/12598 [==============================] - 26s 2ms/step - loss: 0.0075 - acc: 0.9981 - val_loss: 0.2131 - val_acc: 0.9592
Now we change the code around the lstm_model.fit line as follows, adding the import for F1Metrics (in the seqeval 0.0.x series this Keras callback is provided in seqeval.callbacks):
from seqeval.callbacks import F1Metrics

# Map class indices back to tag strings so the callback can decode predictions
labels = ['O', 'B-MISC', 'I-MISC', 'B-ORG', 'I-ORG', 'B-PER', 'B-LOC', 'I-PER', 'I-LOC', 'sO']
id2label = dict(zip(range(len(labels)), labels))
callbacks = [F1Metrics(id2label)]
lstm_model.fit(train_x, train_y, validation_data=(test_x, test_y), epochs=epochs,
               batch_size=batch_size, verbose=1, callbacks=callbacks)
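With this callback in place, at the end of every epoch the model is evaluated on the validation data, the predicted class indices are decoded back into tag strings via id2label, and the entity-level F1 score plus a classification report are printed, as the output below shows.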
The output now becomes:
12598/12598 [==============================] - 26s 2ms/step - loss: 0.0089 - acc: 0.9978 - val_loss: 0.2145 - val_acc: 0.9560
- f1: 95.40
           precision    recall  f1-score   support

     MISC     0.9707    0.9833    0.9769     15844
      PER     0.9080    0.8194    0.8614      1157
      LOC     0.7517    0.8095    0.7795       677
      ORG     0.8290    0.7289    0.7757       745
       sO     0.7757    0.8300    0.8019       100

micro avg     0.9524    0.9556    0.9540     18523
macro avg     0.9520    0.9556    0.9535     18523
This is where seqeval really shines.
If anything about using seqeval with Keras is unclear, see the project's GitHub repository: https://github.com/chakki-works/seqeval .
Summary
Thanks for reading; that is all for this post.
You are welcome to follow my WeChat official account: Python爬蟲(chóng)與算法.
References
- Computing precision and recall for sequence labeling (序列標(biāo)注的準(zhǔn)確率和召回率計(jì)算): https://zhuanlan.zhihu.com/p/56582082
- seqeval official documentation: https://pypi.org/project/seqeval/0.0.3/