Install transformers
pip install transformers
Common imports
from transformers import BertTokenizer, BertModel, BertForMaskedLM
Define and download the BERT model
Download the Chinese BERT-wwm model from its release page
Place the config file, vocab file, and bin file under ./model/(bert)/
- rename bert_config.json to config.json
- rename chinese_wwm_pytorch.bin to pytorch_model.bin
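The rename steps above can be sketched in Python; `prepare_bert_dir` is a hypothetical helper name, and the file names match the layout described:

```python
import os

def prepare_bert_dir(model_dir):
    """Rename the downloaded BERT-wwm files to the names
    from_pretrained() expects (config.json, pytorch_model.bin)."""
    renames = {
        "bert_config.json": "config.json",
        "chinese_wwm_pytorch.bin": "pytorch_model.bin",
    }
    for old, new in renames.items():
        src = os.path.join(model_dir, old)
        dst = os.path.join(model_dir, new)
        if os.path.exists(src):
            os.rename(src, dst)

# Usage: prepare_bert_dir('./model/bert-wwm/')
```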
from transformers import BertConfig  # BertConfig is used below but was missing from the imports

bert_path = './model/bert-wwm/'  # bert-wwm is a whole-word-masking BERT variant optimized for Chinese
model_config = BertConfig.from_pretrained(bert_path)
bert = BertModel.from_pretrained(bert_path, config=model_config)
Use the BERT tokenizer
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained(bert_path)
Tokenization
s1="我要去看北京太難夢"
print(tokenizer.tokenize(s1))
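Under the hood, BertTokenizer uses WordPiece: greedy longest-match-first against the vocab, with non-initial pieces prefixed by '##'. A minimal sketch of that matching loop (the tiny vocab here is made up for illustration):

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # non-initial pieces carry the '##' prefix
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            return [unk]  # no subword matches: the whole word becomes [UNK]
        pieces.append(piece)
        start = end
    return pieces

vocab = {"un", "##aff", "##able", "play", "##ing"}
print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("playing", vocab))    # ['play', '##ing']
```

For Chinese text, BERT's tokenizer splits around CJK characters first, so most pieces end up being single characters.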
Encode a sentence into token IDs
print(tokenizer.encode('吾兒莫慌'))
Encode a sentence pair, with segment separation
sen_code = tokenizer.encode_plus('這個故事沒有終點', "正如星空沒有彼岸")
Map the encoded IDs back to tokens
print(tokenizer.convert_ids_to_tokens(sen_code['input_ids']))
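convert_ids_to_tokens is essentially an inverse vocab lookup. A minimal sketch with a made-up vocab standing in for the real ~21k-entry Chinese BERT vocab (the [CLS]/[SEP] IDs 101/102 are the standard BERT ones):

```python
# Toy vocab; only the special-token IDs mirror real BERT conventions.
vocab = {"[PAD]": 0, "[CLS]": 101, "[SEP]": 102, "吾": 1, "兒": 2, "莫": 3, "慌": 4}
inv_vocab = {i: t for t, i in vocab.items()}

def convert_ids_to_tokens(ids):
    """Inverse lookup from token IDs back to token strings."""
    return [inv_vocab[i] for i in ids]

print(convert_ids_to_tokens([101, 1, 2, 3, 4, 102]))
# ['[CLS]', '吾', '兒', '莫', '慌', '[SEP]']
```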
text_dict = tokenizer.encode_plus(
    text,                        # Sentence to encode.
    add_special_tokens=True,     # Add '[CLS]' and '[SEP]'.
    max_length=max_length,       # Pad & truncate all sentences.
    pad_to_max_length=True,      # Pad to max_length (padding='max_length' in newer versions).
    return_attention_mask=True,  # Construct attention masks.
    # return_tensors='pt',       # Return PyTorch tensors.
)
encode_plus returns a dict; index it by key
input_ids, attention_mask, token_type_ids = text_dict['input_ids'], text_dict['attention_mask'], text_dict['token_type_ids']
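The padding and attention mask that encode_plus builds can be mimicked by hand; a sketch for one already-converted ID sequence (`pad_and_mask` is a hypothetical helper; the example IDs are arbitrary apart from [CLS]=101 and [SEP]=102):

```python
def pad_and_mask(token_ids, max_length, pad_id=0):
    """Truncate/pad a sequence to max_length and build the matching
    attention mask: 1 for real tokens, 0 for padding."""
    ids = token_ids[:max_length]
    mask = [1] * len(ids)
    while len(ids) < max_length:
        ids.append(pad_id)
        mask.append(0)
    return ids, mask

ids, mask = pad_and_mask([101, 2769, 6206, 102], max_length=8)
print(ids)   # [101, 2769, 6206, 102, 0, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```

The mask is what lets BERT's self-attention ignore the padded positions.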
Encoder output
result = bert(input_ids, output_all_encoded_layers=True)  # pytorch-pretrained-bert style flag; in current transformers, pass output_hidden_states=True instead
What the BERT model returns
result = (
[encoder_0_output, encoder_1_output, ..., encoder_11_output],
pool_output
)
This returns the outputs of all 12 transformer encoder layers.
If output_all_encoded_layers is set to False, the first element of result is no longer a list: it is just encoder_11_output, a tensor of shape
[batch_size, sequence_length, hidden_size], which can be taken as BERT's representation of the sentence.
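A shape-only sketch of that last-layer tensor with plain nested lists, showing how the [CLS] vector (position 0) is pulled out as a sentence representation (toy sizes; real BERT-base uses hidden_size=768):

```python
batch_size, seq_len, hidden_size = 2, 4, 3

# Stand-in for encoder_11_output: nested lists shaped
# [batch_size, sequence_length, hidden_size].
last_layer = [[[0.0] * hidden_size for _ in range(seq_len)]
              for _ in range(batch_size)]

# The [CLS] representation of sentence b is last_layer[b][0].
cls_vectors = [sentence[0] for sentence in last_layer]
print(len(cls_vectors), len(cls_vectors[0]))  # 2 3
```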
transformers provides task pipelines
Sentiment classification
from transformers import pipeline
nlp = pipeline("sentiment-analysis")
print(nlp("I hate you"))
print(nlp("I love you"))
Extractive question answering
from transformers import pipeline
nlp = pipeline("question-answering")
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the `run_squad.py` script.
"""
print(nlp(question="What is extractive question answering?", context=context))
print(nlp(question="What is a good example of a question answering dataset?", context=context))