Install transformers
pip install transformers
Common imports
from transformers import BertTokenizer, BertModel, BertForMaskedLM
Define and download the BERT model
Download the Chinese BERT-wwm model from its release page
Place the config file, vocab file, and bin file under ./model/(bert)/
- rename bert_config.json to config.json
- rename chinese_wwm_pytorch.bin to pytorch_model.bin
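The rename steps above can be sketched in Python; `prepare_bert_dir` is a hypothetical helper name, and the file names match the layout described:

```python
import os

def prepare_bert_dir(model_dir):
    """Rename the downloaded BERT-wwm files to the names
    from_pretrained() expects (config.json, pytorch_model.bin)."""
    renames = {
        "bert_config.json": "config.json",
        "chinese_wwm_pytorch.bin": "pytorch_model.bin",
    }
    for old, new in renames.items():
        src = os.path.join(model_dir, old)
        dst = os.path.join(model_dir, new)
        if os.path.exists(src):
            os.rename(src, dst)

# Usage: prepare_bert_dir('./model/bert-wwm/')
```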
from transformers import BertConfig  # BertConfig is used below but was missing from the imports

bert_path = './model/bert-wwm/'  # bert-wwm is a whole-word-masking BERT variant optimized for Chinese
model_config = BertConfig.from_pretrained(bert_path)
bert = BertModel.from_pretrained(bert_path, config=model_config)
Use the BERT tokenizer
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained(bert_path)
Tokenization
s1="我要去看北京太難夢"
print(tokenizer.tokenize(s1))
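Under the hood, BertTokenizer uses WordPiece: greedy longest-match-first against the vocab, with non-initial pieces prefixed by '##'. A minimal sketch of that matching loop (the tiny vocab here is made up for illustration):

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # non-initial pieces carry the '##' prefix
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            return [unk]  # no subword matches: the whole word becomes [UNK]
        pieces.append(piece)
        start = end
    return pieces

vocab = {"un", "##aff", "##able", "play", "##ing"}
print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("playing", vocab))    # ['play', '##ing']
```

For Chinese text, BERT's tokenizer splits around CJK characters first, so most pieces end up being single characters.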
Encode a sentence into token IDs
print(tokenizer.encode('吾兒莫慌'))
Encode a sentence pair, with segment separation
sen_code = tokenizer.encode_plus('這個故事沒有終點', "正如星空沒有彼岸")
Map the encoded IDs back to tokens
print(tokenizer.convert_ids_to_tokens(sen_code['input_ids']))
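convert_ids_to_tokens is essentially an inverse vocab lookup. A minimal sketch with a made-up vocab standing in for the real ~21k-entry Chinese BERT vocab (the [CLS]/[SEP] IDs 101/102 are the standard BERT ones):

```python
# Toy vocab; only the special-token IDs mirror real BERT conventions.
vocab = {"[PAD]": 0, "[CLS]": 101, "[SEP]": 102, "吾": 1, "兒": 2, "莫": 3, "慌": 4}
inv_vocab = {i: t for t, i in vocab.items()}

def convert_ids_to_tokens(ids):
    """Inverse lookup from token IDs back to token strings."""
    return [inv_vocab[i] for i in ids]

print(convert_ids_to_tokens([101, 1, 2, 3, 4, 102]))
# ['[CLS]', '吾', '兒', '莫', '慌', '[SEP]']
```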
text_dict = tokenizer.encode_plus(
    text,                        # Sentence to encode.
    add_special_tokens=True,     # Add '[CLS]' and '[SEP]'.
    max_length=max_length,       # Pad & truncate all sentences.
    pad_to_max_length=True,      # Pad to max_length (padding='max_length' in newer versions).
    return_attention_mask=True,  # Construct attention masks.
    # return_tensors='pt',       # Return PyTorch tensors.
)
encode_plus returns a dict; index it by key
input_ids, attention_mask, token_type_ids = text_dict['input_ids'], text_dict['attention_mask'], text_dict['token_type_ids']
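The padding and attention mask that encode_plus builds can be mimicked by hand; a sketch for one already-converted ID sequence (`pad_and_mask` is a hypothetical helper; the example IDs are arbitrary apart from [CLS]=101 and [SEP]=102):

```python
def pad_and_mask(token_ids, max_length, pad_id=0):
    """Truncate/pad a sequence to max_length and build the matching
    attention mask: 1 for real tokens, 0 for padding."""
    ids = token_ids[:max_length]
    mask = [1] * len(ids)
    while len(ids) < max_length:
        ids.append(pad_id)
        mask.append(0)
    return ids, mask

ids, mask = pad_and_mask([101, 2769, 6206, 102], max_length=8)
print(ids)   # [101, 2769, 6206, 102, 0, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```

The mask is what lets BERT's self-attention ignore the padded positions.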
Encoder output
result = bert(input_ids, output_all_encoded_layers=True)  # pytorch-pretrained-bert style flag; in current transformers, pass output_hidden_states=True instead
What the BERT model returns
result = (
[encoder_0_output, encoder_1_output, ..., encoder_11_output],
pool_output
)
This returns the outputs of all 12 transformer encoder layers.
If output_all_encoded_layers is set to False, the first element of result is no longer a list: it is just encoder_11_output, a tensor of shape
[batch_size, sequence_length, hidden_size], which can be taken as BERT's representation of the sentence.
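A shape-only sketch of that last-layer tensor with plain nested lists, showing how the [CLS] vector (position 0) is pulled out as a sentence representation (toy sizes; real BERT-base uses hidden_size=768):

```python
batch_size, seq_len, hidden_size = 2, 4, 3

# Stand-in for encoder_11_output: nested lists shaped
# [batch_size, sequence_length, hidden_size].
last_layer = [[[0.0] * hidden_size for _ in range(seq_len)]
              for _ in range(batch_size)]

# The [CLS] representation of sentence b is last_layer[b][0].
cls_vectors = [sentence[0] for sentence in last_layer]
print(len(cls_vectors), len(cls_vectors[0]))  # 2 3
```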
transformers provides task pipelines
Sentiment classification
from transformers import pipeline
nlp = pipeline("sentiment-analysis")
print(nlp("I hate you"))
print(nlp("I love you"))
Extractive question answering
from transformers import pipeline
nlp = pipeline("question-answering")
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the `run_squad.py` script.
"""
print(nlp(question="What is extractive question answering?", context=context))
print(nlp(question="What is a good example of a question answering dataset?", context=context))