Rasa_NLU Source Code Analysis

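Over the weekend I found an NLP tool that works quite well: rasa_nlu. It provides entity recognition, intent classification, and similar features; add a simple action for each intent and you have a basic chatbot. Its class diagram is shown below: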

Rasa_NLU class dependency diagram

The overall entry point is the DataRouter class in data_router.py. Its main job is to manage models as projects and to control how data flows between them.

component_classes contains all the Component classes:

# Classes of all known components. If a new component should be added,
# its class name should be listed here.
component_classes = [
    SpacyNLP, MitieNLP,
    SpacyEntityExtractor, MitieEntityExtractor, DucklingExtractor,
    CRFEntityExtractor, DucklingHTTPExtractor,
    EntitySynonymMapper,
    SpacyFeaturizer, MitieFeaturizer, NGramFeaturizer, RegexFeaturizer,
    CountVectorsFeaturizer,
    MitieTokenizer, SpacyTokenizer, WhitespaceTokenizer, JiebaTokenizer,
    SklearnIntentClassifier, MitieIntentClassifier, KeywordIntentClassifier,
    EmbeddingIntentClassifier
]

# Mapping from a components name to its class to allow name based lookup.
registered_components = {c.name: c for c in component_classes}

registered_components is built by iterating over the classes in component_classes, producing a map from each component's name to its class.

The get_component_class function converts a name into the corresponding Component class.
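The name-based lookup can be sketched as follows. This is a minimal illustration, not the library's code: the two classes are empty stand-ins, their name strings are assumed for the example, and the error handling is my own choice.

```python
# Hypothetical stand-ins for real components; only the `name` attribute matters here.
class WhitespaceTokenizer:
    name = "tokenizer_whitespace"  # assumed name, for illustration only


class KeywordIntentClassifier:
    name = "intent_classifier_keyword"  # assumed name, for illustration only


component_classes = [WhitespaceTokenizer, KeywordIntentClassifier]

# Same dict comprehension as in the excerpt above: name -> class.
registered_components = {c.name: c for c in component_classes}


def get_component_class(component_name):
    """Resolve a pipeline entry name to its class, failing loudly on unknown names."""
    if component_name not in registered_components:
        raise ValueError("Unknown component name: {!r}".format(component_name))
    return registered_components[component_name]


# A pipeline is configured as a list of names and resolved to classes in order.
pipeline_names = ["tokenizer_whitespace", "intent_classifier_keyword"]
pipeline_classes = [get_component_class(n) for n in pipeline_names]
```

Keeping the registry as a plain dict means that adding a component only requires listing its class in component_classes; the lookup itself does not change.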

Main architecture-related files:

registry.py: converts the names listed in the pipeline into the corresponding classes and loads the corresponding model files
config.py: converts configuration files
model.py: model-related code

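Class	Description
RasaNLUModelConfig	Holds the pipeline parameters used during training
Metadata	Parses the metadata.json file in the model directory and caches it
Trainer	Trains all the relevant Components; training is done with the train function and persistence with the persist function
Interpreter	Parses text strings using the trained pipeline model
Persistor	Stores models in the cloud (AWS, GCS, Azure, etc.)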

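In the persist function, the contents cached in self.pipeline, together with the various parameters and the settings for the corresponding model files, are written to the metadata.json file.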
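A rough sketch of that persistence step, under stated assumptions: ToyComponent, the persist function below, and the metadata keys ("language", "name", "model_file") are all hypothetical and only mimic the idea of collecting one entry per component into metadata.json.

```python
import json
import os
import tempfile


class ToyComponent:
    """Hypothetical component; stands in for a trained pipeline step."""

    def __init__(self, name, model_file=None):
        self.name = name
        self.model_file = model_file

    def persist(self, model_dir):
        # Write a placeholder model file, if this component has one,
        # and report its file name so it can be recorded in the metadata.
        if self.model_file is None:
            return {}
        with open(os.path.join(model_dir, self.model_file), "w") as f:
            f.write("placeholder weights")
        return {"model_file": self.model_file}


def persist(pipeline, model_dir, language="en"):
    """Collect one metadata entry per component and write metadata.json."""
    os.makedirs(model_dir, exist_ok=True)
    entries = []
    for component in pipeline:
        entry = {"name": component.name}
        entry.update(component.persist(model_dir))
        entries.append(entry)
    metadata = {"language": language, "pipeline": entries}
    path = os.path.join(model_dir, "metadata.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=4)
    return path


model_dir = tempfile.mkdtemp()
pipeline = [ToyComponent("tokenizer"), ToyComponent("classifier", "classifier.pkl")]
metadata_path = persist(pipeline, model_dir)
with open(metadata_path, encoding="utf-8") as f:
    saved = json.load(f)
```

Because every component reports its own files and settings, the metadata file alone is enough to tell a loader which components to rebuild and in which order.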

Interpreter initialization flow

1. Load the Metadata contents
2. Build the Component execution sequence from the pipeline listed in metadata.json
3. Initialize the Interpreter's parameter list
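The three initialization steps can be sketched like this. ToyInterpreter, the two components, and REGISTRY are hypothetical names invented for the example; the real Interpreter and Metadata classes have more responsibilities than shown here.

```python
import json
import os
import tempfile


class Lowercase:
    """Hypothetical component, rebuilt from its metadata entry."""
    name = "lowercase"

    def __init__(self, settings):
        self.settings = settings


class Tokenize:
    """Hypothetical component, rebuilt from its metadata entry."""
    name = "tokenize"

    def __init__(self, settings):
        self.settings = settings


# Hypothetical registry: component name -> class.
REGISTRY = {c.name: c for c in (Lowercase, Tokenize)}


class ToyInterpreter:
    def __init__(self, pipeline, metadata):
        # Step 3: keep the execution sequence and the metadata as parameters.
        self.pipeline = pipeline
        self.metadata = metadata

    @classmethod
    def load(cls, model_dir):
        # Step 1: load the metadata from the model directory.
        with open(os.path.join(model_dir, "metadata.json"), encoding="utf-8") as f:
            metadata = json.load(f)
        # Step 2: build the component execution sequence, preserving order.
        pipeline = [REGISTRY[entry["name"]](entry) for entry in metadata["pipeline"]]
        return cls(pipeline, metadata)


# Demo: write a small metadata.json into a temporary directory, then load it.
model_dir = tempfile.mkdtemp()
with open(os.path.join(model_dir, "metadata.json"), "w", encoding="utf-8") as f:
    json.dump({"pipeline": [{"name": "lowercase"}, {"name": "tokenize"}]}, f)

interpreter = ToyInterpreter.load(model_dir)
component_names = [type(c).__name__ for c in interpreter.pipeline]
```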

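Interpreter text parsing process

1. Wrap the text in a Message
2. Process the Message object with each Component in the execution sequence
3. Format the contents of the Message object as the output

Message uses a map to store every computed result under its own key; that map is finally formatted into the output result.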
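The parsing steps and the role of the Message map can be sketched as follows. Everything here is a simplified stand-in written for illustration: the Message class, the two components, the keyword table, and the output keys are assumptions, not the library's implementation.

```python
class Message:
    """Wraps the input text and collects results in a map."""

    def __init__(self, text):
        self.text = text
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key, default=None):
        return self.data.get(key, default)


class WhitespaceTokenizer:
    """Toy tokenizer: splits on whitespace and stores the tokens."""

    def process(self, message):
        message.set("tokens", message.text.split())


class KeywordIntentClassifier:
    """Toy classifier: the first token found in the keyword table decides the intent."""

    KEYWORDS = {"hello": "greet", "bye": "goodbye"}  # hypothetical keyword table

    def process(self, message):
        intent = None
        for token in message.get("tokens", []):
            intent = self.KEYWORDS.get(token.lower())
            if intent:
                break
        confidence = 1.0 if intent else 0.0
        message.set("intent", {"name": intent, "confidence": confidence})


def parse(text, pipeline):
    # Step 1: wrap the text in a Message.
    message = Message(text)
    # Step 2: let every component read from and write to the same Message.
    for component in pipeline:
        component.process(message)
    # Step 3: format the collected results as the output.
    return {
        "text": message.text,
        "intent": message.get("intent"),
        "entities": message.get("entities", []),
    }


result = parse("hello there", [WhitespaceTokenizer(), KeywordIntentClassifier()])
```

Later components depend on what earlier ones stored (the classifier reads the tokens), which is why the order of the execution sequence matters.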

Last edited: 2018.08.06 16:32:17
