Elasticsearch is the most powerful full-text search tool out there (bar none), and Chinese/English tokenization is practically a must-have feature. Below is a quick walkthrough of the analyzer installation steps (detailed guides abound online; this post only lays out the overall approach):
1. Download the Chinese / pinyin analyzers
IK Chinese analyzer: https://github.com/medcl/elasticsearch-analysis-ik
Pinyin analyzer: https://github.com/medcl/elasticsearch-analysis-pinyin
(Remarkably, both are by the same author, who also maintains mmseg and simplified/traditional Chinese conversion libraries; quietly watching them all.)
2. Install
- Under Releases, find the zip matching your ES version, or grab the source and package it yourself with mvn package; you can also build from the latest master.
- Go to <elasticsearch install dir>/plugins; mkdir pinyin; cd pinyin
- cp the zip you just packaged into the pinyin directory and unzip it
- After deploying, remember to restart the ES node
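The install steps boil down to: create `plugins/pinyin` under the ES home, then extract the release zip into it. Here is a runnable Python sketch of that layout; a temporary directory stands in for the real ES home, and a dummy zip (containing only a `plugin-descriptor.properties`, the file every ES plugin ships) stands in for the actual release file:

```python
import tempfile
import zipfile
from pathlib import Path

# Stand-ins: a temp dir plays the ES install dir, and we fabricate a
# dummy zip in place of the real elasticsearch-analysis-pinyin release.
es_home = Path(tempfile.mkdtemp())
release_zip = es_home / "elasticsearch-analysis-pinyin.zip"
with zipfile.ZipFile(release_zip, "w") as zf:
    zf.writestr("plugin-descriptor.properties", "name=pinyin\n")

# The actual install steps: mkdir plugins/pinyin, then unzip into it.
plugin_dir = es_home / "plugins" / "pinyin"
plugin_dir.mkdir(parents=True)
with zipfile.ZipFile(release_zip) as zf:
    zf.extractall(plugin_dir)

print(sorted(p.name for p in plugin_dir.iterdir()))
# → ['plugin-descriptor.properties']
```

On a real node you would of course skip the dummy-zip part, drop the downloaded release zip into `plugins/pinyin`, unzip, and restart.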
3. Configure
**Settings**

PUT my_index/_settings
{
  "index": {
    "number_of_shards": "3",
    "number_of_replicas": "1",
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "ik_max_word"
        },
        "pinyin_analyzer": {
          "tokenizer": "my_pinyin"
        }
      },
      "tokenizer": {
        "my_pinyin": {
          "type": "pinyin",
          "keep_separate_first_letter": "false",
          "lowercase": "true",
          "limit_first_letter_length": "16",
          "keep_original": "true",
          "keep_full_pinyin": "true"
        }
      }
    }
  }
}

(Note: number_of_shards is fixed at index creation, and analysis settings can only be changed while the index is closed, so in practice you would supply this block when creating the index.)
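To make the `my_pinyin` options concrete, here is a toy Python model of the tokens they produce for 「劉德華」. The three-character pinyin table is hard-coded purely for illustration (the real plugin ships a full dictionary), but the option semantics follow the plugin's README: full pinyin syllables, a first-letter abbreviation capped at `limit_first_letter_length`, and optionally the original text:

```python
# Toy model of the my_pinyin tokenizer options above.
# PINYIN is a hard-coded illustration, not the plugin's dictionary.
PINYIN = {"劉": "liu", "德": "de", "華": "hua"}

def my_pinyin(text, keep_original=True, keep_full_pinyin=True,
              limit_first_letter_length=16):
    tokens = []
    if keep_full_pinyin:                          # "keep_full_pinyin": "true"
        tokens += [PINYIN[ch] for ch in text]
    # First-letter abbreviation token, truncated to the configured limit.
    abbrev = "".join(PINYIN[ch][0] for ch in text)
    tokens.append(abbrev[:limit_first_letter_length])
    if keep_original:                             # "keep_original": "true"
        tokens.append(text)
    return tokens

print(my_pinyin("劉德華"))
# → ['liu', 'de', 'hua', 'ldh', '劉德華']
```

The `'ldh'` token is what later makes the abbreviation search `name.pinyin:ldh` match.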
**Mapping**

PUT my_index/index_type/_mapping
{
  "index_type": {
    "_all": {
      "analyzer": "ik_max_word"
    },
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ik_max_word",
        "include_in_all": true,
        "fields": {
          "pinyin": {
            "type": "text",
            "term_vector": "with_positions_offsets",
            "analyzer": "pinyin_analyzer",
            "boost": 10.0
          }
        }
      }
    }
  }
}

(The type name in the body must match the one in the URL. Also note that _all and include_in_all are ES 5.x-era features, deprecated in 6.x.)
4. Test
Use _analyze to check that the analyzer works:

GET my_index/_analyze
{
  "text": ["劉德華"],
  "analyzer": "pinyin_analyzer"
}
Index a document containing Chinese:

POST my_index/index_type
{
  "name": "劉德華"
}
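A note on posting Chinese JSON from code: Python's `json.dumps` escapes non-ASCII by default, and both the escaped and the literal form are valid JSON that index identically, as long as you send UTF-8 bytes with `Content-Type: application/json`. A small sketch:

```python
import json

doc = {"name": "劉德華"}

# json.dumps escapes non-ASCII by default; both forms are valid JSON.
escaped = json.dumps(doc)
literal = json.dumps(doc, ensure_ascii=False)
print(escaped)   # → {"name": "\u5289\u5fb7\u83ef"}
print(literal)   # → {"name": "劉德華"}

# Both decode back to the same document.
assert json.loads(escaped) == json.loads(literal) == doc

# What actually goes over the wire should be UTF-8 bytes.
body = literal.encode("utf-8")
```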
Chinese tokenization test (via query string):
curl http://localhost:9200/my_index/index_type/_search?q=name:劉
curl http://localhost:9200/my_index/index_type/_search?q=name:劉德
Pinyin test (via query string):
curl http://localhost:9200/my_index/index_type/_search?q=name.pinyin:liu
curl http://localhost:9200/my_index/index_type/_search?q=name.pinyin:ldh
curl http://localhost:9200/my_index/index_type/_search?q=name.pinyin:de+hua
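One last gotcha when scripting these query-string searches: the Chinese characters in `q=` must be percent-encoded (not every shell/curl combination does this for you). A stdlib sketch, with the host and port mirroring the curl examples above:

```python
from urllib.parse import quote

# Percent-encode just the Chinese term; quote() emits the UTF-8 bytes
# of each character as %XX escapes.
term = quote("劉德華")
url = ("http://localhost:9200/my_index/index_type/_search"
       "?q=name:" + term)
print(url)
# → http://localhost:9200/my_index/index_type/_search?q=name:%E5%8A%89%E5%BE%B7%E8%8F%AF
```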