3.1 Download the IK analyzer plugin
https://github.com/medcl/elasticsearch-analysis-ik/releases
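Download the release that matches your Elasticsearch version and unzip it. Then create a new folder named ik in the plugins directory of your Elasticsearch installation and copy all the contents of the unzipped folder into it.
If your Elasticsearch cluster has several nodes, remember to repeat this on every node, i.e. copy everything in the ik folder to each of the other nodes.
On Linux, for example: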
[elastic@node1 plugins]$ scp -r ik elastic@node2:/opt/elasticsearch-6.2.3/plugins/
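With more than a couple of nodes, typing the scp command once per node gets tedious. The following is a minimal Python sketch that prints, and optionally runs, one scp command per node. The SSH user, node list and remote path are hypothetical values modeled on the command above (node3 is invented), and it assumes passwordless SSH from the current node and Python 3.8+ for shlex.join.

```python
import shlex
import subprocess

# Hypothetical values modeled on the scp command above; adjust to your cluster.
SSH_USER = "elastic"
OTHER_NODES = ["node2", "node3"]  # node3 is an invented example
REMOTE_PLUGINS_DIR = "/opt/elasticsearch-6.2.3/plugins/"


def build_scp_commands(local_dir, user, nodes, remote_dir):
    """Return one `scp -r` argument list per target node."""
    return [["scp", "-r", local_dir, f"{user}@{node}:{remote_dir}"] for node in nodes]


def copy_plugin(local_dir="ik", dry_run=True):
    """Print each command; also run it unless dry_run is True (assumes passwordless SSH)."""
    for cmd in build_scp_commands(local_dir, SSH_USER, OTHER_NODES, REMOTE_PLUGINS_DIR):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first node where scp fails
```

By default copy_plugin() only prints the commands; calling copy_plugin(dry_run=False) from the plugins directory performs the copy.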
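3.2 Restart Elasticsearch
After the restart, the startup log shows that the IK analyzer plugin has been loaded (in the author's log this was the third line from the bottom).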
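Instead of reading the startup log on each node, the plugin can also be checked over the REST API: `_cat/plugins` lists installed plugins per node and `_cat/nodes` lists all nodes. Because `_cat/plugins` has no rows at all for a node without any plugin, the sketch compares against the full node list. It is a Python sketch using only the standard library and assumes the cluster is reachable at http://localhost:9200 without authentication and that the plugin reports itself under the component name analysis-ik (compare with the name in your own `_cat/plugins` output).

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address; no authentication


def get_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def nodes_missing_plugin(node_names, plugin_rows, component="analysis-ik"):
    """Return the nodes that have no _cat/plugins row for `component`.

    The component name is an assumption; check it against your own output.
    """
    with_plugin = {row["name"] for row in plugin_rows if row["component"] == component}
    return sorted(set(node_names) - with_plugin)


def check_cluster(es_url=ES_URL):
    """Print which nodes, if any, are missing the IK plugin."""
    nodes = [row["name"] for row in get_json(f"{es_url}/_cat/nodes?format=json&h=name")]
    plugins = get_json(f"{es_url}/_cat/plugins?format=json")
    missing = nodes_missing_plugin(nodes, plugins)
    if missing:
        print("IK plugin missing on:", ", ".join(missing))
    else:
        print("IK plugin loaded on all", len(nodes), "nodes")
```

Calling check_cluster() after the restart prints the nodes where the copy was forgotten.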
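3.3 Test the basic functionality of the IK Chinese analyzer
(1) ik_smart (coarsest-grained segmentation)
The pretty parameter (literally "pretty") tells Elasticsearch to print the JSON response in a nicely formatted, human-readable layout.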
GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text": "安徽省長江流域"
}
Result:
{
  "tokens": [
    {
      "token": "安徽省",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "長江流域",
      "start_offset": 3,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}
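The Kibana console request above can also be sent from a program. This Python sketch assumes Elasticsearch at http://localhost:9200 without authentication. It sends POST rather than GET, since `_analyze` accepts both and many HTTP clients do not send a body with GET; the Content-Type header is required for requests with a body from Elasticsearch 6.0 on.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address; no authentication


def build_analyze_request(text, analyzer="ik_smart", es_url=ES_URL):
    """Build the console example as a POST request with a JSON body."""
    body = json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)
    return urllib.request.Request(
        f"{es_url}/_analyze",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_strings(response):
    """Extract just the token texts from an _analyze response."""
    return [t["token"] for t in response["tokens"]]


def analyze(text, analyzer="ik_smart", es_url=ES_URL):
    """Send the request and return the list of token texts."""
    with urllib.request.urlopen(build_analyze_request(text, analyzer, es_url)) as resp:
        return token_strings(json.load(resp))
```

With the setup above, analyze("安徽省長江流域") would return the token texts of the response shown, i.e. ["安徽省", "長江流域"].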
(2) ik_max_word (finest-grained segmentation)
GET _analyze?pretty
{
  "analyzer": "ik_max_word",
  "text": "安徽省長江流域"
}
Result:
{
  "tokens": [
    {
      "token": "安徽省",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "安徽",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "省長",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "長江流域",
      "start_offset": 3,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "長江",
      "start_offset": 3,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 4
    },
    {
      "token": "江流",
      "start_offset": 4,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 5
    },
    {
      "token": "流域",
      "start_offset": 5,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 6
    }
  ]
}
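Unlike ik_smart, ik_max_word returns overlapping tokens (安徽省 and 安徽 both start at offset 0). The toy Python sketch below mimics that output shape by listing every word from a small hand-picked dictionary that occurs in the text, longer words first at each start offset. It is not IK's implementation, and the seven-word dictionary is an assumption chosen to match the tokens above. start_offset and end_offset are treated as slice indices into the text; Lucene counts UTF-16 code units, which coincide with Python string indices for these characters.

```python
def enumerate_words(text, dictionary):
    """Toy finest-grained segmentation: every dictionary word occurring in text.

    Mimics the shape of ik_max_word output only; it is not IK's implementation.
    """
    max_len = max((len(w) for w in dictionary), default=0)
    spans = []
    for start in range(len(text)):
        for end in range(start + 1, min(start + max_len, len(text)) + 1):
            if text[start:end] in dictionary:
                spans.append((start, end))
    # Same start offset: longer word first, matching the order in the output above.
    spans.sort(key=lambda span: (span[0], span[0] - span[1]))
    return [
        {"token": text[s:e], "start_offset": s, "end_offset": e, "position": i}
        for i, (s, e) in enumerate(spans)
    ]


def offsets_consistent(text, tokens):
    """True if every token equals the slice of text given by its offsets."""
    return all(text[t["start_offset"]:t["end_offset"]] == t["token"] for t in tokens)
```

Called with the text 安徽省長江流域 and the seven tokens above as the dictionary, enumerate_words reproduces the tokens, offsets and positions of the ik_max_word result.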
(3) New words
GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text": "王者榮耀"
}
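Result:
{
  "tokens": [
    {
      "token": "王者",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "榮耀",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}
Because 王者榮耀 (Honor of Kings, the name of a mobile game) is not in IK's built-in dictionary, ik_smart splits it into 王者 and 榮耀.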
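Why the split happens can be illustrated with forward maximum matching, a classic dictionary-based technique: at each position, take the longest dictionary word that matches. The sketch is a toy stand-in, not IK's actual ik_smart algorithm (which also resolves ambiguous overlaps), and the small dictionaries in the usage note are hypothetical; it only shows that the result depends on whether the whole phrase is a dictionary entry.

```python
def forward_max_match(text, dictionary):
    """Toy coarse-grained segmentation by forward maximum matching.

    At each position take the longest dictionary word; fall back to a single
    character when nothing matches. Illustration only, not IK's algorithm.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in dictionary:
                tokens.append(piece)
                i += length
                break
    return tokens
```

With the dictionary {"王者", "榮耀"}, forward_max_match("王者榮耀", ...) returns ["王者", "榮耀"]; once "王者榮耀" is added to the dictionary it returns ["王者榮耀"].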
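3.4 Extend the dictionary
(1) View the existing dictionaries
The existing dictionaries are in the config directory inside the ik folder.
(2) Create a custom dictionary
On Linux, run the following in the IK configuration directory (the one containing IKAnalyzer.cfg.xml):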
[es@node1 analysis-ik]$ mkdir custom
[es@node1 analysis-ik]$ vi custom/new_word.dic
[es@node1 analysis-ik]$ cat custom/new_word.dic
老鐵
王者榮耀
洪荒之力
共有產權房
一帶一路
[es@node1 analysis-ik]$
On Windows, simply create the dictionary file (xxxx.dic) in the corresponding folder; save it as UTF-8 with one word per line, as in the example above.
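The dictionary file can also be generated from a word list. A minimal Python sketch, assuming the words come from some list of your own; the helper names are hypothetical. It strips whitespace, drops blank lines and duplicates (keeping the first occurrence), and writes UTF-8 text with one word per line and "\n" line endings, the same layout as new_word.dic above.

```python
from pathlib import Path


def format_dictionary(words):
    """One word per line; blanks and duplicates removed, original order kept."""
    unique = dict.fromkeys(w.strip() for w in words if w.strip())
    return "".join(f"{w}\n" for w in unique)


def write_dictionary(path, words):
    """Write the words as a UTF-8 .dic file, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # UTF-8 and "\n" line endings, also when run on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(format_dictionary(words))
```

For example, write_dictionary("custom/new_word.dic", ["老鐵", "王者榮耀"]) run in the configuration directory creates the custom folder and the file.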
(3) Update the configuration
[es@node1 analysis-ik]$ vi IKAnalyzer.cfg.xml
[es@node1 analysis-ik]$ cat IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Configure your own extension dictionaries here -->
    <entry key="ext_dict">custom/new_word.dic</entry>
    <!-- Configure your own extension stopword dictionaries here -->
    <entry key="ext_stopwords"></entry>
    <!-- Configure remote extension dictionaries here -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <!-- Configure remote extension stopword dictionaries here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
[es@node1 analysis-ik]$
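Before restarting, it can help to confirm that every local dictionary path in IKAnalyzer.cfg.xml points at an existing file. As in the example above, ext_dict paths are relative to the directory holding IKAnalyzer.cfg.xml. This Python sketch assumes that several files in one entry are separated by ';' (worth checking against your IK version); the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Keys of the local (file-based) dictionaries in IKAnalyzer.cfg.xml.
LOCAL_DICT_KEYS = ("ext_dict", "ext_stopwords")


def read_ik_config(xml_text):
    """Map each <entry key="..."> to its list of non-empty values.

    Assumption: several files in one entry are separated by ';'.
    Commented-out entries are ignored because ElementTree drops comments.
    """
    root = ET.fromstring(xml_text)
    config = {}
    for entry in root.iter("entry"):
        parts = (entry.text or "").split(";")
        config[entry.get("key")] = [p.strip() for p in parts if p.strip()]
    return config


def missing_dictionaries(config_dir, config):
    """Local dictionary paths that do not point to an existing file under config_dir."""
    base = Path(config_dir)
    return [
        rel_path
        for key in LOCAL_DICT_KEYS
        for rel_path in config.get(key, [])
        if not (base / rel_path).is_file()
    ]
```

Usage: read the file with Path(cfg_dir, "IKAnalyzer.cfg.xml").read_text(encoding="utf-8"), pass the text to read_ik_config, then pass cfg_dir and the result to missing_dictionaries; an empty list means every listed file exists.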
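(4) Restart Elasticsearch
On a multi-node cluster, make the same dictionary and configuration changes on every node first, as with the plugin itself.
(5) Restart Kibana
After restarting Kibana, run the following command again: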
GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text": "王者榮耀"
}
Result:
{
  "tokens": [
    {
      "token": "王者榮耀",
      "start_offset": 0,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 0
    }
  ]
}
This time 王者榮耀 is kept as a single token.
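With a longer custom dictionary, the same check can be repeated for every word. The Python sketch below reads a .dic file and reports the words that ik_smart does not return as exactly one token. It assumes Elasticsearch at http://localhost:9200 without authentication and a local copy of the dictionary file; the analyze function is a parameter, so the check can also be tried with a stub instead of a cluster.

```python
import json
import urllib.request
from pathlib import Path

ES_URL = "http://localhost:9200"  # assumed address; no authentication


def ik_smart_tokens(text, es_url=ES_URL):
    """POST the text to _analyze with ik_smart and return the token texts."""
    req = urllib.request.Request(
        f"{es_url}/_analyze",
        data=json.dumps({"analyzer": "ik_smart", "text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return [t["token"] for t in json.load(resp)["tokens"]]


def words_not_recognized(words, analyze):
    """Words for which `analyze` does not return exactly [word]."""
    return [w for w in words if analyze(w) != [w]]


def check_dictionary_file(dic_path, analyze=ik_smart_tokens):
    """Report custom words that the analyzer still splits."""
    lines = Path(dic_path).read_text(encoding="utf-8").splitlines()
    words = [line.strip() for line in lines if line.strip()]
    missing = words_not_recognized(words, analyze)
    print("All custom words recognized." if not missing else f"Not recognized: {missing}")
    return missing
```

Running check_dictionary_file("custom/new_word.dic") from the configuration directory after the restart lists any entry that was not picked up.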