Download
Go to the official repository https://github.com/medcl/elasticsearch-analysis-ik and download the ik release matching your Elasticsearch version (download the prebuilt releases zip directly and avoid packaging with maven!
If no release matches your version, enter the extracted source directory and run mvn package; the corresponding zip will then be generated under the target/releases directory.)
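If you do need to build from source, the steps might look like this sketch (assuming git and maven are available; the v7.4.2 tag is an example and should be swapped for the tag matching your Elasticsearch version):

git clone https://github.com/medcl/elasticsearch-analysis-ik.git
cd elasticsearch-analysis-ik
git checkout v7.4.2    # example tag; use the one matching your ES version
mvn package
ls target/releases/    # the plugin zip is generated here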
Installation
Upload the zip to the plugins directory and unzip it:
[es@Centos-51 plugins]$ ls
elasticsearch-analysis-ik-7.4.2.zip
[es@Centos-51 plugins]$ mkdir elasticsearch-analysis-ik-7.4.2
[es@Centos-51 plugins]$ ls
elasticsearch-analysis-ik-7.4.2  elasticsearch-analysis-ik-7.4.2.zip
[es@Centos-51 plugins]$ unzip elasticsearch-analysis-ik-7.4.2.zip -d ./elasticsearch-analysis-ik-7.4.2
Archive:  elasticsearch-analysis-ik-7.4.2.zip
  inflating: ./elasticsearch-analysis-ik-7.4.2/elasticsearch-analysis-ik-7.4.2.jar
  inflating: ./elasticsearch-analysis-ik-7.4.2/httpclient-4.5.2.jar
  inflating: ./elasticsearch-analysis-ik-7.4.2/httpcore-4.4.4.jar
  inflating: ./elasticsearch-analysis-ik-7.4.2/commons-logging-1.2.jar
  inflating: ./elasticsearch-analysis-ik-7.4.2/commons-codec-1.9.jar
  inflating: ./elasticsearch-analysis-ik-7.4.2/plugin-descriptor.properties
  inflating: ./elasticsearch-analysis-ik-7.4.2/plugin-security.policy
   creating: ./elasticsearch-analysis-ik-7.4.2/config/
  inflating: ./elasticsearch-analysis-ik-7.4.2/config/surname.dic
  inflating: ./elasticsearch-analysis-ik-7.4.2/config/quantifier.dic
  inflating: ./elasticsearch-analysis-ik-7.4.2/config/extra_stopword.dic
  inflating: ./elasticsearch-analysis-ik-7.4.2/config/suffix.dic
  inflating: ./elasticsearch-analysis-ik-7.4.2/config/extra_single_word_full.dic
  inflating: ./elasticsearch-analysis-ik-7.4.2/config/extra_single_word.dic
  inflating: ./elasticsearch-analysis-ik-7.4.2/config/preposition.dic
  inflating: ./elasticsearch-analysis-ik-7.4.2/config/IKAnalyzer.cfg.xml
  inflating: ./elasticsearch-analysis-ik-7.4.2/config/main.dic
  inflating: ./elasticsearch-analysis-ik-7.4.2/config/stopword.dic
  inflating: ./elasticsearch-analysis-ik-7.4.2/config/extra_main.dic
  inflating: ./elasticsearch-analysis-ik-7.4.2/config/extra_single_word_low_freq.dic
[es@Centos-51 plugins]$ ls
elasticsearch-analysis-ik-7.4.2  elasticsearch-analysis-ik-7.4.2.zip
[es@Centos-51 plugins]$ cd elasticsearch-analysis-ik-7.4.2
[es@Centos-51 elasticsearch-analysis-ik-7.4.2]$ ls
commons-codec-1.9.jar  commons-logging-1.2.jar  config  elasticsearch-analysis-ik-7.4.2.jar  httpclient-4.5.2.jar  httpcore-4.4.4.jar  plugin-descriptor.properties  plugin-security.policy
[es@Centos-51 elasticsearch-analysis-ik-7.4.2]$ cd ..
[es@Centos-51 plugins]$ ls
elasticsearch-analysis-ik-7.4.2  elasticsearch-analysis-ik-7.4.2.zip
[es@Centos-51 plugins]$ rm -rf elasticsearch-analysis-ik-7.4.2.zip
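After unpacking, restart Elasticsearch so the node picks up the new plugin. One way to confirm it loaded (a sketch, assuming the node listens on localhost:9200) is the _cat/plugins API, which should list analysis-ik:

curl 'localhost:9200/_cat/plugins?v'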
Verification
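The results below can be reproduced with the standard _analyze API. A minimal request sketch, again assuming the node listens on localhost:9200 (swap the analyzer value between ik_smart and ik_max_word to produce the two outputs that follow):

curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/_analyze?pretty' -d '
{
    "analyzer": "ik_smart",
    "text": "我是中國人"
}'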
Tokenization result with ik_smart:
{
    "tokens": [
        {
            "token": "我",
            "start_offset": 0,
            "end_offset": 1,
            "type": "CN_CHAR",
            "position": 0
        },
        {
            "token": "是",
            "start_offset": 1,
            "end_offset": 2,
            "type": "CN_CHAR",
            "position": 1
        },
        {
            "token": "中國人",
            "start_offset": 2,
            "end_offset": 5,
            "type": "CN_WORD",
            "position": 2
        }
    ]
}
Tokenization result with ik_max_word:
{
    "tokens": [
        {
            "token": "我",
            "start_offset": 0,
            "end_offset": 1,
            "type": "CN_CHAR",
            "position": 0
        },
        {
            "token": "是",
            "start_offset": 1,
            "end_offset": 2,
            "type": "CN_CHAR",
            "position": 1
        },
        {
            "token": "中國人",
            "start_offset": 2,
            "end_offset": 5,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "中國",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "國人",
            "start_offset": 3,
            "end_offset": 5,
            "type": "CN_WORD",
            "position": 4
        }
    ]
}
ik_max_word: splits the text at the finest granularity; for example, it splits "我是中國人" into "我, 是, 中國人, 中國, 國人", exhausting the possible word combinations.
ik_smart: performs the coarsest-grained split; for example, it splits "我是中國人" into "我, 是, 中國人".
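A common way to use the two together is to index with ik_max_word for better recall and query with ik_smart for more precise matches, via the mapping's analyzer and search_analyzer settings. A minimal sketch, where demo_index and the content field are hypothetical names:

curl -H 'Content-Type: application/json' -XPUT 'localhost:9200/demo_index' -d '
{
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart"
            }
        }
    }
}'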