Project links
- GitHub: https://github.com/dolyw/ProjectStudy/tree/master/Elasticsearch
- Gitee: https://gitee.com/dolyw/ProjectStudy/tree/master/Elasticsearch
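Installing the IK analysis plugin for a local Elasticsearch
Go to https://github.com/medcl/elasticsearch-analysis-ik/releases and download the IK analysis plugin that matches your Elasticsearch version, for example elasticsearch-analysis-ik-7.2.0.zip. Open the archive and you should see the following files: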
commons-codec-1.9.jar
commons-logging-1.2.jar
config/
elasticsearch-analysis-ik-7.2.0.jar
httpclient-4.5.2.jar
httpcore-4.4.4.jar
plugin-descriptor.properties
plugin-security.policy
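If everything looks right, extract the archive into the plugins directory of your Elasticsearch installation. For example, my path is D:\Tools\elasticsearch-7.2.0\plugins\elasticsearch-analysis-ik-7.2.0
Restart Elasticsearch and the console log should show: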
loaded plugin [analysis-ik]
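Because the plugin has to match the Elasticsearch version, it can help to read plugin-descriptor.properties from the archive before restarting. The following is only a minimal sketch and not part of the original project: it assumes the descriptor carries an `elasticsearch.version` key, the helper names `parse_properties` and `plugin_matches` are hypothetical, and the sample descriptor text is shortened for illustration.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def plugin_matches(descriptor_text, es_version):
    """Return True when the descriptor targets exactly es_version."""
    props = parse_properties(descriptor_text)
    return props.get("elasticsearch.version") == es_version


# Hypothetical, shortened descriptor content (assumption, not copied from the plugin).
SAMPLE_DESCRIPTOR = """
# Elasticsearch plugin descriptor file
name=analysis-ik
version=7.2.0
elasticsearch.version=7.2.0
"""

if __name__ == "__main__":
    print(plugin_matches(SAMPLE_DESCRIPTOR, "7.2.0"))
```

If the check fails, downloading the release whose version equals your Elasticsearch version is usually the simplest fix.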
Test it:
POST /_analyze
{
"text":"中華人民共和國國徽",
"analyzer":"ik_smart"
}
Response:
{
"tokens": [
{
"token": "中華人民共和國",
"start_offset": 0,
"end_offset": 7,
"type": "CN_WORD",
"position": 0
},
{
"token": "國徽",
"start_offset": 7,
"end_offset": 9,
"type": "CN_WORD",
"position": 1
}
]
}
POST /_analyze
{
"text":"中華人民共和國國徽",
"analyzer":"ik_max_word"
}
Response:
{
"tokens": [
{
"token": "中華人民共和國",
"start_offset": 0,
"end_offset": 7,
"type": "CN_WORD",
"position": 0
},
{
"token": "中華人民",
"start_offset": 0,
"end_offset": 4,
"type": "CN_WORD",
"position": 1
},
{
"token": "中華",
"start_offset": 0,
"end_offset": 2,
"type": "CN_WORD",
"position": 2
},
{
"token": "華人",
"start_offset": 1,
"end_offset": 3,
"type": "CN_WORD",
"position": 3
},
{
"token": "人民共和國",
"start_offset": 2,
"end_offset": 7,
"type": "CN_WORD",
"position": 4
},
{
"token": "人民",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position": 5
},
{
"token": "共和國",
"start_offset": 4,
"end_offset": 7,
"type": "CN_WORD",
"position": 6
},
{
"token": "共和",
"start_offset": 4,
"end_offset": 6,
"type": "CN_WORD",
"position": 7
},
{
"token": "國",
"start_offset": 6,
"end_offset": 7,
"type": "CN_CHAR",
"position": 8
},
{
"token": "國徽",
"start_offset": 7,
"end_offset": 9,
"type": "CN_WORD",
"position": 9
}
]
}
The IK analysis plugin is now installed.
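The two responses above differ mainly in granularity: ik_smart returns tokens that do not overlap, while ik_max_word returns every word it can find, so its spans overlap. As a rough sketch, the same request could be built from Python and the responses compared. The base URL `http://localhost:9200` and the helper names are assumptions, and the ik_max_word sample is truncated to its first three tokens.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer, base_url="http://localhost:9200"):
    """Build (but do not send) a POST /_analyze request."""
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_texts(response):
    """Return the token strings of an _analyze response, in order."""
    return [t["token"] for t in response["tokens"]]


def has_overlap(response):
    """True if any token starts before an earlier token has ended."""
    spans = sorted((t["start_offset"], t["end_offset"]) for t in response["tokens"])
    return any(cur[0] < prev[1] for prev, cur in zip(spans, spans[1:]))


# ik_smart response as shown in the article.
IK_SMART = {"tokens": [
    {"token": "中華人民共和國", "start_offset": 0, "end_offset": 7},
    {"token": "國徽", "start_offset": 7, "end_offset": 9},
]}

# First three ik_max_word tokens from the article (truncated on purpose).
IK_MAX_WORD_HEAD = {"tokens": [
    {"token": "中華人民共和國", "start_offset": 0, "end_offset": 7},
    {"token": "中華人民", "start_offset": 0, "end_offset": 4},
    {"token": "中華", "start_offset": 0, "end_offset": 2},
]}

if __name__ == "__main__":
    req = build_analyze_request("中華人民共和國國徽", "ik_smart")
    print(req.get_method(), req.full_url)
    print(has_overlap(IK_SMART), has_overlap(IK_MAX_WORD_HEAD))
```

Passing the request to `urllib.request.urlopen` would send it, which requires a running node. That overlap is why ik_max_word is typically used at index time and ik_smart at search time, as the mapping later in this article does.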
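Installing the pinyin analysis plugin for a local Elasticsearch
Go to https://github.com/medcl/elasticsearch-analysis-pinyin/releases and download the pinyin analysis plugin that matches your Elasticsearch version, for example elasticsearch-analysis-pinyin-7.2.0.zip. Open the archive and you should see the following files: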
elasticsearch-analysis-pinyin-7.2.0.jar
nlp-lang-1.7.jar
plugin-descriptor.properties
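If everything looks right, extract the archive into the plugins directory of your Elasticsearch installation. For example, my path is D:\Tools\elasticsearch-7.2.0\plugins\elasticsearch-analysis-pinyin-7.2.0
Restart Elasticsearch and the console log should show: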
loaded plugin [analysis-pinyin]
Test it:
POST /_analyze
{
"text":"中華人民共和國國徽",
"analyzer":"pinyin"
}
Response:
{
"tokens": [
{
"token": "zhong",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 0
},
{
"token": "zhrmghggh",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 0
},
{
"token": "hua",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 1
},
{
"token": "ren",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 2
},
{
"token": "min",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 3
},
{
"token": "gong",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 4
},
{
"token": "he",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 5
},
{
"token": "guo",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 6
},
{
"token": "guo",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 7
},
{
"token": "hui",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 8
}
]
}
The pinyin analysis plugin is now installed.
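In the response above, one token (zhrmghggh) is not a syllable: it is the first letter of every syllable joined together. The sketch below only illustrates that relationship using the tokens from the response. It is not how the plugin is implemented, and the helper names are hypothetical.

```python
# Tokens copied from the pinyin response in the article, in order.
PINYIN_TOKENS = [
    "zhong", "zhrmghggh", "hua", "ren", "min",
    "gong", "he", "guo", "guo", "hui",
]


def first_letter_abbreviation(syllables):
    """Join the first letter of each (non-empty) syllable."""
    return "".join(s[0] for s in syllables)


def split_tokens(tokens):
    """Return (syllables, abbreviation).

    The abbreviation is the token that equals the first letters of all
    other tokens; it is None when no such token exists.
    """
    for i, candidate in enumerate(tokens):
        rest = tokens[:i] + tokens[i + 1:]
        if rest and candidate == first_letter_abbreviation(rest):
            return rest, candidate
    return list(tokens), None


if __name__ == "__main__":
    syllables, abbreviation = split_tokens(PINYIN_TOKENS)
    print(syllables)
    print(abbreviation)
```

This is why a pinyin-analyzed field can be matched by full syllables as well as by initials.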
Using the IK and pinyin plugins (see the documentation on GitHub for detailed usage)
- Create the index and configure the pinyin analyzer
PUT /book
{
"settings": {
"analysis": {
"analyzer": {
"pinyin_analyzer": {
"tokenizer": "my_pinyin"
}
},
"tokenizer": {
"my_pinyin": {
"type": "pinyin",
"keep_separate_first_letter": false,
"keep_full_pinyin": true,
"keep_original": true,
"limit_first_letter_length": 16,
"lowercase": true,
"remove_duplicated_term": true
}
}
}
}
}
Response:
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "book"
}
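In the settings above, the analyzer pinyin_analyzer refers to the tokenizer my_pinyin by name, so a typo in either place would make index creation fail. A small sanity check before sending the request might look like the sketch below. The helper name is hypothetical, and the check only knows about tokenizers defined in the same settings body, so built-in or plugin-provided tokenizer names would also be reported.

```python
# Settings body from the article, as a Python dict.
BOOK_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "pinyin_analyzer": {"tokenizer": "my_pinyin"},
            },
            "tokenizer": {
                "my_pinyin": {
                    "type": "pinyin",
                    "keep_separate_first_letter": False,
                    "keep_full_pinyin": True,
                    "keep_original": True,
                    "limit_first_letter_length": 16,
                    "lowercase": True,
                    "remove_duplicated_term": True,
                },
            },
        }
    }
}


def tokenizers_not_defined_locally(body):
    """Map analyzer name -> tokenizer name for references that are not
    defined in the same settings body."""
    analysis = body.get("settings", {}).get("analysis", {})
    defined = set(analysis.get("tokenizer", {}))
    missing = {}
    for name, spec in analysis.get("analyzer", {}).items():
        tokenizer = spec.get("tokenizer")
        if tokenizer is not None and tokenizer not in defined:
            missing[name] = tokenizer
    return missing


if __name__ == "__main__":
    print(tokenizers_not_defined_locally(BOOK_SETTINGS))
```

An empty result means every analyzer in the body points at a tokenizer that the same body defines.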
- Create the mapping and assign analyzers to the fields: name uses pinyin analysis, content uses IK analysis, and describe uses both pinyin and IK analysis
POST /book/_mapping
{
"properties": {
"name": {
"type": "keyword",
"fields": {
"pinyin": {
"type": "text",
"store": false,
"term_vector": "with_offsets",
"analyzer": "pinyin_analyzer",
"boost": 10
}
}
},
"content": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
},
"describe": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart",
"fields": {
"pinyin": {
"type": "text",
"store": false,
"term_vector": "with_offsets",
"analyzer": "pinyin_analyzer",
"boost": 10
}
}
},
"id": {
"type": "long"
}
}
}
Response:
{
"acknowledged": true
}
The index and its per-field analyzers are now set up.
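With the mapping in place, a search can target the analyzed fields together. The following sketch builds a `multi_match` query body for `POST /book/_search`. The choice of fields, the `size` default and the function name are assumptions made for illustration and are not taken from the original project.

```python
import json


def build_book_query(keyword, size=10):
    """Build a search body that matches keyword against the IK- and
    pinyin-analyzed fields defined in the mapping."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": keyword,
                "fields": [
                    "name.pinyin",      # pinyin sub-field of name
                    "content",          # IK-analyzed text
                    "describe",         # IK-analyzed text
                    "describe.pinyin",  # pinyin sub-field of describe
                ],
            }
        },
    }


if __name__ == "__main__":
    # ensure_ascii=False keeps Chinese keywords readable when printed.
    print(json.dumps(build_book_query("guohui"), ensure_ascii=False, indent=2))
```

A pinyin keyword such as `guohui` is intended to hit the `.pinyin` sub-fields, while a Chinese keyword would mainly hit the IK-analyzed fields. Actual results depend on the documents indexed.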