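Why Chinese word segmentation is needed
I won't reinvent the wheel here; this article explains it clearly: elasticsearch 利用ik分詞搜索 (using ik segmentation for search in Elasticsearch). In short, Elasticsearch's built-in standard analyzer splits Chinese text into single characters, so queries are matched character by character rather than by word. The ik plugin segments Chinese text into real words.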
Installing ik
The ik GitHub repository (https://github.com/medcl/elasticsearch-analysis-ik) has an install section that explains how to install it.
Since I want to deploy with Docker, my approach is: download the plugin from the releases page and package it into a Docker image.
# Create a directory for building the Docker image
mkdir elasticsearch-ik
cd elasticsearch-ik
Write the Dockerfile:
# Use '>' rather than '>>' so re-running this does not append a second copy
echo "FROM elasticsearch:5.5.1
RUN mkdir -p /usr/share/elasticsearch/plugins/ik
COPY ./ik /usr/share/elasticsearch/plugins/ik/" > Dockerfile
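Download the prebuilt package from the Releases page: https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v5.5.1 (make sure to pick the release that matches your ES version).
Unzip the download into an ik directory here, so the layout looks like this: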
├── Dockerfile
└── ik
    ├── commons-codec-1.9.jar
    ├── commons-logging-1.2.jar
    ├── config
    │   ├── extra_main.dic
    │   ├── extra_single_word.dic
    │   ├── extra_single_word_full.dic
    │   ├── extra_single_word_low_freq.dic
    │   ├── extra_stopword.dic
    │   ├── IKAnalyzer.cfg.xml
    │   ├── main.dic
    │   ├── preposition.dic
    │   ├── quantifier.dic
    │   ├── stopword.dic
    │   ├── suffix.dic
    │   └── surname.dic
    ├── elasticsearch-analysis-ik-5.5.1.jar
    ├── httpclient-4.5.2.jar
    ├── httpcore-4.4.4.jar
    └── plugin-descriptor.properties
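Before building, it can help to confirm that the files really sit directly under ik/: Elasticsearch looks for plugin-descriptor.properties at the top of each plugin directory, and depending on how the archive unpacks, the files can end up one folder deeper. The sketch below is a hypothetical helper, `check_ik_layout`; the files it checks are taken from the tree above.

```shell
# Hypothetical helper: check_ik_layout [DIR]  (DIR defaults to ./ik)
# Assumption: ES 5.x only loads a plugin whose descriptor and jar sit
# directly inside plugins/<name>/, so we check for them at the top level.
check_ik_layout() {
  local dir="${1:-./ik}"
  local f
  for f in plugin-descriptor.properties config/IKAnalyzer.cfg.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
  # The glob stays literal if nothing matches, so test the first expansion.
  set -- "$dir"/elasticsearch-analysis-ik-*.jar
  if [ ! -f "$1" ]; then
    echo "missing: $dir/elasticsearch-analysis-ik-*.jar" >&2
    return 1
  fi
  echo "ok: $dir"
}
```

Run `check_ik_layout ./ik` in the build directory; it prints `ok: ./ik`, or names the first missing file and returns non-zero.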
Build the image:
sudo docker build . -t bysir/elasticsearch:ik-5.5.1
# Push it to the remote registry
sudo docker push bysir/elasticsearch:ik-5.5.1
Done. The image is used exactly like the original official image.
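To check that the image actually loads the plugin, start a container and ask the node which plugins it has through the `_cat/plugins` API. A sketch, assuming the node is reachable at http://localhost:9200 and that ik registers under the name `analysis-ik` (the `name=` line of plugin-descriptor.properties shows the exact name); `ik_plugin_loaded` is a hypothetical helper.

```shell
# Hypothetical helper: ik_plugin_loaded [ES_URL]
# _cat/plugins prints one line per node and plugin: "<node name> <plugin name> <version>".
# Assumption: the ik plugin shows up as "analysis-ik".
ik_plugin_loaded() {
  curl -s "${1:-http://localhost:9200}/_cat/plugins" | grep -q 'analysis-ik'
}
```

For example: `sudo docker run -d -p 9200:9200 bysir/elasticsearch:ik-5.5.1`, wait for the node to finish starting, then `ik_plugin_loaded && echo "ik is loaded"`.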
Setting ik as the analyzer
With the plugin installed, how do we actually use it?
Create a new index whose mapping specifies ik as the analyzer:
curl -XPUT http://localhost:9200/content -H 'Content-Type:application/json' -d'
{
  "mappings": {
    "content": {
      "properties": {
        "data": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "id": {
          "type": "text"
        }
      }
    }
  }
}'
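Once the index exists, the `_analyze` API shows which tokens an analyzer produces, a quick way to see the difference ik makes. ik provides `ik_max_word` (finest-grained split) and `ik_smart` (coarsest split); `standard` is the built-in analyzer. The `analyze` function and the `ES_URL` variable are hypothetical, and the text is inserted into the JSON without escaping, so it must not contain double quotes or backslashes.

```shell
# Hypothetical helper: analyze ANALYZER TEXT
# ES_URL defaults to a local node (assumption).
# TEXT is pasted into the JSON body as-is: no quotes or backslashes allowed.
analyze() {
  local analyzer="$1" text="$2"
  curl -s -XGET "${ES_URL:-http://localhost:9200}/_analyze?pretty" \
    -H 'Content-Type: application/json' \
    -d "{\"analyzer\": \"$analyzer\", \"text\": \"$text\"}"
}
```

For example, `for a in standard ik_smart ik_max_word; do analyze "$a" "中华人民共和国国歌"; done` should show `standard` emitting one token per character, while the two ik analyzers emit words.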
That raises a question: what if the index already exists?
Modifying the mapping of an existing index
ES also provides an API for updating a mapping:
curl -XPOST http://localhost:9200/content/content/_mapping -H 'Content-Type:application/json' -d'
{
  "properties": {
    "data": {
      "type": "text",
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_max_word"
    },
    "id": {
      "type": "text"
    }
  }
}'
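The put-mapping API can still add new fields to an existing mapping; what it refuses is changing the analyzer of a field that already exists. So a brand-new field can use ik without any reindexing. A sketch, assuming the ES 5.x `PUT /{index}/_mapping/{type}` form of the endpoint; `add_ik_field` and the field name `title` in the usage note are hypothetical.

```shell
# Hypothetical helper: add_ik_field INDEX TYPE FIELD
# Adds a NEW text field analyzed by ik. Adding fields is allowed;
# changing the analyzer of an existing field is rejected by ES.
add_ik_field() {
  local index="$1" type="$2" field="$3"
  curl -s -XPUT "${ES_URL:-http://localhost:9200}/$index/_mapping/$type" \
    -H 'Content-Type: application/json' \
    -d "{
  \"properties\": {
    \"$field\": {
      \"type\": \"text\",
      \"analyzer\": \"ik_max_word\",
      \"search_analyzer\": \"ik_max_word\"
    }
  }
}"
}
```

For example, `add_ik_field content content title`. Documents indexed earlier simply have no value for the new field.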
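However, if the analyzer of an existing property changes, the update fails with the error: Mapper for [content] conflicts with existing mapping in other types:\n[mapper [data] has different [analyzer]]. The analyzer of an existing field cannot be changed, because the documents already indexed were tokenized with the old analyzer.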
So what now?
Using reindex to resolve the mapping conflict
Googling the error message turns up the answer; see: change analyzer for an elasticsearch index?
- Create a new index with the mapping you want
- Use "reindex" to copy the data from the old index to the new one
- Drop the old index, but create an alias with the name of the old index that points to the new index (because ElasticSearch does not allow you to rename an index.)
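The three steps above can be sketched in shell against the ES 5.x REST API (create index, `_reindex`, delete index, `_aliases`). It assumes the old index is `content` with a single `content` type, as in the mapping earlier; `es` and `migrate_to_ik` are hypothetical helpers, and the new index name is whatever you pass in. As a safety check, it refuses to delete the old index unless the reindex response reports an empty `failures` list.

```shell
# Hypothetical helper: es METHOD PATH [JSON_BODY]
# ES_URL defaults to a local node (assumption).
es() {
  if [ -n "$3" ]; then
    curl -s -X "$1" "${ES_URL:-http://localhost:9200}$2" \
      -H 'Content-Type: application/json' -d "$3"
  else
    curl -s -X "$1" "${ES_URL:-http://localhost:9200}$2"
  fi
}

# Hypothetical helper: migrate_to_ik OLD_INDEX NEW_INDEX
migrate_to_ik() {
  local old="$1" new="$2" out
  # 1. Create the new index with the ik mapping (same mapping as above).
  out=$(es PUT "/$new" '{
  "mappings": {
    "content": {
      "properties": {
        "data": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word"},
        "id": {"type": "text"}
      }
    }
  }
}')
  case "$out" in
    *'"acknowledged":true'*) ;;
    *) echo "create failed: $out" >&2; return 1 ;;
  esac
  # 2. Copy every document; each one is indexed again under the new mapping.
  out=$(es POST /_reindex "{\"source\": {\"index\": \"$old\"}, \"dest\": {\"index\": \"$new\"}}")
  case "$out" in
    *'"failures":[]'*) ;;
    *) echo "reindex failed: $out" >&2; return 1 ;;
  esac
  # 3. Drop the old index, then reuse its name as an alias of the new one.
  es DELETE "/$old" >/dev/null
  es POST /_aliases "{\"actions\": [{\"add\": {\"index\": \"$new\", \"alias\": \"$old\"}}]}"
}
```

For example, `migrate_to_ik content content_v2` (the new name is hypothetical). Writes that reach the old index while the reindex runs are not copied, so it is safest to pause writers during the migration.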
Tips:
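- reindex copies the old index's data into the new index and indexes every document again, so the new index's mapping takes effect.
- An alias is similar to a view in MySQL: the index's data can be accessed through the alias, so the index name in our code does not need to change.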
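Because code now talks to the alias, the next mapping change needs no DELETE before the switch: reindex into yet another index, then move the alias in a single `_aliases` request. Elasticsearch applies the actions of one `_aliases` request atomically, so clients using the alias never see it pointing at no index. A sketch; `swap_alias` is a hypothetical helper and the index names in the usage note are hypothetical.

```shell
# Hypothetical helper: swap_alias ALIAS FROM_INDEX TO_INDEX
# Both actions go in one _aliases request, which ES applies atomically.
swap_alias() {
  local alias="$1" from="$2" to="$3"
  curl -s -XPOST "${ES_URL:-http://localhost:9200}/_aliases" \
    -H 'Content-Type: application/json' \
    -d "{\"actions\": [
  {\"remove\": {\"index\": \"$from\", \"alias\": \"$alias\"}},
  {\"add\": {\"index\": \"$to\", \"alias\": \"$alias\"}}
]}"
}
```

For example, `swap_alias content content_v2 content_v3`; the old index can be deleted afterwards at leisure.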