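1. Create the index and design the mapping
Full pinyin and first-letter pinyin need to go into two separate fields. At first I tried to handle both with a single field, but no combination of settings met the requirement.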
PUT aikg_test
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword",
        "fields": {
          "full_pinyin": {
            "type": "text",
            "store": false,
            "term_vector": "with_offsets",
            "analyzer": "full_pinyin_analyzer",
            "boost": 10
          },
          "first_pinyin": {
            "type": "text",
            "store": false,
            "term_vector": "with_offsets",
            "analyzer": "first_pinyin_analyzer",
            "boost": 10
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "first_pinyin_analyzer": {
          "tokenizer": "first_pinyin_letter"
        },
        "full_pinyin_analyzer": {
          "tokenizer": "full_pinyin_letter"
        }
      },
      "tokenizer": {
        "first_pinyin_letter": {
          "type": "pinyin",
          "keep_first_letter": true,
          "keep_full_pinyin": false,
          "keep_none_chinese": false,
          "keep_none_chinese_in_first_letter": true,
          "none_chinese_pinyin_tokenize": false
        },
        "full_pinyin_letter": {
          "type": "pinyin",
          "keep_first_letter": false,
          "keep_full_pinyin": false,
          "keep_none_chinese": true,
          "keep_none_chinese_in_first_letter": false,
          "none_chinese_pinyin_tokenize": false,
          "keep_joined_full_pinyin": true,
          "keep_none_chinese_in_joined_full_pinyin": true
        }
      }
    }
  }
}
2. Tokenization examples
2.1 Full-pinyin tokenization
Parameter settings:
"full_pinyin_letter": {
  "type": "pinyin",
  "keep_first_letter": false,
  "keep_full_pinyin": false,
  "keep_none_chinese": true,
  "keep_none_chinese_in_first_letter": false,
  "none_chinese_pinyin_tokenize": false,
  "keep_joined_full_pinyin": true,
  "keep_none_chinese_in_joined_full_pinyin": true
}
Analyze request:
GET aikg_test/_analyze
{
  "text": ["劉德華at2016"],
  "analyzer": "full_pinyin_analyzer"
}
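Tokenization result:
Key parameters: "keep_joined_full_pinyin": true and "keep_none_chinese_in_joined_full_pinyin": true. The former joins the full pinyin of all the Chinese characters into a single token; the latter keeps that joined pinyin attached to the non-Chinese characters. Also note "keep_full_pinyin": false, which suppresses the separate per-character tokens (liu, de, hua).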
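To make the effect of the two "joined" parameters concrete, here is a toy Python model of the joining step. It is only an illustration, not the plugin's implementation: the three-entry pinyin table and the function name joined_full_pinyin are made up for this example, and it models only the joined token.

```python
# Toy pinyin table covering only the characters of the example
# (an assumption for illustration; the real plugin ships a full dictionary).
TOY_PINYIN = {"劉": "liu", "德": "de", "華": "hua"}


def joined_full_pinyin(text: str, keep_non_chinese: bool = True) -> str:
    """Join the full pinyin of every known character into one string.

    keep_non_chinese mimics the idea behind
    keep_none_chinese_in_joined_full_pinyin: when True, characters that
    have no pinyin entry stay inside the joined string.
    """
    parts = []
    for ch in text:
        if ch in TOY_PINYIN:
            parts.append(TOY_PINYIN[ch])
        elif keep_non_chinese:
            parts.append(ch.lower())
    return "".join(parts)


print(joined_full_pinyin("劉德華at2016"))         # liudehuaat2016
print(joined_full_pinyin("劉德華at2016", False))  # liudehua
```

With the non-Chinese characters kept in the joined string, a prefix such as "liudehuaat" can still match the whole name.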
2.2 First-letter pinyin tokenization
Parameter settings:
"first_pinyin_letter": {
  "type": "pinyin",
  "keep_first_letter": true,
  "keep_full_pinyin": false,
  "keep_none_chinese": false,
  "keep_none_chinese_in_first_letter": true,
  "none_chinese_pinyin_tokenize": false
}
Analyze request:
GET aikg_test/_analyze
{
  "text": ["劉德華at2016"],
  "analyzer": "first_pinyin_analyzer"
}
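Tokenization result:
Key parameter: "keep_none_chinese": false. If it is set to true, "劉德華at2016" is split into two tokens, with the non-Chinese part becoming a token of its own. A prefix match on "at" would then hit this entry, even though the name does not actually start with "at". The tokenization result in that case is shown in the figure below:
Setting "keep_none_chinese_in_first_letter": true joins the first letters of the Chinese characters with the other characters, so "劉德華at2016" becomes "ldhat2016".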
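The false-positive problem described above can be reproduced with plain string prefixes. The sketch below is a simulation in Python, not a call into Elasticsearch; the helper prefix_hits and both token lists are assumptions written from the description in this section, not captured plugin output.

```python
def prefix_hits(tokens, prefix):
    """Return the tokens that a prefix lookup for `prefix` would match."""
    return [token for token in tokens if token.startswith(prefix)]


# Assumed tokens when the non-Chinese part is emitted as its own token
# ("keep_none_chinese": true).
split_tokens = ["ldh", "at2016"]

# Assumed single token when the first letters and the non-Chinese part
# stay joined ("keep_none_chinese": false plus
# "keep_none_chinese_in_first_letter": true).
joined_tokens = ["ldhat2016"]

print(prefix_hits(split_tokens, "at"))    # ['at2016'] -> unwanted hit
print(prefix_hits(joined_tokens, "at"))   # [] -> no false positive
print(prefix_hits(joined_tokens, "ldh"))  # ['ldhat2016']
```

Keeping everything in one token means a prefix only matches when the name really starts with it.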
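3. Case sensitivity
When the query parameter is uppercase, for example "LDH", it fails to match 劉德華. The fix is simple: convert the parameter to lowercase in the application before sending the query.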
For the other parameters, see the pinyin analysis plugin on GitHub: elasticsearch-analysis-pinyin.