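Problem Description
Search is one of the most fundamental and important features of an e-commerce site, and a good experience in the search box can bring the business higher revenue. Let's start by looking at the search suggestions on Taobao, JD.com, and Amazon.
On Taobao, typing 【衛衣】 (hoodie) into the search box shows suggested terms together with related tags.
On JD.com, typing 【衛衣】 shows, to the right of each suggested term, the number of products associated with it.
On Amazon, typing 【衛衣】 shows, at the top of the suggestion list, options to search within a specific category.
Comparing these, each site emphasizes slightly different aspects of search suggestions, but the core problems are: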
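- Sources of suggested terms: product category names, brand names, product attributes, high-frequency words from product names, and trending searches; combined terms such as “category + gender” and “category + attribute”; and manually added custom terms.
- Deduplication when maintaining suggested terms: for example, “Nike” and “nike” should be treated as the same term.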
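The two rules in the list above (merge terms from several sources, then deduplicate) can be sketched in Python. This is a minimal sketch, assuming terms arrive as plain strings grouped by source name; the functions normalize and collect_suggestions, and the choice of NFKC folding plus casefold() as the definition of "the same term", are illustrative assumptions rather than the author's code.

```python
from __future__ import annotations

import unicodedata
from typing import Iterable


def normalize(term: str) -> str:
    """Map spellings that should count as the same suggestion to one key.

    NFKC folds full-width forms (e.g. "ＮＩＫＥ") to ASCII, casefold()
    removes case differences, and split/join collapses runs of whitespace.
    """
    folded = unicodedata.normalize("NFKC", term).casefold()
    return " ".join(folded.split())


def collect_suggestions(sources: dict[str, Iterable[str]]) -> dict[str, tuple[str, str]]:
    """Merge terms from several sources; the first spelling seen wins.

    Returns {normalized_key: (display_term, source_name)}.
    """
    merged: dict[str, tuple[str, str]] = {}
    for source_name, terms in sources.items():
        for term in terms:
            key = normalize(term)
            if key and key not in merged:
                merged[key] = (term.strip(), source_name)
    return merged


# Hypothetical sample input.
categories = ["衛衣", "T恤"]
genders = ["男", "女"]
merged = collect_suggestions({
    "category": categories,
    "brand": ["Nike", "adidas"],
    # "category + gender" combinations, as described in the list above
    "combo": [f"{c} {g}" for c in categories for g in genders],
    "custom": ["nike", "衛衣  男"],  # duplicates of a brand and of a combo
})
print(len(merged))
```

Because sources are processed in insertion order, listing the most authoritative source first (here the category list) decides which display spelling survives a collision.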
Keyword index mapping:
curl -XPUT http://192.168.138.210:9200/keyword/ -H 'Content-Type: application/json' -d'
{
"settings" : {
"analysis" : {
"analyzer" : {
"first_py_letter_analyzer" : {
"tokenizer" : "first_py_letter",
"filter":"edgeNGram_filter"
},
"full_pinyin_letter_analyzer" : {
"tokenizer" : "full_pinyin_letter",
"filter":"edgeNGram_filter"
},
"edgeNGram_analyzer":{
"tokenizer" : "edgeNGram_tokenizer"
}
},
"tokenizer" : {
"first_py_letter" : {
"type" : "pinyin",
"keep_first_letter" : true,
"keep_full_pinyin" : false,
"keep_original" : false,
"limit_first_letter_length" : 16,
"lowercase" : true,
"trim_whitespace" : true,
"keep_none_chinese_in_first_letter": false,
"none_chinese_pinyin_tokenize": false,
"keep_none_chinese": true,
"keep_none_chinese_in_joined_full_pinyin": true
},
"full_pinyin_letter" : {
"type": "pinyin",
"keep_separate_first_letter": false,
"keep_full_pinyin": false,
"keep_original": false,
"limit_first_letter_length": 16,
"lowercase": true,
"keep_first_letter": false,
"keep_none_chinese_in_first_letter": false,
"none_chinese_pinyin_tokenize": false,
"keep_none_chinese": true,
"keep_joined_full_pinyin": true,
"keep_none_chinese_in_joined_full_pinyin": true
},
"edgeNGram_tokenizer":{
"type": "edgeNGram",
"min_gram": 1,
"max_gram": 15,
"token_chars": ["letter", "digit"]
}
},
"filter":{
"edgeNGram_filter":{
"type": "edgeNGram",
"min_gram": 1,
"max_gram": 50
}
}
},
"number_of_shards": 5,
"number_of_replicas": 1
},
"mappings":{
"suggestion":{
"properties": {
"keyword": {
"type": "keyword",
"fields": {
"keyword_ik": {
"type": "text",
"analyzer": "edgeNGram_analyzer"
},
"keyword_pinyin": {
"type": "text",
"analyzer": "full_pinyin_letter_analyzer"
},
"keyword_first_py": {
"type": "text",
"analyzer": "first_py_letter_analyzer"
}
}
},
"count": {
"type": "long"
},
"weight": {
"type": "integer"
}
}
}
}
}'
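To see why these subfields make prefix and pinyin input match, the sketch below models the tokens each analyzer would emit. It is an approximation, not the plugin's behavior: the real pinyin tokenizer converts any Chinese text, whereas here a hypothetical hard-coded table (PINYIN) covers only the characters in the example, and the helper names are made up for illustration.

```python
from __future__ import annotations

# Hypothetical stand-in for the pinyin tokenizer: covers only these characters.
PINYIN = {"衛": "wei", "衣": "yi", "男": "nan"}


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 15) -> list[str]:
    """Leading substrings of one token, as an edge n-gram step with these bounds emits."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def joined_full_pinyin(text: str) -> str:
    """Rough model of keep_joined_full_pinyin: syllables glued together,
    non-Chinese characters kept (keep_none_chinese), lowercased."""
    return "".join(PINYIN.get(ch, ch) for ch in text).lower()


def first_letters(text: str, limit: int = 16) -> str:
    """Rough model of keep_first_letter with keep_none_chinese_in_first_letter=false:
    one initial per Chinese character, truncated to limit_first_letter_length."""
    return "".join(PINYIN[ch][0] for ch in text if ch in PINYIN)[:limit]


for label, tokens in [
    ("keyword_ik", edge_ngrams("衛衣男", 1, 15)),
    ("keyword_pinyin", edge_ngrams(joined_full_pinyin("衛衣"), 1, 50)),
    ("keyword_first_py", edge_ngrams(first_letters("衛衣"), 1, 50)),
]:
    print(label, tokens)
```

Under this model, 衛衣 is indexed with pinyin prefixes such as wei in keyword_pinyin and initials such as wy in keyword_first_py, although the search request that follows only queries keyword_ik and keyword_pinyin.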
Search request (the body of a POST to http://192.168.138.210:9200/keyword/_search):
{
"sort": [
{
"weight": "desc"
},
{
"count": "desc"
},
{
"_score": "desc"
}
],
"query": {
"dis_max": {
"queries": [
{
"match_phrase": {
"keyword.keyword_ik": {
"query": "衛衣"
}
}
},
{
"match_phrase": {
"keyword.keyword_pinyin": {
"query": "衛衣",
"boost": 2
}
}
}
],
"tie_breaker": 1
}
}
}
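In an application the search body is built per keystroke rather than hard-coded. A minimal Python sketch, assuming the index lives at the host from the mapping step and is queried through its _search endpoint; build_suggestion_query and search are hypothetical helper names, and only the standard library is used.

```python
import json
import urllib.request

# Assumed endpoint: the keyword index created above, queried via _search.
SEARCH_URL = "http://192.168.138.210:9200/keyword/_search"


def build_suggestion_query(user_input: str, pinyin_boost: float = 2) -> dict:
    """Return the dis_max body shown above with the user's input filled in."""
    return {
        "sort": [{"weight": "desc"}, {"count": "desc"}, {"_score": "desc"}],
        "query": {
            "dis_max": {
                "queries": [
                    {"match_phrase": {"keyword.keyword_ik": {"query": user_input}}},
                    {"match_phrase": {"keyword.keyword_pinyin": {
                        "query": user_input, "boost": pinyin_boost}}},
                ],
                "tie_breaker": 1,
            }
        },
    }


def search(body: dict, url: str = SEARCH_URL) -> dict:
    """POST a JSON body to Elasticsearch and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


print(json.dumps(build_suggestion_query("衛衣"), ensure_ascii=False, indent=2))
```

With tie_breaker set to 1, dis_max adds the scores of both subqueries instead of keeping only the best one, so a term matching on both the character and pinyin subfields scores higher.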
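If Elasticsearch returns an empty result, spelling correction should be applied next. Spelling correction could also be included in the same Elasticsearch search request, but most of the time users have not misspelled anything, so it is better to call the suggester separately afterwards. If the suggester's result is not empty, call the suggestion service again with the corrected term. For example, if the user types 【adidss】 and the Elasticsearch suggester returns 【adidas】, suggestion processing is rerun with adidas. The phrase suggester request, also sent to the _search endpoint, is: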
{
"suggest": {
"keyword_suggestion": {
"text": "adidss",
"phrase": {
"field": "keyword.keyword_pinyin",
"size": 1,
"analyzer":"standard"
}
}
}
}
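The fallback flow described above (suggest first, correct only on an empty result, then retry once) can be sketched as plain control flow. This is a sketch under assumptions: fetch_suggestions and fetch_spelling are hypothetical hooks that would wrap the two Elasticsearch requests, and the stubs below replace them so the logic runs without a cluster. The response parsing follows the phrase suggester's response shape, where each entry under the suggestion name carries an options list ordered best-first.

```python
from __future__ import annotations

from typing import Callable, Optional


def top_phrase_suggestion(response: dict, name: str = "keyword_suggestion") -> Optional[str]:
    """Return the best option text from a phrase-suggester response, if any."""
    for entry in response.get("suggest", {}).get(name, []):
        options = entry.get("options", [])
        if options:
            return options[0]["text"]
    return None


def suggest_with_correction(
    user_input: str,
    fetch_suggestions: Callable[[str], list],
    fetch_spelling: Callable[[str], dict],
) -> list:
    """Query suggestions first; only if nothing comes back, ask the phrase
    suggester for a correction and retry once with the corrected term."""
    hits = fetch_suggestions(user_input)
    if hits:
        return hits
    corrected = top_phrase_suggestion(fetch_spelling(user_input))
    if corrected and corrected != user_input:
        return fetch_suggestions(corrected)
    return []


# Stubs standing in for the two Elasticsearch calls (hypothetical data).
fake_index = {"adidas": ["adidas", "adidas 衛衣"]}
fake_spelling = {"suggest": {"keyword_suggestion": [
    {"text": "adidss", "options": [{"text": "adidas"}]}]}}

print(suggest_with_correction("adidss", lambda t: fake_index.get(t, []), lambda t: fake_spelling))
```

Retrying only once, and only when the correction differs from the input, keeps a bad correction from looping.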
About sorting: our implementation sorts by weight and then by count. At present, weight reflects only the type of the suggested term (for example, category > brand > attribute).
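The type-based weighting can be illustrated with a small Python model of the sort clause. It is a sketch, assuming weight is computed from the term type at index time; the numeric values in TYPE_WEIGHT and the sample items are hypothetical, and only the ordering category > brand > attribute comes from the text.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical numeric weights; only the ordering comes from the text.
TYPE_WEIGHT = {"category": 3, "brand": 2, "attribute": 1}


@dataclass
class Suggestion:
    keyword: str
    kind: str            # "category", "brand" or "attribute"
    count: int           # search count, stored in the "count" field
    score: float = 0.0   # relevance score Elasticsearch would return

    @property
    def weight(self) -> int:
        return TYPE_WEIGHT.get(self.kind, 0)


def rank(suggestions: list[Suggestion]) -> list[Suggestion]:
    """Same order as the sort clause: weight desc, count desc, _score desc."""
    return sorted(suggestions, key=lambda s: (-s.weight, -s.count, -s.score))


items = [
    Suggestion("連帽", "attribute", 900),
    Suggestion("Nike", "brand", 500),
    Suggestion("衛衣", "category", 100),
    Suggestion("adidas", "brand", 800),
]
print([s.keyword for s in rank(items)])
```

Because weight is the first sort key, a rarely searched category still outranks a popular brand; count only breaks ties between terms of the same type.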
The test data is shown below:
Entering the misspelled keyword “adidss” returns the following result:
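At this point the search suggestion feature is essentially complete, but many readers will probably wonder why I didn't use the completion suggester, which Elasticsearch provides specifically for autocompletion. The main reason is that the completion suggester cannot sort by document fields (it ranks only by the single weight attached to each suggestion input), whereas my requirement is to sort by keyword weight (weight) and then by search count (count).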