I. Introduction to the Four Suggesters
A Suggester works by breaking the input text into tokens, looking up similar terms in the index dictionary, and returning them. To cover different use cases, Elasticsearch provides four kinds of Suggester:
• Term Suggester
• Completion Suggester
• Phrase Suggester
• Context Suggester
II. Comparing the Four Suggesters [1]
Term Suggester: based on an edit-distance algorithm. The input text is analyzed before any suggestions are produced.
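To make "edit distance" concrete, here is a minimal sketch of the classic Levenshtein distance in plain Java. It is only an illustration: the class and method names are made up for this example, and Elasticsearch's own string-distance implementations may differ (for instance by also counting transpositions).

```java
// Illustrative sketch only: EditDistance is a hypothetical helper, not an Elasticsearch class.
final class EditDistance {

    /** Number of single-character insertions, deletions, or substitutions needed to turn a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("elastic", "elastik"));
    }
}
```

A term suggester can then rank dictionary terms by how few edits separate them from the misspelled input.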
Phrase Suggester: adds logic on top of the Term Suggester so that it selects entire corrected phrases, weighted by an n-gram language model, rather than individual tokens.
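Completion Suggester: supports prefix queries only. It is very fast, which suits its strict performance requirements.
• Use case: a request for matching items is sent immediately after every character the user types.
• Data structure: it is not backed by the inverted index. Instead, the analyzed data is encoded as an FST (finite state transducer) and stored alongside the index. The FST is loaded into memory, so lookups are fast.
• Limitation: the queried field must be mapped with type completion.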
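The prefix-only behavior can be sketched with an in-memory sorted set. This is a deliberately simplified stand-in, not an FST: a TreeSet merely shows why prefix lookups over an in-memory, ordered structure are cheap. The class name PrefixLookup and the sample terms in the usage are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Illustrative sketch only: a sorted set standing in for the in-memory FST.
final class PrefixLookup {
    private final NavigableSet<String> terms = new TreeSet<>();

    void add(String term) {
        terms.add(term);
    }

    /** Returns up to size terms that start with prefix, in sorted order. */
    List<String> complete(String prefix, int size) {
        List<String> matches = new ArrayList<>();
        // tailSet jumps to the first term >= prefix; all matches are contiguous from there
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix) || matches.size() >= size) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        PrefixLookup lookup = new PrefixLookup();
        lookup.add("elasticsearch");
        lookup.add("elastic");
        lookup.add("kibana");
        System.out.println(lookup.complete("ela", 10));
    }
}
```

Note that a query such as "search" finds nothing here even though "elasticsearch" contains it, which mirrors the prefix-only limitation described above.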
Context Suggester: lets suggestions be filtered by a context. Two context types are supported: category (an arbitrary string) and geo (geographic location).
Accuracy: completion > phrase > term
III. Mapping Settings for the Completion Suggester
The Completion Suggester's autocomplete and search-hint features require the queried field to be of type completion, so the Mapping must declare each such field with "type": "completion". The Mapping used here is:
PUT document
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "doc_name": {
        "type": "completion",
        "analyzer": "ik_max_word"
      },
      "doc_number": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "doc_type": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "keywords": {
        "type": "completion",
        "analyzer": "ik_max_word"
      },
      "pubdate": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      },
      "attachment": {
        "properties": {
          "content": {
            "type": "text",
            "analyzer": "ik_max_word"
          }
        }
      }
    }
  }
}
The attachment content is included because full-text search is also needed.
IV. Differences Between TransportClient and the REST Client [2]
Elasticsearch plans to deprecate TransportClient in Elasticsearch 7.0 and remove it entirely in 8.0. Use the Java High Level REST Client instead: it performs operations by sending HTTP requests, with no need for serialized Java requests.
TransportClient is the Java client object for Elasticsearch. It connects to an Elasticsearch cluster remotely through the transport module. The transport node does not join the cluster; it simply sends requests to nodes in the cluster. It balances load across the cluster's nodes in round-robin fashion, although most operations (requests) are likely to be "two-hop operations". (Image source: Elasticsearch: The Definitive Guide)
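The round-robin idea can be sketched in a few lines of plain Java. This is not TransportClient's actual code, only an illustration of rotating through a fixed node list. The class name RoundRobinNodes and the addresses in the usage are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: rotates through a fixed list of node addresses.
final class RoundRobinNodes {
    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinNodes(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = new ArrayList<>(nodes); // defensive copy
    }

    /** Returns the node that should receive the next request. */
    String pick() {
        // floorMod keeps the index non-negative even after the counter overflows
        int index = Math.floorMod(counter.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }

    public static void main(String[] args) {
        // Hypothetical addresses, for illustration only
        RoundRobinNodes nodes = new RoundRobinNodes(Arrays.asList("10.0.0.1:9300", "10.0.0.2:9300"));
        for (int i = 0; i < 3; i++) {
            System.out.println(nodes.pick());
        }
    }
}
```

The "two-hop" remark follows from this design: the node picked by rotation is often not the one holding the data, so it has to forward the request to the node that does.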
The Java REST client comes in two flavors:
• Java Low Level REST Client: the low-level Elasticsearch client. It communicates with an Elasticsearch cluster over HTTP. The API does not encode or decode data; that is left to the user. It is compatible with all Elasticsearch versions.
• Java High Level REST Client: the official high-level Elasticsearch client. It is built on the low-level client, and its API handles encoding requests and decoding responses.
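To show what "the user does the encoding" means in practice, the sketch below hand-builds the JSON body of a completion suggest request using only the JDK. The class SuggestRequestBody is hypothetical, and its escaping covers only quotes and backslashes, so real code would more likely use a JSON library. The "<field>_suggest" naming is an assumption borrowed from the Java method later in this article.

```java
// Illustrative sketch only: builds the request body a low-level client would have to encode itself.
final class SuggestRequestBody {

    /** Escapes backslashes and double quotes; control characters are not handled in this sketch. */
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds a completion suggest body named "<field>_suggest" that returns no _source. */
    static String build(String field, String prefix, int size) {
        return "{"
                + "\"_source\":false,"
                + "\"suggest\":{"
                + "\"" + escape(field) + "_suggest\":{"
                + "\"prefix\":\"" + escape(prefix) + "\","
                + "\"completion\":{\"field\":\"" + escape(field) + "\",\"size\":" + size + "}"
                + "}}}";
    }

    public static void main(String[] args) {
        System.out.println(build("doc_name", "ela", 10));
    }
}
```

With the low-level client, a string like this would be sent as the body of a POST to the index's _search endpoint, and the JSON response would have to be parsed by hand as well. The high-level client hides both steps behind typed request and response objects.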
V. Autocomplete with the Elasticsearch Java REST Client API
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.suggest.Suggest;
import org.elasticsearch.search.suggest.SuggestBuilder;
import org.elasticsearch.search.suggest.SuggestBuilders;
import org.elasticsearch.search.suggest.completion.CompletionSuggestion;
import org.elasticsearch.search.suggest.completion.CompletionSuggestionBuilder;

/**
 * @param suggestField the field to run the completion suggester against
 * @param suggestValue the prefix typed by the user
 * @return the list of completion suggestions
 * @throws IOException if the request to Elasticsearch fails
 */
public List<String> suggestSearchList(String suggestField, String suggestValue) throws IOException {
    // Elasticsearch 7.x no longer uses TransportClient for client connections, so the
    // request goes through restHighLevelClient (a RestHighLevelClient field of the enclosing class).
    // In 7.x the completion suggester is built through SuggestBuilders.completionSuggestion.
    String suggestName = suggestField + "_suggest";

    // Build the SearchRequest and SearchSourceBuilder, naming the index to query
    // SearchRequest searchRequest = new SearchRequest(ESConst.ES_INDEX);
    SearchRequest searchRequest = new SearchRequest("testdata");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

    // Control the returned content: exclude every field that is irrelevant to the suggestions
    String[] excludeFields = new String[] {"doc_number", "doc_type", "attachment", "doc_keywords", "id", "pubdate", "doc_name"};
    String[] includeFields = new String[] {""};
    searchSourceBuilder.fetchSource(includeFields, excludeFields);

    // Build the CompletionSuggestionBuilder from the query parameters
    CompletionSuggestionBuilder completionSuggestionBuilder =
            SuggestBuilders.completionSuggestion(suggestField).prefix(suggestValue).size(10);
    SuggestBuilder suggestBuilder = new SuggestBuilder();
    // Register the suggestion under its name
    suggestBuilder.addSuggestion(suggestName, completionSuggestionBuilder);
    searchSourceBuilder.suggest(suggestBuilder);
    searchRequest.source(searchSourceBuilder);

    // Execute the query
    SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

    // Collect the suggestion texts; a LinkedHashSet removes duplicates and keeps the returned order
    Set<String> suggestSet = new LinkedHashSet<>();
    Suggest suggest = searchResponse.getSuggest();
    if (suggest != null) {
        CompletionSuggestion result = suggest.getSuggestion(suggestName);
        if (result != null) {
            for (CompletionSuggestion.Entry entry : result.getEntries()) {
                for (CompletionSuggestion.Entry.Option option : entry.getOptions()) {
                    suggestSet.add(option.getText().toString());
                }
                // Stop once at least 10 distinct suggestions have been collected
                if (suggestSet.size() >= 10) {
                    break;
                }
            }
        }
    }
    return new ArrayList<>(suggestSet);
}
How the code works:
1. Instantiate SearchRequest and SearchSourceBuilder to query the document index.
2. Control which content the query returns: searchSourceBuilder.fetchSource takes includeFields and excludeFields, so unrelated fields are not fetched.
3. Build the completionSuggestionBuilder, passing in suggestField and suggestValue; size is set to 10 by default.
4. Define the suggest name and register it with suggestBuilder.addSuggestion.
5. Execute the query and call searchResponse.getSuggest to obtain the result.
6. Iterate over the text values in the Suggest, collect them into a list, and return it to the front end.
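Step 6 can be tried in isolation, without a running cluster. The sketch below uses plain lists of strings in place of the suggester's entries and options. SuggestionCollector is a hypothetical helper, and unlike the method above it checks the cap before every option rather than after every entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: nested string lists stand in for suggest entries and their options.
final class SuggestionCollector {

    /** Flattens the option texts, drops duplicates, keeps first-seen order, and stops at limit. */
    static List<String> collect(List<List<String>> entries, int limit) {
        Set<String> unique = new LinkedHashSet<>();
        for (List<String> options : entries) {
            for (String text : options) {
                if (unique.size() >= limit) {
                    return new ArrayList<>(unique);
                }
                unique.add(text); // adding an existing text leaves the set unchanged
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<List<String>> entries = Arrays.asList(
                Arrays.asList("elastic", "elasticsearch", "elastic"),
                Arrays.asList("elk", "elasticsearch"));
        System.out.println(collect(entries, 10));
    }
}
```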
VI. Screenshot of the Result
References
[1] Comparison of the four Suggesters: http://www.reibang.com/p/34db35d13cd3
[2] Differences between TransportClient and the REST client: https://blog.csdn.net/prestigeding/article/details/83188043