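Introduction
Suppose a user searches for "住宿" (accommodation), but no hotel has "住宿" in its name, so the user gets a blank page. Besides adding "酒店" (hotel) as a synonym of "住宿", the problem can also be solved like this: maintain a set of keywords for every category_id and keep that relationship in a Map. When a keyword reaches the Service layer, analyze it first, then look up the resulting tokens in the Map to find the matching category_id. That category_id can then go into the query to affect recall, or into a filter to affect ranking. The usual approach is to affect ranking first and, if that returns no results, fall back to affecting recall.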
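The "ranking first, recall as fallback" flow can be sketched as follows. This is a minimal illustration, not the article's implementation: the class `CategorySearchFallback`, the `Mode` enum, and the `searcher` callback are hypothetical names introduced here, and the callback stands in for whatever code actually sends the query to Elasticsearch.

```java
import java.util.List;
import java.util.function.Function;

/** Sketch of the "ranking first, recall as fallback" flow (hypothetical helper). */
final class CategorySearchFallback {

    /** How the matched category_id is applied to the query. */
    enum Mode { RANKING, RECALL }

    /**
     * Runs the search in RANKING mode first and retries in RECALL mode
     * only when the first attempt returns nothing.
     *
     * @param searcher hypothetical callback that executes the query for a given mode
     */
    static <T> List<T> search(Function<Mode, List<T>> searcher) {
        List<T> hits = searcher.apply(Mode.RANKING);
        if (hits != null && !hits.isEmpty()) {
            return hits;
        }
        // No hits: widen the candidate set by letting category_id affect recall.
        return searcher.apply(Mode.RECALL);
    }
}
```

The second query is only issued when the first one comes back empty, so the common case still costs a single round trip.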
Mapping between category_id and keywords
// All keywords that belong to each category_id
private Map<Integer,List<String>> categoryWorkMap = new HashMap<>();
@PostConstruct
public void init(){
categoryWorkMap.put(1,new ArrayList<>());
categoryWorkMap.put(2,new ArrayList<>());
// category 1: "eat a meal", "afternoon tea"
categoryWorkMap.get(1).add("吃飯");
categoryWorkMap.get(1).add("下午茶");
// category 2: "rest", "sleep", "accommodation"
categoryWorkMap.get(2).add("休息");
categoryWorkMap.get(2).add("睡覺");
categoryWorkMap.get(2).add("住宿");
}
Looking up the category_id for a keyword
/**
* GET /shop/_analyze
* {
* "field": "name",
* "text": "凱悅"
* }
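* Analyzes the keyword first, then checks whether each resulting token maps to a category_id, and returns the category_id matched by each token.
* @param keyword the raw search keyword entered by the user
* @return a map from each matched token to its category_id
* @throws IOException if the _analyze request fails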
*/
private Map<String,Object> analyzeCategoryKeyword(String keyword) throws IOException {
Map<String,Object> res = new HashMap<>();
Request request = new Request("GET","/shop/_analyze");
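// Build the body with JSONObject so quotes or backslashes in the keyword are escaped.
JSONObject body = new JSONObject();
body.put("field", "name");
body.put("text", keyword);
request.setJsonEntity(body.toJSONString());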
Response response = highLevelClient.getLowLevelClient().performRequest(request);
String responseStr = EntityUtils.toString(response.getEntity());
JSONObject jsonObject = JSONObject.parseObject(responseStr);
JSONArray jsonArray = jsonObject.getJSONArray("tokens");
for(int i = 0; i < jsonArray.size(); i++){
String token = jsonArray.getJSONObject(i).getString("token");
Integer categoryId = getCategoryIdByToken(token);
if(categoryId != null){
res.put(token, categoryId);
}
}
return res;
}
private Integer getCategoryIdByToken(String token){
for(Integer key : categoryWorkMap.keySet()){
List<String> tokenList = categoryWorkMap.get(key);
if(tokenList.contains(token)){
return key;
}
}
return null;
}
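getCategoryIdByToken scans every keyword list on each call. If the keyword sets grow, one option is to invert the map once into token -> category_id so each lookup becomes a single hash access. The sketch below assumes the input has the same shape as categoryWorkMap; the class name `TokenCategoryIndex` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: invert category_id -> keywords into token -> category_id (hypothetical helper). */
final class TokenCategoryIndex {

    private final Map<String, Integer> tokenToCategory = new HashMap<>();

    /** @param categoryKeywords same shape as categoryWorkMap: category_id -> keywords */
    TokenCategoryIndex(Map<Integer, List<String>> categoryKeywords) {
        for (Map.Entry<Integer, List<String>> entry : categoryKeywords.entrySet()) {
            for (String word : entry.getValue()) {
                // If a word is listed under several categories, the first one
                // encountered wins; that depends on the input map's iteration order.
                tokenToCategory.putIfAbsent(word, entry.getKey());
            }
        }
    }

    /** Returns the category_id for the token, or null when no category lists it. */
    Integer categoryIdOf(String token) {
        return tokenToCategory.get(token);
    }
}
```

The index would be built once, for example at the end of init(), and queried per token in place of the loop.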
Query DSL that affects recall
- Use the should clause of a bool query:
GET /shop/_search
{
"_source": "*",
"script_fields": {
"distance": {
"script": {
"source": "haversin(lat,lon,doc['location'].lat,doc['location'].lon)",
"lang": "expression",
"params": {"lat":31.23916171,"lon":121.48789949}
}
}
},
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{"match": {"name": {"query": "住宿","boost": 0.1}}},
{"term":{"category_id":2}}
]
}
},
{"term": {"seller_disabled_flag": 0}}
]
}
},
"functions": [
{
"gauss": {
"location": {
"origin": "31.23916171,121.48789949",
"scale": "100km",
"offset": "0km",
"decay": 0.5
}
},
"weight": 9
},
{
"field_value_factor": {
"field": "remark_score"
},
"weight": 0.2
},
{
"field_value_factor": {
"field": "seller_remark_score"
},
"weight": 0.1
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
},
"sort": [
{
"_score": {
"order":"desc"
}
}
]
}
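The inner bool/should clause of the query above could be assembled from the analyze result roughly like this. It is a sketch under assumptions: the class `CategoryRecallClause` and its `build` method are hypothetical, and the clause is built as nested Maps so that a JSON library (for example fastjson's JSON.toJSONString) can serialize it and handle escaping of the keyword.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

/** Sketch: build the bool/should clause that lets category_id widen recall (hypothetical helper). */
final class CategoryRecallClause {

    static Map<String, Object> build(String keyword, Collection<Integer> categoryIds) {
        List<Object> should = new ArrayList<>();

        // {"match": {"name": {"query": keyword, "boost": 0.1}}}
        Map<String, Object> nameParams = new LinkedHashMap<>();
        nameParams.put("query", keyword);
        nameParams.put("boost", 0.1);
        should.add(Collections.singletonMap("match",
                Collections.singletonMap("name", nameParams)));

        // One {"term": {"category_id": id}} per distinct matched category.
        for (Integer id : new LinkedHashSet<>(categoryIds)) {
            should.add(Collections.singletonMap("term",
                    Collections.singletonMap("category_id", id)));
        }

        return Collections.singletonMap("bool",
                Collections.singletonMap("should", should));
    }
}
```

Because the term clauses sit next to the match clause inside should, a document is recalled when either its name matches or it belongs to one of the matched categories.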
Query DSL that affects ranking
- Add a function with a filter to the functions of the function_score query:
GET /shop/_search
{
"_source": "*",
"script_fields": {
"distance": {
"script": {
"source": "haversin(lat,lon,doc['location'].lat,doc['location'].lon)",
"lang": "expression",
"params": {"lat":31.23916171,"lon":121.48789949}
}
}
},
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{"match": {"name": {"query": "住宿","boost": 0.1}}}
]
}
},
{"term": {"seller_disabled_flag": 0}}
]
}
},
"functions": [
{
"gauss": {
"location": {
"origin": "31.23916171,121.48789949",
"scale": "100km",
"offset": "0km",
"decay": 0.5
}
},
"weight": 9
},
{
"field_value_factor": {
"field": "remark_score"
},
"weight": 0.2
},
{
"field_value_factor": {
"field": "seller_remark_score"
},
"weight": 0.1
},
{
"filter": {"term":{"category_id": 2}},
"weight": 0.2
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
},
"sort": [
{
"_score": {
"order":"desc"
}
}
]
}
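For the ranking variant, the extra entries in "functions" could be generated the same way. The sketch below is an assumption-laden illustration: `CategoryRankingFunctions` and `filterFunctions` are hypothetical names, and the weight is passed in rather than fixed at the 0.2 used in the query above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

/** Sketch: build function_score entries that boost matched categories (hypothetical helper). */
final class CategoryRankingFunctions {

    static List<Map<String, Object>> filterFunctions(Collection<Integer> categoryIds, double weight) {
        List<Map<String, Object>> functions = new ArrayList<>();
        for (Integer id : new LinkedHashSet<>(categoryIds)) {
            // {"filter": {"term": {"category_id": id}}, "weight": weight}
            Map<String, Object> function = new LinkedHashMap<>();
            function.put("filter", Collections.singletonMap("term",
                    Collections.singletonMap("category_id", id)));
            function.put("weight", weight);
            functions.add(function);
        }
        return functions;
    }
}
```

Here the category only adds to the score of documents that the name match already recalled, which is why an empty result in this mode is the signal to retry with the recall variant.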