0. Background:
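One day, after I had solved the problem of paginating Elasticsearch aggregation results, the product manager came to me.
PM: We need to add a search feature here! It goes live tomorrow together with the other features!
Me: Sure. (WTF, did inspiration strike you on the toilet? Why wasn't this in the prototype you gave me?)
After silently sending my regards to the PM, I started thinking about how to implement it.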
1. Terms aggregation: include
ES version: 7.9.2
To get the demo data, see this article.
First, recall the earlier pagination solution:
GET employees/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"myTerms": {
"terms": {
"field": "job.keyword",
"size": 10
},
"aggs": {
"myBucketSort": {
"bucket_sort": {
"from": 0,
"size": 5,
"gap_policy": "SKIP"
}
}
}
},
"termsCount": {
"cardinality": {
"field": "job.keyword",
"precision_threshold": 30000
}
}
}
}
Here bucket_sort handles the pagination and cardinality supplies the total.
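To make the mechanics concrete, here is a small Python simulation (not Elasticsearch code) of what the request above computes. It is a sketch under stated assumptions: the terms ordering is taken to be doc_count descending with ties broken by key ascending, the job values are made up, and the helper names terms_agg, bucket_sort and cardinality are hypothetical stand-ins for the aggregations of the same names.

```python
from collections import Counter

def terms_agg(values, size):
    """Mimic a terms aggregation: count each distinct value, order by
    doc_count descending (ties by key ascending, an assumption here),
    and keep only the top `size` buckets."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered[:size]]

def bucket_sort(buckets, from_, size):
    """Mimic bucket_sort without a `sort` clause: slice the parent's buckets."""
    return buckets[from_:from_ + size]

def cardinality(values):
    """Exact distinct count. The real cardinality agg is approximate,
    though expected to be close to accurate below precision_threshold."""
    return len(set(values))

# Hypothetical job values, one per document.
jobs = (["Java Programmer"] * 7 + ["Javascript Programmer"] * 4
        + ["DBA"] * 5 + ["QA"] * 3 + ["Manager"])

buckets = terms_agg(jobs, size=10)
page1 = bucket_sort(buckets, from_=0, size=2)  # first page
page2 = bucket_sort(buckets, from_=2, size=2)  # second page
total = cardinality(jobs)                      # what termsCount reports
print(page1, page2, total)
```

Because bucket_sort only slices what the terms aggregation already returned, pages beyond the terms `size` (10 here) come back empty even when the cardinality total is larger.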
After a stroll through the official documentation, I found that the terms aggregation's include parameter looked like it could solve the problem, so I went straight for it:
GET employees/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"myTerms": {
"terms": {
"include": ".*Programmer.*",
"field": "job.keyword",
"size": 10
},
"aggs": {
"myBucketSort": {
"bucket_sort": {
"from": 0,
"size": 5,
"gap_policy": "SKIP"
}
}
}
},
"termsCount": {
"cardinality": {
"field": "job.keyword",
"precision_threshold": 30000
}
}
}
}
include: when given a string, it is treated as a regular expression; when given an array, it filters by exact match against multiple values.
For example:
...
"aggs": {
"myTerms": {
"terms": {
"include": ".*Programmer.*", # regex supported, but some characters are reserved
"field": "job.keyword",
"size": 10
}
}
...
...
"aggs": {
"myTerms": {
"terms": {
"include": ["Programmer","DBA"], # multiple values supported, but no regex
"field": "job.keyword",
"size": 10
}
}
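The two forms of include behave quite differently, which a short Python sketch can show. This is an approximation, not Elasticsearch's implementation: Python's `re.fullmatch` stands in for Lucene's regex engine (the syntaxes overlap for simple patterns like these but are not identical), and the helper name include_matches and the bare "Programmer" term are hypothetical. The key behaviour it models is that the string form must match the whole term, which is why the pattern is wrapped in `.*`.

```python
import re

def include_matches(term, include):
    """Approximate the terms aggregation's `include` filter.

    str  -> a regex that must match the WHOLE term (Lucene regexes are
            anchored), hence the leading/trailing `.*` in the examples.
    list -> exact, case-sensitive membership; no regex at all.
    """
    if isinstance(include, str):
        return re.fullmatch(include, term) is not None
    return term in include

# Hypothetical terms; "Programmer" on its own is made up for contrast.
terms = ["Java Programmer", "Javascript Programmer", "DBA", "Programmer"]

regex_hits = [t for t in terms if include_matches(t, ".*Programmer.*")]
anchored_hits = [t for t in terms if include_matches(t, "Programmer")]
exact_hits = [t for t in terms if include_matches(t, ["Programmer", "DBA"])]
print(regex_hits, anchored_hits, exact_hits)
```

Without the `.*` wrapper, "Programmer" only matches the term that is exactly "Programmer", and the array form never does partial matching.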
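Just as I was quietly celebrating, I noticed that the total returned above is the total without the include filter applied, so it doesn't meet the requirement.
2. cardinality: script
Use a script in the cardinality aggregation to apply the same filter: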
GET employees/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"myTerms": {
"terms": {
"include": ".*Programmer.*",
"field": "job.keyword",
"size": 10
},
"aggs": {
"myBucketSort": {
"bucket_sort": {
"from": 0,
"size": 5,
"gap_policy": "SKIP"
}
}
}
},
"termsCount": {
"cardinality": {
"script": {
"source": """if(doc['job.keyword'].value.contains('Programmer')) {doc['job.keyword'].value }"""
},
"precision_threshold": 30000
}
}
}
}
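The difference between the two totals can be sketched in Python. This mirrors the logic only: it assumes, as the termsCount value of 2 in the response below suggests, that documents for which the Painless script produces no value (null) are simply not counted. The function names are hypothetical, and apart from the two Programmer counts taken from the response, the job data is invented filler.

```python
def job_script(job):
    """Python mirror of the Painless script above: emit the job when it
    contains 'Programmer', otherwise emit nothing (None)."""
    if "Programmer" in job:
        return job
    return None

def cardinality_of_field(values):
    """cardinality on job.keyword: every distinct job counts."""
    return len(set(values))

def cardinality_of_script(values, script):
    """cardinality on a script: documents whose script yields None
    contribute no value, so only matching jobs are counted."""
    return len({v for v in map(script, values) if v is not None})

# Hypothetical 20 documents; DBA and QA are invented filler.
jobs = (["Java Programmer"] * 7 + ["Javascript Programmer"] * 4
        + ["DBA"] * 5 + ["QA"] * 4)

field_total = cardinality_of_field(jobs)               # ignores include
script_total = cardinality_of_script(jobs, job_script)  # matches include
print(field_total, script_total)
```

The script's `contains('Programmer')` and the include regex `.*Programmer.*` express the same "contains" condition, which is what keeps the total consistent with the buckets.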
Result:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 20,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"myTerms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Java Programmer",
"doc_count" : 7
},
{
"key" : "Javascript Programmer",
"doc_count" : 4
}
]
},
"termsCount" : {
"value" : 2
}
}
}
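On the application side, the current page and the total can be read straight out of this response. A minimal sketch, assuming the aggregation names myTerms and termsCount from the request above; parse_page is a hypothetical helper, and RESPONSE is a trimmed copy of the result shown.

```python
import json

def parse_page(response_text):
    """Extract the current page of (key, doc_count) pairs and the total."""
    aggs = json.loads(response_text)["aggregations"]
    items = [(b["key"], b["doc_count"]) for b in aggs["myTerms"]["buckets"]]
    total = aggs["termsCount"]["value"]
    return items, total

# Trimmed copy of the response above.
RESPONSE = """{
  "aggregations": {
    "myTerms": {"buckets": [
      {"key": "Java Programmer", "doc_count": 7},
      {"key": "Javascript Programmer", "doc_count": 4}
    ]},
    "termsCount": {"value": 2}
  }
}"""

items, total = parse_page(RESPONSE)
print(items, total)
```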
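3. Problems
Case sensitivity when the keyword is English: although the terms aggregation's include supports regular expressions, the (?i) flag is not supported, so matching is case-sensitive.
For example, a keyword of "ai" or "Ai" should find AI. Of course, the terms aggregation can also take a script, like this: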
GET employees/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"myTerms": {
"terms": {
"script": {
"source": """if(doc['job.keyword'].value.contains('Programmer')) {doc['job.keyword'].value }"""
},
"size": 10
},
"aggs": {
"myBucketSort": {
"bucket_sort": {
"from": 0,
"size": 5,
"gap_policy": "SKIP"
}
}
}
},
"termsCount": {
"cardinality": {
"script": {
"source": """if(doc['job.keyword'].value.contains('Programmer')) {doc['job.keyword'].value }"""
},
"precision_threshold": 30000
}
}
}
}
You can write the if condition to check for an upper-case or a lower-case form, but a mixed-case keyword such as "Ai" still won't hit AI that way.
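The gap between checking case variants and normalizing both sides can be shown in a few lines of Python. This is a sketch of the comparison logic only, not a tested Painless script; the function names are hypothetical. In Painless, the normalized form would presumably look like `doc['job.keyword'].value.toLowerCase().contains(params.keyword)` with an already lower-cased `keyword` passed in `params`, but that is an assumption to verify on your ES version.

```python
def match_by_variants(value, keyword):
    """Check only the keyword as typed, all upper-case, and all lower-case."""
    return any(v in value for v in (keyword, keyword.upper(), keyword.lower()))

def match_normalized(value, keyword):
    """Lower-case both sides before comparing."""
    return keyword.lower() in value.lower()

# Hypothetical stored job values.
for value in ["AI Engineer", "Ai Researcher", "ai ops"]:
    print(value, match_by_variants(value, "ai"), match_normalized(value, "ai"))
```

Variant checks miss any mixed-case combination that isn't enumerated (here the stored "Ai Researcher"), while lower-casing both sides covers all of them. Since the script still returns the original value, the bucket keys keep their original casing.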
Another way around this is a normalizer, but it has its own shortcomings.
If you actually want case-sensitive search, you can ignore the issues above.
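For the include regex specifically, one workaround not covered above is to build the pattern on the client so that each ASCII letter becomes a character class such as `[aA]`, which Lucene regexes do support even without (?i). The sketch below assumes Lucene's rule that a backslash makes the following character literal, and conservatively escapes every ASCII non-alphanumeric character so reserved characters are matched as-is; build_include is a hypothetical helper. Python's `re` is only used as a quick sanity check, and the cardinality script would still need equally case-insensitive logic to keep the total consistent.

```python
import json
import re

def build_include(keyword):
    """Build a Lucene-style `include` regex for a case-insensitive
    'contains' match, since Lucene regex has no (?i) flag.

    ASCII letters become [xX] classes; other ASCII non-alphanumerics
    are backslash-escaped so reserved characters are taken literally;
    everything else (digits, non-ASCII such as Chinese) is kept as-is.
    """
    parts = []
    for ch in keyword:
        if ch.isascii() and ch.isalpha():
            parts.append(f"[{ch.lower()}{ch.upper()}]")
        elif ch.isascii() and not ch.isalnum():
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return ".*" + "".join(parts) + ".*"

pattern = build_include("Ai")
print(pattern)  # .*[aA][iI].*
print(re.fullmatch(pattern, "AI Engineer") is not None)
# json.dumps doubles the backslashes when embedding the pattern in the request body.
print(json.dumps({"include": build_include("C++")}))
```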
4. Summary
- Method 1:
GET employees/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"myTerms": {
"terms": {
"include": ".*Programmer.*",
"field": "job.keyword",
"size": 10
},
"aggs": {
"myBucketSort": {
"bucket_sort": {
"from": 0,
"size": 5,
"gap_policy": "SKIP"
}
}
}
},
"termsCount": {
"cardinality": {
"script": {
"source": """if(doc['job.keyword'].value.contains('Programmer')) {doc['job.keyword'].value }"""
},
"precision_threshold": 30000
}
}
}
}
- Method 2:
GET employees/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"myTerms": {
"terms": {
"script": {
"source": """if(doc['job.keyword'].value.contains('Programmer')) {doc['job.keyword'].value }"""
},
"size": 10
},
"aggs": {
"myBucketSort": {
"bucket_sort": {
"from": 0,
"size": 5,
"gap_policy": "SKIP"
}
}
}
},
"termsCount": {
"cardinality": {
"script": {
"source": """if(doc['job.keyword'].value.contains('Programmer')) {doc['job.keyword'].value }"""
},
"precision_threshold": 30000
}
}
}
}
- Other approaches: a normalizer or an analyzer, which likewise have drawbacks; a quick Google search will show how to set them up.