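Recently I needed to use ES (Elasticsearch) for data queries at work. In some scenarios the data has to be deduplicated, and statistics have to be computed over the deduplicated data. To make the ES query syntax easier to follow, each query below is shown next to its SQL equivalent.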
1 - distinct
SELECT DISTINCT(user_id) FROM table WHERE user_id_type = 3;
{
"query": {
"term": {
"user_id_type": 3
}
},
"collapse": {
"field": "user_id"
}
}
{
...
"hits": {
"hits": [
{
"_index": "es_qd_mkt_visitor_packet_dev_v1_20180621",
"_type": "ad_crowd",
"_source": {
"user_id": "wx2af8414b502d4ca2_oHtrD0Vxv-_8c678figJNHmtaVQQ",
"user_id_type": 3
},
"fields": {
"user_id": [
"wx2af8414b502d4ca2_oHtrD0Vxv-_8c678figJNHmtaVQQ"
]
}
}
]
}
}
Summary: with collapse, each entry in [hits] gains a [fields] entry that contains the deduplicated user_id.
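The collapse request and the way its response is read can be sketched in Python. This is a minimal sketch that only builds and parses plain dictionaries; the function names `build_distinct_query` and `distinct_values` are hypothetical helpers, not part of any Elasticsearch client. Keep in mind that collapse deduplicates only the hits on the returned page (10 by default), so it is not a full DISTINCT over the whole index.

```python
from typing import Any


def build_distinct_query(filter_field: str, filter_value: Any, distinct_field: str) -> dict:
    """Build a search body: a `term` filter plus `collapse` on the field to deduplicate."""
    return {
        "query": {"term": {filter_field: filter_value}},
        "collapse": {"field": distinct_field},
    }


def distinct_values(response: dict, distinct_field: str) -> list:
    """Collect the collapsed value of each hit from its `fields` entry."""
    values = []
    for hit in response.get("hits", {}).get("hits", []):
        # Each collapsed hit carries the collapse key as a list under `fields`.
        values.extend(hit.get("fields", {}).get(distinct_field, []))
    return values
```

Passing the response shown above to `distinct_values(response, "user_id")` would yield a list with one user_id per collapsed hit.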
2 - count + distinct
SELECT COUNT(DISTINCT(user_id)) FROM table WHERE user_id_type = 3;
{
"query": {
"term": {
"user_id_type": 3
}
},
"aggs": {
"count": {
"cardinality": {
"field": "user_id"
}
}
}
}
{
...
"hits": {
...
},
"aggregations": {
"count": {
"value": 121
}
}
}
Summary: the field given to cardinality inside aggs is the field whose distinct values are counted.
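A small Python sketch of the same request follows. The helper names are hypothetical. Two details are not in the original query and are assumptions added here: `"size": 0`, which skips returning hits when only the count is wanted, and the optional `precision_threshold` setting. The latter matters because the cardinality aggregation is approximate (it is based on HyperLogLog++), so for high-cardinality fields the result may differ slightly from an exact SQL COUNT(DISTINCT ...).

```python
from typing import Any, Optional


def build_count_distinct_query(
    filter_field: str,
    filter_value: Any,
    distinct_field: str,
    precision_threshold: Optional[int] = None,
) -> dict:
    """Build a body that counts distinct values of `distinct_field` under a term filter."""
    cardinality = {"field": distinct_field}
    if precision_threshold is not None:
        # Below this many distinct values the count is expected to be near exact.
        cardinality["precision_threshold"] = precision_threshold
    return {
        "size": 0,  # assumption: only the aggregation is needed, not the hits
        "query": {"term": {filter_field: filter_value}},
        "aggs": {"count": {"cardinality": cardinality}},
    }


def read_count(response: dict, agg_name: str = "count") -> int:
    """Read the distinct count out of the `aggregations` section."""
    return response["aggregations"][agg_name]["value"]
```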
3 - count + group by
SELECT COUNT(user_id) FROM table GROUP BY user_id_type;
{
"aggs": {
"user_type": {
"terms": {
"field": "user_id_type"
}
}
}
}
{
...
"hits": {
...
},
"aggregations": {
"user_type": {
...
"buckets": [
{
"key": 4,
"doc_count": 1220
},
{
"key": 3,
"doc_count": 488
}
]
}
}
}
Summary: the field given to terms inside aggs is the field to group by.
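Reading the buckets back into a "group key to count" mapping can be sketched as below; `counts_by_key` is a hypothetical helper name. Two caveats that are worth keeping in mind: a terms aggregation returns only the top buckets (10 by default, adjustable with its `size` setting), and `doc_count` counts documents per bucket, which matches COUNT(user_id) only if every document has a user_id.

```python
def counts_by_key(response: dict, agg_name: str) -> dict:
    """Map each bucket key of a terms aggregation to its document count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

For the response shown above, `counts_by_key(response, "user_type")` gives `{4: 1220, 3: 488}`.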
4 - count + distinct + group by
SELECT COUNT(DISTINCT(user_id)) FROM table GROUP BY user_id_type;
{
"aggs": {
"user_type": {
"terms": {
"field": "user_id_type"
},
"aggs": {
"count": {
"cardinality": {
"field": "user_id"
}
}
}
}
}
}
{
...
"hits": {
...
},
"aggregations": {
"user_type": {
...
"buckets": [
{
"key": 4,
"doc_count": 1220, // 1220 documents before deduplication
"count": {
"value": 276 // 276 distinct user_id values after deduplication
}
},
{
"key": 3,
"doc_count": 488, // 488 documents before deduplication
"count": {
"value": 121 // 121 distinct user_id values after deduplication
}
}
]
}
}
}
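The nested response can be flattened into one row per group, as in the Python sketch below. `distinct_counts_by_key` is a hypothetical helper name. The `//` annotations in the response above are explanatory only and are not part of the JSON that ES actually returns.

```python
def distinct_counts_by_key(response: dict, terms_agg: str, cardinality_agg: str) -> dict:
    """Map each bucket key to (documents in bucket, distinct values in bucket)."""
    buckets = response["aggregations"][terms_agg]["buckets"]
    return {
        bucket["key"]: (bucket["doc_count"], bucket[cardinality_agg]["value"])
        for bucket in buckets
    }
```

For the response shown above, `distinct_counts_by_key(response, "user_type", "count")` gives `{4: (1220, 276), 3: (488, 121)}`.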
5 - count + distinct + group by + where
SELECT COUNT(DISTINCT(user_id)) FROM table WHERE user_id_type = 2 GROUP BY user_id;
Summary: for queries that need both group by and distinct, nest a sub-aggs inside aggs.
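No ES request body is given for query 5. One plausible shape, offered only as a sketch, combines the `term` filter from sections 1 and 2 with the nested aggregation from section 4. The helper name `build_filtered_group_distinct_query` is hypothetical, `"size": 0` is an added assumption, and the group field is left as a parameter because the SQL above groups by user_id while the earlier examples group by user_id_type.

```python
from typing import Any


def build_filtered_group_distinct_query(
    filter_field: str,
    filter_value: Any,
    group_field: str,
    distinct_field: str,
) -> dict:
    """Build a body: WHERE via `term`, GROUP BY via `terms`, COUNT(DISTINCT) via nested `cardinality`."""
    return {
        "size": 0,  # assumption: only the aggregation result is needed
        "query": {"term": {filter_field: filter_value}},
        "aggs": {
            "user_type": {
                "terms": {"field": group_field},
                # The sub-aggregation runs once per bucket of the terms aggregation.
                "aggs": {"count": {"cardinality": {"field": distinct_field}}},
            }
        },
    }
```

For example, `build_filtered_group_distinct_query("user_id_type", 2, "user_id_type", "user_id")` would count distinct user_id values among documents whose user_id_type is 2.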
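6 - Notes
The collapse keyword
- Field collapsing is only available from ES 5.3 onward.
- Collapsing requires a single-valued keyword or numeric field with doc_values. Aggregations likewise do not work on analyzed text fields by default, so a string field such as user_id should be mapped as keyword.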