ES provides powerful aggregation analysis. Broken down by what they do, aggregations fall into four main types, as shown in the table below:
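Aggregation type | Description |
---|---|
Bucket Aggregation | A set of documents that satisfy certain criteria |
Metric Aggregation | Mathematical computations that produce statistics over document fields |
Pipeline Aggregation | A second-level aggregation over the results of other aggregations |
Matrix Aggregation | Operates on multiple fields and produces a result matrix |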
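In my view, this classification is mostly theoretical. In practice we do not pick one aggregation type per scenario: the goal is to meet the business requirement, and a single query often combines several kinds of aggregation.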
1. The four aggregation types
1.1 Bucket (bucketing)
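Bucketing groups data that share a common characteristic into one class and then counts the members, for example male vs. female, employees in the same position at a company, or high-, mid- and low-end products. Buckets can be split into further buckets, for example men aged 0~20, 21~50 and over 50; men and women in the same position; high-end products with good, neutral and bad reviews.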
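As a plain-Python illustration of bucketing (not how ES implements it), the sketch below takes four of the employee records inserted in section 2, buckets them by `job` and counts each bucket, then buckets again by `gender` inside each job bucket. The variable names are illustrative only.

```python
from collections import Counter, defaultdict

# Four employee records taken from the sample data inserted in section 2
employees = [
    {"name": "Bob", "job": "java", "gender": "male"},
    {"name": "Rod", "job": "html", "gender": "female"},
    {"name": "Gaving", "job": "java", "gender": "male"},
    {"name": "Douge", "job": "java", "gender": "female"},
]

# First-level buckets: group by job and count the documents in each bucket
by_job = Counter(e["job"] for e in employees)

# Second-level buckets: inside each job bucket, bucket again by gender
by_job_gender = defaultdict(Counter)
for e in employees:
    by_job_gender[e["job"]][e["gender"]] += 1

print(dict(by_job))  # {'java': 3, 'html': 1}
print({job: dict(g) for job, g in by_job_gender.items()})
# {'java': {'male': 2, 'female': 1}, 'html': {'female': 1}}
```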
1.2 Metric (computation)
A metric computes statistics over data that share a characteristic, such as the average, maximum and minimum.
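A minimal sketch of what a metric computes, using the salaries of the first four employees from section 2. ES's `min`, `max` and `avg` aggregations should return the same numbers for the same documents (reported as floating-point values).

```python
# Salaries of the first four employees in the section 2 sample data
salaries = [8000, 18000, 12000, 15000]

# Each entry corresponds to one single-value metric aggregation
metrics = {
    "min": min(salaries),
    "max": max(salaries),
    "avg": sum(salaries) / len(salaries),
}
print(metrics)  # {'min': 8000, 'max': 18000, 'avg': 13250.0}
```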
1.3 Pipeline
A pipeline works like a pipe in Linux, where the output of one command becomes the input of the next: the result of one aggregation becomes the input of the next aggregation.
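The pipeline idea can be sketched in two plain-Python stages over a handful of (job, salary) pairs from the section 2 data: the first stage computes an average per job bucket, and the second stage consumes only that output to pick the lowest bucket. This is roughly what a `min_bucket` pipeline aggregation does on top of a `terms` + `avg` aggregation; the code is an illustration, not ES's implementation.

```python
from collections import defaultdict

# (job, salary) pairs taken from the section 2 sample data (Bob, Rod, Gaving, Bona, King)
rows = [("java", 8000), ("html", 18000), ("java", 12000), ("html", 14000), ("dba", 15000)]

# Stage 1: bucket by job and compute the average salary of each bucket
salaries_by_job = defaultdict(list)
for job, sal in rows:
    salaries_by_job[job].append(sal)
avg_by_job = {job: sum(s) / len(s) for job, s in salaries_by_job.items()}

# Stage 2: works only on the output of stage 1 (like `cmd1 | cmd2`)
lowest_job = min(avg_by_job, key=avg_by_job.get)

print(avg_by_job)                          # {'java': 10000.0, 'html': 16000.0, 'dba': 15000.0}
print(lowest_job, avg_by_job[lowest_job])  # java 10000.0
```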
1.4 Matrix
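A matrix aggregation operates on several fields at once and returns a result matrix. For example, the `matrix_stats` aggregation computes per-field statistics such as the mean and variance, plus the covariance and correlation between every pair of fields. (Computing the average, maximum and minimum of one field in a bucket at once is what the multi-value `stats` metric does.)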
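To show what "a result matrix" means, the sketch below computes a covariance matrix and a correlation matrix for two fields, `age` and `sal`, of four employees from section 2. It is an illustration under stated assumptions: it uses population statistics (dividing by n), and ES's `matrix_stats` reports additional values (count, variance, skewness, kurtosis) whose exact formulas are not reproduced here.

```python
from math import sqrt

# (age, sal) of the first four employees in the section 2 sample data
fields = {
    "age": [21, 31, 24, 26],
    "sal": [8000, 18000, 12000, 15000],
}

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    # Population covariance (divides by n); ES may use a different estimator
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Covariance matrix: one row and one column per field
covariance = {a: {b: cov(fields[a], fields[b]) for b in fields} for a in fields}

# Correlation matrix derived from the covariance matrix
correlation = {
    a: {b: covariance[a][b] / sqrt(covariance[a][a] * covariance[b][b]) for b in fields}
    for a in fields
}

print(covariance["age"]["sal"])             # 13125.0
print(round(correlation["age"]["sal"], 4))  # 0.9746
```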
2. Worked examples
The examples below are not organized strictly by the four aggregation types. First we insert a batch of test data into ES, and before inserting it we define the mapping.
2.1 Defining the mapping
PUT employee
{
"mappings": {
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "keyword"
},
"job": {
"type": "keyword"
},
"age": {
"type": "integer"
},
"sal": {
"type": "integer"
},
"gender": {
"type": "keyword"
}
}
}
}
2.2 Inserting the data
PUT employee/_bulk
{"index": {"_id": 1}}
{"id": 1, "name": "Bob", "job": "java", "age": 21, "sal": 8000, "gender": "male"}
{"index": {"_id": 2}}
{"id": 2, "name": "Rod", "job": "html", "age": 31, "sal": 18000, "gender": "female"}
{"index": {"_id": 3}}
{"id": 3, "name": "Gaving", "job": "java", "age": 24, "sal": 12000, "gender": "male"}
{"index": {"_id": 4}}
{"id": 4, "name": "King", "job": "dba", "age": 26, "sal": 15000, "gender": "female"}
{"index": {"_id": 5}}
{"id": 5, "name": "Jonhson", "job": "dba", "age": 29, "sal": 16000, "gender": "male"}
{"index": {"_id": 6}}
{"id": 6, "name": "Douge", "job": "java", "age": 41, "sal": 20000, "gender": "female"}
{"index": {"_id": 7}}
{"id": 7, "name": "cutting", "job": "dba", "age": 27, "sal": 7000, "gender": "male"}
{"index": {"_id": 8}}
{"id": 8, "name": "Bona", "job": "html", "age": 22, "sal": 14000, "gender": "female"}
{"index": {"_id": 9}}
{"id": 9, "name": "Shyon", "job": "dba", "age": 20, "sal": 19000, "gender": "female"}
{"index": {"_id": 10}}
{"id": 10, "name": "James", "job": "html", "age": 18, "sal": 22000, "gender": "male"}
{"index": {"_id": 11}}
{"id": 11, "name": "Golsling", "job": "java", "age": 32, "sal": 23000, "gender": "female"}
{"index": {"_id": 12}}
{"id": 12, "name": "Lily", "job": "java", "age": 24, "sal": 2000, "gender": "male"}
{"index": {"_id": 13}}
{"id": 13, "name": "Jack", "job": "html", "age": 23, "sal": 3000, "gender": "female"}
{"index": {"_id": 14}}
{"id": 14, "name": "Rose", "job": "java", "age": 36, "sal": 6000, "gender": "female"}
{"index": {"_id": 15}}
{"id": 15, "name": "Will", "job": "dba", "age": 38, "sal": 4500, "gender": "male"}
{"index": {"_id": 16}}
{"id": 16, "name": "smith", "job": "java", "age": 32, "sal": 23000, "gender": "male"}
About the data: each document is an employee record. `name` is the employee's name, `job` the job type, `age` the age, `sal` the salary, and `gender` the gender.
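The `_bulk` body above alternates an action line and a document line, one JSON object per line. If the data comes from a program rather than being typed into Kibana, a minimal Python sketch like the following can build that body; only two of the sixteen documents are shown, and `bulk_body` is a hypothetical helper name. Sending the body (for example from Kibana Dev Tools, or with `curl` and `Content-Type: application/x-ndjson`) is left out.

```python
import json

# Two of the sample documents above; the rest follow the same shape
docs = [
    {"id": 1, "name": "Bob", "job": "java", "age": 21, "sal": 8000, "gender": "male"},
    {"id": 2, "name": "Rod", "job": "html", "age": 31, "sal": 18000, "gender": "female"},
]

def bulk_body(docs):
    """Build an NDJSON body for `PUT employee/_bulk`: an action line, then a source line, per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    # The _bulk API expects every line, including the last one, to end with a newline
    return "\n".join(lines) + "\n"

body = bulk_body(docs)
print(body)
```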
2.3 Aggregation queries
Count the number of distinct job types:
GET employee/_search
{
"size": 0,
"aggs": {
"job_category_count": {
"cardinality": {
"field": "job"
}
}
}
}
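The cardinality result can be checked by hand: the sketch counts the distinct `job` values of the 16 documents exactly. ES's `cardinality` aggregation is approximate (it is based on HyperLogLog++), but with only a few distinct values, as here, it is expected to return the exact count.

```python
# `job` values of the 16 sample employees, in _id order
jobs = ["java", "html", "java", "dba", "dba", "java", "dba", "html",
        "dba", "html", "java", "java", "html", "java", "dba", "java"]

# Exact distinct count; ES's cardinality aggregation estimates this value
job_category_count = len(set(jobs))
print(job_category_count)  # 3
```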
Show the bucket for each job type:
GET employee/_search
{
"size": 0,
"aggs": {
"job_buckets": {
"terms": {
"field": "job"
}
}
}
}
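A plain-Python sketch of the bucket list a `terms` aggregation returns for `job`: one bucket per distinct value, ordered by `doc_count` descending. `terms_agg` is a hypothetical helper, and breaking ties by key is an assumption of this sketch.

```python
from collections import Counter

# `job` values of the 16 sample employees, in _id order
jobs = ["java", "html", "java", "dba", "dba", "java", "dba", "html",
        "dba", "html", "java", "java", "html", "java", "dba", "java"]

def terms_agg(values):
    """One bucket per distinct value, largest doc_count first (ties broken by key)."""
    ordered = sorted(Counter(values).items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": k, "doc_count": c} for k, c in ordered]

print(terms_agg(jobs))
# [{'key': 'java', 'doc_count': 7}, {'key': 'dba', 'doc_count': 5}, {'key': 'html', 'doc_count': 4}]
```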
Count the employees in each job type, and return the oldest employee in each:
GET employee/_search
{
"size": 0,
"aggs": {
"job_analysis": {
"terms": {
"field": "job"
},
"aggs": {
"age_top_1": {
"top_hits": {
"size": 1,
"sort": [
{
"age": {
"order": "desc"
}
}
]
}
}
}
}
}
}
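The `terms` + `top_hits` combination can be mimicked in a few lines of plain Python: bucket the documents by `job`, sort each bucket by `age` descending, and keep the first hit. Only `name`, `job` and `age` from section 2 are used.

```python
from collections import defaultdict

# (name, job, age) of the 16 sample employees, in _id order
employees = [
    ("Bob", "java", 21), ("Rod", "html", 31), ("Gaving", "java", 24), ("King", "dba", 26),
    ("Jonhson", "dba", 29), ("Douge", "java", 41), ("cutting", "dba", 27), ("Bona", "html", 22),
    ("Shyon", "dba", 20), ("James", "html", 18), ("Golsling", "java", 32), ("Lily", "java", 24),
    ("Jack", "html", 23), ("Rose", "java", 36), ("Will", "dba", 38), ("smith", "java", 32),
]

# Outer terms aggregation: one list of documents per job
buckets = defaultdict(list)
for doc in employees:
    buckets[doc[1]].append(doc)

# top_hits with size 1, sorted by age descending
oldest = {job: sorted(docs, key=lambda d: d[2], reverse=True)[:1] for job, docs in buckets.items()}

print({job: hits[0][0] for job, hits in oldest.items()})
# {'java': 'Douge', 'html': 'Rod', 'dba': 'Will'}
```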
Count the employees whose salary falls in 0~5000, 5001~8000, 8001~12000, 12001~18000 and 18001+. In a `range` aggregation, `from` is inclusive and `to` is exclusive.
GET employee/_search
{
"size": 0,
"aggs": {
"sal_range_info": {
"range": {
"field": "sal",
"ranges": [
{
"to": 5001
},
{
"from": 5001,
"to": 8001
},
{
"from": 8001,
"to": 12001
},
{
"from": 12001,
"to": 18001
},
{
"from": 18001
}
]
}
}
}
}
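Because a `range` aggregation includes `from` and excludes `to`, boundaries written with gaps (`to: 5000`, then `from: 5001`, and so on) silently drop documents whose value sits exactly on a boundary. The plain-Python sketch below counts the 16 salaries both ways: with gapped boundaries and with contiguous ones (`to: 5001`, `from: 5001` to `to: 8001`, ...). `range_agg` is a hypothetical helper.

```python
# `sal` values of the 16 sample employees, in _id order
sals = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
        19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]

def range_agg(values, ranges):
    """Count values per (from, to) pair: `from` inclusive, `to` exclusive, None means unbounded."""
    return [
        sum(1 for v in values if (lo is None or v >= lo) and (hi is None or v < hi))
        for lo, hi in ranges
    ]

gapped = [(None, 5000), (5001, 8000), (8001, 12000), (12001, 18000), (18001, None)]
contiguous = [(None, 5001), (5001, 8001), (8001, 12001), (12001, 18001), (18001, None)]

print(range_agg(sals, gapped))      # [3, 2, 0, 3, 5]: 8000, 12000 and 18000 land in no bucket
print(range_agg(sals, contiguous))  # [3, 3, 1, 4, 5]: all 16 employees are counted
```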
Using buckets 5000 wide, count the employees whose salary falls in each bucket:
GET employee/_search
{
"size": 0,
"aggs": {
"sal_histogram": {
"histogram": {
"field": "sal",
"interval": 5000,
"extended_bounds": {
"min": 0,
"max": 25000
}
}
}
}
}
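A histogram bucket's key is the value rounded down to a multiple of `interval` (with the default offset of 0). The sketch below reproduces that in plain Python and pads the result with empty buckets out to `extended_bounds`; treating the bucket that contains `max` as the last one is my reading of ES's behavior, so check it against a real response. `histogram` is a hypothetical helper.

```python
import math
from collections import Counter

# `sal` values of the 16 sample employees, in _id order
sals = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
        19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]

def histogram(values, interval, min_bound=None, max_bound=None):
    """Mimic a numeric histogram with offset 0: key = floor(value / interval) * interval."""
    counts = Counter(math.floor(v / interval) * interval for v in values)
    keys = list(counts)
    # extended_bounds (assumed semantics): stretch the key range to cover min_bound and max_bound
    if min_bound is not None:
        keys.append(math.floor(min_bound / interval) * interval)
    if max_bound is not None:
        keys.append(math.floor(max_bound / interval) * interval)
    # Every key between the first and last bucket is emitted, empty ones with count 0
    return {k: counts.get(k, 0) for k in range(min(keys), max(keys) + 1, interval)}

print(histogram(sals, 5000, min_bound=0, max_bound=25000))
# {0: 3, 5000: 3, 10000: 2, 15000: 4, 20000: 4, 25000: 0}
```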
Count the employees in each job type, with salary statistics for each job:
GET employee/_search
{
"size": 0,
"aggs": {
"job_and_salary_info": {
"terms": {
"field": "job"
},
"aggs": {
"sal_info": {
"stats": {
"field": "sal"
}
}
}
}
}
}
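Per-bucket `stats` can be sketched by grouping the salaries by `job` and computing count, min, max, avg and sum for each group. The numbers should match the `sal_info` sub-aggregation for this data set, although ES reports them as floating-point values.

```python
from collections import defaultdict

# (job, sal) of the 16 sample employees, in _id order
rows = [
    ("java", 8000), ("html", 18000), ("java", 12000), ("dba", 15000),
    ("dba", 16000), ("java", 20000), ("dba", 7000), ("html", 14000),
    ("dba", 19000), ("html", 22000), ("java", 23000), ("java", 2000),
    ("html", 3000), ("java", 6000), ("dba", 4500), ("java", 23000),
]

def stats(values):
    """The five numbers a stats aggregation returns for one bucket."""
    return {"count": len(values), "min": min(values), "max": max(values),
            "avg": sum(values) / len(values), "sum": sum(values)}

# terms on job, then stats on sal inside each bucket
by_job = defaultdict(list)
for job, sal in rows:
    by_job[job].append(sal)
sal_info = {job: stats(s) for job, s in by_job.items()}

print(sal_info["dba"])  # {'count': 5, 'min': 4500, 'max': 19000, 'avg': 12300.0, 'sum': 61500}
```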
Within each job type, count the male and female employees and compute their salary statistics:
GET employee/_search
{
"size": 0,
"aggs": {
"job_gender_sal_info": {
"terms": {
"field": "job"
},
"aggs": {
"gender_info": {
"terms": {
"field": "gender"
},
"aggs": {
"sal_info": {
"stats": {
"field": "sal"
}
}
}
}
}
}
}
}
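Two levels of buckets with a metric at the innermost level, sketched in plain Python over `job`, `gender` and `sal` from section 2: group by job, then by gender within each job, then compute stats-like numbers for each innermost group (sum omitted for brevity).

```python
from collections import defaultdict

# (job, gender, sal) of the 16 sample employees, in _id order
rows = [
    ("java", "male", 8000), ("html", "female", 18000), ("java", "male", 12000), ("dba", "female", 15000),
    ("dba", "male", 16000), ("java", "female", 20000), ("dba", "male", 7000), ("html", "female", 14000),
    ("dba", "female", 19000), ("html", "male", 22000), ("java", "female", 23000), ("java", "male", 2000),
    ("html", "female", 3000), ("java", "female", 6000), ("dba", "male", 4500), ("java", "male", 23000),
]

# Outer buckets by job, inner buckets by gender
nested = defaultdict(lambda: defaultdict(list))
for job, gender, sal in rows:
    nested[job][gender].append(sal)

# Metric for each innermost bucket
report = {
    job: {g: {"count": len(s), "min": min(s), "max": max(s), "avg": sum(s) / len(s)}
          for g, s in genders.items()}
    for job, genders in nested.items()
}

print(report["html"]["male"])  # {'count': 1, 'min': 22000, 'max': 22000, 'avg': 22000.0}
```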
Compute the average salary of each job, then find the job with the lowest average salary and that average:
GET employee/_search
{
"size": 0,
"aggs": {
"jobs": {
"terms": {
"field": "job"
},
"aggs": {
"sal_info": {
"avg": {
"field": "sal"
}
}
}
},
"min_avg_sal": {
"min_bucket": {
"buckets_path": "jobs>sal_info"
}
}
}
}
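The sibling pipeline aggregation can be checked by hand: the sketch computes the average salary per job, then picks the minimum and the maximum across the buckets, returning the value and the bucket key(s) in the way `min_bucket` and `max_bucket` report them. `bucket_pick` is a hypothetical helper. It also shows why a request meant to find the lowest average needs `min_bucket`: `max_bucket` selects a different job.

```python
from collections import defaultdict

# (job, sal) of the 16 sample employees, in _id order
rows = [
    ("java", 8000), ("html", 18000), ("java", 12000), ("dba", 15000),
    ("dba", 16000), ("java", 20000), ("dba", 7000), ("html", 14000),
    ("dba", 19000), ("html", 22000), ("java", 23000), ("java", 2000),
    ("html", 3000), ("java", 6000), ("dba", 4500), ("java", 23000),
]

# Output of the `jobs` terms aggregation with its `sal_info` avg sub-aggregation
by_job = defaultdict(list)
for job, sal in rows:
    by_job[job].append(sal)
avg_by_job = {job: sum(s) / len(s) for job, s in by_job.items()}

def bucket_pick(values_by_key, pick):
    """Mimic min_bucket / max_bucket: the extreme value and every key that reaches it."""
    value = pick(values_by_key.values())
    return {"value": value, "keys": [k for k, v in values_by_key.items() if v == value]}

print(bucket_pick(avg_by_job, min))  # {'value': 12300.0, 'keys': ['dba']}
print(bucket_pick(avg_by_job, max))  # {'value': 14250.0, 'keys': ['html']}
```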
3. Examples on the bundled flight sample data
Count the flights arriving in each destination country:
GET kibana_sample_data_flights/_search
{
"size": 0,
"aggs": {
"dest_info": {
"terms": {
"field": "DestCountry"
}
}
}
}
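A `terms` aggregation returns only the top `size` buckets (10 by default) and folds the remaining documents into `sum_other_doc_count`, so if the flight data has more than ten destination countries, only the ten busiest appear. The sketch illustrates that on the employee `job` values from section 2 with `size=2`, since the flight data itself is not reproduced here; `terms_with_other` is a hypothetical helper.

```python
from collections import Counter

# `job` values of the 16 sample employees from section 2, in _id order
jobs = ["java", "html", "java", "dba", "dba", "java", "dba", "html",
        "dba", "html", "java", "java", "html", "java", "dba", "java"]

def terms_with_other(values, size=10):
    """Mimic the shape of a terms response: top `size` buckets plus a count of everything else."""
    ordered = sorted(Counter(values).items(), key=lambda kv: (-kv[1], kv[0]))
    top, rest = ordered[:size], ordered[size:]
    return {
        "sum_other_doc_count": sum(c for _, c in rest),  # documents not in any returned bucket
        "buckets": [{"key": k, "doc_count": c} for k, c in top],
    }

print(terms_with_other(jobs, size=2))
# {'sum_other_doc_count': 4, 'buckets': [{'key': 'java', 'doc_count': 7}, {'key': 'dba', 'doc_count': 5}]}
```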
Count the flights to each destination country, along with the maximum and average ticket price:
GET kibana_sample_data_flights/_search
{
"size": 0,
"aggs": {
"dest_info": {
"terms": {
"field": "DestCountry"
},
"aggs": {
"max_ticket_price": {
"max": {
"field": "AvgTicketPrice"
}
},
"avg_ticket_price": {
"avg": {
"field": "AvgTicketPrice"
}
}
}
}
}
}
Count the flights to each destination country, with ticket price statistics and a breakdown of the weather at the destination:
GET kibana_sample_data_flights/_search
{
"size": 0,
"aggs": {
"dest_info": {
"terms": {
"field": "DestCountry"
},
"aggs": {
"ticket_info": {
"stats": {
"field": "AvgTicketPrice"
}
},
"weather_info": {
"terms": {
"field": "DestWeather"
}
}
}
}
}
}
Acknowledgements
Almost none of this article is original; it is a brief summary of Ruan Yiming's Geek Time (極客時間) course Elasticsearch Core Technology and Practice (Elasticsearch核心技術與實戰).