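Using Aggregations in Elasticsearch
The ES environment used in this series is not necessarily described in every article, but it can be found in the collection. Follow 《醉魚Java》 and let's improve together.
Environment
- elasticsearch 8.1
Setup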
version: '3.8'
services:
  cerebro:
    image: lmenezes/cerebro:0.8.3
    container_name: cerebro
    ports:
      - "9000:9000"
    command:
      - -Dhosts.0.host=http://eshot:9200
    networks:
      - elastic
  kibana:
    image: docker.elastic.co/kibana/kibana:8.1.3
    container_name: kibana
    environment:
      - I18N_LOCALE=zh-CN
      - XPACK_GRAPH_ENABLED=true
      - TIMELION_ENABLED=true
      - XPACK_MONITORING_COLLECTION_ENABLED="true"
      - ELASTICSEARCH_HOSTS=http://eshot:9200
      - server.publicBaseUrl=http://192.168.160.234:5601
    ports:
      - "5601:5601"
    networks:
      - elastic
  eshot:
    image: elasticsearch:8.1.3
    container_name: eshot
    environment:
      - node.name=eshot
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=eshot,eswarm,escold
      - cluster.initial_master_nodes=eshot,eswarm,escold
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.security.enabled=false
      - node.attr.node_type=hot
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - D:\zuiyuftp\docker\es8.1\eshot\data:/usr/share/elasticsearch/data
      - D:\zuiyuftp\docker\es8.1\eshot\logs:/usr/share/elasticsearch/logs
      - D:\zuiyuftp\docker\es8.1\eshot\plugins:/usr/share/elasticsearch/plugins
    ports:
      - 9200:9200
    networks:
      - elastic
  eswarm:
    image: elasticsearch:8.1.3
    container_name: eswarm
    environment:
      - node.name=eswarm
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=eshot,eswarm,escold
      - cluster.initial_master_nodes=eshot,eswarm,escold
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.security.enabled=false
      - node.attr.node_type=warm
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - D:\zuiyuftp\docker\es8.1\eswarm\data:/usr/share/elasticsearch/data
      - D:\zuiyuftp\docker\es8.1\eswarm\logs:/usr/share/elasticsearch/logs
      - D:\zuiyuftp\docker\es8.1\eshot\plugins:/usr/share/elasticsearch/plugins
    networks:
      - elastic
  escold:
    image: elasticsearch:8.1.3
    container_name: escold
    environment:
      - node.name=escold
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=eshot,eswarm,escold
      - cluster.initial_master_nodes=eshot,eswarm,escold
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.security.enabled=false
      - node.attr.node_type=cold
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - D:\zuiyuftp\docker\es8.1\escold\data:/usr/share/elasticsearch/data
      - D:\zuiyuftp\docker\es8.1\escold\logs:/usr/share/elasticsearch/logs
      - D:\zuiyuftp\docker\es8.1\eshot\plugins:/usr/share/elasticsearch/plugins
    networks:
      - elastic
# volumes:
#   eshotdata:
#     driver: local
#   eswarmdata:
#     driver: local
#   escolddata:
#     driver: local
networks:
  elastic:
    driver: bridge
What is an aggregation?
In Elasticsearch, aggregation is a powerful data-processing feature that lets us run many kinds of computation and analysis over the data in an index. You can think of it as grouping a data set and computing metrics on each group, similar to GROUP BY and aggregate functions in SQL.
Sample data
To try out aggregations, we will use a sample data set. Suppose we have an index that stores product information with the following fields:
- product_name: product name
- category: product category
- price: product price
- quantity: product quantity
- manufacturer: manufacturer
- timestamp: record timestamp
Next, we import the test data.
Create the index
PUT /zfc-doc-000002
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 2
    }
  },
  "mappings": {
    "properties": {
      "product_name": {
        "type": "keyword"
      },
      "category": {
        "type": "keyword"
      },
      "price": {
        "type": "integer"
      },
      "quantity": {
        "type": "integer"
      },
      "manufacturer": {
        "type": "keyword"
      },
      "timestamp": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}
Add data
PUT _bulk
{"index":{"_index":"zfc-doc-000002","_id":"1"}}
{"product_name": "iPhone 12","category": "Electronics","price": 999,"quantity": 50,"manufacturer": "Apple","timestamp": "2023-07-24 10:00:00"}
{"index":{"_index":"zfc-doc-000002","_id":"2"}}
{"product_name": "Samsung Galaxy S21","category": "Electronics","price": 799,"quantity": 30,"manufacturer": "Samsung","timestamp": "2023-07-24 11:30:00"}
{"index":{"_index":"zfc-doc-000002","_id":"3"}}
{"product_name": "Sony Bravia 65-inch TV","category": "Electronics","price": 1499,"quantity": 20,"manufacturer": "Sony","timestamp": "2023-07-24 13:15:00"}
{"index":{"_index":"zfc-doc-000002","_id":"4"}}
{"product_name": "HP Spectre x360","category": "Electronics","price": 1299,"quantity": 25,"manufacturer": "HP","timestamp": "2023-07-24 15:45:00"}
{"index":{"_index":"zfc-doc-000002","_id":"5"}}
{"product_name": "Dell XPS 15", "category": "Electronics","price": 1399,"quantity": 15,"manufacturer": "Dell","timestamp": "2023-07-24 17:20:00"}
{"index":{"_index":"zfc-doc-000002","_id":"6"}}
{"product_name": "Nike Air Zoom Pegasus 38", "category": "Sports","price": 119,"quantity": 100,"manufacturer": "Nike","timestamp": "2023-07-24 09:30:00"}
{"index":{"_index":"zfc-doc-000002","_id":"7"}}
{"product_name": "Adidas Ultraboost 21","category": "Sports","price": 129,"quantity": 80,"manufacturer": "Adidas","timestamp": "2023-07-24 10:45:00"}
{"index":{"_index":"zfc-doc-000002","_id":"8"}}
{"product_name": "Canon EOS Rebel T7i","category": "Electronics","price": 699,"quantity": 10,"manufacturer": "Canon","timestamp": "2023-07-24 14:05:00"}
{"index":{"_index":"zfc-doc-000002","_id":"9"}}
{"product_name": "LG 55-inch 4K TV", "category": "Electronics","price": 899,"quantity": 30,"manufacturer": "LG","timestamp": "2023-07-24 16:30:00"}
{"index":{"_index":"zfc-doc-000002","_id":"10"}}
{"product_name": "Lenovo ThinkPad X1 Carbon", "category": "Electronics","price": 1599,"quantity": 18,"manufacturer": "Lenovo","timestamp": "2023-07-24 18:10:00"}
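Aggregation examples
1. Terms Aggregation
A terms aggregation groups documents by the exact value of a field (here the keyword field category). Documents with the same value go into the same bucket, and the number of documents in each bucket is counted.
Example query:
GET zfc-doc-000002/_search
{
  "size": 0,
  "aggs": {
    "category_count": {
      "terms": {
        "field": "category",
        "size": 10
      }
    }
  }
}
Explanation:
- "size": 0 returns only the aggregation results, not the matching documents.
- "aggs" defines the aggregations.
- "category_count" is a user-chosen aggregation name that labels the result.
- "terms" selects the terms aggregation.
- "field": "category" names the field to aggregate on.
- "size": 10 (inside "terms") limits the response to at most 10 buckets.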
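To make the bucketing concrete, here is a minimal plain-Python sketch of what the terms aggregation computes for the ten sample documents. It only mimics the counting logic and is not how Elasticsearch implements it; the helper name terms_buckets is made up for this illustration, and the category list is copied from the bulk request above.

```python
from collections import Counter

# Categories of the ten sample documents, in _id order (from the bulk request).
categories = ["Electronics"] * 5 + ["Sports"] * 2 + ["Electronics"] * 3


def terms_buckets(values, size=10):
    """Count documents per distinct value and return at most `size`
    buckets, largest doc_count first (hypothetical helper)."""
    counts = Counter(values)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]


buckets = terms_buckets(categories)
print(buckets)
# [{'key': 'Electronics', 'doc_count': 8}, {'key': 'Sports', 'doc_count': 2}]
```

The two entries should correspond to the buckets returned under category_count in the real response.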
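2. Nested Aggregation (Sub-aggregation)
Nesting lets us run a further aggregation inside each bucket. For example, we can first group by category and then compute the average price within each category. (This is a sub-aggregation; it is not the same thing as the nested aggregation type that Elasticsearch provides for nested fields.)
Example query:
GET zfc-doc-000002/_search
{
  "size": 0,
  "aggs": {
    "category_group": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
Explanation:
- "aggs" defines the aggregations; the inner "aggs" runs once per bucket of the outer one.
- "category_group" is a user-chosen aggregation name that labels the result.
- "terms" selects the terms aggregation.
- "field": "category" names the field to group by.
- "avg_price" is a user-chosen name for the sub-aggregation result.
- "avg" selects the average aggregation.
- "field": "price" names the numeric field to average.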
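The two-level structure can be sketched in plain Python as "group first, then compute a metric per group". This is only an illustration of the logic under the sample data above; avg_price_per_category is a hypothetical helper, not an Elasticsearch API.

```python
from collections import defaultdict

# (category, price) pairs of the ten sample documents, in _id order.
docs = [
    ("Electronics", 999), ("Electronics", 799), ("Electronics", 1499),
    ("Electronics", 1299), ("Electronics", 1399), ("Sports", 119),
    ("Sports", 129), ("Electronics", 699), ("Electronics", 899),
    ("Electronics", 1599),
]


def avg_price_per_category(rows):
    """Group rows by category (the terms bucket), then average the
    prices inside each bucket (the avg sub-aggregation)."""
    grouped = defaultdict(list)
    for category, price in rows:
        grouped[category].append(price)
    return {
        category: {"doc_count": len(prices), "avg_price": sum(prices) / len(prices)}
        for category, prices in grouped.items()
    }


result = avg_price_per_category(docs)
print(result)
# {'Electronics': {'doc_count': 8, 'avg_price': 1149.0},
#  'Sports': {'doc_count': 2, 'avg_price': 124.0}}
```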
3. Histogram Aggregation
Suppose we want to build a histogram over the product price (the price field), grouping products into fixed-width price intervals and counting the products in each interval.
Example query:
GET zfc-doc-000002/_search
{
  "size": 0,
  "aggs": {
    "price_histogram": {
      "histogram": {
        "field": "price",
        "interval": 200
      }
    }
  }
}
Explanation:
- "aggs" defines the aggregations.
- "price_histogram" is a user-chosen aggregation name that labels the result.
- "histogram" selects the histogram aggregation.
- "field": "price" names the numeric field to aggregate on, here the product price.
- "interval": 200 sets the bucket width. With 200, prices are split into intervals of 200 (0-200, 200-400, 400-600, and so on), where each bucket includes its lower bound and excludes its upper bound.
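As a rough illustration of how values map to buckets, the sketch below rounds each price down to a multiple of the interval and counts per key. It assumes an offset of 0 and keeps empty buckets between the lowest and highest key, which matches the default min_doc_count of 0; histogram_buckets is a hypothetical helper written for this example.

```python
# Prices of the ten sample documents, in _id order.
prices = [999, 799, 1499, 1299, 1399, 119, 129, 699, 899, 1599]


def histogram_buckets(values, interval):
    """Assign each value to the bucket keyed by the largest multiple of
    `interval` that is <= the value, then count per bucket. Empty
    buckets between the first and last key are kept with a count of 0."""
    keys = [(v // interval) * interval for v in values]
    counts = {key: 0 for key in range(min(keys), max(keys) + interval, interval)}
    for key in keys:
        counts[key] += 1
    return counts


hist = histogram_buckets(prices, 200)
print(hist)
# {0: 2, 200: 0, 400: 0, 600: 2, 800: 2, 1000: 0, 1200: 2, 1400: 2}
```

For example, 999 falls into the bucket with key 800 because that bucket covers 800 up to, but not including, 1000.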
4. Range Aggregation
A range aggregation groups documents into ranges that we specify ourselves, for example grouping by price range and counting the products in each range.
Example query:
GET zfc-doc-000002/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "from": 0, "to": 200 },
          { "from": 200, "to": 500 },
          { "from": 500, "to": 1000 },
          { "from": 1000 }
        ]
      }
    }
  }
}
Explanation:
- "aggs" defines the aggregations.
- "price_ranges" is a user-chosen aggregation name that labels the result.
- "range" selects the range aggregation.
- "field": "price" names the numeric field to aggregate on, here the product price.
- "ranges" is the array of price ranges. In each range, "from" is inclusive and "to" is exclusive.
  - { "from": 0, "to": 200 } matches products priced from 0 up to, but not including, 200.
  - { "from": 200, "to": 500 } matches products priced from 200 up to, but not including, 500.
  - { "from": 500, "to": 1000 } matches products priced from 500 up to, but not including, 1000.
  - { "from": 1000 } matches products priced at 1000 or more.
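The boundary handling is easy to get wrong, so here is a small plain-Python sketch of the same four ranges applied to the sample prices. The lower bound is treated as inclusive and the upper bound as exclusive; range_buckets and its simplified bucket labels are assumptions made for this illustration, and the real response formats its keys differently.

```python
# Prices of the ten sample documents, in _id order.
prices = [999, 799, 1499, 1299, 1399, 119, 129, 699, 899, 1599]

# (from, to) pairs; None stands for an open-ended bound.
ranges = [(0, 200), (200, 500), (500, 1000), (1000, None)]


def range_buckets(values, bounds):
    """Count values per range, with `from` inclusive and `to` exclusive."""
    result = {}
    for low, high in bounds:
        label = f"{'*' if low is None else low}-{'*' if high is None else high}"
        result[label] = sum(
            1
            for v in values
            if (low is None or v >= low) and (high is None or v < high)
        )
    return result


range_counts = range_buckets(prices, ranges)
print(range_counts)
# {'0-200': 2, '200-500': 0, '500-1000': 4, '1000-*': 4}
```

Unlike the histogram, the ranges do not have to be the same width, and a range with no matching documents is still reported with a count of 0.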
5. Stats Aggregation
A stats aggregation computes several statistics over a numeric field in one request: minimum, maximum, average, sum, and document count.
Example query:
GET zfc-doc-000002/_search
{
  "size": 0,
  "aggs": {
    "price_stats": {
      "stats": {
        "field": "price"
      }
    }
  }
}
Explanation:
- "aggs" defines the aggregations.
- "price_stats" is a user-chosen aggregation name that labels the result.
- "stats" selects the stats aggregation.
- "field": "price" names the numeric field to aggregate on.
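For the sample data, the five numbers can be checked by hand. The sketch below computes them in plain Python; price_stats is a hypothetical helper, and the dictionary keys simply follow the names of the statistics listed above.

```python
# Prices of the ten sample documents, in _id order.
prices = [999, 799, 1499, 1299, 1399, 119, 129, 699, 899, 1599]


def price_stats(values):
    """Compute count, min, max, avg and sum in one pass over the data."""
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


stats = price_stats(prices)
print(stats)
# {'count': 10, 'min': 119, 'max': 1599, 'avg': 944.0, 'sum': 9440}
```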
The stats aggregation above returns several values at once. If only one of them is needed, each can also be requested on its own with a dedicated aggregation.
6. Avg Aggregation
GET zfc-doc-000002/_search
{
  "size": 0,
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}
7. Sum Aggregation
GET zfc-doc-000002/_search
{
  "size": 0,
  "aggs": {
    "total_price": {
      "sum": {
        "field": "price"
      }
    }
  }
}
8. Min Aggregation
GET zfc-doc-000002/_search
{
  "size": 0,
  "aggs": {
    "min_price": {
      "min": {
        "field": "price"
      }
    }
  }
}
9. Max Aggregation
GET zfc-doc-000002/_search
{
  "size": 0,
  "aggs": {
    "max_price": {
      "max": {
        "field": "price"
      }
    }
  }
}
10. Extended Stats Aggregation
GET zfc-doc-000002/_search
{
  "size": 0,
  "aggs": {
    "price_stats_extended": {
      "extended_stats": {
        "field": "price"
      }
    }
  }
}
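On top of the basic stats values, extended_stats also reports measures of spread such as the sum of squares, the variance, the standard deviation, and standard deviation bounds. The sketch below is a plain-Python approximation for the sample prices; it assumes the population variance and a bound of two standard deviations around the mean, and extended_stats here is a hypothetical helper rather than the Elasticsearch implementation.

```python
import math

# Prices of the ten sample documents, in _id order.
prices = [999, 799, 1499, 1299, 1399, 119, 129, 699, 899, 1599]


def extended_stats(values, sigma=2):
    """Compute the extra spread statistics: sum of squares, population
    variance, standard deviation, and mean +/- sigma * std bounds."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n  # population variance
    std = math.sqrt(variance)
    return {
        "sum_of_squares": sum(v * v for v in values),
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {
            "upper": mean + sigma * std,
            "lower": mean - sigma * std,
        },
    }


ext = extended_stats(prices)
print(ext["sum_of_squares"], ext["variance"], round(ext["std_deviation"], 2))
```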
11. Percentiles Aggregation
GET zfc-doc-000002/_search
{
  "size": 0,
  "aggs": {
    "price_percentiles": {
      "percentiles": {
        "field": "price",
        "percents": [25, 50, 75, 90]
      }
    }
  }
}
12. Date Histogram Aggregation
Given a date field named timestamp, we can run a date histogram aggregation that groups documents into fixed time intervals (one hour in this example) and counts the documents in each interval.
GET zfc-doc-000002/_search
{
  "size": 0,
  "aggs": {
    "date_histogram_agg": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "1h"
      }
    }
  }
}
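The hourly grouping can be sketched in plain Python by truncating each timestamp to the start of its hour and counting per hour. This is only an illustration on the sample timestamps: hourly_buckets is a hypothetical helper, the bucket labels are an arbitrary formatting choice, and empty hours between the first and last bucket are kept on the assumption that min_doc_count is left at its default of 0.

```python
from collections import Counter
from datetime import datetime, timedelta

# Timestamps of the ten sample documents, in _id order.
timestamps = [
    "2023-07-24 10:00:00", "2023-07-24 11:30:00", "2023-07-24 13:15:00",
    "2023-07-24 15:45:00", "2023-07-24 17:20:00", "2023-07-24 09:30:00",
    "2023-07-24 10:45:00", "2023-07-24 14:05:00", "2023-07-24 16:30:00",
    "2023-07-24 18:10:00",
]

FORMAT = "%Y-%m-%d %H:%M:%S"


def hourly_buckets(stamps):
    """Truncate every timestamp to the start of its hour, count per
    hour, and fill the gaps between the first and last hour with 0."""
    hours = [datetime.strptime(s, FORMAT).replace(minute=0, second=0) for s in stamps]
    counts = Counter(hours)
    buckets = {}
    current, last = min(hours), max(hours)
    while current <= last:
        buckets[current.strftime(FORMAT)] = counts[current]
        current += timedelta(hours=1)
    return buckets


per_hour = hourly_buckets(timestamps)
for start, doc_count in per_hour.items():
    print(start, doc_count)
```

With the sample data this gives ten hourly buckets from 09:00 to 18:00; the 10:00 bucket holds two documents and the 12:00 bucket is empty.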