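1. Elasticsearch advanced search: aggregations
Aggregation is an important feature of any database. It computes aggregate values over the data set matched by a query, such as the maximum or minimum of a field (or of a computed expression), or a sum or an average. Because ES serves as both a search engine and a database, it also provides powerful aggregation capabilities.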
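1.1 cardinality (distinct count)
The cardinality aggregation first removes duplicate values of the chosen field, much like SQL DISTINCT, and then counts the size of the deduplicated set.
In other words, it tells you how many distinct values exist, equivalent to the SQL statement select count(distinct clusterId) from table. Note that ES computes this count approximately (with the HyperLogLog++ algorithm), so for very high cardinalities the result can differ slightly from the exact value.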
POST /book-index/_search
{
  "from": 0,
  "size": 100,
  "aggregations": {
    "agg": {
      "cardinality": {
        "field": "categoryName"
      }
    }
  }
}
@Override
public void cardinalityAggregations(String indexName, String field) throws Exception {
    CardinalityAggregationBuilder aggregationBuilder = AggregationBuilders.cardinality("agg").field(field);
    baseQuery.builder(indexName, null, null, aggregationBuilder);
}
@Test
public void testCardinalityAggregations() throws Exception {
    aggregationQuery.cardinalityAggregations(Constants.INDEX_NAME, "categoryName");
    aggregationQuery.cardinalityAggregations(Constants.INDEX_NAME, "brandName");
}
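To make the semantics concrete without a running cluster, here is a minimal in-memory Java sketch of what cardinality returns for one field. It is an assumption-laden illustration: the class name CardinalitySketch and the sample values are hypothetical, and the count is exact (HashSet based), whereas Elasticsearch approximates it with HyperLogLog++.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory illustration of the cardinality aggregation:
// the number of distinct values of one field, like SQL COUNT(DISTINCT ...).
// This count is exact; Elasticsearch itself uses the approximate HyperLogLog++ algorithm.
class CardinalitySketch {

    static long countDistinct(List<String> fieldValues) {
        Set<String> seen = new HashSet<>(fieldValues); // duplicate values collapse here
        return seen.size();
    }

    public static void main(String[] args) {
        // Hypothetical categoryName values taken from five documents
        List<String> categories = List.of("phone", "laptop", "phone", "tablet", "laptop");
        System.out.println(countDistinct(categories)); // prints 3
    }
}
```

The result matches what the cardinality request reports for the same values, as long as the number of distinct values stays small enough for the approximation to be exact.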
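1.2 range aggregation
The range aggregation returns the number of documents that fall into each of a specified set of ranges. It can also compute aggregated data on other fields within each range (through sub-aggregations). For example, we can count the documents whose value in a numeric field is below 100, between 100 and 200, and between 200 and 300. Variants of the range aggregation also work on dates (date_range) and IP addresses (ip_range).
Count the documents created before 2011, from 2011 up to 2019, and in 2019 or later ("from" is inclusive and "to" is exclusive):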
POST /book-index/_search
{
  "from": 0,
  "size": 100,
  "aggregations": {
    "agg": {
      "date_range": {
        "field": "createTime",
        "format": "yyyy",
        "ranges": [
          {
            "to": "2011"
          },
          {
            "from": "2011",
            "to": "2019"
          },
          {
            "from": "2019"
          }
        ],
        "keyed": false
      }
    }
  }
}
@Override
public void dateRangeAggregation(String indexName, String field) throws Exception {
    AggregationBuilder agg1 = AggregationBuilders.dateRange("agg").field(field).format("yyyy")
            .addUnboundedTo("2011")
            .addRange("2011", "2019")
            .addUnboundedFrom("2019");
    baseQuery.builder(indexName, null, null, agg1);
}
@Test
public void testDateRangeAggregation() throws Exception {
    aggregationQuery.dateRangeAggregation(Constants.INDEX_NAME, "createTime");
}
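The bucketing rule of the date_range request above can be sketched in plain Java as follows. This is a hypothetical illustration that works on creation years only (Elasticsearch compares full timestamps); the class name and the sample years are made up, and the bucket keys simply mimic the "*-2011" / "2011-2019" / "2019-*" style of the response.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of date_range bucketing on the creation year.
// "from" is inclusive and "to" is exclusive, so a 2019 document lands in "2019-*".
class DateRangeSketch {

    static Map<String, Integer> bucketByYear(List<Integer> years) {
        Map<String, Integer> buckets = new LinkedHashMap<>(); // keeps request order
        buckets.put("*-2011", 0);
        buckets.put("2011-2019", 0);
        buckets.put("2019-*", 0);
        for (int year : years) {
            String key;
            if (year < 2011) {
                key = "*-2011";
            } else if (year < 2019) {
                key = "2011-2019";
            } else {
                key = "2019-*";
            }
            buckets.merge(key, 1, Integer::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Hypothetical createTime years of five documents
        System.out.println(bucketByYear(List.of(2008, 2011, 2015, 2019, 2020)));
        // prints {*-2011=1, 2011-2019=2, 2019-*=2}
    }
}
```

The boundary years 2011 and 2019 show the inclusive/exclusive rule: each one is counted in the range that starts with it.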
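1.3 histogram aggregation
The histogram aggregation builds a histogram by splitting a field's values into buckets of a fixed interval. It targets numeric fields; for date fields ES provides date_histogram (see 1.4).
For example, to create a bucket for every 1000 units of price and count how many documents fall into each interval: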
POST /book-index/_search
{
  "from": 0,
  "size": 100,
  "aggregations": {
    "agg": {
      "histogram": {
        "field": "price",
        "interval": 1000,
        "offset": 0,
        "order": {
          "_key": "asc"
        },
        "keyed": false,
        "min_doc_count": 0
      }
    }
  }
}
/**
 * The histogram aggregation splits a field's values into buckets of a fixed interval.
 *
 * @param indexName index name
 * @param field     field name
 * @param interval  bucket interval
 * @throws Exception
 */
@Override
public void histogramAggregation(String indexName, String field, int interval) throws Exception {
    AggregationBuilder agg1 = AggregationBuilders.histogram("agg").field(field).interval(interval);
    baseQuery.builder(indexName, null, null, agg1);
}
@Test
public void testHistogramAggregation() throws Exception {
    aggregationQuery.histogramAggregation(Constants.INDEX_NAME, "price", 1000);
}
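Each value is assigned to a bucket whose key is floor((value - offset) / interval) * interval + offset. The sketch below applies that rule in memory; the class name and the price values are hypothetical, and unlike a request with "min_doc_count": 0 it does not emit empty buckets between the observed values.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of how the histogram aggregation picks a bucket:
// key = floor((value - offset) / interval) * interval + offset
class HistogramSketch {

    static double bucketKey(double value, double interval, double offset) {
        return Math.floor((value - offset) / interval) * interval + offset;
    }

    // Counts values per bucket; TreeMap keeps keys ascending, like "order": {"_key": "asc"}.
    // Empty buckets are not filled in here, unlike ES with "min_doc_count": 0.
    static Map<Double, Integer> histogram(double[] values, double interval, double offset) {
        Map<Double, Integer> buckets = new TreeMap<>();
        for (double v : values) {
            buckets.merge(bucketKey(v, interval, offset), 1, Integer::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        double[] prices = {299, 999, 1000, 2599, 4999}; // hypothetical price values
        System.out.println(histogram(prices, 1000, 0));
        // prints {0.0=2, 1000.0=1, 2000.0=1, 4000.0=1}
    }
}
```

With interval 1000, a price of exactly 1000 opens the next bucket, which is why 999 and 1000 land in different buckets.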
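1.4 date_histogram aggregation
Besides numeric histograms, ES provides the date_histogram aggregation for date fields. Its interval can be a calendar unit such as year, month, week, day, hour or minute. Since ES 7.2 the interval is given either as calendar_interval, which accepts only a single unit (for example 1d or 1M), or as fixed_interval, which accepts multiples of fixed units (for example 7d or 12h).
Predefined intervals (constants of the DateHistogramInterval class):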
public static final DateHistogramInterval SECOND = new DateHistogramInterval("1s");
public static final DateHistogramInterval MINUTE = new DateHistogramInterval("1m");
public static final DateHistogramInterval HOUR = new DateHistogramInterval("1h");
public static final DateHistogramInterval DAY = new DateHistogramInterval("1d");
public static final DateHistogramInterval WEEK = new DateHistogramInterval("1w");
public static final DateHistogramInterval MONTH = new DateHistogramInterval("1M");
public static final DateHistogramInterval QUARTER = new DateHistogramInterval("1q");
public static final DateHistogramInterval YEAR = new DateHistogramInterval("1y");
For example, count the documents created on each day:
POST /book-index/_search
{
  "from": 0,
  "size": 100,
  "aggregations": {
    "agg": {
      "date_histogram": {
        "field": "createTime",
        "calendar_interval": "1d",
        "offset": 0,
        "order": {
          "_key": "asc"
        },
        "keyed": false,
        "min_doc_count": 0
      }
    }
  }
}
/**
 * The date_histogram aggregation splits a date field into buckets of a fixed time interval.
 *
 * @param indexName index name
 * @param field     field name
 * @param interval  bucket interval in days
 * @throws Exception
 */
@Override
public void histogramDateAggregation(String indexName, String field, int interval) throws Exception {
    DateHistogramAggregationBuilder agg1 = AggregationBuilders.dateHistogram("agg").field(field);
    if (interval == 1) {
        // calendar_interval only accepts a single unit such as "1d"
        agg1.calendarInterval(DateHistogramInterval.DAY);
    } else {
        // multiples of a day (e.g. "7d") must use fixed_interval instead
        agg1.fixedInterval(DateHistogramInterval.days(interval));
    }
    baseQuery.builder(indexName, null, null, agg1);
}
@Test
public void testHistogramDateAggregation() throws Exception {
    aggregationQuery.histogramDateAggregation(Constants.INDEX_NAME, "createTime", 1);
}
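A daily calendar_interval bucket simply truncates each timestamp to the start of its calendar day. The in-memory sketch below shows that behaviour with java.time, assuming UTC (the date_histogram default when no time_zone is set); the class name and timestamps are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of date_histogram with "calendar_interval": "1d":
// each timestamp is truncated to its calendar day (UTC assumed) and counted.
class DateHistogramSketch {

    static Map<LocalDate, Integer> perDay(Instant[] createTimes) {
        Map<LocalDate, Integer> buckets = new TreeMap<>(); // ascending, like "_key": "asc"
        for (Instant t : createTimes) {
            LocalDate day = t.atZone(ZoneOffset.UTC).toLocalDate();
            buckets.merge(day, 1, Integer::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        Instant[] times = { // hypothetical createTime values
            Instant.parse("2020-03-08T01:15:00Z"),
            Instant.parse("2020-03-08T23:59:59Z"),
            Instant.parse("2020-03-09T00:00:00Z")
        };
        System.out.println(perDay(times)); // prints {2020-03-08=2, 2020-03-09=1}
    }
}
```

If the index stores local times, the bucket boundaries shift accordingly, which is what the time_zone parameter of date_histogram controls.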
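1.5 extended_stats aggregation
The extended_stats aggregation computes statistics over a numeric field: count, sum, sum of squares, average, minimum, maximum, variance and standard deviation. It also returns std_deviation_bounds (average ± sigma × standard deviation), where sigma defaults to 2.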
POST /book-index/_search
{
  "from": 0,
  "size": 100,
  "aggregations": {
    "agg": {
      "extended_stats": {
        "field": "price",
        "sigma": 2
      }
    }
  }
}
@Override
public void extendedStatsAggregation(String indexName, String field) throws Exception {
    // sigma(2) matches the REST request; it controls std_deviation_bounds
    ExtendedStatsAggregationBuilder agg1 = AggregationBuilders.extendedStats("agg").field(field).sigma(2);
    baseQuery.builder(indexName, null, null, agg1);
}
@Test
public void testExtendedStatsAggregation() throws Exception {
    aggregationQuery.extendedStatsAggregation(Constants.INDEX_NAME, "price");
}
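The main extended_stats metrics can be reproduced in memory as below. This sketch assumes population variance (sum of squares / count minus the squared average), which is what the plain "variance" field reports, and uses hypothetical prices; the class and record names are made up for the example, and empty input yields NaN here rather than the null ES returns.

```java
// Hypothetical in-memory illustration of the extended_stats metrics for one numeric field.
// variance is the population variance; the bounds are avg ± sigma * std_deviation.
class ExtendedStatsSketch {

    record Stats(long count, double sum, double sumOfSquares, double avg, double min,
                 double max, double variance, double stdDeviation, double upper, double lower) {}

    static Stats compute(double[] values, double sigma) {
        long count = values.length;
        double sum = 0, sumSq = 0;
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            sum += v;
            sumSq += v * v;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double avg = sum / count;
        double variance = sumSq / count - avg * avg;
        double std = Math.sqrt(variance);
        return new Stats(count, sum, sumSq, avg, min, max, variance, std,
                avg + sigma * std, avg - sigma * std);
    }

    public static void main(String[] args) {
        double[] prices = {2, 4, 4, 4, 5, 5, 7, 9}; // hypothetical price values
        Stats s = compute(prices, 2);
        System.out.println(s.avg() + " " + s.variance() + " " + s.stdDeviation());
        // prints 5.0 4.0 2.0
    }
}
```

With sigma = 2 these sample prices give std_deviation_bounds of 1.0 to 9.0, i.e. 5.0 ± 2 × 2.0.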
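1.6 terms_stats aggregation
terms_stats computes statistics on one field, split by the values of another field. (terms_stats was a facet in early Elasticsearch versions; with aggregations, the same result comes from a terms aggregation carrying a metric sub-aggregation, which is what the example below does.)
A typical use is computing the average of a fee field per province. The request below follows the same pattern: it keeps documents whose createTime lies between 2015-03-08 and 2020-03-08, groups them by brandName, and computes the average stock of each brand.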
POST /book-index/_search
{
  "from": 0,
  "size": 100,
  "query": {
    "range": {
      "createTime": {
        "from": "2015-03-08 00:00:00",
        "to": "2020-03-08 00:00:00",
        "include_lower": true,
        "include_upper": true,
        "boost": 1
      }
    }
  },
  "aggregations": {
    "brandName": {
      "terms": {
        "field": "brandName",
        "size": 10,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [
          {
            "_count": "desc"
          },
          {
            "_key": "asc"
          }
        ]
      },
      "aggregations": {
        "avg_request_stock": {
          "avg": {
            "field": "stock"
          }
        }
      }
    }
  }
}
@Override
public void termsAggregation(String indexName, String startTime, String endTime) throws Exception {
    RangeQueryBuilder queryBuilder = QueryBuilders.rangeQuery("createTime").from(startTime).to(endTime);
    // brandName must be a keyword field: text fields cannot be aggregated or sorted on (fielddata is disabled by default)
    TermsAggregationBuilder aggregation = AggregationBuilders.terms("brandName").field("brandName");
    // avg_request_stock is the sub-aggregation name; any name works
    aggregation.subAggregation(AggregationBuilders.avg("avg_request_stock").field("stock"));
    baseQuery.builder(indexName, queryBuilder, null, aggregation);
}
@Test
public void testTermsAggregation() throws Exception {
    aggregationQuery.termsAggregation(Constants.INDEX_NAME, "2015-03-08 00:00:00", "2020-03-08 00:00:00");
}
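What the terms aggregation with an avg sub-aggregation returns can be sketched as a plain group-by. The Java below is a hypothetical in-memory illustration (the Book and Bucket records and the sample documents are made up): it groups by brandName, averages stock per group, and orders buckets like the request, by doc count descending and then key ascending, keeping at most size buckets. Unlike real ES it is exact, with no per-shard approximation of the top terms.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of terms + avg sub-aggregation:
// group documents by brandName, then average stock within each group.
class TermsAvgSketch {

    record Book(String brandName, int stock) {}

    record Bucket(String key, long docCount, double avgStock) {}

    static List<Bucket> termsWithAvg(List<Book> books, int size) {
        Map<String, long[]> acc = new HashMap<>(); // key -> {docCount, stockSum}
        for (Book b : books) {
            long[] a = acc.computeIfAbsent(b.brandName(), k -> new long[2]);
            a[0]++;
            a[1] += b.stock();
        }
        List<Bucket> buckets = new ArrayList<>();
        acc.forEach((key, a) -> buckets.add(new Bucket(key, a[0], (double) a[1] / a[0])));
        // Same order as the request: _count desc, then _key asc
        buckets.sort(Comparator.comparingLong(Bucket::docCount).reversed()
                .thenComparing(Bucket::key));
        return buckets.subList(0, Math.min(size, buckets.size()));
    }

    public static void main(String[] args) {
        List<Book> books = List.of( // hypothetical documents
                new Book("Acme", 10), new Book("Acme", 20),
                new Book("Zeta", 5), new Book("Beta", 7));
        termsWithAvg(books, 10).forEach(System.out::println);
    }
}
```

For the sample data, Acme comes first (two documents, average stock 15.0), followed by Beta and Zeta, whose tie on doc count is broken alphabetically.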
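1.7 geo_distance aggregation
The geo_distance aggregation counts the documents that lie within given distance ranges of an origin point.
For example, using the data from Section 7, count the documents in the index that are 0~100 km, 100~500 km and 500~5000 km away from the origin (lat 40.1225, lon 116.2577). As with range, "from" is inclusive and "to" is exclusive.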
POST /cn_large_cities/_search
{
  "aggregations": {
    "agg": {
      "geo_distance": {
        "field": "location",
        "origin": {
          "lat": 40.1225,
          "lon": 116.2577
        },
        "ranges": [
          {
            "key": "*-100.0",
            "from": 0.0,
            "to": 100.0
          },
          {
            "key": "100.0-500.0",
            "from": 100.0,
            "to": 500.0
          },
          {
            "key": "500.0-5000.0",
            "from": 500.0,
            "to": 5000.0
          }
        ],
        "keyed": false,
        "unit": "km",
        "distance_type": "ARC"
      }
    }
  }
}
@Override
public void geoDistanceAggregation(String indexName) throws Exception {
    GeoDistanceAggregationBuilder geoDistanceAggregationBuilder = AggregationBuilders.geoDistance("agg", new GeoPoint(40.1225, 116.2577))
            .field("location")
            .unit(DistanceUnit.KILOMETERS)
            .addUnboundedTo(100)
            .addRange(100, 500)
            .addRange(500, 5000);
    baseQuery.builder(indexName, null, null, geoDistanceAggregationBuilder);
}
@Test
public void testGeoDistanceAggregation() throws Exception {
    aggregationQuery.geoDistanceAggregation("cn_large_cities");
}
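The ARC distance type measures great-circle distance. The sketch below approximates it with the haversine formula and an assumed mean Earth radius of 6371 km (Elasticsearch's own constants may differ slightly), then counts points per distance ring exactly as the three ranges above define them. The class name and the approximate city coordinates are illustrative assumptions, not data from the index.

```java
import java.util.Arrays;

// Hypothetical in-memory illustration of geo_distance bucketing: great-circle (haversine)
// distance from the origin, then a count per distance ring.
// The Earth radius is an assumed mean value; Elasticsearch's own constants may differ slightly.
class GeoDistanceSketch {

    static final double EARTH_RADIUS_KM = 6371.0;

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Rings mirror the request: [*, 100), [100, 500), [500, 5000); "to" is exclusive.
    static int[] countByRing(double originLat, double originLon, double[][] points) {
        int[] counts = new int[3];
        for (double[] p : points) {
            double d = haversineKm(originLat, originLon, p[0], p[1]);
            if (d < 100) {
                counts[0]++;
            } else if (d < 500) {
                counts[1]++;
            } else if (d < 5000) {
                counts[2]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        double[][] cities = { // approximate {lat, lon} city coordinates
            {39.9042, 116.4074}, // Beijing
            {39.3434, 117.3616}, // Tianjin
            {31.2304, 121.4737}  // Shanghai
        };
        System.out.println(Arrays.toString(countByRing(40.1225, 116.2577, cities)));
    }
}
```

Points farther than 5000 km fall outside every ring and are not counted, just as documents outside all configured ranges do not appear in any bucket.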