Taking player information as an example, the player type of the player index contains five fields: name, age, salary, team, and on-court position.
The index mapping is:
"mappings": {
    "player": {
        "properties": {
            "name": {
                "index": "not_analyzed",
                "type": "string"
            },
            "age": {
                "type": "integer"
            },
            "salary": {
                "type": "integer"
            },
            "team": {
                "index": "not_analyzed",
                "type": "string"
            },
            "position": {
                "index": "not_analyzed",
                "type": "string"
            }
        },
        "_all": {
            "enabled": false
        }
    }
}
[Figure: all the data in the index]
First, initialize the builder:
SearchRequestBuilder sbuilder = client.prepareSearch("player").setTypes("player");
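The following examples show how to implement each kind of aggregation. In the ES API, aggregating on multiple fields requires sub-aggregations (subAggregation), and beginners may not find the right method (there is little material online; I struggled with this for two days and only fully understood it after reading the source code T_T), so multi-field aggregation is explained specifically later on. Sorting after aggregation is also covered separately.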
group by/count
For example, to count the players on each team, the SQL statement would be:
select team, count(*) as player_count from player group by team;
With the ES Java API:
TermsBuilder teamAgg = AggregationBuilders.terms("player_count").field("team");
sbuilder.addAggregation(teamAgg);
SearchResponse response = sbuilder.execute().actionGet();
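To make the result of this terms aggregation concrete, here is a minimal plain-Java sketch that computes the same per-team count in memory. It does not call Elasticsearch at all; the class name TeamCountSketch, the method name, and the team values are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the query result; none of this is Elasticsearch API.
final class TeamCountSketch {

    // Equivalent of: select team, count(*) as player_count from player group by team
    static Map<String, Long> countByTeam(List<String> teamOfEachPlayer) {
        Map<String, Long> counts = new TreeMap<>();
        for (String team : teamOfEachPlayer) {
            counts.merge(team, 1L, Long::sum); // one bucket per distinct team value
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up data: one team value per player document
        List<String> teams = Arrays.asList("red", "blue", "red", "red", "blue");
        System.out.println(countByTeam(teams));
    }
}
```

Each map entry plays the role of one bucket: the key corresponds to the bucket key and the value to its document count.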
group by on multiple fields
For example, to count the players at each position on each team, the SQL statement would be:
select team, position, count(*) as pos_count from player group by team, position;
With the ES Java API:
TermsBuilder teamAgg = AggregationBuilders.terms("player_count").field("team");
TermsBuilder posAgg= AggregationBuilders.terms("pos_count").field("position");
sbuilder.addAggregation(teamAgg.subAggregation(posAgg));
SearchResponse response = sbuilder.execute().actionGet();
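The nesting produced by subAggregation can be pictured as a map of maps: one outer bucket per team, and inside each of them one bucket per position. The following is a rough in-memory sketch of that shape, not Elasticsearch code; the class name TeamPositionCountSketch and the sample rows are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory model of "terms on team" with a "terms on position" sub-aggregation.
final class TeamPositionCountSketch {

    // Each row is {team, position}. Equivalent of:
    // select team, position, count(*) as pos_count from player group by team, position
    static Map<String, Map<String, Long>> countByTeamAndPosition(String[][] rows) {
        Map<String, Map<String, Long>> byTeam = new TreeMap<>();
        for (String[] row : rows) {
            String team = row[0];
            String position = row[1];
            // Outer bucket: team. Inner bucket: position within that team.
            byTeam.computeIfAbsent(team, t -> new TreeMap<>())
                  .merge(position, 1L, Long::sum);
        }
        return byTeam;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only
        String[][] rows = {{"red", "pg"}, {"red", "pg"}, {"red", "c"}, {"blue", "sf"}};
        System.out.println(countByTeamAndPosition(rows));
    }
}
```

Unlike the flat rows SQL returns, the result is a tree, which is why the response has to be read bucket by bucket.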
max/min/sum/avg
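For example, to compute the maximum/minimum/total/average age of the players on each team, the SQL statement would be (shown here for max; min, sum, and avg follow the same pattern):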
select team, max(age) as max_age from player group by team;
With the ES Java API:
TermsBuilder teamAgg = AggregationBuilders.terms("player_count").field("team");
MaxBuilder ageAgg= AggregationBuilders.max("max_age").field("age");
sbuilder.addAggregation(teamAgg.subAggregation(ageAgg));
SearchResponse response = sbuilder.execute().actionGet();
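As a point of comparison, the same "max per team" result can be computed in plain Java. This is only a sketch of what the metric sub-aggregation yields for each team bucket; the MaxAgeSketch class, its Player type, and the sample values are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of "terms on team" with a "max on age" sub-aggregation.
final class MaxAgeSketch {

    // Hypothetical value type holding only the two fields this example needs
    static final class Player {
        final String team;
        final int age;
        Player(String team, int age) { this.team = team; this.age = age; }
    }

    // Equivalent of: select team, max(age) as max_age from player group by team
    static Map<String, Integer> maxAgeByTeam(List<Player> players) {
        Map<String, Integer> maxAge = new TreeMap<>();
        for (Player p : players) {
            maxAge.merge(p.team, p.age, Integer::max); // keep the larger age per team
        }
        return maxAge;
    }

    // Made-up sample data
    static List<Player> sample() {
        return Arrays.asList(new Player("red", 31), new Player("red", 24),
                             new Player("blue", 28), new Player("blue", 26));
    }

    public static void main(String[] args) {
        System.out.println(maxAgeByTeam(sample()));
    }
}
```

Swapping Integer::max for Integer::min or Integer::sum gives the min and sum variants in this model.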
max/min/sum/avg on multiple fields
For example, to compute the average age of the players on each team and, at the same time, the team's total salary, the SQL statement would be:
select team, avg(age) as avg_age, sum(salary) as total_salary from player group by team;
With the ES Java API:
TermsBuilder teamAgg = AggregationBuilders.terms("team").field("team");
AvgBuilder ageAgg = AggregationBuilders.avg("avg_age").field("age");
SumBuilder salaryAgg = AggregationBuilders.sum("total_salary").field("salary");
sbuilder.addAggregation(teamAgg.subAggregation(ageAgg).subAggregation(salaryAgg));
SearchResponse response = sbuilder.execute().actionGet();
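Chaining two subAggregation calls attaches two sibling metrics to every team bucket. A rough in-memory equivalent, assuming a hypothetical Player type and made-up numbers (nothing here is Elasticsearch API), looks like this:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of one terms bucket per team carrying two sibling metrics.
final class TeamStatsSketch {

    static final class Player {
        final String team;
        final int age;
        final int salary;
        Player(String team, int age, int salary) {
            this.team = team; this.age = age; this.salary = salary;
        }
    }

    // Running totals for one team bucket
    static final class Stats {
        long count;
        long ageSum;
        long salarySum;
        double avgAge() { return (double) ageSum / count; }
        long totalSalary() { return salarySum; }
    }

    // Equivalent of: select team, avg(age), sum(salary) from player group by team
    static Map<String, Stats> statsByTeam(List<Player> players) {
        Map<String, Stats> byTeam = new TreeMap<>();
        for (Player p : players) {
            Stats s = byTeam.computeIfAbsent(p.team, t -> new Stats());
            s.count++;
            s.ageSum += p.age;
            s.salarySum += p.salary;
        }
        return byTeam;
    }

    // Made-up sample data
    static List<Player> sample() {
        return Arrays.asList(new Player("red", 30, 100), new Player("red", 20, 300),
                             new Player("blue", 25, 50));
    }

    public static void main(String[] args) {
        Stats red = statsByTeam(sample()).get("red");
        System.out.println(red.avgAge() + " " + red.totalSalary());
    }
}
```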
Sorting the aggregation results
For example, to compute the total salary of each team and sort by total salary in descending order, the SQL statement would be:
select team, sum(salary) as total_salary from player group by team order by total_salary desc;
With the ES Java API:
TermsBuilder teamAgg = AggregationBuilders.terms("team").field("team").order(Order.aggregation("total_salary", false));
SumBuilder salaryAgg = AggregationBuilders.sum("total_salary").field("salary");
sbuilder.addAggregation(teamAgg.subAggregation(salaryAgg));
SearchResponse response = sbuilder.execute().actionGet();
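Note in particular that the ordering is applied on the terms aggregation, not on the sum. The first argument of Order.aggregation (Order here is Terms.Order) is the name of the sub-aggregation to sort by, and the second is a boolean: true for ascending, false for descending.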
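The effect of ordering buckets by a sub-aggregation can be sketched in plain Java as "sum per team, then sort the teams by that sum". The class and method names below are hypothetical, the data is made up, and the boolean parameter merely mirrors the asc flag described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of ordering team buckets by their total_salary metric.
final class OrderBySumSketch {

    // teams[i] and salaries[i] describe one player; asc=false means descending.
    static List<String> teamsByTotalSalary(String[] teams, long[] salaries, boolean asc) {
        Map<String, Long> totals = new HashMap<>();
        for (int i = 0; i < teams.length; i++) {
            totals.merge(teams[i], salaries[i], Long::sum);
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(totals.entrySet());
        // Sort the buckets themselves by the value of the metric attached to them
        entries.sort((a, b) -> asc
                ? Long.compare(a.getValue(), b.getValue())
                : Long.compare(b.getValue(), a.getValue()));
        List<String> ordered = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries) {
            ordered.add(e.getKey());
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Made-up data: red=400, blue=50, green=700
        String[] teams = {"red", "blue", "green", "red", "green"};
        long[] salaries = {100, 50, 500, 300, 200};
        System.out.println(teamsByTotalSalary(teams, salaries, false));
    }
}
```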
Number of aggregation results
By default, a search returns only 10 aggregation results (buckets). To return more, specify size when building the TermsBuilder:
TermsBuilder teamAgg = AggregationBuilders.terms("team").field("team").size(15);
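Conceptually, size keeps only the first N buckets after they have been ordered, and a terms aggregation orders buckets by document count, largest first, unless told otherwise. The sketch below models that truncation in memory; it is not Elasticsearch code, the names are hypothetical, and breaking ties by team name is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory model of a terms aggregation that returns only the top `size` buckets.
final class TopBucketsSketch {

    static LinkedHashMap<String, Long> topTeams(List<String> teamOfEachPlayer, int size) {
        Map<String, Long> counts = new HashMap<>();
        for (String team : teamOfEachPlayer) {
            counts.merge(team, 1L, Long::sum);
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        // Largest document count first; ties broken by team name (assumption of this sketch)
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        LinkedHashMap<String, Long> top = new LinkedHashMap<>();
        for (int i = 0; i < entries.size() && i < size; i++) {
            top.put(entries.get(i).getKey(), entries.get(i).getValue());
        }
        return top;
    }

    public static void main(String[] args) {
        // Made-up data: red x3, blue x2, green x1
        List<String> teams = java.util.Arrays.asList("red", "blue", "red", "green", "red", "blue");
        System.out.println(topTeams(teams, 2));
    }
}
```

With three distinct teams and a size of 2, the smallest bucket is dropped, which is the same behavior that hides buckets beyond the tenth by default.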
Parsing/outputting the aggregation results
After obtaining the response:
Map<String, Aggregation> aggMap = response.getAggregations().asMap();
// Look up the terms aggregation by the name it was built with ("team" in the multi-field example above)
StringTerms teamAgg = (StringTerms) aggMap.get("team");
Iterator<Bucket> teamBucketIt = teamAgg.getBuckets().iterator();
while (teamBucketIt.hasNext()) {
    Bucket buck = teamBucketIt.next();
    // team name
    String team = buck.getKeyAsString();
    // number of documents in this bucket
    long count = buck.getDocCount();
    // all sub-aggregations of this bucket, keyed by name
    Map<String, Aggregation> subaggmap = buck.getAggregations().asMap();
    // how to read an avg value
    double avg_age = ((InternalAvg) subaggmap.get("avg_age")).getValue();
    // how to read a sum value
    double total_salary = ((InternalSum) subaggmap.get("total_salary")).getValue();
    // ...
    // max/min work the same way
}
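Summary
In summary, aggregations are mainly performed by calling SearchRequestBuilder's addAggregation method, usually passing in a TermsBuilder. Sub-aggregations are added by calling the TermsBuilder's subAggregation method; the sub-aggregations that can be added include common ones such as TermsBuilder, SumBuilder, AvgBuilder, MaxBuilder, and MinBuilder.
In terms of implementation, SearchRequestBuilder keeps a private SearchSourceBuilder instance internally, and SearchSourceBuilder contains a List<AbstractAggregationBuilder>. Each call to addAggregation delegates to that SearchSourceBuilder instance, which adds one AggregationBuilder to the list.
Similarly, TermsBuilder keeps a List<AbstractAggregationBuilder> internally, and calling its subAggregation method (inherited from its parent class) adds one AggregationBuilder to that list. Interested readers can also read the source code.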
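The structure described above, a builder that holds a list of child builders and renders them recursively, can be illustrated with a small stand-alone model. This is not the Elasticsearch source: AggBuilderSketch and its methods are hypothetical, and the JSON-like output is only an approximation of an aggregation request body.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of an aggregation builder that keeps its sub-aggregations in a list.
final class AggBuilderSketch {
    private final String name;
    private final String type;   // e.g. "terms", "avg", "sum"
    private final String field;
    private final List<AggBuilderSketch> subAggs = new ArrayList<>();

    AggBuilderSketch(String name, String type, String field) {
        this.name = name;
        this.type = type;
        this.field = field;
    }

    // Adds a child and returns this, so calls can be chained
    AggBuilderSketch subAggregation(AggBuilderSketch sub) {
        subAggs.add(sub);
        return this;
    }

    // Renders this builder and, recursively, every child in its list
    String toJson() {
        StringBuilder sb = new StringBuilder();
        sb.append('"').append(name).append("\":{\"").append(type)
          .append("\":{\"field\":\"").append(field).append("\"}");
        if (!subAggs.isEmpty()) {
            sb.append(",\"aggs\":{");
            for (int i = 0; i < subAggs.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(subAggs.get(i).toJson());
            }
            sb.append('}');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        AggBuilderSketch team = new AggBuilderSketch("team", "terms", "team")
                .subAggregation(new AggBuilderSketch("avg_age", "avg", "age"))
                .subAggregation(new AggBuilderSketch("total_salary", "sum", "salary"));
        System.out.println(team.toJson());
    }
}
```

Returning this from subAggregation is what makes the chained style used throughout this article possible.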
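If you have any questions, feel free to discuss them; if you find any errors in this article, corrections are welcome.
Note: the Elasticsearch API version used in this article is 2.3.2.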
// Nested aggregation on spuAttrList: attribute name -> attribute value id -> attribute value
public List<Map<String, Object>> queryAggregationsByAttr(BoolQueryBuilder boolQueryBld) {
    List<Map<String, Object>> result = new ArrayList<>();
    NestedBuilder nestedBuilder = AggregationBuilders.nested("negstedAttr").path("spuAttrList");
    // group by attribute name
    TermsBuilder tbName = AggregationBuilders.terms("attrNameAgg").field("spuAttrList.name");
    // within each attribute name, group (and count) by attribute value id
    TermsBuilder tb = AggregationBuilders.terms("attrvIdAgg").field("spuAttrList.attrvId");
    // attribute value field
    TermsBuilder tbVal = AggregationBuilders.terms("attrValAgg").field("spuAttrList.value");
    NestedBuilder all = nestedBuilder.subAggregation(tbName.subAggregation(tb.subAggregation(tbVal)));
    NativeSearchQueryBuilder nativeQueryBuilderAgg = new NativeSearchQueryBuilder()
            .withQuery(boolQueryBld)
            .withIndices("skus").withTypes("skus")
            .addAggregation(all);
    SearchQuery searchQueryAgg = nativeQueryBuilderAgg.build();
    Aggregations aggregations = elasticsearchTemplate.query(searchQueryAgg, new ResultsExtractor<Aggregations>() {
        @Override
        public Aggregations extract(SearchResponse response) {
            return response.getAggregations();
        }
    });
    Map<String, Aggregation> map = aggregations.asMap();
    for (String s : map.keySet()) {
        if ("negstedAttr".equals(s)) {
            InternalNested internalNested = (InternalNested) map.get(s);
            // attribute names
            StringTerms nameTerms = (StringTerms) internalNested.getAggregations().get("attrNameAgg");
            // one bucket per attribute name
            for (org.elasticsearch.search.aggregations.bucket.terms.Terms.Bucket tbket : nameTerms.getBuckets()) {
                // holds one group of attribute values
                Map<String, Object> categoryIdsMapTerms = new HashMap<String, Object>();
                categoryIdsMapTerms.put("typeId", "attrValueIds");
                categoryIdsMapTerms.put("typeName", tbket.getKeyAsString());
                LongTerms attrvIdTerms = (LongTerms) tbket.getAggregations().asMap().get("attrvIdAgg");
                if (attrvIdTerms == null || CollectionUtils.isEmpty(attrvIdTerms.getBuckets())) {
                    continue;
                }
                List<Map<String, Object>> dataList = new ArrayList<>();
                // one bucket per attribute value id
                for (org.elasticsearch.search.aggregations.bucket.terms.Terms.Bucket attrIdB : attrvIdTerms.getBuckets()) {
                    // dataListMap
                    Map<String, Object> dataListMap = new HashMap<String, Object>();
                    Long attrvId = attrIdB.getKeyAsNumber().longValue();
                    StringTerms valTerms = (StringTerms) attrIdB.getAggregations().asMap().get("attrValAgg");
                    if (valTerms == null || CollectionUtils.isEmpty(valTerms.getBuckets())) {
                        continue;
                    }
                    // take the first (most frequent) value for this id
                    String attrValStr = valTerms.getBuckets().get(0).getKeyAsString();
                    dataListMap.put("id", attrvId);
                    dataListMap.put("name", attrValStr);
                    dataList.add(dataListMap);
                }
                if (!CollectionUtils.isEmpty(dataList)) {
                    categoryIdsMapTerms.put("dataList", dataList);
                }
                result.add(categoryIdsMapTerms);
            }
        }
    }
    return result;
}