Preface
Much of what follows consists of queries similar to MySQL's `sum` and `group by`.
elk
===
ELK overall architecture
https://www.elastic.co/cn/products
Beats
Lightweight data shippers written in Go. They read data and quickly forward it to Logstash for parsing, or send it directly to Elasticsearch for centralized storage and analysis.
Logstash
Logstash is an open-source, server-side data processing pipeline that collects data from multiple sources at the same time, formats it, and then sends it to es for storage.
Elasticsearch
Elasticsearch is a JSON-based distributed search and analytics engine that implements full-text indexing with an inverted index.
Kibana
Kibana visualizes the data stored in Elasticsearch and lets you work with it.
elasticsearch
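es sits at the core of the ELK ecosystem. It is an open-source, large-scale full-text search and analytics engine built on an inverted index, and it supports storage, search, and analysis in near real time.
Advantages:
- Horizontal scalability: new servers can be configured directly into the cluster
- Sharding provides better distribution: a divide-and-conquer approach that improves processing efficiency
- High availability: provided by the replication (replica) mechanism
- Near real-time: query speed is improved by keeping on-disk files in the filesystem cache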
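Basic concepts
- Index: a collection of documents, similar to a database in MySQL
- Type: different types can be defined inside an Index; a type is similar to a table in MySQL and groups data that shares the same characteristics.
- Document: similar to a stored row in MySQL, in JSON format; each type within an Index can hold many documents.
- Shards: scale the data horizontally when the volume is large, improving search performance
- Replicas: protect against losing a shard's data; searches can also run in parallel on the replica copies, which improves performance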
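To make the relationship between shards and replicas concrete, the sketch below builds the kind of settings body that could accompany an index creation request. `number_of_shards` and `number_of_replicas` are standard index settings; the helper names and the values 5 and 1 are assumptions chosen only for illustration.

```python
import json


def index_settings(shards: int, replicas: int) -> dict:
    """Build a create-index request body with the given shard/replica counts."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }


def total_shard_copies(body: dict) -> int:
    """Every primary shard has `replicas` extra copies: primaries * (1 + replicas)."""
    settings = body["settings"]
    return settings["number_of_shards"] * (1 + settings["number_of_replicas"])


# Illustrative values, not taken from the cluster described in this post.
body = index_settings(shards=5, replicas=1)
print(json.dumps(body, indent=2))   # body for a request such as PUT /customer
print(total_shard_copies(body))     # primaries plus replica copies
```

With one replica per primary, the cluster holds two copies of every shard, which is what allows a search to be served even when one copy is unavailable.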
elasticsearch query syntax
_cat API
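Queries information about the current es cluster, such as the number of indices, the running status, and the IP addresses the cluster runs on. Its purpose is to present the results in a more human-friendly format.
- cat: lists all query commands supported by the `_cat` API
- cat health: checks the health of the es cluster
- cat count: quickly returns the number of documents in the cluster or in an index
- cat indices: lists information about every index in the current cluster, including its shard count, document count, storage size, ...
- For the other cat APIs, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/5.5/cat.html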
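As a small illustration of how these endpoints are addressed, the sketch below assembles `_cat` URLs. The base address matches the local node used in the curl examples of this post; the helper `cat_url` is hypothetical and only builds strings, it does not contact a cluster.

```python
BASE = "http://127.0.0.1:9200"  # assumed local node, as in the curl examples


def cat_url(endpoint: str, index: str = "", verbose: bool = True) -> str:
    """Build a _cat URL; `v` asks es to include column headers in the output."""
    path = f"/_cat/{endpoint}"
    if index:
        path += f"/{index}"      # e.g. restrict `count` or `indices` to one index
    return BASE + path + ("?v" if verbose else "")


print(cat_url("health"))
print(cat_url("count", "rta_daily_report"))
print(cat_url("indices", verbose=False))
```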
Search APIs
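Searches the data; the query syntax is rich and powerful.
REST request URI: a lightweight, quick way to query through the URI
REST request body: a JSON-format query that can carry many constraints
- "query": the `query` element in the request body lets us search using the Query DSL.
  - "term": checks whether a document contains an exact value; the queried value is not analyzed (tokenized)
  - "match": analyzes (tokenizes) the queried value, then ranks the hits by relevance score (TF/IDF in older versions; BM25 is the default similarity since Elasticsearch 5.0)
  - "match_phrase": matches the given phrase
  - "bool": combines other queries, usually with `must`, `should`, and `must_not` (AND, OR, NOT), to build complex queries
  - "range": restricts a field to a given range
"range": {
"FIELD": {# the field to filter on
"gte": 1,# gte: >=, gt: >
"lte": 10
}
}
- "from": the offset at which to start returning hits; by default results start from the first hit
- "size": the number of hits to return; the default is 10
- "sort": sorts the hits by the specified field
- "_source": selects which fields are returned for each hit
- "script_fields": lets us use a script to compute values that are not stored in the document, for example computing cti as install/click
"script_fields": {
"FIELD": {# name of the value computed by the script
"script": {# the computation performed by the script
}
}
}
- "aggs": aggregations run on top of the search query and can be nested to express complex requirements
"aggs": {
"NAME": {# name of the result
"AGG_TYPE": {# the aggregation method to use
TODO: # specify the field to aggregate on inside the aggregation body
}
}
TODO: # aggregations can be nested here
}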
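The request-body options listed above can be combined in a single search request. The sketch below assembles such a body as a Python dictionary and serializes it to JSON. The helper `search_body` is hypothetical, and the field names (`type.keyword`, `yyyymmdd`, `cid`, `click`) are borrowed from the examples later in this post.

```python
import json


def search_body(query, *, start=0, size=10, sort=None, source=None):
    """Assemble a _search request body from the options described above."""
    body = {"query": query, "from": start, "size": size}
    if sort:
        body["sort"] = sort          # list of {field: {"order": ...}} entries
    if source:
        body["_source"] = source     # restrict the fields returned per hit
    return body


body = search_body(
    {"term": {"type.keyword": "rba"}},         # exact, non-analyzed match
    size=5,
    sort=[{"yyyymmdd": {"order": "desc"}}],
    source=["cid", "click"],
)
print(json.dumps(body, indent=2))
```

The printed JSON could be pasted after a `GET rta_daily_report/_search` line in Kibana's Console.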
Query DSL
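Query DSL is the complete JSON-based structured query language provided by es. It consists of two kinds of clauses:
- Leaf query clauses: search for a particular value in a particular field, for example `match`, `term`, or `range`.
- Compound query clauses: wrap leaf clauses or other compound clauses in order to combine multiple queries, for example `bool` or `dis_max`.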
Query and filter context
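The behavior of a query clause depends on whether it is used in query context or in filter context.
Query context: here the clause answers the question "how well does this document match the query?", and every returned hit carries a `_score` value that expresses the degree of match.
Filter context: here the clause only states whether the document matches (yes or no). es keeps an internal cache for `filter context` clauses to improve query performance, so `filter context` queries are faster than `query context` queries.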
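As a minimal sketch of the difference, the body below puts one clause in query context (under `must`, so it contributes to `_score`) and one in filter context (under the `filter` clause of a `bool` query, so it only includes or excludes documents). The helper name and the chosen fields are assumptions for illustration.

```python
import json


def scored_and_filtered(scored, filtered):
    """Combine scoring clauses (query context) with yes/no clauses (filter context)."""
    return {
        "query": {
            "bool": {
                "must": scored,      # query context: affects _score
                "filter": filtered,  # filter context: no scoring, cacheable
            }
        }
    }


body = scored_and_filtered(
    scored=[{"match": {"type": "rba"}}],
    filtered=[{"range": {"click": {"gt": 0}}}],
)
print(json.dumps(body, indent=2))
```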
elasticsearch query examples
_cat API query examples
Check the health of the current es cluster with _cat
Kibana’s Console: `GET /_cat/health?v`
curl: `curl -XGET "127.0.0.1:9200/_cat/health?v"`
List all indices in the current es cluster with _cat
Kibana’s Console: `GET /_cat/indices?v`
curl: `curl -XGET "127.0.0.1:9200/_cat/indices?v"`
_search API query examples
Create an index
PUT /customer?pretty
output:
{
"acknowledged": true,
"shards_acknowledged": true
}
Insert data
In day-to-day work, inserting data into es sometimes fails with a 504 gateway timeout; in that case a small amount of data has to be inserted manually.
PUT /rta_daily_report/campaign/164983850_rba_20170808?pretty
{
"cid": 164983850,
"advertiser_id": 799,
"trace_app_id": "com.zeptolab.cats.google",
"network_cid": "6656665",
"platform": 1,
"direct": 2,
"last_second_domain": "",
"jump_type": 2,
"direct_trace_app_id": "",
"mode": 0,
"third": "kuaptrk.com",
"hops": 9,
"yyyymmdd": "2017-08-07T16:00:00",
"type": "rba",
"click": 2
}
output:
{
"_index": "rta_daily_report",
"_type": "campaign",
"_id": "164983850_rba_20170808",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": true
}
Delete data
Delete by document_id:
DELETE /rta_daily_report/campaign/164983850_rba_20170808?pretty
Delete the documents that match a query:
POST rta_daily_report/_delete_by_query
{
"query": {
"match": {
"message": "some message"
}
}
}
Query by a specific document_id
GET rta_daily_report/campaign/145603275_m_normal_20170804?pretty
output:
{
"_index": "rta_daily_report",
"_type": "campaign",
"_id": "145603275_m_normal_20170804",
"_version": 1,
"found": true,
"_source": {
"cid": 145603275,
"advertiser_id": 457,
"trace_app_id": "id1105855019",
"network_cid": "plr_gs_ios_cn_osv9",
"platform": 2,
"direct": 1,
"last_second_domain": "tracking.lenzmx.com",
"jump_type": 7,
"direct_trace_app_id": "id1105855019",
"mode": 3,
"third": "3444.tlnk.io",
"hops": 1,
"yyyymmdd": "2017-08-03T16:00:00",
"type": "m_normal",
"click": 2,
"impression": 3,
"revenue": 0,
"install": 0
}
}
Query all data
URI:
GET rta_daily_report/campaign/_search?q=*&pretty
request body:
GET rta_daily_report/campaign/_search
{
"query": {
"match_all": {}
}
}
output:
"hits": {
"total": 2705059,
"max_score": 1,
"hits": [
{
"_index": "rta_daily_report",
"_type": "campaign",
"_id": "163016610_rba_20170801",
"_score": 1,
"_source": {
"cid": 163016610,
"advertiser_id": 799,
"trace_app_id": "mappstreet.videoeditor",
"network_cid": "6287283",
"platform": 1,
"direct": 2,
"last_second_domain": "",
"jump_type": 2,
"direct_trace_app_id": "",
"mode": 0,
"third": "aff.adsbreak.com",
"hops": 8,
"yyyymmdd": "2017-07-31T16:00:00",
"type": "rba",
"click": 0
}
},
....]
}
Query a specific field and specify a sort field
Search the rta_daily_report index for type:rba and return the results sorted by date in ascending order
URI:
GET rta_daily_report/_search?q=type:rba&sort=yyyymmdd:asc&pretty
request body:
GET rta_daily_report/_search
{
"query": {
"match": {
"type": "rba"
}
},
"sort": [
{
"yyyymmdd": {
"order": "asc"
}
}
]
}
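Specify the output fields
Query documents of type rba or b2t, sort them by date in descending order, return only the specified fields, and limit the output to 5 hits. To match a phrase instead, use `"match_phrase": { "address": "mill lane" }`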
GET rta_daily_report/_search
{
"query": {
"match": {
"type": "rba b2t"
}
},
"sort": [
{
"yyyymmdd": {
"order": "desc"
}
}
],
"_source": ["yyyymmdd", "type", "cid", "click", "revenue"],
"size": 5
}
output:
"hits": {
"total": 1327184,
"max_score": null,
"hits": [
{
"_index": "rta_daily_report",
"_type": "campaign",
"_id": "54870921_b2t_20170804",
"_score": null,
"_source": {
"revenue": 76500,
"yyyymmdd": "2017-08-03T16:00:00",
"type": "b2t",
"click": 22616,
"cid": 54870921
},
"sort": [
1501776000000
]
},
Compound queries with bool
The example below returns the click and revenue data of all entries whose type is b2t and whose revenue is greater than 0
GET rta_daily_report/_search
{
"query": {
"bool": {
"must": [
{"match": {
"type": "b2t"
}}
],
"must_not": [
{
"range": {
"revenue": {
"lte": 0
}
}
}
]
}
},
"sort": [
{
"yyyymmdd": {
"order": "desc"
}
}
],
"_source": ["yyyymmdd", "type", "cid", "click", "revenue"],
"size": 10
}
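To make the `must` / `must_not` logic easier to follow, here is a simplified in-memory sketch of the same condition applied to a few made-up documents. It is only an approximation of what es does: `match` is reduced to plain equality, and the sample documents are hypothetical. One detail it does reflect is that a `range` clause never matches a document that lacks the field, so `must_not` does not exclude such documents.

```python
# Hypothetical sample documents; only the field names come from the post.
docs = [
    {"cid": 1, "type": "b2t", "revenue": 76500},
    {"cid": 2, "type": "b2t", "revenue": 0},
    {"cid": 3, "type": "rba", "revenue": 10},
    {"cid": 4, "type": "b2t"},               # no revenue field at all
]


def matches(doc: dict) -> bool:
    """must: type == b2t; must_not: revenue <= 0 (only when revenue exists)."""
    must = doc.get("type") == "b2t"
    revenue = doc.get("revenue")
    must_not = revenue is not None and revenue <= 0
    return must and not must_not


hits = [d["cid"] for d in docs if matches(d)]
print(hits)
```

If documents without a `revenue` field should be left out as well, a positive `range` clause with `"gt": 0` under `must` (or `filter`) would express that more directly.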
Aggregation queries
The example below resembles an aggregate query in SQL: it computes the total number of installs for each type on each day
GET /rta_daily_report/_search
{
"size": 0,
"aggs": {
"sum_install": {
"date_histogram": {
"field": "yyyymmdd",
"interval": "day"
},
"aggs": {
"types": {
"terms": {
"field": "type.keyword",
"size": 10
},
"aggs": {
"install": {
"sum": {
"field": "install"
}
}
}
}
}
}
}
}
output
"aggregations": {
"sum_install": {
"buckets": [
{
"key_as_string": "2017-07-31T00:00:00.000Z",
"key": 1501459200000,
"doc_count": 659553,
"types": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "rba",
"doc_count": 321811,
"install": {
"value": 73835
}
},
{
"key": "m_normal",
"doc_count": 321711,
"install": {
"value": 18964
}
},
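For readers more familiar with SQL-style grouping, the nested aggregation above behaves roughly like `GROUP BY day, type` with `SUM(install)`. The sketch below reproduces that grouping in memory on a few made-up rows. It is an analogy only: the rows are hypothetical, and the day bucket is taken from the date part of the timestamp string, which corresponds to UTC day buckets.

```python
from collections import defaultdict

# Hypothetical rows; only the field names come from the post.
rows = [
    {"yyyymmdd": "2017-07-31T16:00:00", "type": "rba", "install": 3},
    {"yyyymmdd": "2017-07-31T16:00:00", "type": "rba", "install": 2},
    {"yyyymmdd": "2017-07-31T16:00:00", "type": "m_normal", "install": 1},
    {"yyyymmdd": "2017-08-01T16:00:00", "type": "rba", "install": 4},
]


def install_by_day_and_type(rows):
    """date_histogram(day) -> terms(type) -> sum(install), done in memory."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        day = row["yyyymmdd"][:10]                            # outer bucket
        totals[day][row["type"]] += row.get("install", 0)     # inner bucket + sum
    return {day: dict(types) for day, types in totals.items()}


result = install_by_day_and_type(rows)
print(result)
```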
Script queries
The example below uses the click and install fields of each document to compute a value that is not stored in the document.
GET /rta_daily_report/campaign/_search?pretty
{
"query" : {
"bool": {
"must": [
{
"range": {
"click": {
"gt": 0
}
}
},
{
"range": {
"install": {
"gt": 0
}
}
}
]
}},
"size": 100,
"script_fields": {
"cti": {
"script": {
"lang": "painless",
"inline": "1.0 * doc['install'].value / doc['click'].value"
}
}
}
}
output
"hits": {
"total": 23036,
"max_score": 2,
"hits": [
{
"_index": "rta_daily_report",
"_type": "campaign",
"_id": "160647918_rta_20170801",
"_score": 2,
"fields": {
"cti": [
0.0005970149253731343
]
}
},
{
"_index": "rta_daily_report",
"_type": "campaign",
"_id": "162293741_rta_20170801",
"_score": 2,
"fields": {
"cti": [
0.00007796055196070789
]
}
},
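The arithmetic performed by the painless script can be sketched outside es as follows. The two `range` clauses in the query matter because they keep out documents with a zero `click` value, which would otherwise cause a division by zero. The sample documents are hypothetical.

```python
import math

# Hypothetical documents; only the field names come from the post.
docs = [
    {"_id": "a", "click": 200, "install": 1},
    {"_id": "b", "click": 0, "install": 0},     # excluded by the range clauses
    {"_id": "c", "click": 50, "install": 5},
]


def cti(doc: dict) -> float:
    """Same expression as the script: 1.0 * install / click."""
    return 1.0 * doc["install"] / doc["click"]


# Mirror the query: keep only documents with click > 0 and install > 0.
selected = [d for d in docs if d["click"] > 0 and d["install"] > 0]
fields = {d["_id"]: cti(d) for d in selected}
print(fields)
```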
Query aggregated data over a period of time
GET rta_daily_report/campaign/_search
{
"size": 0,
"aggs": {
"snaptime": {
"date_range": {
"field": "@timestamp",
"ranges": [
{
"from": "now-30d/d",
"to": "now"
}
]
},
"aggs": {
"sum_revenue": {
"sum": {
"field": "revenue"
}
}
}
}
}
}
output:
"aggregations": {
"snaptime": {
"buckets": [
{
"key": "2017-07-17T00:00:00.000Z-2017-08-16T03:30:16.995Z",
"from": 1500249600000,
"from_as_string": "2017-07-17T00:00:00.000Z",
"to": 1502854216995,
"to_as_string": "2017-08-16T03:30:16.995Z",
"doc_count": 18685619,
"sum_revenue": {
"value": 6631665219
}
}
]
}
}
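The date math expression `now-30d/d` subtracts 30 days from the current time and then rounds down to the start of that day. The sketch below reproduces the bucket boundaries shown in the output, assuming that `now` was 2017-08-16T03:30:16.995Z (the `to_as_string` value above) and that rounding happens in UTC. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis(dt: datetime) -> int:
    """Milliseconds since the Unix epoch, computed with integer arithmetic."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def now_minus_days_rounded(now: datetime, days: int) -> datetime:
    """Equivalent of `now-<days>d/d`: subtract the days, then floor to midnight."""
    shifted = now - timedelta(days=days)
    return shifted.replace(hour=0, minute=0, second=0, microsecond=0)


# Assumed value of `now`, taken from the output above.
now = datetime(2017, 8, 16, 3, 30, 16, 995000, tzinfo=timezone.utc)
start = now_minus_days_rounded(now, 30)

print(start.isoformat(), epoch_millis(start))
print(now.isoformat(), epoch_millis(now))
```

The two millisecond values correspond to the `from` and `to` keys of the bucket in the output above.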
Query aggregated data over a period of time and compute an extra field with a script
GET rta_daily_report/campaign/_search
{
"size": 0,
"aggs" : {
"cvr_per_month" : {
"date_range" : {
"field": "@timestamp",
"ranges": [
{
"from": "now-30d/d",
"to": "now"
}
]
},
"aggs": {
"sum_click": {
"sum": {
"field": "click"
}
},
"sum_install": {
"sum": {
"field": "install"
}
},
"cvr": {
"bucket_script": {
"buckets_path": {
"install": "sum_install",
"click": "sum_click"
},
"script": "1.0 * params.install / params.click"
}
}
}
}
}
}
output:
"aggregations": {
"cvr_per_month": {
"buckets": [
{
"key": "2017-07-17T00:00:00.000Z-2017-08-16T03:37:22.732Z",
"from": 1500249600000,
"from_as_string": "2017-07-17T00:00:00.000Z",
"to": 1502854642732,
"to_as_string": "2017-08-16T03:37:22.732Z",
"doc_count": 18685619,
"sum_click": {
"value": 15067388421
},
"sum_install": {
"value": 7602055
},
"cvr": {
"value": 0.0005045370032012133
}
}
]
}
}
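The `bucket_script` expression can be checked by hand against the output above. The sketch below applies the same formula to the `sum_install` and `sum_click` values reported for the bucket; the function name is hypothetical.

```python
# Values copied from the aggregation output above.
sum_click = 15067388421
sum_install = 7602055


def bucket_script_cvr(install: float, click: float) -> float:
    """Same expression as the script: 1.0 * params.install / params.click."""
    return 1.0 * install / click


cvr = bucket_script_cvr(sum_install, sum_click)
print(cvr)
```

Because `bucket_script` works on the already aggregated sums, the ratio is computed once per bucket rather than once per document.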
Reference links:
Date format
Query syntax 1
Query syntax 2
kibana
logstash
TODO: