Zipkin Getting Started Tutorial
Introduction to Zipkin
Zipkin is an open-source distributed tracing system that works on real-time data. Its design is based on the Google Dapper paper, and it was developed and contributed by Twitter. Its main job is to aggregate real-time monitoring data from heterogeneous systems.
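Zipkin Architecture
Zipkin consists of two parts. The Zipkin server collects, stores, analyzes, and displays the data. The Zipkin clients are a family of client tools that wrap Zipkin for different languages and frameworks; they generate trace data and report it to the server. The architecture is as follows:
The instrumented client and the instrumented server are services in the distributed system. Each collects trace information through an instrumentation library, which then calls a Transport to send the trace information to the Zipkin collectors. The collectors save the trace data to storage.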
The Zipkin server has four main modules:
(1) Collector: receives or collects the data sent by each application.
(2) Storage: stores the received or collected data. In-memory, MySQL, Cassandra, and Elasticsearch are currently supported, among others; the default is in-memory storage.
(3) API (Query): queries the data held in Storage and exposes a simple JSON API for retrieving it, mainly for use by the web UI.
(4) Web: provides a simple web interface.
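The division of labor among the first three modules can be sketched as a toy in-memory pipeline. This is only an illustration: the class and method names (`InMemoryStorage`, `Collector.accept`, `QueryApi.get_trace`) are hypothetical and are not Zipkin's actual code or API.

```python
from collections import defaultdict


class InMemoryStorage:
    """Toy stand-in for the Storage module: spans grouped by trace ID."""

    def __init__(self):
        self._spans_by_trace = defaultdict(list)

    def save(self, span):
        self._spans_by_trace[span["traceId"]].append(span)

    def spans_for(self, trace_id):
        return list(self._spans_by_trace.get(trace_id, []))


class Collector:
    """Toy stand-in for the Collector module: accepts spans and stores them."""

    def __init__(self, storage):
        self._storage = storage

    def accept(self, spans):
        accepted = 0
        for span in spans:
            # Skip records that lack the IDs needed to place them in a trace.
            if "traceId" in span and "id" in span:
                self._storage.save(span)
                accepted += 1
        return accepted


class QueryApi:
    """Toy stand-in for the API (Query) module: reads back what was stored."""

    def __init__(self, storage):
        self._storage = storage

    def get_trace(self, trace_id):
        spans = self._storage.spans_for(trace_id)
        return sorted(spans, key=lambda s: s.get("timestamp", 0))


storage = InMemoryStorage()
collector = Collector(storage)
collector.accept([
    {"traceId": "aa", "id": "6b", "name": "get", "timestamp": 20},
    {"traceId": "aa", "id": "aa", "name": "get", "timestamp": 10},
])
trace = QueryApi(storage).get_trace("aa")
```

Keeping storage behind a small interface is what lets the real server swap the in-memory default for MySQL, Cassandra, or Elasticsearch without touching the collector or the query side.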
The tracing flow for a request is as follows:
┌─────────────┐ ┌───────────────────────┐  ┌─────────────┐  ┌──────────────────┐
│ User Code   │ │ Trace Instrumentation │  │ Http Client │  │ Zipkin Collector │
└─────────────┘ └───────────────────────┘  └─────────────┘  └──────────────────┘
       │                 │                         │                 │
           ┌─────────┐
       │ ──┤GET /foo ├─▶ │ ────┐                   │                 │
           └─────────┘         │ record tags
       │                 │ ◀───┘                   │                 │
                           ────┐
       │                 │     │ add trace headers │                 │
                           ◀───┘
       │                 │ ────┐                   │                 │
                               │ record timestamp
       │                 │ ◀───┘                   │                 │
                             ┌─────────────────┐
       │                 │ ──┤GET /foo         ├─▶ │                 │
                             │X-B3-TraceId: aa │     ────┐
       │                 │   │X-B3-SpanId: 6b  │   │     │           │
                             └─────────────────┘         │ invoke
       │                 │                         │     │ request   │
                                                         │
       │                 │                         │     │           │
                                 ┌────────┐          ◀───┘
       │                 │ ◀─────┤200 OK  ├─────── │                 │
                           ────┐ └────────┘
       │                 │     │ record duration   │                 │
           ┌────────┐      ◀───┘
       │ ◀─┤200 OK  ├─── │                         │                 │
           └────────┘        ┌────────────────────────────────┐
       │                 │ ──┤ asynchronously report span     ├────▶ │
                             │                                │
                             │{                               │
                             │  "traceId": "aa",              │
                             │  "id": "6b",                   │
                             │  "name": "get",                │
                             │  "timestamp": 1483945573944000,│
                             │  "duration": 386000,           │
                             │  "annotations": [              │
                             │--snip--                        │
                             └────────────────────────────────┘
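The steps in the diagram can be sketched as one wrapper around an outbound request. This is a minimal sketch under stated assumptions, not Brave's implementation: `traced_get`, `send`, and `report` are hypothetical names, the `tags` key is illustrative, and the IDs default to the `aa` and `6b` values used in the diagram.

```python
import time


def traced_get(path, send, report, trace_id="aa", span_id="6b"):
    """Wrap one outbound GET the way the diagram does (hypothetical helper)."""
    span = {"traceId": trace_id, "id": span_id, "name": "get", "tags": {}}

    # 1. Record tags describing the request.
    span["tags"]["http.path"] = path

    # 2. Add trace headers so the next service can join the same trace.
    headers = {"X-B3-TraceId": trace_id, "X-B3-SpanId": span_id}

    # 3. Record the start timestamp in epoch microseconds.
    span["timestamp"] = int(time.time() * 1_000_000)
    started = time.perf_counter()
    try:
        # 4. Invoke the real request; `send` is supplied by the caller.
        return send(path, headers)
    finally:
        # 5. Record the duration in microseconds, then hand the span off.
        elapsed = int((time.perf_counter() - started) * 1_000_000)
        span["duration"] = max(elapsed, 1)
        report(span)
```

To mirror the "asynchronously report span" step, `report` could be the `put` method of a `queue.Queue` that a background thread drains and ships to the collector, so that reporting stays off the request path.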
The instrumented client and server are both services that use a Zipkin client. Following its configuration, the Zipkin client sends trace data to the Zipkin server, where it is stored, analyzed, and displayed.
Key Zipkin Concepts
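traceId: a 16-character hexadecimal string that identifies a trace; it stays the same for the whole trace. (Zipkin also accepts 32-character, 128-bit trace IDs.)
spanId: the ID of a span. A single trace can contain many span IDs; each one identifies a unit of work inside a service, and it is also a 16-character hexadecimal string.
parentId: on a cross-service call, the caller passes its spanId to the callee. The callee takes the caller's spanId as its own parentId and then generates a new spanId for itself.
When a call is first initiated, the traceId and spanId are identical and there is no parentId. The callee's traceId is the same as the caller's traceId, the callee generates its own spanId, and the callee's parentId is the caller's spanId.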
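The ID rules above can be sketched in a few lines. This is an illustration only: `new_id`, `start_trace`, and `child_of` are hypothetical helper names, and real instrumentation libraries have their own ID generators.

```python
import secrets


def new_id():
    """Return a random 64-bit ID as 16 lowercase hexadecimal characters."""
    return secrets.token_hex(8)


def start_trace():
    """Root of a trace: traceId and spanId are identical, no parentId."""
    root_id = new_id()
    return {"traceId": root_id, "id": root_id, "parentId": None}


def child_of(parent):
    """Callee side: keep the traceId, use the caller's spanId as parentId."""
    return {
        "traceId": parent["traceId"],
        "id": new_id(),
        "parentId": parent["id"],
    }


root = start_trace()
child = child_of(root)
grandchild = child_of(child)
```

Every span in the chain shares `root["traceId"]`, while the `parentId` links are what let the server rebuild the call tree later.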
The Span model follows the Span model in Dapper almost exactly. A Zipkin span has three main parts: basic data (traceId, spanId, parentId, name, timestamp, and duration, used mainly to link the nodes of the trace tree and to display them in the UI), Annotations (which record information about specific events during a request), and BinaryAnnotations (which carry additional information, usually as key-value pairs).
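The three parts of a span can be written down as plain data classes. This is a sketch that mirrors the field names of the JSON shown later in this tutorial; the class names themselves are hypothetical and are not taken from any Zipkin library.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class Endpoint:
    serviceName: str
    ipv4: Optional[str] = None


@dataclass
class Annotation:
    """A timestamped event within the request, such as "sr" or "ss"."""
    timestamp: int
    value: str
    endpoint: Endpoint


@dataclass
class BinaryAnnotation:
    """Extra key-value information, such as http.path."""
    key: str
    value: Any
    endpoint: Endpoint


@dataclass
class Span:
    # Basic data: links the span into the trace tree and drives the UI.
    traceId: str
    id: str
    name: str
    timestamp: int
    duration: int
    parentId: Optional[str] = None
    annotations: list = field(default_factory=list)
    binaryAnnotations: list = field(default_factory=list)


service = Endpoint("brave-webmvc-example", "192.168.1.101")
root_span = Span(
    traceId="a4aa11d855699355",
    id="a4aa11d855699355",
    name="get /start",
    timestamp=1526110753393795,
    duration=3873359,
    annotations=[Annotation(1526110753393795, "sr", service)],
    binaryAnnotations=[BinaryAnnotation("http.path", "/start", service)],
)
as_json_ready = asdict(root_span)
```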
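The instrumentation library is the key to data collection. Different languages and RPC frameworks have different instrumentation library implementations; the list of existing ones is available from the Zipkin open-source community, and Brave is the Java instrumentation library provided by the Zipkin project. An instrumentation library implementation needs to take the following into account:
• The library must be written in the same language as the service it instruments.
• It must record the core data structures Zipkin needs, including generating the traceId and spanId, computing latency, and recording events and tags.
• Passing trace information between services is called injection. Different RPC interfaces inject it in different ways; HTTP interfaces, for example, use the B3 protocol.
• The injected information includes: Trace Id, Span Id, Parent Id, Sampled, and Flags.
• It should support a configurable sampling rate to reduce the load that tracing puts on the system.
• It can call a Transport to send trace information to Zipkin.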
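For HTTP, the injected fields listed above travel as `X-B3-*` request headers. The following is a minimal sketch of injection, extraction, and rate-based sampling; the function names and the shape of the context dictionary are assumptions made for illustration, not Brave's API.

```python
import random


def inject(ctx, headers):
    """Caller side: copy the trace context into outgoing HTTP headers."""
    headers["X-B3-TraceId"] = ctx["traceId"]
    headers["X-B3-SpanId"] = ctx["spanId"]
    if ctx.get("parentId"):
        headers["X-B3-ParentSpanId"] = ctx["parentId"]
    if ctx.get("sampled") is not None:
        headers["X-B3-Sampled"] = "1" if ctx["sampled"] else "0"
    if ctx.get("debug"):
        headers["X-B3-Flags"] = "1"
    return headers


def extract(headers):
    """Callee side: rebuild the trace context from incoming headers."""
    lowered = {key.lower(): value for key, value in headers.items()}
    trace_id = lowered.get("x-b3-traceid")
    span_id = lowered.get("x-b3-spanid")
    if not trace_id or not span_id:
        return None  # No usable context: the callee would start a new trace.
    sampled = lowered.get("x-b3-sampled")
    return {
        "traceId": trace_id,
        "spanId": span_id,
        "parentId": lowered.get("x-b3-parentspanid"),
        "sampled": None if sampled is None else sampled == "1",
        "debug": lowered.get("x-b3-flags") == "1",
    }


def should_sample(rate, rng=random.random):
    """Keep roughly `rate` (0.0 to 1.0) of new traces to limit overhead."""
    return rng() < rate


context = {
    "traceId": "a4aa11d855699355",
    "spanId": "cf49951d471ac7c5",
    "parentId": "a4aa11d855699355",
    "sampled": should_sample(1.0),
    "debug": False,
}
outgoing = inject(context, {})
incoming = extract(outgoing)
```

The sampling decision is made once, at the start of the trace, and then carried downstream in `X-B3-Sampled` so that every service either records the whole trace or none of it.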
Building on the instrumentation libraries it provides, Zipkin supports two basic features: call latency analysis and service dependency analysis.
Call Chain Analysis Example
Start four services with the following call relationships: brave-webmvc-example calls brave-webmvc-example2, which in turn calls brave-webmvc-example3 and brave-webmvc-example4. Zipkin records the following trace:
[
{
"traceId": "a4aa11d855699355",
"id": "a4aa11d855699355",
"name": "get /start",
"timestamp": 1526110753393795,
"duration": 3873359,
"annotations": [
{
"timestamp": 1526110753393795,
"value": "sr",
"endpoint": {
"serviceName": "brave-webmvc-example",
"ipv4": "192.168.1.101"
}
},
{
"timestamp": 1526110757267154,
"value": "ss",
"endpoint": {
"serviceName": "brave-webmvc-example",
"ipv4": "192.168.1.101"
}
}
],
"binaryAnnotations": [
{
"key": "ca",
"value": true,
"endpoint": {
"serviceName": "",
"ipv6": "::1",
"port": 64570
}
},
{
"key": "http.method",
"value": "GET",
"endpoint": {
"serviceName": "brave-webmvc-example",
"ipv4": "192.168.1.101"
}
},
{
"key": "http.path",
"value": "/start",
"endpoint": {
"serviceName": "brave-webmvc-example",
"ipv4": "192.168.1.101"
}
},
{
"key": "mvc.controller.class",
"value": "HomeController",
"endpoint": {
"serviceName": "brave-webmvc-example",
"ipv4": "192.168.1.101"
}
},
{
"key": "mvc.controller.method",
"value": "start",
"endpoint": {
"serviceName": "brave-webmvc-example",
"ipv4": "192.168.1.101"
}
}
]
},
{
"traceId": "a4aa11d855699355",
"id": "cf49951d471ac7c5",
"name": "get /foo",
"parentId": "a4aa11d855699355",
"timestamp": 1526110753583404,
"duration": 3650640,
"annotations": [
{
"timestamp": 1526110753583404,
"value": "cs",
"endpoint": {
"serviceName": "brave-webmvc-example",
"ipv4": "192.168.1.101"
}
},
{
"timestamp": 1526110754327066,
"value": "sr",
"endpoint": {
"serviceName": "brave-webmvc-example2",
"ipv4": "192.168.1.101"
}
},
{
"timestamp": 1526110757234044,
"value": "cr",
"endpoint": {
"serviceName": "brave-webmvc-example",
"ipv4": "192.168.1.101"
}
},
{
"timestamp": 1526110757235819,
"value": "ss",
"endpoint": {
"serviceName": "brave-webmvc-example2",
"ipv4": "192.168.1.101"
}
}
],
"binaryAnnotations": [
{
"key": "ca",
"value": true,
"endpoint": {
"serviceName": "",
"ipv4": "127.0.0.1",
"port": 64578
}
},
{
"key": "http.method",
"value": "GET",
"endpoint": {
"serviceName": "brave-webmvc-example",
"ipv4": "192.168.1.101"
}
},
{
"key": "http.method",
"value": "GET",
"endpoint": {
"serviceName": "brave-webmvc-example2",
"ipv4": "192.168.1.101"
}
},
{
"key": "http.path",
"value": "/foo",
"endpoint": {
"serviceName": "brave-webmvc-example",
"ipv4": "192.168.1.101"
}
},
{
"key": "http.path",
"value": "/foo",
"endpoint": {
"serviceName": "brave-webmvc-example2",
"ipv4": "192.168.1.101"
}
},
{
"key": "mvc.controller.class",
"value": "HomeController",
"endpoint": {
"serviceName": "brave-webmvc-example2",
"ipv4": "192.168.1.101"
}
},
{
"key": "mvc.controller.method",
"value": "foo",
"endpoint": {
"serviceName": "brave-webmvc-example2",
"ipv4": "192.168.1.101"
}
}
]
},
{
"traceId": "a4aa11d855699355",
"id": "c2c029d693ecc49b",
"name": "get /bar",
"parentId": "cf49951d471ac7c5",
"timestamp": 1526110754397322,
"duration": 1583187,
"annotations": [
{
"timestamp": 1526110754397322,
"value": "cs",
"endpoint": {
"serviceName": "brave-webmvc-example2",
"ipv4": "192.168.1.101"
}
},
{
"timestamp": 1526110755367168,
"value": "sr",
"endpoint": {
"serviceName": "brave-webmvc-example3",
"ipv4": "192.168.1.101"
}
},
{
"timestamp": 1526110755810759,
"value": "ss",
"endpoint": {
"serviceName": "brave-webmvc-example3",
"ipv4": "192.168.1.101"
}
},
{
"timestamp": 1526110755980509,
"value": "cr",
"endpoint": {
"serviceName": "brave-webmvc-example2",
"ipv4": "192.168.1.101"
}
}
],
"binaryAnnotations": [
{
"key": "ca",
"value": true,
"endpoint": {
"serviceName": "",
"ipv4": "127.0.0.1",
"port": 64583
}
},
{
"key": "http.method",
"value": "GET",
"endpoint": {
"serviceName": "brave-webmvc-example2",
"ipv4": "192.168.1.101"
}
},
{
"key": "http.method",
"value": "GET",
"endpoint": {
"serviceName": "brave-webmvc-example3",
"ipv4": "192.168.1.101"
}
},
{
"key": "http.path",
"value": "/bar",
"endpoint": {
"serviceName": "brave-webmvc-example2",
"ipv4": "192.168.1.101"
}
},
{
"key": "http.path",
"value": "/bar",
"endpoint": {
"serviceName": "brave-webmvc-example3",
"ipv4": "192.168.1.101"
}
},
{
"key": "mvc.controller.class",
"value": "HomeController",
"endpoint": {
"serviceName": "brave-webmvc-example3",
"ipv4": "192.168.1.101"
}
},
{
"key": "mvc.controller.method",
"value": "bar",
"endpoint": {
"serviceName": "brave-webmvc-example3",
"ipv4": "192.168.1.101"
}
}
]
},
{
"traceId": "a4aa11d855699355",
"id": "e3968cec8747ce95",
"name": "get /tar",
"parentId": "cf49951d471ac7c5",
"timestamp": 1526110756017988,
"duration": 1194871,
"annotations": [
{
"timestamp": 1526110756017988,
"value": "cs",
"endpoint": {
"serviceName": "brave-webmvc-example2",
"ipv4": "192.168.1.101"
}
},
{
"timestamp": 1526110757081683,
"value": "sr",
"endpoint": {
"serviceName": "brave-webmvc-example4",
"ipv4": "192.168.1.101"
}
},
{
"timestamp": 1526110757212859,
"value": "cr",
"endpoint": {
"serviceName": "brave-webmvc-example2",
"ipv4": "192.168.1.101"
}
},
{
"timestamp": 1526110757222145,
"value": "ss",
"endpoint": {
"serviceName": "brave-webmvc-example4",
"ipv4": "192.168.1.101"
}
}
],
"binaryAnnotations": [
{
"key": "ca",
"value": true,
"endpoint": {
"serviceName": "",
"ipv4": "127.0.0.1",
"port": 64584
}
},
{
"key": "http.method",
"value": "GET",
"endpoint": {
"serviceName": "brave-webmvc-example4",
"ipv4": "192.168.1.101"
}
},
{
"key": "http.method",
"value": "GET",
"endpoint": {
"serviceName": "brave-webmvc-example2",
"ipv4": "192.168.1.101"
}
},
{
"key": "http.path",
"value": "/tar",
"endpoint": {
"serviceName": "brave-webmvc-example4",
"ipv4": "192.168.1.101"
}
},
{
"key": "http.path",
"value": "/tar",
"endpoint": {
"serviceName": "brave-webmvc-example2",
"ipv4": "192.168.1.101"
}
},
{
"key": "mvc.controller.class",
"value": "HomeController",
"endpoint": {
"serviceName": "brave-webmvc-example4",
"ipv4": "192.168.1.101"
}
},
{
"key": "mvc.controller.method",
"value": "tar",
"endpoint": {
"serviceName": "brave-webmvc-example4",
"ipv4": "192.168.1.101"
}
}
]
}
]
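In this trace, the annotation values mark the four core events of a call: `cs` (client send), `sr` (server receive), `ss` (server send), and `cr` (client receive). The sketch below shows how the two analyses mentioned earlier, latency and service dependencies, could be derived from them. The timestamps and service names are copied from the trace above; the helper names and the compact tuple layout are assumptions made for illustration.

```python
S = "brave-webmvc-example"

# (value, timestamp in microseconds, serviceName) for each span's annotations.
SPANS = [
    {"name": "get /start", "annotations": [
        ("sr", 1526110753393795, S),
        ("ss", 1526110757267154, S)]},
    {"name": "get /foo", "annotations": [
        ("cs", 1526110753583404, S),
        ("sr", 1526110754327066, S + "2"),
        ("cr", 1526110757234044, S),
        ("ss", 1526110757235819, S + "2")]},
    {"name": "get /bar", "annotations": [
        ("cs", 1526110754397322, S + "2"),
        ("sr", 1526110755367168, S + "3"),
        ("ss", 1526110755810759, S + "3"),
        ("cr", 1526110755980509, S + "2")]},
    {"name": "get /tar", "annotations": [
        ("cs", 1526110756017988, S + "2"),
        ("sr", 1526110757081683, S + "4"),
        ("cr", 1526110757212859, S + "2"),
        ("ss", 1526110757222145, S + "4")]},
]


def summarize(span):
    """Return who called whom and how long the server side took."""
    events = {value: (ts, service) for value, ts, service in span["annotations"]}
    return {
        "name": span["name"],
        "client": events["cs"][1] if "cs" in events else None,
        "server": events["sr"][1],
        "server_us": events["ss"][0] - events["sr"][0],
    }


def dependency_links(spans):
    """Collect distinct (caller, callee) service pairs from client spans."""
    links = set()
    for span in spans:
        info = summarize(span)
        if info["client"] and info["client"] != info["server"]:
            links.add((info["client"], info["server"]))
    return sorted(links)


for span in SPANS:
    info = summarize(span)
    print(info["name"], info["server"], info["server_us"], "us")
print(dependency_links(SPANS))
```

For the root span, `ss` minus `sr` equals 3,873,359 µs, which matches its recorded `duration`. The three links that come out are exactly the call relationships described at the start of this example.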
Summary
The analysis above leads to the following conclusions:
First, in principle: each time a business call occurs, a globally unique TraceId is generated at the originating request and passed along over the network through every step of the call. Each step records its information in a Span log, forming a context record. The TraceId is then used to link together the "isolated" context records scattered across the distributed system and to reconstruct the call chain, which can be analyzed according to business needs.
Second, on instrumentation and data generation: the practices above all use an annotation-based approach, whose drawback is that it requires code to be injected. Ubiquitous instrumentation points mean ubiquitous code injection, and keeping this transparent to the business systems is a major challenge. Most internet companies benefit here from a highly unified architecture: the code only has to be injected into the middleware, which largely solves the problem of instrumentation being non-transparent to business components.
Third, log collection and storage involve trade-offs between data timeliness and system performance; choose the storage option that best fits the business requirements.
Fourth, on the trace data specification and visualization: the specification defines the data structure of the trace data, and that definition ensures the generated logs can ultimately be analyzed and visualized by the tracing system.
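The first point, reassembling scattered records by TraceId, can be sketched as follows. The span IDs, names, and timestamps are taken from the example trace in this tutorial; the record with trace ID `ffffffffffffffff` is made up to stand in for an unrelated request, and the function names are hypothetical.

```python
from collections import defaultdict

# Span records as they might arrive from different services, out of order.
RECORDS = [
    {"traceId": "a4aa11d855699355", "id": "e3968cec8747ce95",
     "parentId": "cf49951d471ac7c5", "name": "get /tar",
     "timestamp": 1526110756017988},
    {"traceId": "ffffffffffffffff", "id": "ffffffffffffffff",
     "parentId": None, "name": "get /other", "timestamp": 1},
    {"traceId": "a4aa11d855699355", "id": "cf49951d471ac7c5",
     "parentId": "a4aa11d855699355", "name": "get /foo",
     "timestamp": 1526110753583404},
    {"traceId": "a4aa11d855699355", "id": "a4aa11d855699355",
     "parentId": None, "name": "get /start",
     "timestamp": 1526110753393795},
    {"traceId": "a4aa11d855699355", "id": "c2c029d693ecc49b",
     "parentId": "cf49951d471ac7c5", "name": "get /bar",
     "timestamp": 1526110754397322},
]


def group_by_trace(records):
    """Link isolated records together using their shared traceId."""
    traces = defaultdict(list)
    for record in records:
        traces[record["traceId"]].append(record)
    return dict(traces)


def render_tree(spans):
    """Rebuild the call chain from parentId links, one indented line per span."""
    children = defaultdict(list)
    for span in spans:
        children[span.get("parentId")].append(span)
    lines = []

    def walk(parent_id, depth):
        siblings = sorted(children.get(parent_id, []),
                          key=lambda s: s["timestamp"])
        for span in siblings:
            lines.append("  " * depth + span["name"])
            walk(span["id"], depth + 1)

    walk(None, 0)
    return lines


traces = group_by_trace(RECORDS)
print("\n".join(render_tree(traces["a4aa11d855699355"])))
```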