APM Full-Link Monitoring: A First Look at SkyWalking (1)

I. Concepts and Design Overview

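SkyWalking is an open-source observability platform that collects, analyzes, aggregates, and visualizes data from services and cloud-native infrastructure. It offers a straightforward way to keep a clear view of a distributed system, even one that spans multiple clouds. SkyWalking is also a modern Application Performance Monitoring (APM) system, designed especially for cloud-native, container-based distributed systems.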

II. Basic Definitions

1. Basic Architecture

Agent

Collects trace information from the application and sends it to the SkyWalking OAP server. Tracing data from SkyWalking, Zipkin, Jaeger, and others is currently supported. Our setup uses the SkyWalking Agent to collect SkyWalking tracing data and pass it to the server.

SkyWalking OAP

Receives the tracing data sent by the Agent, analyzes it (Analysis Core), stores it in external storage (Storage), and finally exposes it for querying (Query).

Storage

Stores the tracing data. ES (Elasticsearch), MySQL, ShardingSphere, TiDB, and H2 are currently supported. Our setup uses MySQL.

SkyWalking UI

Provides the console for viewing traces and more.

2. Common Terms and Concepts

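Service: a set or group of workloads that provide the same behavior for incoming requests (that is, they share one application name).

Service Instance: each individual workload in that group is called an instance. As with pods in Kubernetes, a service instance is not necessarily a single operating-system process. When you use an Agent, however, a service instance is in fact one real operating-system process.

Here, the service instance of the Spring Boot application is named {agent_name}-pid:{pid}@{hostname}, which the Agent generates automatically. It is explained further in the "5.1 hostname" subsection.

Endpoint: the request path received by a specific service, such as an HTTP URI path or a gRPC service's class name plus method signature.

Here, one endpoint of the Spring Boot application is the API /demo/echo.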
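
As an illustration of the instance naming pattern mentioned above, the following Python sketch builds a name of the form {agent_name}-pid:{pid}@{hostname}. The function name and the sample agent name are hypothetical; this is not the Agent's actual code.

```python
import os
import socket


def instance_name(agent_name: str, pid: int, hostname: str) -> str:
    """Format a name following the {agent_name}-pid:{pid}@{hostname} pattern."""
    return f"{agent_name}-pid:{pid}@{hostname}"


# Hypothetical agent name; pid and hostname come from the current process.
print(instance_name("demo-application", os.getpid(), socket.gethostname()))
```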

III. UI Views and MySQL Storage

OAL (Observability Analysis Language) Syntax and Examples

Syntax

// Declare a metric
METRICS_NAME = from(SCOPE.(* | [FIELD][,FIELD ...])) // read data from a SCOPE
[.filter(FIELD OP [INT | STRING])] // filter out part of the data
.FUNCTION([PARAM][, PARAM ...]) // aggregate the data with an aggregation function

// Disable a metric
disable(METRICS_NAME);

Example:

// Read the used field of ServiceInstanceJVMMemory, keep only data whose heapStatus is true, and take the average as a long

instance_jvm_memory_heap = from(ServiceInstanceJVMMemory.used).filter(heapStatus == true).longAvg();
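
To make the example concrete, here is a rough Python sketch of what this OAL expression computes: keep only the samples whose heapStatus is true, then take the integer average of used. The JvmMemorySample class and the sample values are made up for illustration, and the integer division is an assumption about how longAvg rounds.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class JvmMemorySample:
    used: int          # bytes in use (illustrative field)
    heap_status: bool  # True for heap memory, False for non-heap


def long_avg(values: Iterable[int]) -> int:
    """Integer average; returns 0 for an empty input."""
    items = list(values)
    return sum(items) // len(items) if items else 0


def instance_jvm_memory_heap(samples: Iterable[JvmMemorySample]) -> int:
    """filter(heapStatus == true) followed by longAvg() over `used`."""
    return long_avg(s.used for s in samples if s.heap_status)


samples = [
    JvmMemorySample(used=100, heap_status=True),
    JvmMemorySample(used=300, heap_status=True),
    JvmMemorySample(used=999, heap_status=False),  # dropped by the filter
]
print(instance_jvm_memory_heap(samples))  # 200
```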

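Common Terms

CPM: throughput, the number of calls per minute.

Apdex: a score; see the Apdex article on Wikipedia.

percentile: response-time percentiles, including p99, p95, p90, p75, and p50; see the percentile article on Wikipedia.

SLA: the success rate. For HTTP, this means requests whose response status is 200.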
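
A minimal Python sketch of two of these terms, under simplifying assumptions: CPM as calls divided by elapsed minutes, and SLA as the fraction of successful requests. The function names are illustrative only, and SkyWalking's internal storage format for these values is not modeled.

```python
from typing import Iterable


def cpm(call_count: int, minutes: float) -> float:
    """Calls per minute over an observation window."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return call_count / minutes


def sla(statuses: Iterable[bool]) -> float:
    """Fraction of requests that succeeded (True = success)."""
    flags = list(statuses)
    if not flags:
        return 0.0
    return sum(1 for ok in flags if ok) / len(flags)


print(cpm(120, 2))                      # 60.0
print(sla([True, True, True, False]))   # 0.75
```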

Common Table Definitions

The following tables record service, instance, and endpoint information:

service_traffic

instance_traffic

endpoint_traffic

1. Dashboard-APM-Global

Service Load (CPM/PPM): requests per minute for each service. Metric: service_cpm

| Table | OAL definition | Notes |
| --- | --- | --- |
| service_cpm | service_cpm = from(Service.*).cpm() | Display mode: get sorted top N values |
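
The "get sorted top N values" display mode amounts to ranking entities by the metric and keeping the first N. A small Python sketch, with invented service names and values:

```python
from typing import Dict, List, Tuple


def top_n(metric_by_service: Dict[str, int], n: int,
          descending: bool = True) -> List[Tuple[str, int]]:
    """Return the n (service, value) pairs with the largest (or smallest) values."""
    ranked = sorted(metric_by_service.items(),
                    key=lambda item: item[1],
                    reverse=descending)
    return ranked[:n]


# Hypothetical per-service CPM values.
service_cpm = {"order": 120, "user": 300, "cart": 45}
print(top_n(service_cpm, 2))  # [('user', 300), ('order', 120)]
```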

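latency: response delay

Slow Services: services with slow responses, in ms. Metric: service_resp_time

| Table | OAL definition | Notes |
| --- | --- | --- |
| service_resp_time | service_resp_time = from(Service.latency).longAvg() | Average latency taken from the Service scope |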

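Apdex: service health score

Un-Health Services (Apdex): the Apdex performance score, where 1 is a perfect score. Metric: service_apdex

| Table | OAL definition | Notes |
| --- | --- | --- |
| service_apdex | service_apdex = from(Service.latency).apdex(name, status) | Display mode: get sorted top N values (visible in the edit view) |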
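
For reference, the standard Apdex formula counts requests at or under a threshold T as satisfied, requests between T and 4T as tolerating (worth half), and the rest as frustrated. The Python sketch below implements that textbook formula; the threshold value is an assumption, and how SkyWalking's apdex(name, status) uses its two arguments is not modeled here.

```python
from typing import Sequence


def apdex(latencies_ms: Sequence[int], threshold_ms: int) -> float:
    """Standard Apdex: (satisfied + tolerating / 2) / total."""
    total = len(latencies_ms)
    if total == 0:
        raise ValueError("at least one sample is required")
    satisfied = sum(1 for v in latencies_ms if v <= threshold_ms)
    tolerating = sum(1 for v in latencies_ms
                     if threshold_ms < v <= 4 * threshold_ms)
    return (satisfied + tolerating / 2) / total


# Assumed threshold T = 500 ms: two satisfied, two tolerating, one frustrated.
print(apdex([100, 200, 600, 1500, 3000], threshold_ms=500))  # 0.6
```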

Slow Endpoints: endpoints with slow responses. Metric: endpoint_avg

| Table | OAL definition |
| --- | --- |
| endpoint_avg | endpoint_avg = from(Endpoint.latency).longAvg() |

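percentile: the latency value below which a given percentage of requests fall

Global Response Latency: response latency by percentile, that is, the latency at each percentile, in ms. Metric: all_percentile

| Table | OAL definition | Notes |
| --- | --- | --- |
| all_percentile | all_percentile = from(All.latency).percentile(10) // Multiple values including p50, p75, p90, p95, p99 | Percentiles of the latency data |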
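
The p50 through p99 values can be understood with a nearest-rank percentile over raw latencies. This Python sketch only illustrates the concept; SkyWalking's own implementation may differ (for example by bucketing latencies), and the meaning of the argument in percentile(10) is not modeled.

```python
from typing import Dict, Sequence


def percentile(latencies_ms: Sequence[int], rank: int) -> int:
    """Nearest-rank percentile for an integer rank in 1..100."""
    if not latencies_ms:
        raise ValueError("at least one sample is required")
    ordered = sorted(latencies_ms)
    # ceil(rank * n / 100) using integer arithmetic to avoid float rounding.
    position = (rank * len(ordered) + 99) // 100
    return ordered[max(position, 1) - 1]


def latency_percentiles(latencies_ms: Sequence[int]) -> Dict[str, int]:
    """The five ranks listed in the OAL comment."""
    return {f"p{r}": percentile(latencies_ms, r) for r in (50, 75, 90, 95, 99)}


# Invented data: 100 samples at 10, 20, ..., 1000 ms.
latencies = list(range(10, 1010, 10))
print(latency_percentiles(latencies))
```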

Global Heatmap: heatmap of the service response-time distribution

| Table | OAL definition | Notes |
| --- | --- | --- |
| all_heatmap | all_heatmap = from(All.latency).histogram(100, 20); | |
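
Reading histogram(100, 20) as "buckets 100 ms wide, 20 of them" (an interpretation, not something this article states), the bucketing could be sketched in Python as below. Sending anything beyond the last boundary to a single overflow bucket is likewise an assumption.

```python
from typing import List, Sequence


def histogram(latencies_ms: Sequence[int], step: int = 100,
              max_steps: int = 20) -> List[int]:
    """Count latencies per bucket of width `step`.

    Index i covers [i * step, (i + 1) * step); the final index
    (max_steps) collects every value at or above step * max_steps.
    """
    counts = [0] * (max_steps + 1)
    for latency in latencies_ms:
        index = min(latency // step, max_steps)
        counts[index] += 1
    return counts


counts = histogram([50, 150, 199, 2500])
print(counts[0], counts[1], counts[20])  # 1 2 1
```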

2. Dashboard-APM-Service

duration: time span

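Service Apdex: service health score (1 is a perfect score). Metric: service_apdex. Two charts are shown here, each with a different display mode: one shows the single value for the duration, the other shows all values in the duration.

| Table | OAL definition | Notes |
| --- | --- | --- |
| service_apdex | service_apdex = from(Service.latency).apdex(name, status) | See the Global dashboard. Display mode: read the single value in the duration (read all values in the duration); visible in the edit view |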

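Service Avg Response Time: average response latency. Metric: service_resp_time, described under the Global dashboard; here it shows a single service's response times over the duration.

| Table | OAL definition | Notes |
| --- | --- | --- |
| service_resp_time | service_resp_time = from(Service.latency).longAvg() | Average latency taken from the Service scope. Display mode: read all values in the duration |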

Successful Rate: success rate of requests to the service. Metric: service_sla

| Table | OAL definition | Notes |
| --- | --- | --- |
| service_sla | service_sla = from(Service.*).percent(status == true) | Display mode: read the single value in the duration (read all values in the duration) |

Service Load: requests per minute. Metric: service_cpm

| Table | OAL definition | Notes |
| --- | --- | --- |
| service_cpm | service_cpm = from(Service.*).cpm() | Display mode: read the single value in the duration (read all values in the duration) |

Service Throughput: bytes received and sent by the service over TCP. Metrics: service_throughput_received, service_throughput_sent

| Table | OAL definition | Notes |
| --- | --- | --- |
| - | service_throughput_received = from(Service.tcpInfo.receivedBytes).filter(type == RequestType.TCP).longAvg() | Display mode: read all values in the duration |
| - | service_throughput_sent = from(Service.tcpInfo.sentBytes).filter(type == RequestType.TCP).longAvg() | Display mode: read all values in the duration |

Service Instances Load: requests per minute for each instance. Metric: service_instance_cpm

| Table | OAL definition | Notes |
| --- | --- | --- |
| service_instance_cpm | service_instance_cpm = from(ServiceInstance.*).cpm() | Display mode: get sorted top N values |

Slow Service Instance: slow service instances. Metric: service_instance_resp_time

| Table | OAL definition | Notes |
| --- | --- | --- |
| service_instance_resp_time | service_instance_resp_time = from(ServiceInstance.latency).longAvg() | Display mode: get sorted top N values |

Service Instance Successful Rate: request success rate of each service instance. Metric: service_instance_sla

| Table | OAL definition | Notes |
| --- | --- | --- |
| service_instance_sla | service_instance_sla = from(ServiceInstance.*).percent(status == true) | Display mode: get sorted top N values |

3. Dashboard-APM-Instance

Service Instance Load: requests per minute for the current instance. Metric: service_instance_cpm

| Table | OAL definition | Notes |
| --- | --- | --- |
| service_instance_cpm | service_instance_cpm = from(ServiceInstance.*).cpm() | Display mode: read all values in the duration |

Throughput: volume of data transferred

Service Instance Throughput: throughput of the current instance. Metrics: service_instance_throughput_received, service_instance_throughput_sent

| Table | OAL definition | Notes |
| --- | --- | --- |
| - | service_instance_throughput_received = from(ServiceInstance.tcpInfo.receivedBytes).filter(type == RequestType.TCP).longAvg() | Display mode: read all values in the duration |
| - | service_instance_throughput_sent = from(ServiceInstance.tcpInfo.sentBytes).filter(type == RequestType.TCP).longAvg() | Display mode: read all values in the duration |

Service Instance Latency: request latency of the current instance. Metric: service_instance_resp_time

| Table | OAL definition | Notes |
| --- | --- | --- |
| service_instance_resp_time | service_instance_resp_time = from(ServiceInstance.latency).longAvg() | Display mode: read all values in the duration |

JVM CPU (Java Service): percentage of CPU used by the JVM. Metric: instance_jvm_cpu

| Table | OAL definition | Notes |
| --- | --- | --- |
| instance_jvm_cpu | instance_jvm_cpu = from(ServiceInstanceJVMCPU.usePercent).doubleAvg() | Display mode: read all values in the duration |

JVM Memory (Java Service): JVM memory usage, in MB. Metrics: instance_jvm_memory_heap, instance_jvm_memory_heap_max, instance_jvm_memory_noheap, instance_jvm_memory_noheap_max

| Table | OAL definition | Notes |
| --- | --- | --- |
| instance_jvm_memory_heap | instance_jvm_memory_heap = from(ServiceInstanceJVMMemory.used).filter(heapStatus == true).longAvg(); | Display mode: read all values in the duration |
| instance_jvm_memory_noheap | instance_jvm_memory_noheap = from(ServiceInstanceJVMMemory.used).filter(heapStatus == false).longAvg(); | Display mode: read all values in the duration |
| instance_jvm_memory_heap_max | instance_jvm_memory_heap_max = from(ServiceInstanceJVMMemory.max).filter(heapStatus == true).longAvg(); | Display mode: read all values in the duration |
| instance_jvm_memory_noheap_max | instance_jvm_memory_noheap_max = from(ServiceInstanceJVMMemory.max).filter(heapStatus == false).longAvg(); | Display mode: read all values in the duration |

JVM Class Count (Java Service): JVM class statistics. Metrics: instance_jvm_class_loaded_class_count, instance_jvm_class_total_unloaded_class_count, instance_jvm_class_total_loaded_class_count

| Table | OAL definition | Notes |
| --- | --- | --- |
| instance_jvm_class_loaded_class_count | instance_jvm_class_loaded_class_count = from(ServiceInstanceJVMClass.loadedClassCount).longAvg(); | Display mode: read all values in the duration |
| instance_jvm_class_total_unloaded_class_count | instance_jvm_class_total_unloaded_class_count = from(ServiceInstanceJVMClass.totalUnloadedClassCount).longAvg(); | Display mode: read all values in the duration |
| instance_jvm_class_total_loaded_class_count | instance_jvm_class_total_loaded_class_count = from(ServiceInstanceJVMClass.totalLoadedClassCount).longAvg(); | Display mode: read all values in the duration |

CLR CPU (.NET Service): relates to the .NET CLR and is not explained here for now. Metric: instance_clr_cpu

| Table | OAL definition | Notes |
| --- | --- | --- |
| instance_clr_cpu | instance_clr_cpu = from(ServiceInstanceCLRCPU.usePercent).doubleAvg(); | Display mode: get sorted top N values |

4. Dashboard-APM-Endpoints

Endpoint Load in Current Service: requests per minute for each endpoint. Metric: endpoint_cpm

| Table | OAL definition | Notes |
| --- | --- | --- |
| endpoint_cpm | endpoint_cpm = from(Endpoint.*).cpm(); | Display mode: get sorted top N values |

Slow Endpoints in Current Service: ranking of endpoints by slow request time, in ms. Metric: endpoint_avg

| Table | OAL definition | Notes |
| --- | --- | --- |
| endpoint_avg | endpoint_avg = from(Endpoint.latency).longAvg(); | Display mode: get sorted top N values |

Successful Rate in Current Service: request success rate of each endpoint. Metric: endpoint_sla

| Table | OAL definition | Notes |
| --- | --- | --- |
| endpoint_sla | endpoint_sla = from(Endpoint.*).percent(status == true); | Display mode: get sorted top N values |

Endpoint Load: requests per minute for each endpoint. Metric: endpoint_cpm

| Table | OAL definition | Notes |
| --- | --- | --- |
| endpoint_cpm | endpoint_cpm = from(Endpoint.*).cpm(); | Display mode: read all values in the duration |

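Endpoint Avg Response Time: the current endpoint's request response time in each time period, in ms. Metric: endpoint_avg

| Table | OAL definition | Notes |
| --- | --- | --- |
| endpoint_avg | endpoint_avg = from(Endpoint.latency).longAvg(); | Display mode: read all values in the duration |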

Endpoint Response Time Percentile: the current endpoint's response-time percentiles in each time period, in ms. Metric: endpoint_percentile

| Table | OAL definition | Notes |
| --- | --- | --- |
| endpoint_percentile | endpoint_percentile = from(Endpoint.latency).percentile(10); // Multiple values including p50, p75, p90, p95, p99 | Display mode: read all values in the duration |

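Endpoint Successful Rate: the current endpoint's request success rate in each time period. Metric: endpoint_sla

| Table | OAL definition | Notes |
| --- | --- | --- |
| endpoint_sla | endpoint_sla = from(Endpoint.*).percent(status == true); | Display mode: read all values in the duration |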

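Tuning Configuration

1. Adjusting the Sampling Rate

The setting lives in the receiver-trace module of config/application.yml.

The default value is 10000. The sampling rate has a precision of 1/10000, so 10000 * 1/10000 = 1 = 100%.

To sample 50%, for example, set it to 5000, as follows: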

receiver-trace:
  selector: ${SW_RECEIVER_TRACE:default}
  default:
    sampleRate: ${SW_TRACE_SAMPLE_RATE:5000}
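
The arithmetic behind this setting can be sketched in Python. sample_fraction converts the configured value into a fraction; should_sample shows one plausible way a receiver could apply it, by comparing a hash bucket with the configured value. The hashing scheme is an assumption for illustration, not SkyWalking's documented behavior.

```python
PRECISION = 10000  # sampleRate is expressed in units of 1/10000


def sample_fraction(sample_rate: int) -> float:
    """Convert a sampleRate setting into the fraction of traces kept."""
    if not 0 <= sample_rate <= PRECISION:
        raise ValueError("sample_rate must be between 0 and 10000")
    return sample_rate / PRECISION


def should_sample(trace_hash: int, sample_rate: int) -> bool:
    """Hypothetical decision rule: keep a trace if its bucket is below the rate."""
    return trace_hash % PRECISION < sample_rate


print(sample_fraction(10000))  # 1.0
print(sample_fraction(5000))   # 0.5

# With evenly spread hashes, half of the traces are kept at 5000.
kept = sum(should_sample(h, 5000) for h in range(PRECISION))
print(kept)  # 5000
```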
