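1. Introduction to distributed monitoring systems
With the rise of SOA, microservice architectures, PaaS and DevOps, tracking down and troubleshooting production issues has become harder. More and more companies now treat the observability of their online services as a priority, and many good tracing and service-monitoring middleware products have emerged. Popular ones include Zipkin, which ships with the Spring Cloud family, Dianping's CAT, Huawei's SkyWalking, Uber's Jaeger and Naver's Pinpoint.
A typical application produces three kinds of data that a monitoring system needs to record: metrics, logs and traces. Let's first look at what each of them is.
Metrics
Runtime indicators, such as CPU usage, memory usage, GC activity and website traffic.
Logging
The logs produced by the application process, for example logs written through Log4j, or events and notifications raised while the program runs.
Tracing
Also called distributed tracing. A trace contains the start and end time of each sub-operation of a request, the parameters passed, the call chain between services, and the time spent at each hop. Tracing can cover message sending and receiving, database access, load balancing and more, giving deep insight into how a request executes. It tells us where a request spends most of its time, what its parameters were, and, if an exception occurred, at which step it was raised.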
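To make the tracing data concrete, here is a minimal, stdlib-only sketch of a trace as plain data. SpanRecord and TraceAnalysis are hypothetical simplifications, not the OpenTelemetry data model (real spans also carry a trace ID, attributes, events and a status), and the span names and timings are illustrative. The sketch shows how parent/child links plus start and end times answer "where did the request spend its time": each span's self time is its duration minus the durations of its direct children. This assumes the children run one after another rather than in parallel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified span record: not the OpenTelemetry data model.
final class SpanRecord {
    final String name;
    final String spanId;
    final String parentId; // null for the root span
    final long startMs;
    final long endMs;

    SpanRecord(String name, String spanId, String parentId, long startMs, long endMs) {
        this.name = name;
        this.spanId = spanId;
        this.parentId = parentId;
        this.startMs = startMs;
        this.endMs = endMs;
    }

    long durationMs() {
        return endMs - startMs;
    }
}

final class TraceAnalysis {
    // Self time = own duration minus the durations of direct children.
    // Assumes the children do not overlap each other (sequential calls).
    static long selfTimeMs(SpanRecord span, List<SpanRecord> trace) {
        long childTotal = 0;
        for (SpanRecord other : trace) {
            if (span.spanId.equals(other.parentId)) {
                childTotal += other.durationMs();
            }
        }
        return span.durationMs() - childTotal;
    }

    // Returns the span with the largest self time, i.e. where most time was spent.
    static SpanRecord hotspot(List<SpanRecord> trace) {
        SpanRecord best = null;
        long bestSelf = Long.MIN_VALUE;
        for (SpanRecord span : trace) {
            long self = selfTimeMs(span, trace);
            if (self > bestSelf) {
                bestSelf = self;
                best = span;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<SpanRecord> trace = new ArrayList<>();
        trace.add(new SpanRecord("gateway: GET /user/1", "a1", null, 0, 120));
        trace.add(new SpanRecord("cloud-user-service: handle request", "b2", "a1", 10, 110));
        trace.add(new SpanRecord("db: select user", "c3", "b2", 20, 95));
        SpanRecord hot = hotspot(trace);
        System.out.println(hot.name + " self time: " + selfTimeMs(hot, trace) + " ms");
    }
}
```

With these illustrative numbers the database call is the hotspot: the gateway and the user service each add only 20 to 25 ms of their own on top of it.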
2. Introduction to OpenTelemetry
OpenTelemetry is data-collection middleware. We can use it to generate, collect and export telemetry data (metrics, logs and traces). Backends that support OpenTelemetry can then store, query and display this data to provide observability, performance analysis, system monitoring, alerting and so on.
The OpenTelemetry project started in 2019. It aims to provide a standardized approach to observability for cloud environments and a vendor-neutral monitoring stack. So far it has gained support from many well-known projects such as Zipkin, Jaeger, SkyWalking and Prometheus.
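In practice, "vendor-neutral" means the instrumentation code talks only to a neutral API, and the backend is chosen by plugging in a different exporter. The sketch below imitates that separation in plain Java. SpanExporterSketch, ZipkinLikeExporter, OtlpLikeExporter and InstrumentedService are hypothetical names, not the OpenTelemetry SDK classes, and the "exporters" only format strings instead of sending data over the network.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical backend-neutral exporter interface (not the real SDK SpanExporter).
interface SpanExporterSketch {
    String export(String spanName, long durationMs);
}

// Stands in for a Zipkin exporter (which would POST JSON to /api/v2/spans).
final class ZipkinLikeExporter implements SpanExporterSketch {
    public String export(String spanName, long durationMs) {
        return "zipkin:" + spanName + ":" + durationMs;
    }
}

// Stands in for an OTLP exporter (which would send protobuf over gRPC).
final class OtlpLikeExporter implements SpanExporterSketch {
    public String export(String spanName, long durationMs) {
        return "otlp:" + spanName + ":" + durationMs;
    }
}

final class InstrumentedService {
    private final SpanExporterSketch exporter;
    final List<String> sent = new ArrayList<>();

    InstrumentedService(SpanExporterSketch exporter) {
        this.exporter = exporter;
    }

    // The instrumentation is identical no matter which backend is plugged in.
    void handleRequest(String name, long durationMs) {
        sent.add(exporter.export(name, durationMs));
    }

    public static void main(String[] args) {
        InstrumentedService zipkin = new InstrumentedService(new ZipkinLikeExporter());
        InstrumentedService otlp = new InstrumentedService(new OtlpLikeExporter());
        zipkin.handleRequest("GET /user/1", 12);
        otlp.handleRequest("GET /user/1", 12);
        System.out.println(zipkin.sent + " " + otlp.sent);
    }
}
```

The sample project follows the same idea. Switching from Zipkin to Jaeger later only swaps the span processor and exporter in the configuration class, while the filters that create spans stay unchanged.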
3. Sample project
In this example we build a simple microservice system with Spring Cloud to try out monitoring with OpenTelemetry, and we switch quickly between two different monitoring systems (Zipkin and Jaeger). The project consists of two microservices and two visual monitoring systems, with OpenTelemetry connecting the microservices to the monitoring systems.
- gateway-service - service gateway built with Spring Cloud Gateway
- cloud-user-service - user microservice, built with Spring Boot + Spring MVC
- Zipkin - Zipkin monitoring server
- Jaeger - Jaeger monitoring server
4. Integrating Zipkin with OpenTelemetry
Versions of the components used in this example:
java: 1.8
spring-cloud: 2020.0.2
spring-boot: 2.4.5
opentelemetry: 1.1.0
grpc: 1.36.1
4.1. Maven configuration of cloud-user-service
Import the Spring Cloud and OpenTelemetry BOMs:
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-dependencies</artifactId>
<version>${spring-cloud.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-bom</artifactId>
<version>${opentelemetry.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
Add the OpenTelemetry dependencies:
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-api</artifactId>
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-sdk</artifactId>
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-semconv</artifactId>
<version>1.1.0-alpha</version>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-protobuf</artifactId>
<version>${grpc.version}</version>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-netty-shaded</artifactId>
<version>${grpc.version}</version>
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-exporter-zipkin</artifactId>
</dependency>
4.2. Configuring OpenTelemetry
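import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.propagation.W3CTraceContextPropagator;
import io.opentelemetry.context.propagation.ContextPropagators;
import io.opentelemetry.exporter.zipkin.ZipkinSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.resources.Resource;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.SpanProcessor;
import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor;
import io.opentelemetry.semconv.resource.attributes.ResourceAttributes;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TraceConfig {
    private static final String ENDPOINT_V2_SPANS = "/api/v2/spans";
    // AppConfig is the project's own configuration class (not shown); it provides the application name
    private final AppConfig appConfig;

    @Autowired
    public TraceConfig(AppConfig appConfig) {
        this.appConfig = appConfig;
    }

    @Bean
    public OpenTelemetry openTelemetry() {
        // Process the spans with the Zipkin exporter
        SpanProcessor spanProcessor = getZipkinProcessor();
        // service.name is how Zipkin/Jaeger identify this application
        Resource serviceNameResource = Resource.create(Attributes.of(ResourceAttributes.SERVICE_NAME, appConfig.getApplicationName()));
        SdkTracerProvider tracerProvider =
                SdkTracerProvider.builder()
                        .addSpanProcessor(spanProcessor)
                        .setResource(Resource.getDefault().merge(serviceNameResource))
                        .build();
        // Propagate the tracing context between services with W3C Trace Context headers
        OpenTelemetrySdk openTelemetry =
                OpenTelemetrySdk.builder().setTracerProvider(tracerProvider)
                        .setPropagators(ContextPropagators.create(W3CTraceContextPropagator.getInstance()))
                        .buildAndRegisterGlobal();
        // add a shutdown hook to shut down the SDK
        Runtime.getRuntime().addShutdownHook(new Thread(tracerProvider::close));
        // return the configured instance so it can be used for instrumentation.
        return openTelemetry;
    }

    private SpanProcessor getZipkinProcessor() {
        String host = "localhost";
        int port = 9411;
        String httpUrl = String.format("http://%s:%s", host, port);
        ZipkinSpanExporter zipkinExporter = ZipkinSpanExporter.builder().setEndpoint(httpUrl + ENDPOINT_V2_SPANS).build();
        // SimpleSpanProcessor hands each span to the exporter as soon as it ends
        return SimpleSpanProcessor.create(zipkinExporter);
    }
}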
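Resource.getDefault().merge(serviceNameResource) combines the SDK's default resource attributes with the configured service name. In the 1.x Java SDK, the argument passed to merge takes precedence on a key collision, which is why the application name replaces the default service.name of unknown_service:java. The stdlib sketch below only imitates that precedence rule with plain maps. mergeResource is a hypothetical helper and the attribute values are illustrative; it explains the rule and is not a replacement for the SDK's Resource.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class ResourceMergeSketch {
    // Hypothetical helper imitating Resource#merge: on a key collision the
    // attributes of the "other" (updating) resource take precedence.
    static Map<String, String> mergeResource(Map<String, String> base, Map<String, String> other) {
        Map<String, String> merged = new LinkedHashMap<>(base);
        merged.putAll(other); // values from "other" overwrite those from "base"
        return Collections.unmodifiableMap(merged);
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("service.name", "unknown_service:java");
        defaults.put("telemetry.sdk.language", "java");
        Map<String, String> service = Collections.singletonMap("service.name", "cloud-user-service");
        System.out.println(mergeResource(defaults, service));
    }
}
```

If the merge were written the other way round (serviceNameResource.merge(Resource.getDefault())), the default unknown_service:java would win. All spans would then appear under that name in Zipkin or Jaeger.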
4.3. Using OpenTelemetry in cloud-user-service
Once the configuration is in place, the OpenTelemetry bean can be injected anywhere in the Spring Boot application with @Autowired.
Next, we write a servlet Filter that intercepts every HTTP request and instruments it.
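import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;
import io.opentelemetry.context.propagation.TextMapGetter;
import io.opentelemetry.context.propagation.TextMapPropagator;
import io.opentelemetry.semconv.trace.attributes.SemanticAttributes;
import java.io.IOException;
import java.util.Collections;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class TracingFilter implements Filter {
    private final AppConfig appConfig;
    private final OpenTelemetry openTelemetry;

    @Autowired
    public TracingFilter(AppConfig appConfig, OpenTelemetry openTelemetry) {
        this.appConfig = appConfig;
        this.openTelemetry = openTelemetry;
    }

    @Override
    public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain filterChain) throws IOException, ServletException {
        HttpServletRequest httpServletRequest = (HttpServletRequest) servletRequest;
        Span span = getServerSpan(openTelemetry.getTracer(appConfig.getApplicationName()), httpServletRequest);
        // make the span current so that code further down the chain can create child spans
        try (Scope scope = span.makeCurrent()) {
            filterChain.doFilter(servletRequest, servletResponse);
        } catch (Exception ex) {
            span.setStatus(StatusCode.ERROR, "HTTP Code: " + ((HttpServletResponse) servletResponse).getStatus());
            span.recordException(ex);
            throw ex;
        } finally {
            span.end();
        }
    }

    private Span getServerSpan(Tracer tracer, HttpServletRequest httpServletRequest) {
        TextMapPropagator textMapPropagator = openTelemetry.getPropagators().getTextMapPropagator();
        // extract the upstream tracing context (e.g. the traceparent header) from the request headers
        Context context = textMapPropagator.extract(Context.current(), httpServletRequest, new TextMapGetter<HttpServletRequest>() {
            @Override
            public Iterable<String> keys(HttpServletRequest request) {
                return Collections.list(request.getHeaderNames());
            }

            @Override
            public String get(HttpServletRequest request, String key) {
                return request.getHeader(key);
            }
        });
        return tracer.spanBuilder(httpServletRequest.getRequestURI())
                .setParent(context)
                .setSpanKind(SpanKind.SERVER)
                .setAttribute(SemanticAttributes.HTTP_METHOD, httpServletRequest.getMethod())
                .startSpan();
    }
}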
In this code, an anonymous TextMapGetter class reads the tracing context from the headers of the HttpServletRequest.
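With the W3CTraceContextPropagator configured in TraceConfig, the header the propagator reads through that getter is traceparent. The W3C Trace Context specification defines its format as version-traceid-parentid-traceflags, in lowercase hex with 2, 32, 16 and 2 characters. The stdlib sketch below parses that header to show what "extracting the tracing context" amounts to. TraceparentParser is a hypothetical class: it accepts only version 00 and skips checks the real propagator performs, such as handling future versions and the tracestate header.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TraceparentParser {
    // version(2)-trace-id(32)-parent-id(16)-trace-flags(2), lowercase hex
    private static final Pattern TRACEPARENT =
            Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    static final class ParsedContext {
        final String traceId;
        final String parentSpanId;
        final boolean sampled;

        ParsedContext(String traceId, String parentSpanId, boolean sampled) {
            this.traceId = traceId;
            this.parentSpanId = parentSpanId;
            this.sampled = sampled;
        }
    }

    // Returns null when the header is missing or invalid; the caller then starts a new trace.
    static ParsedContext parse(String header) {
        if (header == null) {
            return null;
        }
        Matcher m = TRACEPARENT.matcher(header.trim());
        if (!m.matches()) {
            return null;
        }
        String traceId = m.group(1);
        String spanId = m.group(2);
        // All-zero IDs are invalid according to the specification.
        if (traceId.matches("0+") || spanId.matches("0+")) {
            return null;
        }
        int flags = Integer.parseInt(m.group(3), 16);
        // The lowest bit of trace-flags is the "sampled" flag.
        return new ParsedContext(traceId, spanId, (flags & 0x01) == 1);
    }

    public static void main(String[] args) {
        ParsedContext ctx = parse("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01");
        System.out.println(ctx.traceId + " " + ctx.parentSpanId + " sampled=" + ctx.sampled);
    }
}
```

The parent ID in the header is the span ID of the caller's span. That is why setParent(context) in getServerSpan links the new SERVER span under the caller's span.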
When creating the span, we also write key attributes of the HTTP request into it, and we record every exception thrown while the request is handled.
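The filter marks a span as ERROR only when an exception escapes the chain. At that moment the response status may still be the default 200, so the "HTTP Code" in the message is not always meaningful. An alternative, based on the OpenTelemetry HTTP semantic conventions, derives the status from the final response code. For SERVER spans only 5xx counts as an error, while for CLIENT spans both 4xx and 5xx do. The sketch below encodes that rule. SpanStatusRule, Kind and Status are hypothetical stand-ins for SpanKind and StatusCode, used so that the example runs without the SDK.

```java
final class SpanStatusRule {
    enum Kind { SERVER, CLIENT }

    enum Status { UNSET, ERROR }

    // 5xx is always an error; 4xx is an error only from the client's point of view,
    // because for a server a 4xx usually means the caller sent a bad request.
    static Status fromHttpStatus(Kind kind, int httpStatus) {
        if (httpStatus >= 500 && httpStatus <= 599) {
            return Status.ERROR;
        }
        if (httpStatus >= 400 && httpStatus <= 499 && kind == Kind.CLIENT) {
            return Status.ERROR;
        }
        return Status.UNSET;
    }

    public static void main(String[] args) {
        System.out.println(fromHttpStatus(Kind.SERVER, 500)); // ERROR
        System.out.println(fromHttpStatus(Kind.SERVER, 404)); // UNSET
        System.out.println(fromHttpStatus(Kind.CLIENT, 404)); // ERROR
    }
}
```

In TracingFilter, such a rule would run after filterChain.doFilter returns, reading ((HttpServletResponse) servletResponse).getStatus() and calling span.setStatus(StatusCode.ERROR) when the rule says so.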
4.4. Writing the service code
Next, a simple handler simulates looking up a user and throws an exception for invalid IDs:
@GetMapping("/{id}")
public ResponseEntity<User> get(@PathVariable("id") Long id) {
if (0 >= id) {
throw new IllegalArgumentException("Illegal argument value");
}
return ResponseEntity.ok(userService.get(id));
}
4.5. Configuring gateway-service
gateway-service is configured the same way as cloud-user-service.
4.6. Integrating OpenTelemetry into gateway-service
This part differs from cloud-user-service because gateway-service is built on WebFlux. This time we use a WebFilter and a GlobalFilter to intercept the HTTP requests passing through the gateway.
In the WebFilter, OpenTelemetry records the incoming HTTP request:
@Override
public Mono<Void> filter(ServerWebExchange serverWebExchange, WebFilterChain webFilterChain) {
    ServerHttpRequest serverHttpRequest = serverWebExchange.getRequest();
    Span span = getServerSpan(openTelemetry.getTracer(appConfig.getApplicationName()), serverHttpRequest);
    // Scope is thread-local: closing it in doFinally assumes the chain completes on the same thread
    Scope scope = span.makeCurrent();
    // return the trace ID to the caller so the request can be looked up in the tracing UI
    serverWebExchange.getResponse().getHeaders().add("traceId", span.getSpanContext().getTraceId());
    span.setAttribute("params", serverHttpRequest.getQueryParams().toString());
    return webFilterChain.filter(serverWebExchange)
            .doOnError(span::recordException)
            .doFinally(signalType -> {
                scope.close();
                span.end();
            });
}

private Span getServerSpan(Tracer tracer, ServerHttpRequest serverHttpRequest) {
    // the gateway is the entry point, so every request starts a new trace
    return tracer.spanBuilder(serverHttpRequest.getPath().toString())
            .setNoParent()
            .setSpanKind(SpanKind.SERVER)
            .setAttribute(SemanticAttributes.HTTP_METHOD, serverHttpRequest.getMethod().name())
            .startSpan();
}
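Because getServerSpan calls setNoParent(), every request entering the gateway starts a new trace, and the SDK generates fresh IDs for it. The filter copies that trace ID into the traceId response header so that a caller can look the request up in Zipkin or Jaeger. W3C Trace Context expects a 16-byte trace ID and an 8-byte span ID, hex-encoded and not all zeros. The stdlib sketch below produces IDs of that shape. RootIdsSketch is a hypothetical stand-in for the SDK's internal random ID generator, not its actual code.

```java
import java.util.concurrent.ThreadLocalRandom;

final class RootIdsSketch {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Renders byteCount random bytes as lowercase hex, retrying in the
    // (astronomically unlikely) case that every byte is zero, which W3C forbids.
    static String randomHexId(int byteCount) {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        while (true) {
            byte[] bytes = new byte[byteCount];
            random.nextBytes(bytes);
            StringBuilder sb = new StringBuilder(byteCount * 2);
            boolean allZero = true;
            for (byte b : bytes) {
                if (b != 0) {
                    allZero = false;
                }
                sb.append(HEX[(b >> 4) & 0x0f]).append(HEX[b & 0x0f]);
            }
            if (!allZero) {
                return sb.toString();
            }
        }
    }

    static String newTraceId() { return randomHexId(16); } // 32 hex characters

    static String newSpanId() { return randomHexId(8); }   // 16 hex characters

    public static void main(String[] args) {
        // Something shaped like the value a client would see in the "traceId" response header.
        System.out.println("traceId: " + newTraceId());
    }
}
```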
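Next, a GlobalFilter records the HTTP request that is routed to the microservice. The route URL is read from the GATEWAY_REQUEST_URL_ATTR exchange attribute, which RouteToRequestUrlFilter (order 10000) sets. This filter therefore needs a higher order value, for example by implementing Ordered; otherwise routeUri is null.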
@Override
public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain gatewayFilterChain) {
    Span span = getClientSpan(openTelemetry.getTracer(appConfig.getApplicationName()), exchange);
    Scope scope = span.makeCurrent();
    // ServerWebExchange is immutable: continue with the exchange that carries the injected headers
    ServerWebExchange tracedExchange = inject(exchange);
    return gatewayFilterChain.filter(tracedExchange)
            .doOnError(span::recordException)
            // end the span on success, error and cancellation alike
            .doFinally(signalType -> {
                scope.close();
                span.end();
            });
}

private ServerWebExchange inject(ServerWebExchange serverWebExchange) {
    HttpHeaders httpHeaders = new HttpHeaders();
    TextMapPropagator textMapPropagator = openTelemetry.getPropagators().getTextMapPropagator();
    // write the current context (the CLIENT span) into httpHeaders, e.g. as a traceparent header
    textMapPropagator.inject(Context.current(), httpHeaders, HttpHeaders::add);
    ServerHttpRequest request = serverWebExchange.getRequest().mutate()
            .headers(headers -> headers.addAll(httpHeaders))
            .build();
    return serverWebExchange.mutate().request(request).build();
}

private Span getClientSpan(Tracer tracer, ServerWebExchange serverWebExchange) {
    ServerHttpRequest serverHttpRequest = serverWebExchange.getRequest();
    // set by RouteToRequestUrlFilter; null if this filter runs before it
    URI routeUri = serverWebExchange.getAttribute(ServerWebExchangeUtils.GATEWAY_REQUEST_URL_ATTR);
    return tracer.spanBuilder(routeUri.getPath())
            .setSpanKind(SpanKind.CLIENT)
            .setAttribute(SemanticAttributes.HTTP_METHOD, serverHttpRequest.getMethod().name())
            .startSpan();
}
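To propagate the tracing context, inject writes it into the headers of the routed request. Because ServerWebExchange is immutable, the exchange returned by mutate() must be the one passed down the filter chain; otherwise the injected headers are lost.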
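With the W3C propagator, the header that inject adds is traceparent. It carries the current trace ID together with the span ID of the gateway's CLIENT span, so the CLIENT span becomes the parent of the SERVER span that cloud-user-service creates. The stdlib sketch below builds that header into a plain Map. TraceparentInjector is hypothetical, the 01 flag assumes the span is sampled, and a real propagator may also write a tracestate header.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TraceparentInjector {
    // Writes version 00, the trace ID, the current (client) span ID and the trace flags.
    static void inject(Map<String, String> headers, String traceId, String clientSpanId, boolean sampled) {
        if (traceId.length() != 32 || clientSpanId.length() != 16) {
            throw new IllegalArgumentException("trace ID must be 32 hex chars, span ID 16");
        }
        String flags = sampled ? "01" : "00";
        // put() replaces any traceparent already present in the map
        headers.put("traceparent", "00-" + traceId + "-" + clientSpanId + "-" + flags);
    }

    public static void main(String[] args) {
        Map<String, String> outgoing = new LinkedHashMap<>();
        inject(outgoing, "0af7651916cd43dd8448eb211c80319c", "00f067aa0ba902b7", true);
        System.out.println(outgoing);
    }
}
```

On the receiving side, TracingFilter in cloud-user-service extracts exactly this header. The trace ID therefore stays the same across both services, while the parent ID changes at every hop.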
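5. Running the services
Now access the gateway at http://localhost:8080/user/0 and see how Zipkin records the request and the exception.
For tracing, Zipkin performs reasonably well overall, and the trace that contains the exception is marked in red. However, Zipkin does not show the exception's stack trace; that requires extra handling on our side.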
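One possible form of that extra handling, offered as an assumption rather than the author's solution, is to flatten the exception into plain string attributes (tags), which Zipkin displays on the span. The keys below use the OpenTelemetry semantic-convention names exception.type, exception.message and exception.stacktrace. The ExceptionTags helper itself is hypothetical. In the filters, each entry could be written with span.setAttribute(key, value).

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

final class ExceptionTags {
    // Converts a Throwable into string tags using the semantic-convention key names.
    static Map<String, String> toTags(Throwable t) {
        StringWriter buffer = new StringWriter();
        PrintWriter writer = new PrintWriter(buffer);
        t.printStackTrace(writer);
        writer.flush();
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("exception.type", t.getClass().getName());
        tags.put("exception.message", String.valueOf(t.getMessage()));
        tags.put("exception.stacktrace", buffer.toString());
        return tags;
    }

    public static void main(String[] args) {
        Map<String, String> tags = toTags(new IllegalArgumentException("Illegal argument value"));
        System.out.println(tags.get("exception.type") + ": " + tags.get("exception.message"));
    }
}
```

Stack traces can be long, so in practice it may be worth truncating exception.stacktrace before attaching it to every failing span.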
6. Connecting Jaeger through OpenTelemetry
Replace the Zipkin exporter used so far with the OTLP exporter (this dependency is already part of the list in section 4.1):
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>
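In the configuration class, replace the Zipkin processor with the OTLP processor, i.e. call getOtlpProcessor() instead of getZipkinProcessor() in openTelemetry(). That completes the switch from Zipkin to Jaeger. Without an explicit endpoint, OtlpGrpcSpanExporter sends spans to http://localhost:4317, so an OTLP receiver must listen there. That receiver can be an OpenTelemetry Collector forwarding to Jaeger, or a Jaeger version with native OTLP support.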
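private SpanProcessor getOtlpProcessor() {
    // OTLP over gRPC; without setEndpoint() this targets http://localhost:4317
    OtlpGrpcSpanExporter spanExporter = OtlpGrpcSpanExporter.builder().setTimeout(2, TimeUnit.SECONDS).build();
    // queue finished spans and export them every 100 ms (or sooner when a batch fills up)
    return BatchSpanProcessor.builder(spanExporter)
            .setScheduleDelay(100, TimeUnit.MILLISECONDS)
            .build();
}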
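Unlike the SimpleSpanProcessor used for Zipkin, which hands every span to the exporter as soon as it ends, BatchSpanProcessor queues finished spans and exports them in groups. The sketch below imitates that trade-off deterministically, using an explicit clock instead of a background thread. MiniBatchProcessor, its maxBatchSize and its time check are simplified assumptions, not the SDK's implementation. The real processor exports on a timer even when no new span arrives, and it bounds its queue, dropping spans when the queue is full.

```java
import java.util.ArrayList;
import java.util.List;

final class MiniBatchProcessor {
    private final int maxBatchSize;
    private final long scheduleDelayMs;
    private final List<String> buffer = new ArrayList<>();
    private long lastExportMs;
    // Each inner list represents one call to an (imaginary) exporter.
    final List<List<String>> exportedBatches = new ArrayList<>();

    MiniBatchProcessor(int maxBatchSize, long scheduleDelayMs, long startMs) {
        this.maxBatchSize = maxBatchSize;
        this.scheduleDelayMs = scheduleDelayMs;
        this.lastExportMs = startMs;
    }

    // Called when a span ends; exports when the batch is full or the delay has elapsed.
    // Simplification: the delay is only checked here, not by a timer thread.
    void onEnd(String spanName, long nowMs) {
        buffer.add(spanName);
        if (buffer.size() >= maxBatchSize || nowMs - lastExportMs >= scheduleDelayMs) {
            flush(nowMs);
        }
    }

    // Exports whatever is buffered; a shutdown hook should trigger this so no spans are lost.
    void flush(long nowMs) {
        if (!buffer.isEmpty()) {
            exportedBatches.add(new ArrayList<>(buffer));
            buffer.clear();
        }
        lastExportMs = nowMs;
    }

    public static void main(String[] args) {
        MiniBatchProcessor p = new MiniBatchProcessor(3, 100, 0);
        p.onEnd("a", 10);
        p.onEnd("b", 20);
        p.onEnd("c", 30);  // batch full -> export [a, b, c]
        p.onEnd("d", 50);
        p.onEnd("e", 140); // 110 ms since the last export -> export [d, e]
        System.out.println(p.exportedBatches);
    }
}
```

Batching reduces the number of network calls to the backend at the cost of a short delay before spans appear. The shutdown hook in TraceConfig, which closes the tracer provider, matters more with batching, because spans still in the queue would otherwise be lost.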
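7. Running the services again
Run the services again and access the gateway at http://localhost:8080/user/0 to see how Jaeger records the request and the exception.
On the main search page, Jaeger directly flags that the request contains an error.
In the trace details, Jaeger records and displays the exception's stack trace, which is very helpful when analyzing production errors.
Compared with Zipkin, Jaeger offers richer features and a more polished visual interface.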
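8. Summary
This article showed how to build a monitoring setup with OpenTelemetry and how to connect it to Zipkin and Jaeger.
Thanks to OpenTelemetry's standardized approach, we can easily record more detailed tracing information.
Since its launch, OpenTelemetry has attracted growing attention and support from vendors. Whether it will become the de facto standard for the still-young field of distributed monitoring remains to be seen.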