Preface
Recently, I submitted a PR to opentelemetry-java-instrumentation that adds four new metrics to gRPC:
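- rpc.client.request.size: the size of the request packet sent by the client
- rpc.client.response.size: the size of the response packet received by the client
- rpc.server.request.size: the size of the request packet received by the server
- rpc.server.response.size: the size of the response packet sent by the server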
The main purpose of this PR is to expose RPC request and response packet sizes as metrics, and the key question is how to obtain those sizes.
gRPC is supported first (it is currently the most widely used RPC framework in the cloud-native space); other RPC frameworks can in theory be supported the same way.
While implementing this, I also became curious about how OpenTelemetry creates the spans of a call chain for gRPC requests.
The call chain in question is a gRPC remote call: java-demo is the gRPC client and k8s-combat is the gRPC server.
Before we get started, we can make a rough guess at the implementation based on how OpenTelemetry works.
The precondition for producing this trace data is that we use the javaagent provided by OpenTelemetry.
The agent uses byte-buddy to enhance the bytecode of our application. It proxies the business logic in that bytecode, so our code is enhanced without affecting the business (as long as the agent only creates data such as spans and metrics).
Some of Spring's proxy logic is implemented in a similar way.
gRPC Enhancement Principle
When it comes to the actual engineering, it is best not to enhance business code directly, but to look for the extension interfaces these frameworks already provide.
Take gRPC as an example: we can use the interceptor interfaces it provides to enhance the code.
Looking at the v1_6 TracingClientInterceptor class, we can see that it implements gRPC's ClientInterceptor interface.
The most critical part is its #interceptCall function:
@Override
public <REQUEST, RESPONSE> ClientCall<REQUEST, RESPONSE> interceptCall(
    MethodDescriptor<REQUEST, RESPONSE> method, CallOptions callOptions, Channel next) {
  GrpcRequest request = new GrpcRequest(method, null, null, next.authority());
  Context parentContext = Context.current();
  // skip instrumentation entirely if no span should be started for this request
  if (!instrumenter.shouldStart(parentContext, request)) {
    return next.newCall(method, callOptions);
  }
  Context context = instrumenter.start(parentContext, request);
  ClientCall<REQUEST, RESPONSE> result;
  try (Scope ignored = context.makeCurrent()) {
    try {
      // call other interceptors
      result = next.newCall(method, callOptions);
    } catch (Throwable e) {
      instrumenter.end(context, request, Status.UNKNOWN, e);
      throw e;
    }
  }
  return new TracingClientCall<>(result, parentContext, context, request);
}
This is the interceptor interface provided by gRPC. On the client side, its methods run before and after the real network call is initiated.
So in this interface we can implement the logic to create the span, capture the packet size, and so on.
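To make the "before and after the real call" idea concrete, here is a minimal JDK-only sketch of an interceptor chain. The `Call`, `Interceptor`, and `SizeRecordingInterceptor` types are hypothetical stand-ins invented for illustration; they are not the real `io.grpc` API, and the size is simply the UTF-8 byte length of a string rather than a serialized protobuf message.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a remote call; not the real io.grpc.ClientCall.
interface Call {
    String send(String message);
}

// Hypothetical stand-in for an interceptor that wraps the next call in the chain.
interface Interceptor {
    Call intercept(Call next);
}

// Records the byte size of every outgoing message, then delegates to the next call.
class SizeRecordingInterceptor implements Interceptor {
    final List<Integer> recordedSizes = new ArrayList<>();

    @Override
    public Call intercept(Call next) {
        return message -> {
            // "before" hook: capture the request size
            recordedSizes.add(message.getBytes(StandardCharsets.UTF_8).length);
            // the real call; an "after" hook could inspect the response here
            return next.send(message);
        };
    }
}

class InterceptorSketch {
    // Wraps the target so that the first interceptor in the list runs outermost.
    static Call wrap(Call target, List<Interceptor> interceptors) {
        Call current = target;
        for (int i = interceptors.size() - 1; i >= 0; i--) {
            current = interceptors.get(i).intercept(current);
        }
        return current;
    }

    public static void main(String[] args) {
        SizeRecordingInterceptor sizes = new SizeRecordingInterceptor();
        Call call = wrap(msg -> "echo:" + msg, List.of(sizes));
        System.out.println(call.send("hello"));
        System.out.println(sizes.recordedSizes);
    }
}
```

The business code only ever sees the wrapped `Call`, which is why an interceptor is a convenient place for spans and size metrics.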
Code enhancement with byte-buddy
One problem, though: the interceptor class we implemented has to be registered on the channel for it to take effect:
var managedChannel = ManagedChannelBuilder.forAddress(host, port)
    .intercept(new TracingClientInterceptor()) // add the interceptor
    .usePlaintext()
    .build();
However, when the javaagent is used, there is no way to add such code to the business code.
This is where byte-buddy comes in: it can dynamically modify the bytecode to achieve the same effect as modifying the source code.
In the v1_6 GrpcClientBuilderBuildInstrumentation class we can see how OpenTelemetry uses byte-buddy:
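@Override
public ElementMatcher<TypeDescription> typeMatcher() {
  return extendsClass(named("io.grpc.ManagedChannelBuilder"))
      .and(declaresField(named("interceptors")));
}

@Override
public void transform(TypeTransformer transformer) {
  transformer.applyAdviceToMethod(
      isMethod().and(named("build")),
      GrpcClientBuilderBuildInstrumentation.class.getName() + "$AddInterceptorAdvice");
}

@SuppressWarnings("unused")
public static class AddInterceptorAdvice {
  @Advice.OnMethodEnter(suppress = Throwable.class)
  public static void addInterceptor(
      @Advice.This ManagedChannelBuilder<?> builder,
      @Advice.FieldValue("interceptors") List<ClientInterceptor> interceptors) {
    VirtualField<ManagedChannelBuilder<?>, Boolean> instrumented =
        VirtualField.find(ManagedChannelBuilder.class, Boolean.class);
    // add the interceptor only once per builder
    if (!Boolean.TRUE.equals(instrumented.get(builder))) {
      interceptors.add(0, GrpcSingletons.CLIENT_INTERCEPTOR);
      instrumented.set(builder, true);
    }
  }
}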
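As the source code shows, byte-buddy is used here to achieve the effect of calling ManagedChannelBuilder#intercept(List<ClientInterceptor>).
Matchers such as isMethod and named come from the byte-buddy library, while extendsClass is a helper matcher from the OpenTelemetry agent that is built on top of them.
And #intercept is exactly the function we would otherwise have to call in business code to add the interceptor.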
interceptors.add(0, GrpcSingletons.CLIENT_INTERCEPTOR);
GrpcSingletons.CLIENT_INTERCEPTOR = new TracingClientInterceptor(clientInstrumenter, propagators);
This line of code manually adds OpenTelemetry's TracingClientInterceptor to the interceptor list, as the first interceptor.
And here:
extendsClass(named("io.grpc.ManagedChannelBuilder"))
    .and(declaresField(named("interceptors")))
The function names make the intent clear: match classes that extend io.grpc.ManagedChannelBuilder and declare a member variable named interceptors.
transformer.applyAdviceToMethod(
    isMethod().and(named("build")),
    GrpcClientBuilderBuildInstrumentation.class.getName() + "$AddInterceptorAdvice");
Then, when the build function is called, execution first enters the custom AddInterceptorAdvice class. That gives us a hook just before the channel is built, where the custom interceptor is added to the list.
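The "add once, at the front of the list" logic of the advice can be sketched without byte-buddy. In this JDK-only sketch, `DemoBuilder` is a hypothetical builder and a `WeakHashMap` plays the role that `VirtualField` plays in the real agent (a per-object flag stored outside the class). It illustrates the idea only and is not how the agent is implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical builder holding an interceptor list, standing in for a channel builder.
class DemoBuilder {
    final List<String> interceptors = new ArrayList<>();

    DemoBuilder intercept(String interceptor) {
        interceptors.add(interceptor);
        return this;
    }
}

class AdviceSketch {
    // Stand-in for the VirtualField: an "already instrumented" flag per builder instance.
    private static final Map<DemoBuilder, Boolean> INSTRUMENTED =
        Collections.synchronizedMap(new WeakHashMap<>());

    // What the advice does on entry to build(): add the tracing interceptor once, at index 0.
    static void onBuildEnter(DemoBuilder builder) {
        if (!Boolean.TRUE.equals(INSTRUMENTED.get(builder))) {
            builder.interceptors.add(0, "tracing");
            INSTRUMENTED.put(builder, true);
        }
    }

    public static void main(String[] args) {
        DemoBuilder builder = new DemoBuilder().intercept("auth");
        onBuildEnter(builder);
        onBuildEnter(builder); // a second build() must not add a duplicate
        System.out.println(builder.interceptors);
    }
}
```

The flag matters because `build` can be invoked more than once on the same builder; without it the tracing interceptor would be registered repeatedly.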
Getting span attributes
We can also see specific attributes of this request in the gRPC link, such as:
- The IP port provided by the gRPC service.
- Response code of the request
- Requested service and method
- Threads and other information.
All of this information is critical in the problem identification process.
You can see that the attributes here fall into three main categories:
- net.* is a network-related attribute
- rpc.* is a gRPC-related attribute
- thread.* is a thread-related attribute
So in theory, when designing the API, it is best to decouple these groups of attributes so that they do not affect each other. An MQ integration, for example, would carry topic-related data instead of RPC data.
With this in mind let's look at how gRPC is implemented here.
clientInstrumenterBuilder
.setSpanStatusExtractor()
.addAttributesExtractors(additionalExtractors)
.addAttributesExtractor((rpcAttributesGetter))
.addAttributesExtractor((netClientAttributesGetter))
.addAttributesExtractor((netClientAttributesGetter))
OpenTelemetry provides the #addAttributesExtractor builder function for registering custom attribute extractors.
The source code here shows that network-related and RPC-related extractors are passed in. This corresponds to the attribute groups listed above and satisfies the decoupling we just mentioned.
Every custom attribute extractor needs to implement this interface:
public interface AttributesExtractor<REQUEST, RESPONSE> {
}
Take GrpcRpcAttributesGetter as an example:
enum GrpcRpcAttributesGetter implements RpcAttributesGetter<GrpcRequest> {
INSTANCE;
@Override
public String getSystem(GrpcRequest request) {
return "grpc";
}
@Override
@Nullable
public String getService(GrpcRequest request) {
    String fullMethodName = request.getMethod().getFullMethodName();
    int slashIndex = fullMethodName.lastIndexOf('/');
    if (slashIndex == -1) {
      return null;
    }
    return fullMethodName.substring(0, slashIndex);
}
As you can see, the system is hard-coded to grpc, which corresponds to the rpc.system attribute shown on the page.
The getService function supplies the rpc.service attribute; it derives the service name from the gRPC method information.
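As a small worked example of that parsing step: a gRPC full method name has the form `service/method`, so the service is everything before the last slash. The class name `ServiceNameSketch` and the sample method name are made up for illustration.

```java
class ServiceNameSketch {
    // Returns the service part of a "package.Service/Method" name, or null if there is no '/'.
    static String serviceOf(String fullMethodName) {
        int slashIndex = fullMethodName.lastIndexOf('/');
        if (slashIndex == -1) {
            return null;
        }
        return fullMethodName.substring(0, slashIndex);
    }

    public static void main(String[] args) {
        System.out.println(serviceOf("helloworld.Greeter/SayHello")); // prints "helloworld.Greeter"
        System.out.println(serviceOf("no-slash-here"));               // prints "null"
    }
}
```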
public interface RpcAttributesGetter<REQUEST> {
@Nullable
String getService(REQUEST request);
}
Here REQUEST is a generic type parameter. For gRPC it is GrpcRequest; for other RPC frameworks it is the request data of that framework.
This GrpcRequest is created in our custom interceptor and passed along from there.
The request packet size I need is also obtained in the interceptor and then written into GrpcRequest.
static <T> Long getBodySize(T message) {
  if (message instanceof MessageLite) {
    return (long) ((MessageLite) message).getSerializedSize();
  } else {
    // Message is not a protobuf message
    return null;
  }
}
This lets each RPC framework fetch its own attributes, while each group of attributes stays isolated and decoupled from the others.
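The decoupling can be illustrated with a simplified, JDK-only sketch. The `Extractor` interface, `DemoRequest` type, and attribute keys below are hypothetical simplifications chosen for illustration; the real OpenTelemetry interfaces have more methods and use typed attribute keys.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified extractor contract; not the real OpenTelemetry interface.
interface Extractor<REQUEST> {
    void onStart(Map<String, Object> attributes, REQUEST request);
}

// Toy request type standing in for GrpcRequest.
class DemoRequest {
    final String fullMethodName;
    final String host;
    final int port;

    DemoRequest(String fullMethodName, String host, int port) {
        this.fullMethodName = fullMethodName;
        this.host = host;
        this.port = port;
    }
}

class ExtractorSketch {
    // RPC group: knows only about rpc.* attributes.
    static Extractor<DemoRequest> rpcExtractor() {
        return (attrs, req) -> {
            attrs.put("rpc.system", "grpc");
            int slash = req.fullMethodName.lastIndexOf('/');
            if (slash != -1) {
                attrs.put("rpc.service", req.fullMethodName.substring(0, slash));
            }
        };
    }

    // Network group: knows only about net.* attributes (illustrative key names).
    static Extractor<DemoRequest> netExtractor() {
        return (attrs, req) -> {
            attrs.put("net.peer.name", req.host);
            attrs.put("net.peer.port", req.port);
        };
    }

    // Runs every registered extractor against the same request.
    static Map<String, Object> extract(List<Extractor<DemoRequest>> extractors, DemoRequest req) {
        Map<String, Object> attributes = new LinkedHashMap<>();
        for (Extractor<DemoRequest> extractor : extractors) {
            extractor.onStart(attributes, req);
        }
        return attributes;
    }

    public static void main(String[] args) {
        DemoRequest req = new DemoRequest("helloworld.Greeter/SayHello", "localhost", 8080);
        System.out.println(extract(List.of(rpcExtractor(), netExtractor()), req));
    }
}
```

Because each extractor only touches its own keys, an MQ plugin could register a topic-oriented extractor without changing the RPC or network ones.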
Customizing metrics
The logic for customizing metrics is similar for every plugin and relies on an API provided at the framework level:
public InstrumenterBuilder<REQUEST, RESPONSE> addOperationMetrics(OperationMetrics factory) {
  operationMetrics.add(requireNonNull(factory, "operationMetrics"));
  return this;
}

// client-side metrics
.addOperationMetrics(RpcClientMetrics.get());
// server-side metrics
.addOperationMetrics(RpcServerMetrics.get());
The framework later calls back into these custom OperationMetrics:
if (operationListeners.length != 0) {
  // operation listeners run after span start, so that they have access to the current span
  // for capturing exemplars
  long startNanos = getNanos(startTime);
  for (int i = 0; i < operationListeners.length; i++) {
    context = operationListeners[i].onStart(context, attributes, startNanos);
  }
}

// ... later, when the span ends:
if (operationListeners.length != 0) {
  long endNanos = getNanos(endTime);
  for (int i = operationListeners.length - 1; i >= 0; i--) {
    operationListeners[i].onEnd(context, attributes, endNanos);
  }
}
The most critical of these are the two functions onStart and onEnd, which will be called back at the beginning and end of the current span, respectively.
So the common practice is to initialize the data in the onStart function and tally the results in onEnd, which yields the data needed for the metric.
Take the client's request-duration metric as an example:
@Override
public Context onStart(Context context, Attributes startAttributes, long startNanos) {
  // stash the start attributes and start time in the context
  return context.with(
      RPC_CLIENT_REQUEST_METRICS_STATE,
      new AutoValue_RpcClientMetrics_State(startAttributes, startNanos));
}

@Override
public void onEnd(Context context, Attributes endAttributes, long endNanos) {
  State state = context.get(RPC_CLIENT_REQUEST_METRICS_STATE);
  Attributes attributes = state.startAttributes().toBuilder().putAll(endAttributes).build();
  clientDurationHistogram.record(
      (endNanos - state.startTimeNanos()) / NANOS_PER_MS, attributes, context);
}
The start time is recorded at the beginning; at the end, the difference between the end time and that start time is exactly the execution time of the span, that is, the processing time of the RPC client.
In OpenTelemetry, the vast majority of request durations are recorded this way.
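The onStart/onEnd pattern can be sketched with plain JDK types. Here a `Map` stands in for the OpenTelemetry `Context`, and `Listener`, `DurationListener`, and `ListenerSketch` are hypothetical names for illustration. As in the framework code above, listeners are started in order and ended in reverse order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical listener contract, loosely modeled on the onStart/onEnd pair described above.
interface Listener {
    Map<String, Object> onStart(Map<String, Object> context, long startNanos);

    void onEnd(Map<String, Object> context, long endNanos);
}

// Stores the start time in the context and records the elapsed milliseconds at the end.
class DurationListener implements Listener {
    static final String STATE_KEY = "duration-start-nanos";
    static final double NANOS_PER_MS = 1_000_000.0;
    final List<Double> recordedMillis = new ArrayList<>();

    @Override
    public Map<String, Object> onStart(Map<String, Object> context, long startNanos) {
        Map<String, Object> next = new HashMap<>(context); // treat contexts as immutable
        next.put(STATE_KEY, startNanos);
        return next;
    }

    @Override
    public void onEnd(Map<String, Object> context, long endNanos) {
        long startNanos = (Long) context.get(STATE_KEY);
        recordedMillis.add((endNanos - startNanos) / NANOS_PER_MS);
    }
}

class ListenerSketch {
    // Starts listeners in registration order and ends them in reverse order.
    static void run(List<Listener> listeners, long startNanos, long endNanos) {
        Map<String, Object> context = new HashMap<>();
        for (int i = 0; i < listeners.size(); i++) {
            context = listeners.get(i).onStart(context, startNanos);
        }
        for (int i = listeners.size() - 1; i >= 0; i--) {
            listeners.get(i).onEnd(context, endNanos);
        }
    }

    public static void main(String[] args) {
        DurationListener duration = new DurationListener();
        run(List.of(duration), 1_000_000L, 6_000_000L);
        System.out.println(duration.recordedMillis); // prints "[5.0]"
    }
}
```

A size metric would follow the same shape: nothing to initialize at start, and at the end read the size that the interceptor stored on the request.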
Golang enhancements
The same goes for Golang.
Golang has no "magic" library like byte-buddy that can modify code at runtime, so the usual practice is still to wire the instrumentation in by hand.
Take gRPC as an example: when we create a gRPC server, we have to pass in a function provided by OpenTelemetry.
s := grpc.NewServer(
    grpc.StatsHandler(otelgrpc.NewServerHandler()),
)
This SDK also implements similar logic to that in Java, so I won't go into detail for the sake of space.
Summary
That is how gRPC is implemented in OpenTelemetry.
For a concrete implementation, the main thing is to find out whether the framework being enhanced provides extension interfaces; if so, use those interfaces directly to add the instrumentation points.
If not, you need to read the source code to find the core logic and then use byte-buddy to instrument it.
For example, Pulsar does not provide such extension interfaces on the client side, so you can only instrument its core functions.
For the instrumentation itself, OpenTelemetry provides a number of decoupled APIs that make it easy to implement the required logic. I will continue to analyze some of OpenTelemetry's design principles and the use of its core APIs in subsequent articles.
I think the design of this part of the API is the most rewarding part of OpenTelemetry to learn from.
Reference Links:
- /#/
- /docs/specs/semconv/rpc/rpc-metrics/#metric-rpcserverrequestsize