OpenTelemetry Practice: Principles of gRPC Monitoring Implementation

2024-09-04 09:56:49

Preface

Recently I submitted a PR to opentelemetry-java-instrumentation that adds four new metrics for gRPC:

  • Size of the request packet sent by the client
  • Size of the response packet received by the client
  • Size of the request packet received by the server
  • Size of the response packet sent by the server

The main purpose of this PR is to expose RPC request and response packet sizes as metrics, and the key question is how to obtain those sizes.

gRPC is supported first (it is currently the most widely used RPC framework in the cloud-native space); other RPC frameworks can in theory be supported the same way.

While implementing it, I also became curious about how the OpenTelemetry framework creates the spans of the call chain for a gRPC request.

The trace in question is a gRPC remote call: java-demo is the gRPC client and k8s-combat is the gRPC server.

Before diving in, we can make a rough guess at the implementation based on how OpenTelemetry works.

First, the precondition for this trace data to be created at all is that we use the javaagent provided by OpenTelemetry. The agent uses byte-buddy to enhance our application's bytecode, proxying the business logic inside it, so that our code is enhanced without affecting the business (it only creates data such as spans and metrics).

Some of Spring's proxy logic is implemented in a similar way.

gRPC Enhancement Principle

In terms of engineering, it is best not to enhance the business code directly but to look for the extension interfaces that these frameworks provide.

Take gRPC as an example: we can use the interceptor interface it provides to enhance the code.

Looking at the v1_6.TracingClientInterceptor class, we can see that it implements gRPC's client interceptor interface (ClientInterceptor).

The most important part is its implementation of the #interceptCall function:

@Override
public <REQUEST, RESPONSE> ClientCall<REQUEST, RESPONSE> interceptCall(
    MethodDescriptor<REQUEST, RESPONSE> method, CallOptions callOptions, Channel next) {
  GrpcRequest request = new GrpcRequest(method, null, null, next.authority());
  Context parentContext = Context.current();
  // skip instrumentation when no span should be started for this request
  if (!instrumenter.shouldStart(parentContext, request)) {
    return next.newCall(method, callOptions);
  }
  Context context = instrumenter.start(parentContext, request);
  ClientCall<REQUEST, RESPONSE> result;
  try (Scope ignored = context.makeCurrent()) {
    try {
      // call other interceptors
      result = next.newCall(method, callOptions);
    } catch (Throwable e) {
      instrumenter.end(context, request, Status.UNKNOWN, e);
      throw e;
    }
  }
  return new TracingClientCall<>(result, parentContext, context, request);
}

This is the interceptor interface that gRPC provides. On the gRPC client side, its methods are executed before and after the actual network call is made.

So inside this interface we can implement logic such as creating spans and capturing packet sizes.
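To make the interceptor idea concrete, here is a minimal, library-free sketch in plain Java. The Call and Interceptor types and the SizeRecordingInterceptor class are hypothetical stand-ins invented for illustration, not gRPC or OpenTelemetry APIs. The only point is that an interceptor wraps the next call, so it can run logic before and after the real send.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT gRPC types.
class InterceptorSketch {

  /** Something that can send a message and return a reply. */
  interface Call {
    String send(String message);
  }

  /** Wraps the next call in the chain and returns a decorated call. */
  interface Interceptor {
    Call intercept(Call next);
  }

  /** Records request and response sizes (in UTF-8 bytes) around the real call. */
  static final class SizeRecordingInterceptor implements Interceptor {
    final List<String> events = new ArrayList<>();

    @Override
    public Call intercept(Call next) {
      return message -> {
        // Runs before the real call: this is where a span would be started.
        events.add("request.size=" + message.getBytes(StandardCharsets.UTF_8).length);
        String reply = next.send(message);
        // Runs after the real call: this is where the span would be ended.
        events.add("response.size=" + reply.getBytes(StandardCharsets.UTF_8).length);
        return reply;
      };
    }
  }

  public static void main(String[] args) {
    SizeRecordingInterceptor interceptor = new SizeRecordingInterceptor();
    Call real = message -> "echo:" + message; // pretend network call
    Call traced = interceptor.intercept(real);
    System.out.println(traced.send("hello"));
    System.out.println(interceptor.events);
  }
}
```

The business code only ever sees a Call; whether it is the real one or the wrapped one is decided where the chain is assembled, which is exactly the spot the javaagent has to reach.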

Code enhancement with byte-buddy

There is one problem, though: the interceptor class we implemented has to be registered on the channel before it takes effect:

var managedChannel = ManagedChannelBuilder.forAddress(host, port)
    .intercept(new TracingClientInterceptor()) // add the interceptor
    .usePlaintext()
    .build();

When using the javaagent, however, there is no way to add such code to the business code.

This is where byte-buddy comes in: it can modify bytecode dynamically, achieving an effect similar to modifying the source code.

The v1_6.GrpcClientBuilderBuildInstrumentation class shows how OpenTelemetry uses byte-buddy:

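  @Override
  public ElementMatcher<TypeDescription> typeMatcher() {
    return extendsClass(named("io.grpc.ManagedChannelBuilder"))
        .and(declaresField(named("interceptors")));
  }

  @Override
  public void transform(TypeTransformer transformer) {
    transformer.applyAdviceToMethod(
        isMethod().and(named("build")),
        GrpcClientBuilderBuildInstrumentation.class.getName() + "$AddInterceptorAdvice");
  }

  @SuppressWarnings("unused")
  public static class AddInterceptorAdvice {

    @Advice.OnMethodEnter(suppress = Throwable.class)
    public static void addInterceptor(
        @Advice.This ManagedChannelBuilder<?> builder,
        @Advice.FieldValue("interceptors") List<ClientInterceptor> interceptors) {
      VirtualField<ManagedChannelBuilder<?>, Boolean> instrumented =
          VirtualField.find(ManagedChannelBuilder.class, Boolean.class);
      // add the tracing interceptor only once per builder
      if (!Boolean.TRUE.equals(instrumented.get(builder))) {
        interceptors.add(0, GrpcSingletons.CLIENT_INTERCEPTOR);
        instrumented.set(builder, true);
      }
    }
  }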

As this source shows, byte-buddy is used to intercept the builder behind the #intercept(...) function: the advice is applied to its build method and reads its interceptors field.

Matcher functions such as isMethod and named are provided by the byte-buddy library, and extendsClass builds on the same matcher API.

And this builder is exactly where interceptors are added in our business code.

interceptors.add(0, GrpcSingletons.CLIENT_INTERCEPTOR);
GrpcSingletons.CLIENT_INTERCEPTOR = new TracingClientInterceptor(clientInstrumenter, propagators);

This line manually adds OpenTelemetry's TracingClientInterceptor to the interceptor list, as the first interceptor.

As for this part:

extendsClass(named("io.grpc.ManagedChannelBuilder"))
        .and(declaresField(named("interceptors")))

The function names make the intent clear: it matches classes that extend ManagedChannelBuilder and declare a member variable named interceptors.

transformer.applyAdviceToMethod(
    isMethod().and(named("build")),
    GrpcClientBuilderBuildInstrumentation.class.getName() + "$AddInterceptorAdvice");

Then, when the build function is called, execution enters the custom AddInterceptorAdvice class, which lets us hook the point where the interceptors are assembled and add our own interceptor to the list.
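The effect of the advice can be imitated without any bytecode manipulation. The sketch below is plain Java under stated assumptions: FakeChannelBuilder, INSTRUMENTED and beforeBuild are hypothetical names, and a WeakHashMap stands in for the agent's VirtualField. It only shows the logic that runs on entry to build(): prepend the tracing interceptor once, and never twice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical names throughout; this imitates the advice logic, not byte-buddy itself.
class AdviceSketch {

  /** Stand-in for a channel builder that keeps a list of interceptors. */
  static final class FakeChannelBuilder {
    final List<String> interceptors = new ArrayList<>();

    FakeChannelBuilder intercept(String interceptor) {
      interceptors.add(interceptor);
      return this;
    }

    List<String> build() {
      beforeBuild(this); // the javaagent would weave this call in via bytecode
      return new ArrayList<>(interceptors);
    }
  }

  // Stand-in for VirtualField: remembers which builders were already handled.
  static final Map<FakeChannelBuilder, Boolean> INSTRUMENTED =
      Collections.synchronizedMap(new WeakHashMap<FakeChannelBuilder, Boolean>());

  /** Mirrors the advice body: add the tracing interceptor first, exactly once. */
  static void beforeBuild(FakeChannelBuilder builder) {
    if (!Boolean.TRUE.equals(INSTRUMENTED.get(builder))) {
      builder.interceptors.add(0, "tracing");
      INSTRUMENTED.put(builder, true);
    }
  }

  public static void main(String[] args) {
    FakeChannelBuilder builder = new FakeChannelBuilder().intercept("auth");
    System.out.println(builder.build()); // tracing comes first
    System.out.println(builder.build()); // a second build() does not add it again
  }
}
```

The guard matters because build() may be called more than once on the same builder; without it the tracing interceptor would be registered repeatedly.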

Getting span attributes

In the gRPC trace we can also see specific attributes of the request, such as:

  • The IP and port of the gRPC service
  • The response code of the request
  • The requested service and method
  • Thread information, and so on

All of this information is critical when troubleshooting.

The attributes here fall into three main groups:

  • net.* are network-related attributes
  • rpc.* are gRPC-related attributes
  • thread.* are thread-related attributes

So when designing the API, it is best to decouple these groups of attributes. An MQ instrumentation, for example, might carry topic-related data instead, and the groups should not affect one another.

With this in mind, let's look at how gRPC implements this.

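clientInstrumenterBuilder
    .setSpanStatusExtractor(GrpcSpanStatusExtractor.CLIENT)
    .addAttributesExtractors(additionalExtractors)
    .addAttributesExtractor(RpcClientAttributesExtractor.create(rpcAttributesGetter))
    .addAttributesExtractor(ServerAttributesExtractor.create(netClientAttributesGetter))
    .addAttributesExtractor(NetworkAttributesExtractor.create(netClientAttributesGetter))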

OpenTelemetry provides an #addAttributesExtractor builder function for registering custom attribute extractors.

The source here shows that both network-related and RPC-related extractors are passed in. This corresponds to the attribute groups seen in the trace and satisfies the decoupling we just mentioned.
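The decoupling can be pictured with a small plain-Java sketch. The Extractor interface, the Request class and the attribute keys below are hypothetical and far simpler than OpenTelemetry's real AttributesExtractor; only the idea of one extractor per attribute group (rpc.*, net.*) is taken from the discussion above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified types; not the OpenTelemetry API.
class ExtractorSketch {

  /** Minimal request model used only by this sketch. */
  static final class Request {
    final String fullMethodName;
    final String host;

    Request(String fullMethodName, String host) {
      this.fullMethodName = fullMethodName;
      this.host = host;
    }
  }

  /** Each extractor contributes one group of attributes and knows nothing of the others. */
  interface Extractor {
    void onStart(Map<String, String> attributes, Request request);
  }

  // Illustrative keys: one rpc.* attribute and one net.* attribute.
  static final Extractor RPC = (attrs, req) -> attrs.put("rpc.system", "grpc");
  static final Extractor NET = (attrs, req) -> attrs.put("net.peer.name", req.host);

  /** The builder side simply runs every registered extractor over the same request. */
  static Map<String, String> extractAll(List<Extractor> extractors, Request request) {
    Map<String, String> attributes = new LinkedHashMap<>();
    for (Extractor extractor : extractors) {
      extractor.onStart(attributes, request);
    }
    return attributes;
  }

  public static void main(String[] args) {
    Request request = new Request("helloworld.Greeter/SayHello", "k8s-combat");
    System.out.println(extractAll(Arrays.asList(RPC, NET), request));
  }
}
```

An MQ instrumentation could register a topic extractor in place of RPC without touching NET, which is the practical benefit of keeping the groups separate.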

Every custom attribute extractor needs to implement this interface:

public interface AttributesExtractor<REQUEST, RESPONSE> {
}

Take GrpcRpcAttributesGetter as an example.

enum GrpcRpcAttributesGetter implements RpcAttributesGetter<GrpcRequest> {
  INSTANCE;

  @Override
  public String getSystem(GrpcRequest request) {
    return "grpc";
  }

  @Override
  @Nullable
  public String getService(GrpcRequest request) {
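    String fullMethodName = request.getMethod().getFullMethodName();
    int slashIndex = fullMethodName.lastIndexOf('/');
    if (slashIndex == -1) {
      return null;
    }
    // everything before the last '/' is the service name
    return fullMethodName.substring(0, slashIndex);
  }
  // remaining methods omitted
}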

As you can see, system is hard-coded to grpc, which corresponds to the RPC system attribute shown in the UI.

The getService function supplies the service attribute; as you can see, the service name is derived from the gRPC method information.
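As a worked example of that parsing step: a gRPC full method name has the form service/method, so everything before the last slash is the service. The helper below is a standalone sketch; serviceOf and the sample name helloworld.Greeter/SayHello are made up for illustration.

```java
// Standalone sketch; serviceOf is a hypothetical helper, not part of any library.
class ServiceNameSketch {

  /** Returns the part before the last '/', or null when there is no slash. */
  static String serviceOf(String fullMethodName) {
    int slashIndex = fullMethodName.lastIndexOf('/');
    if (slashIndex == -1) {
      return null;
    }
    return fullMethodName.substring(0, slashIndex);
  }

  public static void main(String[] args) {
    System.out.println(serviceOf("helloworld.Greeter/SayHello")); // helloworld.Greeter
    System.out.println(serviceOf("no-slash-here"));               // null
  }
}
```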


public interface RpcAttributesGetter<REQUEST> {  
  
  @Nullable  
  String getService(REQUEST request);
}

REQUEST here is a generic type parameter: for gRPC it is GrpcRequest, and for other RPC frameworks it is that framework's own request type.

This GrpcRequest is created and passed along in our custom interceptor.

The request packet size I need is also obtained in the interceptor and then written into GrpcRequest.

static <T> Long getBodySize(T message) {
  if (message instanceof MessageLite) {
    return (long) ((MessageLite) message).getSerializedSize();
  } else {
    // Message is not a protobuf message
    return null;
  }
}

This lets each RPC framework supply its own attributes, while every group of attributes stays isolated and decoupled from the others.

Customizing metrics

Custom metrics work much the same way for every plugin: they are registered through an API provided at the framework level:

public InstrumenterBuilder<REQUEST, RESPONSE> addOperationMetrics(OperationMetrics factory) {
  operationMetrics.add(requireNonNull(factory, "operationMetrics"));
  return this;
}

// client-side metrics
.addOperationMetrics(RpcClientMetrics.get());

// server-side metrics
.addOperationMetrics(RpcServerMetrics.get());

The framework later calls back into these custom OperationMetrics:

    // at span start
    if (operationListeners.length != 0) {
      // operation listeners run after span start, so that they have access to the current span
      // for capturing exemplars
      long startNanos = getNanos(startTime);
      for (int i = 0; i < operationListeners.length; i++) {
        context = operationListeners[i].onStart(context, attributes, startNanos);
      }
    }

    // at span end, in reverse order
    if (operationListeners.length != 0) {
      long endNanos = getNanos(endTime);
      for (int i = operationListeners.length - 1; i >= 0; i--) {
        operationListeners[i].onEnd(context, attributes, endNanos);
      }
    }

The most important of these are the two functions onStart and onEnd, which are called back at the start and end of the current span, respectively.

So the common practice is to initialize the data in onStart and compute the result in onEnd, which yields the data needed for the metric.

Take the client's request duration metric as an example:

@Override
public Context onStart(Context context, Attributes startAttributes, long startNanos) {
  // stash the start attributes and start time in the context
  return context.with(
      RPC_CLIENT_REQUEST_METRICS_STATE,
      new AutoValue_RpcClientMetrics_State(startAttributes, startNanos));
}

@Override
public void onEnd(Context context, Attributes endAttributes, long endNanos) {
  State state = context.get(RPC_CLIENT_REQUEST_METRICS_STATE);
  Attributes attributes = state.startAttributes().toBuilder().putAll(endAttributes).build();
  // duration in milliseconds = (end - start) / NANOS_PER_MS
  clientDurationHistogram.record(
      (endNanos - state.startTimeNanos()) / NANOS_PER_MS, attributes, context);
}

The start time is recorded in onStart; in onEnd the difference between the end time and that start time is computed, which is exactly the duration of the span, that is, the processing time of the RPC client call.
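The same start/end bookkeeping can be written in a few lines of plain Java. This is a sketch under assumptions: the class name, STATE_KEY and the use of a Map as the context are hypothetical simplifications, and a list stands in for the histogram.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the start/end callback pattern; not the OpenTelemetry API.
class DurationMetricSketch {
  static final double NANOS_PER_MS = 1_000_000.0;
  static final String STATE_KEY = "rpc-client-metrics-state";

  /** Recorded durations in milliseconds; a real implementation would use a histogram. */
  final List<Double> recordedMillis = new ArrayList<>();

  /** Stores the start time in a copy of the context, leaving the caller's map untouched. */
  Map<String, Object> onStart(Map<String, Object> context, long startNanos) {
    Map<String, Object> updated = new HashMap<>(context);
    updated.put(STATE_KEY, startNanos);
    return updated;
  }

  /** Reads the start time back and records end minus start. */
  void onEnd(Map<String, Object> context, long endNanos) {
    Object state = context.get(STATE_KEY);
    if (state == null) {
      return; // onStart was never called for this context
    }
    recordedMillis.add((endNanos - (Long) state) / NANOS_PER_MS);
  }

  public static void main(String[] args) {
    DurationMetricSketch metrics = new DurationMetricSketch();
    Map<String, Object> context = metrics.onStart(new HashMap<String, Object>(), 1_000_000L);
    metrics.onEnd(context, 3_500_000L);
    System.out.println(metrics.recordedMillis); // [2.5]
  }
}
```

Carrying the start state inside the context, rather than in a field, is what keeps concurrent requests from overwriting each other's start times.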

In OpenTelemetry, the vast majority of request durations are recorded this way.

Golang enhancements

Golang has no library as magical as byte-buddy and cannot modify code on the fly, so the usual practice is still to wire the instrumentation into the source by hand.

Taking gRPC as the example again: when creating a gRPC server, we have to pass in a function provided by OpenTelemetry.

s := grpc.NewServer(
    grpc.StatsHandler(otelgrpc.NewServerHandler()),
)

This SDK also implements similar logic to that in Java, so I won't go into detail for the sake of space.

Summary

That is how gRPC is implemented in OpenTelemetry. The main thing is to find out whether the framework being enhanced provides extension interfaces; if it does, use those interfaces directly for instrumentation.

If not, you need to read the source code to find the core logic and then use byte-buddy to instrument it.

For example, Pulsar does not provide such extension interfaces on the client side, so the only option is to find its core functions and instrument those.

For the instrumentation itself, OpenTelemetry provides a number of decoupled APIs that make it easy to implement the required logic. Later articles will continue to analyze some of OpenTelemetry's design principles and the use of its core APIs.

I think the design of this part of the API is the most rewarding thing to learn from OpenTelemetry.

Reference Links:

  • /docs/specs/semconv/rpc/rpc-metrics/#metric-rpcserverrequestsize