
OpenTelemetry Hands-On: Application Metrics Monitoring from Scratch


Preface

In the previous post, OpenTelemetry Hands-On: Distributed Tracing from Scratch, we covered the practical side of tracing. This time we continue by looking at how to integrate metrics monitoring with OpenTelemetry.

For those who are not yet familiar with metrics monitoring, we recommend reading this primer first: From Prometheus to OpenTelemetry: The Evolution and Practice of Metrics Monitoring.

Component | Role | Language | Version
java-demo | Client sending gRPC requests | Java | opentelemetry-agent: 2.4.0 / SpringBoot: 2.7.14
k8s-combat | Server providing gRPC services | Golang | otel: 1.28 / Go: 1.22
Jaeger | Trace storage server and trace UI | Golang | jaegertracing/all-in-one:1.56
opentelemetry-collector-contrib | OpenTelemetry collector, which receives traces/metrics/logs and writes them to remote storage | Golang | otel/opentelemetry-collector-contrib:0.98.0
Prometheus | Storage and display component for metrics; Prometheus-compatible alternatives such as VictoriaMetrics can also be used | Golang | prom/prometheus:v2.49.1

Quick Start

Above is the architecture diagram after adding metrics: a new Prometheus component is introduced, and the collector writes the metric data to Prometheus via remote write.

Prometheus needs to have a feature flag enabled in order to accept data written by OpenTelemetry.

For docker startup you need to pass in the relevant parameters:

docker run -d -p 9292:9090 --name prometheus \
-v /prometheus/:/etc/prometheus/ \
prom/prometheus:v2.49.1 \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/prometheus \
--web.console.libraries=/etc/prometheus/console_libraries \
--web.console.templates=/etc/prometheus/consoles \
--enable-feature=exemplar-storage \
--enable-feature=otlp-write-receiver

The key parameter here is --enable-feature=otlp-write-receiver, which enables receiving data in the OTLP format.

Using this push feature does mean giving up many of Prometheus's pull-side capabilities, such as service discovery and scheduled scraping. That's fine, though: push and pull can be used at the same time, and components that were already being scraped via pull are unaffected.
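For example, an existing pull-based scrape job can stay in prometheus.yml untouched while the OTLP receiver handles the pushed data. Below is a minimal sketch; the node-exporter job and its target address are placeholders for illustration and are not part of this demo:

global:
  scrape_interval: 30s

scrape_configs:
  # a component that is still scraped the traditional (pull) way
  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]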

Modify the OpenTelemetry Collector

Next, we need to change the Collector's configuration.

exporters:
  debug:
  otlp:
    endpoint: "jaeger:4317"
    tls:
      insecure: true
  otlphttp/prometheus:
    endpoint: http://prometheus:9292/api/v1/otlp
    tls:
      insecure: true      

processors:
  batch:

service:
  pipelines:
    traces:
      receivers:
      - otlp
      processors: [batch]
      exporters:
      - otlp
      - debug        
    metrics:
      exporters:
      - otlphttp/prometheus
      - debug
      processors:
      - batch
      receivers:
      - otlp

Here we add a new otlphttp/prometheus node under exporters, which specifies the endpoint address for exporting to Prometheus.

At the same time, we also need to reference the same key, otlphttp/prometheus, in the exporters list of the metrics pipeline.

It is important to note that this must be configured under the metrics pipeline. If it were configured under the traces pipeline instead, the trace data would be exported to this otlphttp/prometheus endpoint.

So the key point is to understand how the exporters and the pipelines are paired here.
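To make the pairing concrete, here is a condensed sketch of the relevant parts. The receivers section is not shown in the config above, so the otlp receiver below, with the default OTLP ports, is an assumption on my part:

receivers:
  otlp:                     # assumed; not shown in the config above
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, debug]                  # traces go to Jaeger
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/prometheus, debug]   # metrics go to Prometheus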

Results

This way, we only need to launch the applications and we can then query the metrics they report in Prometheus.

java -javaagent:opentelemetry-javaagent-2.4.0.jar \
-Dotel.traces.exporter=otlp \
-Dotel.metrics.exporter=otlp \
-Dotel.logs.exporter=none \
-Dotel.service.name=java-demo \
-Dotel.exporter.otlp.protocol=grpc \
-Dotel.propagators=tracecontext,baggage \
-Dotel.exporter.otlp.endpoint=http://127.0.0.1:5317 -jar target/demo-0.0.1-SNAPSHOT.jar

# Run go app
export OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:5317 OTEL_RESOURCE_ATTRIBUTES=service.name=k8s-combat
./k8s-combat

Since we have the debug exporter enabled in the collector, we can see the following log in its output:

2024-07-22T06:34:08.060Z	info	MetricsExporter	{"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 18, "data points": 44}

This indicates that the metrics were reported successfully.

Then we open the Prometheus UI at http://127.0.0.1:9292/graph and can query the metrics reported by the Java and Go applications.

OpenTelemetry's javaagent automatically reports JVM-related metrics.


In Go programs, however, we still need to explicitly set up some instrumentation:

import (
    "context"
    "log"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
    sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func initMeterProvider() *sdkmetric.MeterProvider {
    ctx := context.Background()

    // Create an OTLP gRPC metric exporter; the endpoint comes from OTEL_EXPORTER_OTLP_ENDPOINT.
    exporter, err := otlpmetricgrpc.New(ctx)
    if err != nil {
        log.Fatalf("new otlp metric grpc exporter failed: %v", err)
    }

    mp := sdkmetric.NewMeterProvider(
        sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exporter)),
        sdkmetric.WithResource(initResource()),
    )
    otel.SetMeterProvider(mp)
    return mp
}

mp := initMeterProvider()
defer func() {
	if err := mp.Shutdown(context.Background()); err != nil {
		log.Printf("Error shutting down meter provider: %v", err)
	}
}()

Similar to the Tracer, we first call the initMeterProvider() function to initialize the Meter; it returns a MeterProvider object.

The OpenTelemetry Go SDK already provides automatic instrumentation for the Go runtime; we only need to call the relevant function:

// runtime instrumentation package: go.opentelemetry.io/contrib/instrumentation/runtime
err := runtime.Start(runtime.WithMinimumReadMemStatsInterval(time.Second))
if err != nil {
    log.Fatal(err)
}

After that we launch the app and in Prometheus we can see the relevant metrics reported by the Go app.

runtime_uptime_milliseconds_total is one of the Go runtime metrics.

Prometheus's UI for displaying metrics is fairly limited; we usually pair it with Grafana for visualization.

Manually Reporting Metrics

Of course, in addition to the metrics reported automatically by the SDK, we can also report custom metrics manually, similar to traces.

For example, I'd like to keep track of the number of times a certain function is called.

var meter = otel.Meter("/k8s/combat")
apiCounter, err = meter.Int64Counter(
    "api.counter",
    metric.WithDescription("Number of API calls."),
    metric.WithUnit("{call}"),
)
if err != nil {
    log.Fatal(err)
}

func (s *server) SayHello(ctx context.Context, in *pb.HelloRequest) (*pb.HelloReply, error) {
    defer apiCounter.Add(ctx, 1)
    return &pb.HelloReply{Message: fmt.Sprintf("hostname:%s, in:%s, md:%v", name, in.Name, md)}, nil
}

Simply create an Int64Counter metric, then call its Add(ctx, 1) method wherever instrumentation is needed.


After that, this metric can be queried in Prometheus.

Beyond this, the metric types defined in OpenTelemetry are similar to Prometheus's (a Go sketch of the remaining types follows the list):

  • Counter: Monotonically incrementing counters that can be used, for example, to keep track of the number of orders, the total number of requests.
  • UpDownCounter: Similar to Counter, except that it can be decremented.
  • Gauge: Used to record values that change from time to time, such as memory usage, CPU usage, and so on.
  • Histogram: Typically used to record request delays, response times, etc.
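As a quick illustration of the remaining types, here is a Go sketch that reuses the meter variable from above and assumes the context and go.opentelemetry.io/otel/metric packages are imported; the instrument names, values, and the readMemoryUsage() helper are made up for illustration, and error handling is omitted:

// Sketch only: instrument names and values are illustrative, not from the demo project.
func registerMoreInstruments(ctx context.Context) {
    // UpDownCounter: can be incremented and decremented, e.g. the current queue length.
    queueSize, _ := meter.Int64UpDownCounter(
        "queue.size",
        metric.WithDescription("Number of items currently in the queue."),
    )
    queueSize.Add(ctx, 1)  // enqueue
    queueSize.Add(ctx, -1) // dequeue

    // Histogram: records a distribution, e.g. request latency.
    latency, _ := meter.Float64Histogram(
        "request.duration",
        metric.WithDescription("Request latency."),
        metric.WithUnit("ms"),
    )
    latency.Record(ctx, 12.7)

    // Gauge: observed through a callback instead of being written to directly.
    _, _ = meter.Float64ObservableGauge(
        "memory.usage",
        metric.WithDescription("Current memory usage."),
        metric.WithFloat64Callback(func(_ context.Context, o metric.Float64Observer) error {
            o.Observe(readMemoryUsage()) // readMemoryUsage() is a hypothetical helper
            return nil
        }),
    )
}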

A similar API is available in Java to accomplish custom metrics:

messageInCounter = meter    
        .counterBuilder(MESSAGE_IN_COUNTER)    
        .setUnit("{message}")    
        .setDescription("The total number of messages received for this topic.")    
        .buildObserver();

For Gauge-type data, the usage is as follows: use a buildWithCallback callback to report the data, and OpenTelemetry invokes the callback at the framework level every 30 seconds.

public static void registerObservers() {
    Meter meter = ...; // obtain the Meter instance (omitted here)

    meter.gaugeBuilder("pulsar_producer_num_msg_send")
            .setDescription("The number of messages published in the last interval")
            .ofLongs()
            .buildWithCallback(
                    r -> recordProducerMetrics(r, ProducerStats::getNumMsgsSent));
}

private static void recordProducerMetrics(ObservableLongMeasurement observableLongMeasurement, Function<ProducerStats, Long> getter) {
    for (Producer producer : CollectionHelper.PRODUCER_COLLECTION.list()) {
        ProducerStats stats = producer.getStats();
        String topic = producer.getTopic();
        if (topic.endsWith(RetryMessageUtil.RETRY_GROUP_TOPIC_SUFFIX)) {
            continue;
        }
        observableLongMeasurement.record(getter.apply(stats),
                Attributes.of(PRODUCER_NAME, producer.getProducerName(), TOPIC, topic));
    }
}

More specific usage can be found in the official documentation:
https://opentelemetry.io/docs/languages/java/instrumentation/#metrics

If we don't want to pass the data through the collector but instead expose it directly to Prometheus, that is also possible with the OpenTelemetry framework.

All we need to do is configure an environment variable:

export OTEL_METRICS_EXPORTER=prometheus

This lets us access http://127.0.0.1:9464/metrics to see the metrics currently exposed by the application, and we can then configure a scrape job for it in Prometheus:

scrape_configs:
  - job_name: "k8s-combat"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["k8s-combat:9464"]   

This is the typical Pull model, whereas OpenTelemetry recommends the Push model, in which the data is collected by OpenTelemetry and then pushed to Prometheus.

Both models have their advantages and drawbacks:

Pull model
  • Advantages: all scrape targets can be managed in one central configuration, and settings such as the scrape interval can be tuned individually for each application.
  • Drawbacks: pre-processing metrics is cumbersome, since all data is only relabeled once it reaches Prometheus and is then written to storage; service discovery also has to be configured.

Push model
  • Advantages: metrics can be pre-processed centrally in the OpenTelemetry collector, and only the filtered data is written to Prometheus, which is more flexible.
  • Drawbacks: an additional metrics gateway component such as the collector has to be maintained.

For example, we use the Prometheus-compatible VictoriaMetrics to scrape Istio's metrics, but there are far too many of them and we need to drop some.

To do that, we have to write relabel rules in the scrape job:

apiVersion: operator.victoriametrics.com/v1beta1  
kind: VMPodScrape  
metadata:  
  name: isito-pod-scrape  
spec:  
  podMetricsEndpoints:  
    - scheme: http  
      scrape_interval: "30s"  
      scrapeTimeout: "30s"  
      path: /stats/prometheus  
      metricRelabelConfigs:  
        - regex: ^envoy_.*|^url\_\_\_\_.*|istio_request_bytes_sum|istio_request_bytes_count|istio_response_bytes_sum|istio_request_bytes_sum|istio_request_duration_milliseconds_sum|istio_response_bytes_count|istio_request_duration_milliseconds_count|^ostrich_apigateway.*|istio_request_messages_total|istio_response_messages_total  
          action: drop_metrics  
  namespaceSelector:  
    any: true

By switching to the Push model, all of this logic can be moved into the collector and managed centrally.
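For illustration, the same drop rule could be expressed in the collector using the filter processor from opentelemetry-collector-contrib. The sketch below uses an OTTL match condition and is my own example under that assumption, not the author's actual configuration:

processors:
  filter/istio:
    error_mode: ignore
    metrics:
      metric:
        # drop the high-cardinality envoy metrics before they are exported
        - 'IsMatch(name, "^envoy_.*")'

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [filter/istio, batch]
      exporters: [otlphttp/prometheus]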

Summary

Metrics are a bit simpler to use than traces: there is no need to understand concepts like context and span; you only need to figure out which metric types exist and which scenarios each one suits.

Reference Links:

  • https://prometheus.io/docs/prometheus/latest/feature_flags/#otlp-receiver
  • https://opentelemetry.io/docs/languages/java/instrumentation/#metrics
  • https://opentelemetry.io/docs/languages/go/instrumentation/#metrics