Preface
In the previous post, OpenTelemetry Hands-on: Distributed Tracing from Scratch, we covered the practical side of tracing. This time we continue with how to integrate metrics monitoring using OpenTelemetry.
If you are not yet familiar with metrics monitoring, we recommend reading this primer first: From Prometheus to OpenTelemetry: The Evolution and Practice of Metrics Monitoring.
| Name | Role | Language | Version |
|---|---|---|---|
| java-demo | Client that sends gRPC requests | Java | opentelemetry-agent: 2.4.0 / SpringBoot: 2.7.14 |
| k8s-combat | Server that provides gRPC services | Golang | otel: 1.28 / Go: 1.22 |
| Jaeger | Trace storage backend and trace UI | Golang | jaegertracing/all-in-one:1.56 |
| opentelemetry-collector-contrib | OpenTelemetry collector, which receives traces/metrics/logs and writes them to remote storage | Golang | otel/opentelemetry-collector-contrib:0.98.0 |
| Prometheus | Storage and presentation component for metrics; VictoriaMetrics or other Prometheus-compatible storage can also be used | Golang | prom/prometheus:v2.49.1 |
(Figure: request flow after adding the Prometheus metrics pipeline)
Quick Start
The figure above shows the flow after metrics are added: a new Prometheus component is introduced, and the collector writes metric data to Prometheus via remote write.
To accept data written by OpenTelemetry, Prometheus needs the corresponding feature flags enabled.
When starting it with Docker, pass in the relevant flags:
```bash
docker run -d -p 9292:9090 --name prometheus \
  -v /prometheus/:/etc/prometheus/ \
  prom/prometheus:v2.49.1 \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/prometheus \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.console.templates=/etc/prometheus/consoles \
  --enable-feature=exemplar-storage \
  --enable-feature=otlp-write-receiver
```
The key flag is `--enable-feature=otlp-write-receiver`, which enables Prometheus to receive data in OTLP format.
Using this push path does mean giving up many of Prometheus's pull-side features, such as service discovery and scheduled scraping. That's fine, though: push and pull can be used at the same time, and components that were already being scraped via pull remain unaffected.
Modifying the OpenTelemetry Collector
Next, we need to change the Collector's configuration.
```yaml
exporters:
  debug:
  otlp:
    endpoint: "jaeger:4317"
    tls:
      insecure: true
  otlphttp/prometheus:
    endpoint: http://prometheus:9292/api/v1/otlp
    tls:
      insecure: true

processors:
  batch:

service:
  pipelines:
    traces:
      receivers:
        - otlp
      processors: [batch]
      exporters:
        - otlp
        - debug
    metrics:
      exporters:
        - otlphttp/prometheus
        - debug
      processors:
        - batch
      receivers:
        - otlp
```
Here we add a new `otlphttp/prometheus` node under `exporters`, which specifies the Prometheus `endpoint` address that metrics are exported to.
We also have to reference that same key, `otlphttp/prometheus`, under the `metrics` pipeline in `service.pipelines`.
Note that it must be placed under the `metrics` pipeline: if it were configured under `traces` instead, trace data would be exported to that `otlphttp/prometheus` endpoint.
So the key point is to understand how the exporters you define are paired with the pipelines that reference them.
Running It
This way we only need to launch the app and then we can query the metrics reported by the app in Prometheus.
```bash
# Run the Java app
java -javaagent:opentelemetry-javaagent-2.4.0.jar \
  -Dotel.traces.exporter=otlp \
  -Dotel.metrics.exporter=otlp \
  -Dotel.logs.exporter=none \
  -Dotel.service.name=java-demo \
  -Dotel.exporter.otlp.protocol=grpc \
  -Dotel.propagators=tracecontext,baggage \
  -Dotel.exporter.otlp.endpoint=http://127.0.0.1:5317 -jar target/demo-0.0.1-SNAPSHOT.jar

# Run the Go app
export OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:5317 OTEL_RESOURCE_ATTRIBUTES=service.name=k8s-combat
./k8s-combat
```
Since we enabled the debug exporter in the collector, you should see a log like the following:

```
2024-07-22T06:34:08.060Z info MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 18, "data points": 44}
```

This indicates that the metrics were reported successfully.
Now open the Prometheus UI at http://127.0.0.1:9292/graph and you can query the metrics reported by the Java and Go applications.
OpenTelemetry's javaagent automatically reports JVM-related metrics.
In the Go program, however, we still need to set up some instrumentation explicitly:
```go
import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func initMeterProvider() *sdkmetric.MeterProvider {
	ctx := context.Background()
	exporter, err := otlpmetricgrpc.New(ctx)
	if err != nil {
		log.Fatalf("new otlp metric grpc exporter failed: %v", err)
	}
	mp := sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exporter)),
		sdkmetric.WithResource(initResource()),
	)
	otel.SetMeterProvider(mp)
	return mp
}
```
```go
mp := initMeterProvider()
defer func() {
	if err := mp.Shutdown(context.Background()); err != nil {
		log.Printf("Error shutting down meter provider: %v", err)
	}
}()
```
As with the tracer, we first call the initMeterProvider() function to initialize the meter; it returns a MeterProvider object.
The OpenTelemetry Go SDK already ships with automatic instrumentation for the Go runtime; we just need to call the relevant function:
```go
// Requires go.opentelemetry.io/contrib/instrumentation/runtime
err := runtime.Start(runtime.WithMinimumReadMemStatsInterval(time.Second))
if err != nil {
	log.Fatal(err)
}
```
After that, start the application and you can see the related metrics reported by the Go app in Prometheus.
For example, runtime_uptime_milliseconds_total is one of the Go runtime metrics.
Prometheus's built-in UI is limited when it comes to visualizing metrics, so in practice we usually pair it with Grafana for dashboards.
Manually Reporting Metrics
Of course, besides the metrics the SDK reports automatically, we can also report custom metrics manually, just as with traces.
For example, suppose I want to track how many times a certain function is called.
```go
var (
	meter      = otel.Meter("/k8s/combat")
	apiCounter metric.Int64Counter
)

func init() {
	var err error
	apiCounter, err = meter.Int64Counter(
		"api.counter",
		metric.WithDescription("Number of API calls."),
		metric.WithUnit("{call}"),
	)
	if err != nil {
		log.Fatal(err)
	}
}

func (s *server) SayHello(ctx context.Context, in *pb.HelloRequest) (*pb.HelloReply, error) {
	// pb here is the generated protobuf package for the demo's gRPC service.
	defer apiCounter.Add(ctx, 1)
	return &pb.HelloReply{Message: fmt.Sprintf("hostname:%s, in:%s, md:%v", name, in.Name, md)}, nil
}
```
Simply create an Int64Counter instrument, then call its Add(ctx, 1) method wherever an instrumentation point is needed.
After that, the metric can be queried in Prometheus.
Beyond that, metric definitions in OpenTelemetry are similar to Prometheus's, with the following types (a short Go sketch follows the list):
- Counter: a monotonically increasing counter, used for example to count orders or total requests.
- UpDownCounter: like Counter, except the value can also decrease.
- Gauge: records values that fluctuate over time, such as memory usage or CPU usage.
- Histogram: typically used to record request latencies, response times, and similar distributions.
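To make the remaining types concrete, here is a minimal Go sketch (not part of the original demo) that creates one instrument of each kind; the meter scope, metric names, and the readHeapInUse helper are purely illustrative:

```go
package metrics

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/metric"
)

func registerInstruments() {
	meter := otel.Meter("k8s-combat/metric-types") // illustrative scope name
	ctx := context.Background()

	// UpDownCounter: can increase and decrease, e.g. items currently queued.
	queueSize, err := meter.Int64UpDownCounter("queue.size",
		metric.WithDescription("Items currently queued."))
	if err != nil {
		log.Fatal(err)
	}
	queueSize.Add(ctx, 1)  // enqueue
	queueSize.Add(ctx, -1) // dequeue

	// Histogram: records a distribution, e.g. request latency in milliseconds.
	latency, err := meter.Float64Histogram("request.duration",
		metric.WithDescription("Request latency."),
		metric.WithUnit("ms"))
	if err != nil {
		log.Fatal(err)
	}
	latency.Record(ctx, 12.5)

	// Gauge (observable): the callback is invoked on every collection cycle.
	_, err = meter.Int64ObservableGauge("memory.heap.inuse",
		metric.WithDescription("Heap memory in use."),
		metric.WithUnit("By"),
		metric.WithInt64Callback(func(_ context.Context, o metric.Int64Observer) error {
			o.Observe(readHeapInUse()) // hypothetical helper returning the current value
			return nil
		}))
	if err != nil {
		log.Fatal(err)
	}
}

// readHeapInUse is a stand-in for however the real value would be obtained.
func readHeapInUse() int64 { return 0 }
```

As with the counter above, the synchronous instruments (Counter, UpDownCounter, Histogram) are recorded at the call site, while observable instruments report through a callback at collection time.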
A similar API is available in Java for defining custom metrics:
```java
messageInCounter = meter
    .counterBuilder(MESSAGE_IN_COUNTER)
    .setUnit("{message}")
    .setDescription("The total number of messages received for this topic.")
    .buildObserver();
```
For Gauge-type data, the usage is as follows: buildWithCallback registers a callback that reports the value, and the OpenTelemetry framework invokes that callback periodically (every 30 s here).
```java
public static void registerObservers() {
    Meter meter = GlobalOpenTelemetry.get().getMeter("pulsar"); // how the Meter is obtained here is illustrative
    meter.gaugeBuilder("pulsar_producer_num_msg_send")
        .setDescription("The number of messages published in the last interval")
        .ofLongs()
        .buildWithCallback(
            r -> recordProducerMetrics(r, ProducerStats::getNumMsgsSent));
}

private static void recordProducerMetrics(ObservableLongMeasurement observableLongMeasurement,
                                          Function<ProducerStats, Long> getter) {
    for (Producer producer : CollectionHelper.PRODUCER_COLLECTION.list()) {
        ProducerStats stats = producer.getStats();
        String topic = producer.getTopic();
        if (topic.endsWith(RetryMessageUtil.RETRY_GROUP_TOPIC_SUFFIX)) {
            continue;
        }
        observableLongMeasurement.record(getter.apply(stats),
            Attributes.of(PRODUCER_NAME, producer.getProducerName(), TOPIC, topic));
    }
}
```
More detailed usage can be found in the official documentation:
https://opentelemetry.io/docs/languages/java/instrumentation/#metrics
If we don't want to route the data through the collector and would rather have Prometheus scrape the application directly, the OpenTelemetry framework supports that too.
All we need to do is set an environment variable:

```bash
export OTEL_METRICS_EXPORTER=prometheus
```

Then we can visit http://127.0.0.1:9464/metrics to see the metrics currently exposed by the application, and configure a scrape job for it in Prometheus:
```yaml
scrape_configs:
  - job_name: "k8s-combat"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["k8s-combat:9464"]
```
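The environment variable relies on the SDK's autoconfiguration support (the Java agent handles it out of the box). For the Go service, the same pull-style exposure can also be wired up by hand; below is a minimal sketch, not from the original demo, that assumes the go.opentelemetry.io/otel/exporters/prometheus bridge and reuses port 9464:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
	"go.opentelemetry.io/otel"
	otelprom "go.opentelemetry.io/otel/exporters/prometheus"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	// The Prometheus exporter is a metric.Reader that publishes collected
	// metrics to the default Prometheus registry.
	exporter, err := otelprom.New()
	if err != nil {
		log.Fatalf("failed to create prometheus exporter: %v", err)
	}
	mp := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter))
	otel.SetMeterProvider(mp)

	// Expose the scrape endpoint; Prometheus pulls from here instead of
	// the collector pushing to it.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9464", nil))
}
```

In this mode the application itself owns the /metrics endpoint, so the collector is no longer on the metrics path at all.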
This is the typical pull model, whereas OpenTelemetry recommends the push model: data is collected by the OpenTelemetry collector and then pushed to Prometheus.
Each model has its pros and cons:

| | Pull model | Push model |
|---|---|---|
| Pros | All scrape targets can be managed in one central configuration, and settings such as scrape interval can be tuned per application. | Metrics can be pre-processed centrally in the OpenTelemetry collector, and only the filtered data is written to Prometheus, which is more flexible. |
| Cons | 1. Pre-processing metrics is cumbersome: all data is relabeled only after it reaches Prometheus and is then written to storage. 2. Service discovery has to be configured. | 1. An extra metrics gateway component such as the collector has to be maintained. |
For example, we use the Prometheus-compatible VictoriaMetrics to scrape Istio's metrics, but there are far too many of them and we need to drop some.
That means writing relabel rules into the scrape job:
```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMPodScrape
metadata:
  name: isito-pod-scrape
spec:
  podMetricsEndpoints:
    - scheme: http
      scrape_interval: "30s"
      scrapeTimeout: "30s"
      path: /stats/prometheus
      metricRelabelConfigs:
        - regex: ^envoy_.*|^url\_\_\_\_.*|istio_request_bytes_sum|istio_request_bytes_count|istio_response_bytes_sum|istio_request_bytes_sum|istio_request_duration_milliseconds_sum|istio_response_bytes_count|istio_request_duration_milliseconds_count|^ostrich_apigateway.*|istio_request_messages_total|istio_response_messages_total
          action: drop_metrics
  namespaceSelector:
    any: true
```
If the processing moves into the collector instead, all of this logic can live in the collector and be managed centrally.
Summary
Using metrics is a bit simpler than using traces: there is no need to understand concepts like context and span; you just need to know which metric types exist and which scenarios each one suits.
Reference links:
- https://prometheus.io/docs/prometheus/latest/feature_flags/#otlp-receiver
- https://opentelemetry.io/docs/languages/java/instrumentation/#metrics
- https://opentelemetry.io/docs/languages/go/instrumentation/#metrics