Opentelemetry collector usage

- Service
  - Extensions
    - healthcheckextension
  - Pipelines
- receiver
  - OTLP Receiver
  - prometheus receiver
  - filelog receiver
- Processor
  - Data attribution
  - memory limiter processor
  - batch processor
  - attributes processor && Resource Processor
  - filter processor
  - k8s attributes processor
  - Tail Sampling Processor
  - transform processor
  - routing processor
- Exporter
  - debug exporter
  - otlp exporter
  - otlp http exporter
  - prometheus exporter
  - prometheus remote write exporter
  - loadbalancing exporter
- Connector
  - roundrobin connector
  - span metrics connector
- troubleshooting
- expansion
  - When to Expand
  - How to expand capacity
    - stateful
The OpenTelemetry Collector contains the following components:
- receiver
- processor
- exporter
- connector
- service

Note that the individual components are only defined in their respective sections; for them to actually take effect, they must be referenced in the service section.
The official opentelemetry-collector and opentelemetry-collector-contrib repositories provide a large number of Collector component implementations. The former contains the core, vendor-independent components, while the latter contains components contributed by different vendors, such as AWS, Azure, Kafka, and so on. The two can be combined as needed. Each repository also documents its components in a component catalog, with help files for otlpreceiver, prometheusremotewriteexporter, etc.
Service
The service field is used to enable and organize the receivers, processors, exporters, and extensions. A service contains the following subfields:
- Extensions
- Pipelines
- Telemetry: supports configuring metrics and logs.
By default, the Collector exposes its own metrics at http://127.0.0.1:8888/metrics. The address can be changed with the address field, and the level field controls how many metrics are exposed:

- none: no telemetry data is collected
- basic: basic telemetry data is collected
- normal: the default level; adds standard telemetry data on top of basic
- detailed: the most detailed level, including dimensions and views

The default log level is INFO; DEBUG, WARN and ERROR are also supported.
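A minimal sketch of the service telemetry configuration (the address and levels shown are illustrative):

service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888
      level: detailed
    logs:
      level: debug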
Extensions
Collector authentication, health monitoring, service discovery or data forwarding can be implemented using extensions. Most extensions have default configurations.
service:
extensions: [health_check, pprof, zpages]
telemetry:
metrics:
address: 0.0.0.0:8888
level: normal
healthcheckextension
The health_check extension can provide a health check endpoint for the pod's liveness and readiness probes:
extensions:
health_check:
endpoint: ${env:MY_POD_IP}:13133
Pipelines
A pipeline contains the set of receivers, processors and exporters, and the same receivers, processors and exporters can be put into multiple pipelines.
A pipeline can be one of the following types:

- traces: collects and processes trace data
- metrics: collects and processes metric data
- logs: collects and processes log data
Note that the order in which the processors are listed determines the order in which they process the data.
service:
pipelines:
metrics:
receivers: [opencensus, prometheus]
processors: [batch]
exporters: [opencensus, prometheus]
traces:
receivers: [opencensus, jaeger]
processors: [batch, memory_limiter]
exporters: [opencensus, zipkin]
The following focuses on a few common component configurations.
receiver
Used to receive telemetry data.
A receiver type can be configured multiple times using the <receiver type>/<name> syntax, as long as each receiver name is unique. At least one receiver must be configured in the Collector.
receivers:
# Receiver 1.
# <receiver type>:
examplereceiver:
# <setting one>: <value one>
endpoint: 1.2.3.4:8080
# ...
# Receiver 2.
# <receiver type>/<name>:
examplereceiver/settings:
# <setting two>: <value two>
endpoint: 0.0.0.0:9211
OTLP Receiver
Receives gRPC or HTTP traffic in the OTLP format. This is a push-mode receiver: clients push telemetry data to the Collector:
receivers:
otlp:
protocols:
grpc:
http:
The otlp receiver can be defined under k8s in the following way:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ${env:MY_POD_IP}:4317 # server that receives data in the gRPC format
      http:
        endpoint: ${env:MY_POD_IP}:4318 # server that receives data in the HTTP format
Receivers support both push and pull modes; the haproxyreceiver, for example, works in pull mode:
receivers:
haproxy:
endpoint: http://127.0.0.1:8080/stats
collection_interval: 1m
metrics:
haproxy.connection_rate:
enabled: false
:
enabled: true
prometheus receiver
The prometheusreceiver supports pulling metrics data the way Prometheus does. Note that this receiver is still under active development; see the official caveats and the list of unsupported features:
receivers:
prometheus:
config:
scrape_configs:
- job_name: 'otel-collector'
scrape_interval: 5s
static_configs:
- targets: ['0.0.0.0:8888']
- job_name: k8s
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
regex: "true"
action: keep
metric_relabel_configs:
- source_labels: [__name__]
regex: "(request_duration_seconds.*|response_duration_seconds.*)"
action: keep
filelog receiver
The filelog receiver is used to collect logs from files.
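A minimal sketch of a filelog receiver; the paths and the regular expression are illustrative and must be adapted to the actual log format:

receivers:
  filelog:
    include: [ /var/log/myapp/*.log ]   # hypothetical log path
    exclude: [ /var/log/myapp/*.gz ]
    start_at: beginning
    operators:
      # parse "<time> <SEVERITY> <message>" lines into attributes
      - type: regex_parser
        regex: '^(?P<time>\S+) (?P<severity>[A-Z]+) (?P<message>.*)$'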
Processor
Processors modify or transform the data collected by receivers according to their rules or configuration, for example filtering, dropping, or renaming data. The execution order of processors follows the order in which they are defined in the pipeline. The recommended processor ordering is:
- memory_limiter
- sampling processors or initial filtering processors
- processors that depend on the context of the data source, e.g. k8sattributes
- batch
- Other Processors
Data attribution
Since a receiver may be attached to multiple pipelines, multiple processors may be handling data from the same receiver at the same time, which raises the question of data ownership. From the pipelines' point of view there are two ownership modes:
- Exclusive data: in this mode, the pipeline copies the data received from the receiver and the individual pipelines do not interact with each other.
- Shared data: in this mode the pipeline does not copy the data received from the receiver; multiple pipelines share the same data, and the data is read-only and cannot be modified. A processor declares MutatesData=false to avoid the data copy required by the exclusive mode.
Note: the official documentation warns that when multiple pipelines reference the same receiver, the pipelines are only guaranteed to be independent with respect to the data; because the whole flow uses synchronous calls, if one pipeline blocks, the other pipelines attached to the same receiver are blocked as well:
Important
When the same receiver is referenced in more than one pipeline, the Collector creates only one receiver instance at runtime that sends the data to a fan-out consumer. The fan-out consumer in turn sends the data to the first processor of each pipeline. The data propagation from receiver to the fan-out consumer and then to processors is completed using a synchronous function call. This means that if one processor blocks the call, the other pipelines attached to this receiver are blocked from receiving the same data, and the receiver itself stops processing and forwarding newly received data.
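As an illustration of shared data, the following sketch attaches the same otlp receiver to two pipelines; both pipelines receive the same data (the endpoint and backend names are placeholders):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  debug:
  otlp/backend:
    endpoint: backend:4317   # hypothetical backend
    tls:
      insecure: true

service:
  pipelines:
    traces:                  # both pipelines reference the same otlp receiver
      receivers: [otlp]
      exporters: [otlp/backend]
    traces/debug:
      receivers: [otlp]
      exporters: [debug]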
memory limiter processor
Used to prevent the Collector from running out of memory. This processor periodically checks memory usage and, if it exceeds the configured thresholds, takes action (refusing data or forcing a GC).
The memorylimiterprocessor has two thresholds: a soft limit and a hard limit. When memory usage exceeds the soft limit, the processor refuses data and returns errors (so the component producing the data must be able to retry, otherwise data will be lost) until memory usage drops back below the soft limit. If memory usage exceeds the hard limit, a GC is forced.
It is generally recommended to make the memorylimiterprocessor the first processor in the pipeline. Its parameters are:
- check_interval (default 0s): the memory check interval; 1s is recommended. If the Collector's memory usage is spiky, check_interval can be lowered or spike_limit_mib increased to avoid exceeding the hard limit.
- limit_mib (default 0): defines the hard limit, the maximum amount of memory in MiB allocated by the process heap. Note that total memory is usually about 50 MiB higher than this value.
- spike_limit_mib (default 20% of limit_mib): the maximum expected spike between memory measurements; must be less than limit_mib. The soft limit equals limit_mib - spike_limit_mib; the recommended value for spike_limit_mib is 20% of limit_mib.
- limit_percentage (default 0): defines the hard limit as a percentage of total available memory; lower priority than limit_mib.
- spike_limit_percentage (default 0): the maximum expected spike between memory measurements, as a percentage; can only be used together with limit_percentage.
Use it as follows:
processors:
memory_limiter:
check_interval: 1s
limit_mib: 4000
spike_limit_mib: 800
processors:
memory_limiter:
check_interval: 1s
limit_percentage: 50
spike_limit_percentage: 30
batch processor
The batch processor can receive spans, metrics, or logs, and compresses the data to reduce the number of connections required for data transfer.
It is recommended to configure the batch processor on every Collector and to place it after the memory_limiter and any sampling processors. Batches are sent based on size or on a time interval.
The configuration parameters are as follows:
- send_batch_size (default 8192): the number of spans, metric data points, or log records after which a batch is sent.
- timeout (default 200ms): the time after which a batch is sent regardless of its size. If set to 0, send_batch_size is ignored and only send_batch_max_size determines when data is sent.
- send_batch_max_size (default 0): the upper limit of a batch size; must be greater than or equal to send_batch_size. 0 means no upper limit.
- metadata_keys (default empty): if set, the processor creates one batcher instance per distinct combination of values of these client-metadata keys (a sketch follows the example below). Note that batching by metadata increases the memory required for batching.
- metadata_cardinality_limit (default 1000): when metadata_keys is not empty, this value limits the number of distinct metadata key/value combinations that will be processed.
The following defines a default batch processor and a customized batch processor. Note that this is only a declaration; for it to take effect it must be referenced in the service section.
processors:
batch:
batch/2:
send_batch_size: 10000
timeout: 10s
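A sketch of a batch processor that batches per tenant using metadata_keys; the tenant_id key is hypothetical, and the receiver must make client metadata available (e.g. include_metadata on the otlp receiver):

processors:
  batch/by-tenant:
    send_batch_size: 8192
    timeout: 200ms
    metadata_keys:
      - tenant_id                    # hypothetical client-metadata key
    metadata_cardinality_limit: 100  # cap the number of per-tenant batchers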
attributes processor && Resource Processor
The Resource Processor can be seen as a subset of the attributes processor and is used to modify the resource attributes of spans, logs, and metrics.
The attributes processor has two main functions: modifying attributes and filtering data. It is typically used to modify attributes; for data filtering, consider the filterprocessor instead.
Below are some common ways of modifying attributes, similar to how Prometheus relabels; see the official examples (a resource processor sketch follows this example):
processors:
attributes/example:
actions:
- key:
action: delete
- key: redacted_span
value: true
action: upsert
- key: copy_key
from_attribute: key_original
action: update
- key: account_id
value: 2245
action: insert
- key: account_password
action: delete
- key: account_email
action: hash
- key: http.status_code
action: convert
converted_type: int
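A minimal resource processor sketch using the same action syntax on resource attributes (the attribute keys and values are illustrative):

processors:
  resource:
    attributes:
      - key: cloud.zone
        value: zone-1
        action: upsert
      - key: k8s.cluster.name
        from_attribute: k8s-cluster
        action: insert
      - key: redundant-attribute
        action: delete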
filter processor
Used to drop spans, span events, metrics, datapoints, and logs collected by the Collector. The filterprocessor uses OTTL syntax to define the conditions under which telemetry data should be dropped; data that matches any condition is dropped.
Config | OTTL Context |
---|---|
traces.span | Span |
traces.spanevent | SpanEvent |
metrics.metric | Metric |
metrics.datapoint | DataPoint |
logs.log_record | Log |
The following drops all HTTP spans:
processors:
filter:
error_mode: ignore
traces:
span:
- attributes[""] == nil
In addition, the filter processor supports OTTL converter functions, for example:
# Drops metrics containing the '' attribute key
filter/keep_good_metrics:
error_mode: ignore
metrics:
metric:
- 'HasAttrKeyOnDatapoint("")'
k8s attributes processor
This processor automatically discovers k8s resources and injects the required metadata into spans, metrics, and logs as resource attributes.
When the k8sattributesprocessor receives data (logs, traces, or metrics), it tries to associate the data with a pod; if the association succeeds, it injects the pod's metadata into that data. By default, the k8sattributesprocessor associates data using the inbound connection IP and the pod IP, but the association can also be customized, for example via a resource_attribute:
Each rule contains a from field (the rule type) and a name field (the attribute name, when from is resource_attribute).
from has two possible types:

- connection: associates the data using the IP address of the inbound connection. When this type is used, the processor must be placed before any batching or tail sampling processors.
- resource_attribute: specifies the resource attribute of the received data to use for the association. Only metadata attributes can be used.
pod_association:
# below association takes a look at the datapoint's resource attribute and tries to match it with
# the pod having the same attribute.
- sources:
- from: resource_attribute
name:
# below association matches for pair `` and ``
- sources:
- from: resource_attribute
name:
- from: resource_attribute
name:
By default the following attributes are extracted and added; the defaults can be modified via the metadata option:
- .start_time
The k8sattributesprocessor also supports extracting (extract) resource attributes from the labels and annotations of pods, namespaces, and nodes.
extract:
annotations:
- tag_name: a1 # extracts value of annotation from pods with key `annotation-one` and inserts it as a tag with key `a1`
key: annotation-one
from: pod
- tag_name: a2 # extracts value of annotation from namespaces with key `annotation-two` with regexp and inserts it as a tag with key `a2`
key: annotation-two
regex: field=(?P<value>.+)
from: namespace
- tag_name: a3 # extracts value of annotation from nodes with key `annotation-three` with regexp and inserts it as a tag with key `a3`
key: annotation-three
regex: field=(?P<value>.+)
from: node
labels:
- tag_name: l1 # extracts value of label from namespaces with key `label1` and inserts it as a tag with key `l1`
key: label1
from: namespace
- tag_name: l2 # extracts value of label from pods with key `label2` with regexp and inserts it as a tag with key `l2`
key: label2
regex: field=(?P<value>.+)
from: pod
- tag_name: l3 # extracts value of label from nodes with key `label3` and inserts it as a tag with key `l3`
key: label3
from: node
A full example follows. Since the k8sattributesprocessor is itself a k8s controller, the filter option should be used to limit the scope of its list-watch:
k8sattributes:
k8sattributes/2:
auth_type: "serviceAccount"
passthrough: false
filter:
node_from_env_var: KUBE_NODE_NAME
extract:
metadata:
-
-
-
-
-
- .start_time
labels:
- tag_name:
key: /component
from: pod
pod_association:
- sources:
- from: resource_attribute
name:
- sources:
- from: resource_attribute
name:
- sources:
- from: connection
Tail Sampling Processor
Samples traces according to a predefined set of policies. Note that for the sampling policies to be applied correctly, all spans of a trace must be processed by the same Collector instance. This processor must be placed after any context-dependent processors (such as k8sattributes), otherwise the reassembly will lose the original context. Before sampling, spans are grouped by trace_id, so the tail sampling processor can be used directly without a groupbytraceprocessor.
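A minimal tail_sampling sketch that keeps all error traces and probabilistically samples 10% of traces; the policy names and values are illustrative:

processors:
  tail_sampling:
    decision_wait: 10s               # how long a trace is buffered before the policies are evaluated
    num_traces: 100000               # number of traces kept in memory
    expected_new_traces_per_sec: 1000
    policies:
      [
        {
          name: errors-policy,
          type: status_code,
          status_code: { status_codes: [ERROR] }
        },
        {
          name: probabilistic-policy,
          type: probabilistic,
          probabilistic: { sampling_percentage: 10 }
        }
      ]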
Within the tailsamplingprocessor, and is a special policy that combines multiple policies with AND logic. In the example below, and chains several policies to:

- keep data whose service name is in [service-1, service-2, service-3]
- from those three services, keep data whose route is in [/live, /ready]
- finally, sample the data coming from [service-1, service-2, service-3] with routes [/live, /ready] at a sampling percentage of 0.1
and:
{
and_sub_policy: # set of sub-policies combined with AND logic
[
{
# filter by service name
name: service-name-policy,
type: string_attribute,
string_attribute:
{
key: ,
values: [service-1, service-2, service-3],
},
},
{
# filter by route
name: route-live-ready-policy,
type: string_attribute,
string_attribute:
{
key: ,
values: [/live, /ready],
enabled_regex_matching: true, # enable regular expression matching
},
},
{
# apply probabilistic sampling
name: probabilistic-policy,
type: probabilistic,
probabilistic: { sampling_percentage: 0.1 },
},
],
},
See more in the official examples.
transform processor
This processor contains conditions and statements associated with a Context type and executes them against the received telemetry data in the configured order. It uses a SQL-like syntax called the OpenTelemetry Transformation Language (OTTL).
The transform processor can configure multiple context statements for traces, metrics, and logs; context specifies which OTTL Context the statements use:
Telemetry | OTTL Context |
---|---|
Resource | Resource |
Instrumentation Scope | Instrumentation Scope |
Span | Span |
Span Event | SpanEvent |
Metric | Metric |
Datapoint | DataPoint |
Log | Log |
The Contexts supported by trace, metric and log are as follows:
Signal | Context Values |
---|---|
trace_statements | resource, scope, span, and spanevent |
metric_statements | resource, scope, metric, and datapoint |
log_statements | resource, scope, and log |
Each statement can contain a where clause that determines whether the statement is executed.
The transform processor also supports an optional error_mode field, which determines how the processor reacts to errors produced by statements.
error_mode | description |
---|---|
ignore | processor ignores the error, logs it, and moves on to the next statement, recommended mode. |
silent | processor ignores the error, does not log it, and proceeds to the next statement. |
propagate | The processor returns an error to the pipeline that causes the Collector to discard the payload. default option. |
In addition, the transform processor supports OTTL functions that can add, delete, and modify telemetry data.
In the following example, if the attribute test does not exist, it is set to pass:
transform:
error_mode: ignore
trace_statements:
- context: span
statements:
# accessing a map with a key that does not exist will return nil.
- set(attributes["test"], "pass") where attributes["test"] == nil
debug
Problems with transform statements can be located by enabling debug logging in the Collector:
receivers:
filelog:
start_at: beginning
include: [ ]
processors:
transform:
error_mode: ignore
log_statements:
- context: log
statements:
- set(["test"], "pass")
- set(instrumentation_scope.attributes["test"], ["pass"])
- set(attributes["test"], true)
exporters:
debug:
service:
telemetry:
logs:
level: debug
pipelines:
logs:
receivers:
- filelog
processors:
- transform
exporters:
- debug
routing processor
Routes logs, metrics, or traces to specific exporters based on the value of an inbound gRPC/HTTP request header or of a resource attribute.
Attention:

- This processor terminates the pipeline: if other processors are defined after it, a warning is issued.
- An exporter added to the pipeline must also be added to this processor's routing table, otherwise it will not receive any data.
- Since this processor relies on HTTP headers or resource attributes, be careful when using aggregating processors (batch or groupbytrace) in the same pipeline.
The mandatory parameters are:

- from_attribute: the HTTP header name or resource attribute name from which the routing value is read.
- table: the processor's routing table.
- table.value: a possible value of the FromAttribute field.
- table.exporters: the exporters to use when the FromAttribute field value matches.
The optional fields are listed below (a resource-attribute routing sketch follows the example):

- attribute_source: defines where from_attribute is looked up:
  - context (default): query the context (which includes HTTP headers). This is the default source of from_attribute data, which can be injected manually or by a third-party service (such as a gateway).
  - resource: query the resource attributes.
- drop_resource_routing_attribute: whether to remove the resource attribute used for routing.
- default_exporters: the exporters for data that does not match any entry in the routing table.
Examples are given below:
processors:
routing:
from_attribute: X-Tenant
default_exporters:
- jaeger
table:
- value: acme
exporters: [jaeger/acme]
exporters:
jaeger:
endpoint: localhost:14250
jaeger/acme:
endpoint: localhost:24250
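A sketch of routing on a resource attribute instead of a header; the tenant attribute is hypothetical and would have to be set upstream, e.g. by an attributes processor:

processors:
  routing:
    from_attribute: tenant          # hypothetical resource attribute
    attribute_source: resource
    drop_resource_routing_attribute: true
    default_exporters:
      - jaeger
    table:
      - value: acme
        exporters: [jaeger/acme]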
Exporter
Note that most OpenTelemetry exporters work in push mode and send data to a backend.
debug exporter
For debugging use, you can output telemetry data to the terminal with the following configuration parameters:
- verbosity (default basic): one of basic (output a summary), normal (output the actual data), or detailed (output detailed data).
- sampling_initial (default 2): the number of messages output per second initially.
- sampling_thereafter (default 1): the sampling rate applied after sampling_initial; 1 disables sampling. For example, in each second the first sampling_initial messages are output, then only every sampling_thereafter-th message is output and the rest are discarded.
exporters:
debug:
verbosity: detailed
sampling_initial: 5
sampling_thereafter: 200
otlp exporter
Sends data in the OTLP format over gRPC. Note that this is push mode and TLS is required by default. Retry and queue behavior can optionally be configured (a sketch follows the example below).
exporters:
otlp:
endpoint: otelcol2:4317
tls:
cert_file:
key_file:
otlp/2:
endpoint: otelcol2:4317
tls:
insecure: true
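A sketch of the optional queue and retry settings on the otlp exporter (the values are illustrative):

exporters:
  otlp/queued:
    endpoint: otelcol2:4317
    tls:
      insecure: true
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s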
otlp http exporter
Sends data in the OTLP format over HTTP:

exporters:
  otlphttp:
    endpoint: "https://1.2.3.4:1234"
    tls:
      ca_file: /var/lib/
      cert_file: certfile
      key_file: keyfile
      insecure: true
    timeout: 10s
    read_buffer_size: 123
    write_buffer_size: 345
    sending_queue:
      enabled: true
      num_consumers: 2
      queue_size: 10
    retry_on_failure:
      enabled: true
      initial_interval: 10s
      randomization_factor: 0.7
      multiplier: 1.3
      max_interval: 60s
      max_elapsed_time: 10m
    headers:
      "can you have a . here?": "F0000000-0000-0000-0000-000000000000"
      header1: "234"
      another: "somevalue"
    compression: gzip
prometheus exporter
Exposes metrics in the Prometheus format (pull mode).
- endpoint: the address where metrics are exposed, at the path /metrics.
- const_labels: key/value pairs appended to every metric.
- namespace: if set, metrics are exposed as <namespace>_<metrics>.
- send_timestamps: default false; whether to send the collection timestamps in the response.
- metric_expiration: default 5m; defines how long exposed metrics remain valid without updates.
- resource_to_telemetry_conversion: default false; if enabled, all resource attributes are converted into metric labels.
- enable_open_metrics: default false; if enabled, metrics are exposed in the OpenMetrics format, which supports exemplars.
- add_metric_suffixes: default true; if false, type and unit suffixes are not added.
exporters:
prometheus:
endpoint: "1.2.3.4:1234" # The exposed address is:https://1.2.3.4:1234/metrics
tls:
ca_file: "/path/to/"
cert_file: "/path/to/"
key_file: "/path/to/"
namespace: test-space
const_labels:
label1: value1
"another label": spaced value
send_timestamps: true
metric_expiration: 180m
enable_open_metrics: true
add_metric_suffixes: false
resource_to_telemetry_conversion:
enabled: true
It is recommended to use the transform processor to set the most common resource attributes to metric labels.
processor:
transform:
metric_statements:
- context: datapoint
statements:
- set(attributes["namespace"], [""])
- set(attributes["container"], [""])
- set(attributes["pod"], [""])
prometheus remote write exporter
Supports HTTP settings as well as retry and timeout settings.
Used to send OpenTelemetry metrics to Prometheus remote-write compatible backends such as Cortex, Mimir, and Thanos.
The configuration parameters are as follows:
- endpoint: the remote write URL.
- tls: TLS must be configured by default.
  - insecure: default false. To enable TLS, cert_file and key_file must be configured.
- external_labels: additional label names and values added to each metric.
- headers: additional headers added to each HTTP request.
- add_metric_suffixes: default true; if false, type and unit suffixes are not added.
- send_metadata: default false; if true, prometheus metadata is generated and sent.
- remote_write_queue: configures the queue and sending parameters for remote write (a sketch follows the example below).
  - enabled: enables the sending queue, default true.
  - queue_size: the number of OTLP metrics that can be queued, default 10000.
  - num_consumers: the minimum number of workers used to send requests, default 5.
- resource_to_telemetry_conversion: default false; if true, all resource attributes are converted into metric labels.
- target_info: default false; if true, a target_info metric is generated for each resource metric.
- max_batch_size_bytes: default 3000000 (~2.861 MiB). The maximum size of a batch sent to the remote endpoint; a batch larger than this value is split into multiple batches.
exporters:
prometheusremotewrite:
endpoint: "https://my-cortex:7900/api/v1/push"
external_labels:
label_name1: label_value1
label_name2: label_value2
resource_to_telemetry_conversion:
enabled: true # Convert resource attributes to metric labels
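A sketch with the queue and retry settings added (the values are illustrative):

exporters:
  prometheusremotewrite:
    endpoint: "https://my-cortex:7900/api/v1/push"
    timeout: 30s
    remote_write_queue:
      enabled: true
      queue_size: 10000
      num_consumers: 5
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s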
It is recommended to use the transform processor to set the most common resource attributes to metric labels.
processor:
transform:
metric_statements:
- context: datapoint
statements:
- set(attributes["namespace"], [""])
- set(attributes["container"], [""])
- set(attributes["pod"], [""])
loadbalancing exporter
Load-balances spans, metrics, and logs based on a routing_key. If routing_key is not configured, the default is traceID for traces and service for metrics; that is, spans with the same traceID (or the same service name, when service is the routing_key) are sent to the same backend. This is especially useful for tail-based samplers or red-metrics-collectors, which need to see the full trace on a single backend.
Note that load balancing is based only on the trace ID or service name; it does not take the actual backend load into account, nor does it perform round-robin load balancing.
The possible routing_key values are:
routing_key | can be used for |
---|---|
service | logs, spans, metrics |
traceID | logs, spans |
resource | metrics |
metric | metrics |
streamID | metrics |
Backends can be configured statically or through DNS (among other resolvers). When the backend list is updated, roughly R/N of the routes (total number of routes / total number of backends) are re-routed. If backends change frequently, consider using the groupbytrace processor.
Note that if a backend fails, the loadbalancingexporter does not try to re-send the data, so data may be lost; the queue and retry mechanisms therefore need to be configured on the exporter (see the sketch after the list below).
- When the resolver is static, if one backend is unavailable, exporting fails for all backends until that backend recovers or is removed from the static list. The dns resolver behaves the same way.
- When using the k8s or dns resolvers, topology changes are eventually reflected in the loadbalancingexporter.
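A sketch of a loadbalancing exporter with a static resolver and with the queue and retry mechanisms configured on the wrapped otlp exporter (the backend hostnames are placeholders):

exporters:
  loadbalancing:
    routing_key: traceID
    protocol:
      otlp:
        timeout: 1s
        tls:
          insecure: true
        sending_queue:
          enabled: true
          queue_size: 20000
        retry_on_failure:
          enabled: true
          max_elapsed_time: 5m
    resolver:
      static:
        hostnames:
          - collector-1.example.com:4317   # hypothetical backends
          - collector-2.example.com:4317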
The main configuration parameters are listed below:
- otlp: configures the OTLP exporter used to send data to the backends. Note that endpoint should not be configured here; that field is overwritten with the resolver's backends.
- resolver: exactly one of static, dns, k8s, or aws_cloud_map can be configured; the four resolvers cannot be combined.
  - In dns, hostname is used to obtain the list of IP addresses; port is the port used to export traces, default 4317; interval sets the resolution interval, e.g. 5s, 1d, 30m, default 5s; timeout sets the resolution timeout, e.g. 5s, 1d, 30m, default 1s.
  - In k8s, service refers to the kubernetes service domain name, such as -ns; port is the port used to export traces, default 4317; if multiple ports are specified, the corresponding backends are added to the load balancer just like different pods; timeout sets the resolution timeout, e.g. 5s, 1d, 30m, default 1s.
- routing_key: used to route data (spans or metrics). Currently only the trace and metrics types are supported. The following values are supported:
  - service: routes based on the service name. Ideal for span metrics, because all spans of a service are then sent to the same metrics Collector; otherwise metrics for the same service may be sent to different Collectors, making the aggregation inaccurate.
  - traceID: routes spans based on their traceID. Has no effect on metrics.
  - metric: routes metrics based on the metric name. Has no effect on spans.
  - streamID: routes metrics based on the streamID of their datapoints; the streamID is a unique value generated by hashing the attributes and the resource, scope, and metric data.
The following example ensures that spans with the same traceID are sent to the same backend (pod):
receivers:
otlp/external:
protocols:
grpc:
endpoint: ${env:MY_POD_IP}:4317
http:
endpoint: ${env:MY_POD_IP}:4318
otlp/internal:
protocols:
grpc:
endpoint: ${env:MY_POD_IP}:14317
http:
endpoint: ${env:MY_POD_IP}:14318
exporters:
loadbalancing/internal:
protocol:
otlp:
sending_queue:
queue_size: 50000
timeout: 1s
tls:
insecure: true
resolver:
k8s:
ports:
- 14317
service: -opentelemetry
timeout: 10s
otlphttp/tempo:
endpoint: :14252/otlp
sending_queue:
queue_size: 50000
tls:
insecure: true
service:
pipelines:
traces:
exporters:
- loadbalancing/internal
processors:
- memory_limiter
- resource/metadata
receivers:
- otlp/external
traces/loadbalancing:
exporters:
- otlphttp/tempo
processors:
- memory_limiter
- resource/metadata
- tail_sampling
receivers:
- otlp/internal
Connector
A connector connects two pipelines: it acts as an exporter at the end of one pipeline and as a receiver at the start of another, consuming data at the end of the first pipeline and emitting it into the beginning of the second. A connector can consume, duplicate, or route data.
The following feeds traces data into a metrics pipeline through the count connector:
receivers:
foo/traces:
foo/metrics:
exporters:
bar:
connectors:
count:
service:
pipelines:
traces:
receivers: [foo/traces]
exporters: [count]
metrics:
receivers: [foo/metrics, count]
exporters: [bar]
roundrobin connector
Used to implement round-robin load balancing for exporters that do not scale well, such as prometheusremotewrite. The following distributes the received metrics in round-robin fashion to different prometheusremotewrite exporters (via the metrics/1 and metrics/2 pipelines):
receivers:
otlp:
processors:
resourcedetection:
batch:
exporters:
prometheusremotewrite/1:
prometheusremotewrite/2:
connectors:
roundrobin:
service:
pipelines:
metrics:
receivers: [otlp]
processors: [resourcedetection, batch]
exporters: [roundrobin]
metrics/1:
receivers: [roundrobin]
exporters: [prometheusremotewrite/1]
metrics/2:
receivers: [roundrobin]
exporters: [prometheusremotewrite/2]
span metrics connector
Used to aggregate Request, Error, and Duration (R.E.D) metrics from span data.
- Request: calls{="shipping",="get_shipping/{shippingId}",="SERVER",="Ok"}
- Error: calls{="shipping",="get_shipping/{shippingId}",="SERVER",="Error"}
- Duration: duration{="shipping",="get_shipping/{shippingId}",="SERVER",="Ok"}
Each metric contains at least a set of default dimensions, which exist on all spans.
Common parameters are listed below:
- histogram (default explicit): configures the histograms; only explicit or exponential can be selected.
  - disable: default false; disables all histogram metrics.
  - unit: default ms; either ms or s.
  - explicit: specifies the histogram bucket boundaries; default [2ms, 4ms, 6ms, 8ms, 10ms, 50ms, 100ms, 200ms, 400ms, 800ms, 1s, 1400ms, 2s, 5s, 10s, 15s].
  - exponential: the maximum number of buckets in the positive and negative ranges.
- dimensions: the dimensions to add in addition to the default ones. Each dimension must have a name, which is looked up in the span's attributes or resource attributes, e.g. ip or region. If the name attribute is not found in the span, the default value is used if defined; if no default is defined, the dimension is omitted.
- exclude_dimensions: the list of dimensions to exclude from the default dimensions; used to keep unneeded data out of the metrics.
- dimensions_cache_size: the size of the cache holding dimensions, default 1000.
- metrics_flush_interval: the interval at which generated metrics are flushed, default 60s.
- metrics_expiration: if no new spans are received within this period, the corresponding metrics are no longer exported. Default 0, which means no expiration.
- metric_timestamp_cache_size: default 1000.
- events: configures event metrics.
  - enable: default false.
  - dimensions: required if enable is set; the additional dimensions of the event metrics.
- resource_metrics_key_attributes: filters the resource attributes used to generate the hash of the resource metrics key, which prevents changes in resource attributes from affecting counter metrics.
receivers:
nop:
exporters:
nop:
connectors:
spanmetrics:
histogram:
explicit:
buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
dimensions:
- name:
default: GET
- name: http.status_code
exemplars:
enabled: true
exclude_dimensions: ['']
dimensions_cache_size: 1000
aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
metrics_flush_interval: 15s
metrics_expiration: 5m
events:
enabled: true
dimensions:
- name:
- name:
resource_metrics_key_attributes:
-
-
-
service:
pipelines:
traces:
receivers: [nop]
exporters: [spanmetrics]
metrics:
receivers: [spanmetrics]
exporters: [nop]
troubleshooting
- Use the debug exporter.
- Use the pprof extension (exposes port 1777) to collect pprof data; a configuration sketch for pprof and zPages follows this list.
- Use the zPages extension (exposes port 55679, at /debug/tracez) to locate problems such as:
  - latency issues
  - deadlocks and instrumentation problems
  - errors
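A sketch of enabling the pprof and zPages extensions (default ports shown):

extensions:
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

service:
  extensions: [pprof, zpages]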
expansion
When to Expand
- When the memory_limiter processor is in use, the otelcol_processor_refused_spans metric can be used to check whether memory is sufficient.
- The Collector uses a queue to hold data waiting to be sent; if otelcol_exporter_queue_size > otelcol_exporter_queue_capacity, data is rejected (otelcol_exporter_enqueue_failed_spans).
- In addition, specific components expose their own metrics, such as otelcol_loadbalancer_backend_latency.
How to expand capacity
For scaling purposes, components can be divided into three types: stateless, scrapers, and stateful. For stateless components, it is enough to increase the number of replicas.
scrapers
For receivers such as the hostmetricsreceiver and prometheusreceiver, the number of instances cannot simply be increased, otherwise every Collector would scrape the same endpoints. Instead, the Target Allocator can be used to shard the scrape endpoints across Collectors.
stateful
For components that keep data in memory, scaling out may change the results. For example, the tail-sampling processor keeps span data in memory for a period of time and evaluates the sampling decision once the trace is considered complete. If such Collectors are scaled out by adding replicas, different Collectors may receive spans belonging to the same trace, and each Collector will then evaluate whether that trace should be sampled, potentially reaching different decisions (so the trace loses spans).
Similarly with the span-to-metrics processor, aggregation based on service name becomes imprecise when different Collectors receive data from the same service.
To avoid this problem, place a layer running the load-balancing exporter in front of the Collectors that perform tail sampling or span-to-metrics. The load-balancing exporter hashes the trace ID or service name so that each backend Collector consistently receives data for the same traces or services.
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
processors:
exporters:
loadbalancing:
protocol:
otlp:
resolver:
dns:
hostname:
service:
pipelines:
traces:
receivers:
- otlp
processors: []
exporters:
- loadbalancing