1 telegraf
Telegraf is an open-source server agent for collecting, processing, and sending data. It is part of the TICK stack (Telegraf, InfluxDB, Chronograf, Kapacitor) launched by InfluxData. It is designed primarily for use with InfluxDB, but it can also be integrated with other databases and data systems.
Main features
Plug-in-driven architecture:
Telegraf is fully plug-in-driven. Its functionality is implemented through Input Plugins, Processor Plugins, Aggregator Plugins, and Output Plugins. This architecture lets Telegraf adapt to a wide range of data sources and targets with great flexibility.
Input plugins:
Telegraf supports collecting data from a wide variety of sources, including system metrics (such as CPU, memory, and disk usage), service monitoring (such as Apache, NGINX, MySQL), IoT devices, message queues (such as Kafka, RabbitMQ), log files, APIs, and more.
Common input plug-ins include: cpu, mem, disk, net, docker, kafka_consumer, etc.
Processor plug-in:
These plugins allow you to process data before it is sent to the target, such as filtering, converting, or formatting data.
For example, you can use the regex processor plug-in to modify or filter data based on regular expressions.
Aggregator plug-in:
These plugins allow you to aggregate data, such as calculating average, maximum, minimum, etc. for a set of data, and then sending the aggregated results.
Common aggregator plugins include basicstats and final.
Output plugin:
Telegraf can send collected and processed data to various targets, including databases (such as InfluxDB, MySQL, PostgreSQL), message queues (such as Kafka), files, HTTP endpoints, monitoring systems (such as Prometheus), etc.
The most commonly used output plugin is influxdb, which is used to send data to InfluxDB.
Easy to configure and deploy:
Telegraf is configured through a simple configuration file in TOML format. You can easily define what data to collect, how to process it, and where to output it.
Telegraf is lightweight and can be run as a single binary, ideal for deployment on servers, containers, virtual machines, or IoT devices.
High performance and low resource occupancy:
Telegraf is designed as an efficient agent that sustains high throughput while keeping resource usage low. This makes it well suited to monitoring and data collection, especially in environments that must handle large volumes of data.
Use cases
System monitoring: Telegraf can collect metrics such as CPU, memory, disk usage of the server, and send this data to InfluxDB or other monitoring systems for real-time monitoring and analysis.
Application Performance Monitoring: Telegraf can collect performance metrics and log data from applications to monitor the operation status and health of applications.
Internet of Things (IoT) Data Collection: Telegraf can collect data from IoT devices and sensors and then send it to a central database or the cloud for further processing and analysis.
Log Management: Telegraf can collect log files of systems and applications and send them to a central storage system for subsequent analysis and processing.
Integrated Data Flow: Telegraf can be used as part of a data pipeline to collect, process and send data from different data sources to different target systems such as data lakes, data warehouses, or real-time processing systems.
Summary
Telegraf is a flexible and powerful data collection agent that, through its rich plugin system, can collect data from many kinds of sources and send it to many kinds of target systems. Its light weight and high performance make it ideal for monitoring, log management, IoT data collection, and more. Used together with InfluxDB in particular, it forms a powerful time series data collection and analysis system.
Deployment
A configuration file is required. First, create a simple simple_tg.conf:
# Telegraf configuration
# Global configuration
[global_tags] # Define global tags, optional
# dc = "us-east-1" # data center
# host = "localhost"
[agent]
interval = "10s" # Data collection interval
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
debug = false
quiet = false
logfile = "" # can define the log file path
hostname = ""
# Input plug-in - CPU plug-in, collect CPU usage
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
# Input plug-in - Memory plug-in, collect memory usage
[[inputs.mem]]
# Output plugin - Send data to InfluxDB
[[outputs.influxdb]]
urls = ["http://172.17.0.1:8086"] # InfluxDB Address
database = "telegraf" # database name
precision = "s"
timeout = "5s"
username = "admin" # InfluxDB Username
password = "password" # InfluxDB Password
Start collecting resource metrics:
docker run --name telegraf -d -v /root/simple_tg.conf:/etc/telegraf/telegraf.conf:ro /andy08008/telegraf:v100
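To verify that metrics are flowing, you can query InfluxDB's 1.x HTTP API (a minimal sketch; it assumes the address, credentials, and database name configured in simple_tg.conf above):
curl -G 'http://172.17.0.1:8086/query' -u admin:password \
  --data-urlencode 'db=telegraf' \
  --data-urlencode 'q=SELECT * FROM "cpu" ORDER BY time DESC LIMIT 5'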
2 influxdb
Prometheus and InfluxDB are two database systems commonly used for monitoring and time series data storage. Each has its own strengths and suits different scenarios. Here are their main differences:
1. Design objectives
Prometheus: Focuses on monitoring and alarm systems, mainly designed to collect, store and query metrics. Prometheus is widely used for monitoring under cloud-native and microservice architectures, especially in Kubernetes environments.
InfluxDB: A general-purpose time series database system, suitable for a wide range of data applications. It can store not only monitoring data but also IoT data, event logs, and so on. InfluxDB focuses more on flexibility and performance.
2. Data Model
Prometheus: The data model is designed around metrics and labels and has a simple structure. Each metric is a time series identified by its labels, which makes multidimensional querying and filtering easy.
InfluxDB: Uses a more flexible tag-field model. Fields store the actual values (data points), while tags are used to group and query data. InfluxDB allows a more complex schema design and can better support event-driven time series data.
3. Query language
Prometheus: Uses PromQL (Prometheus Query Language) as its query language, focused on querying time series data and performing aggregations. PromQL is well suited to monitoring scenarios and supports time-range queries, rate calculations, percentiles, and more.
InfluxDB: Uses two query languages: InfluxQL and Flux. InfluxQL is similar to SQL, suitable for those who are familiar with traditional database queries; Flux is a more powerful scripting language provided by InfluxDB, which supports complex data analysis and processing.
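As a rough illustration of the two styles (a hedged sketch: node_cpu_seconds_total is a node_exporter metric and cpu/usage_idle is Telegraf's cpu measurement, neither of which is set up in this article), the same question, average CPU idle over the last 5 minutes, looks like this in each language:
# PromQL
avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100
# InfluxQL
SELECT MEAN("usage_idle") FROM "cpu" WHERE time > now() - 5m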
4. Storage method
Prometheus: Uses a local time series database, with data stored on the Prometheus instance. A remote storage system can also be configured to hold historical data in a remote database. By default, Prometheus deletes old data regularly and keeps a relatively short retention period.
InfluxDB: Supports long-term and partitioned storage, with data lifetimes controlled by different retention policies. InfluxDB supports horizontal scaling, and the enterprise edition offers high availability and distributed storage.
5. Clustering and scalability
Prometheus: Runs as a single node and does not provide built-in distributed storage or high-availability features by default. Scaling and high availability can be addressed by running multiple Prometheus instances or by using remote storage.
InfluxDB: Enterprise Edition supports horizontal scaling and distributed architectures, enabling high availability and fault tolerance in multi-node environments. InfluxDB provides built-in cluster management and automation extensions.
6. Data collection method
Prometheus: Pull model. Prometheus periodically pulls data from configured endpoints. Through its scrape configuration it can automatically discover targets (such as services in Kubernetes).
InfluxDB: Push model. InfluxDB usually relies on data sources or agents (such as Telegraf) to push data into the database. Client libraries can also be used to push data to InfluxDB directly.
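For example (a minimal sketch against the InfluxDB 1.x write API; the database and measurement names are made up for illustration), a single point can be pushed with curl:
curl -XPOST 'http://localhost:8086/write?db=mydb' \
  --data-binary 'cpu_load,host=server01 value=0.64'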
7. Alerting
Prometheus: Has a powerful built-in alerting system: PromQL rules are evaluated in real time and notifications are sent through Alertmanager. Prometheus' alerting is highly integrated and well suited to monitoring scenarios.
InfluxDB: Does not ship with an alerting system itself, but alerting can be implemented with Kapacitor or third-party tools. InfluxDB focuses more on storing and processing data than on alerting.
8. Performance and applicable scenarios
Prometheus: Lightweight, suitable for short-term, high-frequency monitoring. It is a very good fit for monitoring the metric data of microservices and containerized applications.
InfluxDB: More suitable for persistent storage and complex analysis of large-scale time series data. Suitable for IoT, financial data analysis, infrastructure monitoring and other scenarios.
Summary
Prometheus is better suited to real-time monitoring and alerting, and is especially popular under cloud-native and microservice architectures.
InfluxDB is a general-purpose time series database, suited to scenarios that require persistence, complex analysis, and event or log processing.
Which one you choose depends on your usage scenario: Prometheus is a good choice if your focus is on monitoring and alerting; InfluxDB may be more suitable if you need long-term storage and more complex data processing.
So in practice it is even possible to run the two together, with a different division of labor for each.
Deployment
docker run -d -p 8086:8086 \
--name influxdb \
-v /etc/localtime:/etc/localtime -v /etc/timezone:/etc/timezone -v /etc/hostname:/etc/hostname -e "LANG=-8" \
-v /data/fluxdb:/var/lib/influxdb \
/andy08008/influxdb:v1.8
InfluxDB is typically used in conjunction with the following tools or platforms for comprehensive data acquisition, processing, storage and visualization:
1. Telegraf (data collection)
Purpose: Telegraf is the agent tool in the InfluxDB ecosystem, used to collect metric data from different sources (such as servers, network devices, and applications) and send it to InfluxDB.
Sample application scenarios: Monitor the server's CPU, memory, and disk usage, or collect sensor data from IoT devices.
2. Grafana (data visualization)
Purpose: Grafana is a widely used open source visualization and monitoring tool. It can be integrated with InfluxDB to pull data from the database and display it in charts and dashboards.
Sample application scenario: Use Grafana to visualize monitoring data in InfluxDB, generate real-time dashboards for system monitoring, business indicator tracking, etc.
3. Kapacitor (real-time data processing and alerting)
Purpose: Kapacitor is a real-time stream processing and alerting tool provided by InfluxData. It can process the time series data collected in InfluxDB, trigger alerts in real time, or perform custom actions.
Example application scenario: Kapacitor can send notifications or run automated remediation tasks when, for example, CPU usage exceeds a certain threshold.
4. Chronograf (user interface and dashboard)
Purpose: Chronograf is the official UI of InfluxDB, allowing users to browse, analyze, and visualize data. It integrates with Kapacitor to configure alerting and monitoring tasks.
Example application scenario: Manage and monitor InfluxDB data through a graphical interface while creating alert rules.
5. Prometheus (replacement for, or companion to, InfluxDB as a time series database)
Purpose: Prometheus is another commonly used time series database, focused on monitoring and alerting. In some scenarios, Prometheus can be used alongside InfluxDB or in place of it for different kinds of time series monitoring.
Example application scenario: Prometheus is more suitable for short-term monitoring, while InfluxDB is more suitable for long-term storage of time series data.
6. Ansible or Terraform (automated deployment and configuration management)
Purpose: These tools allow you to automatically deploy InfluxDB, Telegraf, Grafana and other components and manage their configuration.
Sample application scenario: When building a monitoring environment at large scale, deploy multiple monitoring nodes automatically with Ansible.
Together, these tools form a complete ecosystem of data acquisition, storage, analysis and visualization, which is particularly suitable for the processing and analysis of time series data.
Kapacitor, Chronograf, Ansible are worth checking out later.
3 grafana
I skipped the deployment notes; I also deploy it with Docker. In Grafana's data source configuration you can see that both Prometheus and InfluxDB are core options, and SQL-class databases are supported as well. Then there is the dashboard configuration. Some of these options are still unfamiliar to me, so let's take a closer look at them later.
------------------------------------------------------------------------------------------------
Article Directory
Introduction
Install and deploy Telegraf
use
Example 1: Single Input Single Output Workflow
Example 2: Enable Processing Plugin
Example 3: Using Remote Configuration
Example 4: Comprehensive Example
Example 5: Configuration files and environment variables
Learn to use plug-in documents
How to use plugin documentation
Helpful information can also be obtained in the example configuration
Telegraf internal data structure (InfluxDB line protocol)
measurement (measurement name)
Tag Set
Field Set
Timestamp
Spaces
Data types and formats in the protocol
Comments
Use of Telegraf command line
Introduction
Generate Telegraf configuration file
Generate configuration files that define only CPU input and InfluxDB output
Run a single Telegraf configuration file and print to console
Run all plugins in a configuration file
Run a Telegraf instance that contains CPU and memory input plug-in and InfluxDB output plug-in
Turn on pprof when running Telegraf
Configuration file parameters
Agent configuration
Input plugin general configuration
Output plugin general configuration
Aggregator plugin general configuration
Processor plugin general configuration
Metric filter general configuration
Glob usage (references)
Basic syntax
Extended syntax
Differences from regexp
Telegraf architecture
Chain-of-responsibility design pattern
Pipeline architecture
Telegraf implementation
Integrate external plugins not provided by the official
Write an input plug-in to view the number of files in python (exec version)
Writing python scripts
Write Telegraf configuration file
Run Telegraf
Create files and observe data changes
Write an input plugin to view the number of files in python (execd version)
Write an external processing plugin in python (execd version)
Implement plug-ins based on framework using Go language
Prepare the project
Download dependencies
Example: Implement an Input plugin that generates random numbers
Create a path to the custom plugin
Find templates for input plugin on github
Plug-in development
Telegraf combined with Prometheus
What is Prometheus
Exporter Demo
Prometheus data format
Disadvantages of Exporter mode
Example: Monitor CPU with Telegraf and expose it to Prometheus data format
Introduction
Telegraf is an open source, plugin-based metric collection tool. It is a data collector tailored for InfluxDB (a time series database), but it is so good that it can write the data it collects to many destinations; in the time series field in particular, many databases can be used with it. Typically, it grabs a batch of metric data at a fixed interval (such as the machine's CPU usage, disk I/O, network status, the number of sessions on a MySQL server, etc.) and sends it to a time series database, a message queue, or some user-defined destination, for downstream applications to process (alerting, for example). Telegraf can also expose a service of its own and wait for clients to push data to it.
It is similar to Logstash, except that Logstash collects logs while Telegraf collects metrics.
The project officially provides more than 300 plugins, and Telegraf is easy to extend: if the official plugins cannot meet your needs, you can write your own plugins on top of Telegraf at any time.
Install and deploy Telegraf
Visit the download page: https://www.influxdata.com/downloads/
On the right, Platform is your corresponding system; when you choose the platform, the download URL changes automatically.
Here we want to write a yum repo file. The command given on the page has some problems and needs to be changed to the following.
cat <<EOF | sudo tee /etc/yum.repos.d/influxdata.repo
[influxdata]
name = InfluxData Repository - Stable
baseurl = https://repos.influxdata.com/stable/\$basearch/main
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdata-archive_compat.key
EOF
After this command runs, a file is created in the /etc/yum.repos.d/ directory; its content is the body of the heredoc above.
Then, install online using yum.
sudo yum install telegraf
Use systemctl to check whether telegraf is installed successfully.
systemctl status telegraf
use
Example 1: Single Input Single Output Workflow
(1) Write Telegraf configuration file
Create a directory that specifically places telegraf configuration file.
mkdir /opt/module/telegraf_conf
Create a configuration file.
vim
And type the following.
[agent]
interval = "3s"
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
core_tags = false
[[outputs.file]]
files = ["stdout"]
The configuration file involves 3 configuration blocks.
[agent] Some global settings live here. We set interval = "3s", which means that every input plugin in this configuration file collects metrics every 3 seconds. The default value of interval is 10 seconds.
[[inputs.cpu]] This is an input component. The configuration here means that the exported metrics will include the usage of each CPU core as well as the overall usage of all CPUs.
[[outputs.file]] This is an output component. Here we use an output component called file, but the files parameter is set to stdout (standard output), that is, the console. So when the program runs, we should see the data printed on the console.
(2) Run the Telegraf program
Use the following command to start telegraf and observe the console output.
telegraf --config ./
As shown in the figure: The following output appears on the console.
Console output content:
(1) Process settings and plug-in loading information
First, the console will output a piece of log content, which contains the description information of our Telegraf process.
The current version number of Telegraf is 1.23.2
List of loaded input plugins (currently only 1): cpu
List of loaded aggregator plugins (none at the moment)
List of loaded processor plugins (none at the moment)
List of output plugins loaded (currently 1): file
Tags enabled: the global tag set; the tag host=hadoop102 will be added to all metric data.
agent Config, global configuration
Interval: 3s, all input components collect indicator data every 3s.
Quiet: false, not running in quiet mode.
Hostname: "hadoop102", machine name hadoop102
Flush Interval: 10s, all output components flush metric data every 10s. So, after telegraf runs, you should see a batch of output on the console every 10 seconds.
(2) Data output
After the Telegraf configuration content is printed, we can see a block of dense data. You will find it is neither JSON nor CSV. This is actually Telegraf's built-in data structure, called the InfluxDB line protocol. This data structure will be explained later.
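A hedged sample of what such a line looks like (host name and values are illustrative; the format is explained in the line protocol section below):
cpu,cpu=cpu0,host=hadoop102 usage_idle=98.7,usage_user=0.7,usage_system=0.6 1658113620000000000
cpu,cpu=cpu-total,host=hadoop102 usage_idle=99.1,usage_user=0.5,usage_system=0.4 1658113620000000000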
Example 2: Enable Processing Plugin
Here we learn how Telegraf's processor plugins are used, through a slightly more complex example.
(1) Write Telegraf configuration file
Next, we improve on the previous configuration and write a new configuration file.
cp
vim
Type the following (the highlighted part is new relative to Example 1).
[agent]
interval = "3s"
flush_interval = "5s"
[global_tags]
user="atguigu"
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
core_tags = false
[[processors.converter]]
[processors.converter.tags]
measurement = ["cpu"]
[[outputs.file]]
files = ["stdout"]
(2) Run the Telegraf program
Execute the following command to start a telegraf process with configuration file
telegraf --config ./
The console can output normally, which means everything is normal.
(3) Different from the output of Example 1
1) The loaded plugins are different
By comparing the header information printed by the telegraf console in the two runs, we can see that this configuration file makes our telegraf program load one extra converter processor plugin.
2) The output of the data has changed
So far, we have not explained Telegraf's internal data format. But for now, we can focus just on the headers of the data output by the two examples.
As shown in the figure below:
The header of a data point consists of two parts.
measurement (measurement name). Here we use the cpu input plugin to measure CPU usage, so the name cpu is very fitting.
tags (tag set). Because Telegraf is the metric collection component InfluxData developed for InfluxDB, the tags here exist mainly for the convenience of InfluxDB's indexing. Details about the index will be covered later.
Comparing the data in the figure above, we can see that after adding a processor plugin the format of the data has changed: the content that used to be on the tag set has been moved into the measurement (measurement name). This reflects the processor plugin's role of operating on, converting, and processing data.
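A hedged before/after sketch (values are illustrative): the converter takes the value of the cpu tag and turns it into the measurement name:
# before the converter processor
cpu,cpu=cpu0,host=hadoop102,user=atguigu usage_idle=98.7 1658113620000000000
# after the converter processor
cpu0,host=hadoop102,user=atguigu usage_idle=98.7 1658113620000000000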
Example 3: Using Remote Configuration
Telegraf's --config parameter can also be given a URL, letting telegraf fetch the configuration file remotely over the network.
(1) Use Python's built-in static file server to serve the configuration files quickly
cd to the directory where we put the configuration files, /opt/module/telegraf_conf, and use the following command to start a static file service.
python3 -m http.server
By default, it listens on port 8000 and allows external access.
(2) Run Telegraf
Next, we can try to use this service to get the configuration file.
Run Telegraf using the following command
telegraf --config http://hadoop102:8000/
As you can see, we successfully obtained the configuration file and the data can be output normally.
However, this method cannot monitor configuration changes.
Example 4: Comprehensive Example
In this example, we will try to string together all the concepts mentioned above in one case. It will have 2 inputs, 2 processor plugins, and 2 aggregator plugins, and we will finally run Telegraf in --test mode.
(1) Write configuration files
Create a file
vim
Type the following.
[agent]
interval = "3s"
flush_interval = "5s"
[global_tags]
who = "atguigu"
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
core_tags = false
[[inputs.mem]]
# no config
[[processors.converter]]
order = 0
[processors.converter.tags]
measurement = ["cpu"]
[[processors.date]]
order = 1
tag_key = "month"
date_format = "1"
[[aggregators.valuecounter]]
period = "30s"
namepass = ["cpu0","cpu1"]
fields = ["usage_idle"]
[[aggregators.minmax]]
period = "30s"
namepass = ["mem"]
[[outputs.file]]
files = ["stdout"]
global_tags: global tags, which belong to the context configuration, and a tag will be added to the entire Telegraf workflow.
processors.date: extracts the timestamp of the data and converts it into a tag. The date_format here must be a Go reference-time format:
Year: "2006" "06"
Month: "Jan" "January" "01" "1"
Day of the week: "Mon" "Monday"
Day of the month: "2" "_2" "02"
Day of the year: "__2" "002"
Hour: "15" "3" "03" (PM or AM)
Minute: "4" "04"
Second: "5" "05"
AM/PM mark: "PM"
aggregators.valuecounter: aggregator plugin, value counter; it counts occurrences of values over the last 30 seconds. Only data whose measurement name is cpu0 or cpu1 enters this plugin.
aggregators.minmax: aggregator plugin, maximum and minimum; it computes the maximum and minimum of each field's value over the last 30 seconds. Only data whose measurement name is mem enters it.
(2) Run the Telegraf program
Use the following command to run Telegraf.
telegraf --config ./ --test
Observe the operation information:
Loaded:
2 input plugins
2 processor plugins
2 aggregator plugins
0 output plugins (test mode does not load output plugins)
Observe data changes:
As shown in the figure, you can see that the unaggregated raw data is shown in the green box.
The red box contains the aggregated data.
For the aggregated mem measurement, each field gains corresponding _min and _max fields, giving the minimum and maximum values over the last 30s.
For the aggregated cpu1 and cpu0 measurements, the original usage_idle field becomes something like usage_idle_98.xxxx=1i, meaning that in the last 30 seconds there was exactly one data point with that usage_idle value. So this example is not a great fit: valuecounter should be used on string-type data with a limited range of values, such as request status codes.
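A hedged sketch of the aggregated lines (field names follow minmax's _min/_max convention and valuecounter's value-suffix convention; the numbers are illustrative):
mem,host=hadoop102,who=atguigu,month=7 available_max=3215380480,available_min=3205406720 1658113650000000000
cpu0,host=hadoop102,who=atguigu,month=7 usage_idle_98.98989898989899=1i 1658113650000000000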
Example 5: Configuration files and environment variables
Telegraf's configuration file supports environment variable substitution (${VAR} syntax).
(1) Write configuration files
Copy the previous configuration file to a new one:
cp
Type the following:
[agent]
interval = "3s"
[global_tags]
user = "${USER}"
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
core_tags = false
[[outputs.file]]
files = ["stdout"]
(2) Telegraf default variable declaration file
/etc/default/telegraf is Telegraf's default variable declaration file; you can declare variables directly in it. However, its priority is lower than that of real environment variables.
vim /etc/default/telegraf
Add the following
USER=atguigu
Save and exit.
(3) Run the Telegraf program
Run the telegraf program using the following command
telegraf --config ./
(4) Observe data output
An extra user=atguigu tag appears in the output.
(5) Declare variables in the command line
This is equivalent to setting a USER environment variable whose value is dengziqi.
USER=dengziqi
(6) Run the Telegraf program
Run the Telegraf program again using the following command
telegraf --config --test
(7) Observe data output
The user tag's value becomes dengziqi.
Learn to use plug-in documents
From the examples above, we can see that the configuration of the 3 plugins involved (1 input, 1 processor, 1 output) each has its own way of being written. So, as a user, how do you know how to write the configuration for each plugin?
This must be done with the help of documentation!
Next, we try to understand the usage of the processor plugin we just used, through the official documentation.
How to use plugin documentation
(1) Different versions of Telegraf support different plug-ins
First of all, if you are using Telegraf 1.23, then when you read the plugin documentation you should also read the 1.23 version of the documentation. This is because a project developed in Go is compiled into a single standalone binary executable.
Moreover, the executable is entirely native code, so no special Go runtime environment is needed; the operating system can run it directly.
Accordingly, Telegraf's plugins and the framework's core code are compiled together, all inside one executable file.
Therefore, an earlier version of Telegraf will not contain plugins that are unique to v1.23, unless you download the old Telegraf source code, write in the source of the plugin you want, and recompile.
Notice! The above applies to Telegraf's built-in plugins. Telegraf also leaves us an exec input plugin, through which we can integrate with external programs that capture metric data. Detailed cases follow later in this course.
(2) Telegraf plugin directory
First of all, in the official Telegraf document, there is a column called Plugin directory. Here, you can see all the plugins available on the corresponding Telegraf version.
The link given below is the plugin directory of Telegraf v1.23.
https://docs.influxdata.com/telegraf/v1.23/plugins/
(3) Plug-in filtering
At the top of the page is a set of filters. You can check the category of plugins you want to view, so that the list of plugins at the bottom of the page will be much shorter, making it easier for you to quickly find your target.
Here, click on the Processor option, so that there are only 27 components left below the page for us to browse.
(4) Find the help document for the corresponding plug-in
Scroll down the page and we can see a list of plugins, which are the plugins we filtered out.
The second card is the Converter plugin used in our example 2. As shown in the figure below, there will be some helpful information on the card. Click the view button in the upper right corner to see more detailed instructions.
First, you can see which options the full plugin configuration can include.
Most importantly, the author of the plugin will usually list you a few examples to make it easier for you to understand how the plugin works.
Helpful information can also be obtained in the example configuration
The above is how to view plugin usage on the official website. In addition, you can also use the telegraf command to view a plugin's sample configuration.
The following command prints the example configuration of all built-in plugins in Telegraf.
telegraf config
However, printing to the console is not very useful. The correct practice is to redirect it to a file and use an editor to search and view it.
telegraf config >
As shown in the figure, after opening the file in vim, use a regular-expression search to find the plugin you want. The file contains all the available configuration items for each plugin and a description of each item, but no examples with data.
Telegraf internal data structure (InfluxDB line protocol)
Telegraf's internal data structure is called the InfluxDB line protocol, as shown below:
Telegraf itself is a data collector developed by InfluxData specifically for InfluxDB, and the format above is exactly the one the InfluxDB database uses: as long as data conforms to this format, it can be written into the database through the InfluxDB API. Naturally, InfluxData's own collector supports its own ecosystem, InfluxDB.
Next, let’s introduce several of its components.
measurement (measurement name)
As you learn more later, you will gradually understand this concept in depth. For now, you can think of it as a table in a relational database.
Required
The name of the measurement. Each data point must declare which measurement it is from, and cannot be omitted.
Case sensitive
Cannot begin with an underscore _
Tag Set
Tags should be used on attributes that have a limited range of values and are unlikely to change, such as a sensor's type and id. In InfluxDB, a tag is equivalent to an index: adding tags to data points helps future data retrieval, but too many indexes will slow down data insertion.
Optional
Key-value pairs are written with =
Multiple key-value pairs are separated by commas
Both tag keys and tag values are case sensitive
Tag keys cannot begin with an underscore _
Key data type: string
Value data type: string
Field Set
Required
All the field key-value pairs on a data point; the key is the field name and the value is the data point's value.
A data point must have at least one field.
The keys of the field set are case sensitive.
Field key data type: string
Field value data type: float | integer | unsigned integer | string | boolean
Timestamp
Optional
The Unix timestamp of the data point; each data point can specify its own timestamp.
If the timestamp is not specified, InfluxDB uses the current system time.
Data type: Unix timestamp
If the timestamps in your data are not in nanoseconds, you must specify the timestamp precision when the data is written.
Spaces
The spaces in the line protocol determine how InfluxDB interprets a data point. The first unescaped space separates the measurement & tag set from the field set. The second unescaped space separates the field set from the timestamp.
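An annotated sketch (the names are placeholders):
myMeasurement,tag1=value1,tag2=value2 fieldKey="fieldValue" 1556813561098000000
\______ measurement & tag set ______/ \____ field set ____/ \___ timestamp ___/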
Data types and formats in the protocol
(1) Float (float)
IEEE-754 standard 64-bit floating point number. This is the default data type.
Example: line protocol with float field values
myMeasurement fieldKey=1.0
myMeasurement fieldKey=1
myMeasurement fieldKey=-1.234456e+78
(2) Integer (integer)
Signed 64-bit integer. You must append a lowercase letter i to the number.
Minimum integer: -9223372036854775808i
Maximum integer: 9223372036854775807i
Example: line protocol with integer field values
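For instance (values chosen to mirror the unsigned examples below):
myMeasurement fieldKey=1i
myMeasurement fieldKey=12485903i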
(3) UInteger (unsigned integer)
Unsigned 64-bit integer. You must append a lowercase letter u to the number.
Minimum unsigned integer: 0u
Maximum unsigned integer: 18446744073709551615u
Example: line protocol with unsigned-integer field values
myMeasurement fieldKey=1u
myMeasurement fieldKey=12485903u
(4) String (String)
Plain text string; the length cannot exceed 64KB.
Example:
# String measurement name, field key, and field value
myMeasurement fieldKey="this is a string"
(5) Boolean (Boolean)
true or false.
Example:
Boolean | Supported syntax
True: t, T, true, True, TRUE
False: f, F, false, False, FALSE
Example:
myMeasurement fieldKey=true
myMeasurement fieldKey=false
myMeasurement fieldKey=t
myMeasurement fieldKey=f
myMeasurement fieldKey=TRUE
myMeasurement fieldKey=FALSE
Do not use quotes on boolean values, otherwise it will be interpreted as a string
(6) Unix Timestamp (Unix timestamp)
Example: a data point written with an explicit timestamp:
myMeasurementName fieldKey="fieldValue" 1556813561098000000
Comments
A line starting with the pound sign # is treated as a comment.
Example:
# This is a row of data
myMeasurement fieldKey="string value" 1556813561098000000
Use of Telegraf command line
Introduction
After Telegraf is installed, you can use the telegraf command.
usage:
telegraf [Command]
telegraf [Options]
Commands:
Command Description
config prints the full sample configuration to standard output (stdout console)
version Print version number to standard output (stdout console)
Options:
Parameter Description
--aggregator-filter <filter> Filter the aggregator plugins to enable; the delimiter is :
--config <file> Configuration file to load
--config-directory <directory> Directory containing additional configuration files; file names must end with .conf
--deprecation-list Print all deprecated plugins or plugin options
--watch-config Restart the Telegraf process when the local configuration file changes. Watching can use file system notifications or polling. Off by default.
--plugin-directory <directory> Plugin directory; this directory is searched recursively for available plugins (file suffix .so), and the plugins found are loaded
--debug Enable debug-level logging
--input-filter <filter> Filter the input plugins to enable; the delimiter is :
--input-list Print available input plugins
--output-filter <filter> Filter the output plugins to enable; the delimiter is :
--output-list Print available output plugins
--pidfile <file> File to write the pid to
--pprof-addr <address> pprof address; disabled by default. pprof is a Go tool for profiling program execution that provides various kinds of performance data, such as sampled memory allocation and memory usage information.
--processor-filter <filter> Filter the processor plugins to enable; the delimiter is :
--quiet Run in quiet mode
--section-filter <filter> Only meaningful when used together with the config command. Filters the configuration sections to print (agent, global_tags, outputs, processors, aggregators, and inputs); sections are separated by :
--sample-config Print the full sample configuration (same as the config command)
--once Collect metrics once, write them out, then exit the process
--test Collect metrics once, print them, then exit the process
--test-wait <seconds> Number of seconds to wait for service inputs in test or once mode
--usage <plugin> Print a plugin's usage (for example: telegraf --usage mysql)
--version Print Telegraf's version number
Generate Telegraf configuration file
Use the config command to print the full sample configuration (you can also use the --sample-config parameter) and redirect the output to a file.
telegraf config >
Generate configuration files that define only CPU input and InfluxDB output
telegraf --input-filter cpu --output-filter influxdb config
Run a single Telegraf configuration file and print to console
Using test mode, the output plugin will not be enabled.
telegraf --config --test
Run all plugins in a configuration file
telegraf --config
Run a Telegraf instance that contains CPU and memory input plug-in and InfluxDB output plug-in
telegraf --input-filter cpu:mem --output-filter influxdb
Turn on pprof when running Telegraf
telegraf --config --pprof-addr localhost:6060
After running the above command, you can visit hadoop102:6060/debug/pprof/ in a browser to observe performance information about the running process.
Configuration file parameters
Agent configuration
Configuration name literal translation explanation
interval interval The interval time when all input components collect data
round_interval rounding Rounds the collection interval. For example, if interval is set to 10s but we start the telegraf service at 1 minute 02 seconds, collection will be rounded to happen at 1 minute 10 seconds, 1 minute 20 seconds, 1 minute 30 seconds, and so on.
metric_batch_size metric batch size The batch size in which Telegraf's output components send data. This parameter can be lowered when the network is unstable.
metric_buffer_limit metric buffer limit Telegraf creates a buffer for each output plugin to cache metric data, and deletes data from the buffer once the output has successfully sent it. Therefore, metric_buffer_limit should be at least twice metric_batch_size.
collection_jitter collection jitter Adds a random jitter to the collection time, which prevents many plugins from querying resource-hungry metrics at exactly the same moment and thereby having a non-negligible impact on the observed system.
flush_interval flush interval The output interval of all outputs. This parameter should not be set smaller than interval (the collection interval of all input components). The maximum actual flush interval is flush_interval + flush_jitter.
flush_jitter flush jitter Adds a random jitter to the flush time, mainly to avoid the large write spikes that occur when many Telegraf instances perform writes at the same moment. For example, flush_jitter = 5s with flush_interval = 10s means a flush happens every 10 to 15 seconds.
precision precision Determines how much timestamp precision is retained in the points received from input plugins. All incoming timestamps are truncated to the given precision; Telegraf then pads the truncated timestamps with zeros to create a nanosecond timestamp, and output plugins emit timestamps in nanoseconds. Valid precisions are ns, us, ms and s. For example: if precision is set to ms, the nanosecond timestamp 1480000000123456789 is truncated to 1480000000123 at millisecond precision and then padded with zeros to produce the new, less precise nanosecond timestamp 1480000000123000000. Output plugins do not alter the timestamp further. For service input plugins, this setting is ignored.
debug debug Run Telegraf using debug mode
quiet quiet Run Telegraf in quiet mode; only error messages are shown.
logtarget log target Controls the destination of the logs; it can be one of "file" or "stderr", and on Windows also "eventlog". When set to "file", the output file is determined by the logfile setting.
logfile log file Specifies the log file name when logtarget is "file". If set to empty, logs are written to stderr.
logfile_rotation_interval log rotation interval How often a new log file is opened. If set to 0, no time-based rotation is performed.
logfile_rotation_max_size log rotation size When the current log file exceeds this size, a new log file is opened. 0 means no size-based rotation.
logfile_rotation_max_archives maximum rotation archives The maximum number of archived log files to keep. Each rotation produces a new current log file and turns the previous one into an archive (an old log file no longer written to).
log_with_timezone Log time zone Set the time zone to use for logging, or set to "local" to be the local time.
hostname hostname Overrides the default hostname. If this value is not set, the return value of os.Hostname() is used; os.Hostname() is a function in the Go standard library that returns the current machine's name.
omit_hostname omit hostname By default Telegraf adds a host tag to the metric data it outputs; set this to true to omit it.
Input plugin general configuration
Configuration name literal translation explanation
alias alias name an input plugin instance.
interval interval The interval time for a single Input component to collect metrics. The interval configuration in the plug-in is given higher priority than the global interval configuration.
precision precision The timestamp precision of a single input plugin, overriding the [agent] configuration. It determines how much timestamp precision is retained in the collected points: all incoming timestamps are truncated to the given precision, Telegraf then pads the truncated timestamps with zeros to create a nanosecond timestamp, and output plugins emit timestamps in nanoseconds. Valid precisions are ns, us, ms and s. For example: if precision is set to ms, the nanosecond timestamp 1480000000123456789 is truncated to 1480000000123 at millisecond precision and padded with zeros to produce the new, less precise nanosecond timestamp 1480000000123000000. For service input plugins, this setting is ignored.
collection_jitter Acquisition jitter Acquisition jitter for a single Input component
name_override Rename Override the original metric name, the default value is the name of the input component
name_prefix name prefix Specifies the prefix to append to the metric name
name_suffix Name suffix Specifies the suffix to append to the metric name
tags tag set Add a new tag set to the current input data.
Output plugin general configuration
Configuration name literal translation explanation
alias alias Give an alias an output plugin
flush_interval refresh interval The output interval of a single output plugin (overrides global configuration)
flush_jitter Refresh jitter The output time jitter of a single output plugin (overrides global configuration)
metric_batch_size Metric batch size How many pieces of data are sent at a time (it will overwrite the global configuration)
metric_buffer_limit metric buffer upper limit Buffer that does not send data (will overwrite global configuration)
name_override rename Overrides the original metric name; the default value is the name of the output plugin (I suspect the official website has this wrong).
name_prefix name prefix prefix for metric name
name_suffix Name suffix Suffix for the metric name
Aggregator plugin general configuration
Configuration name literal translation explanation
alias alias name an instance of an Aggregator plugin
period period The aggregator aggregates the data in the window from now-period to now.
delay delay A small delay before the aggregator emits, so that data points stamped right at the period boundary that arrive slightly late from upstream are still included.
grace grace How long late-arriving data is still accepted into its aggregation period.
drop_original drop original Defaults to false. If set to true, the original metric data is dropped from the pipeline and is not sent to the downstream output plugins.
name_override rename Overrides the metric name of the data.
name_prefix name prefix Prefix added to the metric name.
name_suffix name suffix Suffix added to the metric name.
tags tags Add extra tag set
Processor plugin general configuration
Configuration name literal translation explanation
alias Alias Give the example of Processor plugin a name
order order The execution order of the processor. If not specified, the order of the processors is random. Notice! It is not the order they appear in the configuration file, but random.
Metric filter general configuration
The metric filter configuration can be attached to input and output plugins.
Configuration name literal translation explanation
namepass name pass A string array of glob patterns. Only metric data whose measurement name matches one of the patterns can enter this plugin.
namedrop name drop A string array of glob patterns; data whose measurement name matches is dropped directly.
fieldpass field pass A string array of glob patterns; only fields whose names match can pass.
fielddrop field drop A string array of glob patterns; fields whose names match are dropped.
tagpass tag pass A string array of glob patterns; data whose tag values match can pass.
tagdrop tag drop A string array of glob patterns; data whose tag values match is dropped.
taginclude tag include A string array of glob patterns; only tags whose keys match one of the patterns are kept on the data.
tagexclude tag exclude The inverse of taginclude.
Notice! Due to TOML's parsing rules, the filter parameters must be defined at the end of the plugin definition; otherwise subsequent plugin configuration items will be swallowed into the tagpass or tagdrop table.
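A minimal TOML sketch of this rule (reusing the cpu input from the earlier examples):
[[inputs.cpu]]
percpu = true
totalcpu = true
# the filter must come last: any option written below this table
# would be parsed as part of [inputs.cpu.tagpass]
[inputs.cpu.tagpass]
cpu = ["cpu0", "cpu1"]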
Glob usage (references)
glob originated in the Unix bash shell. In the well-known command rm -rf /*, the * is glob-style pattern matching. glob is most commonly used for matching file names. It resembles regular expressions in some ways, but each has its own syntax and conventions.
For details, please refer to: /whinc//issues/18
The following is an explanation of the expression.
Basic syntax
Compared with the large number of metacharacters in regular expressions, the glob pattern has very few, so it is quick to master. glob does not match hidden files (files or directories starting with a dot) by default. The syntax of glob is as follows.
Wildcard, description, example, what it matches, and what it does not match:
* Matches 0 or more characters, including the empty string. Example: Law* matches Law, Laws, Lawyer; does not match La, aw
? Matches exactly 1 character. Example: ?at matches cat, bat; does not match at
[abc] Matches a single character from the set in brackets. Example: [cb]at matches cat, bat; does not match at, bcat
[a-z] Matches a single character from the range in brackets. Example: [a-z]at matches aat, bat, zat; does not match at, bcat, Bat
[^abc] or [!abc] Matches a single character not in the set in brackets. Example: [!cb]at matches hat, mat; does not match cat, bat
[^a-z] or [!a-z] Matches a single character not in the range in brackets. Example: [!a-z]at matches Aat, Bat, 1at; does not match aat, bat
In the bash command line [!abc] needs to be escaped to [\!abc]
Extended syntax
In addition to the basic syntax, bash also supports some extended syntax of glob, mainly including three types.
Brace Expansion
globstar
extglob
The definitions and descriptions of the three extension grammars are as follows:
The wildcards, with descriptions and examples:
{x,y,...} Brace expansion: expands the comma-separated alternatives, and supports nested braces. Example: a.{png,jp{,e}g} expands to a.png, a.jpg, a.jpeg
** globstar: matches all files and any number of directory levels; if ** is followed by /, it matches only directories and does not include hidden directories. Example: src/** matches src/a, src/b/c; does not match src/.hide/a
?(pattern-list) Matches the given patterns 0 or 1 time. Example: a.?(txt|bin) matches a., a.txt, a.bin; does not match a
*(pattern-list) Matches the given patterns 0 or more times. Example: a.*(txt|bin) matches a., a.txt, a.txtbin; does not match a
+(pattern-list) Matches the given patterns 1 or more times. Example: a.+(txt|bin) matches a.txt, a.bin, a.txtbin; does not match a.
@(pattern-list) Matches exactly one of the given patterns. Example: a.@(txt|bin) matches a.txt, a.bin; does not match a., a.txtbin
!(pattern-list) Matches anything except the given patterns. Example: a.!(txt|bin) matches a., a.jpg; does not match a.txt, a.bin
pattern-list is a set of patterns delimited by |, for example: abc|a?c|ac*
Differences from regexp
The glob pattern is mainly used to match file paths, and of course it can also be used to match strings, but the ability to match strings is much weaker than regexp. Since glob pattern and regexp have the same metacharacter, but the meaning is different, it is easy to cause confusion. In order to avoid confusion, the glob pattern is converted into the corresponding regexp representation to distinguish their similarities and differences.
glob, the loose regexp equivalent, and the exact regexp equivalent:
* corresponds to .* ; exactly: ^(?!\.)[^/]*?$
? corresponds to . ; exactly: ^(?!\.)[^/]$
[a-z] corresponds to [a-z] ; exactly: ^[a-z]$
glob matches the entire string, while regexp matches substrings by default; for regexp to match the entire string, ^ and $ must be specified explicitly. The (?!\.) in the regular expressions above means hidden files are not matched.
Telegraf architecture
Chain-of-responsibility design pattern
Telegraf is a typical Pipeline architecture that applies the idea of the chain-of-responsibility design pattern.
Simply put, the key point of this design pattern lies in the word "chain". The functions of the code are split into independent components, and can be flexibly combined according to needs.
The design pattern is for code. Here, we will focus on Telegraf's Pipeline architecture.
Pipeline architecture
Telegraf abstracts the data processing flow into a pipeline composed of multiple plugins, connected by channels (which can be understood as first-in-first-out queues). This architecture shows at least two advantages.
Plugins are loosely coupled: a plugin can ignore how the internal logic of the previous plugin is implemented; they only need to pass data in the agreed format.
The processing flow is configurable: the decision of which plugins combine with which can be postponed to runtime, rather than the developer having to hard-code the various processing flows during development. It is like handing the user a pile of building blocks.
In addition to configuring plug-ins and combination order, the usual pipeline architecture also includes a layer of context configuration, so the final common Pipeline architecture is shown in the figure below.
Telegraf implementation
(1) Architecture angle
Telegraf has designed 4 types of plug-ins internally. They must be combined in a specific order.
Input plugins
Processor plugins
Aggregator plugins
Output plugins
Moreover, the framework makes specific guarantees about how values are passed between input plugins, processor plugins, aggregator plugins, and output plugins.
All input plugins put their data into the same channel.
All processor plugins pass data along in sequence (the order must be specified in the configuration file, otherwise the processors are combined in random order).
The channel in front of the aggregators copies the data to every aggregator plugin; however, Telegraf also provides metric filters for plugins, so a plugin can selectively receive only part of the data.
The channel before the outputs likewise copies the data to all output components, which can also receive selectively by using filters.
(2) Performance angle
Telegraf is developed in Go. When using the built-in components, each plugin runs as an independent goroutine (coroutine, a user-level, lightweight thread).
Integrate external plugins not provided by the official
Write an input plug-in to view the number of files in python (exec version)
Writing python scripts
You can find a place to store python scripts in a centralized manner. Currently, for convenience, we will put the python script in the /opt/module/telegraf_conf directory.
Create our first python file in this directory.
vim dir_num_input_exec.py
Type the following.
import glob
import sys

# Get the number of files in a given directory
# Define the output template string (no timestamp: Telegraf fills it in automatically)
template = "PathFileNum,name=test num={num}"
# Get the first command-line argument: the path pattern to monitor
path = sys.argv[1]
# Use glob to match files and count the matches
path_file_num = len(glob.glob(path))
# Apply the template
data = template.format(num=path_file_num)
print(data)
sys.exit(0)
Program explanation
sys.argv[1] gets the first command-line argument; here it is the path pattern we want to monitor.
glob.glob(path): the glob library ships with Python and matches files on the operating system. For example, /home/atguigu/*.log gives the list of all files ending in .log under the /home/atguigu/ directory; calling len() on that list gives its length.
template.format(num=path_file_num): the final value of this line is data that conforms to Telegraf's format. In our program, the following data will be returned. We did not declare the timestamp in the string, because Telegraf will fill it in for us automatically.
PathFileNum,host=hadoop102,name=test num=0
print(data): prints, that is, outputs the data to stdout (standard output).
sys.exit(0): for the operating system, exiting with 0 means the program ran successfully and hit no exceptions. Here we declare it explicitly in code, though it is fine to omit. Usually you declare it explicitly in unreliable scenarios: for example, an interface returns data I do not want; the program actually exits normally, but I want to mark this situation as an error, so I can write a conditional statement and call sys.exit(1) there.
Write Telegraf configuration file
Create example_dir_num_input_exec.conf.
vim example_dir_num_input_exec.conf
Type the following.
[agent]
interval="3s"
flush_interval="5s"
[[inputs.exec]]
commands = ["python3 /opt/module/telegraf_conf/dir_num_input_exec.py /home/atguigu/*"]
data_format = "influx"
[[outputs.file]]
files = ["stdout"]
Configuration explanation:
Here we mainly explain commands. The parameter we finally passed in is /home/atguigu/*, which means counting the number of all files in the /home/atguigu/ directory.
Run Telegraf
Run the following command and observe the console output.
telegraf --config example_dir_num_input_exec.conf
You can see that our output component printed the data we want, and a timestamp was added for us.
Now, you can try to create a file under the /home/atguigu/ path to observe the changes in the data.
Create files and observe data changes
When Telegraf is running, start a new terminal and execute the following command to create a file under the /home/atguigu/ path.
touch /home/atguigu/haha
Back to the original console, you can see that the data has changed. And by observing the timestamp, you can find that the indicator data is counted every 3 seconds.
Write an input plugin to view the number of files in python (execd version)
We mentioned earlier the difference between exec and execd: one is invoked once each time the collection interval fires, while the other manages the external program as a daemon. Now we use Python to write an execd version.
(1) Write python script
Or create the dir_num_input_execd.py file in the /opt/module/telegraf_conf directory.
vim dir_num_input_execd.py
Type the following.
import glob
import sys

# Define the output template string
template = "PathFileNum,name=test num={num}"
# Get the first command-line argument: the path pattern to monitor
path = sys.argv[1]
# Each line arriving on standard input is a signal to collect once
for _ in sys.stdin:
    # Use glob to match files and count the matches
    path_file_num = len(glob.glob(path))
    # Construct the data
    data = template.format(num=path_file_num)
    # Write the data to standard output
    print(data)
    # Be sure to manually flush the buffer
    # (besides sys.stdout.flush(), you can also pass flush=True to print())
    sys.stdout.flush()
Program explanation:
for _ in sys.stdin: this is effectively an endless loop that blocks the program; the program waits for standard input before continuing. But who provides the standard input here? That will be explained carefully below.
sys.stdout.flush(): manually flushes the buffer. When you print a string with the print() function, the string does not appear on the console directly; it first enters a buffer and is only printed once the buffer fills up or the program exits. When we wrote the exec version earlier, we did not need to flush manually after print(), because the program exits right after running and the buffer is flushed automatically on exit. The execd version, however, runs as a daemon: to get the data immediately, the buffer must be flushed manually.
(2) Write Telegraf configuration file
Create the example_dir_num_input_execd.conf file.
vim example_dir_num_input_execd.conf
Type the following.
[agent]
interval = "3s"
flush_interval = "5s"
[[inputs.execd]]
command = ["python3","/opt/module/telegraf_conf/dir_num_input_execd.py", "/home/atguigu/*"]
data_format = "influx"
signal = "STDIN"
[[outputs.file]]
files = ["stdout"]
Configuration explanation:
command: this option is not the same as commands in the exec plugin. Although command in execd is also an array, the array actually splits up one complete command: at runtime the elements are joined with spaces into a single command line.
signal: literally, a signal. This is a clever design in the execd plugin: when the collection time arrives, execd sends a signal to the daemon. Setting signal to STDIN means Telegraf writes to the standard input of the Python process every time the collection interval elapses. That is the purpose of the for _ in sys.stdin loop in the earlier Python script: it lets the Python process know it is time to collect metrics, so Telegraf's interval setting drives the Python process. Without this mechanism, you would have to implement your own timing inside the script (for example with time.sleep()) and read the interval yourself.
(3) Run Telegraf
Run the following command and observe the console output.
telegraf --config example_dir_num_input_execd.conf
It can be seen that the program successfully observed the number of files in the /home/atguigu/ directory.
(4) Create a file and observe data changes
Similarly, while the Telegraf program is running, open another new terminal, use the touch command to create a new file, and then watch the metric data change in Telegraf.
touch /home/atguigu/haha2
The data changes, showing that the execd version of the input plugin also works.
Write an external processing plugin in python (execd version)
Telegraf also provides a processors.execd plugin (note that there is no plain exec version for processors). This plugin lets us take data out of Telegraf and apply arbitrary transformations to it. Here we do the simplest possible thing: prepend atguigu to every record, which effectively changes each record's measurement (measurement name).
(1) Write python script
In the /opt/module/telegraf_conf directory, create the add_atguigu_processor.py file.
vim add_atguigu_processor.py
Type the following:
import sys
# Loop to get standard input
for line in sys.stdin:
# Add something to the input and then output it
print("atguigu"+line,end="",flush=True)
Program explanation:
for line in sys.stdin: loops waiting on standard input. The upstream input plugin (or processor) writes metric data into the current Python program via standard output; we simply read it line by line.
print("atguigu"+line, end="", flush=True): prepends atguigu to the input line. end="" avoids adding an extra newline (each line read from stdin already ends with one), and flush=True flushes the output immediately.
(2) Write Telegraf configuration file
Make a copy of the CPU example configuration from the earlier section (assumed here to be named example_cpu.conf) and name the copy example_processor_python.conf:
cp example_cpu.conf example_processor_python.conf
Type the following; the processors.execd block is the newly added part.
[agent]
interval = "3s"
flush_interval = "5s"
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
core_tags = false
[[processors.execd]]
command = ["python3","/opt/module/telegraf_conf/add_atguigu_processor.py"]
[[outputs.file]]
files = ["stdout"]
(3) Run Telegraf and observe data changes
Use the following command to run the Telegraf program. Observe console data output
telegraf --config ./example_processor_python.conf
Every output record originally began with cpu; now it begins with atguigucpu.
The mission is completed!
Implement plug-ins based on framework using Go language
Prepare the project
(1) Git clone Telegraf source code specified version
Note that we are currently using the v1.23.3 release. For secondary development, in principle you should start from the source code as of the 1.23.3 release, so use -b to specify the corresponding tag when cloning.
git clone -b v1.23.3 https://github.com/influxdata/telegraf.git
After clone is completed, there will be an additional telegraf subdirectory in the directory.
(2) Open the project using GoLand
After opening the project, GoLand needs to index the project files and analyze dependencies, which takes a while. You can wait for the progress bar at the lower right to finish.
(3) Configure GoLand project
Click File > Settings to enter the settings page.
GOPATH
If you are using Go 1.11 or later, GOPATH is where downloaded dependencies live when Go Module mode is turned on. We already set it up once in the environment variables, and GoLand can now pick up that environment variable.
In addition, you can set a project-specific GOPATH to hold the current project's dependencies.
Go Modules & Environment
Since Go introduced the Go Module package management tool in 1.11, it has been the recommended way to manage packages. Within the project scope, GoLand enables it for us by default, so no action is needed. However, when Go downloads dependency libraries it accesses GitHub, which cannot always be reached directly from mainland China, so use GoLand to set a project-level environment variable that points GOPROXY at a mirror (goproxy.cn is a common choice):
GOPROXY=https://goproxy.cn,direct
Download dependencies
Click on the go.mod file under the project directory.
The go.mod file lists the dependencies the Telegraf project requires; the first require block holds the direct dependencies. GoLand marks them in red, meaning we have not yet downloaded these dependencies.
Open Terminal below the GoLand window. Use the following command to download the dependency.
go mod download -x
-x means printing the download process to the console.
After the download is complete, wait for GoLand to reindex the files; this may take a while.
The sign that the dependencies downloaded successfully is that the dependency entries in go.mod have turned green.
Example: Implement an Input plugin that generates random numbers
Create a path to the custom plugin
First, open the plugins/inputs/ directory of the project, which contains all the input plugins. We create a subdirectory named atguigu_random; it will hold the code of our random-number plugin.
Find templates for input plugin on github
Visit Telegraf's repository on GitHub (preferably switching to the v1.23.3 tag):
https://github.com/influxdata/telegraf/tree/v1.23.3
Scroll down to the project's README, where a development guide introduces how to develop plugins. Click Input Plugins.
After clicking through, you can see they have already spelled out the development steps.
Scroll down and you will find sample code for an input plugin.
With that, we can happily work as copy-paste engineers.
In the atguigu_random directory, create the atguigu_random.go file and copy the following code into it.
Note: change the package name from simple to atguigu_random.
//go:generate ../../../tools/readme_config_includer/generator
// package simple
package atguigu_random

import (
	_ "embed"

	"github.com/influxdata/telegraf"
	"github.com/influxdata/telegraf/plugins/inputs"
)

// DO NOT REMOVE THE NEXT TWO LINES! This is required to embed the sampleConfig data.
//go:embed sample.conf
var sampleConfig string

type Simple struct {
	Ok  bool            `toml:"ok"`
	Log telegraf.Logger `toml:"-"`
}

func (*Simple) SampleConfig() string {
	return sampleConfig
}

// Init is for setup, and validating config.
func (s *Simple) Init() error {
	return nil
}

func (s *Simple) Gather(acc telegraf.Accumulator) error {
	if s.Ok {
		acc.AddFields("state", map[string]interface{}{"value": "pretty good"}, nil)
	} else {
		acc.AddFields("state", map[string]interface{}{"value": "not great"}, nil)
	}
	return nil
}

func init() {
	inputs.Add("simple", func() telegraf.Input { return &Simple{} })
}
Write your own plugin's logic into the functions in this file, and the plugin will work properly.
Next, we walk through the template and develop our plugin.
Plug-in development
(1) The go:generate directive at the top
During the compilation stage, it helps the framework integrate the help documentation automatically: the generator finds the README.md under your package, locates the toml @sample.conf code block in it, and performs some processing to generate the documentation.
Whether you write it or not does not affect whether compilation ultimately passes, but if you want to become a source code contributor to Telegraf, you should write it according to the community's conventions.
(2) Two lines that cannot be deleted in the middle
//go:embed sample.conf
This line looks like a comment, but it works more like an annotation in Java: it affects the program's compilation behavior. During the compilation phase, the sample.conf file under the package is read and its contents are copied into the sampleConfig variable as a string.
Therefore, as the comment above it demands, not only must this line not be deleted, there must also be a sample.conf file in the package.
(3) Create sample.conf
Create a sample.conf file under the atguigu_random package.
(4) Write an example
Its contents depend on which configuration options you want to use to control the program's behavior.
I decided to give the following two parameters.
size: generate several random number data in each interval time.
[range]: range is a sub-configuration block. In this block we set the range from which random numbers are generated, for example 0-10 or 0-100.
min: The minimum value of the range of random numbers
max: The maximum value of the range of random numbers
max must be larger than min.
Finally, the configuration file I wrote is as follows.
[[inputs.atguigu_random]]
size = 1
[inputs.atguigu_random.range]
min = 0
max = 10
We give explicit values for size, min, and max here, which effectively documents the plugin's default values.
Now we can try it.
(5) Register the plugin name into the inputs list
Now the skeleton of our plugin package is basically in place, and we can register our own package into Telegraf's plugin list.
Under plugins/inputs/all, there is an all.go file that records all the input plugins available to Telegraf.
The way to register a plugin is to import your own package. Add the following line to that import list:
_ "github.com/influxdata/telegraf/plugins/inputs/atguigu_random"
If you develop with GoLand, you may notice that after you type the package name at the bottom of the list, it disappears a moment later: it has actually been moved toward the front of the import list. Go's designers wanted everyone to write code in the same format, rather than argue over things like whether a curly brace belongs at the end of a line or the start of the next, so they shipped the gofmt tool. Building on that, GoLand sorts imported packages alphabetically (a-z), so your package may end up nearer the front.
(6) init( )
The logic of this function basically does not need to change. It registers the plugin instance into the inputs list when telegraf starts. The first parameter is the plugin's name; do change this one, since it affects telegraf --input-filter, logs, and other program behavior.
The second parameter is a function called the creator. Ours simply returns &Simple{}; of course, you can also assign values to the struct's fields here, which amounts to setting default values.
The final init() implementation is as follows.
func init() {
	inputs.Add("atguigu_random", func() telegraf.Input { return &Simple{} })
}
(7) Try to compile and see if there is any successful registration of the plug-in
Telegraf is compiled with the make tool. To be precise, the compiling is still done by the Go compiler; make merely encodes the build steps.
In the root directory of the project, use the following command to compile telegraf.
make all
After compilation completes, an executable file named telegraf will appear in the project root directory. This is the telegraf command we can use.
We can now check whether telegraf loads our sample configuration. If it loads successfully, it means our plugin has been compiled into the telegraf executable and the project has no structural problems.
Use the following command to check whether atguigu_random appears in the input list.
./telegraf --input-list | grep atguigu
If there is atguigu_random, it means that the framework now recognizes our plug-in.
You can also use the following command to check that telegraf can print our sample configuration back.
./telegraf --input-filter atguigu_random config
The output should look like the figure below.
Now, we can further enrich the logic of the plug-in.
(8) Parsing the configuration file
Configuration file parsing is done for us by the Telegraf framework. We only need to declare, inside the plugin, a structure that the configuration file contents can be mapped onto.
When the parser encounters a sub-configuration block, that block needs to be mapped to a separate structure in the program.
In atguigu_random.go:
Create a new type named RangeConf.
type RangeConf struct {
	Max int `toml:"max"`
	Min int `toml:"min"`
}
Rewrite the Simple type in the template
type Simple struct {
	Size  int `toml:"size"`
	Range *RangeConf
	Log   telegraf.Logger `toml:"-"`
}
Code explanation:
A capitalized first letter in Go basically means what public means in Java: the identifier can be accessed from outside the package.
`toml:"xxx"` specifies the corresponding option name in the configuration file. If the field name in the struct, converted to snake_case, already matches the option in the configuration file, the tag can actually be omitted. If they do not match, you must map them manually with `toml:"xxx"`.
*RangeConf is a pointer to the RangeConf type; passing pointers around avoids copying the struct.
(9) (s *Simple) Init( ) error: verify that the configuration is valid
The (s *Simple) Init( ) error function is not used for configuration file parsing; by the time it is called, the configuration has already been parsed. It is the right place for configuration validity checks and initialization work.
In this function, we need to do two things:
Configuration validity check: verify that Max is larger than Min, and return an error if not.
Seed the random number generator: set the seed to the current timestamp.
The final (s *Simple) Init( ) error is implemented as follows:
// Init is for setup, and validating config.
// Requires "errors", "math/rand" and "time" added to the import list.
func (s *Simple) Init() error {
	if s.Range.Min >= s.Range.Max {
		return errors.New("max should be larger than min")
	}
	// Seed the random number generator with the current timestamp.
	rand.Seed(time.Now().Unix())
	return nil
}
(10) (s *Simple) Gather(acc telegraf.Accumulator) error: send data
Through the acc parameter, we can call the AddFields method to send data to the downstream pipeline. AddFields takes 4 parameters:
measurement: the measurement name of the data
fields: must be of type map[string]interface{}; the keys must be strings, the values can be of any type
tags: of type map[string]string; both keys and values must be strings
t: of type time.Time. This parameter is optional; if you omit it, Telegraf fills in the timestamp automatically.
Final implementation:
func (s *Simple) Gather(acc telegraf.Accumulator) error {
	for i := 1; i <= s.Size; i++ {
		// Draw a random number from [min, max) and send it downstream.
		acc.AddFields("atguigu_random",
			map[string]interface{}{"num": s.Range.Min + rand.Intn(s.Range.Max-s.Range.Min)},
			nil)
	}
	return nil
}
(11) Compile again
Now that the logic of the plug-in has been written, use the following command to recompile it.
rm ./telegraf
make all
Wait for the compilation to end.
(12) Verify the plug-in effect
Create a configuration file; the name used here (example_atguigu_random.conf) is our own choice:
vim example_atguigu_random.conf
Type the following:
[[inputs.atguigu_random]]
size = 5
[inputs.atguigu_random.range]
min = 15
max = 10
Use the following command to run Telegraf
./telegraf --config ./example_atguigu_random.conf --test
As you can see, the plugin is wired in: it detects that min is larger than max and correctly reports the error.
Revise the configuration so that min is smaller than max.
[[inputs.atguigu_random]]
size = 5
[inputs.atguigu_random.range]
min = 10
max = 20
Run again with the following command:
./telegraf --config ./example_atguigu_random.conf --test
This time, our plugin was successfully run!
Telegraf combined with Prometheus
What is Prometheus
Prometheus is server software designed specifically for monitoring scenarios, with a time series database implemented internally. It is also very popular at present. Let's briefly introduce how Prometheus works.
Generally speaking, a monitoring target of Prometheus needs to expose an interface to the outside; accessing this interface returns the program's internal metric data, and the data must conform to the Prometheus data format.
But suppose I want to see the number of files under a certain path on server host1: who exposes that data to the outside? Someone has to implement an HTTP service that counts the files in a local directory and exposes an API, waiting for Prometheus to scrape it. Components that implement this kind of function are called Exporters in the Prometheus ecosystem; a minimal sketch follows below.
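To make the Exporter idea concrete, here is a minimal sketch in Python using only the standard library (the metric name dir_file_num, the watched path, and port 9200 are all illustrative choices, not any official exporter):
import glob
from http.server import BaseHTTPRequestHandler, HTTPServer
WATCH_PATH = "/home/atguigu/*"  # illustrative path to watch
class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_response(404)
            self.end_headers()
            return
        # Count matching files on every scrape: Prometheus works pull-style.
        num = len(glob.glob(WATCH_PATH))
        body = 'dir_file_num{path="/home/atguigu"} %d\n' % num
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body.encode())
if __name__ == "__main__":
    HTTPServer(("", 9200), MetricsHandler).serve_forever()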
Prometheus officially, together with its community, provides a large number of open source, out-of-the-box Exporters; the ecosystem is excellent.
Exporter Demo
We use the Node Exporter (which exports host data) provided by Prometheus.
Reference: https://prometheus.io/docs/guides/node-exporter/
(1) Download Node Exporter (expose host running information)
Go to the server and download it using the wget command.
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
(2) Unzip to the target path
Use the following command to unzip it to the target path:
tar -zxvf node_exporter-1.3.1.linux-amd64.tar.gz -C /opt/module/
(3) Start Node Exporter
cd into the extracted directory and look at what's inside.
cd /opt/module/node_exporter-1.3.1.linux-amd64
Use the following command to start the node exporter directly
./node_exporter
(4) View the contents in the indicator interface
node_exporter listens on port 9100 by default. Open a browser and visit http://hadoop102:9100/metrics. As shown in the figure below, this is the data Prometheus can scrape.
Prometheus data format
For the same reason as with InfluxDB's line protocol, Prometheus only recognizes data written in its own format, so the data an Exporter exposes must follow the Prometheus format.
There is also a protocol called OpenMetrics whose popularity is rising; it is based on the Prometheus data specification.
Here is a brief introduction to the data format of Prometheus:
Metric name: required and indispensable.
Tag set: a group of key-value pairs; the key is the tag name and the value is the tag content, which must be a string. The metric name and the tag set together identify a series.
The first space: separates the metric name & tag set from the value.
Value: a floating-point number by default.
The second space: separates the value from the timestamp; if the timestamp is omitted, this space is omitted too.
Timestamp: an int64 Unix timestamp, in milliseconds.
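Putting the parts together, a complete sample line looks like this (the metric, tags, value, and timestamp are illustrative):
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67 1692000000000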
Disadvantages of Exporter mode
The drawback of the Exporter model is that as the number of monitoring targets grows, so does the burden of managing them.
Suppose one machine hosts many services whose metrics I want to scrape: MySQL, hardware resources such as CPU, memory and disk, MongoDB, the memory usage of a SpringBoot application, and so on. For each monitoring target, I have to download a dedicated Exporter and keep it running.
Each Exporter is an independent process; the requirements listed above mean installing 6 Exporters and opening 6 ports, which is troublesome to manage, and on the Prometheus side you must also configure 6 scrape targets.
By contrast, Telegraf's plugin model is friendlier and more convenient for unified management.
So, for convenience, we combine Telegraf and Prometheus. With Telegraf, one configuration file manages multiple input components, each of which is a lightweight thread with lower overhead. Prometheus, in turn, only has to configure a single scrape target. Clean and neat.
Example: Monitor CPU with Telegraf and expose it to Prometheus data format
To expose data in Prometheus format, just add one more output plugin to the configuration file. This time, we build on the earlier example.
(1) Write configuration files
cd to the target path:
cd /opt/module/telegraf_conf
Make a copy (the source file is the CPU example from earlier, assumed here to be named example_cpu.conf):
cp /opt/module/telegraf_conf/example_cpu.conf /opt/module/telegraf_conf/example_cpu_prometheus.conf
Edit example_cpu_prometheus.conf:
vim ./example_cpu_prometheus.conf
Type the following content; the outputs.prometheus_client block is the newly added part this time:
[agent]
interval = "3s"
flush_interval = "5s"
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
core_tags = false
[[outputs.file]]
files = ["stdout"]
[[outputs.prometheus_client]]
listen = ":9273"
For more prometheus_client configuration options, see the plugin's README in the Telegraf repository: https://github.com/influxdata/telegraf/tree/release-1.23/plugins/outputs/prometheus_client
(2) Run Telegraf
Use the following command to run the telegraf program.
telegraf --config ./example_cpu_prometheus.conf
(3) Observe the plug-in loading information
There are now two output plugins: file (writing to the console via standard output) and prometheus_client.
(4) Observe the console output
The console prints a dense stream of output, indicating that the input data is reaching the outputs smoothly.
(5) View the exposed Prometheus data in the browser
Visit http://hadoop102:9273/metrics to view the exposed Prometheus data.
If you see a page like the one below, it worked.