Logstash is an open source tool for log collection, often used together with Elasticsearch and Kibana to form the ELK Stack (now known as the Elastic Stack). Logstash is very flexible and is driven by a configuration file (usually a .conf file) that defines the input, processing, and output of data. For Java logs, a common scenario is parsing log files generated by a Java application (e.g., log files produced with Log4j or Logback).
1. Method 1: Logstash configuration example
The following is a Logstash configuration example that assumes we have a Java application whose log file follows a common logging format, such as Logback's typical layout (a timestamp, thread name, log level, logger name, and message).
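For reference, a Logback encoder pattern that produces this kind of layout might look like the following. This is an illustrative assumption, not necessarily your appender's default; note that a full date is needed for the TIMESTAMP_ISO8601 grok pattern used below:
<encoder>
    <pattern>%d{yyyy-MM-dd HH:mm:ss,SSS} [%thread] %level %logger{36} - %msg%n</pattern>
</encoder>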
First, we need a Logstash configuration file, for example a file named java_log_pipeline.conf. The following is an example of this configuration file:
input {
  file {
    # Path to the log file(s); the file input expects file paths or globs,
    # not a bare directory, so adjust this to where the application writes logs
    path => "/path/to/your/java/application/logs/*.log"
    # Read files from the beginning the first time they are discovered
    start_position => "beginning"
    # Character encoding to use when reading the file
    codec => plain { charset => "UTF-8" }
    # Do not persist the read position; Logstash will re-read files on restart
    # (convenient for testing, usually not what you want in production)
    sincedb_path => "/dev/null"
    # ignore_older is disabled by default; set it (in seconds) to skip files
    # that have not been modified recently
    # ignore_older => 86400
  }
}
filter {
  # Use the grok plugin to parse each log line into structured fields
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{DATA:thread}\] %{LOGLEVEL:level} %{DATA:logger} - %{GREEDYDATA:message}" }
    # Replace the original message field with the captured message text
    overwrite => ["message"]
  }
  # Parse the extracted timestamp and use it as the event's @timestamp
  date {
    match => ["timestamp", "ISO8601"]
    target => "@timestamp"
  }
  # Convert the log level to lowercase (optional)
  mutate {
    lowercase => ["level"]
  }
}
output {
  # Send processed events to Elasticsearch, one index per day
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "java-app-logs-%{+YYYY.MM.dd}"
    # document_type is deprecated since Elasticsearch 7 and can be omitted
    # document_type => "_doc"
    # If Elasticsearch requires a username and password
    # user => "your_username"
    # password => "your_password"
  }
  # Also print events to the console for debugging purposes
  stdout {
    codec => rubydebug
  }
}
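Once the configuration file is saved, Logstash can be run against it. A minimal invocation, assuming a standard archive installation (paths vary by install method):
# Validate the configuration without starting the pipeline
bin/logstash -f java_log_pipeline.conf --config.test_and_exit
# Start the pipeline
bin/logstash -f java_log_pipeline.conf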
Caveats:
(1) File path: the path field needs to be changed to the path where our Java application actually writes its log files.
(2) Timestamp format: if the timestamp format in the logs is not ISO8601, we need to replace TIMESTAMP_ISO8601 in the grok pattern with the corresponding pattern.
(3) Elasticsearch configuration: if our Elasticsearch service is not running on localhost, or the port is not 9200, the hosts field needs to be modified accordingly.
(4) Debugging during testing: the stdout output helps us verify that Logstash is parsing the logs correctly.
This configuration example first reads the log file with the file plugin, then uses the grok plugin to parse each log message and break it down into more specific fields (e.g., timestamp, log level, message). After that, the date plugin converts the timestamp field into a format Logstash understands and uses it as the event's timestamp. Finally, the elasticsearch plugin sends the processed logs to Elasticsearch for storage and further analysis. At the same time, the stdout plugin prints events to the console for debugging purposes.
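For illustration, here is a hypothetical log line in this format, together with the fields the grok pattern above would extract from it:
# Input line (hypothetical):
2024-05-01 12:00:00,123 [main] INFO com.example.App - Application started

# Fields extracted by grok (level is then lowercased by the mutate filter):
timestamp => "2024-05-01 12:00:00,123"
thread    => "main"
level     => "INFO"
logger    => "com.example.App"
message   => "Application started"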
2. Method 2: Logstash Input, Filtering and Output Configuration
In addition to the previously mentioned file-based input configurations, Logstash supports a variety of other types of input configurations that can be selected and adapted to our specific needs and environment. Below are some examples of common Logstash input, filtering, and output configurations that can be combined with Java log processing:
2.1 Input configuration
(1) TCP Input:
If we want Logstash to receive logs from a Java application over a TCP port (for example, a Java application configured with Log4j or Logback to send logs to a TCP socket), we can use the TCP Input plugin.
input {
  tcp {
    port => 5000
    # If the Java application sends JSON-formatted logs
    codec => json_lines
    # Or use the plain codec if the logs are not in JSON format
    # codec => plain { charset => "UTF-8" }
  }
}
Note: if the Java application sends logs in a non-JSON format and we want to parse them with the grok plugin, we should keep codec => plain and make sure the log format matches the grok pattern.
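On the Java side, a common way to ship JSON logs to this TCP input is the logstash-logback-encoder library. A minimal logback.xml sketch, assuming that dependency is on the classpath and that Logstash listens on localhost:5000:
<configuration>
  <appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
    <!-- Host and port of the Logstash TCP input -->
    <destination>localhost:5000</destination>
    <!-- Emits one JSON object per line, which matches the json_lines codec -->
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>
  <root level="INFO">
    <appender-ref ref="LOGSTASH"/>
  </root>
</configuration>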
(2) Beats Input:
Logstash can receive data from Filebeat or other Beats products via the Beats input plugin. This approach is particularly well suited to situations where logs need to be collected from multiple sources; Filebeat can efficiently collect, compress, and forward logs on each host.
In the Logstash configuration, the Beats input only needs a port to listen on (usually 5044, but this can be customized), because Beats acts as a client sending data to that port. Correspondingly, we need to specify Logstash's address and port in the Filebeat configuration.
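A minimal sketch of both sides, assuming Logstash and Filebeat run on the same host and use the conventional port 5044:
# Logstash side: listen for connections from Beats clients
input {
  beats {
    port => 5044
  }
}
And the corresponding Filebeat side (filebeat.yml; the input type is "log" on older Filebeat versions, "filestream" on newer ones):
filebeat.inputs:
  - type: log
    paths:
      - /path/to/your/java/application/logs/*.log
output.logstash:
  hosts: ["localhost:5044"]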
2.2 Filtering Configuration
In addition to the grok plugin mentioned earlier, Logstash offers other filtering plugins, such as date, mutate, json, etc., for further processing and conversion of log data.
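For example, a mutate filter can rename, drop, or tag fields after parsing (a minimal sketch; the field names are illustrative and assume the grok fields from Method 1):
filter {
  mutate {
    # Rename a parsed field to a more descriptive name
    rename => { "logger" => "logger_name" }
    # Remove fields that are not needed downstream
    remove_field => ["host"]
    # Tag events for routing or later filtering
    add_tag => ["java-app"]
  }
}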
JSON Filtering:
If the Java application sends logs in JSON format, we can use the json plugin to parse these logs and extract the JSON fields as separate fields.
filter {
  json {
    # Assume the entire log message is a JSON string
    source => "message"
  }
}
Note: if the log message itself is already a JSON object, the above configuration works directly. However, if the log message merely contains a JSON string embedded in other text, we may need to first use the grok plugin to extract that string before parsing it with the json plugin.
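A sketch of that two-step approach, assuming a hypothetical line layout of a timestamp followed by a JSON payload:
filter {
  # Extract the JSON portion of the line into its own field
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:json_payload}" }
  }
  # Parse the extracted string; the parsed keys land under the payload field
  json {
    source => "json_payload"
    target => "payload"
  }
}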
2.3 Output configuration
In addition to Elasticsearch, Logstash supports a variety of output configurations such as file, standard output, HTTP, Kafka, and more.
(1) File output:
If we need to save the processed logs to a file, we can use the file output plugin.
output {
  file {
    # The file output expects a file path (not a directory); adjust as needed
    path => "/path/to/your/output/logstash-output.log"
    codec => line { format => "Custom format: %{message}" }
  }
}
Note: the format option here is optional and defines the format of each output line. If not specified, Logstash will use the default format.
(2) Standard output:
During debugging, we may want to output logs to the console. This can be done with the stdout plugin.
output {
  stdout { codec => rubydebug }
}
The rubydebug codec provides an easy-to-read, formatted output that includes all fields of the event.
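For instance, a hypothetical parsed event printed with rubydebug might look like:
{
       "@timestamp" => 2024-05-01T12:00:00.123Z,
          "message" => "Application started",
            "level" => "info",
           "thread" => "main",
           "logger" => "com.example.App"
}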
In summary, Logstash configuration is very flexible and can be customized to our specific needs. The examples above cover some common configuration options, but we need to select and adjust them according to our actual environment and requirements.