Logstash is an open source tool for log collection, often used together with Elasticsearch and Kibana to form the ELK Stack (now known as the Elastic Stack). Logstash is very flexible and is driven by a configuration file (usually a .conf file) that defines the input, processing, and output of data. For Java logs, a common scenario is parsing log files generated by a Java application (e.g., log files produced with Log4j or Logback).
1. Method 1: Logstash configuration example
The following is a Logstash configuration example that assumes we have a Java application whose log file follows a common logging format, such as Logback's typical layout (a timestamp, thread name, log level, logger name, and message).
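For reference, a Logback encoder pattern that produces this kind of layout might look like the following. This is an illustrative assumption, not necessarily your appender's default; note that a full date is needed for the TIMESTAMP_ISO8601 grok pattern used below:
<encoder>
    <pattern>%d{yyyy-MM-dd HH:mm:ss,SSS} [%thread] %level %logger{36} - %msg%n</pattern>
</encoder>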
First, we need a Logstash configuration file, for example a file named java_log_pipeline.conf. The following is an example of this configuration file:
input {
  file {
    # Path to the log file(s); the file input expects file paths or globs,
    # not a bare directory, so adjust this to where the application writes logs
    path => "/path/to/your/java/application/logs/*.log"
    # Read files from the beginning the first time they are discovered
    start_position => "beginning"
    # Character encoding to use when reading the file
    codec => plain { charset => "UTF-8" }
    # Do not persist the read position; Logstash will re-read files on restart
    # (convenient for testing, usually not what you want in production)
    sincedb_path => "/dev/null"
    # ignore_older is disabled by default; set it (in seconds) to skip files
    # that have not been modified recently
    # ignore_older => 86400
  }
}
filter {
  # Use the grok plugin to parse each log line into structured fields
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{DATA:thread}\] %{LOGLEVEL:level} %{DATA:logger} - %{GREEDYDATA:message}" }
    # Replace the original message field with the captured message text
    overwrite => ["message"]
  }
  # Parse the extracted timestamp and use it as the event's @timestamp
  date {
    match => ["timestamp", "ISO8601"]
    target => "@timestamp"
  }
  # Convert the log level to lowercase (optional)
  mutate {
    lowercase => ["level"]
  }
}
output {
  # Send processed events to Elasticsearch, one index per day
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "java-app-logs-%{+YYYY.MM.dd}"
    # document_type is deprecated since Elasticsearch 7 and can be omitted
    # document_type => "_doc"
    # If Elasticsearch requires a username and password
    # user => "your_username"
    # password => "your_password"
  }
  # Also print events to the console for debugging purposes
  stdout {
    codec => rubydebug
  }
}
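Once the configuration file is saved, Logstash can be run against it. A minimal invocation, assuming a standard archive installation (paths vary by install method):
# Validate the configuration without starting the pipeline
bin/logstash -f java_log_pipeline.conf --config.test_and_exit
# Start the pipeline
bin/logstash -f java_log_pipeline.conf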
Caveats:
(1) File path: the path field needs to be changed to the path where our Java application actually writes its log files.
(2) Timestamp format: if the timestamp format in the logs is not ISO8601, we need to replace TIMESTAMP_ISO8601 in the grok pattern with the corresponding pattern.
(3) Elasticsearch configuration: if our Elasticsearch service is not running on localhost, or the port is not 9200, the hosts field needs to be modified accordingly.
(4) Debugging during testing: the stdout output helps us verify that Logstash is parsing the logs correctly.
This configuration example first reads the log file with the file plugin, then uses the grok plugin to parse each log message and break it down into more specific fields (e.g., timestamp, log level, message). After that, the date plugin converts the timestamp field into a format Logstash understands and uses it as the event's timestamp. Finally, the elasticsearch plugin sends the processed logs to Elasticsearch for storage and further analysis. At the same time, the stdout plugin prints events to the console for debugging purposes.
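For illustration, here is a hypothetical log line in this format, together with the fields the grok pattern above would extract from it:
# Input line (hypothetical):
2024-05-01 12:00:00,123 [main] INFO com.example.App - Application started

# Fields extracted by grok (level is then lowercased by the mutate filter):
timestamp => "2024-05-01 12:00:00,123"
thread    => "main"
level     => "INFO"
logger    => "com.example.App"
message   => "Application started"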
2. Method 2: Logstash Input, Filtering and Output Configuration
In addition to the previously mentioned file-based input configurations, Logstash supports a variety of other types of input configurations that can be selected and adapted to our specific needs and environment. Below are some examples of common Logstash input, filtering, and output configurations that can be combined with Java log processing:
2.1 Input configuration
(1) TCP Input:
If we want Logstash to receive logs from a Java application over a TCP port (for example, a Java application configured with Log4j or Logback to send logs to a TCP socket), we can use the TCP Input plugin.
input {
  tcp {
    port => 5000
    # If the Java application sends JSON-formatted logs
    codec => json_lines
    # Or use the plain codec if the logs are not in JSON format
    # codec => plain { charset => "UTF-8" }
  }
}
Note: if the Java application sends logs in a non-JSON format and we want to parse them with the grok plugin, we should keep codec => plain and make sure the log format matches the grok pattern.
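On the Java side, a common way to ship JSON logs to this TCP input is the logstash-logback-encoder library. A minimal logback.xml sketch, assuming that dependency is on the classpath and that Logstash listens on localhost:5000:
<configuration>
  <appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
    <!-- Host and port of the Logstash TCP input -->
    <destination>localhost:5000</destination>
    <!-- Emits one JSON object per line, which matches the json_lines codec -->
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>
  <root level="INFO">
    <appender-ref ref="LOGSTASH"/>
  </root>
</configuration>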
(2) Beats Input:
Logstash can receive data from Filebeat or other Beats products via the Beats input plugin. This approach is particularly well suited to situations where logs need to be collected from multiple sources; Filebeat can efficiently collect, compress, and forward logs on each host.
In the Logstash configuration, the Beats input only needs a port to listen on (usually 5044, but this can be customized), because Beats acts as a client sending data to that port. Correspondingly, we need to specify Logstash's address and port in the Filebeat configuration.
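A minimal sketch of both sides, assuming Logstash and Filebeat run on the same host and use the conventional port 5044:
# Logstash side: listen for connections from Beats clients
input {
  beats {
    port => 5044
  }
}
And the corresponding Filebeat side (filebeat.yml; the input type is "log" on older Filebeat versions, "filestream" on newer ones):
filebeat.inputs:
  - type: log
    paths:
      - /path/to/your/java/application/logs/*.log
output.logstash:
  hosts: ["localhost:5044"]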
2.2 Filtering Configuration
In addition to the grok plugin mentioned earlier, Logstash offers other filtering plugins, such as date, mutate, json, etc., for further processing and conversion of log data.
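For example, a mutate filter can rename, drop, or tag fields after parsing (a minimal sketch; the field names are illustrative and assume the grok fields from Method 1):
filter {
  mutate {
    # Rename a parsed field to a more descriptive name
    rename => { "logger" => "logger_name" }
    # Remove fields that are not needed downstream
    remove_field => ["host"]
    # Tag events for routing or later filtering
    add_tag => ["java-app"]
  }
}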
JSON Filtering:
If the Java application sends logs in JSON format, we can use the json plugin to parse these logs and extract the JSON fields as separate fields.
filter {
  json {
    # Assume the entire log message is a JSON string
    source => "message"
  }
}
Note: if the log message itself is already a JSON object, the above configuration works directly. However, if the log message merely contains a JSON string embedded in other text, we may need to first use the grok plugin to extract that string before parsing it with the json plugin.
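A sketch of that two-step approach, assuming a hypothetical line layout of a timestamp followed by a JSON payload:
filter {
  # Extract the JSON portion of the line into its own field
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:json_payload}" }
  }
  # Parse the extracted string; the parsed keys land under the payload field
  json {
    source => "json_payload"
    target => "payload"
  }
}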
2.3 Output configuration
In addition to Elasticsearch, Logstash supports a variety of output configurations such as file, standard output, HTTP, Kafka, and more.
(1) File output:
If we need to save the processed logs to a file, we can use the file output plugin.
output {
  file {
    # The file output expects a file path (not a directory); adjust as needed
    path => "/path/to/your/output/logstash-output.log"
    codec => line { format => "Custom format: %{message}" }
  }
}
Note: the format option here is optional and defines the format of each output line. If not specified, Logstash will use the default format.
(2) Standard output:
During debugging, we may want to output logs to the console. This can be done with the stdout plugin.
output {
  stdout { codec => rubydebug }
}
The rubydebug codec provides an easy-to-read, formatted output that includes all fields of the event.
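For instance, a hypothetical parsed event printed with rubydebug might look like:
{
       "@timestamp" => 2024-05-01T12:00:00.123Z,
          "message" => "Application started",
            "level" => "info",
           "thread" => "main",
           "logger" => "com.example.App"
}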
In summary, Logstash configuration is very flexible and can be customized to our specific needs. The examples above cover some common configuration options, but we need to select and adjust them according to our actual environment and requirements.