The purpose of the tutorial: To organize the collection and parsing of log messages using Filebeat.

Disclaimer: The tutorial doesn’t contain production-ready solutions. It was written to help those who are just starting to understand Filebeat, and to help the author consolidate the studied material. The tutorial also does not compare log providers.

Summary

Filebeat is a lightweight log shipper. It monitors log files, collects new log messages from them, and sends them to Elasticsearch or Logstash for indexing.

Filebeat consists of two key components:

  • harvesters – responsible for reading log files and sending log messages to the specified output interface; a separate harvester is started for each log file;
  • input interfaces – responsible for finding sources of log messages and managing harvesters.
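
To make the harvester idea concrete, here is a minimal Python sketch of reading a log file from a remembered offset. It is only an illustration of the principle, not Filebeat's implementation: the function name harvest and the event layout are assumptions, modelled loosely on the log.offset and log.file.path fields that appear in Filebeat's output.

```python
import os
import tempfile


def harvest(path, offset=0):
    """Read complete lines from `path`, starting at byte `offset`.

    Returns the new events and the offset to resume from next time.
    (Hypothetical helper; Filebeat keeps a similar offset per file.)
    """
    events = []
    with open(path, "rb") as log_file:
        log_file.seek(offset)
        while True:
            line_start = log_file.tell()
            line = log_file.readline()
            if not line.endswith(b"\n"):
                # End of file or a half-written line: rewind and stop,
                # so the partial line is picked up on the next call.
                log_file.seek(line_start)
                break
            events.append({
                "message": line.rstrip(b"\r\n").decode("utf-8"),
                "log": {"offset": line_start, "file": {"path": path}},
            })
        new_offset = log_file.tell()
    return events, new_offset


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        log_path = os.path.join(tmp_dir, "file.log")
        with open(log_path, "wb") as log_file:
            log_file.write(b"app-log - INFO - started\n")
        events, offset = harvest(log_path)
        print(events, offset)
```

Calling harvest again with the returned offset yields only the lines appended since the previous call, which is why a restarted collector does not have to resend the whole file.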

Organizing log message collection

Filebeat has a variety of input interfaces for different sources of log messages. In this tutorial, I propose moving from setting up collection manually to automatically discovering sources of log messages in containers. In my opinion, this approach gives a deeper understanding of Filebeat; it is also the path I followed myself.

We need a service whose log messages will be sent for storage.
As such a service, let’s take a simple application written using FastAPI, the sole purpose of which is to generate log messages.

Collecting log messages using volume

First, let’s clone the repository (https://github.com/voro6yov/filebeat-template).

It contains the test application, the Filebeat config file, and the docker-compose.yml.
Configuring the collection of log messages using volume consists of the following steps:

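  1. Setting up the application logger to write log messages to a file: app/api/main.py
# Assumption: the logger is loguru's, as the add() signature suggests
from loguru import logger

# Write log messages to a file, starting a new file once it reaches 500 MB
logger.add(
    "./logs/file.log",
    format="app-log - {level} - {message}",
    rotation="500 MB"
)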

2. Creating a volume to store log files outside of containers: docker-compose.yml

version: "3.8"

services:
  app:
    ...
    volumes:
      - app-logs:/logs

  log-shipper:
    ...
    volumes:
      - ./filebeat.docker.yml:/usr/share/filebeat/filebeat.yml:ro
      - app-logs:/var/app/log

volumes:
  app-logs:

3. Defining input and output filebeat interfaces: filebeat.docker.yml

filebeat.inputs:
- type: log
  paths:
    - /var/app/log/*.log

output.console:
  pretty: true

We launch the test application, generate log messages and receive them in the following format:

{
  "@timestamp": "2021-04-01T04:02:28.138Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.12.0"
  },
  "ecs": {
    "version": "1.8.0"
  },
  "host": {
    "name": "aa9718a27eb9"
  },
  "message": "app-log - ERROR - [Item not found] - 1",
  "log": {
    "offset": 377,
    "file": {
      "path": "/var/app/log/file.log"
    }
  },
  "input": {
    "type": "log"
  },
  "agent": {
    "version": "7.12.0",
    "hostname": "aa9718a27eb9",
    "ephemeral_id": "df245ed5-bd04-4eca-8b89-bd0c61169283",
    "id": "35333344-c3cc-44bf-a4d6-3a7315c328eb",
    "name": "aa9718a27eb9",
    "type": "filebeat"
  }
}
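
Filebeat configuration usually refers to nested fields of such an event by dotted names, for example log.file.path. The sketch below shows one way to read an event like this in Python; the helper get_field is hypothetical, and RAW_EVENT is an abridged copy of the output above.

```python
import json

# Abridged copy of the event printed by the console output above.
RAW_EVENT = """
{
  "@timestamp": "2021-04-01T04:02:28.138Z",
  "message": "app-log - ERROR - [Item not found] - 1",
  "log": {"offset": 377, "file": {"path": "/var/app/log/file.log"}},
  "input": {"type": "log"}
}
"""


def get_field(event, dotted_key, default=None):
    """Look up a nested value by a dotted name such as 'log.file.path'."""
    current = event
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


event = json.loads(RAW_EVENT)

if __name__ == "__main__":
    for key in ("message", "log.file.path", "input.type"):
        print(key, "=", get_field(event, key))
```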

Collecting log messages using the container input interface

The container input interface collects log messages from container log files.
Configuring the collection of log messages using the container input interface consists of the following steps:

  • Removing the settings for the log input interface added in the previous step from the configuration file.
  • Defining the container input interface in the config file: filebeat.docker.yml
filebeat.inputs:
- type: container
  paths:
    - '/var/lib/docker/containers/*/*.log'
output.console:
  pretty: true
  • Detaching the app-logs volume from the app and log-shipper services and removing it, since we no longer need it.
  • Connecting the container log files and the docker socket to the log-shipper service: docker-compose.yml
version: "3.8"

services:
  app:
    ...

  log-shipper:
    ...
    volumes:
      - ./filebeat.docker.yml:/usr/share/filebeat/filebeat.yml:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
  • Setting up the application logger to write log messages to standard output: app/api/main.py
import sys

# Write log messages to stdout so Docker captures them in the container log
logger.add(
    sys.stdout,
    format="app-log - {level} - {message}",
)
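
It may help to see what the files under /var/lib/docker/containers actually contain. Assuming Docker's default json-file logging driver, each line is a JSON object with log, stream and time keys. The sketch below decodes one such line into the fields Filebeat reports; the function name and the sample line are illustrative and were built from the tutorial's sample message, not copied from a real log.

```python
import json

# Illustrative line in the json-file format (not taken from a real container).
SAMPLE_LINE = (
    r'{"log":"app-log - ERROR - [Item not found] - 1\n",'
    r'"stream":"stdout","time":"2021-04-01T04:02:28.138Z"}'
)


def decode_container_line(line):
    """Turn one json-file log line into a flat event (hypothetical helper)."""
    record = json.loads(line)
    return {
        "@timestamp": record["time"],
        "stream": record["stream"],
        # Docker stores the trailing newline written by the application.
        "message": record["log"].rstrip("\n"),
    }


if __name__ == "__main__":
    print(decode_container_line(SAMPLE_LINE))
```

This is why the application only needs to write to standard output: Docker wraps every line, and the container input unwraps it again.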

The container input interface configured in this way will collect log messages from all containers, but you may want to collect log messages only from specific containers.
This can be done in the following way.

Collecting log messages using autodiscovery

Collecting log messages from containers can be difficult, since containers can be restarted, deleted, etc. For this case, Filebeat offers autodiscovery of containers, with the ability to define log collection settings for each detected container. The autodiscovery mechanism consists of two parts:

  • container search template;
  • configurations for collecting log messages.

The setup consists of the following steps:

  1. Removing the settings for the container input interface added in the previous step from the configuration file.
  2. Defining auto-discover settings in the configuration file: filebeat.docker.yml
filebeat.autodiscover:
  providers:
    - type: docker
      templates:
        - condition:
            contains:
              docker.container.name: fastapi_app
          config:
            - type: container
              paths:
                - /var/lib/docker/containers/${data.docker.container.id}/*.log
              exclude_lines: ["^INFO:"]

output.console:
  pretty: true

That’s all. Now Filebeat will only collect log messages from the specified container.
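
The way a template is applied can be sketched in a few lines of Python: check the condition against the discovered container, then substitute the ${data...} placeholders in the config. This is a simplified model under stated assumptions, not Filebeat's code; the helper names and the container id abc123 are hypothetical, while the template mirrors the configuration above.

```python
import re

# The template from filebeat.docker.yml, expressed as Python data.
TEMPLATES = [{
    "condition": {"contains": {"docker.container.name": "fastapi_app"}},
    "config": [{
        "type": "container",
        "paths": ["/var/lib/docker/containers/${data.docker.container.id}/*.log"],
        "exclude_lines": ["^INFO:"],
    }],
}]


def get_field(event, dotted_key, default=None):
    """Look up a nested value by a dotted name."""
    current = event
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def matches(event, contains):
    """True if every listed field contains the given substring."""
    return all(
        substring in str(get_field(event, field, ""))
        for field, substring in contains.items()
    )


def render(value, event):
    """Replace ${data.<field>} placeholders in strings, lists and dicts."""
    if isinstance(value, str):
        return re.sub(
            r"\$\{data\.([^}]+)\}",
            lambda m: str(get_field(event, m.group(1), "")),
            value,
        )
    if isinstance(value, list):
        return [render(item, event) for item in value]
    if isinstance(value, dict):
        return {key: render(item, event) for key, item in value.items()}
    return value


def configs_for(event, templates):
    """Return the rendered input configs for one discovered container."""
    configs = []
    for template in templates:
        if matches(event, template["condition"]["contains"]):
            configs.extend(render(template["config"], event))
    return configs


if __name__ == "__main__":
    container = {"docker": {"container": {"id": "abc123", "name": "fastapi_app"}}}
    print(configs_for(container, TEMPLATES))
```

A container whose name does not contain fastapi_app produces an empty list, so no input is started for it.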

Collecting log messages using hints

Filebeat supports hint-based autodiscovery.
It looks for information (hints) about the collection configuration in the container labels.
As soon as the container starts, Filebeat will check if it contains any hints and run a collection for it with the correct configuration.

The collection setup consists of the following steps:

  • Removing the app service discovery template and enabling hints: filebeat.docker.yml
filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true
output.console:
  pretty: true
  • Disabling collection of log messages for the log-shipper service: docker-compose.yml
version: "3.8"

services:
  app:
    ...

  log-shipper:
    ...
    labels:
      co.elastic.logs/enabled: "false"
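
A rough model of how such labels are interpreted is shown below. It assumes that hints are the labels starting with the co.elastic.logs/ prefix and that, with hints enabled, a container is collected unless it opts out. The function names are hypothetical, and the real hint handling supports more options than this sketch.

```python
HINT_PREFIX = "co.elastic.logs/"


def hints_from_labels(labels):
    """Keep only the labels addressed to Filebeat and strip the prefix."""
    return {
        key[len(HINT_PREFIX):]: value
        for key, value in labels.items()
        if key.startswith(HINT_PREFIX)
    }


def should_collect(labels):
    """Assumed default: collect unless the container sets enabled to false."""
    return hints_from_labels(labels).get("enabled", "true").lower() != "false"


if __name__ == "__main__":
    print(should_collect({"co.elastic.logs/enabled": "false"}))
    print(should_collect({"com.docker.compose.service": "app"}))
```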

Parsing log messages

Filebeat has a large number of processors to handle log messages.
They can be connected using container labels or defined in the configuration file.
Let’s use the second method.

First, let’s strip the metadata from the log messages. To do this, add the drop_fields processor to the configuration file: filebeat.docker.yml

processors:
  - drop_fields:
      fields: ["agent", "container", "ecs", "log", "input", "docker", "host"]
      ignore_missing: true

Now the log message looks like this:

{
  "@timestamp": "2021-04-01T04:02:28.138Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.12.0"
  },
  "message": "app-log - ERROR - [Item not found] - 1",
  "stream": "stdout"
}
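
The effect of drop_fields can be modelled in Python as deleting keys from the event. This is a sketch of the behaviour only: the function is a stand-in, and raising KeyError for a missing field when ignore_missing is off is an assumption chosen for illustration, not Filebeat's actual error handling.

```python
import copy


def drop_fields(event, fields, ignore_missing=False):
    """Return a copy of `event` without the listed (possibly dotted) fields."""
    result = copy.deepcopy(event)
    for field in fields:
        *parents, leaf = field.split(".")
        node = result
        for part in parents:
            node = node.get(part) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict) and leaf in node:
            del node[leaf]
        elif not ignore_missing:
            raise KeyError(field)
    return result


if __name__ == "__main__":
    event = {
        "message": "app-log - ERROR - [Item not found] - 1",
        "stream": "stdout",
        "host": {"name": "aa9718a27eb9"},
        "input": {"type": "container"},
    }
    fields = ["agent", "container", "ecs", "log", "input", "docker", "host"]
    print(drop_fields(event, fields, ignore_missing=True))
```

With ignore_missing set to true, fields that a particular event does not have, such as docker here, are skipped silently, which is what lets one list serve every event.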

To separate the API log messages from the ASGI server log messages, add a tag to them using the add_tags processor: filebeat.docker.yml

processors:
  - drop_fields:
      ...
  - add_tags:
      when:
        contains:
          "message": "app-log"
      tags: [test-app]
      target: "environment"
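
In Python terms, a conditional add_tags behaves roughly like the function below. It is a simplified sketch: the when_contains parameter is a hypothetical stand-in for the when/contains block and only looks at top-level fields.

```python
import copy


def add_tags(event, tags, target="tags", when_contains=None):
    """Append `tags` to the `target` list if the condition holds."""
    for field, substring in (when_contains or {}).items():
        if substring not in str(event.get(field, "")):
            return event  # condition not met: leave the event untouched
    result = copy.deepcopy(event)
    result.setdefault(target, []).extend(tags)
    return result


if __name__ == "__main__":
    api_event = {"message": "app-log - ERROR - [Item not found] - 1"}
    server_event = {"message": "INFO:     Application startup complete."}
    condition = {"message": "app-log"}
    print(add_tags(api_event, ["test-app"], "environment", condition))
    print(add_tags(server_event, ["test-app"], "environment", condition))
```

Only the first event receives the environment field, so the two kinds of messages can be told apart downstream.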

Let’s structure the message field of the log message using the dissect processor and then remove the original field using drop_fields: filebeat.docker.yml

processors:
  - drop_fields:
      ...
  - add_tags:
      ...
  - dissect:
      when:
        contains:
          "message": "app-log"
      tokenizer: 'app-log - %{log-level} - [%{event.name}] - %{event.message}'
      field: "message"
      target_prefix: ""
  - drop_fields:
      when:
        contains:
          "message": "app-log"
      fields: ["message"]
      ignore_missing: true

Now the log message looks like this:

{
  "@timestamp": "2021-04-02T08:29:07.349Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.12.0"
  },
  "log-level": "ERROR",
  "event": {
    "name": "Item not found",
    "message": "Foo"
  },
  "environment": [
    "test-app"
  ],
  "stream": "stdout"
}
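
To see how the tokenizer produces these fields, the sketch below imitates dissect in Python. Note the swap: it translates the tokenizer into a regular expression, whereas the real processor does not use regular expressions and supports key modifiers that are ignored here. The helper names are hypothetical.

```python
import copy
import re

KEY_PATTERN = re.compile(r"%\{([^}]+)\}")


def compile_tokenizer(tokenizer):
    """Translate a dissect-style tokenizer into a regex and a key list."""
    # Splitting on a capturing group alternates literals and key names.
    parts = KEY_PATTERN.split(tokenizer)
    literals, keys = parts[0::2], parts[1::2]
    pattern = re.escape(literals[0])
    for literal in literals[1:]:
        pattern += "(.*?)" + re.escape(literal)
    return re.compile(pattern, re.DOTALL), keys


def set_field(event, dotted_key, value):
    """Store `value` under a dotted name, creating nested dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = event
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def dissect(event, tokenizer, field="message", target_prefix="dissect"):
    """Return a copy of `event` with the extracted keys added."""
    regex, keys = compile_tokenizer(tokenizer)
    match = regex.fullmatch(event.get(field, ""))
    if match is None:
        return event  # the message does not follow the tokenizer layout
    result = copy.deepcopy(event)
    for key, value in zip(keys, match.groups()):
        full_key = f"{target_prefix}.{key}" if target_prefix else key
        set_field(result, full_key, value)
    return result


if __name__ == "__main__":
    tokenizer = "app-log - %{log-level} - [%{event.name}] - %{event.message}"
    event = {"message": "app-log - ERROR - [Item not found] - Foo"}
    print(dissect(event, tokenizer, target_prefix=""))
```

Setting target_prefix to an empty string places the extracted keys at the top level of the event, which is why log-level and event appear next to @timestamp in the output above instead of under a dissect key.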

In addition

Filebeat also has out-of-the-box solutions for collecting and parsing log messages for widely used tools such as Nginx, Postgres, etc.

They are called modules. For example, to collect Nginx log messages, just add a label to its container:

co.elastic.logs/module: "nginx"

and enable hints in the config file. After that, we will get a ready-made solution for collecting and parsing log messages, plus a convenient dashboard in Kibana.