The purpose of this tutorial: to organize the collection and parsing of log messages using Filebeat.
Disclaimer: This tutorial does not contain production-ready solutions; it was written to help those who are just starting to learn Filebeat, and to let the author consolidate what he has studied. It also does not compare log shippers.
Summary
Filebeat is a lightweight log shipper. It works by monitoring log files, collecting log messages from them, and sending them to Elasticsearch or Logstash for indexing.
Filebeat is built around two key components, illustrated by the sketch below:
- harvesters – responsible for reading log files and sending log messages to the configured output; a separate harvester is started for each log file;
- input interfaces – responsible for locating sources of log messages and managing the harvesters.
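A minimal configuration sketch showing how the two fit together (the path here is a placeholder; the tutorial's concrete configs follow below):

```yaml
filebeat.inputs:
  - type: log              # the input locates files matching the globs below
    paths:
      - /var/log/app/*.log # a separate harvester is started for each matched file

output.console:            # harvesters forward collected messages to the output
  pretty: true
```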
Organizing log message collection
Filebeat has a variety of input interfaces for different sources of log messages. In this tutorial, I propose moving from setting up collection manually to automatically discovering sources of log messages in containers. In my opinion, this approach allows a deeper understanding of Filebeat, and besides, it is the path I took myself.
We need a service whose log messages will be sent for storage.
As such a service, let's take a simple application written with FastAPI whose sole purpose is to generate log messages.
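For orientation, here is a minimal sketch of what such an application might look like (the tutorial's actual app lives in app/api/main.py of the repository cloned below and may differ; the route, condition, and messages here are illustrative assumptions):

```python
from fastapi import FastAPI, HTTPException
from loguru import logger

app = FastAPI()

@app.get("/items/{item_id}")
def read_item(item_id: int):
    # An arbitrary condition, just to produce both INFO- and ERROR-level messages.
    if item_id > 100:
        logger.error("[Item not found] - {}", item_id)
        raise HTTPException(status_code=404, detail="Item not found")
    logger.info("[Item obtained] - {}", item_id)
    return {"item_id": item_id}
```

Run with, e.g., `uvicorn app.api.main:app`; each request then emits a message like the ones shown in the output below.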
Collecting log messages using a volume
First, let’s clone the repository (https://github.com/voro6yov/filebeat-template).
It contains the test application, the Filebeat config file, and the docker-compose.yml.
Configuring the collection of log messages using a volume consists of the following steps:
1. Setting up the application logger to write log messages to a file:
app/api/main.py
```python
logger.add(
    "./logs/file.log",
    format="app-log - {level} - {message}",
    rotation="500 MB"
)
```
2. Creating a volume to store log files outside of the containers:
docker-compose.yml
```yaml
version: "3.8"
services:
  app:
    ...
    volumes:
      - app-logs:/logs
  log-shipper:
    ...
    volumes:
      - ./filebeat.docker.yml:/usr/share/filebeat/filebeat.yml:ro
      - app-logs:/var/app/log
volumes:
  app-logs:
```
3. Defining Filebeat's input and output interfaces:
filebeat.docker.yml
```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/app/log/*.log

output.console:
  pretty: true
```
We launch the test application, generate log messages and receive them in the following format:
```json
{
  "@timestamp": "2021-04-01T04:02:28.138Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.12.0"
  },
  "ecs": { "version": "1.8.0" },
  "host": { "name": "aa9718a27eb9" },
  "message": "app-log - ERROR - [Item not found] - 1",
  "log": {
    "offset": 377,
    "file": { "path": "/var/app/log/file.log" }
  },
  "input": { "type": "log" },
  "agent": {
    "version": "7.12.0",
    "hostname": "aa9718a27eb9",
    "ephemeral_id": "df245ed5-bd04-4eca-8b89-bd0c61169283",
    "id": "35333344-c3cc-44bf-a4d6-3a7315c328eb",
    "name": "aa9718a27eb9",
    "type": "filebeat"
  }
}
```
Collecting log messages using the container input interface
The container input interface allows collecting log messages from container log files.
Configuring the collection of log messages using the container input interface consists of the following steps:
- Removing the log input interface settings added in the previous step from the configuration file.
- Defining the container input interface in the config file:
filebeat.docker.yml
```yaml
filebeat.inputs:
  - type: container
    paths:
      - '/var/lib/docker/containers/*/*.log'

output.console:
  pretty: true
```
- Removing the app-logs volume from the app and log-shipper services and deleting it; we no longer need it.
- Connecting the container log files and the Docker socket to the log-shipper service:
docker-compose.yml
```yaml
version: "3.8"
services:
  app:
    ...
  log-shipper:
    ...
    volumes:
      - ./filebeat.docker.yml:/usr/share/filebeat/filebeat.yml:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
```
- Setting up the application logger to write log messages to standard output:
app/api/main.py
```python
logger.add(
    sys.stdout,
    format="app-log - {level} - {message}",
)
```
A container input interface configured this way will collect log messages from all containers, but you may want to collect them only from specific containers.
This can be done in the following way.
Collecting log messages using autodiscovery
Collecting log messages from containers comes with difficulties of its own, since containers can be restarted, deleted, and so on. For such cases Filebeat provides container autodiscovery, with the ability to define log collection settings for each discovered container. The autodiscovery mechanism consists of two parts:
- a container search template;
- the collection configuration applied to matching containers.
The setup consists of the following steps:
- Removing the container input interface settings added in the previous step from the configuration file.
- Defining autodiscover settings in the configuration file:
filebeat.docker.yml
```yaml
filebeat.autodiscover:
  providers:
    - type: docker
      templates:
        - condition:
            contains:
              docker.container.name: fastapi_app
          config:
            - type: container
              paths:
                - /var/lib/docker/containers/${data.docker.container.id}/*.log
              exclude_lines: ["^INFO:"]

output.console:
  pretty: true
```
That’s all. Now Filebeat will only collect log messages from the specified container.
Collecting log messages using hints
Filebeat supports hint-based autodiscovery.
It looks for information (hints) about the collection configuration in the container labels.
As soon as a container starts, Filebeat checks whether it carries any hints and launches collection for it with the matching configuration.
The collection setup consists of the following steps:
- Removing the app service discovery template and enabling hints:
filebeat.docker.yml
```yaml
filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true

output.console:
  pretty: true
```
- Disabling collection of log messages for the log-shipper service:
docker-compose.yml
```yaml
version: "3.8"
services:
  app:
    ...
  log-shipper:
    ...
    labels:
      co.elastic.logs/enabled: "false"
```
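Hints can carry configuration as well as an on/off switch. For example, multiline settings could be passed through labels on the app container; a sketch (these particular values are illustrative assumptions and are not needed by the tutorial's app):

```yaml
services:
  app:
    ...
    labels:
      co.elastic.logs/enabled: "true"
      # Treat lines that do not start with the "app-log" prefix as continuations
      # of the previous message (useful for multi-line stack traces).
      co.elastic.logs/multiline.pattern: "^app-log"
      co.elastic.logs/multiline.negate: "true"
      co.elastic.logs/multiline.match: "after"
```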
Parsing log messages
Filebeat has a large number of processors to handle log messages.
They can be connected using container labels or defined in the configuration file.
Let’s use the second method.
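For reference, the first method attaches processors through hint labels; a sketch (the numbered co.elastic.logs/processors label form is documented by Elastic, while this particular tokenizer is an illustrative assumption):

```yaml
services:
  app:
    ...
    labels:
      # The numeric segment (.1.) fixes the processor order.
      co.elastic.logs/processors.1.dissect.tokenizer: "app-log - %{log-level} - %{event.message}"
```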
First, let's clear the log messages of metadata. To do this, add the drop_fields processor to the configuration file:
filebeat.docker.yml
```yaml
processors:
  - drop_fields:
      fields: ["agent", "container", "ecs", "log", "input", "docker", "host"]
      ignore_missing: true
```
Now the log message looks like this:
```json
{
  "@timestamp": "2021-04-01T04:02:28.138Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.12.0"
  },
  "message": "app-log - ERROR - [Item not found] - 1",
  "stream": ["stdout"]
}
```
To separate the API log messages from the ASGI server log messages, add a tag to them using the add_tags processor:
filebeat.docker.yml
```yaml
processors:
  - drop_fields:
      ...
  - add_tags:
      when:
        contains:
          "message": "app-log"
      tags: [test-app]
      target: "environment"
```
Let's structure the message field of the log message using the dissect processor, and then remove it with drop_fields:
filebeat.docker.yml
```yaml
processors:
  - drop_fields:
      ...
  - add_tags:
      ...
  - dissect:
      when:
        contains:
          "message": "app-log"
      tokenizer: 'app-log - %{log-level} - [%{event.name}] - %{event.message}'
      field: "message"
      target_prefix: ""
  - drop_fields:
      when:
        contains:
          "message": "app-log"
      fields: ["message"]
      ignore_missing: true
```
Now the log message looks like this:
```json
{
  "@timestamp": "2021-04-02T08:29:07.349Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.12.0"
  },
  "log-level": "ERROR",
  "event": {
    "name": "Item not found",
    "message": "Foo"
  },
  "environment": [
    "test-app"
  ],
  "stream": "stdout"
}
```
In addition
Filebeat also ships with out-of-the-box solutions for collecting and parsing log messages from widely used tools such as Nginx, Postgres, and others.
They are called modules. For example, to collect Nginx log messages, just add a label to its container:
```yaml
co.elastic.logs/module: "nginx"
```
and enable hints in the config file. After that, we get a ready-made solution for collecting and parsing log messages, plus a convenient dashboard in Kibana.