
It was 2019, and we still didn’t have a standard solution for log aggregation in Kubernetes. In this article, we would like to share our search, the problems we ran into, and how we solved them, drawing on examples from real-world practice.
To begin with, though, I should note that different customers mean very different things by “collecting logs”:

  • some want to see security and audit logs;
  • others want centralized logging of the entire infrastructure;
  • and for some it is enough to collect only the application logs, excluding, for example, load balancers.

Below, we describe how we implemented these various wish lists and what difficulties we encountered along the way.

Theory: About Logging Tools

Background on the components of the logging system

Logging has come a long way, and along the way we have developed the methodologies for collecting and analyzing logs that we still use today. Back in the 1950s, Fortran introduced an analogue of standard I/O streams, which helped programmers debug their programs. These were the first computer logs, and they made life easier for the programmers of that era. In them we can see the first component of a logging system: the source, or “producer,” of logs.
Computer science did not stand still: computer networks appeared, then the first clusters, and complex systems consisting of several computers went into production. System administrators were now forced to collect logs from several machines, and in special cases they could add OS kernel messages when they needed to investigate a system failure. Centralized log collection was described by RFC 3164, which appeared in the early 2000s and documented the BSD syslog protocol (remote syslog). Thus another important component emerged: the collector of logs and their storage.
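To make the collector component concrete, here is a minimal sketch (not from the original article) of an application forwarding its records to a remote syslog server in the classic RFC 3164 style, using only Python’s standard library; the hostname is hypothetical:

    # Forward application logs to a remote syslog collector.
    # RFC 3164-era syslog traditionally listens on UDP port 514,
    # and SysLogHandler uses UDP by default.
    import logging
    import logging.handlers

    logger = logging.getLogger("billing")
    logger.setLevel(logging.INFO)

    handler = logging.handlers.SysLogHandler(address=("logs.example.com", 514))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)

    logger.info("payment processed")  # travels over the network to the collector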
As log volumes grew and web technologies became widespread, the question arose of how to display logs to users conveniently. Simple console tools (awk/sed/grep) were replaced by more advanced log viewers – the third component.
Growing log volumes made one more thing clear: logs are needed, but not all of them. And different logs require different levels of durability: some can be lost without consequence, while others must be stored for 5 years. So a component for filtering and routing data streams was added to the logging system – let’s call it the filter.
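As an illustration of such a filter, here is a minimal sketch using Python’s standard logging module, assuming a hypothetical “audit” marker: ordinary records go only to short-lived storage, while audit records are also routed to a long-term destination (both file names are illustrative):

    # Route log records by importance: everything goes to a short-lived
    # access log, but only records marked as audit events reach the
    # long-term audit log.
    import logging

    class AuditOnly(logging.Filter):
        def filter(self, record: logging.LogRecord) -> bool:
            return getattr(record, "audit", False)

    root = logging.getLogger()
    root.setLevel(logging.DEBUG)

    short_lived = logging.FileHandler("access.log")  # losing these is acceptable
    long_term = logging.FileHandler("audit.log")     # these must be kept for years
    long_term.addFilter(AuditOnly())

    root.addHandler(short_lived)
    root.addHandler(long_term)

    root.info("GET /healthz 200")                                     # access.log only
    root.info("user admin deleted project X", extra={"audit": True})  # both files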
Storage also made a major leap: from regular files to relational databases, and then to document-oriented stores (for example, Elasticsearch). Thus the storage was separated from the collector.
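To show what the move to document-oriented storage means in practice, here is a minimal sketch that ships a structured log event to Elasticsearch as a JSON document over its HTTP API, using only the standard library; the URL and index name are illustrative:

    # Instead of appending a line to a file, send the event as a
    # structured JSON document that Elasticsearch will index and
    # make searchable by any of its fields.
    import json
    import urllib.request

    event = {
        "@timestamp": "2019-10-01T12:00:00Z",
        "level": "error",
        "kubernetes": {"namespace": "production", "pod": "backend-7d9f"},
        "message": "connection to upstream timed out",
    }

    req = urllib.request.Request(
        "http://elasticsearch.example.com:9200/logs-2019.10.01/_doc",
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)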
In the end, the very concept of a log expanded into an abstract stream of events that we want to keep for history – more precisely, for the cases when we need to conduct an investigation or compile an analytical report.
As a result, over a relatively short period of time, log collection has grown into an important subsystem that can rightfully be called a subsection of Big Data.

Kubernetes and Logs

When Kubernetes arrived in the infrastructure, the long-standing problem of collecting logs did not pass it by. If anything, it became even more painful: managing the infrastructure platform was not only simplified but also complicated at the same time. Many old services began migrating to microservices. In the context of logs, this meant a growing number of log sources, their peculiar life cycles, and the need to trace the interconnections of all system components through the logs.
Looking ahead, I can say that, unfortunately, there is still no standardized logging option for Kubernetes that compares favorably with all the rest. The most popular schemes in the community are as follows:

  • some deploy an EFK stack (Elasticsearch, Fluentd, Kibana);
  • some try the recently released Loki or use the Logging operator;
  • we (and perhaps not only we?..) are largely satisfied with our own development – loghouse…

As a rule, we use the following bundles in K8s clusters (for self-hosted solutions):

  • Fluentd + Elasticsearch + Kibana;
  • Fluentd + ClickHouse + loghouse.

However, I will not dwell on instructions for installing and configuring them. Instead, I will focus on their shortcomings and on more global conclusions about the situation with logs in general.

Practice with logs in K8s

“Our daily logs” – how many are there?..

Centralized log collection in a sufficiently large infrastructure requires considerable resources for collecting, storing, and processing logs. In the course of operating various projects, we have faced a variety of requirements and the operational problems that arise from them.