
This article discusses the nginx-log-collector project, which reads nginx logs and sends them to a ClickHouse cluster. Logs are usually shipped to Elasticsearch, but ClickHouse requires fewer resources (disk space, RAM, CPU), writes data faster, and compresses it, making the data on disk even more compact.

  • To view log analytics, we will create a Grafana dashboard.
  • Install nginx and Grafana in the standard way.
  • Install the ClickHouse cluster using ansible-playbook.

Creating Databases and Tables in ClickHouse

This section describes the SQL queries for creating the database and tables for nginx-log-collector in ClickHouse.

Run each query in turn on every server of the ClickHouse cluster.

Important note: in the line below, logs_cluster must be replaced with your cluster name from the clickhouse_remote_servers.xml file (the tag nested between “remote_servers” and “shard”).

ENGINE = Distributed('logs_cluster', 'nginx', 'access_log_shard', rand())
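
A minimal sketch of the queries for this step (assuming the shard table nginx.access_log_shard has already been created on every node from the SQL files shipped with nginx-log-collector):

CREATE DATABASE IF NOT EXISTS nginx;

-- Distributed wrapper table: copies the structure of the shard table and
-- spreads inserts across the cluster using rand() as the sharding key
CREATE TABLE IF NOT EXISTS nginx.access_log AS nginx.access_log_shard
ENGINE = Distributed('logs_cluster', 'nginx', 'access_log_shard', rand());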

Installing and Configuring nginx-log-collector-rpm

nginx-log-collector is not distributed as an RPM package, so one is built for it here: https://github.com/patsevanton/nginx-log-collector-rpm. The RPM is built with Fedora Copr.

Install the nginx-log-collector RPM package:

yum -y install yum-plugin-copr
yum copr enable antonpatsev/nginx-log-collector-rpm
yum -y install nginx-log-collector
systemctl start nginx-log-collector

Edit the config /etc/nginx-log-collector/config.yaml:

 .......
  upload:
    table: nginx.access_log
    dsn: http://clickhouse-cluster-ip-address:8123/

- tag: "nginx_error:"
  format: error  # access | error
  buffer_size: 1048576
  upload:
    table: nginx.error_log
    dsn: http://clickhouse-cluster-ip-address:8123/
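
After editing the config, restart the collector and check that the ClickHouse HTTP interface is reachable (the hostname below is a placeholder for your cluster address):

systemctl restart nginx-log-collector
journalctl -u nginx-log-collector -n 20 --no-pager   # look for upload errors
curl 'http://clickhouse-cluster-ip-address:8123/?query=SELECT%201'   # should print 1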

Nginx setup

 

General nginx config:

user  nginx;
worker_processes  auto;

#error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    log_format avito_json escape=json
                     '{'
                     '"event_datetime": "$time_iso8601", '
                     '"server_name": "$server_name", '
                     '"remote_addr": "$remote_addr", '
                     '"remote_user": "$remote_user", '
                     '"http_x_real_ip": "$http_x_real_ip", '
                     '"status": "$status", '
                     '"scheme": "$scheme", '
                     '"request_method": "$request_method", '
                     '"request_uri": "$request_uri", '
                     '"server_protocol": "$server_protocol", '
                     '"body_bytes_sent": $body_bytes_sent, '
                     '"http_referer": "$http_referer", '
                     '"http_user_agent": "$http_user_agent", '
                     '"request_bytes": "$request_length", '
                     '"request_time": "$request_time", '
                     '"upstream_addr": "$upstream_addr", '
                     '"upstream_response_time": "$upstream_response_time", '
                     '"hostname": "$hostname", '
                     '"host": "$host"'
                     '}';

    access_log     syslog:server=unix:/var/run/nginx_log.sock,nohostname,tag=nginx avito_json; #ClickHouse
    error_log      syslog:server=unix:/var/run/nginx_log.sock,nohostname,tag=nginx_error; #ClickHouse

    #access_log  /var/log/nginx/access.log  main;

    proxy_ignore_client_abort on;
    sendfile        on;
    keepalive_timeout  65;
    include /etc/nginx/conf.d/*.conf;
}
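
Before reloading nginx, validate the configuration and check that the syslog socket referenced in the access_log/error_log lines exists (it must match the socket nginx-log-collector is listening on):

nginx -t
ls -l /var/run/nginx_log.sock
systemctl reload nginx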

vhost1.conf:

upstream backend {
    server stub_http_server-server-ip-address:8080;
    server stub_http_server-server-ip-address:8080;
    server stub_http_server-server-ip-address:8080;
    server stub_http_server-server-ip-address:8080;
    server stub_http_server-server-ip-address:8080;
}

server {
    listen   80;
    server_name vhost1;
    location / {
        proxy_pass http://backend;
    }
}

Add the virtual host to the /etc/hosts file:

nginx-server-ip-address vhost1

HTTP server emulator

As an HTTP server emulator we will use nodejs-stub-server.

nodejs-stub-server is not distributed as an RPM package either, so one is built for it here: https://github.com/patsevanton/nodejs-stub-server. This RPM is also built with Fedora Copr.

Install the nodejs-stub-server RPM package (stub_http_server) on the nginx upstream servers:

yum -y install yum-plugin-copr
yum copr enable antonpatsev/nodejs-stub-server
yum -y install stub_http_server
systemctl start stub_http_server
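
To make sure the stub responds, curl it on the port used in the upstream block (the hostname is a placeholder for the upstream server):

curl -s -o /dev/null -w '%{http_code}\n' http://stub_http_server-server-ip-address:8080/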

Stress Testing

Testing is done with Apache Benchmark (ab).

Install it:

yum install -y httpd-tools

We start testing using Apache benchmark from 5 different servers:

while true; do ab -H "User-Agent: 1server" -c 10 -n 10 -t 10 http://vhost1/; sleep 1; done
while true; do ab -H "User-Agent: 2server" -c 10 -n 10 -t 10 http://vhost1/; sleep 1; done
while true; do ab -H "User-Agent: 3server" -c 10 -n 10 -t 10 http://vhost1/; sleep 1; done
while true; do ab -H "User-Agent: 4server" -c 10 -n 10 -t 10 http://vhost1/; sleep 1; done
while true; do ab -H "User-Agent: 5server" -c 10 -n 10 -t 10 http://vhost1/; sleep 1; done
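
While the benchmark runs, you can verify that log records are actually reaching ClickHouse, for example with clickhouse-client on any node (the http_user_agent breakdown should show the 1server…5server agents set above):

clickhouse-client --query "SELECT count() FROM nginx.access_log"
clickhouse-client --query "SELECT http_user_agent, count() AS requests FROM nginx.access_log GROUP BY http_user_agent ORDER BY requests DESC"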

Grafana Setup

You also need to create a $table dashboard variable in Grafana with the value nginx.access_log.

Singlestat Total Requests:

SELECT
 1 as t,
 count(*) as c
 FROM $table
 WHERE $timeFilter GROUP BY t

Singlestat Failed Requests:

SELECT
 1 as t,
 count(*) as c
 FROM $table
 WHERE $timeFilter AND status NOT IN (200, 201, 401) GROUP BY t

Singlestat Failing Percent:

SELECT
 1 as t, (sum(status = 500 or status = 499)/sum(status = 200 or status = 201 or status = 401))*100 FROM $table
 WHERE $timeFilter GROUP BY t

Singlestat Avg Response Time:

SELECT
 1, avg(request_time) FROM $table
 WHERE $timeFilter GROUP BY 1

Singlestat Max Response Time:

SELECT
 1 as t, max(request_time) as c
 FROM $table
 WHERE $timeFilter GROUP BY t

Count Status:

$columns(status, count(*) as c) from $table

To display the pie chart, install the grafana-piechart-panel plugin and restart Grafana:

grafana-cli plugins install grafana-piechart-panel
service grafana-server restart

Pie TOP 5 Status:

SELECT
    1, /* fake timestamp value */
    status,
    count(*) AS Reqs
FROM $table
WHERE $timeFilter
GROUP BY status
ORDER BY Reqs desc
LIMIT 5

Count http_user_agent:

$columns(http_user_agent, count(*) c) FROM $table

GoodRate/BadRate:

$rate(countIf(status = 200) AS good, countIf(status != 200) AS bad) FROM $table

Response Timing:

$rate(avg(request_time) as request_time) FROM $table

Upstream response time (response time of the 1st upstream):

$rate(avg(arrayElement(upstream_response_time,1)) as upstream_response_time) FROM $table

Conclusion: 

Hopefully the community will get involved in developing, testing, and using nginx-log-collector.

And perhaps those who deploy nginx-log-collector will share how much disk space, RAM, and CPU it saved them.