
Docker is the most popular containerization technology. Initially it was used mainly for development and test environments, but over time it moved into production. Docker containers began to multiply in production environments like mushrooms after rain, yet few of those who use the technology have thought about how to deploy Docker containers securely.

Based on the OWASP recommendations, we have prepared a list of rules whose implementation will significantly improve the security of an environment built on Docker containers.

Rule 0

The host machine and Docker must contain all current updates

To protect against known vulnerabilities that allow escaping from the container to the host system, which usually ends in privilege escalation on the host, it is extremely important to install all patches for the host OS, Docker Engine, and Docker Machine.

In addition, containers (unlike virtual machines) share the kernel with the host, so a kernel exploit running inside a container executes directly against the host kernel. For example, a kernel privilege escalation exploit (such as Dirty COW) running inside a well-isolated container will still lead to root access on the host.
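As a rough sketch, on a Debian or Ubuntu host that uses Docker's official packages, keeping both the OS and the engine patched can look like this (the exact package names depend on how Docker was installed):

# update the package index and upgrade the OS, including the kernel
sudo apt-get update && sudo apt-get upgrade -y
# upgrade the Docker Engine packages themselves
sudo apt-get install --only-upgrade docker-ce docker-ce-cli containerd.io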

Rule 1

Don’t give access to the socket of the Docker daemon

The Docker service (daemon) uses the UNIX socket /var/run/docker.sock for incoming API connections. The owner of this resource must be the root user, with no exceptions. Changing the access rights to this socket is essentially equivalent to granting root access to the host system.

Also, you should not share the /var/run/docker.sock socket with containers that can do without it, because in that case compromising the service in the container gives complete control over the host system. If you have containers that use something like this:

-v /var/run/docker.sock:/var/run/docker.sock

or for docker-compose:

volumes:
  - "/var/run/docker.sock:/var/run/docker.sock"

You should change this urgently.
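To check whether any of your running containers already mount the Docker socket, you can inspect their bind mounts; the one-liner below is just one possible way to do it with docker inspect:

# print the bind mounts of every running container and look for docker.sock
docker ps -q | xargs docker inspect --format '{{ .Name }}: {{ .HostConfig.Binds }}' | grep docker.sock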

And finally, never ever use the Docker TCP socket unless you are absolutely certain that you need it, and especially not without additional protection (at the very least, authorization). By default, the Docker TCP socket opens a port on the external interface 0.0.0.0:2375 (2376 in the case of TLS) and allows full control over the containers, and with them, potentially, over the host system.
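For reference, the TCP socket is usually exposed with the -H flag of the Docker daemon; if you see something like the line below without TLS and authorization in front of it, treat it as a direct path to root on the host:

# DON'T do this on an untrusted network: the API becomes reachable by anyone
dockerd -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375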

Rule 2

Configure an unprivileged user inside the container

Configuring the container to use an unprivileged user is the best way to avoid an elevation of privilege attack. This can be done in various ways:

  1. Using the “-u” option of the “docker run” command:

docker run -u 4000 alpine

  2. During image build:

FROM alpine
# Here you can still run commands as root, for example to install packages
RUN addgroup -S myuser && adduser -S -G myuser myuser
USER myuser

  3. Enabling “user namespace” support in the Docker daemon (this can also be made persistent in daemon.json, as shown below):

--userns-remap=default
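A minimal daemon.json sketch for option 3 (usually /etc/docker/daemon.json; restart the Docker daemon after changing it):

{
  "userns-remap": "default"
}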

In Kubernetes, running as an unprivileged user is configured in the Security Context via the runAsNonRoot option:

kind: ...
apiVersion: ...
metadata:
  name: ...
spec:
  ...
  containers:
  - name: ...
    image: ...
    securityContext:
      ...
      runAsNonRoot: true
      ...

Rule 3

Limit container capabilities

On Linux, starting with kernel 2.2, there is a mechanism for controlling the capabilities of privileged processes, called Linux Kernel Capabilities.

Docker uses a predefined set of these kernel capabilities by default and allows you to change this set using the following flags:

--cap-drop - drops a kernel capability

--cap-add - adds a kernel capability

The most secure setup is to first drop all capabilities (--cap-drop all) and then add back only the ones that are really needed. For example, like this:

docker run --cap-drop all --cap-add CHOWN alpine

And most important: never run containers with the --privileged flag!
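A quick way to spot containers that were nevertheless started with --privileged is to inspect the running ones; a possible one-liner (docker inspect exposes the HostConfig.Privileged field):

# prints "true" for containers running in privileged mode
docker ps -q | xargs docker inspect --format '{{ .Name }}: privileged={{ .HostConfig.Privileged }}'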

In Kubernetes, the Linux Kernel Capabilities can be configured in the Security Context via the capabilities option:

kind: ...
apiVersion: ...
metadata:
  name: ...
spec:
  ...
  containers:
  - name: ...
    image: ...
    securityContext:
      ...
      capabilities:
        drop:
          - all
        add:
          - CHOWN
      ...

Rule 4

Use the no-new-privileges flag

When starting a container, it is useful to add the --security-opt=no-new-privileges flag, which prevents privilege escalation inside the container.
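A minimal illustration (the image is arbitrary):

docker run --security-opt=no-new-privileges alpine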

In Kubernetes, this restriction is configured in the Security Context via the allowPrivilegeEscalation option:

kind: ...
apiVersion: ...
metadata:
  name: ...
spec:
  ...
  containers:
  - name: ...
    image: ...
    securityContext:
      ...
      allowPrivilegeEscalation: false
      ...

Rule 5

Turn off inter-container communication

By default, inter-container communication is enabled in Docker, which means that all containers can communicate with each other (through the docker0 bridge network). This feature can be disabled by running the Docker daemon with the --icc=false flag.
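The same setting can be made persistent in the daemon configuration file (usually /etc/docker/daemon.json); containers that genuinely need to talk to each other can then be connected explicitly through a user-defined network:

{
  "icc": false
}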

Rule 6

Use Linux security modules (seccomp, AppArmor, SELinux)

By default, Docker already applies profiles from the Linux security modules, so never disable these security profiles! The most you should do with them is tighten the rules.
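For example, a tightened seccomp profile can be passed to a container at startup; the path below is just a placeholder for your own hardened profile:

# /path/to/custom-seccomp.json is a placeholder for your own seccomp profile
docker run --security-opt seccomp=/path/to/custom-seccomp.json hello-world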

Docker also uses AppArmor for protection, and the Docker Engine itself generates a default profile for AppArmor when the container starts. In other words, instead of:

$ docker run --rm -it hello-world

Docker actually runs:

$ docker run --rm -it --security-opt apparmor=docker-default hello-world

The documentation also provides an example AppArmor profile for nginx, which you can (and should!) use:

#include <tunables/global>

profile docker-nginx flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/base>

  network inet tcp,
  network inet udp,
  network inet icmp,

  deny network raw,

  deny network packet,

  file,
  umount,

  deny /bin/** wl,
  deny /boot/** wl,
  deny /dev/** wl,
  deny /etc/** wl,
  deny /home/** wl,
  deny /lib/** wl,
  deny /lib64/** wl,
  deny /media/** wl,
  deny /mnt/** wl,
  deny /opt/** wl,
  deny /proc/** wl,
  deny /root/** wl,
  deny /sbin/** wl,
  deny /srv/** wl,
  deny /tmp/** wl,
  deny /sys/** wl,
  deny /usr/** wl,

  audit /** w,

  /var/run/nginx.pid w,

  /usr/sbin/nginx ix,

  deny /bin/dash mrwklx,
  deny /bin/sh mrwklx,
  deny /usr/bin/top mrwklx,

  capability chown,
  capability dac_override,
  capability setuid,
  capability setgid,
  capability net_bind_service,

  deny @{PROC}/* w,   # deny write for all files directly in /proc (not in a subdir)
  # deny write to files not in /proc/<number>/** or /proc/sys/**
  deny @{PROC}/{[^1-9],[^1-9][^0-9],[^1-9s][^0-9y][^0-9s],[^1-9][^0-9][^0-9][^0-9]*}/** w,
  deny @{PROC}/sys/[^k]** w,  # deny /proc/sys except /proc/sys/k* (effectively /proc/sys/kernel)
  deny @{PROC}/sys/kernel/{?,??,[^s][^h][^m]**} w,  # deny everything except shm* in /proc/sys/kernel/
  deny @{PROC}/sysrq-trigger rwklx,
  deny @{PROC}/mem rwklx,
  deny @{PROC}/kmem rwklx,
  deny @{PROC}/kcore rwklx,

  deny mount,

  deny /sys/[^f]*/** wklx,
  deny /sys/f[^s]*/** wklx,
  deny /sys/fs/[^c]*/** wklx,
  deny /sys/fs/c[^g]*/** wklx,
  deny /sys/fs/cg[^r]*/** wklx,
  deny /sys/firmware/** rwklx,
  deny /sys/kernel/security/** rwklx,
}
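To actually use this profile, it has to be loaded into AppArmor and then referenced by name when the container starts; a sketch, assuming the profile was saved to /etc/apparmor.d/containers/docker-nginx:

# load (or reload) the profile into the kernel
sudo apparmor_parser -r -W /etc/apparmor.d/containers/docker-nginx
# run nginx confined by the custom profile
docker run --security-opt apparmor=docker-nginx -p 80:80 -d nginx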

Rule 7

Limit container resources

This rule is quite simple: to prevent containers from devouring all of the server's resources during the next DoS/DDoS attack, we can set resource limits for each container individually. You can limit the amount of memory, the CPU, and the number of container restarts.

So let’s go step-by-step.

Memory

The -m or --memory option

The maximum amount of memory a container can use. The minimum value is 4m (4 megabytes).

Option --memory-swap

Configures how much swap the container may use. It works as follows:

  • If --memory-swap > 0, the --memory flag must also be set. In this case, --memory-swap is the total amount of memory available to the container, including swap.
  • A simpler example: if --memory="300m" and --memory-swap="1g", the container can use 300MB of memory and 700MB of swap (1g - 300m).
  • If --memory-swap = 0, the setting is ignored.
  • If --memory-swap is set to the same value as --memory, the container will have no swap.
  • If --memory-swap is not specified but --memory is, the amount of swap will be twice the specified memory. For example, if --memory="300m" and --memory-swap is not set, the container will use 300MB of memory and 600MB of swap.
  • If --memory-swap=-1, the container can use all the swap available on the host system.

Note: the free utility run inside a container doesn't show the actual amount of swap available to the container, but the amount of swap on the host.

Option --oom-kill-disable

Allows you to enable or disable the OOM (Out of memory) killer.

Attention! You can only disable the OOM killer if the --memory option is also set; otherwise, on out-of-memory inside the container, the kernel may start killing processes of the host system.

Other memory management configuration options, such as --memory-swappiness, --memory-reservation, and --kernel-memory, are more for tuning the performance of the container.
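Putting the memory flags together, a container start with limits might look like this (the values are purely illustrative):

# 300MB of RAM plus 700MB of swap (1g total), OOM killer left enabled
docker run -d -m 300m --memory-swap 1g nginx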

CPU

Option --cpus

This option sets how much of the available CPU resources the container can use. For example, if the host has two CPUs and we set --cpus="1.5", the container can use at most one and a half CPUs.

Option --cpuset-cpus

Limits the container to specific cores or CPUs. The value can be specified with a hyphen or a comma: in the first case it denotes a range of allowed cores (for example, 0-2), in the second, a list of specific cores (for example, 0,2).
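For example (the values are illustrative):

# at most one and a half CPUs, pinned to cores 0 and 1
docker run -d --cpus="1.5" --cpuset-cpus="0,1" nginx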

Number of container restarts

--restart=on-failure:<number_of_restarts>

This setting specifies how many times Docker will try to restart the container if it crashes unexpectedly. The counter is reset to zero once the container state changes back to running.

It is recommended to set a small positive number, for example 5, to avoid endless restarts of a broken service.
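For example:

# give up after 5 failed restart attempts
docker run -d --restart=on-failure:5 nginx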

Rule 8

Use read-only file systems and volumes

If the container does not need to write anything anywhere, use a read-only file system wherever possible. This will greatly complicate the life of a potential intruder.

An example of starting a container with a read-only file system:

docker run --read-only alpine
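If the application still needs a scratch directory, a tmpfs mount can be combined with a read-only root file system; /tmp here is just an example path:

docker run --read-only --tmpfs /tmp alpine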

An example of mounting a volume in read-only mode:

docker run -v volume-name:/path/in/container:ro alpine

Rule 9

Use container security analysis tools

Use tools that discover containers with known vulnerabilities. There are not very many of them yet, but they do exist:

Free:

  • Clair.

Commercial:

  • Snyk (there is a free version);
  • Anchore (there is a free version);
  • JFrog Xray.

And for Kubernetes, there are tools for detecting configuration errors:

  • kubeaudit;
  • kubesec.io;
  • kube-bench.