The term “container breakout” describes a situation in which a program running inside a Docker container overcomes the isolation mechanisms and gains additional privileges or access to confidential information on the host. To make breakouts harder, Docker grants containers a reduced set of capabilities by default. The Docker daemon itself still runs as root by default, but you can create a user namespace or drop potentially dangerous container capabilities.
Best practices
Remove any capabilities that the application does not need.
- CAP_SYS_ADMIN is especially dangerous from a security standpoint, because it grants a wide range of superuser-level operations: mounting file systems, entering kernel namespaces, ioctl operations, and so on.
- To make container privileges equivalent to those of a regular user, create an isolated user namespace for your containers. If possible, avoid running containers as uid 0.
- If you cannot do without a privileged container, make sure that its image comes from a trusted repository.
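One way to see which capabilities a process actually holds is to decode the CapEff bitmask that Linux exposes in /proc/<pid>/status. The following is a minimal sketch, not part of Docker: cap_is_set is a hypothetical helper name, and the bit numbers (21 for CAP_SYS_ADMIN, 27 for CAP_MKNOD) are taken from the kernel header linux/capability.h.

```shell
#!/bin/sh
# Hypothetical helper: report whether capability bit $2 is set in hex mask $1.
# The mask format matches the CapEff line of /proc/<pid>/status.
cap_is_set() {
    mask=$1
    bit=$2
    if [ $(( (0x$mask >> bit) & 1 )) -eq 1 ]; then
        echo yes
    else
        echo no
    fi
}

# Example: inspect the current process, if /proc is available.
if [ -r /proc/self/status ]; then
    capeff=$(awk '/^CapEff:/ {print $2}' /proc/self/status)
    echo "CAP_SYS_ADMIN (bit 21): $(cap_is_set "$capeff" 21)"
    echo "CAP_MKNOD (bit 27): $(cap_is_set "$capeff" 27)"
fi
```

Run inside a container, this shows whether a --cap-drop option took effect. For example, the mask 00000000a80425fb, which is commonly reported inside a container started with default settings, has bit 27 set and bit 21 clear.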
Closely monitor any mounting of potentially dangerous host resources: /var/run/docker.sock, /proc, /dev, etc. These resources are usually needed for operations related to the basic functionality of containers, so make sure you understand why a process needs them and how to limit its access. Sometimes mounting them read-only is enough. Never grant write permission without knowing exactly why it is needed. In any case, Docker uses copy-on-write so that changes made in a running container do not reach its base image and, through it, other containers created from that image.
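A review of bind mounts can be partly automated. The sketch below classifies a mount specification in the host:container[:mode] form accepted by the -v option of docker run, where a trailing ro requests a read-only mount. The helper name check_mount is hypothetical, the list of risky paths simply mirrors the ones named above, and only the simple cases :ro and :ro,... are recognized.

```shell
#!/bin/sh
# Hypothetical helper: classify a "-v host:container[:mode]" bind-mount spec.
# The list of risky host paths mirrors the ones named in the text.
check_mount() {
    spec=$1
    host=${spec%%:*}            # part before the first colon
    case $spec in
        *:ro|*:ro,*) mode=read-only ;;
        *)           mode=read-write ;;
    esac
    case $host in
        /var/run/docker.sock|/proc|/proc/*|/dev|/dev/*)
            echo "risky $mode" ;;
        *)
            echo "ok $mode" ;;
    esac
}

check_mount /var/run/docker.sock:/var/run/docker.sock
check_mount /srv/data:/data:ro
```

A specification that passes the review, such as the illustrative /srv/data:/data:ro, can then be given to docker run with -v.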
Examples
By default, the root user of a Docker container can create devices. You will probably want to disallow this:
# sudo docker run --rm -it --cap-drop=MKNOD alpine sh
/ # mknod /dev/random2 c 1 8
mknod: /dev/random2: Operation not permitted
Root can also bypass the permissions of any file. This is easy to verify: create a file as any regular user, run chmod 600 on it (read and write access for the owner only), switch to root, and confirm that the file is still accessible to you.
This, too, can be prevented, which is especially useful if you have mounted directories containing confidential user data.
# sudo docker run --rm -it --cap-drop=DAC_OVERRIDE alpine sh
Create a regular user and switch to that user's home directory. Then:
~ $ touch supersecretfile
~ $ chmod 600 supersecretfile
~ $ exit
~ # cat /home/user/supersecretfile
cat: can't open '/home/user/supersecretfile': Permission denied
Many security scanners and malware programs craft their own network packets from scratch. This behavior can be disabled as follows:
# docker run --cap-drop=NET_RAW -it uzyexe/nmap -A localhost
Starting Nmap 7.12 ( https://nmap.org ) at 2017-08-16 10:13 GMT
Couldn't open a raw socket. Error: Operation not permitted (1)
If you create a container without a user namespace, the processes running inside it will, by default, run as the superuser from the host's point of view.
# docker run -d -P nginx
# ps aux | grep nginx
root 18951 0.2 0.0 32416 4928 ? Ss 12:31 0:00 nginx: master process nginx -g daemon off;
However, we can create a separate user namespace. To do this, add the following key to the /etc/docker/daemon.json file (be careful to follow JSON syntax rules):
"userns-remap": "default"
Restart the Docker daemon. This creates the dockremap user. The new namespace starts out empty, so docker ps lists no containers.
# systemctl restart docker
# docker ps
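If the daemon fails to restart, a syntax error in daemon.json is a likely cause. For comparison, here is a minimal sketch of a complete file that contains only this key, written to a scratch path rather than to /etc/docker/daemon.json. The helper name write_userns_config is hypothetical, and the sketch assumes no other daemon settings are in use; if they are, merge the key into the existing file by hand.

```shell
#!/bin/sh
# Write a minimal daemon.json to a scratch path for review.
# Assumption: no other daemon settings exist; merge by hand if they do.
write_userns_config() {
    target=$1
    cat > "$target" <<'EOF'
{
  "userns-remap": "default"
}
EOF
}

staging=$(mktemp)
write_userns_config "$staging"
cat "$staging"
```

Note the enclosing braces and the absence of a trailing comma, both of which are easy to get wrong when editing the file by hand.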
Run the nginx image again:
# docker run -d -P nginx
# ps aux | grep nginx
165536 19906 0.2 0.0 32416 5092 ? Ss 12:39 0:00 nginx: master process nginx -g daemon off;
Now the nginx process runs in a different (user) namespace, which improves the isolation of the containers.
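The UID 165536 in the ps output is not arbitrary: with user namespace remapping, container UIDs are offset into the subordinate UID range assigned to dockremap in /etc/subuid, so uid 0 in the container becomes the first UID of that range on the host. The sketch below illustrates the arithmetic. The helper name map_uid is hypothetical, and the entry dockremap:165536:65536 is an assumption chosen to match the output above; the range on your system may differ.

```shell
#!/bin/sh
# Hypothetical helper: given a subuid entry "name:start:count" and a
# container UID, print the host UID it maps to, or fail if out of range.
map_uid() {
    entry=$1
    cuid=$2
    start=$(echo "$entry" | cut -d: -f2)
    count=$(echo "$entry" | cut -d: -f3)
    if [ "$cuid" -ge 0 ] && [ "$cuid" -lt "$count" ]; then
        echo $(( start + cuid ))
    else
        echo "uid $cuid is outside the mapped range" >&2
        return 1
    fi
}

# Assumed entry, chosen to match the ps output; check your own /etc/subuid.
map_uid "dockremap:165536:65536" 0
map_uid "dockremap:165536:65536" 101
```

With this assumed entry, container root maps to host UID 165536, an unprivileged account, which is why a process that escapes the container no longer holds superuser rights on the host.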