CRIU configuration files

One of CRIU's use cases is checkpointing and restoring containers, which can also be used to migrate containers. Container runtimes therefore use CRIU to checkpoint all the processes in a container as well as to restore the processes in that container. Many container runtimes are layered: the user-facing layer (Podman, Docker, LXD) calls another layer (runc, LXC) to checkpoint (or restore) the container, and this layer then calls CRIU.

This leads to a problem: if CRIU introduces a new feature or option, all involved layers need code changes. And if one of those layers makes an assumption about how to use CRIU, the user must live with that assumption, even if it is wrong for the user's use case.

To offer a way to change CRIU's behaviour through all these layers, whether because the container runtime has not implemented a certain CRIU feature or because the user needs different CRIU behaviour, we started discussing configuration files in 2016.

Configuration files should be evaluated by CRIU and offer a third way to influence CRIU's behaviour; setting options via the CLI and via RPC are the other two.

At the Linux Plumbers Conference in 2016, during the Checkpoint/Restore micro-conference, I gave a short introductory talk about how configuration files could look, and everyone was nodding.

In early 2017 Veronika Kabatova provided patches, which were merged into CRIU's development branch, criu-dev. At that point development stalled a bit, and only in early 2018 was the discussion picked up again. Getting a feature merged into the master branch, which means it will be part of the next release, requires complete documentation (man pages and wiki) and feature parity between CRIU's CLI and RPC modes. At that point configuration files were documented but not supported in RPC mode.

Adding configuration file support to CRIU's RPC mode was not a technical challenge, but if a recruiter ever asks me which project was the most difficult, I will talk about this one. We exchanged mails and patches for about half a year, and it seemed everybody had different expectations of how everything should behave. I think in the end they pitied me and just merged my patches...

CRIU 3.11, released on 2018-11-06, is the first release that includes support for configuration files, and now (finally) I want to write about how they can be used.

I am using the Simple_TCP_pair example from CRIU's wiki. First, start the server:

# ./tcp-howto 10000

Then start the client:

# ./tcp-howto 127.0.0.1 10000
Connecting to 127.0.0.1:10000
PP 1 -> 1
PP 2 -> 2
PP 3 -> 3
PP 4 -> 4

Once client and server are running, let's try to checkpoint the client:

# rm -f /etc/criu/default.conf
# criu dump -t `pgrep -f 'tcp-howto 127.0.0.1 10000'`
Error (criu/sk-inet.c:188): inet: Connected TCP socket, consider using --tcp-established option.

CRIU tells us that it needs a special option to checkpoint processes with established TCP connections. No problem, but instead of changing the command line, let's add the option to the configuration file:

# echo tcp-established > /etc/criu/default.conf
# criu dump -t `pgrep -f 'tcp-howto 127.0.0.1 10000'`
Error (criu/tty.c:1861): tty: Found dangling tty with sid 16693 pgid 16711 (pts) on peer fd 0.
Task attached to shell terminal. Consider using --shell-job option. More details on http://criu.org/Simple_loop

Alright, let's also add shell-job to the configuration file:

# echo shell-job >> /etc/criu/default.conf
# criu dump -t `pgrep -f 'tcp-howto 127.0.0.1 10000'` && echo OK
OK

That worked. Cool. Finally! Most CLI options can also be used in the configuration file(s), and more detailed documentation can be found in the CRIU wiki.
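The echo commands above show the mapping: the command-line option --tcp-established becomes the line tcp-established in the configuration file, one option per line. As a minimal sketch, assuming POSIX sh, here is a hypothetical helper, write_criu_conf (not part of CRIU), that writes a whole list of such options in one step. It only handles flag-style options without arguments; anything that does not start with "--" is skipped with a warning.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of CRIU: write flag-style long options
# such as --tcp-established into a CRIU configuration file, one option
# per line and without the leading "--".
# Usage: write_criu_conf FILE OPTION...
write_criu_conf() {
    conf=$1
    shift
    # Create the directory (e.g. /etc/criu) if it does not exist yet.
    mkdir -p "$(dirname "$conf")" || return 1
    # Truncate the file; this replaces any previous content.
    : > "$conf" || return 1
    for opt in "$@"; do
        case $opt in
            # Strip the leading "--" and append the option as one line.
            --*) printf '%s\n' "${opt#--}" >> "$conf" ;;
            # Options with arguments or short options are not handled.
            *) printf 'skipping %s: not a long option\n' "$opt" >&2 ;;
        esac
    done
}
```

Run as root, write_criu_conf /etc/criu/default.conf --tcp-established --shell-job should leave the same two lines in /etc/criu/default.conf as the two echo commands above. Note that it overwrites the file instead of appending to it.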

I want to thank Veronika for her initial implementation and everyone else who helped by discussing and reviewing emails and patches to get this ready for release.