One of CRIU's use cases is container checkpointing and restoring, which can also be used to migrate containers. Container runtimes therefore use CRIU to checkpoint all the processes in a container as well as to restore those processes. Many container runtimes are layered, which means that the user-facing layer (Podman, Docker, LXD) calls another layer (runc, LXC) to checkpoint or restore the container, and that layer then calls CRIU.

This leads to the problem that if CRIU introduces a new feature or option, all involved layers need code changes. Or, if one of those layers made an assumption about how to use CRIU, the user has to live with that assumption, which may be wrong for the user's use case.

To offer the possibility to change CRIU’s behaviour through all these layers, be it that the container runtime has not implemented a certain CRIU feature or that the user needs a different CRIU behaviour, we started to discuss configuration files in 2016.

Configuration files are evaluated by CRIU and offer a third way to influence CRIU's behaviour, next to setting options via the CLI or via RPC.

At the Linux Plumbers Conference in 2016 during the Checkpoint/Restore micro-conference I gave a short introduction talk about how configuration files could look and everyone was nodding their head.

In early 2017 Veronika Kabatova provided patches which were merged into CRIU's development branch criu-dev. At that point development stalled a bit, and only in early 2018 was the discussion picked up again. Having a feature merged into the master branch, which means it will be part of the next release, requires complete documentation (man-pages and wiki) and feature parity between CRIU's CLI and RPC modes. At this point the feature was documented but not supported in RPC mode.

Adding configuration file support to CRIU's RPC mode was not a technical challenge, but if any recruiter ever asks me which project was the most difficult, I will talk about this. We were exchanging mails and patches for about half a year, and it seems everybody had different expectations of how everything should behave. I think in the end they pitied me and just merged my patches…

CRIU 3.11, released on 2018-11-06, is the first release that includes support for configuration files, and now (finally) I want to write about how it can be used.

I am using the Simple_TCP_pair example from CRIU’s wiki. First start the server:

# ./tcp-howto 10000

Then I start the client:

# ./tcp-howto 127.0.0.1 10000
Connecting to 127.0.0.1:10000
PP 1 -> 1
PP 2 -> 2
PP 3 -> 3
PP 4 -> 4

Once client and server are running, let’s try to checkpoint the client:

# rm -f /etc/criu/default.conf
# criu dump -t `pgrep -f 'tcp-howto 127.0.0.1 10000'`
Error (criu/sk-inet.c:188): inet: Connected TCP socket, consider using --tcp-established option.

CRIU tells us that it needs a special option to checkpoint processes with established TCP connections. No problem, but instead of changing the command-line, let’s add it to the configuration file:

# echo tcp-established > /etc/criu/default.conf
# criu dump -t `pgrep -f 'tcp-howto 127.0.0.1 10000'`
Error (criu/tty.c:1861): tty: Found dangling tty with sid 16693 pgid 16711 (pts) on peer fd 0.
Task attached to shell terminal. Consider using --shell-job option. More details on http://criu.org/Simple_loop

Alright, let’s also add shell-job to the configuration file:

# echo shell-job >> /etc/criu/default.conf
# criu dump -t `pgrep -f 'tcp-howto 127.0.0.1 10000'` && echo OK
OK

That worked. Cool. Finally! Most CLI options can be used in the configuration file(s), and more detailed documentation can be found in the CRIU wiki.
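As a sketch of how this composes, the two options from above can also be put into a configuration file in one go. I am using a per-user file here; to my knowledge CRIU reads $HOME/.criu/default.conf in addition to the system-wide /etc/criu/default.conf (treat the per-user path as an assumption and check the wiki). Options are the long CLI option names without the leading "--", one per line:

```shell
# Sketch: create a per-user CRIU configuration file. The path
# $HOME/.criu/default.conf is an assumption based on the CRIU wiki;
# /etc/criu/default.conf is the system-wide file used above.
mkdir -p "$HOME/.criu"
cat > "$HOME/.criu/default.conf" <<'EOF'
tcp-established
shell-job
EOF
cat "$HOME/.criu/default.conf"
```

With such a file in place, a plain criu dump -t <pid> picks up both options without any extra command-line flags.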

I want to thank Veronika for her initial implementation and everyone else helping, discussing and reviewing emails and patches to get this ready for release.

After using Podman a lot during the last weeks while adding checkpoint/restore support to Podman I was finally ready to use containers in production on our mirror server. We were still running the ownCloud version that came via RPMs in Fedora 27 and it seems like many people have moved on to Nextcloud from tarballs.

One of the main reasons to finally use containers is Podman's daemonless approach.

The first challenge while moving from ownCloud 9.1.5 to Nextcloud 14 was the actual upgrade. To make sure it works I first made a copy of all the uploaded files and of the database and did a test upgrade yesterday using a CentOS 7 VM. With PHP 7 from Software Collections it was not a real problem. It took some time, but it worked. I used the included upgrade utility to upgrade from ownCloud 9 to Nextcloud 10, to Nextcloud 11, to Nextcloud 12, to Nextcloud 13, to Nextcloud 14. Lots of upgrades. Once I had verified that everything was still functional I did it once more, but this time with the real data and with access to our ownCloud instance disabled.

The next step was to start the container. I decided to use the nextcloud:fpm container as I was planning to use the existing web server to proxy the requests. The one thing which makes using containers on our mirror server a bit difficult is that it is not possible to use any iptables NAT rules. At some point there were just so many network connections in the NAT table from all the clients connecting to our mirror server that it started to drop connections. This problem has probably been fixed for a long time, but it bit us once and I try to avoid it. That is why my Nextcloud container is using the host network namespace:

podman run --name nextcloud-fpm -d --net host \
  -v /home/containers/nextcloud/html:/var/www/html \
  -v /home/containers/nextcloud/apps:/var/www/html/custom_apps \
  -v /home/containers/nextcloud/config:/var/www/html/config \
  -v /home/containers/nextcloud/data:/var/www/html/data \
  nextcloud:fpm

I reused my existing config.php, in which the connection to PostgreSQL on 127.0.0.1 was still configured.
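For reference, the relevant part of that config.php looked roughly like this (a sketch; the database name, user and password are placeholders, only the 127.0.0.1 host matters here):

```php
// Fragment of a Nextcloud config/config.php (sketch): because the
// container runs with --net host, PostgreSQL is still reachable on
// 127.0.0.1 and the old connection settings keep working.
'dbtype' => 'pgsql',
'dbhost' => '127.0.0.1',
'dbname' => 'nextcloud',   // placeholder
'dbuser' => 'nextcloud',   // placeholder
'dbpassword' => 'secret',  // placeholder
```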

Once the container was running I just had to add the proxy rules to the Apache HTTP Server and it should have been ready. Unfortunately this was not as easy as I had hoped. All the documentation I found was about using the Nextcloud FPM container with NGINX; I found nothing about Apache HTTPD. The following lines took most of the time of the whole upgrade-to-Nextcloud project:

<FilesMatch \.php.*>
   SetHandler proxy:fcgi://127.0.0.1:9000/
   ProxyFCGISetEnvIf "reqenv('REQUEST_URI') =~ m|(/owncloud/)(.*)$|" SCRIPT_FILENAME "/var/www/html/$2"
   ProxyFCGISetEnvIf "reqenv('REQUEST_URI') =~ m|^(.+\.php)(.*)$|" PATH_INFO "$2"
</FilesMatch>

I hope these lines are actually correct, but so far all clients connecting to it seem to be happy. To have the Nextcloud container automatically start on system startup I based my systemd podman service file on the one from the Intro to Podman article.

[Unit]
Description=Custom Nextcloud Podman Container
After=network.target

[Service]
Type=simple
TimeoutStartSec=5m
ExecStartPre=-/usr/bin/podman rm nextcloud-fpm

ExecStart=/usr/bin/podman run --name nextcloud-fpm --net host \
   -v /home/containers/nextcloud/html:/var/www/html \
   -v /home/containers/nextcloud/apps:/var/www/html/custom_apps \
   -v /home/containers/nextcloud/config:/var/www/html/config \
   -v /home/containers/nextcloud/data:/var/www/html/data \
   nextcloud:fpm

ExecReload=/usr/bin/podman stop nextcloud-fpm
ExecReload=/usr/bin/podman rm nextcloud-fpm
ExecStop=/usr/bin/podman stop nextcloud-fpm
Restart=always
RestartSec=30

[Install]
WantedBy=multi-user.target
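To install the unit, the usual systemd steps apply (a sketch; the file name nextcloud-fpm.service and its location are my choice, not something Podman mandates):

```shell
# Sketch: install and enable the unit file shown above (run as root).
cp nextcloud-fpm.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable --now nextcloud-fpm.service
```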

On October 19th, 2018, I was giving a talk about OpenHPC at the CentOS Dojo at CERN.

I really liked the whole event, and my talk was also recorded. Thanks to everyone involved for organizing it. The day before FOSDEM 2019 there will be another CentOS Dojo in Brussels; I hope I have the chance to attend that one as well.

The most interesting thing during my two days in Geneva was, however, the visit to the Antimatter Factory:

Antimatter Factory

Assuming I actually understood anything we were told about it, it is exactly that: an antimatter factory.

As mentioned before, I want a switch setup that is the same in every room. Of course I considered the Loxone Touch connected to the miniserver by Loxone Tree, but I did not like it for two reasons:

  1. The design is different from the design of the plugs and other elements. I don't like the idea of having different-looking electrical components.
  2. There is no possibility for a backup solution that allows controlling the lights independently of the miniserver.

So I've chosen the Taster 10 AX 250 V ~ (531 U) (I'll call it “1” from now on) and the Tastsensor-Modul 24 V AC/DC, 20 mA (A 5236 TSM) (I'll call it “6” from now on; the switch in the upper left will be called 6_1, the upper right 6_2, and so on) from the company Jung.

The idea is to control the main light of each room with 1. 6_1 (up) and 6_2 (down) will be used for the roller blinds. The four remaining switches can be used differently in each room, depending on the needs.

But, and there's always a but, a CAT cable only contains 8 wires. Even though that would be enough for the 7 push buttons, there is no wire left for the 6 red feedback LEDs and the RGB LED. Connecting all of that would require 3 CAT cables:

1 for 1
6 for 6_1 to 6_6
2 for Vcc and Ground
6 for red feedback LEDs
3 for RGB LED
----------
18 lines for each switch -> 3 CAT cables à 8 lines

That's a price and effort I'm not willing to pay. It would also mean that the miniserver has to provide 16 in/outputs for each room, which is what would make it really expensive. So I've decided to spend more of my time and come up with a solution that connects my switch setup to the miniserver and to the backup circuitry at the same time while requiring only 1 CAT cable per switch.

Yes, that’s a cliffhanger.

The starting point of the home automation is the signal and power cables routed to the switch cabinet in the basement. The additional cost and effort is in the signal cables, which would not be required in a traditional setup. The additional effort for the power lines can be neglected, since the additional length from each room to the basement is compensated by less cable in the rooms, for example from a switch for the roller blinds to the motor of the roller blind.

On the left you can see the power cables that go to the lights, plugs and roller blinds.

The red cables are the connections to the smoke detectors. Each room that is either a potential sleeping room or part of the escape path has a smoke detector (required by law). In addition to the mandatory requirements, they are connected per floor and the floors are connected in the switch cabinet. There is also a connection between the three parts of the house. Currently they are all hard-wired together. This might change in the future to suppress the forwarding of alarms for some time, e.g. when testing smoke detectors in one part of the house it is not desired to trigger all the other smoke detectors.

As you can see there is still much space left in the switch cabinet, and that can’t be filled up only by simple fuses.

Nowadays, on floors that are partially constructed with wood, you have to install special fuses with spark detection. Those are 3 times the size of the traditional ones.

There will be the fault-current protection switches that are nowadays mandatory for all three phases, and not only for the bathroom.

There is my backup circuitry, which makes sure that, even without the home automation system, the light in each room can be switched and the roller blinds can be moved.

There will be a power supply for the backup system as well as for the home automation system.

And last but not least there will be the home automation system itself.

Since the miniserver only has an SD card as internal storage and it is prone to wear, I am thinking about logging data outside the miniserver. Loxone offers so-called loggers, and one possibility is to set their storage location to a syslog target outside the miniserver. So now the data is in /var/log/syslog of alix.
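On the receiving side this only needs a syslog daemon that listens on the network. A minimal rsyslog fragment could look like this (a sketch; port 514/UDP is the classic syslog default, and whether the miniserver can use TCP instead is not something I have checked):

```
# /etc/rsyslog.d/loxone.conf (sketch): accept syslog messages over UDP
module(load="imudp")
input(type="imudp" port="514")
```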

What I need next is a possibility to store the data over a long time and a possibility to display it.

Possibilities I see:

  1. Do everything on my own
  2. influx/grafana
  3. logstash/kibana

Since #1 means work and maintenance, and #2 and #3 mean quite a big installation on a small system, I'm very open to suggestions for something in between.

When building a house, the question naturally comes up whether, and immediately after that, how much home automation should be implemented. The first step after deciding that I want home automation was the selection of a system. I decided to use Loxone, for several reasons:

  1. One of my friends already has some experience with the system
  2. The system is centralized, so in case it has to be replaced this can be done in that central place and no hardware updates are required in the living room. The centralized solution also allows setting up a backup system that provides basic functionality like switching the lights and opening/closing the roller blinds.
  3. The company delivers the configuration software with the hardware without additional costs and conditions. If I want to update anything in the future I can do that. If I want to stick with an old version of their software I can stick with that.

The goal of the home automation is to be invisible to the user and to offer, as a base, all the functionality you are used to in a “normal” home. If you enter a room there shall be a switch that turns on the light when pressed. Only if you want to can you dim the light, by holding the switch or by double-clicking.

Also, the basic setup should look the same in all rooms. So I've decided on a combination of a normal-sized light switch and a 6-pin switch below it.

Details will follow.

The mechanical part of the house already exists.

Since a few weeks I have the new ThinkPad X1 Carbon 6th Generation and, like many people, I really like it.

The biggest problem is that suspend does not work as expected.

The issue seems to be that the X1 uses a new suspend technology called “Windows Modern Standby,” or S0i3, and has removed classic S3 sleep.[1]

Following the instructions in Alexander's article it was possible to get S3 suspend to work as expected and everything was perfect.

With the latest firmware update to 0.1.28 (installed using sudo fwupdmgr update; thanks a lot to the Linux Vendor Firmware Service (LVFS) for making this work!) I checked whether the patch mentioned in Alexander's article still applies, and it did not.

So I modified the patch to apply again and made it available here:
https://lisas.de/~adrian/X1C6_S3_DSDT_0_1_28.patch

Talking with Christian about it he
mentioned an easier way to include the changed ACPI table into grub. For
my Fedora system this looks like this:

  • cp dsdt.aml /boot/efi/EFI/fedora/
  • echo 'acpi $prefix/dsdt.aml' > /boot/efi/EFI/fedora/custom.cfg

Thanks to Alexander and Christian I can correctly suspend my X1 again.

Update 2018-09-09: Lenovo fixed the BIOS and everything described
above is no longer necessary with version 0.1.30. Also see
https://brauner.github.io/2018/09/08/thinkpad-6en-s3.html

The version of CRIU included with CentOS has been updated with every minor CentOS release since 7.2 (at least at the time of writing), but once a minor CentOS release is out, CRIU is not updated again until the next minor release. To make it easier to use the latest version of CRIU on CentOS, I am now also rebuilding the latest version in COPR for CentOS:
https://copr.fedorainfracloud.org/coprs/adrian/criu-el7/.

To enable my CRIU COPR on CentOS, the following steps are necessary:

  • yum install yum-plugin-copr
  • yum copr enable adrian/criu-el7

And then the latest version of CRIU can be installed using yum install criu.

After many years the whole RPM Fusion repository has grown to over 320GB. There have been occasional requests to move the unsupported releases to an archive, just like Fedora handles its mirror setup, but until last week this had not happened.

As of now we have moved all unsupported releases (EL-5, Fedora 8 – 25)
to our archive (http://archive.rpmfusion.org/) and clients are now
being redirected to the new archive system. The archive consists of
260GB, which means we can reduce the size mirrors need to carry by more than 75%.

From a first look at the archive logs, the amount of data requested by all clients for the archived releases is only about 30GB per day. Those 30GB are downloaded via over 350000 HTTP requests, and over 98% of those requests download only the repository metadata (repomd.xml, *filelist*, *primary*, *comps*).