Articles by adrian

  1. CRIU configuration files

    One of CRIU's use cases is container checkpointing and restoring, which can also be used to migrate containers. Container runtimes therefore use CRIU to checkpoint all the processes in a container as well as to restore the processes in that container. Many container runtimes are layered, which means that the user-facing layer (Podman, Docker, LXD) calls another layer (runc, LXC) to checkpoint (or restore) the container, and this layer then calls CRIU.

    This leads to the problem that if CRIU introduces a new feature or option, all involved layers need code changes. And if one of those layers made an assumption about how to use CRIU, the user has to live with that assumption, which may be wrong for the user's use case.

    To offer the possibility to change CRIU's behaviour through all these layers, be it because the container runtime has not implemented a certain CRIU feature or because the user needs a different CRIU behaviour, we started discussing configuration files in 2016.

    Configuration files are evaluated by CRIU and offer a third way to influence CRIU's behaviour; setting options via the CLI and via RPC are the other two ways.

    At the Linux Plumbers Conference in 2016, during the Checkpoint/Restore micro-conference, I gave a short introduction talk about how configuration files could look, and everyone was nodding their head.

    In early 2017 Veronika Kabatova provided patches which were merged into CRIU's development branch criu-dev. At that point development stalled a bit, and only in early 2018 was the discussion picked up again. Having a feature merged into the master branch, which means it will be part of the next release, requires complete documentation (man-pages and wiki) and feature parity between CRIU's CLI and RPC mode. At this point configuration files were documented but not yet supported in RPC mode.

    Adding configuration file support to CRIU's RPC mode was not a technical challenge, but if any recruiter ever asks me which project was the most difficult, I will talk about this. We were exchanging mails and patches for about half a year, and it seems everybody had different expectations of how everything should behave. I think in the end they pitied me and just merged my patches...

    CRIU 3.11, which was released on 2018-11-06, is the first release that includes support for configuration files, and now (finally) I want to write about how they can be used.

    I am using the Simple_TCP_pair example from CRIU's wiki. First start the server:

    # ./tcp-howto 10000
    

    Then I start the client:

    # ./tcp-howto 127.0.0.1 10000
    Connecting to 127.0.0.1:10000
    PP 1 -> 1
    PP 2 -> 2
    PP 3 -> 3
    PP 4 -> 4
    

    Once client and server are running, let's try to checkpoint the client:

    # rm -f /etc/criu/default.conf
    # criu dump -t `pgrep -f 'tcp-howto 127.0.0.1 10000'`
    Error (criu/sk-inet.c:188): inet: Connected TCP socket, consider using --tcp-established option.
    

    CRIU tells us that it needs a special option to checkpoint processes with established TCP connections. No problem, but instead of changing the command line, let's add the option to the configuration file:

    # echo tcp-established > /etc/criu/default.conf
    # criu dump -t `pgrep -f 'tcp-howto 127.0.0.1 10000'`
    Error (criu/tty.c:1861): tty: Found dangling tty with sid 16693 pgid 16711 (pts) on peer fd 0.
    Task attached to shell terminal. Consider using --shell-job option. More details on http://criu.org/Simple_loop
    

    Alright, let's also add shell-job to the configuration file:

    # echo shell-job >> /etc/criu/default.conf
    # criu dump -t `pgrep -f 'tcp-howto 127.0.0.1 10000'` && echo OK
    OK
    

    That worked. Cool. Finally! Most CLI options can be used in the configuration file(s), and more detailed documentation can be found in the CRIU wiki.
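
    The configuration file syntax follows the command-line options: one option per line, written without the leading dashes, with possible values separated by whitespace. As a minimal sketch (using the options from above; the work-dir path is only an example), a combined /etc/criu/default.conf could look like this:

    tcp-established
    shell-job
    work-dir /tmp/criu-work
    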

    I want to thank Veronika for her initial implementation and everyone else helping, discussing and reviewing emails and patches to get this ready for release.

    Tagged as : criu podman
  2. Nextcloud in a Container

    After using Podman a lot during the last weeks while adding checkpoint/restore support to Podman, I was finally ready to use containers in production on our mirror server. We were still running the ownCloud version that came via RPMs in Fedora 27, and it seems like many people have moved on to Nextcloud, installed from tarballs.

    One of the main reasons to finally use containers is Podman's daemonless approach.

    The first challenge in moving from ownCloud 9.1.5 to Nextcloud 14 was the actual upgrade. To make sure it works, I first made a copy of all the uploaded files and of the database and did a test upgrade yesterday in a CentOS 7 VM. With PHP 7 from Software Collections it was not a real problem. It took some time, but it worked. I used the included upgrade utility to upgrade from ownCloud 9 to Nextcloud 10, then to Nextcloud 11, to Nextcloud 12, to Nextcloud 13 and finally to Nextcloud 14. Lots of upgrades. Once I had verified that everything was still functional, I did it once more, but this time with the real data and with access to our ownCloud instance disabled.

    The next step was to start the container. I decided to use the nextcloud:fpm container as I was planning to use the existing web server to proxy the requests. The one thing which makes using containers on our mirror server a bit difficult is that it is not possible to use any iptables NAT rules. At some point there were just too many network connections in the NAT table from all the clients connecting to our mirror server, and network connections were dropped. This problem has probably been fixed for a long time, but it used to be a problem and I try to avoid it. That is why my Nextcloud container uses the host network namespace:

    podman run --name nextcloud-fpm -d --net host \
      -v /home/containers/nextcloud/html:/var/www/html \
      -v /home/containers/nextcloud/apps:/var/www/html/custom_apps \
      -v /home/containers/nextcloud/config:/var/www/html/config \
      -v /home/containers/nextcloud/data:/var/www/html/data \
      nextcloud:fpm
    

    I was reusing my existing config.php in which the connection to PostgreSQL on 127.0.0.1 was still configured.

    Once the container was running I just had to add the proxy rules to the Apache HTTP Server, and it should have been ready. Unfortunately this was not as easy as I had hoped. All the documentation I found was about using the Nextcloud FPM container with NGINX; I found nothing about Apache's HTTPD. The following lines took most of the time of the whole upgrade-to-Nextcloud project:

    <FilesMatch \.php.*>
       SetHandler proxy:fcgi://127.0.0.1:9000/
       ProxyFCGISetEnvIf "reqenv('REQUEST_URI') =~ m|(/owncloud/)(.*)$|" SCRIPT_FILENAME "/var/www/html/$2"
       ProxyFCGISetEnvIf "reqenv('REQUEST_URI') =~ m|^(.+\.php)(.*)$|" PATH_INFO "$2"
    </FilesMatch>
    

    I hope these lines are actually correct, but so far all clients connecting to it seem to be happy. To have the Nextcloud container automatically start on system startup I based my systemd podman service file on the one from the Intro to Podman article.

    [Unit]
    Description=Custom Nextcloud Podman Container
    After=network.target
    
    [Service]
    Type=simple
    TimeoutStartSec=5m
    ExecStartPre=-/usr/bin/podman rm nextcloud-fpm
    
    ExecStart=/usr/bin/podman run --name nextcloud-fpm --net host \
       -v /home/containers/nextcloud/html:/var/www/html \
       -v /home/containers/nextcloud/apps:/var/www/html/custom_apps \
       -v /home/containers/nextcloud/config:/var/www/html/config \
       -v /home/containers/nextcloud/data:/var/www/html/data \
       nextcloud:fpm
    
    ExecReload=/usr/bin/podman stop nextcloud-fpm
    ExecReload=/usr/bin/podman rm nextcloud-fpm
    ExecStop=/usr/bin/podman stop nextcloud-fpm
    Restart=always
    RestartSec=30
    
    [Install]
    WantedBy=multi-user.target
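    
    Assuming the unit file is stored as /etc/systemd/system/nextcloud-fpm.service, it can be activated with the usual systemd commands:

    # systemctl daemon-reload
    # systemctl enable --now nextcloud-fpm.service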
    
    Tagged as : fedora nextcloud podman
  3. Antimatter Factory

    On October 19th, 2018, I was giving a talk about OpenHPC at the CentOS Dojo at CERN.

    I really liked the whole event, and my talk was also recorded. Thanks to everyone involved for organizing it. The day before FOSDEM 2019 there will be another CentOS Dojo in Brussels; I hope I will have the chance to attend that one as well.

    The most interesting thing during my two days in Geneva was, however, the visit to the Antimatter Factory:

    Antimatter Factory

    Assuming I actually understood anything we were told about it, it is exactly that: an antimatter factory.

    Tagged as : fedora centos openhpc
  4. S3 sleep with ThinkPad X1 Carbon 6th Generation

    For a few weeks now I have had the new ThinkPad X1 Carbon 6th Generation and, like many people, I really like it.

    The biggest problem is that suspend does not work as expected.

    The issue seems to be that the X1 is using a new suspend technology called "Windows Modern Standby," or S0i3, and has removed classic S3 sleep.[1]

    Following the instructions in Alexander's article it was possible to get S3 suspend to work as expected and everything was perfect.

    With the latest firmware update to 0.1.28, installed using sudo fwupdmgr update (thanks a lot to the Linux Vendor Firmware Service (LVFS) for making this work!), I checked whether the patch mentioned in Alexander's article still applies, and it did not.

    So I modified the patch to apply again and made it available here: https://lisas.de/~adrian/X1C6_S3_DSDT_0_1_28.patch
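
    For reference, a rough sketch of how the patched dsdt.aml can be built (assuming the iasl compiler from the ACPICA tools is installed; the exact steps are described in Alexander's article):

    # cat /sys/firmware/acpi/tables/DSDT > dsdt.dat
    # iasl -d dsdt.dat                          # decompile to dsdt.dsl
    # patch dsdt.dsl X1C6_S3_DSDT_0_1_28.patch  # apply the S3 patch
    # iasl dsdt.dsl                             # recompile to dsdt.aml
    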

    When I talked with Christian about it, he mentioned an easier way to include the changed ACPI table in grub. For my Fedora system this looks like this:

    • cp dsdt.aml /boot/efi/EFI/fedora/
    • echo 'acpi $prefix/dsdt.aml' > /boot/efi/EFI/fedora/custom.cfg

    Thanks to Alexander and Christian I can correctly suspend my X1 again.

    Update 2018-09-09: Lenovo fixed the BIOS and everything described above is no longer necessary with version 0.1.30. Also see https://brauner.github.io/2018/09/08/thinkpad-6en-s3.html

    Tagged as : fedora X1
  5. Latest CRIU for CentOS COPR

    The version of CRIU included in CentOS has been updated with every minor CentOS release since 7.2 (at least at the time of writing), but once a minor CentOS release is available, CRIU is not updated again until the next minor release. To make it easier to use the latest version of CRIU on CentOS, I am now also rebuilding the latest version in COPR for CentOS: https://copr.fedorainfracloud.org/coprs/adrian/criu-el7/.

    To enable my CRIU COPR on CentOS, the following steps are necessary:

    • yum install yum-plugin-copr
    • yum copr enable adrian/criu-el7

    And then the latest version of CRIU can be installed using yum install criu.
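
    After the installation, criu check can be used to verify that the running kernel supports everything CRIU needs; on success it should report that everything looks good:

    # criu check
    Looks good.
    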

    Tagged as : CentOS criu migration
  6. archive.rpmfusion.org

    After many years the whole RPM Fusion repository has grown to over 320GB. There have been occasional requests to move the unsupported releases to an archive, just like Fedora handles its mirror setup, but until last week this had not happened.

    As of now we have moved all unsupported releases (EL-5, Fedora 8 - 25) to our archive (http://archive.rpmfusion.org/) and clients are now being redirected to the new archive system. The archive consists of 260GB which means we can reduce the size mirrors need to carry by more than 75%.

    From a first look at the archive logs, the amount of data requested by all clients for the archived releases is only about 30GB per day. Those 30GB are downloaded via over 350000 HTTP requests, and over 98% of those requests download only the repository metadata (repomd.xml, *filelist*, *primary*, *comps*).

  7. OpenHPC: Building Blocks

    I will be giving two talks about OpenHPC in the next weeks. The first talk will be at DevConf.cz 2018: OpenHPC Introduction

    The other talk will be at the CentOS Dojo in Brussels.

    I hope I will be able to demonstrate my two-node HPC system based on Raspberry Pis, and it definitely will be about OpenHPC's building blocks:

    OpenHPC Building Blocks

    And the results:

    OpenHPC Building Blocks

    Come to one of my talks and you will be able to build your own OpenHPC engineer from the available building blocks.

    Tagged as : CentOS OpenHPC
  8. Optimizing live container migration in LXD

    After having worked on optimizing live container migration based on runc (pre-copy migration and post-copy migration) I tried to optimize container migration in LXD.

    After a few initial discussions with Christian, I started with pre-copy migration. Container migration in LXD is based on CRIU, just as in runc, and CRIU's pre-copy migration support is based on the dirty page tracking support of Linux: SOFT-DIRTY PTEs.

    As LXD uses LXC for the actual container checkpointing and restoring I was curious if there was already pre-copy migration support in LXC. After figuring out the right command-line parameters it almost worked thanks to the great checkpoint and restore support implemented by Tycho some time ago.

    Now that I knew that it works in LXC, I focused on getting pre-copy migration support into LXD. LXD supports container live migration using the move command:

    lxc move <container> <remote>:<container>

    This move command, however, did not use any optimizations yet. It basically did:

    1. Initial sync of the filesystem
    2. Checkpoint container using CRIU
    3. Transfer container checkpoint
    4. Final sync of the filesystem
    5. Restart container on the remote system

    The downtime for the container in this scenario is between step 2 and step 5 and depends on the memory used by the processes inside the container. The goal of pre-copy migration is to dump the memory of the container and transfer it to the remote destination while the container keeps on running, and to do a final dump with only the memory pages that changed since the last pre-dump (more about process migration optimization theories).

    Back to LXD: At the end of the day I had a very rough (and very hardcoded) first pre-copy migration implementation ready and I kept working on it until it was ready to be submitted upstream. The pull request has already been merged upstream and now LXD supports pre-copy migration.

    As not all architecture/kernel/CRIU combinations support pre-copy migration, it has to be turned on manually right now, but we have already discussed adding pre-copy support detection to LXC. To tell LXD to use pre-copy migration, the parameter 'migration.incremental.memory' needs to be set to 'true'. Once that is done and LXD is instructed to migrate a container, the following will happen:

    • Initial sync of the filesystem
    • Start pre-copy checkpointing loop using CRIU
      • Check if the maximum number of pre-copy iterations has been reached
      • Check if threshold of unchanged memory pages has been reached
      • Transfer container checkpoint
      • Continue pre-copy checkpointing loop if neither of those conditions is true
    • Final container delta checkpoint using CRIU
    • Transfer final delta checkpoint
    • Final sync of the filesystem
    • Restart container on the remote system

    So instead of doing a single checkpoint and transferring it, there are now multiple pre-copy checkpoints and the container keeps on running during those transfers. The container is only suspended during the last delta checkpoint and the transfer of the last delta checkpoint. In many cases this reduces the container downtime during migration, but there is the possibility that pre-copy migration also increases the container downtime during migration. This depends (as always) on the workload.

    To control how many pre-copy iterations LXD does there are two additional variables:

    1. migration.incremental.memory.iterations (defaults to 10)
    2. migration.incremental.memory.goal (defaults to 70%)

    The first variable (iterations) is used to tell LXD how many pre-copy iterations it should do before doing the final dump and the second variable (goal) is used to tell LXD the percentage of pre-copied memory pages that should not change between pre-copy iterations before doing the final dump.

    So LXD, in the default configuration, does either 10 pre-copy iterations before doing the final migration or the final migration is triggered when at least 70% of the memory pages have been transferred by the last pre-copy iteration.

    Now that this pull request is merged, and if pre-copy migration is enabled, a lxc move <container> <remote>:<container> should live migrate the container with a reduced downtime.
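
    Putting it all together, a minimal sketch of enabling and tuning pre-copy migration for a container (mycontainer and remote are placeholder names, and the two tuning values shown are just the defaults):

    # enable pre-copy (incremental memory) migration
    lxc config set mycontainer migration.incremental.memory true
    # optional tuning: maximum pre-copy iterations and the memory sync goal in percent
    lxc config set mycontainer migration.incremental.memory.iterations 10
    lxc config set mycontainer migration.incremental.memory.goal 70
    # live migrate the container
    lxc move mycontainer remote:mycontainer
    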

    I want to thank Christian for the collaboration on getting CRIU's pre-copy support into LXD, Tycho for his work preparing LXC and LXD to support migration so nicely, and the developers of p.haul for the ideas on how to implement pre-copy container migration. Next step: lazy migration.

    Tagged as : criu migration pre-copy
  9. Lazy Migration in CRIU's master branch

    For almost two years Mike Rapoport and I have been working on lazy process migration. Lazy process migration (or post-copy migration) is a technique to decrease the process or container downtime during live migration. I described the basic functionality in previous articles.

    Those articles are not 100% correct anymore as we changed some of the parameters during the last two years, but the concepts stayed the same.

    Mike and I started working on this about two years ago, and the latest CRIU release (3.5) includes the possibility to use lazy migration. Now that the post-copy migration feature has been merged from the criu-dev branch into the master branch, it is part of the normal CRIU releases.

    With CRIU's 3.5 release, lazy migration can be used on any kernel which supports userfaultfd. I have already updated the CRIU packages in Fedora to 3.5, so that lazy process migration can be used just by installing the latest CRIU packages with dnf (still in the testing repository right now).
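
    To give an idea how this looks on the command line, here is a rough sketch of a lazy migration between two hosts, loosely following the CRIU wiki (the images directory, port and PID are placeholders; depending on the process, additional options like --shell-job may be needed, and the image files other than the lazily transferred memory pages still have to be copied to the destination):

    source#      criu dump -D /tmp/ckpt -t <PID> --lazy-pages --port 27
    destination# criu lazy-pages -D /tmp/ckpt --page-server --address <source> --port 27 &
    destination# criu restore -D /tmp/ckpt --lazy-pages
    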

    More information about container live migration in our upcoming Open Source Summit Europe talk: Container Migration Around The World.

    My pull request to support lazy migration in runC was also recently merged, so it is now possible to migrate containers using pre-copy migration as well as post-copy migration; the two can also be combined.

    Another interesting change is that CRIU started out as x86_64 only and is now also available on aarch64, ppc64le and s390x. Support for s390x was just added with the previous 3.4 release, and starting with Fedora 27 the necessary kernel configuration options are also active on s390x in addition to the other supported architectures.

    Tagged as : criu fedora
  10. Linux Plumbers Conference 2016

    It is a bit late, but I still wanted to share my presentations from this year's Linux Plumbers Conference.

    On my way back home I had to stay one night in Albuquerque, and it looks like the hotel needs to upgrade its TV system. It is still running Fedora 10, which has been EOL since 2009-12-18:

    Still Fedora 10

    Tagged as : criu
