1. CRIU and SELinux

    When I started to include container migration into Podman using CRIU over a year ago, I did not really think about SELinux. I was able to checkpoint a running container and to restore it later (https://podman.io/blogs/2018/10/10/checkpoint-restore.html). I never looked at the process labels of the restored containers. But I really should have.

    After my initial implementation of container checkpoint and restore for Podman I started to work on live container migration for Podman in October of last year (2018). I opened the corresponding pull request at the end of January 2019 and immediately started to get SELinux-related failures from the CI.

    Amongst other SELinux denials, the main SELinux-related problem was a blocked connectto:

    avc: denied { connectto } for pid=23569 comm="top" path=002F6372746F6F6C732D70722D3233363139 scontext=system_u:system_r:container_t:s0:c245,c463 tcontext=unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=0

    This is actually a really interesting denial, because it gives away details about how CRIU works. This denial was caused by a container running top (podman run -d alpine top) which I tried to checkpoint.

    To understand why a denial like this is reported by top, it helps to understand how CRIU works. To be able to access all resources of the process it tries to checkpoint (or dump), CRIU injects parasite code into that process. The parasite code allows CRIU to act from within the process's address space. Once the parasite is injected and running, it connects to the main CRIU process and is ready to receive commands.

    The parasite's attempt to connect to the main CRIU process is exactly the step SELinux is blocking. Looking at the denial, a process top running as system_u:system_r:container_t:s0:c245,c463 is trying to connectto a socket labeled as unconfined_u:system_r:container_runtime_t:s0-s0:c0.c1023, which is indeed suspicious: something running in a container tries to connect to something running outside of the container. Knowing that this is CRIU and how CRIU works, it is, however, required that the parasite code connects to the main process using connectto.

    Fortunately SELinux has the necessary interface to solve this: setsockcreatecon(3). Using setsockcreatecon(3) it is possible to specify the context of newly created sockets. So all we have to do is get the context of the process to checkpoint and tell SELinux to label newly created sockets accordingly (8eb4309). Once understood, that was easy. Unfortunately, this is also where the whole thing got really complicated.
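
    The idea boils down to two libselinux calls: getpidcon(3) to read the context of the process that is going to be checkpointed, and setsockcreatecon(3) to label all sockets created afterwards. A minimal sketch of that idea (this is not the actual CRIU code from 8eb4309; the helper name is made up for illustration):

    /* Compile with -lselinux. */
    #include <sys/types.h>
    #include <selinux/selinux.h>

    /* Label all sockets we create from now on like the process 'pid',
     * so the parasite's connection back to CRIU passes the policy. */
    int label_sockets_like(pid_t pid)
    {
            char *ctx = NULL;

            /* Context of the process to checkpoint, e.g.
             * system_u:system_r:container_t:s0:c245,c463 */
            if (getpidcon(pid, &ctx) < 0)
                    return -1;

            /* Newly created sockets will get exactly this context. */
            if (setsockcreatecon(ctx) < 0) {
                    freecon(ctx);
                    return -1;
            }

            freecon(ctx);
            return 0;
    }

    Calling setsockcreatecon(NULL) afterwards resets socket labeling to the default behaviour again.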

    The CRIU RPM package in Fedora is built without SELinux support, because CRIU's SELinux support was, until now, limited and untested. It used to be: if the process context does not start with unconfined_, CRIU just refuses to dump the process and exits. And being unaware of SELinux, a process restored with CRIU was no longer running with the context it was started with, but with the context of CRIU during the restore. So if a container was running with a context like system_u:system_r:container_t:s0:c248,c716 during checkpointing, it was running with the wrong context after restore: unconfined_u:system_r:container_runtime_t:s0, which is the context of the container runtime and not of the actual container process.

    So first I had to fix CRIU's SELinux handling to be able to use setsockcreatecon(3). Fortunately, once I understood the problem, it was pretty easy to fix CRIU's SELinux process labeling. Most of the LSM code in CRIU was written by Tycho in 2015 with a focus on AppArmor, which luckily uses the same interfaces as SELinux. So all I had to do was remove the restrictions on which SELinux contexts CRIU is willing to operate on and make sure that CRIU stores the information about the process context in its image files (796da06).

    Once the next CRIU release with these patches included is available I have to add BuildRequires: libselinux-devel to the RPM to build Fedora's CRIU package with SELinux support. This, however, means that CRIU users on Fedora might see SELinux errors they have not seen before. CRIU now needs SELinux policies which allow CRIU to change the SELinux context of a running process. For the Podman use case which started all of this there has been the corresponding change in container-selinux to allow container_runtime_t to dyntransition to container domains.

    For CRIU use cases outside of containers additional policies have been created which are also used by the new CRIU ZDTM test case selinux00. A new boolean exists which allows CRIU to use "setcon to dyntrans to any process type which is part of domain attribute". So with setsebool -P unconfined_dyntrans_all 1 it should be possible to use CRIU on Fedora just like before.

    After I included all those patches and policies into Podman's CI, almost all checkpoint/restore related tests were successful, except one test which checks if it is possible to checkpoint and restore a container with established TCP connections. In this test case a container with Redis is started, a connection to Redis is opened and the container is checkpointed and restored. This was still failing in CI, which was interesting as it seemed unrelated to SELinux.

    Trying to reproduce the test case locally I actually saw the following SELinux errors during restore:

    audit: SELINUX_ERR op=security_bounded_transition seresult=denied oldcontext=unconfined_u:system_r:container_runtime_t:s0 newcontext=system_u:system_r:container_t:s0:c218,c449

    This was unusual as it did not look like something that could be fixed with a policy.

    The reason my test case for checkpointing and restoring containers with established TCP connections failed was not the fact that it tests established TCP connections, but the fact that it uses a multithreaded process. Looking at the SELinux kernel code I found the following comment in security/selinux/hooks.c:

    /* Only allow single threaded processes to change context */

    This line has been unchanged since 2008, so it seemed unlikely that SELinux could be changed in such a way that each thread could be labeled separately. My first attempt to solve this was to change the process label with setcon(3) before CRIU forks for the first time. This kind of worked, but at the same time created lots of SELinux denials (over 50), because during restore CRIU changes itself and the forks it creates into the process it wants to restore. So instead of changing the process label just before forking for the first time, I switched to setting the process label just before CRIU creates all threads (e86c2e9).
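
    The core of that change is really just one libselinux call made at the right time; a sketch of the idea (not the actual CRIU code from e86c2e9, the helper name and the example context are illustrative):

    /* Compile with -lselinux. */
    #include <selinux/selinux.h>

    /* Called while the restorer is still single threaded, right before
     * the threads of the restored process are created. dumped_ctx is the
     * context CRIU stored in its image files, e.g.
     * system_u:system_r:container_t:s0:c218,c449. */
    int switch_to_dumped_context(const char *dumped_ctx)
    {
            /* The kernel only allows single threaded processes to change
             * their context, so this is the last moment it can happen. */
            return setcon(dumped_ctx);
    }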

    Setting the context just before creating the threads resulted in only two SELinux denials. The first is about CRIU accessing its log file during restore, which is not critical, and the other happens when CRIU tries to influence the PID of the threads it wants to create via /proc/sys/kernel/ns_last_pid. As CRIU is now running in the SELinux context of the to-be-restored container, and to avoid allowing the container to access all files labeled as sysctl_kernel_t, Fedora's selinux-policy contains a patch to label /proc/sys/kernel/ns_last_pid as sysctl_kernel_ns_last_pid_t.

    So with the latest CRIU and selinux-policy installed, and the following addition to my local SELinux policy (kernel_rw_kernel_ns_lastpid_sysctl(container_domain)), I can now checkpoint and restore a Podman container (even a multi-threaded one) with the correct SELinux process context after the restore, and no further SELinux denials block the checkpointing or restoring of the container. There are a few remaining denials, mainly related to not being able to write to the log files, but those do not interfere with checkpointing or restoring.

    For some time (two or three years) I was aware that CRIU had never been verified to work correctly with SELinux, but I always ignored it; I should have just fixed it a long time ago. Without the CRIU integration into Podman, however, I would not have been able to test my changes the way I did.

    I would like to thank Radostin for his feedback and ideas when I was stuck and his overview of the necessary CRIU changes, Dan for his help in adapting the container-selinux package to CRIU's needs and Lukas for the necessary changes to Fedora's selinux-policy package to make CRIU work with SELinux on Fedora. All these combined efforts made it possible to have the necessary policies and code changes ready to support container migration with Podman.

    Tagged as : criu podman selinux fedora
  2. CRIU configuration files

    One of CRIU's use cases is container checkpointing and restoring, which can also be used to migrate containers. Container runtimes therefore use CRIU to checkpoint all the processes in a container as well as to restore those processes. Many container runtimes are layered, which means that the user facing layer (Podman, Docker, LXD) calls another layer to checkpoint (or restore) the container (runc, LXC) and this layer then calls CRIU.

    This leads to the problem that if CRIU introduces a new feature or option, all involved layers need code changes. And if one of those layers made assumptions about how to use CRIU, the user must live with those assumptions, which may be wrong for the user's use case.

    To offer the possibility to change CRIU's behaviour through all these layers, be it that the container runtime has not implemented a certain CRIU feature or that the user needs a different CRIU behaviour, we started to discuss configuration files in 2016.

    Configuration files should be evaluated by CRIU and offer a third way to influence CRIU's behaviour. Setting options via CLI and RPC are the other two ways.

    At the Linux Plumbers Conference in 2016 during the Checkpoint/Restore micro-conference I gave a short introduction talk about how configuration files could look and everyone was nodding their head.

    In early 2017 Veronika Kabatova provided patches which were merged in CRIU's development branch criu-dev. At that point the development stalled a bit and only in early 2018 the discussion was picked up again. To have a feature merged into the master branch, which means it will be part of the next release, requires complete documentation (man-pages and wiki) and feature parity for CRIU's CLI and RPC mode. At this point it was documented but not supported in RPC mode.

    Adding configuration file support to CRIU's RPC mode was not a technical challenge, but if any recruiter ever asks me which project was the most difficult, I will talk about this. We were exchanging mails and patches for about half a year and it seems everybody had different expectations how everything should behave. I think at the end they pitied me and just merged my patches...

    CRIU 3.11, which was released on 2018-11-06, is the first release that includes support for configuration files, and now (finally) I want to write about how they can be used.

    I am using the Simple_TCP_pair example from CRIU's wiki. First start the server:

    # ./tcp-howto 10000
    

    Then I am starting the client:

    # ./tcp-howto 127.0.0.1 10000
    Connecting to 127.0.0.1:10000
    PP 1 -> 1
    PP 2 -> 2
    PP 3 -> 3
    PP 4 -> 4
    

    Once client and server are running, let's try to checkpoint the client:

    # rm -f /etc/criu/default.conf
    # criu dump -t `pgrep -f 'tcp-howto 127.0.0.1 10000'`
    Error (criu/sk-inet.c:188): inet: Connected TCP socket, consider using --tcp-established option.
    

    CRIU tells us that it needs a special option to checkpoint processes with established TCP connections. No problem, but instead of changing the command-line, let's add it to the configuration file:

    # echo tcp-established > /etc/criu/default.conf
    # criu dump -t `pgrep -f 'tcp-howto 127.0.0.1 10000'`
    Error (criu/tty.c:1861): tty: Found dangling tty with sid 16693 pgid 16711 (pts) on peer fd 0.
    Task attached to shell terminal. Consider using --shell-job option. More details on http://criu.org/Simple_loop
    

    Alright, let's also add shell-job to the configuration file:

    # echo shell-job >> /etc/criu/default.conf
    # criu dump -t `pgrep -f 'tcp-howto 127.0.0.1 10000'` && echo OK
    OK
    

    That worked. Cool. Finally! Most CLI options can be used in the configuration file(s) and more detailed documentation can be found in the CRIU wiki.

    I want to thank Veronika for her initial implementation and everyone else helping, discussing and reviewing emails and patches to get this ready for release.

    Tagged as : criu podman
  3. Latest CRIU for CentOS COPR

    The version of CRIU included with CentOS has been updated with every minor CentOS release since 7.2 (at least at the time of writing this), but once a minor CentOS release is out, CRIU is not updated again until the next minor release. To make it easier to use the latest version of CRIU on CentOS I am now also rebuilding the latest version in COPR for CentOS: https://copr.fedorainfracloud.org/coprs/adrian/criu-el7/.

    To enable my CRIU COPR on CentOS the following steps are necessary:

    • yum install yum-plugin-copr
    • yum copr enable adrian/criu-el7

    And then the latest version of CRIU can be installed using yum install criu.

    Tagged as : CentOS criu migration
  4. Optimizing live container migration in LXD

    After having worked on optimizing live container migration based on runc (pre-copy migration and post-copy migration) I tried to optimize container migration in LXD.

    After a few initial discussions with Christian I started with pre-copy migration. Container migration in LXD is based on CRIU, just as in runc and CRIU's pre-copy migration support is based on dirty page tracking support of Linux: SOFT-DIRTY PTEs.

    As LXD uses LXC for the actual container checkpointing and restoring I was curious if there was already pre-copy migration support in LXC. After figuring out the right command-line parameters it almost worked thanks to the great checkpoint and restore support implemented by Tycho some time ago.

    Now that I knew that it works in LXC I focused on getting pre-copy migration support into LXD. LXD supports container live migration using the move command: lxc move <container> <remote>:<container>
    This move command, however, did not use any optimization yet. It basically did:

    1. Initial sync of the filesystem
    2. Checkpoint container using CRIU
    3. Transfer container checkpoint
    4. Final sync of the filesystem
    5. Restart container on the remote system

    The downtime for the container in this scenario is between step 2 and step 5 and depends on the memory used by the processes inside the container. The goal of pre-copy migration is to dump the memory of the container and transfer it to the remote destination while the container keeps on running, and to do a final dump with only the memory pages that changed since the last pre-dump (more about process migration optimization theories).

    Back to LXD: At the end of the day I had a very rough (and very hardcoded) first pre-copy migration implementation ready and I kept working on it until it was ready to be submitted upstream. The pull request has already been merged upstream and now LXD supports pre-copy migration.

    As not all architecture/kernel/criu combinations support pre-copy migration it has to be turned on manually right now, but we already discussed adding pre-copy support detection to LXC. To tell LXD to use pre-copy migration, the parameter 'migration.incremental.memory' needs to be set to 'true'. Once that is done and if LXD is instructed to migrate a container the following will happen:

    • Initial sync of the filesystem
    • Start pre-copy checkpointing loop using CRIU
      • Check if the maximum number of pre-copy iterations has been reached
      • Check if the threshold of unchanged memory pages has been reached
      • Transfer container checkpoint
      • Continue pre-copy checkpointing loop if neither of those conditions is true
    • Final container delta checkpoint using CRIU
    • Transfer final delta checkpoint
    • Final sync of the filesystem
    • Restart container on the remote system

    So instead of doing a single checkpoint and transferring it, there are now multiple pre-copy checkpoints and the container keeps on running during those transfers. The container is only suspended during the last delta checkpoint and the transfer of the last delta checkpoint. In many cases this reduces the container downtime during migration, but there is the possibility that pre-copy migration also increases the container downtime during migration. This depends (as always) on the workload.

    To control how many pre-copy iterations LXD does there are two additional variables:

    1. migration.incremental.memory.iterations (defaults to 10)
    2. migration.incremental.memory.goal (defaults to 70%)

    The first variable (iterations) is used to tell LXD how many pre-copy iterations it should do before doing the final dump and the second variable (goal) is used to tell LXD the percentage of pre-copied memory pages that should not change between pre-copy iterations before doing the final dump.

    So LXD, in the default configuration, does either 10 pre-copy iterations before doing the final migration or the final migration is triggered when at least 70% of the memory pages have been transferred by the last pre-copy iteration.

    Now that this pull request is merged and if pre-copy migration is enabled a lxc move <container> <remote>:<container> should live migrate the container with a reduced downtime.

    I want to thank Christian for the collaboration on getting CRIU's pre-copy support into LXD, Tycho for his work preparing LXC and LXD to support migration so nicely and the developers of p.haul for the ideas how to implement pre-copy container migration. Next step: lazy migration.

    Tagged as : criu migration pre-copy
  5. Lazy Migration in CRIU's master branch

    For almost two years Mike Rapoport and I have been working on lazy process migration. Lazy process migration (or post-copy migration) is a technique to decrease the process or container downtime during live migration. I described the basic functionality in previous articles.

    Those articles are not 100% correct anymore as we changed some of the parameters during the last two years, but the concepts stayed the same.

    Mike and I started about two years ago to work on it and the latest CRIU release (3.5) includes the possibility to use lazy migration. Now that the post-copy migration feature has been merged from the criu-dev branch to the master branch it is part of the normal CRIU releases.

    With CRIU's 3.5 release lazy migration can be used on any kernel which supports userfaultfd. I already updated the CRIU packages in Fedora to 3.5 so that lazy process migration can be used just by installing the latest CRIU packages with dnf (still in the testing repository right now).

    More information about container live migration in our upcoming Open Source Summit Europe talk: Container Migration Around The World.

    My pull request to support lazy migration in runC was also recently merged, so it is now possible to migrate containers using pre-copy migration and post-copy migration. The two can also be combined.

    Another interesting change is that CRIU started as x86_64 only and is now also available on aarch64, ppc64le and s390x. The support for s390x was just added with the previous 3.4 release, and starting with Fedora 27 the necessary kernel configuration options are also active on s390x in addition to the other supported architectures.

    Tagged as : criu fedora
  6. Linux Plumbers Conference 2016

    It is a bit late but I still wanted to share my presentations from this year's Linux Plumbers Conference:

    On my way back home I had to stay one night in Albuquerque and it looks like the hotel needs to upgrade its TV system. It is still running Fedora 10, which has been EOL since 2009-12-18:

    Still Fedora 10

    Tagged as : criu
  7. Influence which PID will be the next

    To restore a checkpointed process with CRIU, the process ID (PID) has to be the same as it was during checkpointing. CRIU uses /proc/sys/kernel/ns_last_pid to set the PID to one lower than the PID of the process to be restored, just before fork()-ing into the new process.

    The same interface (/proc/sys/kernel/ns_last_pid) can also be used from the command-line to influence which PID the kernel will use for the next process.

    # cat /proc/sys/kernel/ns_last_pid
    1626
    # echo -n 9999 > /proc/sys/kernel/ns_last_pid
    # cat /proc/sys/kernel/ns_last_pid
    10000
    

    Writing '9999' (without a newline) to /proc/sys/kernel/ns_last_pid tells the kernel that the next PID should be '10000'. This only works if no other process is created between writing to /proc/sys/kernel/ns_last_pid and forking the new process. So it is not possible to guarantee which PID the new process will get, but it can be influenced.

    There is also a posting which describes how to do the same with C: How to set PID using ns_last_pid
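
    For completeness, here is a minimal sketch of the same idea in C (no error handling; writing to ns_last_pid needs appropriate privileges, typically root, and as explained above there is still no guarantee that another process does not grab the PID first):

    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main(void)
    {
            pid_t wanted = 10000;
            char buf[32];

            /* Ask the kernel to hand out 'wanted' as the next PID by
             * writing 'wanted - 1' to ns_last_pid. */
            int fd = open("/proc/sys/kernel/ns_last_pid", O_WRONLY);
            snprintf(buf, sizeof(buf), "%d", wanted - 1);
            write(fd, buf, strlen(buf));
            close(fd);

            pid_t child = fork();
            if (child == 0)
                    _exit(0);

            /* If nothing else forked in between, child == wanted. */
            printf("wanted %d, got %d\n", wanted, child);
            waitpid(child, NULL, 0);
            return 0;
    }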

    Tagged as : criu fedora
  8. Combining pre-copy and post-copy migration

    In my last post about CRIU in May 2016 I mentioned lazy memory transfer to decrease process downtime during migration. Since May 2016 Mike Rapoport's patches for remote lazy process migration have been merged into CRIU's criu-dev branch as well as my patches to combine pre-copy and post-copy migration.

    Using pre-copy (criu pre-dump) it has "always" been possible to dump the memory of a process using soft-dirty-tracking. criu pre-dump can be run multiple times and each time only the changed memory pages will be written to the checkpoint directory.

    Depending on the processes to be migrated and how fast they are changing their memory, this can still lead to a situation where the final dump can be rather large, which can mean a longer downtime during migration than desired. This is why we started to work on post-copy migration (also known as lazy migration). There are, however, situations where post-copy migration can also increase the process downtime during migration instead of decreasing it.

    The latest changes regarding post-copy migration in the criu-dev branch offer the possibility to combine pre-copy and post-copy migration. The memory pages of the process are pre-dumped using soft-dirty-tracking and transferred to the destination while the process on the source machine keeps on running. Once the process is actually migrated to the destination system everything besides the memory pages is transferred to the destination system. Excluding the memory pages (as the remaining memory pages will be migrated lazily) usually only a few hundred kilobytes have to be transferred which reduces the process downtime during migration significantly.

    Using criu with pre-copy and post-copy could look like this:

    Source system:

    # criu pre-dump -D /tmp/cp/1 -t PID
    # rsync -a /tmp/cp destination:/tmp
    # criu dump -D /tmp/cp/2 -t PID --port 27 --lazy-pages \
        --prev-images-dir ../1/ --track-mem
    

    The first criu command dumps the memory of the process PID and resets the soft-dirty memory tracking. The initial dump is then transferred using rsync to the destination system. During that time the process PID keeps on running. The last criu command starts the lazy page mode which dumps everything besides memory pages which can be transferred lazily and waits for connections over the network on port 27. Only pages which have changed since the last pre-dump are considered for the lazy restore. At this point the process is no longer running and the process downtime starts.

    Destination system:

    # rsync -a source:/tmp/cp /tmp/
    # criu lazy-pages --page-server --address source --port 27 \
        -D /tmp/cp/2 &
    # criu restore --lazy-pages -D /tmp/cp/2
    

    Once criu is waiting on port 27 on the source system the remaining checkpoint images can be transferred from the source system to the destination system (using rsync in this case). Now criu can be started in lazy-pages mode connecting to the page server on port 27 on the source system. This is the part we usually call the UFFD daemon. The last step is the actual restore (criu restore).

    The following diagrams try to visualize what happens during the last step: criu restore.

    step1

    It all starts with criu restore (on the right). criu does its magic to restore the process and copies the memory pages from criu pre-dump to the process and marks lazy pages as being handled by userfaultfd. Once everything is restored criu jumps into the restored process and the restored process continues to run where it was when checkpointed. Once the process accesses a userfaultfd marked memory address the process will be paused until a memory page (hopefully the correct one) is copied to that address.

    step2

    The part that we call the UFFD daemon or criu lazy-pages listens on the userfault file descriptor for a message and as soon as a valid UFFD request arrives it requests that page from the source system via TCP where criu is still running in page-server mode. If the page-server finds that memory page it transfers the actual page back to the destination system to the UFFD daemon which injects the page into the kernel using the same userfault file descriptor it previously got the page request from. Now that the page which initially triggered the page-fault or in our case userfault is at its place the restored process continues to run until another missing page is accessed and the whole procedure starts again.

    To be able to remove the UFFD daemon and the page-server at some point we currently push all unused pages into the restored process if there are no further userfaultfd requests for 5 seconds.

    The whole procedure still has a lot of possibilities for optimization but now that we finally can combine pre-copy and post-copy memory migration we are a lot closer to decreasing process downtime during migration.

    The next steps are to get support for pre-copy and post-copy into p.haul (Process Hauler) and into different container runtimes which already support migration via criu.

    Tagged as : criu fedora
  9. Lazy Process Migration

    Process Migration

    Using CRIU it is possible to checkpoint/save/dump the state of a process into a set of files which can then be used to restore/restart the process at a later point in time. If the files from the checkpoint operation are transferred from one system to another and then used to restore the process, this is probably the simplest form of process migration.

    Source system:

    • criu dump -D /checkpoint/destination -t PID
    • rsync -a /checkpoint/destination destination.system:/checkpoint/destination

    Destination system:

    • criu restore -D /checkpoint/destination

    For large processes the migration duration can be rather long. For a process using 24GB of memory this can lead to a migration duration of more than 280 seconds. The limiting factor in most cases is the interconnect between the systems involved in the process migration.

    Optimization: Pre-Copy

    One existing solution to decrease process downtime during migration is pre-copy. In one or multiple runs the memory of the process is copied from the source to the destination system. With every run only memory pages which have changed since the last run have to be transferred. This can lead to situations where the process downtime during migration is dramatically decreased.

    This depends on the type of application which is migrated and especially how often/fast the memory content is changed. In extreme cases it was possible to decrease process downtime during migration for a 24GB process from 280 seconds to 8 seconds with the help of pre-copy.

    This approach is basically the same whether migrating single processes (or process groups) or virtual machines.

    It Always Depends On...

    Unfortunately pre-copy optimization can also lead to situations where the so-called optimized case with pre-copy requires more time than the unoptimized case:

    In the example above a process has been migrated during three stages of its lifetime. There are situations (state: Calculation) where pre-copy has enormous advantages (14 seconds with pre-copy and 51 seconds without), but there are also situations (state: Initialization) where the pre-copy optimization increases the process downtime during migration (40 seconds with pre-copy and 27 seconds without). It depends on the memory change rate.

    Optimization: Post-Copy

    Another approach to reduce the process downtime during migration is post-copy. The required memory pages are not dumped and transferred before restoring the process but on demand. Each time a missing memory page is accessed, the migrated process is halted until the required memory page has been transferred from the source system to the destination system:

    Thanks to userfaultfd this approach (or optimization) can now be integrated into CRIU. With the help of userfaultfd it is possible to mark memory pages to be handled by userfaultfd. If such a memory page is accessed, the process is halted until the requested page is provided. The listener for the userfaultfd requests runs in user space and listens on a file descriptor. The same approach has already been implemented for QEMU.
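
    To illustrate the mechanism, the following is a minimal, self-contained userfaultfd example (this is not CRIU code, just a sketch of the register / fault / copy-on-demand cycle; it assumes 4KB pages, leaves out most error handling and, depending on the kernel configuration, may need to run as root):

    /* Compile with: gcc -o uffd-demo uffd-demo.c -lpthread */
    #include <fcntl.h>
    #include <linux/userfaultfd.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #define PAGE 4096

    /* The "uffd daemon": wait for one page fault and resolve it. */
    static void *handler(void *arg)
    {
            int uffd = *(int *)arg;
            static char page[PAGE] __attribute__((aligned(PAGE)));
            struct uffd_msg msg;

            memset(page, 'A', PAGE);        /* content to inject */

            /* Blocks until the registered area is touched. */
            if (read(uffd, &msg, sizeof(msg)) != sizeof(msg) ||
                msg.event != UFFD_EVENT_PAGEFAULT)
                    return NULL;

            /* Copy the missing page to the faulting address; this also
             * wakes up the faulting thread. */
            struct uffdio_copy copy = {
                    .dst = msg.arg.pagefault.address & ~(PAGE - 1UL),
                    .src = (unsigned long)page,
                    .len = PAGE,
                    .mode = 0,
            };
            ioctl(uffd, UFFDIO_COPY, &copy);
            return NULL;
    }

    int main(void)
    {
            /* userfaultfd has no glibc wrapper, call the syscall directly. */
            int uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
            struct uffdio_api api = { .api = UFFD_API };
            ioctl(uffd, UFFDIO_API, &api);

            /* The "lazy" memory area: mapped, but not yet populated. */
            char *area = mmap(NULL, PAGE, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            /* Mark the area so missing pages are reported on uffd. */
            struct uffdio_register reg = {
                    .range = { .start = (unsigned long)area, .len = PAGE },
                    .mode = UFFDIO_REGISTER_MODE_MISSING,
            };
            ioctl(uffd, UFFDIO_REGISTER, &reg);

            pthread_t t;
            pthread_create(&t, NULL, handler, &uffd);

            /* First access faults and blocks until UFFDIO_COPY happened. */
            printf("first byte: %c\n", area[0]);

            pthread_join(t, NULL);
            return 0;
    }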

    Enough Theory

    So much for the background on why and how: the initial code to restore processes with userfaultfd support has been merged into the CRIU development branch criu-dev. This initial implementation of lazy-pages support does not yet support lazy process migration between two hosts, but with the upstream merged patches it is at least possible to checkpoint a process and to restore it using userfaultfd. A lazy restore consists of two parts: the usual 'criu restore' part and an additional part, which we call the uffd daemon, 'criu lazy-pages'. To better demonstrate the advantages of a lazy restore there are patches to enhance crit (CRiu Image Tool) to remove pages which can be restored with userfaultfd from a checkpoint directory. A test case which allocates about 200MB of memory (and which writes one byte in each page over and over) requires about 200MB after being dumped. Using the mentioned crit enhancement make-lazy reduces the size of the checkpoint down to 116KB:

    $ crit make-lazy /tmp/checkpoint/ /tmp/lazy-checkpoint
    $ du -hs /tmp/checkpoint/ /tmp/lazy-checkpoint
         201M       /tmp/checkpoint
         116K       /tmp/lazy-checkpoint
    

    With this the data which actually has to be transferred during process downtime is drastically reduced and the required memory pages are inserted in the restored process on demand using userfaultfd. Restoring the checkpointed process using lazy-restore would look something like this:

    First the uffd daemon:

    $ criu lazy-pages -D /tmp/checkpoint --address /tmp/userfault.socket
    

    And then the actual restore:

    $ criu restore -D /tmp/lazy-checkpoint --lazy-pages --address /tmp/userfault.socket
    

    The socket specified with --address is used to exchange information about the restored process which the uffd daemon requires. Once criu restore has done all its magic to restore the process, except restoring the lazy memory pages, the process is actually started and runs until the first userfaultfd-handled memory page is accessed. At that point the process hangs and the uffd daemon gets a message to provide the required memory pages. Once the uffd daemon provides the requested memory page, the restored process continues to run until the next page is requested. As not all memory pages are necessarily requested (they might not get accessed for some time), the uffd daemon starts to transfer unrequested memory pages into the restored process so that it can shut down after a certain time.

    Tagged as : criu fedora
  10. Process Migration coming to Fedora 19 (probably)

    With the recently approved review of the package crtools in Fedora I have made a feature proposal for checkpoint/restore.

    To test checkpoint/restore on Fedora you need to run the current development version of Fedora and install crtools using yum (yum install crtools). Until it is decided if it will actually be a Fedora 19 feature, and until the necessary changes in the Fedora kernel packages have been implemented, it is necessary to install a kernel which is not in the repository. I have built a kernel in Fedora's build system which enables the following config options: CHECKPOINT_RESTORE, NAMESPACES, EXPERT.

    A kernel with these changes enabled is available from koji as a scratch build: http://koji.fedoraproject.org/koji/taskinfo?taskID=4899525

    After installing this kernel I am able to migrate a process from one Fedora system to another. For my test case I am migrating a UDP ping pong (udpp.c) program from one system to another while communicating with a third system.

    udpp

    udpp is running in server mode on 129.143.116.10 and on 134.108.34.90 udpp is started in client mode. After a short time I am migrating, with the help of crtools, the udpp client to 85.214.67.247. The following is part of the output on the udpp server:

    -->

    Received ping packet from 134.108.34.90:38374
    Data: This is ping packet 6

    Sending pong packet 6
    <--
    -->

    Received ping packet from 134.108.34.90:38374
    Data: This is ping packet 7

    Sending pong packet 7
    <--
    -->

    Received ping packet from 85.214.67.247:38374
    Data: This is ping packet 8

    Sending pong packet 8
    <--
    -->

    Received ping packet from 85.214.67.247:38374
    Data: This is ping packet 9

    Sending pong packet 9
    <--

    So with only little changes to the kernel configuration it is possible to migrate a process by checkpointing and restoring a process with the help of crtools.

    Tagged as : criu
