In my last post about CRIU in May
2016 I mentioned lazy memory transfer to decrease process downtime
during migration. Since May 2016 Mike Rapoport's patches for remote lazy
process migration have been merged into CRIU's
criu-dev branch as well
as my patches to combine pre-copy and post-copy migration.
Using pre-copy (criu pre-dump) it has "always" been possible to dump
the memory of a process ahead of the actual migration. criu pre-dump
can be run multiple times, and each time only the changed memory pages
are written to the checkpoint directory.
Depending on the processes to be migrated and how fast they change
their memory, this can still lead to a situation where the final dump
is rather large, which can mean a longer downtime during migration
than desired. This is why we started to work on post-copy migration
(also known as lazy migration). There are, however, situations where
post-copy migration can also increase the process downtime during
migration instead of decreasing it.
The latest changes regarding post-copy migration in the
criu-dev branch offer the
possibility to combine pre-copy and post-copy migration. The memory
pages of the process are pre-dumped using criu pre-dump
and transferred to the destination while the process on the source
machine keeps on running. Once the process is actually migrated to the
destination system everything besides the memory pages is transferred to
the destination system. Excluding the memory pages (as the remaining
memory pages will be migrated lazily) usually only a few hundred
kilobytes have to be transferred which reduces the process downtime
during migration significantly.
Using criu with pre-copy and post-copy could look like this:
# criu pre-dump -D /tmp/cp/1 -t PID
# rsync -a /tmp/cp destination:/tmp
# criu dump -D /tmp/cp/2 -t PID --port 27 --lazy-pages \
    --prev-images-dir ../1/ --track-mem
The first criu command dumps the memory of the process PID and
resets the soft-dirty memory tracking. The initial dump is then
transferred using rsync to the destination system. During that time
the process PID keeps on running. The last criu command starts the
lazy-pages mode, which dumps everything besides the memory pages that can
be transferred lazily and waits for connections over the network on port
27. Only pages which have changed since the last pre-dump are
considered for the lazy restore. At this point the process is no longer
running and the process downtime starts.
# rsync -a source:/tmp/cp /tmp/
# criu lazy-pages --page-server --address source --port 27 \
    -D /tmp/cp/2 &
# criu restore --lazy-pages -D /tmp/cp/2
Once criu is waiting on port 27 on the source system the remaining
checkpoint images can be transferred from the source system to the
destination system (using rsync in this case). Now criu can be
started in lazy-pages mode connecting to the page server on port 27 on
the source system. This is the part we usually call the UFFD daemon. The
last step is the actual restore (criu restore).
The following diagrams try to visualize what happens during the last
step: criu restore.
It all starts with criu restore (on the right). criu does its magic
to restore the process and copies the memory pages from criu pre-dump
to the process and marks lazy pages as being handled by userfaultfd.
Once everything is restored criu jumps into the restored process and
the restored process continues to run where it was when checkpointed.
Once the process accesses a userfaultfd-marked memory address the
process will be paused until a memory page (hopefully the correct one)
is copied to that address.
The part that we call the UFFD daemon or criu lazy-pages listens on
the userfault file descriptor for a message and as soon as a valid
UFFD request arrives it requests that page from the source system via
TCP, where criu is still running in page-server mode. If the
page server finds that memory page, it transfers it back to the UFFD
daemon on the destination system, which injects the page into the kernel
using the same userfault file descriptor it previously received the page
request on. Now that the page which initially triggered
the page-fault or in our case userfault is at its place the restored
process continues to run until another missing page is accessed and the
whole procedure starts again.
To be able to remove the UFFD daemon and the page server at some
point, we currently push all remaining pages into the restored process
if no further userfaultfd requests arrive for 5 seconds.
The whole procedure still has a lot of possibilities for optimization
but now that we finally can combine pre-copy and post-copy memory
migration, we are a lot closer to decreasing process downtime during migration.
The next steps are to get support for pre-copy and post-copy into
p.haul (Process Hauler) and into
different container runtimes which already support migration via criu.