{"id":7,"date":"2016-10-14T15:19:57","date_gmt":"2017-03-30T16:16:25","guid":{"rendered":"https:\/\/lisas.de\/~adrian\/?p=1253"},"modified":"2018-10-10T15:41:01","modified_gmt":"2018-10-10T13:41:01","slug":"combining-pre-copy-and-post-copy-migration","status":"publish","type":"post","link":"https:\/\/lisas.de\/luges\/index.php\/2016\/10\/14\/combining-pre-copy-and-post-copy-migration\/","title":{"rendered":"Combining pre-copy and post-copy migration"},"content":{"rendered":"<p>In my <a href=\"https:\/\/lisas.de\/~adrian\/?p=1183\">last post<\/a> about <em>CRIU<\/em> in May 2016 I mentioned lazy memory transfer to decrease process downtime during migration. Since then, Mike Rapoport&#8217;s patches for remote lazy process migration, as well as my patches to combine <em>pre-copy<\/em> and <em>post-copy<\/em> migration, have been merged into <em>CRIU<\/em>&#8217;s <a href=\"https:\/\/github.com\/xemul\/criu\/tree\/criu-dev\">criu-dev<\/a> branch.<\/p>\n<p>Using pre-copy (<em>criu pre-dump<\/em>) it has &#8220;always&#8221; been possible to dump the memory of a process using <a href=\"https:\/\/www.kernel.org\/doc\/Documentation\/vm\/soft-dirty.txt\">soft-dirty-tracking<\/a>. <em>criu pre-dump<\/em> can be run multiple times, and each time only the changed memory pages will be written to the checkpoint directory.<\/p>\n<p>Depending on the processes to be migrated and how fast they are changing their memory, the final dump can still be rather large, which can mean a longer downtime during migration than desired. This is why we started to work on <em>post-copy<\/em> migration (also known as <em>lazy<\/em> migration). 
There are, however, situations where <em>post-copy<\/em> migration can increase the process downtime during migration instead of decreasing it.<\/p>\n<p>The latest changes regarding <em>post-copy<\/em> migration in the <a href=\"https:\/\/github.com\/xemul\/criu\/tree\/criu-dev\">criu-dev<\/a> branch offer the possibility to combine <em>pre-copy<\/em> and <em>post-copy<\/em> migration. The memory pages of the process are pre-dumped using <a href=\"https:\/\/www.kernel.org\/doc\/Documentation\/vm\/soft-dirty.txt\">soft-dirty-tracking<\/a> and transferred to the destination while the process on the source machine keeps on running. Once the process is actually migrated, everything besides the memory pages is transferred to the destination system. Excluding the memory pages (as the remaining memory pages will be migrated <em>lazily<\/em>), usually only a few hundred kilobytes have to be transferred, which reduces the process downtime during migration significantly.<\/p>\n<p>Using <em>criu<\/em> with <em>pre-copy<\/em> and <em>post-copy<\/em> could look like this:<\/p>\n<p>Source system:<\/p>\n<pre># criu pre-dump -D \/tmp\/cp\/1 -t PID\n# rsync -a \/tmp\/cp destination:\/tmp\n# criu dump -D \/tmp\/cp\/2 -t PID --port 27 --lazy-pages \\\n  --prev-images-dir ..\/1\/ --track-mem<\/pre>\n<p>The first <em>criu<\/em> command dumps the memory of the process <em>PID<\/em> and resets the soft-dirty memory tracking. The initial dump is then transferred using <em>rsync<\/em> to the destination system. During that time the process <em>PID<\/em> keeps on running. The last <em>criu<\/em> command starts the <em>lazy page<\/em> mode, which dumps everything besides the memory pages that can be transferred lazily and then waits for connections over the network on port 27. Only pages which have changed since the last <em>pre-dump<\/em> are considered for the lazy restore. 
At this point the process is no longer running and the process downtime starts.<\/p>\n<p>Destination system:<\/p>\n<pre># rsync -a source:\/tmp\/cp \/tmp\/\n# criu lazy-pages --page-server --address source --port 27 \\\n  -D \/tmp\/cp\/2 &amp;\n# criu restore --lazy-pages -D \/tmp\/cp\/2<\/pre>\n<p>Once <em>criu<\/em> is waiting on port 27 on the source system, the remaining checkpoint images can be transferred from the source system to the destination system (using <em>rsync<\/em> in this case). Now <em>criu<\/em> can be started in <em>lazy-pages<\/em> mode, connecting to the page server on port 27 on the source system. This is the part we usually call the UFFD daemon. The last step is the actual restore (<em>criu restore<\/em>).<\/p>\n<p>The following diagrams try to visualize what happens during the last step: <em>criu restore<\/em>.<\/p>\n<p style=\"text-align: center\"><a href=\"https:\/\/lisas.de\/~adrian\/images\/combined-restore-step-1.png\"><img decoding=\"async\" title=\"UFFD\" src=\"https:\/\/lisas.de\/~adrian\/images\/tn\/combined-restore-step-1.png\" alt=\"step1\" \/><\/a><\/p>\n<p>It all starts with <em>criu restore<\/em> (on the right). <em>criu<\/em> does its magic to restore the process, copies the memory pages from <em>criu pre-dump<\/em> to the process and marks <em>lazy pages<\/em> as being handled by <em>userfaultfd<\/em>. Once everything is restored, <em>criu<\/em> jumps into the restored process and the restored process continues to run where it was when checkpointed. 
Once the process accesses a <em>userfaultfd<\/em>-marked memory address, the process will be paused until a memory page (hopefully the correct one) is copied to that address.<\/p>\n<p style=\"text-align: center\"><a href=\"https:\/\/lisas.de\/~adrian\/images\/combined-restore-step-2.png\"><img decoding=\"async\" title=\"UFFD\" src=\"https:\/\/lisas.de\/~adrian\/images\/tn\/combined-restore-step-2.png\" alt=\"step2\" \/><\/a><\/p>\n<p>The part that we call the <em>UFFD daemon<\/em> or <em>criu lazy-pages<\/em> listens on the userfault file descriptor for a message, and as soon as a valid <em>UFFD<\/em> request arrives it requests that page via TCP from the source system, where <em>criu<\/em> is still running in <em>page-server<\/em> mode. If the <em>page-server<\/em> finds that memory page, it transfers the actual page back to the <em>UFFD daemon<\/em> on the destination system, which injects the page into the kernel using the same userfault file descriptor it previously got the page request from. 
Now that the page which initially triggered the page fault, or in our case the <em>userfault<\/em>, is in place, the restored process continues to run until another missing page is accessed and the whole procedure starts again.<\/p>\n<p>To be able to remove the <em>UFFD daemon<\/em> and the <em>page-server<\/em> at some point, we currently push all remaining (not yet requested) pages into the restored process if there are no further <em>userfaultfd<\/em> requests for 5 seconds.<\/p>\n<p>The whole procedure still has a lot of possibilities for optimization, but now that we finally can combine <em>pre-copy<\/em> and <em>post-copy<\/em> memory migration we are a lot closer to decreasing process downtime during migration.<\/p>\n<p>The next steps are to get support for <em>pre-copy<\/em> and <em>post-copy<\/em> into <a href=\"https:\/\/github.com\/xemul\/p.haul\">p.haul<\/a> (Process Hauler) and into different container runtimes which already support migration via <em>criu<\/em>.<\/p>\n<p>My other recently posted <em>criu<\/em> related articles:<\/p>\n<ul>\n<li><a href=\"https:\/\/lisas.de\/~adrian\/?p=1183\">Lazy Process Migration<\/a><\/li>\n<li><a href=\"https:\/\/access.redhat.com\/articles\/2455211\">CRIU &#8211; Checkpoint\/Restore in user space<\/a><\/li>\n<li><a href=\"http:\/\/rhelblog.redhat.com\/2016\/09\/26\/from-checkpointrestore-to-container-migration\/\">From Checkpoint\/Restore to Container Migration<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>In my last post about CRIU in May 2016 I mentioned lazy memory transfer to decrease process downtime during migration. Since May 2016 Mike Rapoport&#8217;s patches for remote lazy process migration have been merged into CRIU&#8217;s criu-dev branch as well as my patches to combine pre-copy and post-copy migration. 
Using pre-copy (criu pre-dump) it has [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,1],"tags":[],"class_list":["post-7","post","type-post","status-publish","format-standard","hentry","category-luges","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/posts\/7","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/comments?post=7"}],"version-history":[{"count":2,"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/posts\/7\/revisions"}],"predecessor-version":[{"id":525,"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/posts\/7\/revisions\/525"}],"wp:attachment":[{"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/media?parent=7"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/categories?post=7"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/tags?post=7"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}