{"id":12,"date":"2016-05-04T17:39:41","date_gmt":"2017-03-30T16:16:25","guid":{"rendered":"https:\/\/lisas.de\/~adrian\/?p=1183"},"modified":"2018-01-25T20:40:59","modified_gmt":"2018-01-25T18:40:59","slug":"lazy-process-migration","status":"publish","type":"post","link":"https:\/\/lisas.de\/luges\/index.php\/2016\/05\/04\/lazy-process-migration\/","title":{"rendered":"Lazy Process Migration"},"content":{"rendered":"<h3>Process Migration<\/h3>\n<p>Using <a href=\"https:\/\/criu.org\/\">CRIU<\/a>, it is possible to checkpoint\/save\/dump the state of a process into a set of files which can then be used to restore\/restart the process at a later point in time. If the files from the checkpoint operation are transferred from one system to another and then used to restore the process, this is probably the simplest form of process migration.<\/p>\n<p>Source system:<\/p>\n<ul>\n<li><tt>criu dump -D \/checkpoint\/destination -t PID<\/tt><\/li>\n<li><tt>rsync -a \/checkpoint\/destination\/ destination.system:\/checkpoint\/destination\/<\/tt><\/li>\n<\/ul>\n<p>Destination system:<\/p>\n<ul>\n<li><tt>criu restore -D \/checkpoint\/destination<\/tt><\/li>\n<\/ul>\n<p>For large processes, the migration can take rather long: for a process using 24GB of memory, it can take longer than 280 seconds. The limiting factor in most cases is the interconnect between the systems involved in the process migration.<\/p>\n<h3>Optimization: Pre-Copy<\/h3>\n<p>One existing solution to decrease process downtime during migration is pre-copy. In one or multiple runs, the memory of the process is copied from the source to the destination system while the process keeps running. With every run, only memory pages which have changed since the last run have to be transferred. In some situations, this can dramatically decrease the process downtime during migration.<\/p>\n<p>How much it helps depends on the type of application being migrated and especially on how often and how fast its memory content changes. 
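<\/p>\n<p>With <em>CRIU<\/em> itself, such pre-copy runs can be done with <tt>criu pre-dump<\/tt> and the kernel&#8217;s memory change tracking (<tt>--track-mem<\/tt>). The following is only a rough sketch of such an iterative migration, assuming a <em>CRIU<\/em> version with pre-dump support. <tt>PID<\/tt>, the host name and the directory names are placeholders, and the exact options may differ between versions:<\/p>\n<pre># source system: pre-copy runs while the process keeps running\n$ mkdir -p \/checkpoint\/pre1 \/checkpoint\/pre2 \/checkpoint\/final\n$ criu pre-dump -t PID -D \/checkpoint\/pre1 --track-mem\n$ rsync -a \/checkpoint\/pre1 destination.system:\/checkpoint\/\n$ criu pre-dump -t PID -D \/checkpoint\/pre2 --prev-images-dir ..\/pre1 --track-mem\n$ rsync -a \/checkpoint\/pre2 destination.system:\/checkpoint\/\n# source system: final dump, only now the process is stopped\n$ criu dump -t PID -D \/checkpoint\/final --prev-images-dir ..\/pre2 --track-mem\n$ rsync -a \/checkpoint\/final destination.system:\/checkpoint\/\n# destination system\n$ criu restore -D \/checkpoint\/final\n<\/pre>\n<p><tt>--prev-images-dir<\/tt> is given relative to the new image directory, so each run only writes the pages changed since the previous run. The parent images are referenced rather than copied, which is why all image directories have to be transferred to the destination system. Only the final <tt>criu dump<\/tt>, its transfer and the restore fall into the process downtime.<\/p>\n<p>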
In extreme cases, pre-copy made it possible to decrease the process downtime during migration of a 24GB process from 280 seconds to 8 seconds.<\/p>\n<p>This approach works basically the same way whether single processes (or process groups) or virtual machines are migrated.<\/p>\n<h3>It Always Depends On&#8230;<\/h3>\n<p>Unfortunately, pre-copy can also lead to situations where the so-called optimized case requires more time than the unoptimized case:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/lisas.de\/~adrian\/migration-time.png\" width=\"493\" height=\"397\" \/><\/p>\n<p>In the example above, a process has been migrated during three stages of its lifetime. In some situations (state: <em>Calculation<\/em>) pre-copy has enormous advantages (14 seconds with pre-copy versus 51 seconds without), but in others (state: <em>Initialization<\/em>) the pre-copy optimization increases the process downtime during migration (40 seconds with pre-copy versus 27 seconds without). It depends on the memory change rate.<\/p>\n<h3>Optimization: Post-Copy<\/h3>\n<p>Another approach to reduce the process downtime during migration is post-copy. Here the required memory pages are not dumped and transferred before the process is restored, but on demand after the process has been restarted on the destination system. Each time a missing memory page is accessed, the migrated process is halted until the required memory page has been transferred from the source system to the destination system:<\/p>\n<p><img decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/lisas.de\/~adrian\/memory-transfer-after-migration.png\" width=\"90%\" \/><\/p>\n<p>Thanks to <em>userfaultfd<\/em>, this approach (or optimization) can now be integrated into <em>CRIU<\/em>. With the help of <em>userfaultfd<\/em>, it is possible to mark memory pages so that faults on them are handled via <em>userfaultfd<\/em>. 
If such a memory page is accessed, the process is halted until the requested page is provided. The listener for the <em>userfaultfd<\/em> requests runs in user space and waits for requests on a file descriptor. The same approach has already been implemented for <em>QEMU<\/em>.<\/p>\n<h3>Enough Theory<\/h3>\n<p>So much for the background on why and how: the initial code to restore processes with <em>userfaultfd<\/em> support has been merged into the <em>CRIU<\/em> development branch <a href=\"https:\/\/github.com\/xemul\/criu\/tree\/criu-dev\">criu-dev<\/a>. This initial implementation of <a href=\"https:\/\/criu.org\/Userfaultfd\">lazy-pages<\/a> support does not yet support lazy process migration between two hosts, but with the patches merged upstream it is at least possible to checkpoint a process and to restore it using <em>userfaultfd<\/em>. A lazy restore consists of two parts: the usual &#8216;<tt>criu restore<\/tt>&#8217; part and an additional &#8216;<tt>criu lazy-pages<\/tt>&#8217; part, which we call the <em>uffd daemon<\/em>. To better demonstrate the advantages of a lazy restore, there are patches which enhance <a href=\"https:\/\/criu.org\/CRIT\">crit<\/a> (CRiu Image Tool) to remove pages which can be restored with <em>userfaultfd<\/em> from a checkpoint directory. A test case which allocates about 200MB of memory (and writes one byte in each page over and over) results in a checkpoint of about 200MB. The mentioned <em>crit<\/em> enhancement <em>make-lazy<\/em> reduces the size of the checkpoint to 116KB:<\/p>\n<pre>$ crit make-lazy \/tmp\/checkpoint\/ \/tmp\/lazy-checkpoint\n$ du -hs \/tmp\/checkpoint\/ \/tmp\/lazy-checkpoint\n     201M       \/tmp\/checkpoint\n     116K       \/tmp\/lazy-checkpoint\n<\/pre>\n<p>With this, the data which actually has to be transferred during process downtime is drastically reduced, and the required memory pages are inserted into the restored process on demand using <em>userfaultfd<\/em>. 
Restoring the checkpointed process using <em>lazy-restore<\/em> would look something like this:<\/p>\n<p>First, the <em>uffd daemon<\/em>:<\/p>\n<pre>$ criu lazy-pages -D \/tmp\/checkpoint \\\n    --address \/tmp\/userfault.socket<\/pre>\n<p>And then the actual restore:<\/p>\n<pre>$ criu restore -D \/tmp\/lazy-checkpoint \\\n    --lazy-pages --address \/tmp\/userfault.socket<\/pre>\n<p>The socket specified with <tt>--address<\/tt> is used to pass the information about the restored process which the <em>uffd daemon<\/em> requires. Once <tt>criu restore<\/tt> has restored everything except the lazy memory pages, the restored process is actually started and runs until it accesses the first memory page handled by <em>userfaultfd<\/em>. At that point the process is halted and the <em>uffd daemon<\/em> gets a message to provide the required memory page. Once the <em>uffd daemon<\/em> provides the requested page, the restored process continues to run until the next page is requested. Because some memory pages might not be accessed for a long time (or at all), the <em>uffd daemon<\/em> also starts to transfer unrequested memory pages into the restored process, so that it can shut down after a certain time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Process Migration Using CRIU it is possible to checkpoint\/save\/dump the state of a process into a set of files which can then be used to restore\/restart the process at a later point in time. 
If the files from the checkpoint operation are transferred from one system to another and then used to restore the process, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,1],"tags":[],"class_list":["post-12","post","type-post","status-publish","format-standard","hentry","category-luges","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/posts\/12","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/comments?post=12"}],"version-history":[{"count":1,"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/posts\/12\/revisions"}],"predecessor-version":[{"id":22,"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/posts\/12\/revisions\/22"}],"wp:attachment":[{"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/media?parent=12"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/categories?post=12"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lisas.de\/luges\/index.php\/wp-json\/wp\/v2\/tags?post=12"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}