1. Protocol Changes In Fedora's MirrorManager

    There have been two protocol-related issues with MirrorManager open for some time:

    Both issues have been resolved. The first issue, to drop FTP URLs from the metalinks, was resolved in multiple steps. The first step was to block FTP URLs from being added to Fedora's MirrorManager (Optionally exclude certain protocols from MM, New MirrorManager2 features) and the second step, to remove all remaining FTP URLs from Fedora's MirrorManager, was performed during the last few days and weeks. MirrorManager's mirrorlist interface (which is not used very often) only returned FTP URLs if a mirror had no HTTP(S) URLs, so it was already rather unusual to be redirected to an FTP mirror. MirrorManager's metalink interface, however, returned all possible URLs for a host. With the removal of all FTP URLs from MirrorManager's database no user should see FTP URLs any more, and the problems some clients encountered (see Drop ftp:// urls from metalinks) should be 'resolved'.

    The other issue (Add a way to specify you want only https urls from metalink) has also been solved, by adding a protocol option to the mirrorlist and metalink back-end. The new MirrorManager release (0.7.2), which includes these changes, is already running on the staging instance and the result can be seen here:

    To have more HTTPS-based mirrors in our database we scanned all existing public mirrors to see if they also provide HTTPS. With this the number of HTTPS URLs was increased from 24 to over 120.
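
    The scan itself was essentially a check whether each mirror's existing HTTP URL also answers over HTTPS; a minimal sketch of such a check could look like this (the input file, the loop and the timeout are assumptions, not the script actually used):

    $ while read url; do
    >     curl --silent --head --max-time 10 "${url/http:/https:}" > /dev/null \
    >         && echo "HTTPS works: ${url/http:/https:}"
    > done < mirror-urls.txt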

    The option to select which protocol the mirrorlist/metalink mirrors should provide is not yet running on the production instance.
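
    Once the release reaches production, a metalink or mirrorlist request restricted to HTTPS mirrors could then look roughly like this (assuming the new option is exposed as a protocol query parameter; the repo and arch values are just examples):

    $ curl 'https://mirrors.fedoraproject.org/metalink?repo=fedora-24&arch=x86_64&protocol=https'
    $ curl 'https://mirrors.fedoraproject.org/mirrorlist?repo=fedora-24&arch=x86_64&protocol=https'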

    Tagged as : fedora mirrormanager
  2. Lazy Process Migration

    Process Migration

    Using CRIU it is possible to checkpoint/save/dump the state of a process into a set of files which can then be used to restore/restart the process at a later point in time. If the files from the checkpoint operation are transferred from one system to another and then used to restore the process, this is probably the simplest form of process migration.

    Source system:

    • criu dump -D /checkpoint/destination -t PID
    • rsync -a /checkpoint/destination destination.system:/checkpoint/destination

    Destination system:

    • criu restore -D /checkpoint/destination

    For large processes the migration duration can be rather long. For a process using 24GB of memory this can lead to a migration duration of more than 280 seconds. The limiting factor in most cases is the interconnect between the systems involved in the process migration.
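
    A rough back-of-the-envelope calculation makes this plausible; assuming a 1 Gbit/s interconnect with about 110 MB/s of usable bandwidth (an assumption, not a measured value from the setup above):

    $ echo $((24 * 1024 / 110))    # 24GB of process memory at ~110MB/s, in seconds
    223

    That is already in the same ballpark as the 280 seconds mentioned above, with the remaining time presumably spent on dumping and restoring the process state.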

    Optimization: Pre-Copy

    One existing solution to decrease process downtime during migration is pre-copy. In one or multiple runs the memory of the process is copied from the source to the destination system. With every run only memory pages which have changed since the last run have to be transferred. This can dramatically decrease the process downtime during migration.

    This depends on the type of application which is migrated and especially how often/fast the memory content is changed. In extreme cases it was possible to decrease process downtime during migration for a 24GB process from 280 seconds to 8 seconds with the help of pre-copy.

    This approach is basically the same whether migrating single processes (or process groups) or virtual machines.

    It Always Depends On...

    Unfortunately the pre-copy optimization can also lead to situations where the so-called optimized case with pre-copy requires more time than the unoptimized case:

    In the example above a process has been migrated during three stages of its lifetime and there are situations (state: Calculation) where pre-copy has enormous advantages (14 seconds with pre-copy and 51 seconds without pre-copy) but there are also situations (state: Initialization) where the pre-copy optimization increases the process downtime during migration (40 seconds with pre-copy and 27 seconds without pre-copy). It depends on the memory change rate.

    Optimization: Post-Copy

    Another approach to reduce the process downtime during migration is post-copy. The required memory pages are not dumped and transferred before restoring the process but on demand. Each time a missing memory page is accessed the migrated process is halted until the required memory page has been transferred from the source system to the destination system:

    Thanks to userfaultfd this approach (or optimization) can now be integrated into CRIU. With the help of userfaultfd it is possible to mark memory pages to be handled by userfaultfd. If such a memory page is accessed, the process is halted until the requested page is provided. The listener for the userfaultfd requests is running in user-space and listening on a file descriptor. The same approach has already been implemented for QEMU.

    Enough Theory

    With all the background information on the why and how out of the way: the initial code to restore processes with userfaultfd support has been merged into the CRIU development branch, criu-dev. This initial implementation of lazy-pages support does not yet support lazy process migration between two hosts, but with the upstream merged patches it is at least possible to checkpoint a process and to restore the process using userfaultfd. A lazy restore consists of two parts: the usual 'criu restore' part and an additional 'criu lazy-pages' part, which we call the uffd daemon. To better demonstrate the advantages of a lazy restore there are patches to enhance crit (CRiu Image Tool) to remove pages which can be restored with userfaultfd from a checkpoint directory. A test case which allocates about 200MB of memory (and which writes one byte in each page over and over) requires about 200MB after being dumped. The mentioned crit enhancement make-lazy reduces the size of the checkpoint down to 116KB:

    $ crit make-lazy /tmp/checkpoint/ /tmp/lazy-checkpoint
    $ du -hs /tmp/checkpoint/ /tmp/lazy-checkpoint
         201M       /tmp/checkpoint
         116K       /tmp/lazy-checkpoint
    

    With this the data which actually has to be transferred during process downtime is drastically reduced and the required memory pages are inserted into the restored process on demand using userfaultfd. Restoring the checkpointed process using lazy-restore would look something like this:

    First the uffd daemon:

    $ criu lazy-pages -D /tmp/checkpoint --address /tmp/userfault.socket
    

    And then the actual restore:

    $ criu restore -D /tmp/lazy-checkpoint --lazy-pages --address /tmp/userfault.socket
    

    The socket specified with --address is used to exchange information about the restored process required by the uffd daemon. Once criu restore has done all its magic to restore the process except restoring the lazy memory pages, the process to be restored is actually started and runs until the first userfaultfd handled memory page is accessed. At that point the process hangs and the uffd daemon gets a message to provide the required memory page. Once the uffd daemon provides the requested memory page, the restored process continues to run until the next page is requested. As potentially not all memory pages are requested (they might not be accessed for some time), the uffd daemon also starts to transfer unrequested memory pages into the restored process, so that it can shut down after a certain time.

    Tagged as : criu fedora
  3. Booting with syslinux

    Having read about using syslinux as a boot-loader for virtual machines I tried to replace grub2 on one of the Fedora 24 virtual machines I am using with syslinux:

    Not completely knowing what to do I did:

    • dnf install syslinux-extlinux.x86_64
    • /sbin/extlinux --install /boot/extlinux/

    Then I tried to create a configuration file using grubby:

    • grubby --extlinux --add-kernel=/boot/vmlinuz-4.4.6-300.fc23.x86_64 --title="4.4.6" --initrd=/boot/initramfs-4.4.6-300.fc23.x86_64.img --args="ro root=/dev/sda3"

    Which resulted in:

    # cat /etc/extlinux.conf 
    label 4.4.6
     kernel /vmlinuz-4.4.6-300.fc23.x86_64
     initrd /initramfs-4.4.6-300.fc23.x86_64.img
     append ro root=/dev/sda3
    

    I added the following lines to the file manually:

    default 4.4.6
    ui menu.c32
    timeout 50
    

    After that I rebooted and the virtual machine was still using grub2 to load the kernel.

    To write syslinux to the MBR the following additional command was required: dd if=/usr/share/syslinux/mbr.bin of=/dev/sda bs=440 count=1. I was a bit nervous about rebooting the system after overwriting the MBR, but it rebooted successfully. The configuration file was also correctly updated after I installed a new kernel via dnf. I also removed grub2 (dnf remove grub2*) and was able to successfully reboot into the new kernel without grub2.

    Tagged as : 5 fedora rpmfusion
  4. New MirrorManager2 features

    The latest MirrorManager release (0.6.1), which has been active in Fedora's infrastructure since 2015-12-17, has a few additional features which provide insights into the mirror network usage.

    The first is called statistics. It gives a daily overview of what clients are requesting. It analyses the metalink and mirrorlist accesses and draws diagrams. Each time the local yum or dnf metadata has expired a new mirrorlist/metalink is requested which contains the 'best' mirrors for the client currently requesting the data. The current MirrorManager statistics implementation tries to display how often the different repositories are requested from which country for the available architectures:

    In addition to the statistics about where the clients are coming from and which files they are interested in, the old code to draw a map of the location of all mirror servers has been re-enabled: maps

    Another new visualization tries to track the propagation: the time the existing mirrors need to carry the latest bits. A script connects to all enabled mirrors and checks which repomd.xml file is currently available on the mirror. This is done for the development branch and all active branches. The script displays how many mirrors have the current repomd.xml file, whether the mirror still has the repomd.xml file from the previous push (or the push before), or whether the file is even older: Propagation.
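
    A much simplified version of such a check could compare the repomd.xml a mirror serves with the one on the master mirror, for example by checksum (the paths below and the checksum comparison are assumptions, not the actual propagation script):

    $ curl -s http://dl.fedoraproject.org/pub/fedora/linux/development/rawhide/x86_64/os/repodata/repomd.xml | sha256sum
    $ curl -s "http://${mirror}/fedora/linux/development/rawhide/x86_64/os/repodata/repomd.xml" | sha256sum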

    Another relevant change in Fedora's MirrorManager is that it is no longer possible to enter FTP URLs. This is the first step towards removing FTP-based URLs: FTP-based mirrors are often, depending on the network topology, difficult to connect to, other protocols (HTTP, RSYNC) are better suited, and more and more mirror servers are not providing FTP anyway.

    Tagged as : fedora mirrormanager
  5. Bimini Upgrade

    I finally upgraded my PowerStation from Fedora 18 to Fedora 21. The upgrade went pretty smoothly and was not much more than:

    $ yum --releasever=19 --exclude=yaboot --exclude=kernel distro-sync
    $ yum --releasever=20 --exclude=yaboot --exclude=kernel distro-sync
    $ yum --releasever=21 --exclude=yaboot --exclude=kernel distro-sync

    As I was doing the upgrade without console access I did not want to change the bootloader from yaboot to grub2 and I also excluded the kernel. Once I have console access I will also upgrade those packages.

    The only difficulty was upgrading from Fedora 20 to Fedora 21 because 32bit packages were dropped from ppc and I was not sure if the system would still boot after removing all 32bit packages (yum remove *ppc). But it just worked and now I have an up to date 64bit ppc Fedora 21 system.

    Tagged as : bimini fedora powerstation
  6. Using the ownCloud address book in mutt

    Now that I have been syncing my ownCloud address book to my mobile devices and my laptop I was missing this address book in mutt. But using pyCardDAV and the instructions at http://got-tty.org/archives/mutt-kontakte-aus-owncloud-nutzen.html it was easy to integrate the ownCloud address book in mutt. As pyCardDAV was already packaged for Fedora it was not much more work than running yum install python-carddav and editing ~/.config/pycard/pycard.conf to get the address book synced.

    I was already using an LDAP address book in mutt, so I had to extend the existing configuration to:
    set query_command = "~/bin/mutt_ldap.pl '%s'; /usr/bin/pc_query -m '%s'"

    Now, whenever I press CTRL+T during address input, first the LDAP server is queried and then my local copy of the ownCloud address book.

    Tagged as : fedora mutt owncloud
  7. New external RAID

    Today a new external RAID (connected via Fibre Channel) was attached to our mirror server. To create the filesystem (XFS) I used this command:

    mkfs -t xfs -d su=64k -d sw=13 /dev/sdf1

    According to https://raid.wiki.kernel.org/index.php/RAID_setup#XFS these are the correct options for 13 data disks (15 disks in RAID6 plus 1 hot spare) and a 64k chunk size.
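
    As a quick sanity check, the full stripe is su x sw = 64k x 13 = 832k, and xfs_info should report the same geometry in filesystem blocks (with 4k blocks: sunit = 64k / 4k = 16 and swidth = 13 x 16 = 208) once the filesystem is mounted; the mount point below is an assumption:

    $ xfs_info /srv/mirror        # data section should show sunit=16, swidth=208 (in 4k blocks)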

    Tagged as : fedora
  8. bcache Follow-Up

    After using bcache for about three weeks it still works without any problems. I am serving around 700GB per day from the bcache device and, looking at the munin results, cache hits average about 12000 and cache misses around 700. So, looking only at the statistics, it still seems to work very effectively for our setup.

    Tagged as : fedora
  9. RPM Fusion's MirrorManager moved

    After running RPM Fusion's MirrorManager instance for many years on Fedora I moved it to a CentOS 6.4 VM. This was necessary because the MirrorManager installation was really ancient and still running from a modified git checkout I did many years ago. I expected that the biggest obstacle in this upgrade and move would be the database upgrade of MirrorManager, as its schema has changed over the years. But I was fortunate and MirrorManager included all the necessary scripts to update the database (thanks Matt), even from the ancient version I was running.

    RPM Fusion's MirrorManager instance uses PostgreSQL to store its data, so I dumped the data on the old system and imported it into the database on the new system. MirrorManager stores information about the files as pickled Python data in the database and those columns could not be imported due to problems with the character encoding. As this is data that is provided by the master mirror I just emptied those columns, and after the first run MirrorManager recreated that information.
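
    The dump and import themselves were nothing special, roughly along these lines (the database name is an assumption):

    $ pg_dump mirrormanager > mirrormanager.sql    # on the old system
    $ psql mirrormanager < mirrormanager.sql       # on the new system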

    Moving the MirrorManager instance to a VM means that, if you are running an RPM Fusion mirror, the crawler which checks if your mirror is up to date will now connect to your mirror from another IP address (129.143.116.115). The data collected by MirrorManager's crawler is then used to create http://mirrors.rpmfusion.org/mm/publiclist/ and the mirrorlist used by yum (http://mirrors.rpmfusion.org/mirrorlist?repo=free-fedora-updates-released-19&arch=x86_64). There are currently four systems serving as mirrors.rpmfusion.org.

    Looking at yesterday's statistics (http://mirrors.rpmfusion.org/statistics/?date=2013-08-20) it seems there were about 400000 accesses per day to our mirrorlist servers.

  10. bcache on Fedora 19

    After having upgraded our mirror server from Fedora 17 to Fedora 19 two weeks ago I was curious to try out bcache. Knowing how important filesystem caching is for a file server like ours, we always tried to have as much memory as "possible". The current system has 128GB of memory and at least 90% are used as filesystem cache. So bcache sounds like a very good idea to provide another layer of caching for all the IOs we are doing. By chance I had an external RAID available with 12 x 1TB hard disc drives which I configured as a RAID6 and 4 x 128GB SSDs configured as a RAID10.

    After modprobing the bcache kernel module and installing the necessary bcache-tools I created the bcache backing device and caching device like it is described here. I then created the filesystem like I did with our previous RAIDs. For RAID6 with 12 hard disc drives and a RAID chunk size of 512KB I used mkfs.ext4 -b 4096 -E stride=128,stripe-width=1280 /dev/bcache0, although I am unsure how useful these options are when using bcache.
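
    For reference, creating the devices roughly follows the usual bcache-tools steps; a minimal sketch, assuming /dev/sdX is the RAID6 backing device and /dev/sdY the SSD RAID10 (the device names are placeholders, not the actual devices of this system):

    # make-bcache -B /dev/sdX                     # create the backing device
    # make-bcache -C /dev/sdY                     # create the caching device
    # echo /dev/sdX > /sys/fs/bcache/register     # only needed if udev did not register them already
    # echo /dev/sdY > /sys/fs/bcache/register
    # echo "$CSET_UUID" > /sys/block/bcache0/bcache/attach   # cache set UUID as printed by make-bcache -C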

    So far it worked pretty flawlessly. To know what to expect from /dev/bcache0 I benchmarked it using bonnie++. I got 670MB/s for writing and 550MB/s for reading. Again, I am unsure how to interpret these values as bcache tries to detect sequential IO and bypasses the cache device for sequential IO larger than 4MB.

    Anyway. I started copying my fedora and fedora-archive mirror to the bcache device and we are now serving those two mirrors (only about 4.1TB) from our bcache device.

    I have created a munin plugin to monitor the usage of the bcache device and there are many cache hits (right now more than 25K) and some cache misses (about 1K). So it seems that it does what it is supposed to do and the number of IOs directly hitting the hard disc drives is much lower than it would be otherwise:
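
    The plugin essentially just graphs the counters bcache exports through sysfs; the numbers above come from files like these (whether the plugin uses the five minute window or the total counters is an assumption here):

    $ cat /sys/block/bcache0/bcache/stats_five_minute/cache_hits
    $ cat /sys/block/bcache0/bcache/stats_five_minute/cache_misses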

    I also increased the cutoff for sequential IO, which bypasses the cache, from 4MB to 64MB.
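
    This knob is also exposed via sysfs; setting it looks roughly like this (a sketch, the handling of size suffixes may differ between kernel versions):

    # echo 64M > /sys/block/bcache0/bcache/sequential_cutoff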

    The user-space tools (bcache-tools) are not yet available in Fedora (as far as I can tell) but I found http://terjeros.fedorapeople.org/bcache-tools/ which I updated to the latest git: http://lisas.de/~adrian/bcache-tools/

    Update: as requested the munin plugin: bcache

    Tagged as : fedora
