After having received my Raspberry Pi in November, I am finally using it. I have connected it to my television using raspbmc. Using XBMC Remote I can control it without the need for a mouse, keyboard or LIRC-based remote control, and so far it works pretty well. Following are a few pictures with the new case I bought a few days ago:
To test checkpoint/restore on Fedora, you need to run the current development version of Fedora and install crtools using yum (yum install crtools). Until it is decided whether this will actually be a Fedora 19 feature, and until the necessary changes to the Fedora kernel packages have been implemented, you have to install a kernel which is not in the repository. I have built a kernel in Fedora’s build system which enables the following config options: CHECKPOINT_RESTORE, NAMESPACES, EXPERT.
A kernel with these changes enabled is available from koji as a scratch build: http://koji.fedoraproject.org/koji/taskinfo?taskID=4899525
After installing this kernel I am able to migrate a process from one Fedora system to another. For my test case I am migrating a UDP ping pong (udpp.c) program from one system to another while communicating with a third system.
udpp is running in server mode on 22.214.171.124, and udpp is started in client mode on 126.96.36.199. After a short time I migrate the udpp client, with the help of crtools, to 188.8.131.52. The following is part of the output on the udpp server:
Received ping packet from 184.108.40.206:38374
Data: This is ping packet 6
Sending pong packet 6
Received ping packet from 220.127.116.11:38374
Data: This is ping packet 7
Sending pong packet 7
Received ping packet from 18.104.22.168:38374
Data: This is ping packet 8
Sending pong packet 8
Received ping packet from 22.214.171.124:38374
Data: This is ping packet 9
Sending pong packet 9
So with only small changes to the kernel configuration it is possible to migrate a process by checkpointing and restoring it with the help of crtools.
We have integrated new nodes into our cluster. All of the new nodes have a local SSD for fast temporary scratch data. To find the best file system options and IO scheduler, I have written a script which tries a lot of combinations (80 to be precise) of file system options and IO schedulers. As the nodes have 64 GB of RAM, the first run of the script took 40 hours, because I always wrote twice the size of the RAM in my benchmarks to avoid any caching effects. To reduce the amount of available memory, I wrote a program called memhog which malloc()s the memory and then also mlock()s it. The usage is really simple:
$ ./memhog
Usage: memhog <size in GB>
I am now locking 56GB with memhog and I reduced the benchmark file size to 30GB.
So, if you have too much memory and want to waste it… Just use memhog.c.
After having successfully updated libcdio in rawhide to 0.90 and also introduced the split-off libcdio-paranoia in Fedora’s development branch, I rebuilt most of the packages depending on libcdio. Two packages no longer built, but their maintainers quickly fixed them. The only remaining broken dependent package was kover. As I am still the upstream maintainer of kover, I had to change its code to use the new CD-Text API of libcdio 0.90.
With these changes I have released kover version 6 which is available at http://lisas.de/kover/kover-6.tar.bz2.
I have updated the scripts which use the mirrored project status information in our database to display even more information about what is going on on our mirror server. In addition to the overall traffic of the last 14 days, the last 12 months and all the years since we started collecting this data, the overall traffic is now broken down into transferred HTTP, FTP, RSYNC and other data (blue=other, red=http, green=rsync, yellow=ftp). Most traffic is generated by HTTP, followed by RSYNC, and last (not surprisingly) comes FTP.
In addition to the breakdown by traffic type, I added an overview of the mirror size (in bytes and number of files) at the bottom of the status page of each mirrored project. Looking at the status page of our Apache mirror, it is now possible to see the growth of the mirror since 2005: it started with 7GB in 2005 and has now reached almost 50GB at the end of 2012.
To add the new functionality to the PHP scripts I had to change code I wrote many years ago, and I must confess that it is embarrassingly bad code; it hurts just looking at it. Extending it was even worse, but despite my urge to rewrite it, I just added the new functionality, which makes the code even more unreadable.
A few days ago I started to upgrade my PowerStation from Fedora 15 (running my own rebuild) to Fedora 18 Beta.
The update from the running Fedora 15 to Fedora 16 was the really hard part. It seems that the userspace moved from 32bit to 64bit, and that was something yum, understandably, could not handle. So after the first run updating all packages to Fedora 16 (which required a lot of rpm -e --justdb --nodeps --noscripts) and a reboot, the system was broken. systemd tried to start udev, but that failed with:
[ 38.164191] systemd: udev.service holdoff time over, scheduling restart.
[ 38.208255] systemd: Job pending for unit, delaying automatic restart.
and systemd kept printing those lines forever. Luckily I still had the original Yellow Dog Linux installation on a second drive and could boot that. Unfortunately I could not chroot into the Fedora 16 installation because the Yellow Dog Linux kernel was too old, but I was able to mount it and disable every occurrence of udev in systemd. Rebooting with
systemd.unit=emergency.target on the kernel command line, I was able to get the network running and reinstall the udev and systemd ppc64 packages with yum. After that (and some more fiddling around) it booted into Fedora 16.
I then just followed the recommendations on the Fedora wiki to upgrade using yum from F16->F17 and F17->F18. The only difference was that I installed the gpg key, which is used to sign the packages, from https://fedoraproject.org/keys using the keys for the secondary architectures.
Now I have a PowerStation with the latest 64bit Fedora 18 Beta packages up and running.
For our mirror server we now have a third RAID which is also used for the mirror data. The previous external RAIDs (12x1TB as RAID5 + hot spare) were reaching their limits, so the additional 11x1TB RAID6 in the remaining internal slots is a great help in reducing the load on and usage of the existing disks. There are now roughly 30TB used for mirror data.
To create the filesystem on the new internal RAID I used http://busybox.net/~aldot/mkfs_stride.html. With 11 disks, RAID level 6, a RAID chunk size of 512 KiB and a filesystem block size of 4 KiB, I get the following command to create my ext4 filesystem:
mkfs.ext4 -b 4096 -E stride=128,stripe-width=1152
I am now moving all the data from one of the external RAIDs to the new internal RAID because the older external RAID still uses ext3 and I would like to recreate the filesystem using the same parameter calculation as above. Once the filesystem has been re-created I will distribute our data evenly across the three RAIDs (and maybe also mirror a new project).
Update: After moving the data from one of the external RAIDs to the internal RAID the filesystem has been re-created with:
mkfs.ext4 -b 4096 -E stride=128,stripe-width=1280
[ 62.816884] qla2xxx [0000:21:00.0]-0063:3: Failed to load firmware image (ql2400_fw.bin).
[ 62.816889] qla2xxx [0000:21:00.0]-0090:3: Fimware image unavailable.
[ 62.816891] qla2xxx [0000:21:00.0]-0091:3: Firmware images can be retrieved from: ftp://ftp.qlogic.com/outgoing/linux/firmware/.
[ 63.526024] qla2xxx [0000:21:00.0]-00c2:3: Unable to initialize EFT (258).
and I opened the following bug report: qla2xxx firmware not loaded
I thought that this was probably not optimal, but as everything was working as before, I did not think I needed to act immediately.
Yesterday I got a bug report that a file downloaded from our server was corrupt. This is nothing unusual and happens a few times per year. After checking that the checksums were correct, I assumed that the corruption had happened during the download and that the user just needed to download the file again for everything to be okay.
Unfortunately, that evening I got another bug report that files had wrong checksums which changed with every new download attempt: ports/168956: lang/gcc46 version 126.96.36.19920608 checksum mismatch
I was not able to reproduce this, but as both files were on the same RAID, connected via the same Fibre Channel controller, I thought the missing firmware might be the problem. I blacklisted the modules in dracut so that they were not included in the ramdisk, hoping that the firmware would be loaded if the modules were loaded from the disk instead of from the ramdisk. And it worked: the firmware is now loaded (at least the driver no longer complains about it) and so far I have received no further error reports. Let’s see if this was the correct fix.
Update 1: It now seems pretty unlikely that the firmware of a Fibre Channel controller is responsible for the data corruption, so I am trying something new: I downgraded the kernel from 3.4.0 to 3.3.4.
Yesterday I upgraded our mirror server to Fedora 17. As I had neglected the system for some time, it was still running Fedora 14. Fedora 14 was extremely stable and the uptime was almost 1 year. Such long uptimes are usually a sign of a lazy admin, because with the frequency of kernel updates the system should have been rebooted much more often, and Fedora 14 has now been EOL for almost half a year. The upgrade to Fedora 17 is the first one I did not want to do with yum because of the changes necessary for UsrMove. I burned the DVD (actually Martin did it) and even looked at the installation guide. In the installation guide it says:
Before upgrading to Fedora 17 you should first bring your current version up to date. However, it is not then necessary to upgrade to intermediate versions. For example, you can upgrade from Fedora 14 to Fedora 17 directly.
Great, I was already afraid I would have to do two upgrades. After dumping the PostgreSQL database (I even thought about this) I rebooted using the DVD and it started to search for previous installations. It found a Fedora 14 installation and said that it cannot upgrade Fedora 14 to Fedora 17, just as I had expected. Silvio was then nice enough to burn a Fedora 16 DVD and I started the Fedora 16 upgrade, but this time the installer did not even offer the possibility to upgrade; the only option was a new installation. Using the shell the installer offers on another VT, I found out that we had too many partitions. I am not sure what exactly the installer does, but it was not able to handle the separate partitions for /var and /var/lib which we have been using. It could not find the RPM database and aborted the upgrade process. So I increased the size of the LV containing / and copied /var, /var/lib and /usr (because of UsrMove) to the / partition, and finally the upgrade could start. After the upgrade finished I inserted the Fedora 17 DVD, and this upgrade finished without any problems.
After rebooting into the freshly upgraded Fedora 17, I saw that the migration to systemd did not go as smoothly as it should have. All services which had been converted to systemd unit files were stopped and disabled. Only the jabber server was running (it is my package and has not yet been converted to systemd, but it will be for Fedora 18). So I checked all the configuration files and started and enabled one service after another (which was good systemd training).
After 6 hours most services were running again and the mirror server was happily serving files.
Today I also upgraded my notebook from Fedora 16 to Fedora 17. The Fedora 17 DVD from above upgraded the system without any obvious problems. After rebooting into Fedora 17 I put my notebook back into the docking station (two external monitors connected via DVI) and was shocked that the monitors were no longer detected. The gnome-shell process was using 150% of the CPU and the CPU temperature was around 98°C (usually around 55°C). At first I panicked and wanted Fedora 16 back, but then I found out that all I needed was an updated xorg-x11-drv-intel. After a yum update --enablerepo=updates-testing xorg-x11-drv-intel-2.19.0-5.fc17 everything was back to being as good as with Fedora 16 (and better, of course).