A New Order

A few weeks ago I upgraded the hard disk in my notebook from 160GB to 250GB. I copied the whole disk from the old drive to the new one using dd. After that I still had to change the partition layout to make use of the new space, so I downloaded the gparted live CD, booted it and discovered that gparted was not able to move an extended partition. I have the following partitions:

/dev/sda1          7  HPFS/NTFS
/dev/sda2          7  HPFS/NTFS
/dev/sda3   *     83  Linux
/dev/sda4          5  Extended
/dev/sda5         83  Linux

My plan was to enlarge the Windows partitions as well as the Linux partitions. To enlarge /dev/sda2 I had to move /dev/sda3 and /dev/sda4, but gparted would not let me move /dev/sda4. So I decided to make a backup of /dev/sda5, delete it (and /dev/sda4), move /dev/sda3 and then enlarge /dev/sda2.

Therefore I booted a Fedora installation DVD in rescue mode and made a backup of /dev/sda5:

dd if=/dev/sda5 bs=65536 | ssh adrian@backup-server "dd of=sda5.img bs=65536"

Then I booted the gparted live CD, deleted /dev/sda5 and /dev/sda4, moved /dev/sda3 and enlarged /dev/sda2. After that I created a new extended partition (/dev/sda4) and a new /dev/sda5 using the remaining space. When gparted had finished, I booted the Fedora installation DVD in rescue mode again and restored the backup:

ssh adrian@backup-server "dd if=sda5.img bs=65536" | dd of=/dev/sda5 bs=65536
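Since the new /dev/sda5 was created by hand in gparted, it does not hurt to check that it is at least as large as the saved image before writing the image back; a partition that is too small would leave a truncated filesystem. A minimal sketch follows. The function name image_fits is hypothetical; stat -c %s (GNU coreutils) and blockdev --getsize64 (util-linux) are standard tools.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if an image of IMG_BYTES fits into a
# partition of PART_BYTES. Both arguments are sizes in bytes.
image_fits() {
    img_bytes=$1
    part_bytes=$2
    [ "$img_bytes" -le "$part_bytes" ]
}

# Example usage before the restore (host and file names as in the post):
#   img=$(ssh adrian@backup-server "stat -c %s sda5.img")
#   part=$(blockdev --getsize64 /dev/sda5)
#   image_fits "$img" "$part" || echo "new /dev/sda5 is too small" >&2
```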

Afterwards I booted my system and was happy that it still worked. Next I had to resize the encrypted partition, which was pretty easy:

cryptsetup resize luks-<uuid>
pvresize /dev/mapper/luks-<uuid>

Before running lvresize I checked the number of free extents with vgdisplay and used that number in the following lvresize command:

lvresize -l +16449 /dev/mapper/vg_dcbz-lv_root
resize2fs /dev/mapper/vg_dcbz-lv_root
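Reading the free extent count from vgdisplay by hand works, but it can also be scripted. The sketch below is hypothetical (the function names are made up) and assumes the default LVM extent size of 4 MiB; under that assumption 16449 extents are 65796 MiB, about 64 GiB, which roughly matches the growth of the root filesystem from 74G to 137G in the df output.

```shell
#!/bin/sh
# Print the "Free PE" count from vgdisplay output read on stdin, e.g. a line
#   "  Free  PE / Size       16449 / 64.25 GB"
free_extents() {
    awk '/Free +PE/ { print $5 }'
}

# Convert an extent count to MiB, ASSUMING the LVM default extent size
# of 4 MiB (check "PE Size" in vgdisplay for the real value).
extents_to_mib() {
    echo $(( $1 * 4 ))
}

# Example usage (volume group name taken from the post):
#   n=$(vgdisplay vg_dcbz | free_extents)
#   lvresize -l +"$n" /dev/mapper/vg_dcbz-lv_root
```

As an alternative, lvresize also accepts a percentage of the free space, e.g. lvresize -l +100%FREE /dev/mapper/vg_dcbz-lv_root, which avoids copying the number at all.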

And that was it. It took some time (maybe 4 hours), but everything finished without any problems. To be sure the filesystem was really consistent, I forced an fsck on the next boot (touch /forcefsck; reboot).

Before:

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_dcbz-lv_root
                       74G   69G  1.4G  99% /

After:

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_dcbz-lv_root
                      137G   69G   62G  53% /

Archaeology

If I remember correctly, my server at home (file server, print server, router, …) was installed a long time ago with Red Hat Linux 8.0. Since then I have upgraded it live with rpm, apt-get or yum up to its current version (Fedora 11). I have just started a live upgrade with yum to Fedora 13 and ran into an interesting dependency problem:

--> Finished Dependency Resolution
lilo-21.4.4-26.i386 from installed has depsolving problems
 --> Missing Dependency: mkinitrd >= 3.4.7 is needed by package lilo-21.4.4-26.i386 (installed)

It seems I still have an old, unused version of lilo installed, and now that mkinitrd has been replaced, yum complains. The lilo package is from 2004 and was also installed in 2004 (according to the RPM database). It is the oldest package on my system, but now it has to go.
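Finding such a leftover is easy to script. This sketch (the function name is hypothetical) sorts packages by their RPM install timestamp; rpm --queryformat with the INSTALLTIME tag, rpm -qi and rpm -e are standard rpm usage.

```shell
#!/bin/sh
# Read "INSTALLTIME NAME" lines on stdin and print the name of the package
# with the smallest (oldest) install timestamp.
oldest_package() {
    sort -n | head -n 1 | cut -d ' ' -f 2
}

# Example usage:
#   rpm -qa --queryformat '%{INSTALLTIME} %{NAME}-%{VERSION}-%{RELEASE}\n' | oldest_package
#   rpm -qi lilo      # shows Build Date and Install Date
#   rpm -e lilo       # remove it, then run the yum upgrade again
```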

Cluster Installation Finished

The hardware of our cluster is finally installed and ready. Almost all of the 180 compute nodes are ready, InfiniBand is working and the Lustre filesystem is mounted.

The first InfiniBand benchmarks showed about 23 Gbit/s, which is the bandwidth we expected from our QDR network.
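For reference, the arithmetic behind the QDR ceiling, assuming the usual 4x links (the post does not state the link width): QDR signals at 10 Gbit/s per lane, so a 4x link runs at 40 Gbit/s on the wire, and 8b/10b encoding leaves 32 Gbit/s for data. The measured 23 Gbit/s is about 72% of that; benchmarks normally stay below the ceiling because of protocol and host-side overhead.

```shell
#!/bin/sh
# Usable data rate of an InfiniBand link in Gbit/s:
# lanes * per-lane signalling rate * 8/10 (8b/10b encoding).
ib_data_rate() {
    lanes=$1
    lane_gbit=$2
    echo $(( lanes * lane_gbit * 8 / 10 ))
}

# 4x QDR: 4 lanes at 10 Gbit/s signalling each
ib_data_rate 4 10    # prints 32
```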

As a mirror admin I am a bit frustrated that I cannot use the big filesystem mounted on every compute node for my mirror server:

172.31.100.222@o2ib,172.30.100.222@tcp:172.31.100.221@o2ib,172.30.100.221@tcp:/lprod
                       29T  819M   28T   1% /lustre/ws1

Now I still need to install the frontend servers. One is for users to log in and submit jobs, and the other will run the grid software, as this cluster will be part of the bwGRiD.

28th Open Grid Forum

Starting tomorrow (2010-03-15), I will be at the 28th Open Grid Forum (OGF28) in Munich for four days.

80 Nodes Up And Running

80 compute nodes of our cluster are up and running. We are now waiting for more switches and for the filesystem servers to get the complete cluster (with all compute nodes) operational. To bring up the remaining nodes, all I have to do is add their MAC addresses to a file; a few scripts then configure everything else automatically. Unfortunately it all depends on the missing Ethernet switches, which should arrive any day now.
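The scripts themselves are not shown in the post. As a purely hypothetical illustration of the idea, the sketch below turns a file with one MAC address per line into ISC dhcpd host entries; the node naming scheme (nodeNNN), the 10.0.0.0/24 addresses and the file format are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch: read MAC addresses (one per line, "#" comments and
# blank lines ignored) and print ISC dhcpd host entries with sequential
# node names and addresses. Only valid for up to 254 nodes in 10.0.0.0/24.
gen_dhcp_hosts() {
    n=0
    while read -r mac; do
        case $mac in
            ''|'#'*) continue ;;
        esac
        n=$((n + 1))
        printf 'host node%03d {\n  hardware ethernet %s;\n  fixed-address 10.0.0.%d;\n}\n' \
            "$n" "$mac" "$n"
    done
}

# Example usage (file names are hypothetical):
#   gen_dhcp_hosts < macs.txt > /etc/dhcpd-hosts.conf
```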

RAID 1 Shrinking

I was not happy with the partitioning of one of the cluster infrastructure servers. It had a software RAID for /boot, one for swap, and the rest was one big software RAID for /. I should have used LVM for / to make resizing easy, but I forgot, so I had to do it the hard way. I wanted to shrink /dev/md2, which held /, and then use LVM for the freed space.

First I had to shrink the filesystem. resize2fs does not support shrinking a mounted filesystem (at least I could not get it to work), so I had to boot the CentOS 5.4 rescue system.

After dropping to the shell of the rescue system (without mounting the filesystems), I copied an mdadm.conf from a similar system to /etc so that I would be able to start the RAIDs:

  • mdadm -A /dev/md0
  • mdadm -A /dev/md1
  • mdadm -A /dev/md2
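Instead of copying an mdadm.conf from a similar machine, one can also generate it from the RAID superblocks on the disks, since mdadm --examine --scan prints an ARRAY line for every array it finds. A minimal sketch with a hypothetical helper function:

```shell
#!/bin/sh
# Build a minimal mdadm.conf from "mdadm --examine --scan" output on stdin:
# a DEVICE line followed by the ARRAY lines found in the superblocks.
mdadm_conf_from_scan() {
    echo "DEVICE partitions"
    grep '^ARRAY'
}

# Example usage in the rescue shell:
#   mdadm --examine --scan | mdadm_conf_from_scan > /etc/mdadm.conf
#   mdadm -A /dev/md2
```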

Starting only /dev/md2 would have been enough, but I wanted to make sure that everything was working as it should. Then, before running resize2fs, I had to do a filesystem check:

  • e2fsck -f /dev/md2 -C 0

The next step was to actually shrink the filesystem, to a size smaller than the desired final size:

  • resize2fs /dev/md2 30G

Then I shrank the RAID to about 40GB:

  • mdadm --grow /dev/md2 -z 40000000

and after that I resized the filesystem again so that it used all of the 40GB:

  • resize2fs /dev/md2
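The order of these steps matters: the filesystem must be smaller than the RAID device, and the RAID device smaller than the partitions that will hold it later. mdadm -z takes its value in KiB, so 40000000 corresponds to about 38 GiB (roughly 41 GB in decimal units). The sketch checks the order with the numbers from this post, assuming resize2fs counts G as 1024^3 bytes and reading the 42GB partitions as decimal gigabytes (binary ones would only add margin); the function name is hypothetical.

```shell
#!/bin/sh
# Check the size order required when shrinking:
# filesystem < RAID device < partition. All sizes in KiB.
check_shrink_order() {
    fs_kib=$1
    md_kib=$2
    part_kib=$3
    [ "$fs_kib" -lt "$md_kib" ] && [ "$md_kib" -lt "$part_kib" ]
}

fs_kib=$(( 30 * 1024 * 1024 ))                   # resize2fs /dev/md2 30G (assumed GiB)
md_kib=40000000                                  # mdadm --grow /dev/md2 -z 40000000 (KiB)
part_kib=$(( 42 * 1000 * 1000 * 1000 / 1024 ))   # 42GB partition, assumed decimal GB

echo "md2: $(( md_kib / 1024 / 1024 )) GiB"      # integer GiB, rounded down
check_shrink_order "$fs_kib" "$md_kib" "$part_kib" && echo "size order OK"
```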

At this point I mounted the filesystem to see whether it had actually worked, and it looked good (and smaller). Now came the hard part: to use the remaining space I had to re-partition the disks. I started fdisk, deleted the corresponding partitions and created smaller partitions (42GB) at the same start points. This was the part where I was really worried about losing all my data, which was fortunately backed up (of course). After I had created the smaller partitions I tried to start /dev/md2, and it failed, saying that it could not find any RAID partitions.

I then tried to create the RAID again, hoping all the data would still be there. I first created the RAID with only one device:

  • mdadm --create /dev/md2 -n 2 -l 1 /dev/sdb3 missing
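A likely explanation for both the failure and the successful re-creation, assuming /dev/md2 used the 0.90 superblock format (the default of mdadm releases of that time; mdadm --examine shows the actual version): a 0.90 superblock sits near the end of each member device, while the data starts at offset 0. Shrinking the partitions moved their end, so mdadm no longer found a superblock; creating the array again wrote a fresh superblock at the new end and left the filesystem at the start untouched. The hypothetical helper below computes where that superblock lives for a given member size.

```shell
#!/bin/sh
# Offset (in KiB) of a version 0.90 md superblock on a member device of
# SIZE_KIB KiB: round the size down to a 64 KiB boundary, then step back
# one 64 KiB block (same rule as MD_NEW_SIZE_SECTORS in the kernel's md_p.h).
sb090_offset_kib() {
    size_kib=$1
    echo $(( size_kib / 64 * 64 - 64 ))
}

# Example usage (/proc/partitions lists sizes in 1 KiB blocks):
#   mdadm --examine /dev/sdb3 | grep Version   # confirm 0.90 metadata first
#   size=$(awk '$4 == "sdb3" { print $3 }' /proc/partitions)
#   sb090_offset_kib "$size"
```

When re-creating such an array by hand it is safer to pin the format explicitly, for example mdadm --create /dev/md2 --metadata=0.90 -n 2 -l 1 /dev/sdb3 missing, because the 1.x formats place the superblock elsewhere and may shift the data offset.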

This seemed to work and after mounting the new RAID I saw that all my files were still there. So the next step was to add the second device to the RAID with:

  • mdadm --manage -a /dev/md2 /dev/sda3

At this point the RAID started to re-sync and 20 minutes later I was able to grow the RAID to the new partition size:

  • mdadm --grow /dev/md2 -z max

Again I had to wait and before doing the final filesystem resize another filesystem check was necessary:

  • e2fsck -f /dev/md2 -C 0
  • resize2fs /dev/md2

And after only two hours I finally had what I wanted. I rebooted the system and it came up with the smaller / partition. I used the remaining space to create a new RAID (/dev/md3) which will probably be used with LVM if I ever need more space on this server in the future.

Without a backup I would not have attempted all these steps, because I was not always sure they would actually work.

Just Like Three Weeks Ago

Yesterday (2010-02-06) Benjamin and I went snowboarding in Lech/Zürs again, just like three weeks ago. Last time (2010-01-17) Pattrick and Torsten were able to join us; this time it was only Benjamin and me.

The weather was similar to our last visit: mostly cloudy with a few glimpses of sunshine. This time, however, we had lots of fresh deep powder, so it was freeriding time. Extremely exhausting but great fun.

Cluster Installation: First Nodes Up

Since Monday I have been at the High Performance Computing Center Stuttgart (HLRS), where I have started the initial installation of our cluster. The people from the HLRS offered to support us with the initial installation, which we gladly accepted because they know how to do clusters.

On Monday I installed the three infrastructure servers which are used to control the 180 nodes of the cluster. The cluster is running Scientific Linux and my first task was to get it on those three infrastructure servers.

Those servers have two 500GB disks each, which were supposed to run as software RAID. After the seventh failed attempt to configure the partitions as RAID1 with the Scientific Linux installer, we partitioned the disks with a Debian install DVD and, once the RAID1 partitions were set up, installed Scientific Linux on all three systems. Not knowing how to get anaconda to configure a RAID1 the way we wanted was a bit embarrassing, but in all the Fedora and CentOS installations I have done I have never configured a software RAID1 from the installer: either the system had only one disk or a hardware RAID controller, or I configured the RAID manually after the installation. At the end of the day, though, all three systems were installed and configured for their tasks.

Today (Tuesday) we used this installation to boot the first two nodes of the cluster. All nodes run diskless and boot over TFTP/NFS from a single read-only image.

Update To Fedora 12

Last week I finally updated our mirror server to Fedora 12. It was still running Fedora 10, which has reached its end of life. The server had been running Fedora 10 for a long time, always with a CentOS kernel: at the beginning the Fedora kernels were not stable enough (crashing after three or four days), so I quickly switched to a CentOS kernel. I know that I should have reported bugs, but in the case of the mirror server I am more concerned with keeping it up and running than with getting debug data from it. It is also not easy for me to get physical access to the machine, so I had plenty of good excuses to switch to a CentOS kernel.

Now the system runs the Fedora 12 kernel, and after a week it is still up without any problems.

Updating My RPM Fusion Builder

I am running one of the RPM Fusion builders in a CentOS VM, and after I saw that the VMs newly created on my notebook use virtio for network and disk access, I decided to try it for my builder VM as well. It was easy and straightforward.

First I had to update from CentOS 5.2 to CentOS 5.4 so that the virtio drivers are available. After that I was just following http://wiki.libvirt.org/page/Virtio.

For the network:

  • shut down the VM
  • edit the XML and add <model type='virtio'/> to the network section
  • start the VM
  • done

For the disk:

  • create a new ramdisk with the virtio drivers: mkinitrd --with virtio_pci --with virtio_blk -f /boot/initrd-$(uname -r).img $(uname -r)
  • or dracut -f --add-drivers "virtio_pci virtio_blk" /boot/initrd-$(uname -r).img $(uname -r) for Fedora 12
  • change /boot/grub/device.map from “(hd0) /dev/hda” to “(hd0) /dev/vda”
  • using LVM requires no changes to the root= parameter in /etc/grub.conf
  • shut down the VM
  • edit the XML changing <target dev='hda' bus='ide'/> to <target dev='vda' bus='virtio'/>
  • start the VM
  • done
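Before booting the VM it can be worth confirming that both XML changes were saved. This is a rough sketch, not an XML parser: it greps the domain XML for the single-quoted virtio model and bus attributes used in the steps above. The function name and the domain name builder are hypothetical; virsh dumpxml is the standard way to print a domain's XML.

```shell
#!/bin/sh
# Read a libvirt domain XML (e.g. from "virsh dumpxml") on stdin and
# succeed only if it has a virtio NIC model and a virtio disk bus.
uses_virtio() {
    xml=$(cat)
    printf '%s\n' "$xml" | grep -q "<model type='virtio'/>" &&
        printf '%s\n' "$xml" | grep -q "bus='virtio'"
}

# Example usage (domain name is hypothetical):
#   virsh dumpxml builder | uses_virtio && echo "virtio configured"
```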

During boot I can now see the VM loading the virtio disk drivers and detecting vda1 and vda2. With lspci and lsmod I can also verify that the new virtio devices are present and in use. The VM seems to be faster, but I have not actually benchmarked it.
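The lsmod check can be narrowed down to the virtio modules. The helper below is hypothetical and simply filters the module names out of lsmod output, skipping the header line.

```shell
#!/bin/sh
# Read "lsmod" output on stdin and print the loaded virtio modules
# (first column), one per line, sorted.
virtio_modules() {
    awk 'NR > 1 && $1 ~ /^virtio/ { print $1 }' | sort
}

# Example usage inside the VM:
#   lsmod | virtio_modules
#   lspci | grep -i virtio
```

On CentOS 5 the initrd is a gzip-compressed cpio archive, so zcat /boot/initrd-$(uname -r).img | cpio -t | grep virtio should also list the virtio modules packed into it.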