Monthly Archive for December, 2009

Storage Trouble

During the night from Friday to Saturday a disk (slot 7) in our external RAID, which holds most of the mirror server data, failed and was marked as BAD. Not really a big problem, yet. The hot spare drive was activated and the rebuild started. About 24 hours later the rebuild finished. On Sunday (around 16:00) another drive (slot 5) failed, and we immediately started to sync all the data to another box in case yet another drive decides to go off-line, which would mean a complete loss of the array. All the data on that RAID is (only) mirrored, so it could be fetched again, but re-syncing all of the 9TB we currently have would probably take a few weeks. Unfortunately the sync to the other box will also take a few days to finish, so it is still possible that we might lose a lot. We are waiting for the replacement disks, which have been promised to be here by Monday (today), but as the rebuild needs over 24 hours there is still a chance of data loss.

Update (2009-12-14 23:20): The replacement disks have arrived, and after more than twelve hours 25% of the array has been rebuilt.

Update (2009-12-15 11:00): After more than 24 hours, 58% of the array has been rebuilt. It seems to rebuild faster during the night.
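As a rough back-of-the-envelope check, the two progress figures above can be turned into an estimate of the remaining rebuild time. This is only a sketch: it assumes the rebuild continues at the average rate seen so far (which it evidently does not, since it is faster at night), it treats "more than twelve hours" and "more than 24 hours" as exactly 12 and 24 hours, and the function name `estimate_remaining_hours` is made up for this example.

```python
def estimate_remaining_hours(elapsed_hours, percent_done):
    """Linear extrapolation: assume the average rate so far continues."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    # Projected total duration if the average rate stays the same.
    total_hours = elapsed_hours * 100.0 / percent_done
    return total_hours - elapsed_hours


# Figures from the two updates, with "more than" treated as a lower bound.
for elapsed, done in [(12, 25), (24, 58)]:
    remaining = estimate_remaining_hours(elapsed, done)
    print(f"{done}% after {elapsed} h: roughly {remaining:.1f} h remaining")
```

The second estimate is much shorter than the first, which matches the impression that the rebuild speeds up when the mirror is less busy.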

Back In School

Not really back in school, but it has now been more than a week since I started my new job at my old university in Esslingen at the beginning of December 2009. After only 11 months at my previous workplace (Matrix Vision) I am now working for the faculty of Information Technology.

I will be responsible for the setup and installation of the university's new cluster. The cluster will be part of the bwGRiD, will have around 1500 cores, and is currently being installed. It is partly water-cooled; the racks were delivered and installed a few days ago. The cluster is from NEC, and we are expecting the servers to be delivered in the next few days. The cluster will be running Scientific Linux.

I am now in the same building as my mirror server. This might be a good thing, because now I am much closer to the hardware and can act faster if something unexpected happens… It might also be a bad thing, because now I am much closer and can experiment with things I would not try if I were not in the same building.