RAID Tuning

To get maximum performance out of the newly set up RAID, I added some udev rules (placed in /etc/udev/rules.d/83-md-tune.rules) to increase caching. The file has one entry for each of the involved disks (sdX) to adjust the read-ahead:

ACTION=="add", KERNEL=="sdX", ATTR{bdi/read_ahead_kb}="6144"

And one for the mdX device to adjust the read-ahead as well as the size of the stripe cache:

ACTION=="add", KERNEL=="mdX", ATTR{bdi/read_ahead_kb}="24576", ATTR{md/stripe_cache_size}="8192"

With these settings, dd yields the following result when copying a large file:

# sync; echo 3 > /proc/sys/vm/drop_caches
$ dd if=largefile of=/dev/null bs=16M
20733648232 bytes (21 GB) copied, 60.4592 s, 343 MB/s

Which is nice – and rather pointless, as the clients connect via 1G links and thus see only about a third of that performance at best… Note that the caches cost extra kernel memory, so if you’re low on RAM you might want to opt for smaller cache sizes instead.

Update: I forgot to mention that I also switched from the deadline I/O scheduler (the default on current Ubuntu server installations) to cfq, as the test results from this article suggest that it is the optimal scheduler for RAID level 5, no matter whether it is HW- or SW-controlled.
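
For reference, the scheduler can be switched at runtime through sysfs, or made persistent with another udev rule along the lines of the ones above – a small sketch, with sdX again standing in for each member disk:

$ cat /sys/block/sdX/queue/scheduler
# echo cfq > /sys/block/sdX/queue/scheduler

ACTION=="add", KERNEL=="sdX", ATTR{queue/scheduler}="cfq"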

That time of the decade again – upgrading the RAID

With my Linux SW RAID screaming “grow me!” for quite a while now, I finally brought myself to replace the old 2TB disks with new 6TB ones (RAID 5 with 4 disks). While such a disk upgrade recurs regularly, the interval is so long that it is hard to remember the details when you finally get to do it again. Unfortunately the “official” method suggested in the Linux RAID Wiki (replace & resync disk by disk and then grow the md and the filesystem – sketched right after the list below) has a few drawbacks:

  • you have no backup in case of failures during the 4 RAID rebuilds
  • you continue to operate on the old filesystem; in my case, where the RAID had been full for quite a while, that means inheriting quite a bit of unnecessary fragmentation – and you can neither switch nor re-tune the filesystem, which could make sense for a significantly bigger RAID
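
For comparison, that official path boils down to something like the following – a rough sketch with placeholder device names, assuming an ext4 filesystem on md0:

# mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
(swap the disk, partition it, re-add it and wait for the rebuild)
# mdadm /dev/md0 --add /dev/sdb1
(repeat for the remaining three disks, then grow the array and the filesystem)
# mdadm --grow /dev/md0 --size=max
# resize2fs /dev/md0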

Luckily Adrian reminded me of mdadm’s missing parameter, so I could perform this alternate RAID upgrade, which I’ll detail below (it should come in handy for my next upgrade).
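
The core of the trick, roughly sketched with hypothetical device names (new disks sde–sdh, old array md0, and enough ports to run old and new disks side by side): create a new, degraded RAID 5 from three of the new disks plus the missing keyword, put a fresh filesystem on it, copy the data over while the old array is still intact, and only then retire the old array and complete the new one with the fourth disk:

# mdadm --create /dev/md1 --level=5 --raid-devices=4 /dev/sde1 /dev/sdf1 /dev/sdg1 missing
# mkfs.ext4 /dev/md1
# rsync -aHAX /mnt/old/ /mnt/new/
# mdadm --stop /dev/md0
# mdadm /dev/md1 --add /dev/sdh1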

RAID away

During the last week I’ve replaced the disks of my software RAID with larger ones, as their capacity had been exhausted. While this is theoretically an easy task, I had to learn a few things along the way:

  • Trying to perform such an upgrade on a headless system without any console access will fail.
  • fdisk silently fails to parse integer values larger than 2147483647 (the 32-bit signed integer maximum).
  • The md superblock is located at the end of the partition/disk that you add to the RAID.
  • If the kernel associates the complete drive with a specific md device instead of just the last partition (blocking the use of the other partitions for other md devices), resize the last partition to leave some (wasted) space at the end, so that the end of the last RAID partition differs from the end of the drive.
  • Some manufacturers build ‘green’ disks that constantly unload/load their heads, causing the drive to run out of spec in a very short time. If the manufacturer provides a DOS tool to correct that behavior, a pretty easy solution is to put it onto a bootable CD.
  • This stride calculation script helps to optimize the performance of the filesystem running on a RAID 5 (see the worked example after this list).
  • Cheap desktop drives might be a bad choice for a RAID; if they break during the first re-sync you can try to recover your data by re-creating the RAID – Thanks, Adrian!
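
For the stride values the underlying math is simple: stride = chunk size / filesystem block size, and stripe-width = stride × number of data disks. A sketch with assumed numbers – a 4-disk RAID 5 (i.e. 3 data disks) using 512 KiB chunks and 4 KiB ext4 blocks:

# mkfs.ext4 -b 4096 -E stride=128,stripe-width=384 /dev/md0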