Beyond bzip2

Yesterday I wanted to help push a large amount of log data (30G) through a ridiculously low-bandwidth channel. I remembered reading about Con Kolivas’ lrzip and wondered whether it would compress better than bzip2 in this case. So I ran a little benchmark: I took a 1G chunk of the data and compressed it with bzip2, lrzip and p7zip.
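For anyone who wants to try a similar comparison without the command-line tools, here is a minimal sketch in Python. It is not what I ran: it uses the standard library's bz2 and lzma modules (lzma standing in for the LZMA family that 7-Zip uses, lrzip is not covered), the compression levels are assumptions, and the sample data is a made-up, log-like placeholder, so the numbers will not match the table.

```python
import bz2
import lzma
import time


def benchmark(data: bytes) -> dict:
    """Compress data with bz2 and lzma; return ratio and CPU seconds per codec."""
    codecs = {
        # Levels are assumptions, not the settings of the CLI tools.
        "bz2": lambda d: bz2.compress(d, compresslevel=9),
        "lzma": lambda d: lzma.compress(d, preset=6),
    }
    results = {}
    for name, compress in codecs.items():
        start = time.process_time()  # CPU time of this process, not wall clock
        packed = compress(data)
        cpu_seconds = time.process_time() - start
        results[name] = {
            "ratio": len(data) / len(packed),  # original size / compressed size
            "cpu_seconds": cpu_seconds,
        }
    return results


if __name__ == "__main__":
    # Hypothetical log-like sample; replace with a chunk of your real data.
    sample = b"".join(
        b"2010-01-01 12:00:%02d host app[%d]: request %d served\n" % (i % 60, i % 7, i)
        for i in range(20000)
    )
    for name, r in benchmark(sample).items():
        print(f"{name}: ratio {r['ratio']:.2f}, cpu {r['cpu_seconds']:.2f}s")
```

Measuring CPU time rather than wall-clock time keeps the comparison fair when one codec can spread its work over several cores.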

        ratio   time (mm:ss)   compr. (vs. bzip2)   cost (vs. bzip2)
bzip2   7.14    05:45          100.00%              100%
lrzip   7.26    24:33          101.76%              427%
p7zip   8.42    28:13          118.01%              490%

OK, so I’ll stick with bzip2 for now. At least for this kind of data lrzip is not really an option: ~330% extra effort for less than 2% improvement just doesn’t seem worth it. I have to say I am impressed by the compression p7zip (or rather 7-Zip) can achieve, but it is very expensive nevertheless. I do like that it seems to use all available CPUs automatically, though (in real time it ran only ~19 minutes on a hyperthreading machine).
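The "extra effort" figure is just the measured time relative to bzip2, minus 100%. A small sketch of that arithmetic, using the mm:ss timings from the table (the function names are made up for illustration):

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'mm:ss' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def relative_cost(time_mmss: str, baseline_mmss: str) -> float:
    """Time as a percentage of the baseline time."""
    return 100.0 * to_seconds(time_mmss) / to_seconds(baseline_mmss)


# Timings copied from the table above.
timings = {"bzip2": "05:45", "lrzip": "24:33", "p7zip": "28:13"}

for tool, t in timings.items():
    cost = relative_cost(t, timings["bzip2"])
    print(f"{tool}: {cost:.0f}% of bzip2's time, {cost - 100:.0f}% extra")
```

For lrzip this gives about 427% of bzip2's time, i.e. roughly 327% extra, which is where the ~330% comes from. For p7zip the rounded result may differ by one point from the table, which was presumably truncated rather than rounded.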