Yesterday I wanted to help push a large amount of log data (30G) through a ridiculously low-bandwidth channel. I remembered reading about Con Kolivas’ lrzip and wondered whether it would compress better than bzip2 in this case. So I ran a little benchmark: I took a 1G chunk of the data and compressed it with bzip2, lrzip and p7zip.
tool  | ratio | time (mm:ss) | compr. vs. bzip2 | cost vs. bzip2 |
bzip2 | 7.14  | 05:45        | 100.00%          | 100%           |
lrzip | 7.26  | 24:33        | 101.76%          | 427%           |
p7zip | 8.42  | 28:13        | 118.01%          | 490%           |
OK, so I’ll stick with bzip2 for now. At least for that kind of data, lrzip is not really an option: ~330% extra effort for less than 2% better compression just doesn’t seem worth it. I have to say I am impressed by the compression p7zip (or rather 7-Zip) can achieve, but it is very expensive nevertheless. I do like that it seems to use all available CPUs automatically, though (in wall-clock time it ran for only ~19 minutes on a hyperthreading machine).
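
For anyone who wants to redo the arithmetic, here is a minimal sketch of how the two percentage columns can be derived from the ratio and time columns, with bzip2 as the 100% baseline. The helper names `to_seconds` and `relative_to` are made up for this illustration, and the inputs are the rounded values from the table, so the results can differ slightly from the percentages listed there.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'mm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (compression ratio, time as mm:ss), copied from the table above.
RESULTS = {
    "bzip2": (7.14, "05:45"),
    "lrzip": (7.26, "24:33"),
    "p7zip": (8.42, "28:13"),
}


def relative_to(baseline: str, results: dict) -> dict:
    """Return {tool: (compression %, time cost %)} relative to the baseline tool."""
    base_ratio, base_time = results[baseline]
    base_secs = to_seconds(base_time)
    relative = {}
    for tool, (ratio, mmss) in results.items():
        compr_pct = 100 * ratio / base_ratio
        cost_pct = 100 * to_seconds(mmss) / base_secs
        relative[tool] = (compr_pct, cost_pct)
    return relative


if __name__ == "__main__":
    for tool, (compr, cost) in relative_to("bzip2", RESULTS).items():
        # "extra" is the additional time cost over the baseline, in percent.
        print(f"{tool}: compr. {compr:.2f}%  cost {cost:.0f}%  extra {cost - 100:.0f}%")
```

The "extra" figure for lrzip is where the ~330% above comes from: its time cost relative to bzip2, minus the 100% baseline.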