Yesterday I wanted to help push a large amount of log data (30G) through a ridiculously low-bandwidth channel. I remembered reading about Con Kolivas’ lrzip and wondered whether it would compress better than bzip2 in this case. So I ran a little benchmark: I took a 1G chunk of the data and compressed it with bzip2, lrzip and p7zip.
| | ratio | time (mm:ss) | compr. (rel. to bzip2) | cost (rel. to bzip2) |
|---|---|---|---|---|
| bzip2 | 7.14 | 05:45 | 100.00% | 100% |
| lrzip | 7.26 | 24:33 | 101.76% | 427% |
| p7zip | 8.42 | 28:13 | 118.01% | 490% |
OK, so I’ll stick with bzip2 for now. At least for this kind of data, lrzip is not really an option: ~330% extra effort for less than 2% better compression just doesn’t seem worth it. I have to say I am impressed by the compression p7zip (or rather 7-Zip) achieves, but it is very expensive nevertheless. I do like that it seems to use all available CPUs automatically, though (in real time it ran only ~19 minutes on a hyperthreading machine).
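If you want to try this kind of comparison on your own data, here is a rough sketch in Python. It is not what I ran: it uses the standard library's `bz2` and `lzma` modules as stand-ins for the bzip2 and 7-Zip command-line tools (LZMA is the algorithm family 7-Zip uses by default), and it leaves out lrzip because the standard library has nothing equivalent. The sample log lines are made up, and `benchmark` and `compare` are just illustrative names. It measures both CPU time and wall-clock time, since the two can differ a lot for a multi-threaded compressor.

```python
import bz2
import lzma
import time


def benchmark(name, compress, data):
    """Compress data once; return ratio plus CPU and wall-clock seconds."""
    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    compressed = compress(data)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return {
        "name": name,
        "ratio": len(data) / len(compressed),
        "cpu": cpu,
        "wall": wall,
    }


def compare(data):
    """Benchmark each compressor and express it relative to the first one."""
    results = [
        benchmark("bz2", bz2.compress, data),    # stand-in for bzip2
        benchmark("lzma", lzma.compress, data),  # stand-in for 7-Zip (LZMA)
    ]
    base = results[0]
    for r in results:
        # Same idea as the "compr." and "cost" columns: baseline = 100%.
        r["compr_pct"] = 100.0 * r["ratio"] / base["ratio"]
        r["cost_pct"] = (
            100.0 * r["cpu"] / base["cpu"] if base["cpu"] > 0 else float("nan")
        )
    return results


# Made-up sample data that looks vaguely like log lines (an assumption;
# substitute a chunk of your real logs to get meaningful numbers).
sample = b"".join(
    b"2009-01-01 12:00:%02d host app[%d]: request %d served\n" % (i % 60, i % 7, i)
    for i in range(20000)
)

for r in compare(sample):
    print(
        "%-5s ratio %6.2f  cpu %6.3fs  wall %6.3fs  compr. %7.2f%%  cost %5.0f%%"
        % (r["name"], r["ratio"], r["cpu"], r["wall"], r["compr_pct"], r["cost_pct"])
    )
```

The numbers from a small synthetic sample like this say nothing about real logs, so treat the output only as a check that the harness works before pointing it at an actual chunk of data.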