
====================== data compression utils ======================

Benchmarks of various data compression tools.

Index:
* results and key observations
* tested programs and results for each command
* result files

results and key observations
============================

system used: link
methodology: link

Links contain top results in each category for datasets excluding 06 and 0b-0f, which were left out due to poor compression ratios.

fast top results:
1. zstd -3 --long=31
2. zstd -8 --long=31
3. pigz -6
- zstd's killer feature is '--long=31': it makes repetitive data compress far better at the cost of higher memory usage, but it will still work on machines with at least 8 GB of RAM (example invocations for the top settings are sketched at the end of this section)
- 'zstd -3' is great when extremely fast (NVMe PCIe 4.0 level fast) compression is required
- pigz offers compression times comparable to 'zstd -3 --long=31', but mostly with much worse compression ratios

balanced top results:
1. lrzip -9
1. zstd -17 --long=31
1. pixz -9 / pixz -8
1. plzip -9 / plzip -6
- no obvious winner: pixz and lzip beat zstd most of the time in this category, but zstd beats them when the data is highly repetitive ('--long=31'); lrzip sometimes takes a very long time to decompress data and has high memory usage, but on the other hand it has the best compression ratios
- 7z and xz were excluded from this category due to very slow decompression
- pbzip2 does surprisingly well when compressing non-repetitive text data
- xz offers compression ratios and times similar to pixz, but doesn't have multi-threaded decompression yet, and this hurt its decompression times heavily

best top results:
1. lrzip --zpaq / lrzip -L 9
2. zstd -22 --long=31 --ultra / zstd -20 --long=31 --ultra
2. pixz -9 / xz -9e
2. 7z a ... -mfb=279 -md=256m / 7z a -t7z -m0=lzma2 -mx=9
- 'lrzip --zpaq' is absolutely the best on every dataset, but its compression times are the highest and decompression takes as long as compression; 'lrzip -9' doesn't have these drawbacks and its compression ratios are comparable
- zstd didn't compress 'cs: go' as well as the other utils
- lrzip and zstd with '--long=31' have no competition when compressing highly repetitive data, like datasets 05 and 07
- 'zstd -20 --long=31' is often twice as fast as 'zstd -22 --long=31', but compresses data up to 1 percentage point less; the same can be said when comparing 'zstd -17 --long=31' to 'zstd -20 --long=31'
- 'pigz -11' is terribly ineffective (its cpu time is 8x higher than the next slowest tool), so I stopped testing it after the first dataset
- 'plzip -9 -s 256MiB -m 273' has way too high memory usage, so it was excluded from the top rankings

other observations:
- lrzip has stability issues; this opinion is based on my previous experience with the tool, when I ran into bugs while compressing and/or decompressing; that was a few years back, so some of those bugs have probably been fixed, but during the benchmarking I also hit this bug: https://github.com/ckolivas/lrzip/issues/102
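The exact commands and timing setup are behind the methodology link; as a rough sketch of what the top settings look like in practice (the filenames and the tar step are my own illustration, not the benchmarked invocations):

    # fast: zstd in long-window mode, all cores
    tar -cf - dataset/ | zstd -3 --long=31 -T0 -o dataset.tar.zst

    # balanced: lrzip at maximum lzma level / pixz on all cores
    lrzip -L 9 dataset.tar
    pixz -9 dataset.tar dataset.tar.xz

    # best ratio: lrzip's zpaq backend / zstd above level 19 (needs --ultra)
    lrzip --zpaq dataset.tar
    zstd -22 --ultra --long=31 -T0 dataset.tar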

tested programs and results for each command
============================================

Links contain benchmark results for each program's various settings, performed on one set of text data and one set of binary data. The main focus of this benchmark is on multi-threaded programs.

- multi-threaded:
  pigz 2.4
  pbzip2 1.1.13
  xz 5.2.5
  pixz 1.0.7
  plzip 1.8
  lrzip 0.631
  zstd 1.4.5
    * --long=31: link
  7z 16.02
- single-threaded:
  gzip 1.10
  bzip2 1.0.8
  lzip 1.21
  lz4 1.9.2
  lzop 1.0.4
  brotli 1.0.9
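Note that "multi-threaded" doesn't always mean threaded by default: some of these tools need an explicit flag or thread count. A quick sketch of how that looks (thread counts and filenames are illustrative, not the benchmark's settings):

    pigz -6 data.bin                  # pigz and pbzip2 use all cores by default
    xz -6 -T0 data.bin                # xz only threads when given -T0 / -T N
    zstd -8 -T0 data.bin              # same for zstd; -T0 means all cores
    plzip -6 -n 8 data.bin            # plzip takes an explicit thread count
    7z a -t7z -mx=9 data.7z data.bin  # 7z threads lzma2 by default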

result files
============

How to read result tables: link.

Results for each data set:

binary
00 - qcow2 image of a fresh installation of arch linux, 12GB
01 - qcow2 image of a fresh installation of windows 7, 18GB
02 - /usr/bin directory of a system with a lot of bloat, 2GB
03 - cleaned /usr/lib directory of a system with a lot of bloat, 15GB
04 - installed counter-strike global offensive, 24GB
05 - old dos games, 14GB
06 - a bunch of bencoded torrent files, 3GB

text
07 - linux source code, versions 5.9.1, 5.8.11 and 5.8.5, 3GB
08 - dump of passwords found on the internet, 10GB
09 - parsed information from torrent files, 21GB
0a - concatenated e-books converted to txt, from the Gutenberg Project, 10GB

multimedia (just as an experiment to see compression ratios)
0b - TIFF images from the Hubble space telescope, 2GB
0c - JPG images with paintings from various sources, 2GB
0d - a bunch of mp3 music files from various sources, 3GB
0e - various PDF files containing books, 3GB
0f - video files from various sources, 5GB

The above files do not contain benchmarks for zstd with the '--long=31' setting. These are provided separately here. Compared to '--long=30', significant differences exist only for dataset 00.
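One practical note about the '--long=31' results: archives created with a 2 GiB match window also need the long-window flag (or a raised memory limit) at decompression time, otherwise zstd refuses to decode them. Illustrative filenames, but the flags are real:

    zstd -19 --long=31 big.tar            # compress with a 2 GiB window
    zstd -d big.tar.zst                   # fails: window requires too much memory
    zstd -d --long=31 big.tar.zst         # works; --memory=2048MB also does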