
Sysadmin Compression Comparison Benchmarks: zstd vs brotli vs pigz vs bzip2 vs xz etc

Discussion in 'System Administration' started by eva2000, Sep 3, 2017.

  1. eva2000

    eva2000 Administrator Staff Member

Some folks will know I have a strong focus on performance and efficiency. If I can figure out how to do a task better and faster than the standard way, I will work towards doing that. One task I always focus on is data backup and restoration speed. The faster I can back up and restore file data, the better.

Part of the backup and restoration process is compression and decompression speed, which essentially comes down to the compression algorithms and tools you use and the system resources you have available, i.e. cpu speed, number of cpu cores/threads and memory.

In the past I have done comparison benchmarks for the various compression algorithms and tools I normally use. This time I have added two new compression algorithms: Facebook's Zstandard (zstd), a realtime compression algorithm which is said to be way faster than gzip/zlib with comparable compression ratios, and Google's Brotli compression algorithm, which has better compression ratios.

    Compression Algorithms Tested


    Test Data Files



The test data set was taken from the Silesia Compression Corpus zip file here and turned into a tar archive for the compression tests.
    Code (Text):
# download and extract the Silesia corpus
mkdir -p /home/gziptest/silesia
cd /home/gziptest/silesia
wget http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip
unzip silesia.zip
# remove the zip and tar up the extracted files for the compression tests
cd /home/gziptest
rm -f /home/gziptest/silesia/silesia.zip
tar -cvf silesia.tar /home/gziptest/silesia
    

    ~203 MB tar archive
    Code (Text):
    ls -la /home/gziptest/silesia.tar
    -rw-r--r-- 1 root root 211957760 Sep  3 00:41 /home/gziptest/silesia.tar
    



    Test System Configuration



    System:
    • OVH MC-32 Intel Core i7 4790K
    • 32GB Memory
    • 2x240GB SSD
    • 250Mbit Network Bandwidth
    • CentOS 7.3 64bit
    • Centmin Mod 123.09beta01 LEMP stack - Nginx 1.13.4, MariaDB 10.1.26 MySQL, + CSF Firewall
    • BHS, Canada

    Compression Comparison Results



    Below are the comparison results for compression tests with links to the raw data as well.
where
• compression levels 1-9 were tested, even though some compression algorithms allow higher levels, i.e. pigz has a special level 11 for Zopfli compression and zstd/pzstd have levels up to 19-22 where they can match xz/pxz in terms of compression ratios
• compress and decompress times are in seconds
• compress and decompress speeds are in MB/s
• compress and decompress cpu % is the percentage of cpu utilisation, where 100% = 1 cpu thread and 800% = 8 cpu threads
• compression ratio is the ratio of original size to compressed size, so the larger the compression ratio, the better the compression and the smaller the resulting compressed file size (a sketch of a single timed run is shown after this list)
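For reference, each data point above boils down to a timed compression run and a timed decompression run per tool and level. A minimal sketch of what a single pigz level 9 run could look like (illustrative only, not the actual benchmark script; GNU time's -v output provides the peak memory figure):
Code (Text):
# compress: record wall clock time, cpu % and peak memory via GNU time
/usr/bin/time -v pigz -9 -p 8 -k -c /home/gziptest/silesia.tar > /home/gziptest/silesia.tar.gz
# decompress the same archive, discarding the output
/usr/bin/time -v pigz -d -p 8 -c /home/gziptest/silesia.tar.gz > /dev/null
# compression ratio = original size / compressed size
ls -la /home/gziptest/silesia.tar /home/gziptest/silesia.tar.gz
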
    Summary:
• Best compression ratio goes to xz/pxz, followed by lzip/plzip and then the various bzip2 implementations. However, plzip was faster than pxz, so if you want speed + compression ratio, plzip would have been the better pick, though its memory usage is much higher
• Best compression speed goes to lbzip2, then Facebook's pzstd, followed by pigz. At level 9 compression, lbzip2 = 103MB/s, zstd = 43MB/s, pzstd = 90MB/s and pigz = 72MB/s
• Best decompression speed goes to Facebook's zstd/pzstd, followed by pigz. At level 9 compression, zstd = 578MB/s, pzstd = 273MB/s and pigz = 326MB/s
• Multi-threaded compression tools used the most memory for compression, with plzip using the most followed by pxz
• Multi-threaded compression tools used the most memory for decompression, with pzstd using the most followed by plzip

    compress-test-030917-a.png
    compress-test-030917-b.png

I did separate zstd/pzstd runs for compression levels 1 to 19 to illustrate the speeds and compression ratios that can be obtained.

To put it into perspective, pzstd level 16 achieves a compression ratio of 3.7581, compressed in 9.01s. In terms of comparable compression ratios, that would be equivalent to:
    • pxz level 3 with compression ratio of 3.7823 compressed in 9.15s
    • plzip level 3 with compression ratio of 3.7397 compressed in 6.43s
    • pbzip2 level 5 with compression ratio 3.7899 compressed in 3.14s
    • lbzip2 level 5 with compression ratio 3.7987 compressed in 1.83s
    • bzip2 level 5 with compression ratio 3.8013 compressed in 14.10s
    • brotli level 9 with compression ratio 3.7296 compressed in 21.36s
Surprisingly, lbzip2 seems to have the best speed for achieving a compression ratio of 3.73-3.80, followed by pbzip2, then plzip, then pzstd, then pxz.

    Raw Data
    zstd / pzstd table

    compress-test-zstd-1-19-030917.png
     
    Last edited: Sep 3, 2017
  2. Kad

    Kad New Member

Level 16 is a weak spot in the published speed / compression ratio curve of zstd, so it looks like a poor reference point for it.
Level 15, for example, feels much better (+25% speed for a negligible compression trade-off).

    My feeling is that zstd is great for its speed. I mostly use it at levels 1-8, that's where it shines, and no need for octo-cores to run fast (I actually want to keep those cores for something else, like running business applications). My backup and network applications are no longer cpu-bound.
    Beyond that, zstd uses more and more energy trying to find less and less benefit, so in my opinion, it's the "long tail" of diminishing returns.

Also: pzstd looks like it is deprecated. Better to use zstd -T# directly.
I generally got better compression ratios from that setting.
For example, at level 15, I got silesia.tar down to 58006776 bytes, instead of 58162166 for pzstd.
And memory usage is much improved.
     
  3. eva2000

    eva2000 Administrator Staff Member

Cheers, thanks for the insight, it's literally the first time I've touched zstd/pzstd so there's lots to learn :) I didn't know pzstd is being deprecated, but I was slightly confused when I saw the zstd command supports the -T option for the number of threads. Guess that's where it's coming from. I will rework my testing to use zstd -T and check it out :)

I tend to try to utilise all cpu threads, i.e. multi-threaded MySQL backups via mydumper etc. Hence why my above tests ran the multi-threaded compression tools at the full 8 cpu thread count. If you're backing up 500+ GB of data, you want the process to go as fast as possible :)
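As a rough illustration of that kind of multi-threaded backup pipeline (the paths and thread count below are just placeholders, not my actual backup routine):
Code (Text):
# stream a directory straight into zstd using 8 threads at level 9
tar -cf - /home/nginx/domains | zstd -T8 -9 > /backup/domains.tar.zst
# restore: decompress and unpack in one pipeline
zstd -d -c /backup/domains.tar.zst | tar -xf - -C /
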
     
    Last edited: Sep 3, 2017
  4. eva2000

    eva2000 Administrator Staff Member

@Kad retested zstd using the -T8 option for 8 cpu threads across levels 1 to 19. The raw data is here.

Yup, you're correct, compression ratios are slightly better using zstd with -T instead of pzstd, as are the speeds. pzstd at level 15 took 7.05s to compress to a 3.6442 ratio vs zstd -T8 at level 15 which took 6.78s to compress to a 3.6517 ratio. Though zstd -T8 at levels 16, 17 and 19 was slower than pzstd. Memory and cpu usage are generally higher for zstd -T8 vs pzstd.

Interesting that zstd -T8 and pzstd at level 16 had the lowest cpu usage?

    compress-test-zstd-threaded-1-19-030917.png
     
    Last edited: Sep 3, 2017
  5. eva2000

    eva2000 Administrator Staff Member

Looks like I need to update to zstd 1.3.1 (Release Zstandard v1.3.1 · facebook/zstd · GitHub).
zstd 1.3.0 vs zstd 1.3.1 raw data here. Looks like 1.3.1 reduced memory usage dramatically :)

    compress-test-zstd1.3.1-threaded-1-19-030917.png
     
    Last edited: Sep 3, 2017
  6. eva2000

    eva2000 Administrator Staff Member

zstd has compression levels 1 to 19, and you can enable 3 additional levels, 20-22, via the --ultra flag. So let's see what zstd 1.3.1 with -T8 can do across all 22 levels of compression. Raw data results are here.
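For anyone wanting to reproduce the ultra levels, they have to be unlocked explicitly with --ultra; something along these lines (paths are illustrative, not the actual test script):
Code (Text):
# levels 20-22 are only accepted together with --ultra
zstd --ultra -22 -T8 -k /home/gziptest/silesia.tar -o /home/gziptest/silesia.tar.zst
# decompress to stdout and discard, just to time it
zstd -d -c /home/gziptest/silesia.tar.zst > /dev/null
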

zstd level 22 has a compression ratio of 4.0247 and took 90.56s to compress, which is roughly equivalent to xz/pxz level 5 and lzip/plzip level 6. But xz level 5 took 57.38s and pxz level 5 took 15.87s to complete, while lzip level 6 took 69.85s and plzip level 6 took 16.34s to compress.

    You can see why zstd levels up to 19 are really the only options if compression time is a concern :)

    compress-test-040917-zstd-1.3.1-1-22.png
     
    Last edited: Sep 4, 2017
  7. eva2000

    eva2000 Administrator Staff Member

Let's compare just the best of the multi-threaded capable compression algorithms + brotli and see which beat standard gzip/pigz level 9 in terms of compression times and compression ratios.
• Looks like lbzip2 level 9 is the pick if you want faster compression times and better compression ratios than pigz/gzip. Then lbzip2 level 2 if you want an overall balance of compression + decompression times.
• If you look at max cpu utilisation, you can see that lbzip2 used more of the 8 cpu threads available on the server compared to zstd, which topped out at just under 6 cpu threads of utilisation at the higher compression levels of 16+. That might mean zstd has more room to improve in terms of speed and cpu utilisation? :)
• Followed by either brotli level 4 or zstd level 11, depending on whether speed or compression ratio is the goal.
• For decompression speeds, zstd wins, so for overall compression + decompression speed, zstd does look like a good candidate to beat pigz/zlib/gzip :) Though lbzip2 comes close, i.e. if you had 1TB of data, pigz would compress/decompress in 243 mins/54 mins = 297 mins total, vs zstd compress/decompress in 227 mins/30 mins = 257 mins total, vs lbzip2 compress/decompress in 170 mins/122 mins = 292 mins total. It depends on how much memory you'd like to allocate to lbzip2 or zstd and the compression ratio you want, i.e. lbzip2 would beat zstd there. However, if you dropped to lbzip2 level 2, 1TB of data would compress/decompress in 148 mins/52 mins = 200 mins total, beating zstd's 257 min total time and with a better compression ratio of 3.6457 vs 3.5576.
For compression:
• pigz level 9 took 2.8s or 72MB/s to compress to a compression ratio of 3.0857 using 9,888 KB of memory
• to beat pigz's compression time, zstd level 11 took 2.63s or 77MB/s to compress to a compression ratio of 3.5576 using 393,692 KB of memory
• lbzip2 level 9 took 1.97s or 103MB/s to compress with a compression ratio of 3.8784 using 59,764 KB of memory
• plzip level 1 was slower than pigz, taking 3.80s or 53MB/s to compress with a compression ratio of 3.5004 using 138,420 KB of memory
• pxz level 1 was slower than pigz, taking 3.96s or 51MB/s to compress with a compression ratio of 3.6089 using 93,768 KB of memory
• brotli level 4 took 2.50s or 81MB/s to compress with a compression ratio of 3.2413 using 71,528 KB of memory
For decompression, based on the compression levels compared above:
• pigz level 9 took 0.62s or 326MB/s to decompress using 1,044 KB of memory
• zstd level 11 took 0.35s or 578MB/s to decompress using 5,408 KB of memory
• lbzip2 level 9 took 1.41s or 143MB/s to decompress using 59,764 KB of memory
• plzip level 1 took 0.76s or 266MB/s to decompress using 11,992 KB of memory
• pxz level 1 took 3.03s or 67MB/s to decompress using 2,040 KB of memory
• brotli level 4 took 1.07s or 189MB/s to decompress using 5,840 KB of memory
    compress-test-040917-multi-threaded-01.png
     
    Last edited: Sep 5, 2017
  8. Kad

    Kad New Member

Wow! That's such an impressive evaluation! Much nicer and more detailed than anything I ever attempted.

    I don't have much to add. Your results are great.
    We just use them differently, due to a difference in usage scenario.

We typically snapshot database content and stream the backup over the LAN while the db engine is still running and serving customers, so there's no downtime. It's neat,
but the thing is, the server is still active and still needs the CPU to serve clients, so the backup happens in parallel and must take the least amount of CPU possible.
1 or 2 cores is typically all I can spare for the task.

bzip2 seems great at ramping up with the number of cores. With 8 cores freely available for the task, it's a serious contender. It scales almost linearly, which is impressive. Though that cuts both ways: with fewer cores it has to run slower, and its single-thread performance is what it is.
zstd gives us the option to trade some compression ratio for speed when fewer cores are available, which just happens to fit our needs.
(Also, if I remember correctly, bzip2 is great at text but worse at binary data, so the ranking could actually depend on the test set.)

I never go into level 15+ territory.
I've been told it's useful to "bake" a config once, which is then deployed many times, and that's where the decompression speed works wonders (and compression speed does not matter much). But that's not something I work on.
Note: it seems in your tests that levels 21+ were not multithreaded. That being said, given the memory usage with a single thread, that's probably for the best...
     
  9. eva2000

    eva2000 Administrator Staff Member

They were run multi-threaded with -T8, but I guess zstd didn't really use all the cpu threads? The raw results do show even lower cpu utilisation at levels 20-22 for zstd (99-147%). Looks like for zstd levels 20-22 Facebook decided to give up on speed and optimise memory usage in favour of higher compression ratios.

Yes, I will have to test other data sets to see how the compression tools handle them. True about fewer cores = less performance for lbzip2. Another thing to test later on is cpu thread scaling for the multi-threaded compression algorithms :)

Yeah, I pair my backup routines with nice and ionice so process and disk priority are lower, which still gives a good overall performance mix without taking too much away from client-facing and more important tasks, i.e. 8 cpu threads at lower process and disk priority would still be faster than 1 to 2 cpu threads at normal priority.
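For example, a lower-priority compression step could be wrapped something like this (a sketch only - the nice/ionice values and paths are placeholders, not my exact backup job):
Code (Text):
# lowest cpu priority (nice 19) plus low best-effort disk priority (ionice class 2, level 7)
nice -n 19 ionice -c2 -n7 lbzip2 -9 -k /backup/mysql-dump.tar
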
     
  10. eva2000

    eva2000 Administrator Staff Member


    CPU Thread Scaling For Pigz



Now to test cpu thread scaling for the multi-threaded compression algorithms. I'll start with pigz levels 1-9 across 1-8 cpu threads. I don't recall the specific reason why my compression testing script leaves out decompression tests when doing cpu thread scaling, as I wrote the script several years ago :) I will probably need to revise the script to re-add decompression tests for cpu thread scaling - not that all multi-threaded compression tools even use multiple threads for decompression.
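The thread scaling runs themselves are just the same timed command repeated while varying pigz's -p (thread count) option. A simplified sketch of the loop (not the actual script, which also records ratios and memory usage):
Code (Text):
for threads in 1 2 3 4 5 6 7 8; do
  for level in 1 6 9; do
    /usr/bin/time -v pigz -${level} -p ${threads} -k -c /home/gziptest/silesia.tar > /dev/null
  done
done
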

    Raw data here.

    compress-test-050917-cpuscaling-pigz.png

    Charting min, default and max compression levels

    compress-test-050917-cpuscaling-pigz-chart1.png
     
    Last edited: Sep 5, 2017
  11. eva2000

    eva2000 Administrator Staff Member


    CPU Thread Scaling For lbzip2



Next up for cpu thread scaling is lbzip2, levels 1-9 across 1-8 cpu threads.
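Same idea as the pigz scaling run above, except lbzip2 selects its worker thread count with the -n option (sketch only, paths illustrative):
Code (Text):
for threads in 1 2 4 8; do
  /usr/bin/time -v lbzip2 -9 -n ${threads} -k -c /home/gziptest/silesia.tar > /dev/null
done
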

    Raw data here.

    compress-test-050917-cpuscaling-lbzip2.png

    Charting min, default and max compression levels

    compress-test-050917-cpuscaling-lbzip2-chart1.png
     
    Last edited: Sep 5, 2017
  12. eva2000

    eva2000 Administrator Staff Member


    CPU Thread Scaling For zstd



Next up for cpu thread scaling is zstd, levels 1-22 across 1-8 cpu threads.

    Raw data here.

The chart is very long due to zstd having 22 levels of compression, so it is split into the two images below.

    compress-test-050917-cpuscaling-zstd-part1.png
    compress-test-050917-cpuscaling-zstd-part2.png

    Charting min, default and max compression levels

    compress-test-050917-cpuscaling-zstd-chart1.png
     
    Last edited: Sep 5, 2017
  13. eva2000

    eva2000 Administrator Staff Member

If you compare the previous 3 posts of cpu scaling results for pigz vs lbzip2 vs zstd, just for the 1 and 2 cpu thread compression results, then yup, zstd is much faster. So if you only had 1 or 2 cpu threads available on your server, zstd would be faster, though its compression ratios are lower.

    pigz vs lbzip2 vs zstd for 1 and 2 cpu thread scaling for the first 9 levels of compression
    • zstd level 9 for 1 cpu thread ran at 53MB/s for 3.4931 ratio
    • zstd level 9 for 2 cpu thread ran at 78MB/s for 3.4879 ratio
• for zstd to match lbzip2 compression ratios with 1 cpu thread, zstd level 18 would be needed, which ran at 5MB/s for a 3.8270 ratio, and with 2 cpu threads, zstd level 18 would have run at 9MB/s for a 3.8222 ratio
    • lbzip2 level 9 for 1 cpu thread ran at 23MB/s for 3.8784 ratio
    • lbzip2 level 9 for 2 cpu thread ran at 45MB/s for 3.8784 ratio
    • pigz level 9 for 1 cpu thread ran at 12MB/s for 3.0857 ratio
    • pigz level 9 for 2 cpu thread ran at 25MB/s for 3.0857 ratio
    • So if you only had 1 cpu thread available, zstd would definitely be the choice. But if you had 2+ cpu threads available, lbzip2 even at level 2-4 would be better if compression ratios were important.
    compress-test-050917-cpuscaling-1-2threads-pigz-vs-lbzip2-vs-zstd.png
     
    Last edited: Sep 5, 2017
  14. eva2000

    eva2000 Administrator Staff Member


    CPU Thread Scaling For pxz



Next up for cpu thread scaling is pxz, levels 1-9 across 1-8 cpu threads.

    Raw data here.

    compress-test-050917-cpuscaling-pxz.png

    Charting min, default and max compression levels

    compress-test-050917-cpuscaling-pxz-chart1.png
     
    Last edited: Sep 5, 2017
  15. Kad

    Kad New Member

Great charts!

They made me realise that you test on a 4-core == 8-hyperthread machine.
That's likely why you get a linear speed-up up to 4 threads.

    After that threshold, further gains are more difficult, due to cache contention and general cpu resource sharing.
     
  16. eva2000

    eva2000 Administrator Staff Member

Ah yes, you're spot on: 4 physical cpu cores + 4 hyperthreads = 8 threads, so yes, scaling might fall off after 5+ threads with some compression algorithms :) But the potential is nice if you have more cpu cores and threads available. An AMD Ryzen Threadripper with 16 cores/32 threads would be a nice test platform to have :D

This thread and these benchmarks highlight one thing: if you have more cpu cores/threads, use them where you can!
     
  17. eva2000

    eva2000 Administrator Staff Member

    All multi-threaded compression algorithms in one table for 8 cpu threads :)

    compress-test-030917-multi-threaded-only-01.png
     
  18. eva2000

    eva2000 Administrator Staff Member


    Linux 4.13 Tar Compression Test



Updated testing, changing the target data set to the official Linux 4.13 kernel tar archive, which will be compressed and decompressed. Raw data here.

    Compression Algorithms Tested

    Test Data Files



The test data set was taken from the official Linux 4.13 kernel source, just extracting the tar file from the tar.xz package.
    Code (Text):
# download the Linux 4.13 kernel source and decompress the .xz, leaving the plain tar
mkdir -p /home/gziptest/
cd /home/gziptest/
wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.13.tar.xz
pxz -d linux-4.13.tar.xz
    

    ~780 MB tar archive
    Code (Text):
    ls -la /home/gziptest/linux-4.13.tar
    -rw-r--r-- 1 root root 817817600 Sep  3 21:10 /home/gziptest/linux-4.13.tar
    


    Test System Configuration



    System:
    • OVH MC-32 Intel Core i7 4790K
    • 32GB Memory
    • 2x240GB SSD
    • 250Mbit Network Bandwidth
    • CentOS 7.3 64bit
    • Centmin Mod 123.09beta01 LEMP stack - Nginx 1.13.4, MariaDB 10.1.26 MySQL, + CSF Firewall
    • BHS, Canada
    Compression results with compression/decompression speed in MB/s

    compress-test-060917-kernel-leveltest-01-part1.png
    compress-test-060917-kernel-leveltest-01-part2.png

    Multi-threaded pigz vs lbzip2 vs pxz vs zstd vs plzip compress speed versus compress ratio

    compress-test-060917-kernel-leveltest-chart-01.png
     
    Last edited: Sep 6, 2017
  19. eva2000

    eva2000 Administrator Staff Member

Possibly good news: work on a zstd compression binding for Nginx might be in the works, to join the likes of brotli and gzip compression for HTTP requests.
     
  20. Brad Knowles

    Brad Knowles New Member

    So, I'm curious -- have you tried adding some really large files to the benchmark? For example, pwned-passwords-ordered-2.0.txt from the page at Have I Been Pwned: Pwned Passwords?