Want more timely Centmin Mod News Updates?
Become a Member

A LZSS microdeduplicator tagetting huge texts, with C source

Discussion in 'System Administration' started by Sanmayce, Jan 14, 2019.

  1. Sanmayce

    Sanmayce New Member

    17
    7
    3
    Jun 9, 2018
    Sofia
    Ratings:
    +22
    Local Time:
    6:40 AM
    Since I enliked the forum and the appreciation of Sysadmin for fast compression, decided to create a thread for my incoming amateurish LZSS compressor - nothing special except it exploits some unseen techniques as using millions of highly optimized B-trees of order 3 (MAX 2 keys, 3 pointers) as a matchfinder.

    My wish is C community (not only *nix, but Windows folks as well) to have one 100% FREE etude of textual decompression. The main goals at the very beginning were:
    - superfast decompression;
    - strong textual compression using 1TB sliding window.

    By exploiting the already fully-tested B-tree implementation of mine, the finding of an key/match could be adjusted to be found in less than 4 RAM fetches! Simply, by enlarging the HASH table (I call it HASHPOT) the number of B-trees could be 4 billion if the whole 32bit of hash are used, usually I use ~130 million B-trees with 27bit slots, where each slot houses the root (8 bytes) of a B-tree, therefore (1<<27)*8=1GB, if enwik9 is to be compressed with its 1- billion keys then the maximum attempts to find a key is less than 4 because no B-tree is higher than 4 levels.

    The nifty thing about my B-tree implementation is that the memory in use can be either internal or external RAM, toggled by one variable. Thus, when some big file, say the Wikipedia XML dump being ~62GB, is to be compressed, the matchfinder will use one external pool (a file of its own) to house B-trees, one 512GB SSD allows ~500GB virtual RAM, in the past I have made some benchmarks with this codepath and the speeds were nearing the 11,000 IOPS - reading&writing small (512 bytes) packets RANDOMLY across 60GB of the SSD!

    To share my current excitement of the incoming Nakamichi_Ryuugan-ditto-1TB_btree.c, I used the Leprechaun movies as inspiration:

    Incoming_btree_ugly.png


    The current revision (attached) has the B-tree included but still not integrated with the parser (to replace the ultraslow memmem() approach).
    I guess, by the end of the month, a working revision will be ready, to be shared in here...

    Code (Text):
    D:\_KAZE_unsorted\N1TB_bigtime_b-tree>"Nakamichi_Ryuugan-ditto-1TB_(1xQWORD+1xXMM)_Intel_15.0_64bit_archSSE41.exe" The_Complete_Sherlock_Holmes_-_Doyle_Arthur_Conan.txt
    Nakamichi 'Ryuugan-ditto-1TB', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
    Note0: Nakamichi 'Dragoneye' is 100% FREE, licenseless that is.
    Note1: Hamid Buzidi's LzTurbo ([a] FASTEST [Textual] Decompressor, Levels 19/29/39) retains kingship, his TurboBench (2017-Apr-07) proves the supremacy of LzTurbo, Turbo-Amazing!
    Note2: Conor Stokes' LZSSE2 ([a] FASTEST Textual Decompressor, Level 17) is embedded, all credits along with many thanks go to him.
    Note3: 'Ryuugan' predecessors are Washigan, Okamigan, Zato, Tsubame, Tengu-Tsuyo, Tengu, Rakka, Kokuen, Kinroba, Yoko, Kinutora, Jiten, Butsuhira, Suiken, Keigan, Kumataka, Washi, Aratama, Hitomi, Nek
    omata, Kitsune, Kinezumi, Sanbashi, Kaiko, Inazuma, Zangetsu, Hanabi, Hanazakari, Sanshi and Kaidanji.
    Note4: This variant is the SLOWEST compressor under the Sun! This compile can handle files up to 5120MB.
    Note5: The matchfinder is either 'Railgun_Trolldom' (matches longer than 18) or Leprechaun's B-tree order 3 (matches less or equal to 18).
    Note6: Instead of '_mm_loadu_si128' '_mm_lddqu_si128' is used.
    Note7: The lookahead 'Tsuyo' heuristic which looks one char ahead is applied thrice, still not strengthened, though.
    Note8: The compile made 2017-Oct-22, the decompression time measuring is done in 16x8 passes choosing the top score from 64 back-to-back runs - the goal - to enter [maximal] Turbo Mode.
    Note9: In GP/SSE4.1/AVX2 compile, the 24 matches become 3xQWORD/1xQWORD+1xXMMWORD/1xYMMWORD.
    NoteA: Maximum compression ratio is 44:1, for 704 bytes long matches within 1TB Sliding Window.
    NoteB: Please send me (at sanmayce@sanmayce.com) decompression results obtained on machines with fast CPU-RAM subsystems.
    NoteC: In this latest (2019-Jan-13) compile, clock() was replaced with time() - to counter bigtime stats misreporting.
    Current priority class is REALTIME_PRIORITY_CLASS.
    Allocating Source-Buffer 3 MB ...
    Allocating Source-Buffer 3 MB (REVERSED) ...
    Allocating Target-Buffer 35 MB ...
    Allocating Verification-Buffer 3 MB ...
    Leprechaun: Memory pool for B-tress is 1,612,048 KB.
    Leprechaun: In this revision 8MB 1-way hash is used which results in 1,048,576 internal B-Trees of order 3.
    Leprechaun: In this revision, 1 pass is to be made.
    Leprechaun: Allocating HASH memory 8,388,673 bytes ... OK
    Leprechaun: Allocating memory for B-tress 1575 MB ... OK
    Leprechaun: Size of input file: 3,714,387
    
    Leprechaun: Inserting keys/BBs of order 004 into B-trees, free RAM in B-tree pool is 1,612,048 KB ... OK
    Leprechaun: Number Of Distinct keys/BBs (all orders): 59,081
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 54,297
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 673
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 1
    Leprechaun: Used value for third parameter in KB: 1,612,048
    Leprechaun: Use next time as third parameter: 6,979
    Leprechaun: Total Attempts to Find/Put keys into B-trees order 3: 266,958
    
    Leprechaun: Inserting keys/BBs of order 006 into B-trees, free RAM in B-tree pool is 1,605,069 KB ... OK
    Leprechaun: Number Of Distinct keys/BBs (all orders): 432,623
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 307,140
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 51,165
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 2
    Leprechaun: Used value for third parameter in KB: 1,612,048
    Leprechaun: Use next time as third parameter: 45,574
    Leprechaun: Total Attempts to Find/Put keys into B-trees order 3: 1,066,896
    
    Leprechaun: Inserting keys/BBs of order 008 into B-trees, free RAM in B-tree pool is 1,566,474 KB ... OK
    Leprechaun: Number Of Distinct keys/BBs (all orders): 1,473,624
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 739,168
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 368,120
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 2
    Leprechaun: Used value for third parameter in KB: 1,612,048
    Leprechaun: Use next time as third parameter: 147,155
    Leprechaun: Total Attempts to Find/Put keys into B-trees order 3: 2,509,474
    
    Leprechaun: Inserting keys/BBs of order 010 into B-trees, free RAM in B-tree pool is 1,464,893 KB ... OK
    Leprechaun: Number Of Distinct keys/BBs (all orders): 3,304,694
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 910,611
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 1,234,656
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 3
    Leprechaun: Used value for third parameter in KB: 1,612,048
    Leprechaun: Use next time as third parameter: 325,664
    Leprechaun: Total Attempts to Find/Put keys into B-trees order 3: 5,424,374
    
    Leprechaun: Inserting keys/BBs of order 012 into B-trees, free RAM in B-tree pool is 1,286,384 KB ... OK
    Leprechaun: Number Of Distinct keys/BBs (all orders): 5,811,731
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 1,043,037
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 1,806,063
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 3
    Leprechaun: Used value for third parameter in KB: 1,612,048
    Leprechaun: Use next time as third parameter: 571,761
    Leprechaun: Total Attempts to Find/Put keys into B-trees order 3: 8,813,528
    
    Leprechaun: Inserting keys/BBs of order 014 into B-trees, free RAM in B-tree pool is 1,040,287 KB ... OK
    Leprechaun: Number Of Distinct keys/BBs (all orders): 8,795,024
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 1,047,790
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 2,339,050
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 3
    Leprechaun: Used value for third parameter in KB: 1,612,048
    Leprechaun: Use next time as third parameter: 869,314
    Leprechaun: Total Attempts to Find/Put keys into B-trees order 3: 12,690,670
    
    Leprechaun: Inserting keys/BBs of order 016 into B-trees, free RAM in B-tree pool is 742,734 KB ... OK
    Leprechaun: Number Of Distinct keys/BBs (all orders): 12,078,810
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 1,048,551
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 2,574,284
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 3
    Leprechaun: Used value for third parameter in KB: 1,612,048
    Leprechaun: Use next time as third parameter: 1,196,224
    Leprechaun: Total Attempts to Find/Put keys into B-trees order 3: 16,700,443
    
    Leprechaun: Inserting keys/BBs of order 018 into B-trees, free RAM in B-tree pool is 415,824 KB ... OK
    Leprechaun: Number Of Distinct keys/BBs (all orders): 15,542,161
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 1,048,573
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 2,700,527
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 3
    Leprechaun: Used value for third parameter in KB: 1,612,048
    Leprechaun: Use next time as third parameter: 1,539,067
    Leprechaun: Total Attempts to Find/Put keys into B-trees order 3: 20,645,236
    
    ...

    In parallel I am running a medium size (for now) TEXTORAMIC (as the Japanorama documentary, but adjectival Textorama) benchmark:
    TEXTORAMIC_Decompression_Showdown_2018-Dec-31.pdf
     

    Attached Files:

  2. Sanmayce

    Sanmayce New Member

    17
    7
    3
    Jun 9, 2018
    Sofia
    Ratings:
    +22
    Local Time:
    6:40 AM
    Glad to share (attached) a revision compilable with GCC. On Windows, after installing the portable (ready-to-go) MinGW the compile lines are:

    Code (Text):
    C:\>cd MinGW
    
    C:\MinGW>dir
     Volume in drive C has no label.
     Volume Serial Number is 7E3E-0444
    
     Directory of C:\MinGW
    
    05/03/2018  05:11 PM    <DIR>          .
    05/03/2018  05:11 PM    <DIR>          ..
    05/03/2018  05:08 PM    <DIR>          bin
    05/03/2018  05:09 PM    <DIR>          git
    05/03/2018  05:11 PM    <DIR>          include
    05/03/2018  05:11 PM    <DIR>          lib
    05/03/2018  05:11 PM    <DIR>          libexec
    03/18/2012  09:45 PM               709 open_distro_window.bat
    02/26/2018  10:34 PM            18,216 README_STL.txt
    05/03/2018  05:11 PM    <DIR>          scripts-15.4
    07/30/2017  05:05 PM               780 set_distro_paths.bat
    05/03/2018  05:11 PM    <DIR>          x86_64-w64-mingw32
    
    C:\MinGW>set_distro_paths.bat
    
    C:\MinGW>gcc -v
    Using built-in specs.
    COLLECT_GCC=gcc
    COLLECT_LTO_WRAPPER=c:/mingw/bin/../libexec/gcc/x86_64-w64-mingw32/7.3.0/lto-wrapper.exe
    Target: x86_64-w64-mingw32
    Configured with: ../src/configure --enable-languages=c,c++ --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --disable-multilib --prefix=/c/temp/gcc/dest --with-sysroot=/c/temp/gcc/dest --disable-libstdcxx-pch --disable-libstdcxx-verbose --disable-nls --disable-shared --disable-win32-registry --with-tune=haswell --enable-threads=posix --enable-libgomp
    Thread model: posix
    gcc version 7.3.0 (GCC)
    
    E:\N>dir
    
    11/14/2018  11:29 AM            55,397 lzsse2.cpp
    11/14/2018  11:29 AM             4,858 lzsse2.h
    11/14/2018  11:29 AM             2,481 lzsse2_platform.h
    01/13/2019  01:22 PM           634,462 Nakamichi_Ryuugan-ditto-1TB_btree.c
    01/14/2019  08:21 AM               797 MakeEXEs_Ryuugan-ditto-1TB_btree_GCC.bat
    
    D:\N1TB_bigtime_b-tree>MakeEXEs_Ryuugan-ditto-1TB_btree_GCC.bat
    
    D:\N1TB_bigtime_b-tree>gcc -O3 -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_RAM_GCC730.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HIGH_PRIORITY -DHashInBITS=20 -DHashChunkSizeInBITS=20 -DRAMpoolInKB=4612048
    
    D:\N1TB_bigtime_b-tree>gcc -O3 -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_SSD_GCC730.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HIGH_PRIORITY -DHashInBITS=20 -DHashChunkSizeInBITS=20 -DRAMpoolInKB=4612048 -DExternalRAM
    
    For *nix add:
    -D_POSIX_ENVIRONMENT_


    Above two lines generate an 'SSD' and 'RAM' executables, below, the 'SSD' is under inspection.

    Wanted, as a side-effect i.e. by-product, to have a real-world SSD random read/write of small packets benchmarker...
    Here comes such torturer of 130 bytes long packets, reading-n-writing randomly, it reports real IOPS since it counts the real invocations of each 'fread()' and 'fwrite()' function.

    The testmachine is my laptop with i5-7200u, the SSD is 240GB Kingston A400 (cheapest of all) and connected via USB 2.0, Windows caching has gone out of ... the window:

    Edit 2019-Jan-17:
    Grmbl! I'm sorry for the wrong statistics posted few days ago, had to replace them.
    The culprit was my forgetfulness, did forget that when final traversal was done the pointers were zeroed, it was on purpose back then, nothing wrong actually.

    The first example, the testdataset is '1001 nights' ~15MB long file:
    The "score" is 163,328 IOPS for 402,619,834 'freads()' and 268,170,926 'fwrites()' (of packets 50..106 bytes long) during loading traversing all orders. Roughly, 1.5:1 read vs write operations. For realistic results, the filesize should be > 512MB.

    Code (Text):
    C:\Windows\system32>cd\
    
    C:\>cd MinGW
    
    C:\MinGW>set_distro_paths.bat
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Jan-20>MakeEXEs_Ryuugan-ditto-1TB_btree_GCC.bat
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Jan-20>gcc -O3 -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_RAM_GCC730.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HI
    GH_PRIORITY -DHashInBITS=20 -DHashChunkSizeInBITS=20 -DRAMpoolInKB=6612048
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Jan-20>gcc -O3 -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_SSD_GCC730.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HI
    GH_PRIORITY -DHashInBITS=20 -DHashChunkSizeInBITS=20 -DRAMpoolInKB=6612048 -DExternalRAM
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Jan-20>Nakamichi_Ryuugan-ditto-1TB_RAM_GCC730.exe Arabian_Nights_complete.html
    Nakamichi 'Ryuugan-ditto-1TB', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
    Note0: Nakamichi 'Dragoneye' is 100% FREE, licenseless that is.
    Note1: Hamid Buzidi's LzTurbo ([a] FASTEST [Textual] Decompressor, Levels 19/29/39) retains kingship, his TurboBench (2017-Apr-07) proves the supremacy of LzTurbo, Turbo-Amazing!
    Note2: Conor Stokes' LZSSE2 ([a] FASTEST Textual Decompressor, Level 17) is embedded, all credits along with many thanks go to him.
    Note3: 'Ryuugan' predecessors are Washigan, Okamigan, Zato, Tsubame, Tengu-Tsuyo, Tengu, Rakka, Kokuen, Kinroba, Yoko, Kinutora, Jiten, Butsuhira, Suiken, Keigan, Kumataka, Washi, Aratama, Hitomi, Nekomata, Kitsune, Kinezumi, Sanbashi, Kaiko, Inazuma, Zangetsu, Hanabi, Hanazakari, Sanshi and Kaidanji.
    Note4: This variant is the SLOWEST compressor under the Sun! This compile can handle files up to 5120MB.
    Note5: The matchfinder is either 'Railgun_Trolldom' (matches longer than 18) or Leprechaun's B-tree order 3 (matches less or equal to 18).
    Note6: Instead of '_mm_loadu_si128' '_mm_lddqu_si128' is used.
    Note7: The lookahead 'Tsuyo' heuristic which looks one char ahead is applied thrice, still not strengthened, though.
    Note8: The compile made 2017-Oct-22, the decompression time measuring is done in 16x8 passes choosing the top score from 64 back-to-back runs - the goal - to enter [maximal] Turbo Mode.
    Note9: In GP/SSE4.1/AVX2 compile, the 24 matches become 3xQWORD/1xQWORD+1xXMMWORD/1xYMMWORD.
    NoteA: Maximum compression ratio is 44:1, for 704 bytes long matches within 1TB Sliding Window.
    NoteB: Please send me (at sanmayce@sanmayce.com) decompression results obtained on machines with fast CPU-RAM subsystems.
    NoteC: In this latest (2019-Jan-20) compile, clock() was replaced with time() - to counter bigtime stats misreporting.
    NoteD: Multi-way hashing allows each KeySize to occupy its own HASH pool, thus less RAM is in use - the LEAF is smaller.
    Current priority class is REALTIME_PRIORITY_CLASS.
    Allocating Source-Buffer 14 MB ...
    Allocating Source-Buffer 14 MB (REVERSED) ...
    Allocating Target-Buffer 46 MB ...
    Allocating Verification-Buffer 14 MB ...
    Leprechaun: Memory pool for B-tress is 6,612,048 KB.
    Leprechaun: In this revision 8MB 8-way hash is used which results in 8 x 1,048,576 internal B-Trees of order 3.
    Leprechaun: In this revision, 1 pass is to be made.
    Leprechaun: Allocating HASH memory 67,108,929 bytes ... OK
    Leprechaun: Allocating memory for B-tress 6458 MB ... OK
    Leprechaun: Size of input file: 15,583,440
    
    Leprechaun: Inserting keys/BBs of order 004 into B-trees, free RAM in B-tree pool is 6,612,048 KB ...
    Leprechaun: Inserting keys/BBs of order 006 into B-trees, free RAM in B-tree pool is 6,604,089 KB ...
    Leprechaun: Inserting keys/BBs of order 008 into B-trees, free RAM in B-tree pool is 6,550,882 KB ...
    Leprechaun: Inserting keys/BBs of order 010 into B-trees, free RAM in B-tree pool is 6,388,585 KB ...
    Leprechaun: Inserting keys/BBs of order 012 into B-trees, free RAM in B-tree pool is 6,055,645 KB ...
    Leprechaun: Inserting keys/BBs of order 014 into B-trees, free RAM in B-tree pool is 5,532,175 KB ...
    Leprechaun: Inserting keys/BBs of order 016 into B-trees, free RAM in B-tree pool is 4,825,455 KB ...
    Leprechaun: Inserting keys/BBs of order 018 into B-trees, free RAM in B-tree pool is 3,954,416 KB ...
    
    Leprechaun: Number Of Total/Distinct/Undistinct keys/BBs (all orders): 124,667,440/55,177,480/69,489,960
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 6,768,818
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 34,942,972
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 4
    Leprechaun: Used value for B-trees pool in KB: 6,612,048
    Leprechaun: Use next time for B-trees pool in KB: 3,672,584
    Leprechaun: Total keys (all orders) into B-trees order 3 (during traversal/dump): 55,177,480
    Leprechaun: Total keys MISSES/HITS (all orders) into B-trees order 3: 55,177,480/69,489,960
    
    Leprechaun: B-trees building speed (counting ROOT LEAFs): 1,069,533 LEAFs/s
    Leprechaun: B-trees traverse speed (counting ROOT LEAFs): 5,958,827 LEAFs/s
    Leprechaun: Total Searches-n-Inserts Per Second: 2,770,387 SNIPS
    Leprechaun: RAM needed to house B-trees (relative to the file being ripped): 241N
    
    Compressing 15,583,440 bytes ...
    ^C
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Jan-20>Nakamichi_Ryuugan-ditto-1TB_ssd_GCC730.exe Arabian_Nights_complete.html
    Nakamichi 'Ryuugan-ditto-1TB', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
    Note0: Nakamichi 'Dragoneye' is 100% FREE, licenseless that is.
    Note1: Hamid Buzidi's LzTurbo ([a] FASTEST [Textual] Decompressor, Levels 19/29/39) retains kingship, his TurboBench (2017-Apr-07) proves the supremacy of LzTurbo, Turbo-Amazing!
    Note2: Conor Stokes' LZSSE2 ([a] FASTEST Textual Decompressor, Level 17) is embedded, all credits along with many thanks go to him.
    Note3: 'Ryuugan' predecessors are Washigan, Okamigan, Zato, Tsubame, Tengu-Tsuyo, Tengu, Rakka, Kokuen, Kinroba, Yoko, Kinutora, Jiten, Butsuhira, Suiken, Keigan, Kumataka, Washi, Aratama, Hitomi, Nekomata, Kitsune, Kinezumi, Sanbashi, Kaiko, Inazuma, Zangetsu, Hanabi, Hanazakari, Sanshi and Kaidanji.
    Note4: This variant is the SLOWEST compressor under the Sun! This compile can handle files up to 5120MB.
    Note5: The matchfinder is either 'Railgun_Trolldom' (matches longer than 18) or Leprechaun's B-tree order 3 (matches less or equal to 18).
    Note6: Instead of '_mm_loadu_si128' '_mm_lddqu_si128' is used.
    Note7: The lookahead 'Tsuyo' heuristic which looks one char ahead is applied thrice, still not strengthened, though.
    Note8: The compile made 2017-Oct-22, the decompression time measuring is done in 16x8 passes choosing the top score from 64 back-to-back runs - the goal - to enter [maximal] Turbo Mode.
    Note9: In GP/SSE4.1/AVX2 compile, the 24 matches become 3xQWORD/1xQWORD+1xXMMWORD/1xYMMWORD.
    NoteA: Maximum compression ratio is 44:1, for 704 bytes long matches within 1TB Sliding Window.
    NoteB: Please send me (at sanmayce@sanmayce.com) decompression results obtained on machines with fast CPU-RAM subsystems.
    NoteC: In this latest (2019-Jan-20) compile, clock() was replaced with time() - to counter bigtime stats misreporting.
    NoteD: Multi-way hashing allows each KeySize to occupy its own HASH pool, thus less RAM is in use - the LEAF is smaller.
    Current priority class is REALTIME_PRIORITY_CLASS.
    Allocating Source-Buffer 14 MB ...
    Allocating Source-Buffer 14 MB (REVERSED) ...
    Allocating Target-Buffer 46 MB ...
    Allocating Verification-Buffer 14 MB ...
    Leprechaun: Memory pool for B-tress is 6,612,048 KB.
    Leprechaun: In this revision 8MB 8-way hash is used which results in 8 x 1,048,576 external B-Trees of order 3.
    Leprechaun: In this revision, 1 pass is to be made.
    Leprechaun: Allocating HASH memory 67,108,929 bytes ... OK
    Leprechaun: Allocating/ZEROing 6,770,737,166 bytes swap file ... OK
    Leprechaun: Size of input file: 15,583,440
    
    Leprechaun: Inserting keys/BBs of order 004 into B-trees, free RAM in B-tree pool is 6,612,048 KB ...
    Leprechaun: Inserting keys/BBs of order 006 into B-trees, free RAM in B-tree pool is 6,604,089 KB ...
    Leprechaun: Inserting keys/BBs of order 008 into B-trees, free RAM in B-tree pool is 6,550,882 KB ...
    Leprechaun: Inserting keys/BBs of order 010 into B-trees, free RAM in B-tree pool is 6,388,585 KB ...
    Leprechaun: Inserting keys/BBs of order 012 into B-trees, free RAM in B-tree pool is 6,055,645 KB ...
    Leprechaun: Inserting keys/BBs of order 014 into B-trees, free RAM in B-tree pool is 5,532,175 KB ...
    Leprechaun: Inserting keys/BBs of order 016 into B-trees, free RAM in B-tree pool is 4,825,455 KB ...
    Leprechaun: Inserting keys/BBs of order 018 into B-trees, free RAM in B-tree pool is 3,954,416 KB ...
    
    Leprechaun: Number Of Total/Distinct/Undistinct keys/BBs (all orders): 124,667,440/55,177,480/69,489,960
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 6,768,818
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 34,942,972
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 4
    Leprechaun: Used value for B-trees pool in KB: 6,612,048
    Leprechaun: Use next time for B-trees pool in KB: 3,672,584
    Leprechaun: Total keys (all orders) into B-trees order 3 (during traversal/dump): 55,177,480
    Leprechaun: Total keys MISSES/HITS (all orders) into B-trees order 3: 55,177,480/69,489,960
    
    Leprechaun: B-trees building speed (counting ROOT LEAFs): 13,322 LEAFs/s
    Leprechaun: B-trees traverse speed (counting ROOT LEAFs): 42,693 LEAFs/s
    Leprechaun: Total Searches-n-Inserts Per Second: 30,354 SNIPS
    Leprechaun: RAM needed to house B-trees (relative to the file being ripped): 241N
    Leprechaun: Total IOPS for 402,619,834 'freads' and 268,170,926 'fwrites' (of packets 106 bytes long) during loading traversing all orders: 163,328 IOPS
    
    Compressing 15,583,440 bytes ...
    ^C


    Had I have $999.99 I would buy the ideal choice for such virtual RAM loads - Samsung SSD 860 PRO 2.5" SATA III 4TB:
    Random Read (4KB, QD1): Up to 11,000 IOPS Random Read
    Random Write (4KB, QD1): Up to 43,000 IOPS Random Write
    [​IMG]
    Currently, I have the smallest model - Samsung 256GB 860 PRO, all the benchmarking done on this model will be more informative since it has the smallest cache - thus when the tested files are big enough they will torture the uncached regions.

    CACHE MEMORY
    512 MB Low Power DDR4 (256 GB, 512 GB)
    1 GB Low Power DDR4 (1,024 GB)
    2 GB Low Power DDR4 (2,048 GB)
    4 GB Low Power DDR4 (4,096 GB)

    What an atrocity, nowadays machines still go with 4GB Main RAM, while the drives already have 4GB?

    My eyes are locked onto future machines, long ago I decided to write C etudes disregarding the current conjuncture/nomenclature and utilize RAM and SSD space as if they are in spades! I hate [seeing] crippling some etude for sake of fitting in the status quo, think for a moment, what happens when next decade brings 1TB DIMMs or/and 16TB SSDs as mainstream! What will all those "blind" for the available resources tools/implementations do? Suck, as I see it.

    Too often, frightened people ask me why the scary evil elf is my logo/talisman, first off, every decent ripper doesn't sugarcoat - it rips brutally, hence the ... adequate choice of mine, moreover Leprechaun is a ruler of the forest/trees, he is a master-rhymer, a trickster, and mostly a brutal master of wizardcraft.

    Incoming_btree_ugly2.png
    Besides, the idiom "as hell" (slang, to the maximum degree) is unknown to them, usually.
    I find the naming convention for foreground/background programs in *nix very on point - the elves/daemons carry the spirit of magicalness.

    Incoming_btree_ugly3.png

    Ripping fast as hell, that's what I envision.

    Or as Tarja sings "The elf-folk is calling me":
    "The Elvenpath
    It's the honesty of these worlds
    Ruled by magic and mighty swords
    That makes my soul long for the past"

    Edit 2019-Jan-20:
    Few changes and a bugfix (a missed check for allocating External RAM) were done:

    The second example, the testdataset is 'Complete Dickens' ~41MB long file:
    The "score" is 67,632 IOPS for 1,320,112,595 'freads()' and 735,431,907 'fwrites()' (of packets 50..106 bytes long) during loading traversing all orders. Roughly, 1.8:1 read vs write operations. For realistic results, the filesize should be > 512MB.

    Code (Text):
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Jan-20>MakeEXEs_Ryuugan-ditto-1TB_btree_GCC.bat
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Jan-20>gcc -O3 -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_RAM_GCC730.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HI
    GH_PRIORITY -DHashInBITS=20 -DHashChunkSizeInBITS=20 -DRAMpoolInKB=13612048
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Jan-20>gcc -O3 -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_SSD_GCC730.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HI
    GH_PRIORITY -DHashInBITS=20 -DHashChunkSizeInBITS=20 -DRAMpoolInKB=13612048 -DExternalRAM
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Jan-20>Nakamichi_Ryuugan-ditto-1TB_ssd_GCC730.exe Complete_Works_of_Charles_Dickens.txt
    Nakamichi 'Ryuugan-ditto-1TB', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
    Note0: Nakamichi 'Dragoneye' is 100% FREE, licenseless that is.
    Note1: Hamid Buzidi's LzTurbo ([a] FASTEST [Textual] Decompressor, Levels 19/29/39) retains kingship, his TurboBench (2017-Apr-07) proves the supremacy of LzTurbo, Turbo-Amazing!
    Note2: Conor Stokes' LZSSE2 ([a] FASTEST Textual Decompressor, Level 17) is embedded, all credits along with many thanks go to him.
    Note3: 'Ryuugan' predecessors are Washigan, Okamigan, Zato, Tsubame, Tengu-Tsuyo, Tengu, Rakka, Kokuen, Kinroba, Yoko, Kinutora, Jiten, Butsuhira, Suiken, Keigan, Kumataka, Washi, Aratama, Hitomi, Nekomata, Kitsune, Kinezumi, Sanbashi, Kaiko, Inazuma, Zangetsu, Hanabi, Hanazakari, Sanshi and Kaidanji.
    Note4: This variant is the SLOWEST compressor under the Sun! This compile can handle files up to 5120MB.
    Note5: The matchfinder is either 'Railgun_Trolldom' (matches longer than 18) or Leprechaun's B-tree order 3 (matches less or equal to 18).
    Note6: Instead of '_mm_loadu_si128' '_mm_lddqu_si128' is used.
    Note7: The lookahead 'Tsuyo' heuristic which looks one char ahead is applied thrice, still not strengthened, though.
    Note8: The compile made 2017-Oct-22, the decompression time measuring is done in 16x8 passes choosing the top score from 64 back-to-back runs - the goal - to enter [maximal] Turbo Mode.
    Note9: In GP/SSE4.1/AVX2 compile, the 24 matches become 3xQWORD/1xQWORD+1xXMMWORD/1xYMMWORD.
    NoteA: Maximum compression ratio is 44:1, for 704 bytes long matches within 1TB Sliding Window.
    NoteB: Please send me (at sanmayce@sanmayce.com) decompression results obtained on machines with fast CPU-RAM subsystems.
    NoteC: In this latest (2019-Jan-20) compile, clock() was replaced with time() - to counter bigtime stats misreporting.
    NoteD: Multi-way hashing allows each KeySize to occupy its own HASH pool, thus less RAM is in use - the LEAF is smaller.
    Current priority class is REALTIME_PRIORITY_CLASS.
    Allocating Source-Buffer 40 MB ...
    Allocating Source-Buffer 40 MB (REVERSED) ...
    Allocating Target-Buffer 72 MB ...
    Allocating Verification-Buffer 40 MB ...
    Leprechaun: Memory pool for B-tress is 13,612,048 KB.
    Leprechaun: In this revision 8MB 8-way hash is used which results in 8 x 1,048,576 external B-Trees of order 3.
    Leprechaun: In this revision, 1 pass is to be made.
    Leprechaun: Allocating HASH memory 67,108,929 bytes ... OK
    Leprechaun: Allocating/ZEROing 13,938,737,166 bytes swap file ... OK
    Leprechaun: Size of input file: 42,935,960
    
    Leprechaun: Inserting keys/BBs of order 004 into B-trees, free RAM in B-tree pool is 13,612,048 KB ...
    Leprechaun: Inserting keys/BBs of order 006 into B-trees, free RAM in B-tree pool is 13,605,572 KB ...
    Leprechaun: Inserting keys/BBs of order 008 into B-trees, free RAM in B-tree pool is 13,546,308 KB ...
    Leprechaun: Inserting keys/BBs of order 010 into B-trees, free RAM in B-tree pool is 13,285,138 KB ...
    Leprechaun: Inserting keys/BBs of order 012 into B-trees, free RAM in B-tree pool is 12,604,428 KB ...
    Leprechaun: Inserting keys/BBs of order 014 into B-trees, free RAM in B-tree pool is 11,354,612 KB ...
    Leprechaun: Inserting keys/BBs of order 016 into B-trees, free RAM in B-tree pool is 9,485,965 KB ...
    Leprechaun: Inserting keys/BBs of order 018 into B-trees, free RAM in B-tree pool is 7,061,527 KB ...
    
    Leprechaun: Number Of Total/Distinct/Undistinct keys/BBs (all orders): 343,487,600/139,455,560/204,032,040
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 6,869,619
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 97,898,274
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 5
    Leprechaun: Used value for B-trees pool in KB: 13,612,048
    Leprechaun: Use next time for B-trees pool in KB: 9,440,253
    Leprechaun: Total keys (all orders) into B-trees order 3 (during traversal/dump): 139,455,560
    Leprechaun: Total keys MISSES/HITS (all orders) into B-trees order 3: 139,455,560/204,032,040
    
    Leprechaun: B-trees building speed (counting ROOT LEAFs): 5,116 LEAFs/s
    Leprechaun: B-trees traverse speed (counting ROOT LEAFs): 10,564 LEAFs/s
    Leprechaun: Total Searches-n-Inserts Per Second: 11,301 SNIPS
    Leprechaun: RAM needed to house B-trees (relative to the file being ripped): 225N
    Leprechaun: Total IOPS for 1,320,112,595 'freads' and 735,431,907 'fwrites' (of packets 106 bytes long) during loading traversing all orders: 67,632 IOPS
    
    Compressing 42,935,960 bytes ...
    ^C


    Looking at the stats above, maximum 5 random seeks/reads are needed to find a key into 139,455,560 keys. In this revision the LEAF size was reduced, more economically the RAM is used. Having IOPS is a must, but the more telling stats are SNIPS:
    Total Searches-n-Inserts Per Second: 11,301 SNIPS
    It says how many actual FINDINGS you can do.
     

    Attached Files:

    Last edited: Jan 21, 2019
  3. eva2000

    eva2000 Administrator Staff Member

    55,458
    12,257
    113
    May 24, 2014
    Brisbane, Australia
    Ratings:
    +18,841
    Local Time:
    2:40 PM
    Nginx 1.27.x
    MariaDB 10.x/11.4+
    thanks for sharing ?? do you have you project up on Github.com that we can follow ?
     
  4. Sanmayce

    Sanmayce New Member

    17
    7
    3
    Jun 9, 2018
    Sofia
    Ratings:
    +22
    Local Time:
    6:40 AM
    >thanks for sharing ?? do you have you project up on Github.com that we can follow ?

    My pleasure, sharing is better than receiving - an old truth is that.

    If you don't mind this thread will be the initial/development/benchmark page for Nakamichi featuring a B-tree matchfinder.
    Since I don't run a *nix system, it would be nice Nakamichi to have an alternative home on an appreciative *nix site.
    My development environment is Windows XP paired with Intel C compiler v15. I'm kinda disappointed by unappreciativeness shown left and right, most "programmers" don't let the numbers speak, optimizing basic etudes is ... underneath their gravity. Have had my bitter experience with some old "all knowing" PROs on a bigsite, they several times confronted and refused to make visible my simple etudes, the last one was my 256bit Assembly branchless LCSS - it failed to fit their bills. I said "So be it."

    As for GitHub, I used it only to share not to promote.
    To me, GitHub and such are for PROs, I will die an amateur, nothing can change that.

    Even now, before even mixing the B-trees into Compress() function, the C source from the previous post serves well as a benchmarker. That's what I love about practical etudes, they allow answering many questions, such as:
    Q: - How many times an SSD performance is lower than RAM?
    A: - Surprisingly, only 2,833,350 SNIPS / 38,101 SNIPS = 74x, for 'Arabian_Nights_complete.html' 15,583,440 bytes long.

    announce3.png
    The '1001 Nights' testdataset is downloadable at:
    The Book of The Thousand Nights and a Night


    For all who want to have a hardcopy of the source, this is mine, the latest DRAFT:
    Nakamichi_Ryuugan-ditto-1TB_btree_source.pdf
     
    Last edited: Jan 18, 2019
  5. Sanmayce

    Sanmayce New Member

    17
    7
    3
    Jun 9, 2018
    Sofia
    Ratings:
    +22
    Local Time:
    6:40 AM
    Gladly I'm sharing the first fully operational Nakamichi powered by my Leprechaunish B-tree matchfinder.

    Code (Text):
    C:\>cd MinGW
    
    C:\MinGW>set_distro_paths.bat
    
    C:\MinGW>d:
    
    D:\>cd Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Feb-02_
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Feb-02_>MakeEXEs_Ryuugan-ditto-1TB_btree_GCC.bat
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Feb-02_>gcc -O3 -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_RAM_GCC730.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HIGH_PRIORITY -DHashInBITS=24 -DHashChunkSizeInBITS=24  -DRAMpoolInKB=5612048 -DBtreeHEURISTIC
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Feb-02_>gcc -O3 -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_SSD_GCC730.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HIGH_PRIORITY -DHashInBITS=24 -DHashChunkSizeInBITS=24  -DRAMpoolInKB=5612048 -DExternalRAM -DBtreeHEURISTIC
    
    For *nix add:
    -D_POSIX_ENVIRONMENT_
    
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Feb-02_>Nakamichi_Ryuugan-ditto-1TB_RAM_GCC730.exe Arabian_Nights_complete.html
    Nakamichi 'Ryuugan-ditto-1TB', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
    Note0: Nakamichi 'Dragoneye' is 100% FREE, licenseless that is.
    Note1: Hamid Buzidi's LzTurbo ([a] FASTEST [Textual] Decompressor, Levels 19/29/39) retains kingship, his TurboBench (2017-Apr-07) proves the supremacy of LzTurbo, Turbo-Amazing!
    Note2: Conor Stokes' LZSSE2 ([a] FASTEST Textual Decompressor, Level 17) is embedded, all credits along with many thanks go to him.
    Note3: 'Ryuugan' predecessors are Washigan, Okamigan, Zato, Tsubame, Tengu-Tsuyo, Tengu, Rakka, Kokuen, Kinroba, Yoko, Kinutora, Jiten, Butsuhira, Suiken, Keigan, Kumataka, Washi, Aratama, Hitomi, Nekomata, Kitsune, Kinezumi, Sanbashi, Kaiko, Inazuma, Zangetsu, Hanabi, Hanazakari, Sanshi and Kaidanji.
    Note4: This variant is the SLOWEST compressor under the Sun! This compile can handle files up to 5120MB.
    Note5: The matchfinder is either 'Railgun_Trolldom' (matches longer than 18) or Leprechaun's B-tree order 3 (matches less or equal to 18).
    Note6: Instead of '_mm_loadu_si128' '_mm_lddqu_si128' is used.
    Note7: The lookahead 'Tsuyo' heuristic which looks one char ahead is applied thrice, still not strengthened, though.
    Note8: The compile made 2017-Oct-22, the decompression time measuring is done in 16x8 passes choosing the top score from 64 back-to-back runs - the goal - to enter [maximal] Turbo Mode.
    Note9: In GP/SSE4.1/AVX2 compile, the 24 matches become 3xQWORD/1xQWORD+1xXMMWORD/1xYMMWORD.
    NoteA: Maximum compression ratio is 44:1, for 704 bytes long matches within 1TB Sliding Window.
    NoteB: Please send me (at sanmayce@sanmayce.com) decompression results obtained on machines with fast CPU-RAM subsystems.
    NoteC: In this compile, clock() was replaced with time() - to counter bigtime stats misreporting.
    NoteD: Multi-way hashing allows each KeySize to occupy its own HASH pool, thus less RAM is in use - the LEAF is smaller.
    NoteE: In this revision, B-tree heuristics are in use, allowing skipping many unnecessary memmem() invocations.
    NoteF: In this latest (2019-Feb-02) compile, B-tree is the primary matchfinder, memmem() is used for 18+ matches, and for marginal cases.
    NoteG: The file being compressed should be 18 bytes or longer due to Building-Blocks being in range 4..18.
    Current priority class is HIGH_PRIORITY_CLASS.
    Allocating Source-Buffer 14 MB ...
    Allocating Source-Buffer 14 MB (REVERSED) ...
    Allocating Target-Buffer 46 MB ...
    Allocating Verification-Buffer 14 MB ...
    Leprechaun: Memory pool for B-tress is 5,612,048 KB.
    Leprechaun: In this revision 128MB 8-way hash is used which results in 8 x 16,777,216 internal B-Trees of order 3.
    Leprechaun: In this revision, 1 pass is to be made.
    Leprechaun: Allocating HASH memory 1,073,741,889 bytes ... OK
    Leprechaun: Allocating memory for B-tress 5481 MB ... OK
    Leprechaun: Size of input file: 15,583,440
    
    Leprechaun: Inserting keys/BBs of order 004 into B-trees, free RAM in B-tree pool is 5,612,048 KB ...
    Leprechaun: Inserting keys/BBs of order 006 into B-trees, free RAM in B-tree pool is 5,602,086 KB ...
    Leprechaun: Inserting keys/BBs of order 008 into B-trees, free RAM in B-tree pool is 5,533,534 KB ...
    Leprechaun: Inserting keys/BBs of order 010 into B-trees, free RAM in B-tree pool is 5,328,114 KB ...
    Leprechaun: Inserting keys/BBs of order 012 into B-trees, free RAM in B-tree pool is 4,918,554 KB ...
    Leprechaun: Inserting keys/BBs of order 014 into B-trees, free RAM in B-tree pool is 4,288,798 KB ...
    Leprechaun: Inserting keys/BBs of order 016 into B-trees, free RAM in B-tree pool is 3,451,465 KB ...
    Leprechaun: Inserting keys/BBs of order 018 into B-trees, free RAM in B-tree pool is 2,435,992 KB ...
    Leprechaun: Total Searches-n-Inserts Per Second: 4,617,312 SNIPS
    Leprechaun: RAM needed to house B-trees (relative to the file being ripped): 285N
    
    Compressing 15,583,440 bytes ...
    /; Each rotation means 64KB are encoded; Speed: 0,011,577 B/s; Done 100%; Compression Ratio: 3.09:1; Matches(24/32/48): 163,277/39,583/2,722; 128[+] long matches: 959; ETA: 0.00 days
    NumberOfFullLiterals (lower-the-better): 749
    Tsuyo_HEURISTIC_APPLIED_thrice_back-to-back: 0
    NumberOf(Tiny)Matches[Micro]Window (4)[16B]: 5865
    NumberOfMatches[Bheema]Window [128GB window]: 1058
    RAM-to-RAM performance: 11577 B/s.
    Compressed to 5,039,053 bytes.
    Source-file-Hash(FNV1A_YoshimitsuTRIAD) = 0x065e,e94a
    Target-file-Hash(FNV1A_YoshimitsuTRIAD) = 0xb651,2c30
    Decompressing 5,039,053 (being the compressed stream) bytes ...
    RAM-to-RAM performance: 681 MB/s.
    Verification (input and output sizes match) OK.
    Verification (input and output blocks match) OK.
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Feb-02_>
    


    Right now waiting for SSD compile to finish, wanna report the SSD vs RAM actual speeds...

    Code (Text):
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Feb-02_>Nakamichi_Ryuugan-ditto-1TB_ssd_GCC730.exe Arabian_Nights_complete.html
    Nakamichi 'Ryuugan-ditto-1TB', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
    Note0: Nakamichi 'Dragoneye' is 100% FREE, licenseless that is.
    Note1: Hamid Buzidi's LzTurbo ([a] FASTEST [Textual] Decompressor, Levels 19/29/39) retains kingship, his TurboBench (2017-Apr-07) proves the supremacy of LzTurbo, Turbo-Amazing!
    Note2: Conor Stokes' LZSSE2 ([a] FASTEST Textual Decompressor, Level 17) is embedded, all credits along with many thanks go to him.
    Note3: 'Ryuugan' predecessors are Washigan, Okamigan, Zato, Tsubame, Tengu-Tsuyo, Tengu, Rakka, Kokuen, Kinroba, Yoko, Kinutora, Jiten, Butsuhira, Suiken, Keigan, Kumataka, Washi, Aratama, Hitomi, Nekomata, Kitsune, Kinezumi, Sanbashi, Kaiko, Inazuma, Zangetsu, Hanabi, Hanazakari, Sanshi and Kaidanji.
    Note4: This variant is the SLOWEST compressor under the Sun! This compile can handle files up to 5120MB.
    Note5: The matchfinder is either 'Railgun_Trolldom' (matches longer than 18) or Leprechaun's B-tree order 3 (matches less or equal to 18).
    Note6: Instead of '_mm_loadu_si128' '_mm_lddqu_si128' is used.
    Note7: The lookahead 'Tsuyo' heuristic which looks one char ahead is applied thrice, still not strengthened, though.
    Note8: The compile made 2017-Oct-22, the decompression time measuring is done in 16x8 passes choosing the top score from 64 back-to-back runs - the goal - to enter [maximal] Turbo Mode.
    Note9: In GP/SSE4.1/AVX2 compile, the 24 matches become 3xQWORD/1xQWORD+1xXMMWORD/1xYMMWORD.
    NoteA: Maximum compression ratio is 44:1, for 704 bytes long matches within 1TB Sliding Window.
    NoteB: Please send me (at sanmayce@sanmayce.com) decompression results obtained on machines with fast CPU-RAM subsystems.
    NoteC: In this compile, clock() was replaced with time() - to counter bigtime stats misreporting.
    NoteD: Multi-way hashing allows each KeySize to occupy its own HASH pool, thus less RAM is in use - the LEAF is smaller.
    NoteE: In this revision, B-tree heuristics are in use, allowing skipping many unnecessary memmem() invocations.
    NoteF: In this latest (2019-Feb-02) compile, B-tree is the primary matchfinder, memmem() is used for 18+ matches, and for marginal cases.
    NoteG: The file being compressed should be 18 bytes or longer due to Building-Blocks being in range 4..18.
    Current priority class is HIGH_PRIORITY_CLASS.
    Allocating Source-Buffer 14 MB ...
    Allocating Source-Buffer 14 MB (REVERSED) ...
    Allocating Target-Buffer 46 MB ...
    Allocating Verification-Buffer 14 MB ...
    Leprechaun: Memory pool for B-tress is 5,612,048 KB.
    Leprechaun: In this revision 128MB 8-way hash is used which results in 8 x 16,777,216 external B-Trees of order 3.
    Leprechaun: In this revision, 1 pass is to be made.
    Leprechaun: Allocating HASH memory 1,073,741,889 bytes ... OK
    Leprechaun: Allocating/ZEROing 5,746,737,166 bytes swap file ... OK
    Leprechaun: Size of input file: 15,583,440
    
    Leprechaun: Inserting keys/BBs of order 004 into B-trees, free RAM in B-tree pool is 5,612,048 KB ...
    Leprechaun: Inserting keys/BBs of order 006 into B-trees, free RAM in B-tree pool is 5,602,086 KB ...
    Leprechaun: Inserting keys/BBs of order 008 into B-trees, free RAM in B-tree pool is 5,533,534 KB ...
    Leprechaun: Inserting keys/BBs of order 010 into B-trees, free RAM in B-tree pool is 5,328,114 KB ...
    Leprechaun: Inserting keys/BBs of order 012 into B-trees, free RAM in B-tree pool is 4,918,554 KB ...
    Leprechaun: Inserting keys/BBs of order 014 into B-trees, free RAM in B-tree pool is 4,288,798 KB ...
    Leprechaun: Inserting keys/BBs of order 016 into B-trees, free RAM in B-tree pool is 3,451,465 KB ...
    Leprechaun: Inserting keys/BBs of order 018 into B-trees, free RAM in B-tree pool is 2,435,992 KB ...
    Leprechaun: Total Searches-n-Inserts Per Second: 83,222 SNIPS
    Leprechaun: RAM needed to house B-trees (relative to the file being ripped): 285N
    Leprechaun: Total IOPS for 109,387,689 'freads' and 140,066,348 'fwrites' (of packets 114 bytes long) during loading traversing all orders: 166,524 IOPS
    
    Compressing 15,583,440 bytes ...
    /; Each rotation means 64KB are encoded; Speed: 0,001,764 B/s; Done 100%; Compression Ratio: 3.09:1; Matches(24/32/48): 163,277/39,583/2,722; 128[+] long matches: 959; ETA: 0.00 days
    NumberOfFullLiterals (lower-the-better): 749
    Tsuyo_HEURISTIC_APPLIED_thrice_back-to-back: 0
    NumberOf(Tiny)Matches[Micro]Window (4)[16B]: 5865
    NumberOfMatches[Bheema]Window [128GB window]: 1058
    RAM-to-RAM performance: 1764 B/s.
    Compressed to 5,039,053 bytes.
    Source-file-Hash(FNV1A_YoshimitsuTRIAD) = 0x065e,e94a
    Target-file-Hash(FNV1A_YoshimitsuTRIAD) = 0xb651,2c30
    Decompressing 5,039,053 (being the compressed stream) bytes ...
    RAM-to-RAM performance: 681 MB/s.
    Verification (input and output sizes match) OK.
    Verification (input and output blocks match) OK.
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Feb-02_>


    Okay, the RAM (internal B-trees) vs SSD (external B-trees) speed ratios is 11,577 B/s : 1,764 B/s or 6.5:1, discrepancy is so minimal, especially with USB connected (not in super-speed) and write caching disabled cheapest 240GB model.

    Love people like Sir Burton, his immortal work stands, that is why these 16 volumes are one of my primary testdatasets:

    [​IMG]

    The full package is on my Internet Drive:
    Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Feb-02.7z

    It simply opens the gate to teraheavy matchfinding :p
     

    Attached Files:

    Last edited: Feb 3, 2019
  6. Sanmayce

    Sanmayce New Member

    17
    7
    3
    Jun 9, 2018
    Sofia
    Ratings:
    +22
    Local Time:
    6:40 AM
    Some heavy trashing of my Samsung 860 PRO 256GB SSD commenced...
    In the PDF excerpt below 600+ million B-trees are rooted in the hash table (each slot is a root), if the machine has 85GB of system RAM then all the B-trees will be in the internal/physical not in the external/SSD RAM.

    TFD_booklet_half.png

    For better resolution of above image try this:
    www.sanmayce.com/Nakamichi/TFD_booklet.png

    To reiterate, Nakamichi is written in such a way that it is capable of utilizing internal&external RAM ... greedily.
    If the machine has 128GB of RAM then the 10 keysizes/matchlengths can be put into their own segments of the HASHPOT, e.g. if the hashsize=30 then the hash pool for a single keysize/matchlength is (8bytes x 2^30) = 8GB, for the 10 keysizes the total hash block is 80GB, it equals possible 10x1=10 billion B-trees - this means matches within 1GB long file can be found in maximum 2 attempts - the MAX height of the B-trees!

    As a side-effect I will torture (i.e. ripping apart) SSDs to their death, thus be instrumental in assessing the endurance of PRO SSDs, hopefully Samsung and heavy-duty users will thank me. I took a snapshot of the Samsung's status as reported by Sentinel before all the ripping and today:

    BEFORE_ALL_RUNS.png

    As you can see, a fortnight later the health went from 100% down to 96% :(

    BEFORE_ALL_RUNS_2019-feb-24.png

    TEXTORAMIC_Decompression_Showdown_2019-Feb-21.pdf:
    https://drive.google.com/file/d/162cikKQ3QDiXhUaz_uBJG9unXhoJWwTQ/view?usp=sharing

    Nakamichi_Ryuugan-ditto-1TB_btree_source.pdf:
    https://drive.google.com/file/d/1dFtfvpcE-TUo_D_Ol9C8FSDXlbzyIll9/view?usp=sharing
     
    Last edited: Feb 25, 2019
  7. Sanmayce

    Sanmayce New Member

    17
    7
    3
    Jun 9, 2018
    Sofia
    Ratings:
    +22
    Local Time:
    6:40 AM
    Glad to share the C source of latest-and-greatest Nakamichi with compile lines for *nix and Windows:
    Code (Text):
    E:\Nakamichi_2019-Jul-01>type _MakeELF_Nakamichi_GCC.sh
    gcc -O3 -static -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_btree.elf -D_N_XMM -D_N_prefetch_4096 -D_N_alone -DHashInBITS=24 -DHashChunkSizeInBITS=24 -DRAMpool
    InKB=5120 -DBtreeHEURISTIC -D_POSIX_ENVIRONMENT_ -DLongestLineInclusive=64
    
    E:\Nakamichi_2019-Jul-01>
    

    After 4 months of thrashing, my SSD 860 PRO 2.5" SATA III 256GB still walks tall, Samsung claims 300TBW (TerabytesWritten) life expectancy, so far 114TBW are done - no errors detected, but the wearing level (Health Indicator) became 21%.

    The attached package contains:
    Code (Text):
    E:\Nakamichi_2019-Jul-01>dir
    
    07/01/2019  02:19 PM            12,897 Chunkerito.c
    07/01/2019  02:19 PM            30,286 Chunkerito.exe
    07/01/2019  02:19 PM           503,609 lzsse2.cod
    07/01/2019  02:19 PM            55,397 lzsse2.cpp
    07/01/2019  02:19 PM             4,858 lzsse2.h
    07/01/2019  02:19 PM             2,481 lzsse2_platform.h
    07/01/2019  02:19 PM             1,633 MokujIN Amber 224 prompt.lnk
    07/01/2019  02:19 PM             1,775 MokujIN CYAN 224 prompt.lnk
    07/01/2019  02:19 PM             1,633 MokujIN GREEN 224 prompt.lnk
    07/01/2019  02:19 PM               576 Nakamichify_16MB.bat
    07/01/2019  02:19 PM           648,142 Nakamichi_Ryuugan-ditto-1TB_btree.c
    07/01/2019  02:19 PM           197,120 Nakamichi_Ryuugan-ditto-1TB_btree.exe
    07/01/2019  02:19 PM               304 _MakeELF_Chunkerito_GCC.sh
    07/01/2019  02:19 PM               494 _MakeELF_Nakamichi_GCC.sh
    07/01/2019  02:19 PM                92 _MakeEXE_Chunkerito_GCC.bat
    07/01/2019  02:19 PM               284 _MakeEXE_Nakamichi_GCC.bat
    07/01/2019  02:19 PM               249 _MakeEXE_Nakamichi_ICL.bat
    
    E:\Nakamichi_2019-Jul-01>
    

    These days very interesting heavy DNA benchmarks are prepared with Nakamichi in the mix, more info/charts on that topic incoming...

    By the way, the homepage (under development) of the people's microdeduplicator:
    The Zennish Microdeduplicator: Home of Nakamichi

    After 10 hours or so will share the results for George's 'kernel' benchmark... with lzbench and turbobench and Nakamichi console log... the latter decompresses kinda poorly at 1,100 MB/s on laptop with 3rd gen (Ivy Bridge) and 16GB DDR3.

    In the meantime let me just throw a thought of mine regarding the compression strength on a tiny supercomputer having 3TB RAM.
    The ability of Nakamichi to utilize (!) the internal-n-external RAM is unmatched, for example the 'Salamander' genome, AFAIK, is 32GB, in order to decompress FAST this file on a 64GB machine later, having superb compression ratio, Nakamichi comes screaming with its solid compression - those 32GB are seen as one block/window, my expectations are that only Hamid's LzTurbo 29 can beat it, to be seen...

    Next INSANE example shows how to compress 32GB DNA file into physical RAM, using (2^30hashslots) x 8bytes x 10hashpools = 80GB hash and 3+ million megabytes for B-trees memory pool (for DNA the needed memory is ~90N, N being the size of uncompressed file).
    Code (Text):
    Nakamichi.elf Salamander Salamander.Nakamichi 30 3123456 i
    

    And for those who want to participate in 'Salamander' benchmark:
    ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/915/635/GCA_002915635.2_ASM291563v2/GCA_002915635.2_ASM291563v2_genomic.fna.gz

    GCA_002915635.2_ASM291563v2_genomic.fna.gz 7.70 GB (8,274,496,729 bytes)
    GCA_002915635.2_ASM291563v2_genomic.fna 30.5 GB (32,810,886,576 bytes)

    Salamanders - SUPER RESISTANT TO CANCER! Masters of Regeneration!

    [​IMG]
    These cuties are nearly magical, we should look into their DNA.

    Regeneration
    "The feature of the salamander that attracts most attention is its healing ability:..."

    Genome
    "The 32 billion base pair long sequence of the axolotl's genome was published in 2018 and is the largest animal genome completed so far. It revealed species-specific genetic pathways that may be responsible for limb regeneration.[24] Although the axolotl genome is about 10 times as large as the human genome, it encodes a similar number of proteins, namely 23,251[24] (the human genome encodes about 20,000 proteins)."
    Source: https://en.wikipedia.org/wiki/Axolotl

    Another beautiful aspect of Nakamichi is compressing with arbitrary blocksize - EXPLOITING the fact of RAWNESS of .Nakamichi "format", meaning all .Nakamichi files can be concatenated with binary copy as copy/b *.Nakamichi MONOLITH

    Now Nakamichi has unlimited number of compressing modes, even more than 22 :p

    "With 6x 512GB DIMMs per socket, that means that systems will have up to 3TB of Optane per socket..."
    Next-Gen Intel Cascade Lake 28C Supporting 3.84TB Memory

    [​IMG]

    Simply, Nakamichi is the first, known to me, 100% FREE, open-source utility to enter the tiny supercomputer realm :p

    Is out there another compressor able to see more than 2GB backwards?!
     

    Attached Files:

    Last edited: Jul 2, 2019
  8. Sanmayce

    Sanmayce New Member

    17
    7
    3
    Jun 9, 2018
    Sofia
    Ratings:
    +22
    Local Time:
    6:40 AM
    Nice, very informative tables and a must-see bunch of standard performers at:
    Sysadmin - Round 3: Compression Comparison Benchmarks: zstd vs brotli vs pigz vs bzip2 vs xz etc

    Allow me to complement this thread with my runs on i7-3630QM 3.40 GHz:

    Some strong compressors for good measure:
    Code (Text):
     72,961,424 linux-5.1-rc5.tar.method511.zpaq    ! "zpaq_v7.05_x64.exe" add linux-5.1-rc5.tar.method511.zpaq linux-5.1-rc5.tar -method 511 -threads 1 !
    101,250,614 linux-5.1-rc5.tar.O16.PPMd_varI     ! PPMd_varI_rev2_Intel15_32bit.exe e -o16 -m256 -flinux-5.1-rc5.tar.O16.PPMd_varI linux-5.1-rc5.tar !
    101,345,038 linux-5.1-rc5.tar.ST6Block1024.bsc  ! "bsc_v3.1.0_x64.exe" e linux-5.1-rc5.tar linux-5.1-rc5.tar.ST6Block1024.bsc -b1024 -m6 -cp -Tt !
    102,599,972 linux-5.1-rc5.tar.L9Dict1024.xz     ! "xz_v5.2.3_x64.exe" -z -k -f -9 -e -v -v --lzma2=dict=1024MiB --threads=1 linux-5.1-rc5.tar !
    105,936,267 linux-5.1-rc5.tar.2GB.L22.zst       ! zstd-v1.4.0-win64.exe --ultra -22 --zstd=wlog=31,clog=30,hlog=30,slog=26 linux-5.1-rc5.tar !
    116,202,373 linux-5.1-rc5.tar.rar560_m5_m1g     ! rar-x64-560.exe a -m5 -ma5 -md1g linux-5.1-rc5.tar.rar560_m5_m1g linux-5.1-rc5.tar !
    134,096,935 linux-5.1-rc5.tar.method211.zpaq    ! "zpaq_v7.05_x64.exe" add linux-5.1-rc5.tar.method211.zpaq linux-5.1-rc5.tar -method 211 -threads 1 !
    871,557,120 linux-5.1-rc5.tar
    

    The built-in benchmark of latest Zstd:
    Code (Text):
    F:\TEXTORAMIC_benchmarking_2019-Apr-29>zstd-v1.4.0-win64.exe -b1e22 -i9 --priority=rt "linux-5.1-rc5.tar"
    Note : switching to real-time priority
     1#linux-5.1-rc5.tar : 871557120 -> 177235953 (4.917), 307.6 MB/s , 875.9 MB/s
    Note : switching to real-time priority
     2#linux-5.1-rc5.tar : 871557120 -> 168071052 (5.186), 267.4 MB/s , 834.8 MB/s
    Note : switching to real-time priority
     3#linux-5.1-rc5.tar : 871557120 -> 158346401 (5.504), 235.7 MB/s , 876.5 MB/s
    Note : switching to real-time priority
     4#linux-5.1-rc5.tar : 871557120 -> 157781348 (5.524), 214.5 MB/s , 858.2 MB/s
    Note : switching to real-time priority
     5#linux-5.1-rc5.tar : 871557120 -> 148663028 (5.863), 120.5 MB/s , 841.7 MB/s
    Note : switching to real-time priority
     6#linux-5.1-rc5.tar : 871557120 -> 143969084 (6.054),  95.5 MB/s , 873.5 MB/s
    Note : switching to real-time priority
     7#linux-5.1-rc5.tar : 871557120 -> 137361139 (6.345),  72.0 MB/s , 936.1 MB/s
    Note : switching to real-time priority
     8#linux-5.1-rc5.tar : 871557120 -> 135193716 (6.447),  58.4 MB/s , 966.1 MB/s
    Note : switching to real-time priority
     9#linux-5.1-rc5.tar : 871557120 -> 133409131 (6.533),  42.0 MB/s , 983.7 MB/s
    Note : switching to real-time priority
    10#linux-5.1-rc5.tar : 871557120 -> 131025317 (6.652),  33.1 MB/s , 988.6 MB/s
    Note : switching to real-time priority
    11#linux-5.1-rc5.tar : 871557120 -> 130252372 (6.691),  25.9 MB/s , 989.5 MB/s
    Note : switching to real-time priority
    12#linux-5.1-rc5.tar : 871557120 -> 128982813 (6.757),  17.5 MB/s ,1004.0 MB/s
    Note : switching to real-time priority
    13#linux-5.1-rc5.tar : 871557120 -> 126940428 (6.866),  11.5 MB/s ,1015.5 MB/s
    Note : switching to real-time priority
    14#linux-5.1-rc5.tar : 871557120 -> 125665841 (6.936),  9.42 MB/s ,1019.5 MB/s
    Note : switching to real-time priority
    15#linux-5.1-rc5.tar : 871557120 -> 124450553 (7.003),  6.85 MB/s ,1016.7 MB/s
    Note : switching to real-time priority
    16#linux-5.1-rc5.tar : 871557120 -> 118626173 (7.347),  5.65 MB/s , 995.7 MB/s
    Note : switching to real-time priority
    17#linux-5.1-rc5.tar : 871557120 -> 115473642 (7.548),  4.36 MB/s , 970.8 MB/s
    Note : switching to real-time priority
    18#linux-5.1-rc5.tar : 871557120 -> 114185321 (7.633),  3.29 MB/s , 930.4 MB/s
    Note : switching to real-time priority
    19#linux-5.1-rc5.tar : 871557120 -> 112606029 (7.740),  2.38 MB/s , 896.8 MB/s
    Note : switching to real-time priority
    20#linux-5.1-rc5.tar : 871557120 -> 110148832 (7.913),  2.15 MB/s , 811.6 MB/s
    Note : switching to real-time priority
    21#linux-5.1-rc5.tar : 871557120 -> 108930061 (8.001),  1.79 MB/s , 803.0 MB/s
    Note : switching to real-time priority
    22#linux-5.1-rc5.tar : 871557120 -> 107836731 (8.082),  1.51 MB/s , 794.3 MB/s
    

    The built-in benchmark of latest lz4:
    Code (Text):
    F:\TEXTORAMIC_benchmarking_2019-Apr-29>lz4_v1_9_0_win64.exe -b1e12 -i9 --no-frame-crc linux-5.1-rc5.tar
    Benchmarking levels from 1 to 12
     1#linux-5.1-rc5.tar : 871557120 -> 269047566 (3.239), 466.0 MB/s ,2986.6 MB/s
     2#linux-5.1-rc5.tar : 871557120 -> 269047566 (3.239), 466.2 MB/s ,2989.4 MB/s
     3#linux-5.1-rc5.tar : 871557120 -> 202240169 (4.310),  99.0 MB/s ,3092.6 MB/s
     4#linux-5.1-rc5.tar : 871557120 -> 198635531 (4.388),  82.5 MB/s ,3122.5 MB/s
     5#linux-5.1-rc5.tar : 871557120 -> 196628907 (4.432),  67.7 MB/s ,3142.6 MB/s
     6#linux-5.1-rc5.tar : 871557120 -> 195557111 (4.457),  55.6 MB/s ,3155.5 MB/s
     7#linux-5.1-rc5.tar : 871557120 -> 195019358 (4.469),  46.0 MB/s ,3164.1 MB/s
     8#linux-5.1-rc5.tar : 871557120 -> 194779489 (4.475),  38.7 MB/s ,3175.7 MB/s
     9#linux-5.1-rc5.tar : 871557120 -> 194637129 (4.478),  33.6 MB/s ,3176.3 MB/s
    10#linux-5.1-rc5.tar : 871557120 -> 193611216 (4.502),  24.9 MB/s ,3249.5 MB/s
    11#linux-5.1-rc5.tar : 871557120 -> 192756776 (4.522),  15.4 MB/s ,3291.7 MB/s
    12#linux-5.1-rc5.tar : 871557120 -> 192607909 (4.525),  14.8 MB/s ,3266.2 MB/s
    

    The powturbo's awesome TurboBench:
    Code (Text):
    F:\TEXTORAMIC_benchmarking_2019-Apr-29>"turbobench_v18.05_-_build_04_May_2018" linux-5.1-rc5.tar -ebzip2/lzma,9d30:fb273:mf=bt4/oodle,89,91,95,99,111,115,119,129,139/lzturbo,19,12,10,29,22,20,39,32,30,59/brotli,11d30/trle -I3 -J31 -k1 -B2G
          C Size  ratio%     C MB/s     D MB/s   Name            File
       101296106    11.6       1.19     147.41   lzma 9d30:fb273:mf=bt4           linux-5.1-rc5.tar
       101840402    11.7       0.40     359.32   brotli 11d30                     linux-5.1-rc5.tar
       104902973    12.0       0.17     848.75   oodle 139 'Leviathan'            linux-5.1-rc5.tar
       105244114    12.1       0.23     944.62   oodle 89 'Kraken'                linux-5.1-rc5.tar
       105245522    12.1       0.13     947.25   oodle 129 'Hydra'                linux-5.1-rc5.tar
       105253536    12.1       0.81    1694.20   lzturbo 39                       linux-5.1-rc5.tar
       108584801    12.5       9.01      33.55   lzturbo 59                       linux-5.1-rc5.tar
       125385534    14.4       0.42    2069.38   oodle 99 'Mermaid'               linux-5.1-rc5.tar
       127103702    14.6      12.36      45.42   bzip2                            linux-5.1-rc5.tar
       128610109    14.8       0.79    2078.73   oodle 95                         linux-5.1-rc5.tar
       129697563    14.9       0.81    1295.45   lzturbo 29                       linux-5.1-rc5.tar
       133182334    15.3      50.23     955.20   lzturbo 32                       linux-5.1-rc5.tar
    
       155314555                       1100      Nakamichi 'Ryuugan-ditto-1TB'    ! outside turbobench !
    
       163000894    18.7       0.45    2903.83   oodle 119 'Selkie'               linux-5.1-rc5.tar
       167472552    19.2       0.87    2955.39   oodle 115                        linux-5.1-rc5.tar
       174394966    20.0      49.68    1525.11   lzturbo 22                       linux-5.1-rc5.tar
       181251127    20.8     283.22    1338.07   lzturbo 30                       linux-5.1-rc5.tar
       187704878    21.5     207.26    1961.56   oodle 91                         linux-5.1-rc5.tar
       193407657    22.2       0.91    3400.77   lzturbo 19                       linux-5.1-rc5.tar
       211748416    24.3      86.24    3362.51   lzturbo 12                       linux-5.1-rc5.tar
       245272214    28.1     485.79    1461.28   lzturbo 20                       linux-5.1-rc5.tar
       248782597    28.5     240.79    3415.57   oodle 111                        linux-5.1-rc5.tar
       277205657    31.8     533.12    3014.54   lzturbo 10                       linux-5.1-rc5.tar
       782808336    89.8     205.88    6021.58   trle                             linux-5.1-rc5.tar
    

    The inikep's superb lzbench:
    Code (Text):
    F:\TEXTORAMIC_benchmarking_2019-Apr-29>lzbench173 -c4 -i1,15 -o3 -etornado,16/blosclz,9/brieflz/crush,2/csc,5/density,3/fastlz,2/gipfeli/lzo1b,999/libdeflate,1,12/lz4hc,1,12/lizard,19,29,39,49/lzf,1/lzfse/lzg,9/lzjb/lzlib,9/lzma,9/lzrw,5/lzsse2,17/lzsse4,17/lzsse8,17/lzvn/pithy,9/quicklz,3/snappy/slz_zlib,3/ucl_nrv2b,9/ucl_nrv2d,9/ucl_nrv2e,9/xpack,1,9/xz,9/yalz77,12/yappy,99/zlib,1,5,9/zling,4/shrinker/wflz/lzmat linux-5.1-rc5.tar
    lzbench 1.7.3 (64-bit Windows)   Assembled by P.Skibinski
    Compressor name         Compress. Decompress.  Orig. size  Compr. size  Ratio Filename
    memcpy                   8626 MB/s  9264 MB/s   871557120    871557120 100.00 linux-5.1-rc5.tar
    lzlib 1.8 -9             1.46 MB/s    93 MB/s   871557120    104859766  12.03 linux-5.1-rc5.tar
    lzma 16.04 -9            1.91 MB/s   138 MB/s   871557120    105363316  12.09 linux-5.1-rc5.tar
    xz 5.2.3 -9              2.08 MB/s   130 MB/s   871557120    105365178  12.09 linux-5.1-rc5.tar
    csc 2016-10-13 -5        2.73 MB/s   123 MB/s   871557120    108753888  12.48 linux-5.1-rc5.tar
    tornado 0.6a -16         1.59 MB/s   294 MB/s   871557120    110061031  12.63 linux-5.1-rc5.tar
    zling 2016-01-10 -4        58 MB/s   279 MB/s   871557120    127870534  14.67 linux-5.1-rc5.tar
    crush 1.0 -2             0.84 MB/s   448 MB/s   871557120    138783283  15.92 linux-5.1-rc5.tar
    xpack 2016-06-02 -9        20 MB/s   872 MB/s   871557120    145146540  16.65 linux-5.1-rc5.tar
    
    Nakamichi 'Ryuugan-ditto-1TB'       1100 MB/s                155314555        ! outside lzbench !
    
    lizard 1.0 -49           1.19 MB/s  1917 MB/s   871557120    155873352  17.88 linux-5.1-rc5.tar
    lizard 1.0 -29           1.24 MB/s  2074 MB/s   871557120    156885742  18.00 linux-5.1-rc5.tar
    libdeflate 0.7 -12       4.50 MB/s   884 MB/s   871557120    157281649  18.05 linux-5.1-rc5.tar
    lzfse 2017-03-08           60 MB/s   771 MB/s   871557120    160228645  18.38 linux-5.1-rc5.tar
    ucl_nrv2e 1.03 -9        2.87 MB/s   380 MB/s   871557120    160822096  18.45 linux-5.1-rc5.tar
    ucl_nrv2d 1.03 -9        2.87 MB/s   403 MB/s   871557120    162203786  18.61 linux-5.1-rc5.tar
    zlib 1.2.11 -9             20 MB/s   388 MB/s   871557120    164097959  18.83 linux-5.1-rc5.tar
    ucl_nrv2b 1.03 -9        2.70 MB/s   403 MB/s   871557120    164465689  18.87 linux-5.1-rc5.tar
    zlib 1.2.11 -5             52 MB/s   378 MB/s   871557120    169373400  19.43 linux-5.1-rc5.tar
    lzmat 1.01                 45 MB/s   377 MB/s   871557120    180331945  20.69 linux-5.1-rc5.tar
    lzo1b 2.09 -999            19 MB/s   841 MB/s   871557120    187378473  21.50 linux-5.1-rc5.tar
    xpack 2016-06-02 -1       154 MB/s   658 MB/s   871557120    188175690  21.59 linux-5.1-rc5.tar
    libdeflate 0.7 -1         169 MB/s   821 MB/s   871557120    192057235  22.04 linux-5.1-rc5.tar
    lz4hc 1.8.0 -12          8.91 MB/s  2430 MB/s   871557120    192608108  22.10 linux-5.1-rc5.tar
    lizard 1.0 -19           2.27 MB/s  2740 MB/s   871557120    193509752  22.20 linux-5.1-rc5.tar
    lzg 1.0.8 -9             1.39 MB/s   704 MB/s   871557120    194373817  22.30 linux-5.1-rc5.tar
    lizard 1.0 -39           2.47 MB/s  2606 MB/s   871557120    195660777  22.45 linux-5.1-rc5.tar
    quicklz 1.5.0 -3           60 MB/s   941 MB/s   871557120    200595365  23.02 linux-5.1-rc5.tar
    brieflz 1.1.0             152 MB/s   204 MB/s   871557120    201104232  23.07 linux-5.1-rc5.tar
    lzvn 2017-03-08            55 MB/s   924 MB/s   871557120    205726840  23.60 linux-5.1-rc5.tar
    lz4hc 1.8.0 -1            120 MB/s  2228 MB/s   871557120    208457499  23.92 linux-5.1-rc5.tar
    zlib 1.2.11 -1            105 MB/s   337 MB/s   871557120    208478566  23.92 linux-5.1-rc5.tar
    yalz77 2015-09-19 -12      30 MB/s   404 MB/s   871557120    212404493  24.37 linux-5.1-rc5.tar
    pithy 2011-12-24 -9       389 MB/s  1635 MB/s   871557120    228713454  26.24 linux-5.1-rc5.tar
    gipfeli 2016-07-13        339 MB/s   518 MB/s   871557120    229111668  26.29 linux-5.1-rc5.tar
    lzrw 15-Jul-1991 -5       146 MB/s   581 MB/s   871557120    249228934  28.60 linux-5.1-rc5.tar
    slz_zlib 1.0.0 -3         238 MB/s   291 MB/s   871557120    257661990  29.56 linux-5.1-rc5.tar
    yappy 2014-03-22 -99       78 MB/s  2341 MB/s   871557120    264231099  30.32 linux-5.1-rc5.tar
    density 0.12.5 beta -3    303 MB/s   288 MB/s   871557120    267496998  30.69 linux-5.1-rc5.tar
    fastlz 0.1 -2             318 MB/s   534 MB/s   871557120    268851492  30.85 linux-5.1-rc5.tar
    snappy 1.1.4              388 MB/s  1264 MB/s   871557120    276267532  31.70 linux-5.1-rc5.tar
    blosclz 2015-11-10 -9     273 MB/s   913 MB/s   871557120    278909674  32.00 linux-5.1-rc5.tar
    lzf 3.6 -1                327 MB/s   618 MB/s   871557120    282981879  32.47 linux-5.1-rc5.tar
    wflz 2015-09-16           311 MB/s   803 MB/s   871557120    321645793  36.90 linux-5.1-rc5.tar
    lzjb 2010                 267 MB/s   474 MB/s   871557120    371401543  42.61 linux-5.1-rc5.tar
    shrinker 0.1             3286 MB/s  5454 MB/s   871557120    823262668  94.46 linux-5.1-rc5.tar
    


    Notes:
    - Latest available ‘oo2core_6_win64.dll’ was used;
    - The tweak value spacespeedtradeoff=[64..1024] “tweak size vs decode time” has not been played with!
    - Oodle Levels: HyperFast = -4..-1; None = 0; SuperFast = 1; VeryFast = 2; Fast = 3; Normal = 4; Optimal1,2,3,4 = 5,6,7,8;
    - Oodle Compressors: Kraken = 8 (Default); Leviathan = 13 (Best); Mermaid = 9 (Crazy fast); Selkie = 11 (Fastest); Hydra = 12 (Tuneable composite of above);

    My personal opinion regarding the TOP 5:

    #1 The most CONSISTENT performer - The supremator LzTurbo 39, straight up;
    #2 With "kernel", Leviathan, Kraken and Hydra are no match to the supermonster LzTurbo 39, however, Oodle 'Mermaid' saves the day by delivering awesome sweetness;
    #3 lizard -49 - the open-source topdog, inikep made good;
    #4 zpaq method 511 (i.e. 2^11MB block) sets the roofline;
    #5 Zstd 22, the open-source superb counterpart to Oodle and LzTurbo brings versatility and speed.
     
  9. Sanmayce

    Sanmayce New Member

    17
    7
    3
    Jun 9, 2018
    Sofia
    Ratings:
    +22
    Local Time:
    6:40 AM
    Have been "side-tracked" for 2 months playing with my new hasher (for look-ups), now replacing the old one, that is, Nakamichi hashes keys already with the fastest hasher 'FNV1A_Pippip'.

    Also, I broke the Crucial MX500 SSD after only 42TB written (out of 180TB guaranteed):
    https://twitter.com/Sanmayce/status/1203658729838780416

    crc_4.png

    Will stick to Samsung in the future.

    And glad to share the latest revision of Nakamichi, featuring:
    - Needs 3N physical RAM (+ adjustable physical RAM for HASH pools), where N is the size of file being compressed;
    - Automatic setting of PASSES (allows to fit building B-trees into RAM (the third N used to house output file) and just then to rebuild the current PASS either on RAM or SSD) - it lowers drastically the wearing of external RAM;
    - Significantly Lowered Memory Footprint - now, only the NON-UNIQUE keys are put into B-trees.

    For example, now the 'SILVA' DNA corpus, from Kirill's Sequence Compression Benchmark, 1.1GB big needs only 10N =~ 11GB block to fit all the B-trees (on the screenshot).
    Benchmarkwise, very interesting DNA corpora are there:
    Sequence Compression Benchmark
    To see, how it fares against the usual suspects, one picture tells a thousand words:

    dna.PNG
    The actual improvement is shown in next ASCII block-scheme:

    ASCII_1.jpg
    ASCII_2.PNG

    The source (and how to compile) is attached.
    In the near future, will refine further Nakamichi, wanna see it as a useful console tool...
     

    Attached Files:

    Last edited: Dec 20, 2019
  10. Sanmayce

    Sanmayce New Member

    17
    7
    3
    Jun 9, 2018
    Sofia
    Ratings:
    +22
    Local Time:
    6:40 AM
    Gladden I am to share the most refined revision of my texttoy.

    Nakamichi already entered Dr. Mahoney's LTCB (Large Text Compression Benchmark) roster as Pareto frontier setter i.e. no other (open-source) decompressor is faster with this compression ratio.
    Large Text Compression Benchmark

    Big news! Now, the matchfinder is able to find a match with ONE only random (external) access, one random fetch done within a hashtable (based on physical RAM), and one random fetch done within a B-tree (based on physical/external RAM).

    The revision achieving above results is from 2019-Aug-06, today gladly I share the latest much superior revision featuring:
    - Needs 3N physical RAM (+ adjustable physical RAM for HASH pools), where N is the size of file being compressed;
    - Automatic setting of PASSES (allows to fit building B-trees into RAM (the third N used to house output file) and just then to rebuild the current PASS either on RAM or SSD) - it lowers drastically the wearing of external RAM;
    - Significantly Lowered Memory Footprint - now, only the NON-UNIQUE keys are put into B-trees;
    - All B-trees are DEFRAGMENTED i.e. all their leaves are concatenated i.e. each B-tree is housed in one SOLID block - thus PREFETCH-FRIENDLY they became.

    The beauty of the Nakamichi C source comes from it being a practical etude/benchmark of superfast implementations of:
    - FASTEST known to me memmem(), namely the function char * Railgun_Trolldom(char * pbTarget, char * pbPattern, uint32_t cbTarget, uint32_t cbPattern);
    - FASTEST known to me LOOKUP hasher, namely the uint32_t FNV1A_Pippip_Yurii(const char *str, size_t wrdlen);
    - FASTEST known to me B-tree, namely the void B_tree_Non_Unique_Only_DEFRAGMENTED(int argc, char *argv[], char* SourceBlock, uint64_t SourceSize, char* VerifyBlock), it is capable transparently to work with internal/external RAM.

    [​IMG]

    The "normal" case (which is advocated even in Prof. Knuth's bible) is to use a LEAF with many many keys - to avoid TAPE/HDD revolvements/repositions.
    Footprintwise, I always considered this advice out of fashion/style, even today's NAND and Xpoint SSDs cannot deliver magical speeds unless the B-tree is both compact (my order 3 is more size effective) and prefetch-friendly.
    My two benchmarking laptops are now busy, but in the next year will share how in practice this revision works with some big corpora as:
    - SILVA DNA corpus ~1.1 GB strong;
    - Human Genome DNA corpus ~3.3 GB strong.

    The needed memory for latter is 20N i.e. 66GB, the FRAGMENTED revision is already on the 4th percent, after finishing will run the DEFRAGMENTED one...

    Code:
    Parsing all the BBs (Building-Blocks) of length 4,6,8,10,12,14,16,18,36,64 at each position:
    
    PASS #1 – TEMPORARY STUFF:
    One POOL based/located always on physical (internal RAM) – the last/hidden one of all HASH SUB-POOLS:
    -----------------------------------------------------------------------                
    | Hash BLOCK (TEMPORARY HASH Sub-Pool), size = command line parameter |---------------------------------------------------------------------------------------------\
    -----------------------------------------------------------------------                                                                                             |
     |                                                                                                                                                                  |
     | All slots point to B-tree roots in the TEMPORARY B-tree POOL                                                                                                     |
     \--------------------------------------------------------------\                                                                                                   |
                                                                    |                                                                                                   |
    Three POOLs based/located always on physical (internal RAM):   \ /                                                                                                  |
    -------------------------- ----------------------------------- ---------------------------------------------------------                                            |
    | Source Block, size = N | | Source REVERSED Block, size = N | | Target Block, size = N, or, B-trees BLOCK (TEMPORARY) |                                            |
    -------------------------- ----------------------------------- ---------------------------------------------------------                                            |
                                                                                                                                                                        |
    PASS #2 – TEMPORARY-TO-NONTEMPORARY STUFF (takes into account what above two temporary blocks house):                                                               |
    -------------------------------------------------                                                                                                                   |
    | Source Pool: Houses the file being compressed |                                                                                                                   |
    -------------------------------------------------                                                                                                                   |
     |\--------------------------------------------------------------------------------------------------------------------------\                                      |
     | |                                                                                                                         |Hash 64 bytes                         |
     | |Hash 6 bytes                                                                                                             |                                      |
     | \--------------------------------------------------------\                                                                |                                      |
     |                                                          |                                                                |                                      |
     |Hash 4 bytes                                              |                                                                |                                      |
     \                                                          |                                                                |                                      |
     \/                                                        \ /                                                              \ /                                    \ /
    |----------------------------------------------------------|----------------------------------------------------------|-----|-------------------------------------|-------------------------|
    | Hash Sub-Pool for BBs 4 bytes long                       | Hash Sub-Pool for BBs 6 bytes long                       | ... | Hash Sub-Pool for BBs 64 bytes long | TEMPORARY HASH Sub-Pool |
    |----------------------------------------------------------|----------------------------------------------------------|-----|-------------------------------------|-------------------------|
    | Slot # 0         | ... | Slot # (1<<HashInBITS_GLOBAL)-1 | Slot # 0         | ... | Slot # (1<<HashInBITS_GLOBAL)-1 |     |                                     |                         |
    | Address 8 bytes  | ... | Address 8 bytes                 | Address 8 bytes  | ... | Address 8 bytes                 |     |                                     |                         |
    | to a B-tree root | ... | to a B-tree root                | to a B-tree root | ... | to a B-tree root                |     |                                     |                         |
    |----------------------------------------------------------|----------------------------------------------------------|-----|-------------------------------------|-------------------------|
     |                                                                             |
     |                                                                             \---------------------------------------------------------\
     \----------------------------------------------\ LEAF #1 is on level 0                                                                  |
                                                    |                                                                                        |
                                                    |         /----------------------------------------------\ LEAF #5 is on level 2         |
                                                   \ /        |                                              \/                             \ /
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ! Located either on
    | Btree Pool: Houses the leaves of all B-trees | ROOT #1 | LEAF #2                  | LEAF #3 | LEAF #4 | LEAF #5 | ... some leaves ... | ROOT #2 | ... some leaves/roots ... | ! Internal or
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ! External RAM
                                                    |||       /\                         / \       / \
                                                    ||\-------/ LEAF #2 is on level 1     |         |
                                                    |\------------------------------------/         |
                                                    \-----------------------------------------------/
    
    B-tree order 3 (MAX 3 pointers, MAX 2 keys) LEAF #1 (ROOT is also a LEAF – if no successors) structure:
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | LeftPointer,    | MiddlePointer,  | RightPointer,   | DynamicLeftKey (LeftKeySize + LeftKey       + LeftQWORD)| DynamicRightKey (RightKeySize + RightKey       + RightQWORD) | ! LEVEL 0
    | Address 8 bytes | Address 8 bytes | Address 8 bytes |                 1 byte      + 1..255 bytes  + 8 bytes   |                  1 byte       + 1..255 bytes   + 8 bytes     |
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     |                  |                 |
     |                  |                 \--------------------------------------------------------------------\
     |                  |                                                                                      |
     |                  \-----------------------------\                                                        |
    \ /                                               \/                                                      \ /
    -----------------------------------------------  -------------------------------------------------------  -----------------------------------------------
    | LEAF #2 (housing keys smaller than LeftKey) |  | LEAF #3 (housing keys between LeftKey and RightKey) |  | LEAF #4 (housing keys bigger than RightKey) | ! LEVEL 1
    -----------------------------------------------  -------------------------------------------------------  -----------------------------------------------
    
    Note1: Hash Pool: Houses (10+1) consecutive Sub-Pools, the 11th (the last one actually) is used in PASS #1 only (as temporary, its slots point to physical RAM only addresses located in TargetBlock of size N=SourceSize).
    Note2: Usually 'LeftQWORD' and 'RightQWORD' are used either for OCCURRENCES or LastSeenOffest of that key.
    Note3: If 'LeftKeySize' or 'RightKeySize' is 0 then the field is not used - there is no KEY.
    
    In the near future I intend to reduce more memmem() invocations which are the main culprit for speed drops, it is a matter of few hours, and ... more SSD memory.

    As always, the source and binary attached.

    dickens1.jpg
    dickens2.png

    Hm, Oodle and LzTurbo kick asses :blackeye::blackeye:, the latest Zstd kinda slow :p, in decompression department while ultrafast in compression.
     

    Attached Files:

  11. Sanmayce

    Sanmayce New Member

    17
    7
    3
    Jun 9, 2018
    Sofia
    Ratings:
    +22
    Local Time:
    6:40 AM
    Very very glad to share my latest refined and fully functional Nakamichi - it became ready for multi-gigabytes benchmarks.

    Currently, I test how this final revision fares against the master corpus BNC (British-National-Corpus_XML-edition) 4.5GB in size, freely downloadable on Internet, and against all usual suspects i.e. top decompressors, I expect my texttoy to shine since its main target is huge English texts corpora, the console of my laptop with i7-3630QM and 16GB DDR3, Samsung 860 PRO:
    Code (Text):
    F:\i7_incoming\TEXTUAL_MADNESS_bare-minimum_2020_May_09>timer64.exe Satanichi_GCC730_64bit.exe Machine-Learning_British-National-Corpus_XML-edition.tar Machine-Learning_British-National-Corpus_XML-edition.tar.Nakamichi 20 190345 e
    ...
    Nakamichi 'Ryuugan-ditto-1TB', written by Kaze, inspired by Haruhiko Okumura sharing, based on Nobuo Ito's LZSS source, muffinesque tips by m^2, Jim Dempsey and Kirill Kryukov enforced.
    Downloadable at: https://twitter.com/Sanmayce/status/1211005691873439744
    Note0: Nakamichi 'Dragoneye', (AUTO-RESETTING PASSES and DEFRAGMENTED B-trees revision) is 100% FREE, licenseless that is.
    Note1: Hamid Buzidi's LzTurbo ([a] FASTEST [Textual] Decompressor, Levels 19/29/39) retains kingship, his TurboBench (2017-Apr-07) proves the supremacy of LzTurbo, Turbo-Amazing!
    Note2: Conor Stokes' LZSSE2 ([a] FASTEST Textual Decompressor, Level 17) is embedded, all credits along with many thanks go to him.
    Note3: The matchfinder is either 'Railgun_Trolldom' (matches longer than 18, except 36 and 64) or Leprechaun's B-tree order 3.
    Note4: Instead of '_mm_loadu_si128' '_mm_lddqu_si128' is used.
    Note5: Maximum compression ratio is 44:1, for 704 bytes long matches within 1TB Sliding Window.
    Note6: Please send me (at sanmayce@sanmayce.com) decompression results obtained on machines with fast CPU-RAM subsystems.
    Note7: In this compile, clock() was replaced with time() - to counter bigtime stats misreporting.
    Note8: Multi-way hashing allows each KeySize to occupy its own HASH pool, thus less RAM is in use - the LEAF is smaller.
    Note9: In this revision, B-tree heuristics are in use, allowing skipping many unnecessary memmem() invocations.
    NoteA: The file being compressed should be 4 bytes or longer due to Building-Blocks being in range 4..18, 24,28,36,48,56,64,128.
    NoteB: In this compile, the keysizes in the LEAF are not HEXed i.e. not doubled. Also, the hash in use: FNV1A_Pippip_Yurii, the fastest one known to me.
    NoteC: In this latest (2020-May-09) compile, SHA3-224 is enabled at compile time, it is optional, that is.
    Current priority class is REALTIME_PRIORITY_CLASS.
    This compile uses B-trees only, no memmem() invocations - it compresses worse but much faster.
    Allocating Source-Buffer 4,463 MB ...
    Allocating Source-Buffer 4,463 MB (REVERSED) ...
    Allocating Target-Buffer 4,495 MB ...
    Leprechaun: Memory pool for B-trees is 190,345 MB.
    Leprechaun: In this revision 8MB 15-way hash is used which results in 15 x 1,048,576 external B-Trees of order 3.
    Leprechaun: Allocating HASH memory 134,217,793 bytes = 8*(15+1)*2^(20) ... OK
    Leprechaun: Allocating/ZEROing 199,591,198,734 bytes swap file ... OK
    Leprechaun: Size of input file: 4,680,140,800
    
    Leprechaun: Inserting keys/BBs of order 004 into B-trees, free RAM in B-tree pool is 00,190,345 MB; Pass #000,001 of 000,001 ... DONE; 00,000,439,072 B-trees have been rooted so far.
    Leprechaun: Inserting keys/BBs of order 006 into B-trees, free RAM in B-tree pool is 00,190,322 MB; Pass #000,001 of 000,001 ... DONE; 00,001,406,833 B-trees have been rooted so far.
    Leprechaun: Inserting keys/BBs of order 008 into B-trees, free RAM in B-tree pool is 00,190,215 MB; Pass #000,001 of 000,001 ... DONE; 00,002,448,098 B-trees have been rooted so far.
    Leprechaun: Inserting keys/BBs of order 010 into B-trees, free RAM in B-tree pool is 00,189,996 MB; Pass #000,001 of 000,001 ... DONE; 00,003,495,965 B-trees have been rooted so far.
    Leprechaun: Inserting keys/BBs of order 012 into B-trees, free RAM in B-tree pool is 00,189,647 MB; Pass #000,001 of 000,001 ... DONE; 00,004,544,488 B-trees have been rooted so far.
    Leprechaun: Inserting keys/BBs of order 014 into B-trees, free RAM in B-tree pool is 00,189,148 MB; Pass #000,001 of 000,001 ... DONE; 00,005,593,063 B-trees have been rooted so far.
    Leprechaun: Inserting keys/BBs of order 016 into B-trees, free RAM in B-tree pool is 00,188,454 MB; Pass #000,001 of 000,001 ... DONE; 00,006,641,639 B-trees have been rooted so far.
    Leprechaun: Inserting keys/BBs of order 018 into B-trees, free RAM in B-tree pool is 00,187,510 MB; Pass #000,001 of 000,001 ... DONE; 00,007,690,215 B-trees have been rooted so far.
    Leprechaun: Inserting keys/BBs of order 036 into B-trees, free RAM in B-tree pool is 00,186,258 MB; Pass #000,001 of 000,001 ...
    Leprechaun: Failure! Increment 'Memory for B-trees'!
    Automatically increasing number of passes in order to fit B-trees into Target Buffer.
    Leprechaun: Inserting keys/BBs of order 036 into B-trees, free RAM in B-tree pool is 00,186,258 MB; Pass #000,001 of 000,002 ...
    Leprechaun: Failure! Increment 'Memory for B-trees'!
    Automatically increasing number of passes in order to fit B-trees into Target Buffer.
    Leprechaun: Inserting keys/BBs of order 036 into B-trees, free RAM in B-tree pool is 00,186,258 MB; Pass #000,001 of 000,004 ...
    Leprechaun: Failure! Increment 'Memory for B-trees'!
    Automatically increasing number of passes in order to fit B-trees into Target Buffer.
    Leprechaun: Inserting keys/BBs of order 036 into B-trees, free RAM in B-tree pool is 00,178,585 MB; Pass #000,008 of 000,008 ... DONE; 00,008,738,791 B-trees have been rooted so far.
    Leprechaun: Inserting keys/BBs of order 064 into B-trees, free RAM in B-tree pool is 00,177,488 MB; Pass #000,001 of 000,008 ...
    Leprechaun: Failure! Increment 'Memory for B-trees'!
    Automatically increasing number of passes in order to fit B-trees into Target Buffer.
    Leprechaun: Inserting keys/BBs of order 064 into B-trees, free RAM in B-tree pool is 00,177,488 MB; Pass #000,001 of 000,016 ...
    Leprechaun: Failure! Increment 'Memory for B-trees'!
    Automatically increasing number of passes in order to fit B-trees into Target Buffer.
    Leprechaun: Inserting keys/BBs of order 064 into B-trees, free RAM in B-tree pool is 00,143,820 MB; Pass #000,032 of 000,032 ... DONE; 00,009,787,367 B-trees have been rooted so far.
    Leprechaun: Inserting keys/BBs of order 024 into B-trees, free RAM in B-tree pool is 00,140,230 MB; Pass #000,032 of 000,032 ... DONE; 00,010,835,943 B-trees have been rooted so far.
    Leprechaun: Inserting keys/BBs of order 028 into B-trees, free RAM in B-tree pool is 00,136,038 MB; Pass #000,032 of 000,032 ... DONE; 00,011,884,519 B-trees have been rooted so far.
    Leprechaun: Inserting keys/BBs of order 048 into B-trees, free RAM in B-tree pool is 00,118,304 MB; Pass #000,032 of 000,032 ... DONE; 00,012,933,095 B-trees have been rooted so far.
    Leprechaun: Inserting keys/BBs of order 056 into B-trees, free RAM in B-tree pool is 00,093,148 MB; Pass #000,032 of 000,032 ... DONE; 00,013,981,671 B-trees have been rooted so far.
    Leprechaun: Inserting keys/BBs of order 128 into B-trees, free RAM in B-tree pool is 00,092,354 MB; Pass #000,001 of 000,032 ...
    Leprechaun: Failure! Increment 'Memory for B-trees'!
    Automatically increasing number of passes in order to fit B-trees into Target Buffer.
    Leprechaun: Inserting keys/BBs of order 128 into B-trees, free RAM in B-tree pool is 00,092,354 MB; Pass #000,001 of 000,064 ...
    Leprechaun: Failure! Increment 'Memory for B-trees'!
    Automatically increasing number of passes in order to fit B-trees into Target Buffer.
    Leprechaun: Inserting keys/BBs of order 128 into B-trees, free RAM in B-tree pool is 00,092,354 MB; Pass #000,001 of 000,128 ...
    Leprechaun: Failure! Increment 'Memory for B-trees'!
    Automatically increasing number of passes in order to fit B-trees into Target Buffer.
    Leprechaun: Inserting keys/BBs of order 128 into B-trees, free RAM in B-tree pool is 00,010,667 MB; Pass #000,256 of 000,256 ... DONE; 00,015,030,247 B-trees have been rooted so far.
    
    Leprechaun: Total Searches-n-Inserts Per Second: 16,828,138 SNIPS = 2,042,936,033,237 keys in 121,400 seconds
    Leprechaun: RAM needed to house B-trees (relative to the file being ripped): 40N = 179,998MB
    Leprechaun: RAM needed to build B-trees IN ONE PASS: (Target-Buffer = 4,495 MB) x 256 passes = 1,150,720MB
    Note: In case of RAM in spades then may use compile option: -DSpeedUpBuilding=1150720
    Leprechaun: Total IOPS for 8,544,969,110 'freads' and 3,249,216,034 'fwrites' (of packets 298 bytes long) during loading traversing all orders: 97,151 IOPS
    
    Compressing 4,680,140,800 bytes ...
    |; Each rotation means 64KB are encoded; Speed: 0,000,298 B/s; Done 7%; Compression Ratio: 11.04:1; Matches(16/24/48): 402,086/1,196,484/2,835,548; 128[+] long matches: 667,816; ETA: 4031.05 hours
    

    In above console, Nakamichi gives interesting stats:
    Code (Text):
    Leprechaun: RAM needed to house B-trees (relative to the file being ripped): 40N = 179,998MB
    Leprechaun: RAM needed to build B-trees IN ONE PASS: (Target-Buffer = 4,495 MB) x 256 passes = 1,150,720MB
    Note: In case of RAM in spades then may use compile option: -DSpeedUpBuilding=1150720
    Leprechaun: Total IOPS for 8,544,969,110 'freads' and 3,249,216,034 'fwrites' (of packets 298 bytes long) during loading traversing all orders: 97,151 IOPS
    


    - The first line tells us that 180GB RAM are needed to house all the B-trees, in this case, that memory is located on the SSD (since command line option 'e' was specified).
    - The second line tells us that if you want to build all the B-trees in ONE PASS then you should compile Nakamichi with -DSpeedUpBuilding=1150720, ugh, it equals 1.2TB RAM, but then compression will be done entirely in physical RAM not using any swapping, indeed, Nakamichi's time is still ahead (has not come yet), maybe after a decade, this very attached code will show LINEAR COMPRESSION RATE of 200KB/s... anyhow, after 4031 hours will share how BNC corpus was crunched.

    In the process of tuning I trashed two Samsung 860 EVO 500GB (made 2019-Oct and 2020-Mar) - I won't recommend to heavy-duty users this series while would say only good words about the PRO counterpart (at same price range) - the Samsung 860 PRO 256GB - it reached 250TB written being a workhorse (y)

    Recently, the SCB (Kirill's DNA benchmark) got some momentum, as I see it, it will become a compressstation (as in playstation) - a place where many general-purpose decompressors (the accent there) can show their capabilities, in practice.

    To showcase how fast one can draw from the DNA archive e.g. a coronavirus sequence, here comes one-sheet-stop:

    [​IMG]

    And the .PDF booklet containing above page plus 7 small corpora benchmarked with latest ZSTD 1.4.5 and latest Oodle (rev.8 of the DLL):
    www.sanmayce.com/Nakamichi/TEXTORAMIC_Decompression_Showdown_2020-May-26.pdf

    _MakeELF_Nakamichi_GCC.sh:
    Code:
    gcc -O3 -static -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Vanilla_LITE_GCC_64bit.elf -D_N_XMM -D_N_prefetch_4096 -D_N_alone -DHashInBITS=24 -DHashChunkSizeInBITS=24 -DRAMpoolInKB=5120 -DBtreeHEURISTIC -D_POSIX_ENVIRONMENT_ -DLongestLineInclusive=128 -DSpeedUpBuilding=32 -DLITE
    The .C source and how to compile it is attached, enjoy!
     
    Last edited: May 27, 2020
  12. Sanmayce

    Sanmayce New Member

    17
    7
    3
    Jun 9, 2018
    Sofia
    Ratings:
    +22
    Local Time:
    6:40 AM
    Kinda, finally, migrated from Windows to Linux :p
    For the first time I have the linux box I always wanted.
    From now on, I am allowed to dive into command prompt scripts and benchmarking.

    My brother helped me bigtime in setting the Fedora 33, gotta say, I love it, this darkmate theme is all I want - Pluma, Caja and Terminal are as if designed for my taste.

    Also, my textual deduplicator is already finalized! Nakamichi is ready for unseen compression benchmarks, the first one shall be The Definitive Russian language Compression Corpus being 15GB strong, 13555-books_(UTF-8) tarred. It includes full 42 book series, modern Russian prose.

    Another interesting thing is the testmachine that will crunch it:
    Laptop running Fedora 33, Ryzen 4800H, 64GB RAM, entire 2TB Firecuda dedicated as SWAP partition!

    Since, my texttoy is the greediest among all file-to-file compressors, I decided to exploit fully the resources presented by my brother's super laptop, I intend to run Nakamichify_internal-2TB.sh:

    Code (Text):
    sha1sum "$1"
    
    /usr/bin/time -v ./Nakamichi_DD-192_256GB-BTREE-BUFFER.elf "$1" "$1.Nakamichi" 30 1723000 i
    
    ./Nakamichi_DD-192_256GB-BTREE-BUFFER.elf "$1.Nakamichi">"$1.NKMCH"
    
    sha1sum "$1.NKMCH"
    
    ./Nakamichi_DD-192_256GB-BTREE-BUFFER.elf "$1.Nakamichi" /bench
    
    /usr/bin/time -v ./Nakamichi_DD-192_256GB-BTREE-BUFFER.elf "$1.Nakamichi" > /dev/null
    perf stat -d ./Nakamichi_DD-192_256GB-BTREE-BUFFER.elf "$1.Nakamichi" > /dev/null
    
    ls -l $1*
    

    As you can see, three extreme features are activated:
    - at run-time, hash size pool being 30bit, yes 1 billion B-trees per keysize, crazy;
    - at compile-time, the additional buffer for building B-trees is boosted by 262,144 MB (thus passes for building are brutally lowered, most likely to 1 pass);
    - at run-time, 1,723,000 MB malloc()-ed, thus we will witness how virtual memory is managed by the latest kernel, in one heavy scenario.

    The preprocessor command for doing this is: -DSpeedUpBuilding=262144

    @eva2000 Despite this topic not being related directly to the forum, please allow me to share the results here. I am new to linux, however I am actively starting to use it, in a way, other *nix users could see something of interest.

    Glad to share the latest and LAST Nakamichi, attached, GCC 10.2 was used, the compile scripts are included:
    Code (Text):
    E:\Nakamichi_2021-Mar-12>dir
    
    03/13/2021  09:34 PM               114 BZIP2ify.sh
    03/13/2021  09:34 PM               344 Nakamichify_internal-128GB.sh
    03/13/2021  09:34 PM               241 ZSTDify.sh
    
    03/13/2021  09:34 PM         1,491,168 Nakamichi_Ryuugan-ditto-1TB_btree.c
    03/13/2021  09:34 PM             2,137 _MakeELF_Nakamichi_GCC.sh
    03/13/2021  09:34 PM           857,952 Nakamichi_DD-192.elf
    

    Just did the GLIBC showdown:
    Code:
    Testmachine:  laptop 'Compressionette' i5-7200U max turbo 3.1GHz, 36GB DDR4 2133MHz, Fedora 33
    Testfile: glibc-2.32.tar (226,877,440 bytes)
    
    +----------------------------+---------------+-------------------------+----------------+----------------+
    | Decompressor, MAX settings |          Size | Decompression Wall Time |   instructions | insn per cycle |
    +----------------------------+---------------+-------------------------+----------------+----------------+
    | Zstd 1.4.9                 |    18,813,175 |               0.315 sec |  1,355,290,139 |           2.09 |
    | Nakamichi 'Double-Deuce'   |    32,595,292 |               0.218 sec |    366,715,085 |           0.94 |
    | bzip2                      |    23,982,240 |               5.408 sec | 17,918,477,508 |           1.09 |
    +----------------------------+---------------+-------------------------+----------------+----------------+
    
    Compression lines:
    /usr/bin/time -v zstd --single-thread --ultra -22 --long=31 --zstd=wlog=31,clog=30,hlog=30,slog=26 "$1" -o "$1.L22W2GB.zst"
    /usr/bin/time -v ./Nakamichi_DD-192.elf "$1" "$1.Nakamichi" 26 123000 i
    /usr/bin/time -v bzip2 -9 -k "$1"
    
    Decompression lines:
    perf stat -d zstd -d --long=31 "$1.L22W2GB.zst" -o /dev/null
    perf stat -d ./Nakamichi_DD-192.elf "$1.Nakamichi" > /dev/null
    cat "$1.bz2" | perf stat -d bzip2 -d -k > /dev/null
    
    +----------------+----------------+----------------+
    | Zstd 1.4.9     | Nakamichi 'DD' | bzip2 1.0.8    |
    +----------------+----------------+----------------+-------------------------+
    |          0.990 |          0.995 |          1.000 | CPUs utilized           |
    |           2.02 |           0.94 |           1.09 | insn per cycle          |
    |         312.01 |         217.46 |       5,407.13 | msec task-clock:u       |
    |  1,368,619,089 |    366,715,085 | 17,918,477,508 | instructions:u          |
    |    155,243,093 |     54,632,574 |  3,526,925,660 | branches:u              |
    |      4,887,072 |      6,490,361 |    127,177,075 | branch-misses:u         |
    |    380,228,246 |     35,730,747 |  4,344,466,576 | L1-dcache-loads:u       |
    |     19,137,410 |      7,682,673 |    187,276,863 | L1-dcache-load-misses:u |
    |      1,628,108 |      3,365,779 |     79,885,744 | LLC-loads:u             |
    |        286,959 |        829,414 |      9,377,915 | LLC-load-misses:u       |
    +----------------+----------------+----------------+-------------------------+
    
    Yesterday, Yann confirmed above settings were the MAX ones:
    In 1.4.9, decoding failed having combined --long=31 with --zstd=wlog=31 · Issue #2530 · facebook/zstd

    Currently, I am beginning (the long postponed and awaited) SUPRAPIG quest:
    The Zennish Microdeduplicator: Home of Nakamichi

    Before crunching the 15GB Russian supercorpus, I will feed these two to the monster:

    - 500 MB (524,794,368 bytes) SUPRAPIG_The_Prague_Stringology_Club_(302-PostScript-files).tar (The Prague Stringology Club), sha1sum: 79b9c48428daa2c7bd606df7f139423bd2c9c7d2:
    - 793 MB (832,290,816 bytes) book_serie_SPETSNAZ_(981_UTF-8_novels_Russian).tar, sha1sum: b5a38bcc70ccc77daa2b9c459a9535b2e7660e5e:

    Right below the SUPRAPIG content, you can see the two rosters/tables for above two corpora, filled with BEST-n-FASTEST [de]compressors of today...
     

    Attached Files:

    Last edited: Mar 14, 2021