Discover Centmin Mod today
Register Now

A LZSS microdeduplicator tagetting huge texts, with C source

Discussion in 'System Administration' started by Sanmayce, Jan 14, 2019.

  1. Sanmayce

    Sanmayce New Member

    9
    1
    3
    Jun 9, 2018
    Sofia
    Ratings:
    +8
    Local Time:
    4:18 AM
    Since I enliked the forum and the appreciation of Sysadmin for fast compression, decided to create a thread for my incoming amateurish LZSS compressor - nothing special except it exploits some unseen techniques as using millions of highly optimized B-trees of order 3 (MAX 2 keys, 3 pointers) as a matchfinder.

    My wish is C community (not only *nix, but Windows folks as well) to have one 100% FREE etude of textual decompression. The main goals at the very beginning were:
    - superfast decompression;
    - strong textual compression using 1TB sliding window.

    By exploiting the already fully-tested B-tree implementation of mine, the finding of an key/match could be adjusted to be found in less than 4 RAM fetches! Simply, by enlarging the HASH table (I call it HASHPOT) the number of B-trees could be 4 billion if the whole 32bit of hash are used, usually I use ~130 million B-trees with 27bit slots, where each slot houses the root (8 bytes) of a B-tree, therefore (1<<27)*8=1GB, if enwik9 is to be compressed with its 1- billion keys then the maximum attempts to find a key is less than 4 because no B-tree is higher than 4 levels.

    The nifty thing about my B-tree implementation is that the memory in use can be either internal or external RAM, toggled by one variable. Thus, when some big file, say the Wikipedia XML dump being ~62GB, is to be compressed, the matchfinder will use one external pool (a file of its own) to house B-trees, one 512GB SSD allows ~500GB virtual RAM, in the past I have made some benchmarks with this codepath and the speeds were nearing the 11,000 IOPS - reading&writing small (512 bytes) packets RANDOMLY across 60GB of the SSD!

    To share my current excitement of the incoming Nakamichi_Ryuugan-ditto-1TB_btree.c, I used the Leprechaun movies as inspiration:

    Incoming_btree_ugly.png

    The current revision (attached) has the B-tree included but still not integrated with the parser (to replace the ultraslow memmem() approach).
    I guess, by the end of the month, a working revision will be ready, to be shared in here...

    Code (Text):
    D:\_KAZE_unsorted\N1TB_bigtime_b-tree>"Nakamichi_Ryuugan-ditto-1TB_(1xQWORD+1xXMM)_Intel_15.0_64bit_archSSE41.exe" The_Complete_Sherlock_Holmes_-_Doyle_Arthur_Conan.txt
    Nakamichi 'Ryuugan-ditto-1TB', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
    Note0: Nakamichi 'Dragoneye' is 100% FREE, licenseless that is.
    Note1: Hamid Buzidi's LzTurbo ([a] FASTEST [Textual] Decompressor, Levels 19/29/39) retains kingship, his TurboBench (2017-Apr-07) proves the supremacy of LzTurbo, Turbo-Amazing!
    Note2: Conor Stokes' LZSSE2 ([a] FASTEST Textual Decompressor, Level 17) is embedded, all credits along with many thanks go to him.
    Note3: 'Ryuugan' predecessors are Washigan, Okamigan, Zato, Tsubame, Tengu-Tsuyo, Tengu, Rakka, Kokuen, Kinroba, Yoko, Kinutora, Jiten, Butsuhira, Suiken, Keigan, Kumataka, Washi, Aratama, Hitomi, Nek
    omata, Kitsune, Kinezumi, Sanbashi, Kaiko, Inazuma, Zangetsu, Hanabi, Hanazakari, Sanshi and Kaidanji.
    Note4: This variant is the SLOWEST compressor under the Sun! This compile can handle files up to 5120MB.
    Note5: The matchfinder is either 'Railgun_Trolldom' (matches longer than 18) or Leprechaun's B-tree order 3 (matches less or equal to 18).
    Note6: Instead of '_mm_loadu_si128' '_mm_lddqu_si128' is used.
    Note7: The lookahead 'Tsuyo' heuristic which looks one char ahead is applied thrice, still not strengthened, though.
    Note8: The compile made 2017-Oct-22, the decompression time measuring is done in 16x8 passes choosing the top score from 64 back-to-back runs - the goal - to enter [maximal] Turbo Mode.
    Note9: In GP/SSE4.1/AVX2 compile, the 24 matches become 3xQWORD/1xQWORD+1xXMMWORD/1xYMMWORD.
    NoteA: Maximum compression ratio is 44:1, for 704 bytes long matches within 1TB Sliding Window.
    NoteB: Please send me (at [email protected]) decompression results obtained on machines with fast CPU-RAM subsystems.
    NoteC: In this latest (2019-Jan-13) compile, clock() was replaced with time() - to counter bigtime stats misreporting.
    Current priority class is REALTIME_PRIORITY_CLASS.
    Allocating Source-Buffer 3 MB ...
    Allocating Source-Buffer 3 MB (REVERSED) ...
    Allocating Target-Buffer 35 MB ...
    Allocating Verification-Buffer 3 MB ...
    Leprechaun: Memory pool for B-tress is 1,612,048 KB.
    Leprechaun: In this revision 8MB 1-way hash is used which results in 1,048,576 internal B-Trees of order 3.
    Leprechaun: In this revision, 1 pass is to be made.
    Leprechaun: Allocating HASH memory 8,388,673 bytes ... OK
    Leprechaun: Allocating memory for B-tress 1575 MB ... OK
    Leprechaun: Size of input file: 3,714,387
    
    Leprechaun: Inserting keys/BBs of order 004 into B-trees, free RAM in B-tree pool is 1,612,048 KB ... OK
    Leprechaun: Number Of Distinct keys/BBs (all orders): 59,081
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 54,297
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 673
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 1
    Leprechaun: Used value for third parameter in KB: 1,612,048
    Leprechaun: Use next time as third parameter: 6,979
    Leprechaun: Total Attempts to Find/Put keys into B-trees order 3: 266,958
    
    Leprechaun: Inserting keys/BBs of order 006 into B-trees, free RAM in B-tree pool is 1,605,069 KB ... OK
    Leprechaun: Number Of Distinct keys/BBs (all orders): 432,623
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 307,140
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 51,165
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 2
    Leprechaun: Used value for third parameter in KB: 1,612,048
    Leprechaun: Use next time as third parameter: 45,574
    Leprechaun: Total Attempts to Find/Put keys into B-trees order 3: 1,066,896
    
    Leprechaun: Inserting keys/BBs of order 008 into B-trees, free RAM in B-tree pool is 1,566,474 KB ... OK
    Leprechaun: Number Of Distinct keys/BBs (all orders): 1,473,624
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 739,168
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 368,120
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 2
    Leprechaun: Used value for third parameter in KB: 1,612,048
    Leprechaun: Use next time as third parameter: 147,155
    Leprechaun: Total Attempts to Find/Put keys into B-trees order 3: 2,509,474
    
    Leprechaun: Inserting keys/BBs of order 010 into B-trees, free RAM in B-tree pool is 1,464,893 KB ... OK
    Leprechaun: Number Of Distinct keys/BBs (all orders): 3,304,694
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 910,611
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 1,234,656
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 3
    Leprechaun: Used value for third parameter in KB: 1,612,048
    Leprechaun: Use next time as third parameter: 325,664
    Leprechaun: Total Attempts to Find/Put keys into B-trees order 3: 5,424,374
    
    Leprechaun: Inserting keys/BBs of order 012 into B-trees, free RAM in B-tree pool is 1,286,384 KB ... OK
    Leprechaun: Number Of Distinct keys/BBs (all orders): 5,811,731
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 1,043,037
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 1,806,063
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 3
    Leprechaun: Used value for third parameter in KB: 1,612,048
    Leprechaun: Use next time as third parameter: 571,761
    Leprechaun: Total Attempts to Find/Put keys into B-trees order 3: 8,813,528
    
    Leprechaun: Inserting keys/BBs of order 014 into B-trees, free RAM in B-tree pool is 1,040,287 KB ... OK
    Leprechaun: Number Of Distinct keys/BBs (all orders): 8,795,024
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 1,047,790
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 2,339,050
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 3
    Leprechaun: Used value for third parameter in KB: 1,612,048
    Leprechaun: Use next time as third parameter: 869,314
    Leprechaun: Total Attempts to Find/Put keys into B-trees order 3: 12,690,670
    
    Leprechaun: Inserting keys/BBs of order 016 into B-trees, free RAM in B-tree pool is 742,734 KB ... OK
    Leprechaun: Number Of Distinct keys/BBs (all orders): 12,078,810
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 1,048,551
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 2,574,284
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 3
    Leprechaun: Used value for third parameter in KB: 1,612,048
    Leprechaun: Use next time as third parameter: 1,196,224
    Leprechaun: Total Attempts to Find/Put keys into B-trees order 3: 16,700,443
    
    Leprechaun: Inserting keys/BBs of order 018 into B-trees, free RAM in B-tree pool is 415,824 KB ... OK
    Leprechaun: Number Of Distinct keys/BBs (all orders): 15,542,161
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 1,048,573
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 2,700,527
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 3
    Leprechaun: Used value for third parameter in KB: 1,612,048
    Leprechaun: Use next time as third parameter: 1,539,067
    Leprechaun: Total Attempts to Find/Put keys into B-trees order 3: 20,645,236
    
    ...

    In parallel I am running a medium size (for now) TEXTORAMIC (as the Japanorama documentary, but adjectival Textorama) benchmark:
    TEXTORAMIC_Decompression_Showdown_2018-Dec-31.pdf
     

    Attached Files:

    • Winner Winner x 1
  2. Sanmayce

    Sanmayce New Member

    9
    1
    3
    Jun 9, 2018
    Sofia
    Ratings:
    +8
    Local Time:
    4:18 AM
    Glad to share (attached) a revision compilable with GCC. On Windows, after installing the portable (ready-to-go) MinGW the compile lines are:

    Code (Text):
    C:\>cd MinGW
    
    C:\MinGW>dir
     Volume in drive C has no label.
     Volume Serial Number is 7E3E-0444
    
     Directory of C:\MinGW
    
    05/03/2018  05:11 PM    <DIR>          .
    05/03/2018  05:11 PM    <DIR>          ..
    05/03/2018  05:08 PM    <DIR>          bin
    05/03/2018  05:09 PM    <DIR>          git
    05/03/2018  05:11 PM    <DIR>          include
    05/03/2018  05:11 PM    <DIR>          lib
    05/03/2018  05:11 PM    <DIR>          libexec
    03/18/2012  09:45 PM               709 open_distro_window.bat
    02/26/2018  10:34 PM            18,216 README_STL.txt
    05/03/2018  05:11 PM    <DIR>          scripts-15.4
    07/30/2017  05:05 PM               780 set_distro_paths.bat
    05/03/2018  05:11 PM    <DIR>          x86_64-w64-mingw32
    
    C:\MinGW>set_distro_paths.bat
    
    C:\MinGW>gcc -v
    Using built-in specs.
    COLLECT_GCC=gcc
    COLLECT_LTO_WRAPPER=c:/mingw/bin/../libexec/gcc/x86_64-w64-mingw32/7.3.0/lto-wrapper.exe
    Target: x86_64-w64-mingw32
    Configured with: ../src/configure --enable-languages=c,c++ --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --disable-multilib --prefix=/c/temp/gcc/dest --with-sysroot=/c/temp/gcc/dest --disable-libstdcxx-pch --disable-libstdcxx-verbose --disable-nls --disable-shared --disable-win32-registry --with-tune=haswell --enable-threads=posix --enable-libgomp
    Thread model: posix
    gcc version 7.3.0 (GCC)
    
    E:\N>dir
    
    11/14/2018  11:29 AM            55,397 lzsse2.cpp
    11/14/2018  11:29 AM             4,858 lzsse2.h
    11/14/2018  11:29 AM             2,481 lzsse2_platform.h
    01/13/2019  01:22 PM           634,462 Nakamichi_Ryuugan-ditto-1TB_btree.c
    01/14/2019  08:21 AM               797 MakeEXEs_Ryuugan-ditto-1TB_btree_GCC.bat
    
    D:\N1TB_bigtime_b-tree>MakeEXEs_Ryuugan-ditto-1TB_btree_GCC.bat
    
    D:\N1TB_bigtime_b-tree>gcc -O3 -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_RAM_GCC730.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HIGH_PRIORITY -DHashInBITS=20 -DHashChunkSizeInBITS=20 -DRAMpoolInKB=4612048
    
    D:\N1TB_bigtime_b-tree>gcc -O3 -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_SSD_GCC730.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HIGH_PRIORITY -DHashInBITS=20 -DHashChunkSizeInBITS=20 -DRAMpoolInKB=4612048 -DExternalRAM
    
    For *nix add:
    -D_POSIX_ENVIRONMENT_


    Above two lines generate an 'SSD' and 'RAM' executables, below, the 'SSD' is under inspection.

    Wanted, as a side-effect i.e. by-product, to have a real-world SSD random read/write of small packets benchmarker...
    Here comes such torturer of 130 bytes long packets, reading-n-writing randomly, it reports real IOPS since it counts the real invocations of each 'fread()' and 'fwrite()' function.

    The testmachine is my laptop with i5-7200u, the SSD is 240GB Kingston A400 (cheapest of all) and connected via USB 2.0, Windows caching has gone out of ... the window:

    Edit 2019-Jan-17:
    Grmbl! I'm sorry for the wrong statistics posted few days ago, had to replace them.
    The culprit was my forgetfulness, did forget that when final traversal was done the pointers were zeroed, it was on purpose back then, nothing wrong actually.

    The first example, the testdataset is '1001 nights' ~15MB long file:
    The "score" is 163,328 IOPS for 402,619,834 'freads()' and 268,170,926 'fwrites()' (of packets 50..106 bytes long) during loading traversing all orders. Roughly, 1.5:1 read vs write operations. For realistic results, the filesize should be > 512MB.

    Code (Text):
    C:\Windows\system32>cd\
    
    C:\>cd MinGW
    
    C:\MinGW>set_distro_paths.bat
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Jan-20>MakeEXEs_Ryuugan-ditto-1TB_btree_GCC.bat
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Jan-20>gcc -O3 -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_RAM_GCC730.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HI
    GH_PRIORITY -DHashInBITS=20 -DHashChunkSizeInBITS=20 -DRAMpoolInKB=6612048
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Jan-20>gcc -O3 -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_SSD_GCC730.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HI
    GH_PRIORITY -DHashInBITS=20 -DHashChunkSizeInBITS=20 -DRAMpoolInKB=6612048 -DExternalRAM
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Jan-20>Nakamichi_Ryuugan-ditto-1TB_RAM_GCC730.exe Arabian_Nights_complete.html
    Nakamichi 'Ryuugan-ditto-1TB', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
    Note0: Nakamichi 'Dragoneye' is 100% FREE, licenseless that is.
    Note1: Hamid Buzidi's LzTurbo ([a] FASTEST [Textual] Decompressor, Levels 19/29/39) retains kingship, his TurboBench (2017-Apr-07) proves the supremacy of LzTurbo, Turbo-Amazing!
    Note2: Conor Stokes' LZSSE2 ([a] FASTEST Textual Decompressor, Level 17) is embedded, all credits along with many thanks go to him.
    Note3: 'Ryuugan' predecessors are Washigan, Okamigan, Zato, Tsubame, Tengu-Tsuyo, Tengu, Rakka, Kokuen, Kinroba, Yoko, Kinutora, Jiten, Butsuhira, Suiken, Keigan, Kumataka, Washi, Aratama, Hitomi, Nekomata, Kitsune, Kinezumi, Sanbashi, Kaiko, Inazuma, Zangetsu, Hanabi, Hanazakari, Sanshi and Kaidanji.
    Note4: This variant is the SLOWEST compressor under the Sun! This compile can handle files up to 5120MB.
    Note5: The matchfinder is either 'Railgun_Trolldom' (matches longer than 18) or Leprechaun's B-tree order 3 (matches less or equal to 18).
    Note6: Instead of '_mm_loadu_si128' '_mm_lddqu_si128' is used.
    Note7: The lookahead 'Tsuyo' heuristic which looks one char ahead is applied thrice, still not strengthened, though.
    Note8: The compile made 2017-Oct-22, the decompression time measuring is done in 16x8 passes choosing the top score from 64 back-to-back runs - the goal - to enter [maximal] Turbo Mode.
    Note9: In GP/SSE4.1/AVX2 compile, the 24 matches become 3xQWORD/1xQWORD+1xXMMWORD/1xYMMWORD.
    NoteA: Maximum compression ratio is 44:1, for 704 bytes long matches within 1TB Sliding Window.
    NoteB: Please send me (at [email protected]) decompression results obtained on machines with fast CPU-RAM subsystems.
    NoteC: In this latest (2019-Jan-20) compile, clock() was replaced with time() - to counter bigtime stats misreporting.
    NoteD: Multi-way hashing allows each KeySize to occupy its own HASH pool, thus less RAM is in use - the LEAF is smaller.
    Current priority class is REALTIME_PRIORITY_CLASS.
    Allocating Source-Buffer 14 MB ...
    Allocating Source-Buffer 14 MB (REVERSED) ...
    Allocating Target-Buffer 46 MB ...
    Allocating Verification-Buffer 14 MB ...
    Leprechaun: Memory pool for B-tress is 6,612,048 KB.
    Leprechaun: In this revision 8MB 8-way hash is used which results in 8 x 1,048,576 internal B-Trees of order 3.
    Leprechaun: In this revision, 1 pass is to be made.
    Leprechaun: Allocating HASH memory 67,108,929 bytes ... OK
    Leprechaun: Allocating memory for B-tress 6458 MB ... OK
    Leprechaun: Size of input file: 15,583,440
    
    Leprechaun: Inserting keys/BBs of order 004 into B-trees, free RAM in B-tree pool is 6,612,048 KB ...
    Leprechaun: Inserting keys/BBs of order 006 into B-trees, free RAM in B-tree pool is 6,604,089 KB ...
    Leprechaun: Inserting keys/BBs of order 008 into B-trees, free RAM in B-tree pool is 6,550,882 KB ...
    Leprechaun: Inserting keys/BBs of order 010 into B-trees, free RAM in B-tree pool is 6,388,585 KB ...
    Leprechaun: Inserting keys/BBs of order 012 into B-trees, free RAM in B-tree pool is 6,055,645 KB ...
    Leprechaun: Inserting keys/BBs of order 014 into B-trees, free RAM in B-tree pool is 5,532,175 KB ...
    Leprechaun: Inserting keys/BBs of order 016 into B-trees, free RAM in B-tree pool is 4,825,455 KB ...
    Leprechaun: Inserting keys/BBs of order 018 into B-trees, free RAM in B-tree pool is 3,954,416 KB ...
    
    Leprechaun: Number Of Total/Distinct/Undistinct keys/BBs (all orders): 124,667,440/55,177,480/69,489,960
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 6,768,818
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 34,942,972
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 4
    Leprechaun: Used value for B-trees pool in KB: 6,612,048
    Leprechaun: Use next time for B-trees pool in KB: 3,672,584
    Leprechaun: Total keys (all orders) into B-trees order 3 (during traversal/dump): 55,177,480
    Leprechaun: Total keys MISSES/HITS (all orders) into B-trees order 3: 55,177,480/69,489,960
    
    Leprechaun: B-trees building speed (counting ROOT LEAFs): 1,069,533 LEAFs/s
    Leprechaun: B-trees traverse speed (counting ROOT LEAFs): 5,958,827 LEAFs/s
    Leprechaun: Total Searches-n-Inserts Per Second: 2,770,387 SNIPS
    Leprechaun: RAM needed to house B-trees (relative to the file being ripped): 241N
    
    Compressing 15,583,440 bytes ...
    ^C
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Jan-20>Nakamichi_Ryuugan-ditto-1TB_ssd_GCC730.exe Arabian_Nights_complete.html
    Nakamichi 'Ryuugan-ditto-1TB', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
    Note0: Nakamichi 'Dragoneye' is 100% FREE, licenseless that is.
    Note1: Hamid Buzidi's LzTurbo ([a] FASTEST [Textual] Decompressor, Levels 19/29/39) retains kingship, his TurboBench (2017-Apr-07) proves the supremacy of LzTurbo, Turbo-Amazing!
    Note2: Conor Stokes' LZSSE2 ([a] FASTEST Textual Decompressor, Level 17) is embedded, all credits along with many thanks go to him.
    Note3: 'Ryuugan' predecessors are Washigan, Okamigan, Zato, Tsubame, Tengu-Tsuyo, Tengu, Rakka, Kokuen, Kinroba, Yoko, Kinutora, Jiten, Butsuhira, Suiken, Keigan, Kumataka, Washi, Aratama, Hitomi, Nekomata, Kitsune, Kinezumi, Sanbashi, Kaiko, Inazuma, Zangetsu, Hanabi, Hanazakari, Sanshi and Kaidanji.
    Note4: This variant is the SLOWEST compressor under the Sun! This compile can handle files up to 5120MB.
    Note5: The matchfinder is either 'Railgun_Trolldom' (matches longer than 18) or Leprechaun's B-tree order 3 (matches less or equal to 18).
    Note6: Instead of '_mm_loadu_si128' '_mm_lddqu_si128' is used.
    Note7: The lookahead 'Tsuyo' heuristic which looks one char ahead is applied thrice, still not strengthened, though.
    Note8: The compile made 2017-Oct-22, the decompression time measuring is done in 16x8 passes choosing the top score from 64 back-to-back runs - the goal - to enter [maximal] Turbo Mode.
    Note9: In GP/SSE4.1/AVX2 compile, the 24 matches become 3xQWORD/1xQWORD+1xXMMWORD/1xYMMWORD.
    NoteA: Maximum compression ratio is 44:1, for 704 bytes long matches within 1TB Sliding Window.
    NoteB: Please send me (at [email protected]) decompression results obtained on machines with fast CPU-RAM subsystems.
    NoteC: In this latest (2019-Jan-20) compile, clock() was replaced with time() - to counter bigtime stats misreporting.
    NoteD: Multi-way hashing allows each KeySize to occupy its own HASH pool, thus less RAM is in use - the LEAF is smaller.
    Current priority class is REALTIME_PRIORITY_CLASS.
    Allocating Source-Buffer 14 MB ...
    Allocating Source-Buffer 14 MB (REVERSED) ...
    Allocating Target-Buffer 46 MB ...
    Allocating Verification-Buffer 14 MB ...
    Leprechaun: Memory pool for B-tress is 6,612,048 KB.
    Leprechaun: In this revision 8MB 8-way hash is used which results in 8 x 1,048,576 external B-Trees of order 3.
    Leprechaun: In this revision, 1 pass is to be made.
    Leprechaun: Allocating HASH memory 67,108,929 bytes ... OK
    Leprechaun: Allocating/ZEROing 6,770,737,166 bytes swap file ... OK
    Leprechaun: Size of input file: 15,583,440
    
    Leprechaun: Inserting keys/BBs of order 004 into B-trees, free RAM in B-tree pool is 6,612,048 KB ...
    Leprechaun: Inserting keys/BBs of order 006 into B-trees, free RAM in B-tree pool is 6,604,089 KB ...
    Leprechaun: Inserting keys/BBs of order 008 into B-trees, free RAM in B-tree pool is 6,550,882 KB ...
    Leprechaun: Inserting keys/BBs of order 010 into B-trees, free RAM in B-tree pool is 6,388,585 KB ...
    Leprechaun: Inserting keys/BBs of order 012 into B-trees, free RAM in B-tree pool is 6,055,645 KB ...
    Leprechaun: Inserting keys/BBs of order 014 into B-trees, free RAM in B-tree pool is 5,532,175 KB ...
    Leprechaun: Inserting keys/BBs of order 016 into B-trees, free RAM in B-tree pool is 4,825,455 KB ...
    Leprechaun: Inserting keys/BBs of order 018 into B-trees, free RAM in B-tree pool is 3,954,416 KB ...
    
    Leprechaun: Number Of Total/Distinct/Undistinct keys/BBs (all orders): 124,667,440/55,177,480/69,489,960
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 6,768,818
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 34,942,972
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 4
    Leprechaun: Used value for B-trees pool in KB: 6,612,048
    Leprechaun: Use next time for B-trees pool in KB: 3,672,584
    Leprechaun: Total keys (all orders) into B-trees order 3 (during traversal/dump): 55,177,480
    Leprechaun: Total keys MISSES/HITS (all orders) into B-trees order 3: 55,177,480/69,489,960
    
    Leprechaun: B-trees building speed (counting ROOT LEAFs): 13,322 LEAFs/s
    Leprechaun: B-trees traverse speed (counting ROOT LEAFs): 42,693 LEAFs/s
    Leprechaun: Total Searches-n-Inserts Per Second: 30,354 SNIPS
    Leprechaun: RAM needed to house B-trees (relative to the file being ripped): 241N
    Leprechaun: Total IOPS for 402,619,834 'freads' and 268,170,926 'fwrites' (of packets 106 bytes long) during loading traversing all orders: 163,328 IOPS
    
    Compressing 15,583,440 bytes ...
    ^C


    Had I have $999.99 I would buy the ideal choice for such virtual RAM loads - Samsung SSD 860 PRO 2.5" SATA III 4TB:
    Random Read (4KB, QD1): Up to 11,000 IOPS Random Read
    Random Write (4KB, QD1): Up to 43,000 IOPS Random Write
    [​IMG]
    Currently, I have the smallest model - Samsung 256GB 860 PRO, all the benchmarking done on this model will be more informative since it has the smallest cache - thus when the tested files are big enough they will torture the uncached regions.

    CACHE MEMORY
    512 MB Low Power DDR4 (256 GB, 512 GB)
    1 GB Low Power DDR4 (1,024 GB)
    2 GB Low Power DDR4 (2,048 GB)
    4 GB Low Power DDR4 (4,096 GB)

    What an atrocity, nowadays machines still go with 4GB Main RAM, while the drives already have 4GB?

    My eyes are locked onto future machines, long ago I decided to write C etudes disregarding the current conjuncture/nomenclature and utilize RAM and SSD space as if they are in spades! I hate [seeing] crippling some etude for sake of fitting in the status quo, think for a moment, what happens when next decade brings 1TB DIMMs or/and 16TB SSDs as mainstream! What will all those "blind" for the available resources tools/implementations do? Suck, as I see it.

    Too often, frightened people ask me why the scary evil elf is my logo/talisman, first off, every decent ripper doesn't sugarcoat - it rips brutally, hence the ... adequate choice of mine, moreover Leprechaun is a ruler of the forest/trees, he is a master-rhymer, a trickster, and mostly a brutal master of wizardcraft.

    Incoming_btree_ugly2.png
    Besides, the idiom "as hell" (slang, to the maximum degree) is unknown to them, usually.
    I find the naming convention for foreground/background programs in *nix very on point - the elves/daemons carry the spirit of magicalness.

    Incoming_btree_ugly3.png

    Ripping fast as hell, that's what I envision.

    Or as Tarja sings "The elf-folk is calling me":
    "The Elvenpath
    It's the honesty of these worlds
    Ruled by magic and mighty swords
    That makes my soul long for the past"

    Edit 2019-Jan-20:
    Few changes and a bugfix (a missed check for allocating External RAM) were done:

    The second example, the testdataset is 'Complete Dickens' ~41MB long file:
    The "score" is 67,632 IOPS for 1,320,112,595 'freads()' and 735,431,907 'fwrites()' (of packets 50..106 bytes long) during loading traversing all orders. Roughly, 1.8:1 read vs write operations. For realistic results, the filesize should be > 512MB.

    Code (Text):
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Jan-20>MakeEXEs_Ryuugan-ditto-1TB_btree_GCC.bat
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Jan-20>gcc -O3 -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_RAM_GCC730.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HI
    GH_PRIORITY -DHashInBITS=20 -DHashChunkSizeInBITS=20 -DRAMpoolInKB=13612048
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Jan-20>gcc -O3 -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_SSD_GCC730.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HI
    GH_PRIORITY -DHashInBITS=20 -DHashChunkSizeInBITS=20 -DRAMpoolInKB=13612048 -DExternalRAM
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Jan-20>Nakamichi_Ryuugan-ditto-1TB_ssd_GCC730.exe Complete_Works_of_Charles_Dickens.txt
    Nakamichi 'Ryuugan-ditto-1TB', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
    Note0: Nakamichi 'Dragoneye' is 100% FREE, licenseless that is.
    Note1: Hamid Buzidi's LzTurbo ([a] FASTEST [Textual] Decompressor, Levels 19/29/39) retains kingship, his TurboBench (2017-Apr-07) proves the supremacy of LzTurbo, Turbo-Amazing!
    Note2: Conor Stokes' LZSSE2 ([a] FASTEST Textual Decompressor, Level 17) is embedded, all credits along with many thanks go to him.
    Note3: 'Ryuugan' predecessors are Washigan, Okamigan, Zato, Tsubame, Tengu-Tsuyo, Tengu, Rakka, Kokuen, Kinroba, Yoko, Kinutora, Jiten, Butsuhira, Suiken, Keigan, Kumataka, Washi, Aratama, Hitomi, Nekomata, Kitsune, Kinezumi, Sanbashi, Kaiko, Inazuma, Zangetsu, Hanabi, Hanazakari, Sanshi and Kaidanji.
    Note4: This variant is the SLOWEST compressor under the Sun! This compile can handle files up to 5120MB.
    Note5: The matchfinder is either 'Railgun_Trolldom' (matches longer than 18) or Leprechaun's B-tree order 3 (matches less or equal to 18).
    Note6: Instead of '_mm_loadu_si128' '_mm_lddqu_si128' is used.
    Note7: The lookahead 'Tsuyo' heuristic which looks one char ahead is applied thrice, still not strengthened, though.
    Note8: The compile made 2017-Oct-22, the decompression time measuring is done in 16x8 passes choosing the top score from 64 back-to-back runs - the goal - to enter [maximal] Turbo Mode.
    Note9: In GP/SSE4.1/AVX2 compile, the 24 matches become 3xQWORD/1xQWORD+1xXMMWORD/1xYMMWORD.
    NoteA: Maximum compression ratio is 44:1, for 704 bytes long matches within 1TB Sliding Window.
    NoteB: Please send me (at [email protected]) decompression results obtained on machines with fast CPU-RAM subsystems.
    NoteC: In this latest (2019-Jan-20) compile, clock() was replaced with time() - to counter bigtime stats misreporting.
    NoteD: Multi-way hashing allows each KeySize to occupy its own HASH pool, thus less RAM is in use - the LEAF is smaller.
    Current priority class is REALTIME_PRIORITY_CLASS.
    Allocating Source-Buffer 40 MB ...
    Allocating Source-Buffer 40 MB (REVERSED) ...
    Allocating Target-Buffer 72 MB ...
    Allocating Verification-Buffer 40 MB ...
    Leprechaun: Memory pool for B-tress is 13,612,048 KB.
    Leprechaun: In this revision 8MB 8-way hash is used which results in 8 x 1,048,576 external B-Trees of order 3.
    Leprechaun: In this revision, 1 pass is to be made.
    Leprechaun: Allocating HASH memory 67,108,929 bytes ... OK
    Leprechaun: Allocating/ZEROing 13,938,737,166 bytes swap file ... OK
    Leprechaun: Size of input file: 42,935,960
    
    Leprechaun: Inserting keys/BBs of order 004 into B-trees, free RAM in B-tree pool is 13,612,048 KB ...
    Leprechaun: Inserting keys/BBs of order 006 into B-trees, free RAM in B-tree pool is 13,605,572 KB ...
    Leprechaun: Inserting keys/BBs of order 008 into B-trees, free RAM in B-tree pool is 13,546,308 KB ...
    Leprechaun: Inserting keys/BBs of order 010 into B-trees, free RAM in B-tree pool is 13,285,138 KB ...
    Leprechaun: Inserting keys/BBs of order 012 into B-trees, free RAM in B-tree pool is 12,604,428 KB ...
    Leprechaun: Inserting keys/BBs of order 014 into B-trees, free RAM in B-tree pool is 11,354,612 KB ...
    Leprechaun: Inserting keys/BBs of order 016 into B-trees, free RAM in B-tree pool is 9,485,965 KB ...
    Leprechaun: Inserting keys/BBs of order 018 into B-trees, free RAM in B-tree pool is 7,061,527 KB ...
    
    Leprechaun: Number Of Total/Distinct/Undistinct keys/BBs (all orders): 343,487,600/139,455,560/204,032,040
    Leprechaun: Number Of TREEs(GREATER THE BETTER): 6,869,619
    Leprechaun: Number Of LEAFs(littler THE BETTER) not counting ROOT LEAFs: 97,898,274
    Leprechaun: Highest Tree not counting ROOT Level i.e. CORONA levels(littler THE BETTER): 5
    Leprechaun: Used value for B-trees pool in KB: 13,612,048
    Leprechaun: Use next time for B-trees pool in KB: 9,440,253
    Leprechaun: Total keys (all orders) into B-trees order 3 (during traversal/dump): 139,455,560
    Leprechaun: Total keys MISSES/HITS (all orders) into B-trees order 3: 139,455,560/204,032,040
    
    Leprechaun: B-trees building speed (counting ROOT LEAFs): 5,116 LEAFs/s
    Leprechaun: B-trees traverse speed (counting ROOT LEAFs): 10,564 LEAFs/s
    Leprechaun: Total Searches-n-Inserts Per Second: 11,301 SNIPS
    Leprechaun: RAM needed to house B-trees (relative to the file being ripped): 225N
    Leprechaun: Total IOPS for 1,320,112,595 'freads' and 735,431,907 'fwrites' (of packets 106 bytes long) during loading traversing all orders: 67,632 IOPS
    
    Compressing 42,935,960 bytes ...
    ^C


    Looking at the stats above, maximum 5 random seeks/reads are needed to find a key into 139,455,560 keys. In this revision the LEAF size was reduced, more economically the RAM is used. Having IOPS is a must, but the more telling stats are SNIPS:
    Total Searches-n-Inserts Per Second: 11,301 SNIPS
    It says how many actual FINDINGS you can do.
     

    Attached Files:

    Last edited: Jan 21, 2019
  3. eva2000

    eva2000 Administrator Staff Member

    38,695
    8,548
    113
    May 24, 2014
    Brisbane, Australia
    Ratings:
    +13,131
    Local Time:
    12:18 PM
    Nginx 1.15.x
    MariaDB 5.5/10.x
    thanks for sharing ?? do you have you project up on Github.com that we can follow ?
     
  4. Sanmayce

    Sanmayce New Member

    9
    1
    3
    Jun 9, 2018
    Sofia
    Ratings:
    +8
    Local Time:
    4:18 AM
    >thanks for sharing ?? do you have you project up on Github.com that we can follow ?

    My pleasure, sharing is better than receiving - an old truth is that.

    If you don't mind this thread will be the initial/development/benchmark page for Nakamichi featuring a B-tree matchfinder.
    Since I don't run a *nix system, it would be nice Nakamichi to have an alternative home on an appreciative *nix site.
    My development environment is Windows XP paired with Intel C compiler v15. I'm kinda disappointed by unappreciativeness shown left and right, most "programmers" don't let the numbers speak, optimizing basic etudes is ... underneath their gravity. Have had my bitter experience with some old "all knowing" PROs on a bigsite, they several times confronted and refused to make visible my simple etudes, the last one was my 256bit Assembly branchless LCSS - it failed to fit their bills. I said "So be it."

    As for GitHub, I used it only to share not to promote.
    To me, GitHub and such are for PROs, I will die an amateur, nothing can change that.

    Even now, before even mixing the B-trees into Compress() function, the C source from the previous post serves well as a benchmarker. That's what I love about practical etudes, they allow answering many questions, such as:
    Q: - How many times an SSD performance is lower than RAM?
    A: - Surprisingly, only 2,833,350 SNIPS / 38,101 SNIPS = 74x, for 'Arabian_Nights_complete.html' 15,583,440 bytes long.

    announce3.png
    The '1001 Nights' testdataset is downloadable at:
    The Book of The Thousand Nights and a Night


    For all who want to have a hardcopy of the source, this is mine, the latest DRAFT:
    Nakamichi_Ryuugan-ditto-1TB_btree_source.pdf
     
    Last edited: Jan 18, 2019
  5. Sanmayce

    Sanmayce New Member

    9
    1
    3
    Jun 9, 2018
    Sofia
    Ratings:
    +8
    Local Time:
    4:18 AM
    Gladly I'm sharing the first fully operational Nakamichi powered by my Leprechaunish B-tree matchfinder.

    Code (Text):
    C:\>cd MinGW
    
    C:\MinGW>set_distro_paths.bat
    
    C:\MinGW>d:
    
    D:\>cd Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Feb-02_
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Feb-02_>MakeEXEs_Ryuugan-ditto-1TB_btree_GCC.bat
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Feb-02_>gcc -O3 -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_RAM_GCC730.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HIGH_PRIORITY -DHashInBITS=24 -DHashChunkSizeInBITS=24  -DRAMpoolInKB=5612048 -DBtreeHEURISTIC
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Feb-02_>gcc -O3 -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_SSD_GCC730.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HIGH_PRIORITY -DHashInBITS=24 -DHashChunkSizeInBITS=24  -DRAMpoolInKB=5612048 -DExternalRAM -DBtreeHEURISTIC
    
    For *nix add:
    -D_POSIX_ENVIRONMENT_
    
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Feb-02_>Nakamichi_Ryuugan-ditto-1TB_RAM_GCC730.exe Arabian_Nights_complete.html
    Nakamichi 'Ryuugan-ditto-1TB', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
    Note0: Nakamichi 'Dragoneye' is 100% FREE, licenseless that is.
    Note1: Hamid Buzidi's LzTurbo ([a] FASTEST [Textual] Decompressor, Levels 19/29/39) retains kingship, his TurboBench (2017-Apr-07) proves the supremacy of LzTurbo, Turbo-Amazing!
    Note2: Conor Stokes' LZSSE2 ([a] FASTEST Textual Decompressor, Level 17) is embedded, all credits along with many thanks go to him.
    Note3: 'Ryuugan' predecessors are Washigan, Okamigan, Zato, Tsubame, Tengu-Tsuyo, Tengu, Rakka, Kokuen, Kinroba, Yoko, Kinutora, Jiten, Butsuhira, Suiken, Keigan, Kumataka, Washi, Aratama, Hitomi, Nekomata, Kitsune, Kinezumi, Sanbashi, Kaiko, Inazuma, Zangetsu, Hanabi, Hanazakari, Sanshi and Kaidanji.
    Note4: This variant is the SLOWEST compressor under the Sun! This compile can handle files up to 5120MB.
    Note5: The matchfinder is either 'Railgun_Trolldom' (matches longer than 18) or Leprechaun's B-tree order 3 (matches less or equal to 18).
    Note6: Instead of '_mm_loadu_si128' '_mm_lddqu_si128' is used.
    Note7: The lookahead 'Tsuyo' heuristic which looks one char ahead is applied thrice, still not strengthened, though.
    Note8: The compile made 2017-Oct-22, the decompression time measuring is done in 16x8 passes choosing the top score from 64 back-to-back runs - the goal - to enter [maximal] Turbo Mode.
    Note9: In GP/SSE4.1/AVX2 compile, the 24 matches become 3xQWORD/1xQWORD+1xXMMWORD/1xYMMWORD.
    NoteA: Maximum compression ratio is 44:1, for 704 bytes long matches within 1TB Sliding Window.
    NoteB: Please send me (at [email protected]) decompression results obtained on machines with fast CPU-RAM subsystems.
    NoteC: In this compile, clock() was replaced with time() - to counter bigtime stats misreporting.
    NoteD: Multi-way hashing allows each KeySize to occupy its own HASH pool, thus less RAM is in use - the LEAF is smaller.
    NoteE: In this revision, B-tree heuristics are in use, allowing skipping many unnecessary memmem() invocations.
    NoteF: In this latest (2019-Feb-02) compile, B-tree is the primary matchfinder, memmem() is used for 18+ matches, and for marginal cases.
    NoteG: The file being compressed should be 18 bytes or longer due to Building-Blocks being in range 4..18.
    Current priority class is HIGH_PRIORITY_CLASS.
    Allocating Source-Buffer 14 MB ...
    Allocating Source-Buffer 14 MB (REVERSED) ...
    Allocating Target-Buffer 46 MB ...
    Allocating Verification-Buffer 14 MB ...
    Leprechaun: Memory pool for B-tress is 5,612,048 KB.
    Leprechaun: In this revision 128MB 8-way hash is used which results in 8 x 16,777,216 internal B-Trees of order 3.
    Leprechaun: In this revision, 1 pass is to be made.
    Leprechaun: Allocating HASH memory 1,073,741,889 bytes ... OK
    Leprechaun: Allocating memory for B-tress 5481 MB ... OK
    Leprechaun: Size of input file: 15,583,440
    
    Leprechaun: Inserting keys/BBs of order 004 into B-trees, free RAM in B-tree pool is 5,612,048 KB ...
    Leprechaun: Inserting keys/BBs of order 006 into B-trees, free RAM in B-tree pool is 5,602,086 KB ...
    Leprechaun: Inserting keys/BBs of order 008 into B-trees, free RAM in B-tree pool is 5,533,534 KB ...
    Leprechaun: Inserting keys/BBs of order 010 into B-trees, free RAM in B-tree pool is 5,328,114 KB ...
    Leprechaun: Inserting keys/BBs of order 012 into B-trees, free RAM in B-tree pool is 4,918,554 KB ...
    Leprechaun: Inserting keys/BBs of order 014 into B-trees, free RAM in B-tree pool is 4,288,798 KB ...
    Leprechaun: Inserting keys/BBs of order 016 into B-trees, free RAM in B-tree pool is 3,451,465 KB ...
    Leprechaun: Inserting keys/BBs of order 018 into B-trees, free RAM in B-tree pool is 2,435,992 KB ...
    Leprechaun: Total Searches-n-Inserts Per Second: 4,617,312 SNIPS
    Leprechaun: RAM needed to house B-trees (relative to the file being ripped): 285N
    
    Compressing 15,583,440 bytes ...
    /; Each rotation means 64KB are encoded; Speed: 0,011,577 B/s; Done 100%; Compression Ratio: 3.09:1; Matches(24/32/48): 163,277/39,583/2,722; 128[+] long matches: 959; ETA: 0.00 days
    NumberOfFullLiterals (lower-the-better): 749
    Tsuyo_HEURISTIC_APPLIED_thrice_back-to-back: 0
    NumberOf(Tiny)Matches[Micro]Window (4)[16B]: 5865
    NumberOfMatches[Bheema]Window [128GB window]: 1058
    RAM-to-RAM performance: 11577 B/s.
    Compressed to 5,039,053 bytes.
    Source-file-Hash(FNV1A_YoshimitsuTRIAD) = 0x065e,e94a
    Target-file-Hash(FNV1A_YoshimitsuTRIAD) = 0xb651,2c30
    Decompressing 5,039,053 (being the compressed stream) bytes ...
    RAM-to-RAM performance: 681 MB/s.
    Verification (input and output sizes match) OK.
    Verification (input and output blocks match) OK.
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Feb-02_>
    


    Right now waiting for SSD compile to finish, wanna report the SSD vs RAM actual speeds...

    Code (Text):
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Feb-02_>Nakamichi_Ryuugan-ditto-1TB_ssd_GCC730.exe Arabian_Nights_complete.html
    Nakamichi 'Ryuugan-ditto-1TB', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
    Note0: Nakamichi 'Dragoneye' is 100% FREE, licenseless that is.
    Note1: Hamid Buzidi's LzTurbo ([a] FASTEST [Textual] Decompressor, Levels 19/29/39) retains kingship, his TurboBench (2017-Apr-07) proves the supremacy of LzTurbo, Turbo-Amazing!
    Note2: Conor Stokes' LZSSE2 ([a] FASTEST Textual Decompressor, Level 17) is embedded, all credits along with many thanks go to him.
    Note3: 'Ryuugan' predecessors are Washigan, Okamigan, Zato, Tsubame, Tengu-Tsuyo, Tengu, Rakka, Kokuen, Kinroba, Yoko, Kinutora, Jiten, Butsuhira, Suiken, Keigan, Kumataka, Washi, Aratama, Hitomi, Nekomata, Kitsune, Kinezumi, Sanbashi, Kaiko, Inazuma, Zangetsu, Hanabi, Hanazakari, Sanshi and Kaidanji.
    Note4: This variant is the SLOWEST compressor under the Sun! This compile can handle files up to 5120MB.
    Note5: The matchfinder is either 'Railgun_Trolldom' (matches longer than 18) or Leprechaun's B-tree order 3 (matches less or equal to 18).
    Note6: Instead of '_mm_loadu_si128' '_mm_lddqu_si128' is used.
    Note7: The lookahead 'Tsuyo' heuristic which looks one char ahead is applied thrice, still not strengthened, though.
    Note8: The compile made 2017-Oct-22, the decompression time measuring is done in 16x8 passes choosing the top score from 64 back-to-back runs - the goal - to enter [maximal] Turbo Mode.
    Note9: In GP/SSE4.1/AVX2 compile, the 24 matches become 3xQWORD/1xQWORD+1xXMMWORD/1xYMMWORD.
    NoteA: Maximum compression ratio is 44:1, for 704 bytes long matches within 1TB Sliding Window.
    NoteB: Please send me (at [email protected]) decompression results obtained on machines with fast CPU-RAM subsystems.
    NoteC: In this compile, clock() was replaced with time() - to counter bigtime stats misreporting.
    NoteD: Multi-way hashing allows each KeySize to occupy its own HASH pool, thus less RAM is in use - the LEAF is smaller.
    NoteE: In this revision, B-tree heuristics are in use, allowing skipping many unnecessary memmem() invocations.
    NoteF: In this latest (2019-Feb-02) compile, B-tree is the primary matchfinder, memmem() is used for 18+ matches, and for marginal cases.
    NoteG: The file being compressed should be 18 bytes or longer due to Building-Blocks being in range 4..18.
    Current priority class is HIGH_PRIORITY_CLASS.
    Allocating Source-Buffer 14 MB ...
    Allocating Source-Buffer 14 MB (REVERSED) ...
    Allocating Target-Buffer 46 MB ...
    Allocating Verification-Buffer 14 MB ...
    Leprechaun: Memory pool for B-tress is 5,612,048 KB.
    Leprechaun: In this revision 128MB 8-way hash is used which results in 8 x 16,777,216 external B-Trees of order 3.
    Leprechaun: In this revision, 1 pass is to be made.
    Leprechaun: Allocating HASH memory 1,073,741,889 bytes ... OK
    Leprechaun: Allocating/ZEROing 5,746,737,166 bytes swap file ... OK
    Leprechaun: Size of input file: 15,583,440
    
    Leprechaun: Inserting keys/BBs of order 004 into B-trees, free RAM in B-tree pool is 5,612,048 KB ...
    Leprechaun: Inserting keys/BBs of order 006 into B-trees, free RAM in B-tree pool is 5,602,086 KB ...
    Leprechaun: Inserting keys/BBs of order 008 into B-trees, free RAM in B-tree pool is 5,533,534 KB ...
    Leprechaun: Inserting keys/BBs of order 010 into B-trees, free RAM in B-tree pool is 5,328,114 KB ...
    Leprechaun: Inserting keys/BBs of order 012 into B-trees, free RAM in B-tree pool is 4,918,554 KB ...
    Leprechaun: Inserting keys/BBs of order 014 into B-trees, free RAM in B-tree pool is 4,288,798 KB ...
    Leprechaun: Inserting keys/BBs of order 016 into B-trees, free RAM in B-tree pool is 3,451,465 KB ...
    Leprechaun: Inserting keys/BBs of order 018 into B-trees, free RAM in B-tree pool is 2,435,992 KB ...
    Leprechaun: Total Searches-n-Inserts Per Second: 83,222 SNIPS
    Leprechaun: RAM needed to house B-trees (relative to the file being ripped): 285N
    Leprechaun: Total IOPS for 109,387,689 'freads' and 140,066,348 'fwrites' (of packets 114 bytes long) during loading traversing all orders: 166,524 IOPS
    
    Compressing 15,583,440 bytes ...
    /; Each rotation means 64KB are encoded; Speed: 0,001,764 B/s; Done 100%; Compression Ratio: 3.09:1; Matches(24/32/48): 163,277/39,583/2,722; 128[+] long matches: 959; ETA: 0.00 days
    NumberOfFullLiterals (lower-the-better): 749
    Tsuyo_HEURISTIC_APPLIED_thrice_back-to-back: 0
    NumberOf(Tiny)Matches[Micro]Window (4)[16B]: 5865
    NumberOfMatches[Bheema]Window [128GB window]: 1058
    RAM-to-RAM performance: 1764 B/s.
    Compressed to 5,039,053 bytes.
    Source-file-Hash(FNV1A_YoshimitsuTRIAD) = 0x065e,e94a
    Target-file-Hash(FNV1A_YoshimitsuTRIAD) = 0xb651,2c30
    Decompressing 5,039,053 (being the compressed stream) bytes ...
    RAM-to-RAM performance: 681 MB/s.
    Verification (input and output sizes match) OK.
    Verification (input and output blocks match) OK.
    
    D:\Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Feb-02_>


    Okay, the RAM (internal B-trees) vs SSD (external B-trees) speed ratios is 11,577 B/s : 1,764 B/s or 6.5:1, discrepancy is so minimal, especially with USB connected (not in super-speed) and write caching disabled cheapest 240GB model.

    Love people like Sir Burton, his immortal work stands, that is why these 16 volumes are one of my primary testdatasets:

    [​IMG]

    The full package is on my Internet Drive:
    Nakamichi_Ryuugan-ditto-1TB_bigtime_b-tree_2019-Feb-02.7z

    It simply opens the gate to teraheavy matchfinding :p
     

    Attached Files:

    Last edited: Feb 3, 2019
    • Informative Informative x 2
  6. Sanmayce

    Sanmayce New Member

    9
    1
    3
    Jun 9, 2018
    Sofia
    Ratings:
    +8
    Local Time:
    4:18 AM
    Some heavy trashing of my Samsung 860 PRO 256GB SSD commenced...
    In the PDF excerpt below 600+ million B-trees are rooted in the hash table (each slot is a root), if the machine has 85GB of system RAM then all the B-trees will be in the internal/physical not in the external/SSD RAM.

    TFD_booklet_half.png

    For better resolution of above image try this:
    www.sanmayce.com/Nakamichi/TFD_booklet.png

    To reiterate, Nakamichi is written in such a way that it is capable of utilizing internal&external RAM ... greedily.
    If the machine has 128GB of RAM then the 10 keysizes/matchlengths can be put into their own segments of the HASHPOT, e.g. if the hashsize=30 then the hash pool for a single keysize/matchlength is (8bytes x 2^30) = 8GB, for the 10 keysizes the total hash block is 80GB, it equals possible 10x1=10 billion B-trees - this means matches within 1GB long file can be found in maximum 2 attempts - the MAX height of the B-trees!

    As a side-effect I will torture (i.e. ripping apart) SSDs to their death, thus be instrumental in assessing the endurance of PRO SSDs, hopefully Samsung and heavy-duty users will thank me. I took a snapshot of the Samsung's status as reported by Sentinel before all the ripping and today:

    BEFORE_ALL_RUNS.png

    As you can see, a fortnight later the health went from 100% down to 96% :(

    BEFORE_ALL_RUNS_2019-feb-24.png

    TEXTORAMIC_Decompression_Showdown_2019-Feb-21.pdf:
    https://drive.google.com/file/d/162cikKQ3QDiXhUaz_uBJG9unXhoJWwTQ/view?usp=sharing

    Nakamichi_Ryuugan-ditto-1TB_btree_source.pdf:
    https://drive.google.com/file/d/1dFtfvpcE-TUo_D_Ol9C8FSDXlbzyIll9/view?usp=sharing
     
    Last edited: Feb 25, 2019
    • Informative Informative x 2
..