Get the most out of your Centmin Mod LEMP stack
Become a Member

MariaDB MariaDB Profile-Guided Optimization?

Discussion in 'MariaDB & General MySQL news & discussions' started by tininho, Nov 5, 2019.

  1. tininho

    tininho Active Member

    182
    44
    28
    May 22, 2019
    eu
    Ratings:
    +135
    Local Time:
    10:49 PM
  2. eva2000

    eva2000 Administrator Staff Member

    54,107
    12,179
    113
    May 24, 2014
    Brisbane, Australia
    Ratings:
    +18,738
    Local Time:
    6:49 AM
    Nginx 1.27.x
    MariaDB 10.x/11.4+
    Yeah I have built MariaDB custom RPMs with Intel cpu optimisations via GCC compiler and not even needing Intel compiler - which can be up to 50% faster than generic MariaDB RPM builds. But haven't tried Profile Guided Optimizations with MariaDB before. I wouldn't be surprised PGO trained MariaDB MySQL would be faster just like Centmin Mod's PHP 7+ PGO trained PHP-FPM binaries. Might try it out for my own curiosity but wouldn't be something I'd add to publicly available Centmin Mod as it's something I wouldn't be willing to provide support for (for free). I only add to Centmin Mod what I am willing to support :)

    But MariaDB MySQL is about data integrity and stability so usually best to use official MariaDB RPMs for bug fixes and security.
     
  3. eva2000

    eva2000 Administrator Staff Member

    54,107
    12,179
    113
    May 24, 2014
    Brisbane, Australia
    Ratings:
    +18,738
    Local Time:
    6:49 AM
    Nginx 1.27.x
    MariaDB 10.x/11.4+
    and 40% better single-threaded performance in MariaDB
     
  4. David Y

    David Y New Member

    5
    2
    3
    Dec 23, 2019
    Toronto
    Ratings:
    +7
    Local Time:
    3:49 PM
    On the topic of PGO and MySQL/MariaDB, take a look at this talk by two Dropbox SREs:
    Optimizing MySQL without SQL or touching my.cnf | Percona Live - Open Source Database Conference 2017

    Some takeaways: Apparently Clang/LLVM can produce better results than GCC, with different GCC versions also making a big difference in performance. Using GCC with -march=native -mtune=native and -flto (with PGO) all helped.

    The main challenges using PGO this way are:

    1) The binary is fixed to your training run and can actually be slower if your workload changes.

    2) You need to go through the work of building and training yourself. It can all be very labour intensive.

    3) There is a bit of risk involved when building MySQL yourself. Did you set the right source config options such that MySQL will function exactly as before?

    To overcome this I would recommend trying Dynimize. It performs PGO on live running processes by using in-memory machine code plus live profiling data at runtime, so your profiling data is as relevant as possible. It's a machine code to machine code JIT compiler that runs as a separate background process and interfaces with target processes through various system calls. The advantages are:

    1) Trivial setup (as easy as tweaking your my.cnf).

    2) You don't need to build MariaDB from source as it only uses live in memory machine code. That also means target program functionality will not change.

    3) Zero downtime. Application restart not required.

    4) It can automatically adapt to changes in workload profile if you start it with the -reoptimize flag. You can also tell it to redo the PGO at any point, or undo the PGO and revert back to the original machine code, all without mysqld restart.

    5) It optimizes across all user space machine code in a target process, including shared libraries. It also adds memory address dependant optimizations that can only be done with run-time information.

    6) No changes to object code on disk.

    7) It has been thoroughly stress tested across a large range of MariaDB and MySQL versions/distributions.

    The PGO phase for Dynimize is quite short and only takes one to a few minutes, depending on how CPU intensive your workload is and therefor how quickly it can collect profiling data. You will need a relatively CPU intensive mysqld process. It does temporarily consume 4 GB of memory during the PGO phase, but you can just increase swap instead of RAM for that if you are short on RAM. The perf improvements are very similar to what you get with GCC PGO: https://dynimize.com/performanceSpeedup

    Keep in mind that when you improve the CPU performance of MySQL or any other program, the gains may manifest as a reduction in CPU usage rather than a performance boost (or a bit of both), depending on where your bottlenecks are.

    Disclosure: I'm the founder of Dynimize and I actually discovered Centmin Mod from a Dynimize user using it with the Centmin Mod stack :)
     
  5. eva2000

    eva2000 Administrator Staff Member

    54,107
    12,179
    113
    May 24, 2014
    Brisbane, Australia
    Ratings:
    +18,738
    Local Time:
    6:49 AM
    Nginx 1.27.x
    MariaDB 10.x/11.4+
    Interesting I vaguely recall coming across Dynimize in my PGO research when I building out my PGO PHP routines. Dynimize approach is interesting in that it works on the in memory machine code. Guess this could work with other programs like memcached and redis servers and compression algorithms ?

    Thanks found the video for that talk at Optimizing MySQL without SQL or touching my.cnf :)

    2017 is quite a while ago, GCC 8+ has caught up alot with Clang in general performance since 2017. So performance can vary between cpu types/architectures. But some programs lend themselves better to some compilers i.e. xxHash hashing algorithm performs better compiled with Clang versus GCC.
     
  6. David Y

    David Y New Member

    5
    2
    3
    Dec 23, 2019
    Toronto
    Ratings:
    +7
    Local Time:
    3:49 PM
    One could always give it a try with memcached/redis, however Dynimize only optimizes user mode machine code at the moment. The redis and memcached workloads I've benchmarked spend most of their time in kernel mode. Dido for NGINX. There are plans to support kernel mode in the future, and those workloads would be an excellent match for PGO since they spend most their time in icache misses. Compression algorithms typically don't, so it would depend on other opportunities that may exist in the specific compression algorithm.

    Good point. There's also something to be said for sticking with the official project compiler.

    .
     
  7. tininho

    tininho Active Member

    182
    44
    28
    May 22, 2019
    eu
    Ratings:
    +135
    Local Time:
    10:49 PM
    @David Y I've tried Dynimize on Hetzner AX-52 (Ryzen 3700X with Centminmod) but I have yet to see any difference on my lazy benchmark, meaning WPPerformanceTester. I guess it shows up on more precise testing and under heavier workloads?

     
  8. David Y

    David Y New Member

    5
    2
    3
    Dec 23, 2019
    Toronto
    Ratings:
    +7
    Local Time:
    3:49 PM
    From the Dynimize User Guide:
    @tininho your dyni -status means the shared libraries or mysqld on disk has been updated since you started that mysqld process, so they are out of sync. Dynimize won't optimize it until you restart MariaDB.

    Note that if the performance of the stack is dependant on several components and not completely limited by mysqld, then expect some component of the benefit to be a mysqld CPU usage reduction. That is often the case with WordPress.
     
  9. tininho

    tininho Active Member

    182
    44
    28
    May 22, 2019
    eu
    Ratings:
    +135
    Local Time:
    10:49 PM
    @David Y I get ~10% boost on performance (req/s), but more notably it gets very stable, almost constant. Nice tip!
     
  10. David Y

    David Y New Member

    5
    2
    3
    Dec 23, 2019
    Toronto
    Ratings:
    +7
    Local Time:
    3:49 PM
    That's good to know. Thanks for the feedback!