
Benchmarks Optimizing AMD EPYC governor setting to reduce TTFB

Discussion in 'Dedicated server hosting' started by deltahf, Oct 5, 2021.

  1. deltahf

    deltahf Premium Member Premium Member

    499
    219
    43
    Jun 8, 2014
    Ratings:
    +389
    Local Time:
    6:08 AM
    After taking delivery of my AMD EPYC 7001-series server, I mentioned I was disappointed in the Centminmod install time, and @eva2000 noticed I had not tuned the CPU's clock/power management profile.

    I was not familiar with that or how to do it, so I did some research. I will share what I learned here along with some interesting benchmarks.

    This AMD developer document is a tuning guide for EPYC CPUs. See sections 6.5-6.6 regarding the governors:

    http://developer.amd.com/wp-content/resources/56420.pdf

    I confirmed that my server was running the "conservative" governor, with the cores sitting at only ~1.2 GHz.

    Code (Text):
    $ cpupower frequency-info
    analyzing CPU 0:
      driver: acpi-cpufreq
      CPUs which run at the same hardware frequency: 0
      CPUs which need to have their frequency coordinated by software: 0
      maximum transition latency:  Cannot determine or is not supported.
      hardware limits: 1.20 GHz - 2.10 GHz
      available frequency steps:  2.10 GHz, 1.70 GHz, 1.20 GHz
      available cpufreq governors: conservative userspace powersave ondemand performance
      current policy: frequency should be within 1.20 GHz and 2.10 GHz.
                      The governor "conservative" may decide which speed to use
                      within this range.
      current CPU frequency: 1.20 GHz (asserted by call to hardware)
      boost state support:
        Supported: yes
        Active: yes
        Boost States: 0
        Total States: 3
        Pstate-P0:  2100MHz
        Pstate-P1:  1700MHz
        Pstate-P2:  1200MHz
    


    Code (Text):
    $ cpupower monitor
        |Mperf               || Idle_Stats
    CPU | C0   | Cx   | Freq || POLL | C1   | C2
       0|  0.12| 99.88|  1198||  0.00|  0.44| 99.45
       8|  0.08| 99.92|  1198||  0.00|  0.28| 99.65
       1|  0.11| 99.89|  1199||  0.00|  0.43| 99.47
       9|  0.05| 99.95|  1199||  0.00|  0.26| 99.70
       2|  0.13| 99.87|  1199||  0.00|  0.52| 99.37
      10|  0.15| 99.85|  1198||  0.00|  0.17| 99.69
       3|  0.96| 99.04|  1199||  0.00|  0.51| 98.55
      11|  0.14| 99.86|  1197||  0.00|  0.26| 99.61
       4|  0.11| 99.89|  1200||  0.00|  0.21| 99.69
      12|  0.09| 99.91|  1197||  0.00|  0.38| 99.57
       5|  0.08| 99.92|  1200||  0.00| 10.54| 89.36
      13|  0.04| 99.96|  1195||  0.00|  7.37| 92.52
       6|  0.13| 99.87|  1199||  0.00|  0.10| 99.78
      14|  0.14| 99.86|  1198||  0.00|  0.27| 99.54
       7|  0.19| 99.81|  1198||  0.00|  1.46| 98.36
      15|  0.21| 99.79|  1199||  0.00|  1.03| 98.77
    


    Switching to the "performance" governor is very easy, a single command run as root:

    Code (Text):
    $ cpupower frequency-set -g performance
    Setting cpu: 0
    Setting cpu: 1
    Setting cpu: 2
    Setting cpu: 3
    Setting cpu: 4
    Setting cpu: 5
    Setting cpu: 6
    Setting cpu: 7
    Setting cpu: 8
    Setting cpu: 9
    Setting cpu: 10
    Setting cpu: 11
    Setting cpu: 12
    Setting cpu: 13
    Setting cpu: 14
    Setting cpu: 15
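
    Since cpupower frequency-info only analyzes CPU 0 by default, it can be worth confirming that every core picked up the new governor. A minimal sketch, assuming the standard Linux cpufreq sysfs layout under /sys/devices/system/cpu; the function name governor_summary and its optional base-directory argument are hypothetical, added only so the function can be pointed at a test directory:

```shell
# Count how many CPUs are using each cpufreq governor.
# Assumes the usual sysfs layout: <base>/cpuN/cpufreq/scaling_governor
governor_summary() {
  local base="${1:-/sys/devices/system/cpu}"
  local f
  for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
    # Skip CPUs without a readable governor file (or an unmatched glob).
    [ -r "$f" ] || continue
    cat "$f"
  done | sort | uniq -c | awk '{ print $2 ": " $1 " CPU(s)" }'
}
```

    Calling governor_summary with no argument reads the live system; on a 16-thread server where the change applied everywhere, a single "performance" line covering all 16 CPUs would be expected.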
    


    Code (Text):
    $ cpupower frequency-info
    analyzing CPU 0:
      driver: acpi-cpufreq
      CPUs which run at the same hardware frequency: 0
      CPUs which need to have their frequency coordinated by software: 0
      maximum transition latency:  Cannot determine or is not supported.
      hardware limits: 1.20 GHz - 2.10 GHz
      available frequency steps:  2.10 GHz, 1.70 GHz, 1.20 GHz
      available cpufreq governors: conservative userspace powersave ondemand performance
      current policy: frequency should be within 1.20 GHz and 2.10 GHz.
                      The governor "performance" may decide which speed to use
                      within this range.
      current CPU frequency: 2.10 GHz (asserted by call to hardware)
      boost state support:
        Supported: yes
        Active: yes
        Boost States: 0
        Total States: 3
        Pstate-P0:  2100MHz
    


    Code (Text):
    $ cpupower monitor
        |Mperf               || Idle_Stats
    CPU | C0   | Cx   | Freq || POLL | C1   | C2
       0|  0.05| 99.95|  2618||  0.00|  0.50| 99.46
       8|  0.04| 99.96|  2657||  0.00|  0.29| 99.68
       1|  0.04| 99.96|  2614||  0.00|  0.32| 99.65
       9|  0.03| 99.97|  2647||  0.00|  0.68| 99.29
       2|  0.17| 99.83|  2783||  0.00|  0.55| 99.29
      10|  0.08| 99.92|  2555||  0.00|  0.28| 99.66
       3|  0.06| 99.94|  2619||  0.00|  0.71| 99.24
      11|  0.03| 99.97|  2641||  0.00|  0.78| 99.19
       4|  0.05| 99.95|  2633||  0.00|  0.59| 99.36
      12|  0.04| 99.96|  2632||  0.00|  0.42| 99.55
       5|  0.04| 99.96|  2633||  0.00| 10.59| 89.38
      13|  0.02| 99.98|  2806||  0.00|  0.37| 99.61
       6|  0.06| 99.94|  2622||  0.00|  0.20| 99.75
      14|  0.08| 99.92|  2559||  0.00|  0.27| 99.66
       7|  0.07| 99.93|  2720||  0.00|  0.62| 99.31
      15|  0.08| 99.92|  2813||  0.00|  0.83| 99.10
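
    To compare the two monitor runs at a glance, the Freq column can be averaged. This is only a sketch: it assumes the exact column layout shown in this post (CPU | C0 | Cx | Freq || ...), which may differ between cpupower versions, and mean_freq_mhz is a hypothetical helper name:

```shell
# Average the Freq column (MHz) of `cpupower monitor` output read from stdin.
# Assumes the layout shown above: CPU | C0 | Cx | Freq || POLL | C1 | C2
mean_freq_mhz() {
  awk -F'|' '
    # Data rows start with a numeric CPU id; header rows do not.
    $1 ~ /^ *[0-9]+ *$/ { sum += $4; n++ }
    END { if (n) printf "%.1f\n", sum / n }
  '
}

# Usage: cpupower monitor | mean_freq_mhz
```

    Fed the two outputs above, this works out to roughly 1198.3 MHz under "conservative" and 2659.5 MHz under "performance", so the cores report boosting above the 2.10 GHz P0 state.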
    


    Changing this setting had a significant impact on Time-To-First-Byte (TTFB) page generation times for both WordPress and XenForo 2.2.

    TTFB time in milliseconds by page type (PHP 7.4, MariaDB 10.4):

    [Attached chart: Screen Shot 2021-10-04 at 3.40.15 PM.png]

    Enabling performance mode reduced TTFB by ~26% on XF pages and ~43% on WordPress pages. It also brought TTFB below that of my older Intel Xeon E3-1230v5 server, even though the Xeon runs at a much higher clock speed than the EPYC (3.4 GHz vs 2.1 GHz). This shows the expected improvement from the more modern EPYC architecture, and of course the EPYC also has double the cores and threads of the E3-1230.
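
    For anyone wanting to repeat this kind of before/after comparison, here is a rough sketch of one way to sample TTFB and compute the percentage reduction. It is not necessarily how the chart above was produced. It assumes curl and awk are installed; ttfb_ms and pct_reduction are hypothetical helper names, and the numbers in the final comment are made up for illustration, not taken from the chart:

```shell
# Print time-to-first-byte in milliseconds for a single request to URL $1.
# Uses curl's %{time_starttransfer} write-out variable (reported in seconds),
# which also includes DNS lookup, connect and TLS handshake time.
ttfb_ms() {
  curl -s -o /dev/null -w '%{time_starttransfer}\n' "$1" |
    awk '{ printf "%.0f\n", $1 * 1000 }'
}

# Percentage reduction going from a "before" value ($1) to an "after" value ($2).
pct_reduction() {
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (b - a) / b * 100 }'
}

# Hypothetical example values:
# pct_reduction 200 148   # prints 26.0
```

    A single request is noisy, so in practice it makes sense to call ttfb_ms several times per page under each governor and compare medians rather than individual samples.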

    Huge thanks to @eva2000 for mentioning the governor tuning. I know it is common knowledge for you professional sysadmins, but this is my first AMD server and I wasn't aware of it!


    Now I need to start tuning WordPress to get those TTFB times down even more...
     
  2. rdan

    rdan Premium Member Premium Member

    5,141
    1,282
    113
    May 25, 2014
    Ratings:
    +1,966
    Local Time:
    7:08 PM
    Mainline
    10.2
    What if you try tuned?
    Code:
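    # install tuned and start it now and on every boot
    yum install tuned -y
    systemctl enable --now tuned
    
    # show the currently active profile, then switch profiles
    tuned-adm active
    tuned-adm profile latency-performance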
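
    Because a profile switch may not behave as expected on every kernel and CPU combination, it can help to note the previous profile before changing it so it is easy to revert. A minimal sketch, assuming tuned-adm active prints a line of the form "Current active profile: <name>"; parse_active_profile and switch_tuned_profile are hypothetical helper names:

```shell
# Extract the profile name from `tuned-adm active` output on stdin,
# e.g. "Current active profile: balanced" -> "balanced".
parse_active_profile() {
  sed -n 's/^Current active profile: //p'
}

# Switch to profile $1, printing the previous profile so it can be restored.
switch_tuned_profile() {
  local new="$1" old
  old="$(tuned-adm active | parse_active_profile)"
  echo "previous profile: ${old:-unknown}"
  tuned-adm profile "$new"
}

# Usage:  switch_tuned_profile latency-performance
# Revert: tuned-adm profile <previous profile>
```

    Running the same benchmark before and after the switch shows whether the new profile actually helps on a given machine.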
     
  3. eva2000

    eva2000 Administrator Staff Member

    47,898
    10,929
    113
    May 24, 2014
    Brisbane, Australia
    Ratings:
    +16,984
    Local Time:
    9:08 PM
    Nginx 1.21.x
    MariaDB 10.x
    Love it when folks share their own journeys of discovery and back them up with benchmarks of the before and after :D (y) :cool:
     
  4. eva2000

    eva2000 Administrator Staff Member

    Be careful with that; it sometimes doesn't do what you expect, depending on the Linux kernel and CPU model used. Also test and benchmark before and after :D
     
  5. eva2000

    eva2000 Administrator Staff Member

  6. rdan

    rdan Premium Member Premium Member

    Any more info about this?
    An article or forum discussion?
    Thanks.
     
  7. eva2000

    eva2000 Administrator Staff Member

    From personal experience ;) :D CPU clock speeds might not behave as expected, resulting in lower performance when testing/benchmarking real loads.