Sysadmin Nginx 100% use Cpu

Discussion in 'System Administration' started by upgrade81, Apr 5, 2018.

  1. Matt

    Matt Moderator Staff Member

    862
    387
    63
    May 25, 2014
    Rotherham, UK
    Ratings:
    +606
    Local Time:
    7:47 AM
    1.5.15
    MariaDB 10.2
    If it was a DDoS, then why does restarting nginx bring the CPU usage back down to normal, with no reduction in traffic?
     
  2. eva2000

    eva2000 Administrator Staff Member

    45,676
    10,371
    113
    May 24, 2014
    Brisbane, Australia
    Ratings:
    +16,094
    Local Time:
    5:47 PM
    Nginx 1.19.x
    MariaDB 5.5/10.x
    Indeed... have you tried enabling Nginx debug mode for your ISP IP only, to see if the logs reveal more?

    To enable it, add the variable NGINX_DEBUG=y to the persistent config file /etc/centminmod/custom_config.inc, recompile Nginx via centmin.sh menu option 4, and then set error_log in the nginx vhosts to debug, as outlined at nginx.org/en/docs/debugging_log.html & wiki.nginx.org/Debugging.

    You do not want to leave Nginx debug mode running forever. After debugging, set NGINX_DEBUG=n (or remove the variable) in the persistent config file /etc/centminmod/custom_config.inc, recompile Nginx again via centmin.sh menu option 4, and remove the error_log debug setting to disable Nginx debug mode.
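
    As a minimal sketch of the toggle described above, the helper below sets a variable in a shell-style config file, rewriting the line if it already exists and appending it otherwise. The function name set_config_var is hypothetical and not part of Centmin Mod; only the file path and the NGINX_DEBUG variable come from this post. It is probably wise to try it on a copy of the file first.

```shell
# Hypothetical helper (not part of Centmin Mod): set NAME=VALUE in a
# shell-style config file. Assumes simple values without "|" or "&".
set_config_var() {
  cfg_file=$1
  cfg_name=$2
  cfg_value=$3
  touch "$cfg_file"
  if grep -q "^${cfg_name}=" "$cfg_file"; then
    # Variable already present: rewrite that line via a temp file.
    cfg_tmp=$(mktemp) || return 1
    sed "s|^${cfg_name}=.*|${cfg_name}=${cfg_value}|" "$cfg_file" > "$cfg_tmp" &&
      cat "$cfg_tmp" > "$cfg_file"
    rm -f "$cfg_tmp"
  else
    # Variable missing: append it as a new line.
    printf '%s=%s\n' "$cfg_name" "$cfg_value" >> "$cfg_file"
  fi
}
```

    For example, set_config_var /etc/centminmod/custom_config.inc NGINX_DEBUG y before the recompile, and the same call with n once debugging is finished.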

    To update your Centmin Mod build's code for Nginx debug mode support if you do not have an NGINX_DEBUG variable in centmin.sh, follow the instructions at centminmod.com/upgrade.html and the respective version threads.
    Centmin Mod is provided as is, but you can try debugging mode for Nginx for further troubleshooting if you have problems with Nginx (i.e. segfaults / signal 11 issues) as outlined at nginx.org/en/docs/debugging_log.html & wiki.nginx.org/Debugging.
     
  3. upgrade81

    upgrade81 Premium Member

    266
    16
    18
    Sep 5, 2016
    Italy
    Ratings:
    +27
    Local Time:
    8:47 AM
    1.17
    10.3
    Hi, I think doing a complete debug of nginx is excessive, especially because the exact same thing happens on 2 other VMs (one with 2 cores, the other with 4 cores) that reside on the same server.
    At this point it could be a hardware problem.

    I'd also like to hear from Matt on what to do.
     
  4. upgrade81

    Do you notice anything strange?

    Code (Text):
    netstat -s
    Ip:
        148235878 total packets received
        217299 with invalid addresses
        0 forwarded
        0 incoming packets discarded
        147784086 incoming packets delivered
        125234272 requests sent out
        4063 outgoing packets dropped
        42 dropped because of missing route
        136 reassemblies required
        68 packets reassembled ok
        58 fragments failed
    Icmp:
        3264948 ICMP messages received
        195253 input ICMP message failed.
        InCsumErrors: 55
        ICMP input histogram:
            destination unreachable: 3237053
            timeout in transit: 21627
            redirects: 3
            echo requests: 6193
            timestamp request: 17
        7160 ICMP messages sent
        0 ICMP messages failed
        ICMP output histogram:
            destination unreachable: 950
            echo replies: 6193
            timestamp replies: 17
    IcmpMsg:
            InType3: 3237053
            InType5: 3
            InType8: 6193
            InType11: 21627
            InType13: 17
            OutType0: 6193
            OutType3: 950
            OutType14: 17
    Tcp:
        687500 active connections openings
        2041389 passive connection openings
        12555 failed connection attempts
        137291 connection resets received
        128 connections established
        150127715 segments received
        194846553 segments send out
        14280455 segments retransmited
        126 bad segments received.
        543120 resets sent
    Udp:
        113450 packets received
        855 packets to unknown port received.
        0 packet receive errors
        114249 packets sent
        0 receive buffer errors
        0 send buffer errors
        IgnoredMulti: 87
    UdpLite:
    TcpExt:
        10936 resets received for embryonic SYN_RECV sockets
        18850 ICMP packets dropped because they were out-of-window
        2 ICMP packets dropped because socket was locked
        1220304 TCP sockets finished time wait in fast timer
        58637 packets rejects in established connections because of timestamp
        619496 delayed acks sent
        7782 delayed acks further delayed because of locked socket
        Quick ack mode was activated 180009 times
        99997 times the listen queue of a socket overflowed
        100082 SYNs to LISTEN sockets dropped
        5947232 packet headers predicted
        101880574 acknowledgments not containing data payload received
        27597629 predicted acknowledgments
        408 times recovered from packet loss due to fast retransmit
        3756360 times recovered from packet loss by selective acknowledgements
        644 bad SACK blocks received
        Detected reordering 3088421 times using SACK
        Detected reordering 127 times using reno fast retransmit
        Detected reordering 37636 times using time stamp
        146245 congestion windows fully recovered without slow start
        28720 congestion windows partially recovered using Hoe heuristic
        129217 congestion windows recovered without slow start by DSACK
        122994 congestion windows recovered without slow start after partial ack
        TCPLostRetransmit: 1612775
        97 timeouts after reno fast retransmit
        44018 timeouts after SACK recovery
        20399 timeouts in loss state
        12340179 fast retransmits
        741707 retransmits in slow start
        166687 other TCP timeouts
        TCPLossProbes: 1633116
        TCPLossProbeRecovery: 5646
        40 classic Reno fast retransmits failed
        283167 SACK retransmits failed
        182705 DSACKs sent for old packets
        702 DSACKs sent for out of order packets
        6318370 DSACKs received
        1135754 DSACKs for out of order packets received
        153856 connections reset due to unexpected data
        5752 connections reset due to early user close
        22051 connections aborted due to timeout
        8279 times unable to send RST due to no memory
        TCPSACKDiscard: 6254
        TCPDSACKIgnoredOld: 8635
        TCPDSACKIgnoredNoUndo: 605787
        TCPSpuriousRTOs: 9158
        TCPSackShifted: 469561
        TCPSackMerged: 7409584
        TCPSackShiftFallback: 33732105
        TCPRetransFail: 8455
        TCPRcvCoalesce: 2033105
        TCPOFOQueue: 64822
        TCPOFOMerge: 780
        TCPChallengeACK: 3288
        TCPSYNChallenge: 154
        TCPFastOpenPassiveFail: 29
        TCPFastOpenCookieReqd: 11
        TCPSpuriousRtxHostQueues: 121
        TCPAutoCorking: 3974103
        TCPFromZeroWindowAdv: 125
        TCPToZeroWindowAdv: 125
        TCPWantZeroWindowAdv: 4711
        TCPSynRetrans: 68938
        TCPOrigDataSent: 182725487
        TCPHystartTrainDetect: 112487
        TCPHystartTrainCwnd: 5623554
        TCPHystartDelayDetect: 108565
        TCPHystartDelayCwnd: 6017509
        TCPACKSkippedSynRecv: 1987
        TCPACKSkippedPAWS: 37737
        TCPACKSkippedSeq: 17530
        TCPACKSkippedFinWait2: 12
        TCPACKSkippedTimeWait: 134
        TCPACKSkippedChallenge: 673
        TCPWinProbe: 6718
        TCPKeepAlive: 2815
        TCPMTUPFail: 26
        TCPMTUPSuccess: 2725
    IpExt:
        InBcastPkts: 57784
        InOctets: 69064809414
        OutOctets: 307060701424
        InBcastOctets: 7729816
        InNoECTPkts: 148164411
        InECT1Pkts: 23
        InECT0Pkts: 364409
        InCEPkts: 12675
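
    The counters in the output above are easier to judge as ratios than as raw totals. The sketch below reads netstat -s text on stdin and prints the share of sent TCP segments that were retransmitted, plus the listen queue overflow and dropped SYN counts. The function name netstat_summary is hypothetical, and the label matching assumes the same wording as this output. For the figures posted here, 14280455 retransmitted out of 194846553 sent works out to about 7.3%, and the 99997 listen queue overflows may be worth a look alongside the CPU spikes.

```shell
# Hypothetical helper: summarise a few TCP counters from `netstat -s`.
# Labels are matched as they appear in the output above, including
# netstat's own spelling "retransmited".
netstat_summary() {
  awk '
    /segments send out/                   { sent = $1 }
    /segments retransmited/               { retrans = $1 }
    /listen queue of a socket overflowed/ { overflow = $1 }
    /SYNs to LISTEN sockets dropped/      { syndrop = $1 }
    END {
      # Only report a ratio when the sent counter was actually found.
      if (sent > 0)
        printf "retransmit_pct=%.2f\n", 100 * retrans / sent
      printf "listen_overflows=%d\n", overflow
      printf "syn_drops=%d\n", syndrop
    }
  '
}
```

    Usage would be netstat -s | netstat_summary, run once during a spike and once after a restart so the two sets of figures can be compared.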
    
     
  5. upgrade81

  6. eva2000

    Have you tried updating 123.09beta01 via the cmupdate command plus a centmin.sh menu option 4 recompile? There were some updates to nginx over the past few days. That said, an nginx debug build will likely reveal more info.
     
  7. upgrade81

    That was done 2 days ago and again today.
    Nothing has been resolved.
     
  8. eva2000

    Also try changing the persistent config file /etc/centminmod/custom_config.inc to the below, as a lot of those settings are already defaults in 123.09beta01. Try with HPACK disabled too.
    Code (Text):
    NGINX_PAGESPEED=y
    NGXDYNAMIC_NGXPAGESPEED='y'
    NGINX_LIBBROTLI='y'
    NGXDYNAMIC_BROTLI='y'
    PHPMSSQL='y'
    PHP_PGO='y'
    #PHP_PGO_CENTOSSIX='y'
    #NGINX_DEVTOOLSETGCC='y'
    #GENERAL_DEVTOOLSETGCC='y'
    
    # -----set = y to put nginx, php and mariadb major version updates into 503
    # maintenance mode https://community.centminmod.com/posts/26485/
    NGINX_UPDATEMAINTENANCE='y'
    PHP_UPDATEMAINTENANCE='y'
    MARIADB_UPDATEMAINTENANCE='y'
    
    #------nginx
    LETSENCRYPT_DETECT='y'
    #NGINX_DYNAMICTLS='n'
    #NGINX_HPACK='y'
    
    #PHP Custom
    #PHP_VERSION='7.1.15'
    PHPGEOIP_ALWAYS='n'
    

    If that doesn't work, try emptying /etc/centminmod/custom_config.inc, recompile nginx, and see how it fares, to rule out some conflicts.
     
  9. eva2000

    nginx debug mode ???
     
  10. eva2000

    Just noticed that one nginx kernel segfault with the ngx_brotli module. An nginx debug build would be better able to dig deeper, but try disabling ngx_brotli as well.

    What's your kernel version and CPU info?

    Please post the output of:
    Code (Text):
    nginx -V
    

    Wrap the output of nginx -V in QUOTE tags

    and wrap the rest in CODE tags:
    Code (Text):
    uname -r
    

    Code (Text):
    lscpu
    

    Code (Text):
    cat /proc/cpuinfo
    
     
  11. upgrade81

    Code:
    4.17.0-1.el7.elrepo.x86_64
    
    Code (Text):
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                4
    On-line CPU(s) list:   0-3
    Thread(s) per core:    1
    Core(s) per socket:    1
    Socket(s):             4
    NUMA node(s):          1
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 63
    Model name:            Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
    Stepping:              2
    CPU MHz:               3499.996
    BogoMIPS:              6999.99
    Hypervisor vendor:     KVM
    Virtualization type:   full
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              4096K
    NUMA node0 CPU(s):     0-3
    Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
    


    Code (Text):
    processor       : 3
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 63
    model name      : Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
    stepping        : 2
    microcode       : 0x1
    cpu MHz         : 3499.996
    cache size      : 4096 KB
    physical id     : 3
    siblings        : 1
    core id         : 0
    cpu cores       : 1
    apicid          : 3
    initial apicid  : 3
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 13
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
    bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass
    bogomips        : 6999.99
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 46 bits physical, 48 bits virtual
    power management:
    


    I am also attaching a screenshot of perf top taken just as nginx spikes to 100%.
    As a reminder, this machine hosts only 2 of my own corporate news websites, and we use WordPress.
     

  12. eva2000

    I'd try the bare-bones persistent config file suggestion, or an empty persistent config, with and without ngx_brotli enabled (see my earlier post in this thread, Sysadmin - Nginx 100% use Cpu).

    Also update the kernel, as 4.17 is pretty old for the ELRepo repo, which should be at least at 4.18.6 by now, and reboot the server.
     
  13. upgrade81

    As for the custom config, this is how it looks now.
    I saw you put in a patch for brotli, so now I'm going to recompile nginx to be sure.

    Code:
    NSD_INSTALL='n'
    #CLANG='n'
    #DEVTOOLSETSEVEN='y'
    #NGINX_DEVTOOLSETGCC='y'
    PHP_PGO='y'                   # Profile Guided Optimization https://software.intel.com/en-us/blogs/2015/10/09/pgo-let-it-go-php
    
    
    # -----set = y to put nginx, php and mariadb major version updates into 503
    # maintenance mode https://community.centminmod.com/posts/26485/
    NGINX_UPDATEMAINTENANCE='n'
    PHP_UPDATEMAINTENANCE='n'
    MARIADB_UPDATEMAINTENANCE='y'
    
    #------nginx
    LETSENCRYPT_DETECT='y'
    DUALCERTS='y'  #dual cert RSA + ECDSA
    NGINX_DYNAMICTLS='n'
    NGINX_HPACK='y'
    NGINX_LIBBROTLI='y'          # https://github.com/eustas/ngx_brotli
    NGINX_LIBBROTLISTATIC='y'
    CLOUDFLARE_ZLIB='n'        # use Cloudflare optimised zlib fork https://blog.cloudflare.com/cloudflare-fights-cancer/
    CLOUDFLARE_ZLIBPHP='n'     # use Cloudflare optimised zlib fork for PHP-FPM zlib instead of system zlib
    
    CUSTOMSERVERNAME='y'
    CUSTOMSERVERSTRING='nginx'
    AUTO_GITUPDATE='y'  # enables centmin auto updates
    ENABLE_MARIADBTENTWOUPGRADE='y'
    AUTOTUNE_CLIENTMAXBODY='n'
    
    
    
    #OPENSSL_VERSION='1.1.0h'
    LIBRESSL_SWITCH='n'
    
    #PRIORITIZE_CHACHA_OPENSSL='y'
    
    #PHP Custom
    PHP_VERSION='7.1.23'
    GCCINTEL_PHP='y'
    
     
  14. upgrade81

    No, I have not removed brotli yet.
    I'm upgrading the kernel now.
     
  15. upgrade81

    OK, kernel updated to 4.19 ML, along with ML headers and ML devel.
     
  16. eva2000

    If you haven't already, check /var/log/messages for any continued segfaults.

    If they still exist (note whether the logged timestamp matches the current date), then try recompiling nginx with brotli disabled.
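
    One way to do that check is sketched below: it prints only the nginx segfault lines whose timestamp starts with a given day, defaulting to today. The function name nginx_segfaults_on is hypothetical, and the sketch assumes the traditional syslog timestamp prefix ("Mon DD HH:MM:SS"), so it would need adjusting if the log uses another format.

```shell
# Hypothetical helper: list nginx segfault lines for one day from a
# syslog-style file. Assumes lines start with a "Mon DD" date prefix.
nginx_segfaults_on() {
  seg_log=$1
  # Default to today's date in syslog form, e.g. "Oct  3" or "Oct 23".
  seg_day=${2:-$(date '+%b %e')}
  grep -F 'segfault' "$seg_log" | grep -F 'nginx' | grep "^$seg_day "
}
```

    Usage would be nginx_segfaults_on /var/log/messages, which prints nothing if no nginx segfault was logged today.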
     
  17. upgrade81

    I removed Brotli, I'll let you know how it goes.
     
  18. upgrade81

    I noticed this.

    When nginx went to 100%, I ran netstat -an | wc -l:
    1311

    2 seconds after restarting nginx:

    netstat -an | wc -l
    195

    It seems to me that something remains "hung".
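
    A plain line count from netstat -an also includes header lines and UNIX domain sockets, so it may help to break the TCP connections down by state before and after a restart. The sketch below does that for netstat -ant text read on stdin. The function name tcp_states is hypothetical, and it assumes the usual column layout where the state is the sixth field.

```shell
# Hypothetical helper: count TCP connections per state from
# `netstat -ant` text on stdin. Header lines are skipped because
# their first field does not start with "tcp".
tcp_states() {
  awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print s, n[s] }' |
    sort -k2,2nr -k1,1
}
```

    Usage would be netstat -ant | tcp_states. A pile-up in a single state would point in a different direction than a general rise in ESTABLISHED connections.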
     
  19. eva2000

    Is this after or before you disabled ngx_brotli?

    You need to investigate where those connections are directed, site-wise etc. This thread has discussed the various ways to get this info: ngxtop, nginx debug mode, logs etc.

    Also use the cminfo netstat command (see Beta Branch - update cminfo command with netstat flag option).

    It could very well be legitimate load/activity, with your nginx-related resources simply maxed out and/or in need of optimising.
     
  20. upgrade81

    On the VM with brotli disabled, nginx is OK for now. We'll see tomorrow.

    What I wrote relates to an identical VM, but with brotli still active.

    I'm running several tests, since I have not just 1 VM with this problem but 3.