
Beta Branch Nginx Upgrade - zero downtime mode

Discussion in 'Beta release code' started by eva2000, Jul 16, 2016.

  1. eva2000

    eva2000 Administrator Staff Member

    55,223
    12,253
    113
    May 24, 2014
    Brisbane, Australia
    Ratings:
    +18,831
    Local Time:
    1:40 PM
    Nginx 1.27.x
    MariaDB 10.x/11.4+
    Centmin Mod 123.09beta01's nginx upgrade routine (centmin.sh menu option 4) has added support for a zero downtime mode for on-the-fly nginx binary updates, as outlined in the nginx documentation. This routine was originally added to a separate experimental branch of the Centmin Mod 123.08 betas and has now been ported over to 123.09beta01.

    It's an optional feature which is disabled by default. To enable it, add the following variable to your persistent config file at /etc/centminmod/custom_config.inc:
    Code (Text):
    NGINX_ZERODT='y'
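    If you'd rather script this than edit the file by hand, here is a minimal shell sketch that sets the variable idempotently (it rewrites an existing NGINX_ZERODT line instead of appending a duplicate). The function name enable_zerodt is hypothetical and not part of centmin.sh; it assumes GNU sed for `sed -i`, as found on CentOS.

```shell
# Hypothetical helper (not part of centmin.sh): idempotently set
# NGINX_ZERODT='y' in Centmin Mod's persistent config file.
# Assumes GNU sed (sed -i).
enable_zerodt() {
  local conf="${1:-/etc/centminmod/custom_config.inc}"
  touch "$conf" || return 1
  if grep -q '^NGINX_ZERODT=' "$conf"; then
    # Rewrite any existing setting (e.g. NGINX_ZERODT='n') in place
    sed -i "s/^NGINX_ZERODT=.*/NGINX_ZERODT='y'/" "$conf"
  else
    # No existing setting: append one
    echo "NGINX_ZERODT='y'" >> "$conf"
  fi
}
```

    Run `enable_zerodt` as root, then `grep NGINX_ZERODT /etc/centminmod/custom_config.inc` to confirm a single NGINX_ZERODT='y' line is present.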
    

    • When you recompile, upgrade or downgrade Nginx via centmin.sh menu option 4 with NGINX_ZERODT='y' enabled, you will see additional output to help verify that zero downtime, on-the-fly nginx binary updates are indeed working as outlined at http://nginx.org/en/docs/control.html#upgrade.
    • You'll see the previous nginx version's binary saved as /usr/local/sbin/nginx.old, so you can revert to the previous nginx version more quickly, though you can also do that via a centmin.sh menu option 4 recompile.
    In this example, the existing nginx master process id (pid) is 4533 and the new nginx binary's master pid is 9780. After kill -USR2, the new master runs as a child of the old one (PPID = 4533), and both sets of worker processes run side by side:
    Code (Text):
    ---------------------------------------------------------------------------
    nginx master id: 4533
    ---------------------------------------------------------------------------
    Active connections: 1
    server accepts handled requests
    256 256 257
    Reading: 0 Writing: 1 Waiting: 0
    ---------------------------------------------------------------------------
    kill -USR2 4533
      PID  PPID USER     %CPU    VSZ WCHAN  COMMAND
    4533     1 root      0.0 254736 sigsus nginx: master process /usr/local/sbin/nginx -c /usr/local/nginx/conf/nginx.conf
    4536  4533 nginx     0.0 295708 ep_pol nginx: worker process
    4537  4533 nginx     0.0 295708 ep_pol nginx: worker process
    4538  4533 nginx     0.0 295708 ep_pol nginx: worker process
    4539  4533 nginx     0.0 324388 ep_pol nginx: worker process
    9780  4533 root      1.6 254736 sigsus nginx: master process /usr/local/sbin/nginx -c /usr/local/nginx/conf/nginx.conf
    9799  9780 nginx     0.0 271124 ep_pol nginx: worker process
    9800  9780 nginx     0.0 271124 ep_pol nginx: worker process
    9801  9780 nginx     0.0 271124 ep_pol nginx: worker process
    9802  9780 nginx     0.0 271124 ep_pol nginx: worker process
    
    -rwxr-xr-x 1 root root 3.4M Jul 15 15:42 /usr/local/sbin/nginx
    -rwxr-xr-x 1 root root 3.4M Jul 15 15:29 /usr/local/sbin/nginx.old
    ---------------------------------------------------------------------------
    Active connections: 1
    server accepts handled requests
    257 257 258
    Reading: 0 Writing: 1 Waiting: 0
    ---------------------------------------------------------------------------
    

    After kill -WINCH, the old master's worker processes have all exited, leaving just the old master pid 4533 running side by side with the new master pid 9780 and its workers (PPID = 9780):
    Code (Text):
    ---------------------------------------------------------------------------
    kill -WINCH 4533
      PID  PPID USER     %CPU    VSZ WCHAN  COMMAND
    4533     1 root      0.0 254736 sigsus nginx: master process /usr/local/sbin/nginx -c /usr/local/nginx/conf/nginx.conf
    9780  4533 root      0.8 254736 sigsus nginx: master process /usr/local/sbin/nginx -c /usr/local/nginx/conf/nginx.conf
    9799  9780 nginx     0.0 271124 ep_pol nginx: worker process
    9800  9780 nginx     0.0 271124 ep_pol nginx: worker process
    9801  9780 nginx     1.0 324388 ep_pol nginx: worker process
    9802  9780 nginx     0.0 271124 ep_pol nginx: worker process
    ---------------------------------------------------------------------------
    Active connections: 1
    server accepts handled requests
    3 3 3
    Reading: 0 Writing: 1 Waiting: 0
    ---------------------------------------------------------------------------
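    At this stage both masters are still running, which is what makes a rollback possible. The nginx documentation at http://nginx.org/en/docs/control.html#upgrade describes it: send HUP to the old master so it starts new workers without re-reading the configuration, then send QUIT to the new master. A minimal sketch; nginx_rollback and the SIGNAL_CMD dry-run hook are hypothetical names, not part of centmin.sh:

```shell
# Hypothetical rollback helper, following the nginx docs' upgrade section.
# Only usable while the old master is still alive (after WINCH, before QUIT).
# SIGNAL_CMD is a hypothetical hook for dry runs; it defaults to kill.
SIGNAL_CMD="${SIGNAL_CMD:-kill}"

nginx_rollback() {
  local old_master="$1" new_master="$2"
  if [ -z "$old_master" ] || [ -z "$new_master" ]; then
    echo "usage: nginx_rollback OLD_MASTER_PID NEW_MASTER_PID" >&2
    return 2
  fi
  # HUP: the old master starts fresh worker processes again,
  # without re-reading the configuration
  $SIGNAL_CMD -HUP "$old_master" || return 1
  # QUIT: gracefully shut down the new master and its workers
  $SIGNAL_CMD -QUIT "$new_master" || return 1
}
```

    For the pids above this would be `nginx_rollback 4533 9780`; once the old master has received QUIT, this path is no longer available.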
    

    Finally, kill -QUIT ends the old master pid 4533, leaving just the new master pid 9780 and its workers (PPID = 9780) running:
    Code (Text):
    ---------------------------------------------------------------------------
    kill -QUIT 4533
      PID  PPID USER     %CPU    VSZ WCHAN  COMMAND
    9780     1 root      0.5 254736 sigsus nginx: master process /usr/local/sbin/nginx -c /usr/local/nginx/conf/nginx.conf
    9799  9780 nginx     0.0 295708 ep_pol nginx: worker process
    9800  9780 nginx     0.0 271124 ep_pol nginx: worker process
    9801  9780 nginx     0.6 324388 ep_pol nginx: worker process
    9802  9780 nginx     0.0 271124 ep_pol nginx: worker process
    ---------------------------------------------------------------------------
    Active connections: 1
    server accepts handled requests
    5 5 5
    Reading: 0 Writing: 1 Waiting: 0
    ---------------------------------------------------------------------------
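    Put together, the three signals can be sketched as a standalone shell function. This illustrates the sequence from the nginx docs and is not the actual centmin.sh code; the pid file path /usr/local/nginx/logs/nginx.pid, the timeouts, and all helper names (including the SIGNAL_CMD dry-run hook) are assumptions.

```shell
# Sketch of USR2 -> WINCH -> QUIT (see nginx.org/en/docs/control.html#upgrade).
# Assumed pid file path; SIGNAL_CMD is a hypothetical hook defaulting to kill.
SIGNAL_CMD="${SIGNAL_CMD:-kill}"
PIDFILE="${PIDFILE:-/usr/local/nginx/logs/nginx.pid}"

# Succeeds while the old master still has children that are not a master
# process (workers, cache manager/loader). The new master is itself a child
# of the old master, so it is excluded explicitly.
has_old_workers() {
  ps -eo ppid=,args= | awk -v p="$1" '$1 == p && $0 !~ /master process/ { found = 1 } END { exit !found }'
}

no_old_workers() { ! has_old_workers "$1"; }

# Succeeds once USR2 has taken effect: the old pid file was renamed to
# .oldbin and the new master wrote a different pid to PIDFILE.
upgrade_started() {
  [ -s "$PIDFILE.oldbin" ] && [ -s "$PIDFILE" ] && [ "$(cat "$PIDFILE")" != "$1" ]
}

# wait_for TIMEOUT_SECONDS COMMAND [ARGS...]: poll once per second
wait_for() {
  local timeout="$1" i=0
  shift
  while [ "$i" -lt "$timeout" ]; do
    "$@" && return 0
    sleep 1
    i=$((i + 1))
  done
  "$@"
}

zero_downtime_upgrade() {
  local old_master new_master
  old_master="$(cat "$PIDFILE")" || return 1

  # 1. USR2: start a new master (and workers) from the new binary
  $SIGNAL_CMD -USR2 "$old_master" || return 1
  wait_for 10 upgrade_started "$old_master" || return 1
  new_master="$(cat "$PIDFILE")"
  echo "old master: $old_master, new master: $new_master"

  # 2. WINCH: old workers finish their requests and exit; the old
  #    master stays up so a rollback is still possible
  $SIGNAL_CMD -WINCH "$old_master" || return 1
  wait_for 60 no_old_workers "$old_master" || return 1

  # 3. QUIT: gracefully shut down the old master
  $SIGNAL_CMD -QUIT "$old_master"
}
```

    The 60 second cap on waiting for old workers is arbitrary; sites with long-lived connections may need a longer wait before QUIT.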
    


     
    Last edited: Jul 16, 2016
  2. tjk

    tjk Member

    76
    16
    8
    Jun 27, 2015
    Ratings:
    +27
    Local Time:
    11:40 PM
    Not that I need it for my sites, but this is way cool, @eva2000!
     
  3. pamamolf

    pamamolf Premium Member Premium Member

    4,101
    428
    83
    May 31, 2014
    Ratings:
    +837
    Local Time:
    6:40 AM
    Nginx-1.26.x
    MariaDB 10.6.x
    I thought this would replace the old upgrade routine entirely rather than being an option... or isn't it stable yet?
     
  4. eva2000

    eva2000 Administrator Staff Member

    Cheers, this feature was always planned for Centmin Mod eventually :)
    It's in 123.09beta01, so test it now; in a later stable release, NGINX_ZERODT='y' will be the default in centmin.sh :D
     
  5. eva2000

    eva2000 Administrator Staff Member

    Revised the NGINX_ZERODT='y' routine so that, between the kill -WINCH and kill -QUIT commands, it properly detects when the old nginx worker child processes (those whose PPID equals the old nginx master PID) have gracefully shut down before issuing kill -QUIT. This can extend the whole nginx upgrade routine in centmin.sh menu option 4 until the old nginx worker child processes finish, but it should allow the old nginx processes to complete their work first.

    You'll see this extra output before kill -QUIT is issued:
    Code (Text):
    ---------------------------------------------------------------------------
    waiting for old nginx worker processes to exit...
     checking... worker child PPID=24050 exists
     checking... worker child PPID=24050 exists
     checking... worker child PPID=24050 exists
    ---------------------------------------------------------------------------
    kill -QUIT 24050
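    Note that in the ps output of the full example, the new master (PID 1076) itself has PPID=24050, so a check that only matched children of the old master would keep seeing the new master and never finish waiting. A minimal sketch of a filter that skips master processes; old_worker_pids is a hypothetical helper, not the centmin.sh implementation:

```shell
# Hypothetical helper (not the centmin.sh implementation).
# Reads ps output on stdin (PID in column 1, PPID in column 2) and prints
# the PIDs of the given old master's children that are not themselves a
# master process: workers plus any cache manager/loader processes.
old_worker_pids() {
  awk -v ppid="$1" '$2 == ppid && $0 !~ /master process/ { print $1 }'
}
```

    On a live server you could feed it `ps -eo pid,ppid,args | old_worker_pids 24050`; an empty result means the old master's workers (and any cache manager/loader processes) have exited.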
    

    Full example
    Code (Text):
    ---------------------------------------------------------------------------
    nginx master id: 24050
    ---------------------------------------------------------------------------
    Active connections: 2
    server accepts handled requests
    871 871 868
    Reading: 0 Writing: 1 Waiting: 1
    ---------------------------------------------------------------------------
    kill -USR2 24050
      PID  PPID USER     %CPU    VSZ WCHAN  COMMAND
    1076 24050 root      1.6 254736 sigsus nginx: master process /usr/local/sbin/nginx -c /usr/local/nginx/conf/nginx.conf
    1079  1076 nginx     0.0 271124 ep_pol nginx: worker process
    1080  1076 nginx     0.0 271124 ep_pol nginx: worker process
    1081  1076 nginx     2.0 324388 ep_pol nginx: worker process
    1082  1076 nginx     0.0 271124 ep_pol nginx: worker process
    24050     1 root      0.0 254736 sigsus nginx: master process /usr/local/sbin/nginx -c /usr/local/nginx/conf/nginx.conf
    24051 24050 nginx     0.0 308000 ep_pol nginx: worker process
    24052 24050 nginx     0.0 295708 ep_pol nginx: worker process
    24053 24050 nginx     0.0 295708 ep_pol nginx: worker process
    24054 24050 nginx     0.0 308000 ep_pol nginx: worker process
    
    -rwxr-xr-x 1 root root 3.4M Jul 16 00:51 /usr/local/sbin/nginx
    -rwxr-xr-x 1 root root 3.4M Jul 15 15:42 /usr/local/sbin/nginx.old
    ---------------------------------------------------------------------------
    Active connections: 1
    server accepts handled requests
    3 3 3
    Reading: 0 Writing: 1 Waiting: 0
    

    Code (Text):
    ---------------------------------------------------------------------------
    kill -WINCH 24050
      PID  PPID USER     %CPU    VSZ WCHAN  COMMAND
    1076 24050 root      0.8 254736 sigsus nginx: master process /usr/local/sbin/nginx -c /usr/local/nginx/conf/nginx.conf
    1079  1076 nginx     0.0 295708 ep_pol nginx: worker process
    1080  1076 nginx     0.0 295708 ep_pol nginx: worker process
    1081  1076 nginx     1.0 324388 ep_pol nginx: worker process
    1082  1076 nginx     0.0 295708 ep_pol nginx: worker process
    24050     1 root      0.0 254736 sigsus nginx: master process /usr/local/sbin/nginx -c /usr/local/nginx/conf/nginx.conf
    ---------------------------------------------------------------------------
    Active connections: 2
    server accepts handled requests
    12 12 12
    Reading: 0 Writing: 1 Waiting: 1
    ---------------------------------------------------------------------------
    waiting for old nginx worker processes to exit...
     checking... worker child PPID=24050 exists
     checking... worker child PPID=24050 exists
     checking... worker child PPID=24050 exists
    ---------------------------------------------------------------------------
    kill -QUIT 24050
      PID  PPID USER     %CPU    VSZ WCHAN  COMMAND
    1076     1 root      0.5 254736 sigsus nginx: master process /usr/local/sbin/nginx -c /usr/local/nginx/conf/nginx.conf
    1079  1076 nginx     0.0 295708 ep_pol nginx: worker process
    1080  1076 nginx     0.1 295708 ep_pol nginx: worker process
    1081  1076 nginx     0.6 324388 ep_pol nginx: worker process
    1082  1076 nginx     0.0 295708 ep_pol nginx: worker process
    ---------------------------------------------------------------------------
    Active connections: 2
    server accepts handled requests
    15 15 15
    Reading: 0 Writing: 1 Waiting: 1
    ---------------------------------------------------------------------------
     
  6. rdan

    rdan Well-Known Member

    5,449
    1,410
    113
    May 25, 2014
    Ratings:
    +2,204
    Local Time:
    11:40 AM
    Mainline
    10.2
    Why is this still not enabled by default? :)
     
  7. eva2000

    eva2000 Administrator Staff Member

    I've tested it and it works for me, but there's no guarantee it works 100% for others, so I leave it up to them to test it themselves and hopefully report their feedback :)
     
  8. rdan

    rdan Well-Known Member

    No similar approach for PHP-FPM yet? :)
     
  9. eva2000

    eva2000 Administrator Staff Member

    Nope. For that, if you need 100% uptime, you need multiple load-balanced servers, each running PHP-FPM, plus a network shared/distributed storage configuration for the common web site files, e.g. GlusterFS/NFS.
     
  10. rdan

    rdan Well-Known Member

    What is this 404 Not Found error in the log?
    Code:
    ---------------------------------------------------------------------------
    nginx master id: 29648
    ---------------------------------------------------------------------------
    <html>
    <head><title>404 Not Found</title></head>
    <body>
    <center><h1>404 Not Found</h1></center>
    <hr><center>nginx</center>
    </body>
    </html>
    ---------------------------------------------------------------------------
    kill -USR2 29648
      PID  PPID USER     %CPU    VSZ WCHAN  COMMAND
     5757 29648 nginx     0.8 1367732 ep_pol nginx: worker process
     5758 29648 nginx     0.6 1367732 -     nginx: worker process
     5759 29648 nginx     0.6 1367732 ep_pol nginx: worker process
     5760 29648 nginx     0.5 1367732 ep_pol nginx: worker process
     5761 29648 nginx     0.8 1367732 ep_pol nginx: worker process
     5762 29648 nginx     0.7 1367732 ep_pol nginx: worker process
     5763 29648 nginx     0.0 1318580 ep_pol nginx: cache manager process
    29648     1 root      0.0 1318580 sigsus nginx: master process /usr/local/sbin/nginx -c /usr/local/nginx/conf/nginx.conf
    30542 29648 root      0.0 1314340 sigsus nginx: master process /usr/local/sbin/nginx -c /usr/local/nginx/conf/nginx.conf
    30544 30542 nginx     0.3 1363492 ep_pol nginx: worker process
    30545 30542 nginx     0.6 1363492 ep_pol nginx: worker process
    30546 30542 nginx     0.3 1363492 ep_pol nginx: worker process
    30547 30542 nginx     0.6 1363492 ep_pol nginx: worker process
    30548 30542 nginx     0.3 1363492 ep_pol nginx: worker process
    30549 30542 nginx     0.6 1363492 ep_pol nginx: worker process
    30550 30542 nginx     0.0 1314340 ep_pol nginx: cache manager process
    30551 30542 nginx     0.0 1314340 ep_pol nginx: cache loader process
    
    -rwxr-xr-x 1 root root 4.1M Sep 25 00:01 /usr/local/sbin/nginx
    -rwxr-xr-x 1 root root 4.1M Sep 22 04:46 /usr/local/sbin/nginx.old
    ---------------------------------------------------------------------------
    <html>
    <head><title>404 Not Found</title></head>
    <body>
    <center><h1>404 Not Found</h1></center>
    <hr><center>nginx</center>
    </body>
    </html>
    ---------------------------------------------------------------------------
    kill -WINCH 29648
      PID  PPID USER     %CPU    VSZ WCHAN  COMMAND
    29648     1 root      0.0 1318580 sigsus nginx: master process /usr/local/sbin/nginx -c /usr/local/nginx/conf/nginx.conf
    30542 29648 root      0.0 1314340 sigsus nginx: master process /usr/local/sbin/nginx -c /usr/local/nginx/conf/nginx.conf
    30544 30542 nginx     0.6 1363492 ep_pol nginx: worker process
    30545 30542 nginx     0.5 1363492 ep_pol nginx: worker process
    30546 30542 nginx     0.6 1363492 ep_pol nginx: worker process
    30547 30542 nginx     0.6 1363492 ep_pol nginx: worker process
    30548 30542 nginx     0.6 1363492 ep_pol nginx: worker process
    30549 30542 nginx     0.5 1363492 ep_pol nginx: worker process
    30550 30542 nginx     0.0 1314340 ep_pol nginx: cache manager process
    30551 30542 nginx     0.0 1314340 ep_pol nginx: cache loader process
    ---------------------------------------------------------------------------
    <html>
    <head><title>404 Not Found</title></head>
    <body>
    <center><h1>404 Not Found</h1></center>
    <hr><center>nginx</center>
    </body>
    </html>
    ---------------------------------------------------------------------------
     waiting for old nginx worker processes to exit...
    ---------------------------------------------------------------------------
    kill -QUIT 29648
      PID  PPID USER     %CPU    VSZ WCHAN  COMMAND
    30542     1 root      0.0 1314340 sigsus nginx: master process /usr/local/sbin/nginx -c /usr/local/nginx/conf/nginx.conf
    30544 30542 nginx     0.6 1363492 ep_pol nginx: worker process
    30545 30542 nginx     0.5 1363492 ep_pol nginx: worker process
    30546 30542 nginx     1.0 1363492 ep_pol nginx: worker process
    30547 30542 nginx     0.5 1363492 ep_pol nginx: worker process
    30548 30542 nginx     0.8 1363492 ep_pol nginx: worker process
    30549 30542 nginx     0.6 1363492 ep_pol nginx: worker process
    30550 30542 nginx     0.0 1314340 ep_pol nginx: cache manager process
    30551 30542 nginx     0.0 1314340 ep_pol nginx: cache loader process
    ---------------------------------------------------------------------------
    <html>
    <head><title>404 Not Found</title></head>
    <body>
    <center><h1>404 Not Found</h1></center>
    <hr><center>nginx</center>
    </body>
    </html>
    ---------------------------------------------------------------------------
    
     
  11. eva2000

    eva2000 Administrator Staff Member

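    The routine queries your nginx_status output with the command below, purely for diagnostic purposes in the nginx upgrade logs. If /nginx_status isn't available on localhost, nginx returns its 404 page instead, which is what ends up in your log.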
    Code (Text):
    curl -s -4 localhost/nginx_status

    Code (Text):
    curl -s -4 localhost/nginx_status
    Active connections: 1
    server accepts handled requests
     7 7 7
    Reading: 0 Writing: 1 Waiting: 0
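    If you want to reuse that output in your own scripts, here is a minimal sketch of a parser for the stub_status format shown above. parse_nginx_status is a hypothetical helper; it fails instead of printing anything when the response isn't stub_status output, e.g. the 404 page you get when the /nginx_status location is missing.

```shell
# Hypothetical helper: parse nginx stub_status text from stdin and print
# key=value pairs. Exits non-zero if no "Active connections:" line is seen
# (e.g. a 404 Not Found HTML page).
parse_nginx_status() {
  awk '
    /^Active connections:/ { active = $3; seen = 1 }
    /^ *[0-9]+ +[0-9]+ +[0-9]+ *$/ { accepts = $1; handled = $2; requests = $3 }
    /^Reading:/ { reading = $2; writing = $4; waiting = $6 }
    END {
      if (!seen) exit 1
      printf "active=%s accepts=%s handled=%s requests=%s reading=%s writing=%s waiting=%s\n",
        active, accepts, handled, requests, reading, writing, waiting
    }'
}
```

    For example: `curl -s -4 localhost/nginx_status | parse_nginx_status || echo 'nginx_status unavailable'`.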
    
     
  12. rdan

    rdan Well-Known Member

    Hmm, I disabled the stub status module and also removed it from the config.
    Is it really needed? If so, I'll add it back.
    Thanks!
     
  13. eva2000

    eva2000 Administrator Staff Member

    It's handy to have for nginx statistics, and if you use nginx-related service monitoring, e.g. NGINX Amplify, New Relic, etc.
     
  14. rdan

    rdan Well-Known Member

    Could be related...

    I had a scenario today where site visitors were receiving timeout errors when loading the site, and I could not restart or start Nginx.
    It just hung; I fixed it by rebooting the whole server.

    Then I disabled NGINX_ZERODT and recompiled Nginx.
    Though it could be a bug in 1.17.4.
    I didn't inspect much.
     
  15. rdan

    rdan Well-Known Member

    I've also downgraded to Nginx 1.16.1 and removed the HPACK/Dynamic TLS patches, just to reduce the number of things to investigate.
     
  16. eva2000

    eva2000 Administrator Staff Member

    Might just use 1.17.4 without the nginx patches and see.

    Also, the timeouts could be network-related, and so unrelated to Nginx?
     
  17. rdan

    rdan Well-Known Member

    I'll try that later.

    Nope, I can ping the site fine.
     
  18. rdan

    rdan Well-Known Member

    Done, now with only the Dynamic TLS patch.
    I won't touch HPACK anymore, seeing as someone also encountered an issue with it last year.
     
  19. eva2000

    eva2000 Administrator Staff Member

    Are you behind Cloudflare? If you are, you don't really need the HPACK nginx patch, as Cloudflare can only communicate with origin servers over HTTP/1.1 and not HTTP/2 right now.
     
  20. rdan

    rdan Well-Known Member

    Not anymore.
    Cloudflare is great! But having 2-3 users report 522, 523 and 504 Cloudflare errors even when the server is working fine isn't good.
    So once again I ditched CF. :-(

    Maybe not every Cloudflare PoP performs equally well.
     
    Last edited: Oct 1, 2019