I received automated mail: Code: High 15 minute load average alert - 25.29 [-- Attachment #1 --] [-- Type: text/plain, Encoding: 7bit, Size: 0.2K --] Time: Sun Apr 15 07:16:35 2018 +0000 1 Min Load Avg: 22.01 5 Min Load Avg: 29.03 15 Min Load Avg: 25.29 Running/Total Processes: 1/69 [-- Attachment #2: ps.txt --] [-- Type: text/plain, Encoding: 7bit, Size: 0.1K --] Output from ps: Unable to obtain process output within 15 seconds - Timed out [-- Attachment #3: vmstat.txt --] [-- Type: text/plain, Encoding: 7bit, Size: 0.1K --] Output from vmstat: Unable to obtain vmstat output within 10 seconds - Timed out [-- Attachment #4: netstat.txt --] [-- Type: text/plain, Encoding: 7bit, Size: 2.3K --] Output from netstat: Active Internet connections (servers and established) Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name tcp 0 0 127.0.0.1:8004 0.0.0.0:* LISTEN 2250/php-fpm: maste tcp 0 0 127.0.0.1:8005 0.0.0.0:* LISTEN 2250/php-fpm: maste tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 1/init tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 10803/nginx: master tcp 0 0 0.0.0.0:21 0.0.0.0:* LISTEN 6253/pure-ftpd (SER tcp 0 0 0.0.0.0:712 0.0.0.0:* LISTEN 430/sshd tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 775/master tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 10803/nginx: master tcp 0 0 127.0.0.1:8000 0.0.0.0:* LISTEN 2250/php-fpm: maste tcp 0 0 127.0.0.1:8002 0.0.0.0:* LISTEN 2250/php-fpm: maste tcp 0 0 127.0.0.1:3303 0.0.0.0:* LISTEN 17533/mysqld tcp 0 0 127.0.0.1:8003 0.0.0.0:* LISTEN 2250/php-fpm: maste tcp6 0 0 :::80 :::* LISTEN 10803/nginx: master tcp6 0 0 :::21 :::* LISTEN 6253/pure-ftpd (SER tcp6 0 0 :::712 :::* LISTEN 430/sshd tcp6 0 0 ::1:25 :::* LISTEN 775/master tcp6 0 0 :::443 :::* LISTEN 10803/nginx: master udp 0 0 111.22.333.44:63249 8.8.4.4:53 ESTABLISHED 10805/nginx: worker udp 0 0 111.22.333.44:44679 8.8.8.8:53 ESTABLISHED 10805/nginx: worker udp 0 0 111.22.333.44:64408 8.8.8.8:53 ESTABLISHED 10804/nginx: worker udp 0 0 111.22.333.44:32236 8.8.4.4:53 ESTABLISHED 10804/nginx: worker [-- Attachment #5: apachestatus.html --] [-- Type: text/html, Encoding: 7bit, Size: 0.1K --] Unable to retrieve Apache Server Status [http://127.0.0.1/server-status] - Unable to download: Server closed connection without sending any data back Why it's trying to get Apache Server Status? I checked the logs but I don't understand what happened. This was few hours later: Code: error.log 2018/04/15 15:04:03 [crit] 10804#10804: *236 SSL_do_handshake() failed (SSL: error:1417D102:SSL routines:tls_process_client_hello:unsupported protocol) while SSL handshaking, client: 172.104.154.126, server: 0.0.0.0:443 (lot's of these) Code: access.log 54.183.165.74 - - [15/Apr/2018:05:59:48 +0000] "GET / HTTP/1.1" 400 7442 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" This is me - - [15/Apr/2018:07:44:27 +0000] "GET /index.php/204 HTTP/1.1" 204 0 "-" "Mozilla/5.0 (Android) ownCloud-android/20180412" ... 172.104.154.126 - - [15/Apr/2018:15:04:03 +0000] "\x15\x03\x01\x00\x02\x01\x00" 400 166 "-" "-" 172.104.154.126 - - [15/Apr/2018:15:04:03 +0000] "\x15\x03\x01\x00\x02\x01\x00" 400 166 "-" "-" 172.104.154.126 - - [15/Apr/2018:15:04:03 +0000] "\x15\x03\x01\x00\x02\x01\x00" 400 166 "-" "-" 172.104.154.126 - - [15/Apr/2018:15:04:03 +0000] "\x15\x03\x01\x00\x02\x01\x00" 400 166 "-" "-" 172.104.154.126 - - [15/Apr/2018:15:04:03 +0000] "\x15\x03\x01\x00\x02\x01\x00" 400 166 "-" "-" 172.104.154.126 - - [15/Apr/2018:15:04:03 +0000] "\x15\x03\x02\x00\x02\x01\x00" 400 166 "-" "-" 172.104.154.126 - - [15/Apr/2018:15:04:03 +0000] "\x15\x03\x01\x00\x02\x01\x00" 400 166 "-" "-" 172.104.154.126 - - [15/Apr/2018:15:04:03 +0000] "\x15\x03\x01\x00\x02\x01\x00" 400 166 "-" "-" 172.104.154.126 - - [15/Apr/2018:15:04:03 +0000] "\x15\x03\x01\x00\x02\x01\x00" 400 166 "-" "-" 172.104.154.126 - - [15/Apr/2018:15:04:03 +0000] "\x15\x03\x01\x00\x02\x01\x00" 400 166 "-" "-" 172.104.154.126 - - [15/Apr/2018:15:04:03 +0000] "\x15\x03\x01\x00\x02\x01\x00" 400 166 "-" "-" 172.104.154.126 - - [15/Apr/2018:15:04:03 +0000] "\x15\x03\x01\x00\x02\x01\x00" 400 166 "-" "-" 172.104.154.126 - - [15/Apr/2018:15:04:03 +0000] "\x15\x03\x01\x00\x02\x01\x00" 400 166 "-" "-" 172.104.154.126 - - [15/Apr/2018:15:04:03 +0000] "\x15\x03\x01\x00\x02\x01\x00" 400 166 "-" "-" (this continues...) Code: cron Apr 15 06:15:03 info CROND[13371]: (root) CMD (/usr/lib64/sa/sa1 1 1) Apr 15 06:15:03 info CROND[13372]: (nginx) CMD (/usr/local/bin/php -f /home/nginx/domains/domain.name/public/cron.php) Apr 15 06:20:02 info CROND[13401]: (root) CMD (/usr/lib64/sa/sa1 1 1) Apr 15 06:30:37 info crond[13422]: (root) INFO (Job execution of per-minute job scheduled for 06:25 delayed into subsequent minute 06:30. Skipping job run.) Apr 15 06:32:09 info crond[13436]: (nginx) INFO (Job execution of per-minute job scheduled for 06:30 delayed into subsequent minute 06:31. Skipping job run.) Apr 15 06:32:11 info crond[13435]: (root) INFO (Job execution of per-minute job scheduled for 06:30 delayed into subsequent minute 06:31. Skipping job run.) Apr 15 06:32:11 info crond[13435]: (root) INFO (Job execution of per-minute job scheduled for 06:30 delayed into subsequent minute 06:31. Skipping job run.) Apr 15 07:16:36 info crond[13460]: (root) INFO (Job execution of per-minute job scheduled for 07:13 delayed into subsequent minute 07:14. Skipping job run.) Apr 15 07:16:36 info crond[13454]: (root) INFO (Job execution of per-minute job scheduled for 06:48 delayed into subsequent minute 07:14. Skipping job run.) Apr 15 07:16:36 info crond[13453]: (root) INFO (Job execution of per-minute job scheduled for 06:36 delayed into subsequent minute 07:14. Skipping job run.) Apr 15 07:16:36 info crond[13457]: (root) INFO (Job execution of per-minute job scheduled for 06:50 delayed into subsequent minute 07:14. Skipping job run.) Apr 15 07:16:36 info CROND[13485]: (root) CMD (/usr/lib64/sa/sa1 1 1) Code: maillog Apr 15 06:15:09 info postfix/local[13382]: F22CD40D2E: to=<nginx@info.localdomain>, orig_to=<nginx>, relay=local, delay=7.8, delays=5.7/0.01/0/2.1, dsn=2.0.0, status=sent (delivered to mailbox) Apr 15 06:15:09 info postfix/qmgr[783]: F22CD40D2E: removed Apr 15 06:30:34 info postfix/master[775]: warning: unix_trigger_event: read timeout for service public/pickup Apr 15 06:30:59 info postfix/master[775]: warning: unix_trigger_event: read timeout for service public/pickup Apr 15 06:32:16 info postfix/master[775]: warning: unix_trigger_event: read timeout for service public/pickup Apr 15 06:48:55 info postfix/master[775]: warning: unix_trigger_event: read timeout for service public/pickup Apr 15 06:32:16 info postfix/master[775]: warning: unix_trigger_event: read timeout for service public/pickup Apr 15 06:48:55 info postfix/master[775]: warning: unix_trigger_event: read timeout for service public/pickup Apr 15 07:16:36 info postfix/master[775]: warning: unix_trigger_event: read timeout for service public/pickup Apr 15 07:16:36 info postfix/master[775]: warning: unix_trigger_event: read timeout for service public/qmgr Apr 15 07:16:36 info postfix/master[775]: warning: unix_trigger_event: read timeout for service public/pickup Apr 15 07:16:36 info postfix/master[775]: warning: unix_trigger_event: read timeout for service public/pickup Apr 15 07:18:08 info postfix/pickup[13483]: 8D8F440DF0: uid=0 from=<root> Apr 15 07:18:08 info postfix/cleanup[13494]: 8D8F440DF0: message-id=<20180415071646.8D8F440DF0@info.localdomain> Apr 15 07:18:08 info postfix/qmgr[783]: 8D8F440DF0: from=<root@info.localdomain>, size=4085, nrcpt=1 (queue active) Apr 15 07:18:08 info postfix/local[13502]: 8D8F440DF0: to=<root@info.localdomain>, orig_to=<root>, relay=local, delay=86, delays=56/3.2/0/27, dsn=2.0.0, status=sent (delivered to mailbox) Code: messages Apr 15 06:15:02 info systemd: Started Session 2626430 of user root. Apr 15 06:15:03 info systemd: Starting Session 2626430 of user root. Apr 15 06:15:03 info systemd: Created slice User Slice of nginx. Apr 15 06:15:03 info systemd: Starting User Slice of nginx. Apr 15 06:15:03 info systemd: Started Session 2626431 of user nginx. Apr 15 06:15:03 info systemd: Starting Session 2626431 of user nginx. Apr 15 06:15:04 info systemd: Removed slice User Slice of nginx. Apr 15 06:15:04 info systemd: Stopping User Slice of nginx. Apr 15 06:20:02 info systemd: Started Session 2626661 of user root. Apr 15 06:20:02 info systemd: Starting Session 2626661 of user root. Apr 15 06:29:00 info systemd-logind: Failed to start session scope session-2626800.scope: Connection timed out Apr 15 06:30:27 info systemd: Failed to start Session 2626800 of user root. Apr 15 06:31:38 info systemd: Started Session 2626985 of user root. Apr 15 06:32:04 info systemd: Starting Session 2626985 of user root. Apr 15 06:32:09 info systemd: Created slice User Slice of nginx. Apr 15 06:32:11 info systemd: Starting User Slice of nginx. Apr 15 06:32:11 info systemd: Failed to start Session 2626986 of user nginx. Apr 15 06:32:23 info systemd: Removed slice User Slice of nginx. Apr 15 06:32:24 info systemd: Stopping User Slice of nginx. Apr 15 06:32:11 info systemd: Starting User Slice of nginx. Apr 15 06:32:11 info systemd: Failed to start Session 2626986 of user nginx. Apr 15 06:32:23 info systemd: Removed slice User Slice of nginx. Apr 15 06:32:24 info systemd: Stopping User Slice of nginx. Apr 15 07:16:36 info journal: Permanent journal is using 96.0M (max allowed 4.0G, trying to leave 4.0G free of 63.1G available → current limit 4.0G). Apr 15 07:16:36 info journal: Journal started Apr 15 07:16:36 info systemd: systemd-journald.service watchdog timeout (limit 3min)! Apr 15 07:16:36 info rsyslogd: imjournal: journal reloaded... [v8.24.0 try http://www.rsyslog.com/e/0 ] Apr 15 07:16:36 info journal: Journal stopped Apr 15 07:18:08 info journal: Permanent journal is using 96.0M (max allowed 4.0G, trying to leave 4.0G free of 63.1G available → current limit 4.0G). Apr 15 07:18:08 info journal: Journal started Apr 15 07:19:55 info systemd: Removed slice User Slice of nginx. Apr 15 07:19:55 info systemd: Stopping User Slice of nginx. Apr 15 07:20:02 info systemd: Started Session 2628008 of user root. Apr 15 07:20:02 info systemd: Starting Session 2628008 of user root. Apr 15 07:25:01 info systemd: Started Session 2628202 of user root. Apr 15 07:25:02 info systemd: Starting Session 2628202 of user root. Code: journalctl Apr 15 06:20:02 info systemd[1]: Started Session 2626661 of user root. Apr 15 06:20:02 info CROND[13401]: (root) CMD (/usr/lib64/sa/sa1 1 1) Apr 15 06:20:02 info systemd[1]: Starting Session 2626661 of user root. Apr 15 06:29:00 info systemd-logind[121]: Failed to start session scope session-2626800.scope: Connection timed out Apr 15 06:30:21 info crond[13422]: pam_systemd(crond:session): Failed to create session: Connection timed out Apr 15 06:30:27 info systemd[1]: Failed to start Session 2626800 of user root. Apr 15 06:30:34 info postfix/master[775]: warning: unix_trigger_event: read timeout for service public/pickup Apr 15 06:30:37 info crond[13422]: (root) INFO (Job execution of per-minute job scheduled for 06:25 delayed into subsequent minute 06:30. Skipping job r Apr 15 06:30:59 info postfix/master[775]: warning: unix_trigger_event: read timeout for service public/pickup Apr 15 06:31:38 info systemd[1]: Started Session 2626985 of user root. Apr 15 06:32:02 info crond[13435]: pam_systemd(crond:session): Failed to create session: Connection timed out Apr 15 06:32:04 info systemd[1]: Starting Session 2626985 of user root. Apr 15 06:32:09 info crond[13436]: pam_systemd(crond:session): Failed to create session: Connection timed out Apr 15 06:32:09 info systemd[1]: Created slice User Slice of nginx. Apr 15 06:32:09 info crond[13436]: (nginx) INFO (Job execution of per-minute job scheduled for 06:30 delayed into subsequent minute 06:31. Skipping job Apr 15 06:32:11 info systemd[1]: Starting User Slice of nginx. Apr 15 06:32:11 info crond[13435]: (root) INFO (Job execution of per-minute job scheduled for 06:30 delayed into subsequent minute 06:31. Skipping job r Apr 15 06:32:11 info systemd[1]: Failed to start Session 2626986 of user nginx. Apr 15 06:32:16 info postfix/master[775]: warning: unix_trigger_event: read timeout for service public/pickup Apr 15 06:32:23 info systemd[1]: Removed slice User Slice of nginx. Apr 15 06:32:24 info systemd[1]: Stopping User Slice of nginx. Apr 15 06:48:55 info postfix/master[775]: warning: unix_trigger_event: read timeout for service public/pickup Apr 15 07:16:36 info systemd-journal[13462]: Permanent journal is using 96.0M (max allowed 4.0G, trying to leave 4.0G free of 63.1G available <E2><86> Apr 15 07:16:36 info systemd-journal[13462]: Journal started Apr 15 07:16:36 info postfix/master[775]: warning: unix_trigger_event: read timeout for service public/pickup Apr 15 07:16:36 info systemd[1]: systemd-journald.service watchdog timeout (limit 3min)! Apr 15 07:16:36 info postfix/master[775]: warning: unix_trigger_event: read timeout for service public/qmgr Apr 15 07:16:36 info postfix/master[775]: warning: unix_trigger_event: read timeout for service public/pickup Apr 15 07:16:36 info crond[13460]: (root) INFO (Job execution of per-minute job scheduled for 07:13 delayed into subsequent minute 07:14. Skipping job r Apr 15 07:16:36 info crond[13454]: (root) INFO (Job execution of per-minute job scheduled for 06:48 delayed into subsequent minute 07:14. Skipping job r Apr 15 07:16:36 info crond[13453]: (root) INFO (Job execution of per-minute job scheduled for 06:36 delayed into subsequent minute 07:14. Skipping job r Apr 15 07:16:36 info crond[13457]: (root) INFO (Job execution of per-minute job scheduled for 06:50 delayed into subsequent minute 07:14. Skipping job r Apr 15 07:16:36 info CROND[13485]: (root) CMD (/usr/lib64/sa/sa1 1 1) Apr 15 07:16:36 info CROND[13486]: (nginx) CMD (/usr/local/bin/php -f /home/nginx/domains/domain.name/public/cron.php) Apr 15 07:16:36 info postfix/master[775]: warning: unix_trigger_event: read timeout for service public/pickup Apr 15 07:16:36 info rsyslogd[117]: imjournal: journal reloaded... [v8.24.0 try http://www.rsyslog.com/e/0 ] Apr 15 07:16:36 info systemd-journal[13462]: Journal stopped Apr 15 07:18:08 info systemd-journal[13501]: Permanent journal is using 96.0M (max allowed 4.0G, trying to leave 4.0G free of 63.1G available <E2><86> Apr 15 07:18:08 info systemd-journal[13501]: Journal started To me it looks like there is empty space in the logs about the high load time. Is it just because system was in high use or did someone get in and swiped his/her steps..? This is the first time when I got automated warning, so I'm bit confused with this system. I didn't know ngingx receives mail. How do I read them? Code: su nginx -c mutt This account is currently not available. Or how do I change mailbox?
CSF Firewall was originally made for cpanel/WHM servers which had apache server status and is defined in /etc/csf/csf.conf so just ignore that part for now Code (Text): fgrep -n -C15 '/server-status' /etc/csf/csf.conf 1696- 1697-# Check the PT_LOAD_AVG minute Load Average (can be set to 1 5 or 15 and 1698-# defaults to 5 if set otherwise) on the server every PT_LOAD seconds. If the 1699-# load average is greater than or equal to PT_LOAD_LEVEL then an email alert is 1700-# sent. lfd then does not report subsequent high load until PT_LOAD_SKIP 1701-# seconds has passed to prevent email floods. 1702-# 1703-# Set PT_LOAD to "0" to disable this feature 1704-PT_LOAD = "600" 1705-PT_LOAD_AVG = "15" 1706-PT_LOAD_LEVEL = "8" 1707-PT_LOAD_SKIP = "3600" 1708- 1709-# This is the Apache Server Status URL used in the email alert. Requires the 1710-# Apache mod_status module to be installed and configured correctly 1711:PT_APACHESTATUS = "http://127.0.0.1/server-status" CPU load optimal is 1.0 per cpu thread so 8 cpu thread normal = 8x1 = 8.0. 16 cpu server normal is 16x1 = 16.0 so cpu load of 8 for 2 cpu server = 8/2 = 4.0 load average 4 cpu server = 8/4 = 2.0 load average 8 cpu server = 8/8 = 1.0 load average When you type top command, hit number 1 to break down per cpu stats Centmin Mod is provided as is, so troubleshooting load issues is left to end user to do. However, there's many linux tools and scripts that can help you figure out what was causing the load issues and when. If you're providing info on this forum, more info might be helpful What version of Centmin Mod ? .08 stable or .09 beta ? If .09 beta when was it installed and when was last time you updated ? What's your VPS/Server hardware specifications ? cpu type ? memory available ? disk space ? Who's your web host ? Different hosts have different limits for server resource usage and some are more restrictive that others so it could just be their resource usage policy you tripped which may or may not be restrictive. If restrictive, then real solution would be finding a better web host. If not restrictive, then it's finding out what caused high cpu load. Tools and commands you will want to read up on and learn for basic system admin tasks and troubleshooting. 20 Command Line Tools to Monitor Linux Performance 13 Linux Performance Monitoring Tools - Part 2 80 Linux Monitoring Tools for SysAdmins - Server Density Blog Amazing ! 25 Linux Performance Monitoring Tools 20 Linux System Monitoring Tools Every SysAdmin Should Know My mysqlmymonlite.sh script can help with that mysqlmymonlite.sh Addon | Centmin Mod Community README text file has instructions for setting up a cronjob to email you reports i.e. every 10/15min or every hour. Notes: Centmin Mod already installed Glances tool. It can be invoked from SSH command glances or top2. Official Glances site at nicolargo.github.io/glances/ and documentation here. For example what's output for these commands Code (Text): top -bn1 Code (Text): ps aufxw Code (Text): sar -q Code (Text): sar -r Code (Text): sar -q -f /var/log/sa/sa$(date +%d -d yesterday) Code (Text): sar -r -f /var/log/sa/sa$(date +%d -d yesterday) If you want to look at per process usage stats use pidstat run pidstat every 1 sec for 5 runs and sed is just to replace your main hostname with word hostname so mask your domain for posting on public places Code (Text): pidstat -durh 1 5 | sed -e "s|$(hostname)|hostname|g" for posting code you might want to use CODE tags for code How to use forum BBCODE code tags
.09 beta updated (a day before this happened) on the 14th of April. Installed 15th of March. This was first high usage alert. I'm running Nextcloud for 2 person. So it's mainly 100% idle according to top. Just tested small video upload. 1 thread went shortly ~85 id while other 3 was more than 90 id. When the high usage happened, we didn't used it. I think this was openvz problem or something from outisde caused it. HostUS 2GB RAM 2GB vSwap 4 vCPU Cores (Fair Share) 75GB Disk Space 2TB Bandwidth OpenVZ VPS (Special) Amsterdam Install - Post your Centmin Mod 123.09beta01 install time stats HosUS Thank You @eva2000 for your great information package! I'll study it as soon as I have some spare time.
Yes OpenVZ and HostUS host issues could be a factor. I just updated 123.09beta01's cminfo command with a cminfo top command you can also use to get a better overview of cpu, memory and disk usage stats. Still up to you to troubleshoot but cminfo top command gives you some stats to look over.