
Sysadmin my server reboots at least once a month

Discussion in 'System Administration' started by yunos, Jan 13, 2021.

  1. yunos

    yunos Member

    130
    3
    18
    Aug 8, 2015
    Ratings:
    +17
    Local Time:
    11:16 PM
    1.8.0
    It's not a scheduled reboot. I also have kdump enabled:

    Message: [ 0.000000] Reserving 166MB of memory at 688MB for crashkernel (System RAM: 98268MB)
    but no dump file was logged in /var/crash.
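
The reservation message only shows that memory was set aside at boot. kdump writes a vmcore when the kernel panics, so an empty /var/crash does not by itself rule out a hardware reset or power loss. A minimal sketch for checking whether a crash kernel is actually loaded, assuming the standard sysfs flag /sys/kernel/kexec_crash_loaded; the function name and its optional path argument are made up for illustration:

```shell
#!/bin/sh
# kdump_armed: succeed if a crash kernel is currently loaded.
# The optional argument (hypothetical, handy for testing) overrides the sysfs path.
kdump_armed() {
    flag_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

if kdump_armed; then
    echo "crash kernel loaded: a kernel panic should leave a vmcore under /var/crash"
else
    echo "crash kernel NOT loaded: check 'systemctl status kdump'"
fi
```

If the flag reads 1 and /var/crash still stays empty after a reboot, a kernel panic becomes a less likely explanation than a reset the kernel never saw.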

    I didn't find anything out of the ordinary in dmesg, but maybe I missed something.
    Here's the log:
    https://pastebin.com/sEEEckTu

    I did check whether my SSD was the issue, but no errors were found.


    Checking the boot logs, though, it looks awfully like a scheduled reboot, although I don't have anything like that in crontab.

    l23QprC.png (343×119) (imgur.com)

    Running the last command, I got this:
    29hllww.png (582×18) (imgur.com)
    but no users were online who could have done that reboot.


    I then checked the cron logs for the exact time the server got rebooted:

    Jan 12 20:51:55 crond[1115]: (CRON) INFO (RANDOM_DELAY will be scaled with factor 38% if used.)
    Jan 12 20:51:55 crond[1115]: (nodequery) ORPHAN (no passwd entry)
    Jan 12 20:51:55 crond[1115]: (lionel) ORPHAN (no passwd entry)
    Jan 12 20:51:55 crond[1115]: (CRON) INFO (running with inotify support)



    On further checking in /var/log/secure, I found this in the exact timeframe my server got rebooted:

    Jan 12 20:51:55 polkitd[1060]: Loading rules from directory /etc/polkit-1/rules.d
    Jan 12 20:51:55 polkitd[1060]: Loading rules from directory /usr/share/polkit-1/rules.d
    Jan 12 20:51:55 polkitd[1060]: Finished loading, compiling and executing 2 rules
    Jan 12 20:51:55 polkitd[1060]: Acquired the name org.freedesktop.PolicyKit1 on the system bus
    Jan 12 20:51:59 runuser: pam_unix(runuser-l:session): session opened for user root by (uid=0)
    Jan 12 20:51:59 runuser: pam_unix(runuser-l:session): session closed for user root
     
    Last edited: Jan 13, 2021
  2. eva2000

    eva2000 Administrator Staff Member

    54,519
    12,211
    113
    May 24, 2014
    Brisbane, Australia
    Ratings:
    +18,780
    Local Time:
    8:16 AM
    Nginx 1.27.x
    MariaDB 10.x/11.4+
    Who is the web host? Have you asked whether any emergency maintenance or reboots were done at their end? Though if it reboots once a month, that would be strange. What's the output of these 2 commands, which filter the command history for keywords that reboot the system and check the cronjob list?

    Code (Text):
    history | egrep 'shutdown|reboot'

    example output
    Code (Text):
    history | egrep 'shutdown|reboot'   
        3  [13.01.21] 11:39:46   reboot
      573  [03.08.19] 20:13:24   reboot
      786  [25.09.19] 12:56:42   reboot
      793  [25.09.19] 13:05:28   reboot
      989  [24.10.19] 11:48:03   reboot
     1111  [18.11.19] 17:07:54   reboot
     1162  [18.12.19] 15:37:50   reboot
     1321  [19.02.20] 05:24:58   reboot
    

    And the cronjob listing. You can mask actual domain names etc. for privacy in the output of this command:
    Code (Text):
    crontab -l

    then check monthly cronjobs
    Code (Text):
    ls -lah /etc/cron.monthly/

    weekly cronjobs
    Code (Text):
    ls -lah /etc/cron.weekly/

    daily cronjobs
    Code (Text):
    ls -lah /etc/cron.daily/

    On a Centmin Mod system, the normal daily cronjobs might look like:
    Code (Text):
    ls -lah /etc/cron.daily/ 
    total 36K
    drwxr-xr-x.  2 root root  100 May 13  2020 .
    drwxr-xr-x. 98 root root 8.0K Jan  7 15:52 ..
    -rwx------   1 root root 3.3K Jun 17  2020 csget
    -rwxr-xr-x   1 root root  979 Aug  6  2019 cyrus-imapd
    -rwxr-xr-x   1 root root 2.2K Jul 12  2019 diskalert
    -rwx------   1 root root  219 Apr  1  2020 logrotate
    -rwxr-xr-x   1 root root  618 Oct 30  2018 man-db.cron
    -rwx------   1 root root  208 Apr 10  2018 mlocate
    

    To keep the formatting when posting code or command output, you might want to use CODE tags: How to use forum BBCODE code tags :)
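
Note that `history` only covers the current user's saved shell history and `crontab -l` only the current user's crontab. A rough sketch of a wider sweep over the usual cron locations, assuming a CentOS-style layout; the function name and the keyword pattern are illustrative, not part of any tool:

```shell
#!/bin/sh
# find_reboot_jobs: list files under the given paths that mention a
# reboot-style command. Name and pattern are illustrative assumptions.
find_reboot_jobs() {
    grep -rlE 'shutdown|reboot|poweroff|init 6' -- "$@" 2>/dev/null
}

# Usual cron locations; unreadable or missing paths are silently skipped.
# Run as root so that other users' crontabs in /var/spool/cron are readable.
matches=$(find_reboot_jobs /etc/crontab /etc/cron.d /etc/cron.hourly \
    /etc/cron.daily /etc/cron.weekly /etc/cron.monthly /var/spool/cron || true)

if [ -n "$matches" ]; then
    printf '%s\n' "$matches"
else
    echo "no cron file mentions a reboot command"
fi
```

A match is only a lead, since the keyword may sit in a comment. Scheduled jobs can also live in systemd timers, which `systemctl list-timers --all` lists.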
     
  3. yunos

    Hi, thanks for replying. I'm renting a dedicated server, and the host has not reported any downtime or maintenance. They also checked, and no logs show that my server stopped receiving power.

    history | egrep 'shutdown|reboot'
    showed no results.


    Code (Text):
    crontab -l
    13 23 * * * /usr/local/src/centminmod/tools/autoprotect.sh >/dev/null 2>&1
    0 */4 * * * /usr/bin/cminfo_updater 2>/dev/null
    22 */12 * * * /usr/local/src/centminmod/tools/csfcf.sh auto >/dev/null 2>&1
    0 0 * * 5 /usr/local/src/centminmod/tools/cf-authenticated-origin-cert-update.sh update >/dev/null 2>&1
    22 0 * * * "/root/.acme.sh"/acme.sh --cron --home "/root/.acme.sh" > /dev/null
    5 12 * * * service docker restart
    11 */12 * * * /usr/local/src/centminmod/tools/jetpackips.sh >/dev/null 2>&1



    Code (Text):
    ls -lah /etc/cron.monthly/
    total 16K
    drwxr-xr-x.   2 root root 4.0K Jun  9  2014 .
    drwxr-xr-x. 116 root root  12K Jan 12 20:52 ..


    Code (Text):
    ls -lah /etc/cron.weekly/
    total 16K
    drwxr-xr-x.   2 root root 4.0K Jun  9  2014 .
    drwxr-xr-x. 116 root root  12K Jan 12 20:52 ..


    Code (Text):
    ls -lah /etc/cron.daily/
    total 40K
    drwxr-xr-x.   2 root root 4.0K May  4  2020 .
    drwxr-xr-x. 116 root root  12K Jan 12 20:52 ..
    -rwx------    1 root root 3.3K Jun 17  2020 csget
    -rwxr-xr-x    1 root root 2.2K Mar 21  2020 diskalert
    -rwx------    1 root root  219 Apr  1  2020 logrotate
    -rwxr-xr-x    1 root root 3.9K Mar 21  2020 maldet
    -rwxr-xr-x.   1 root root  618 Oct 30  2018 man-db.cron
    -rwx------    1 root root  208 Apr 10  2018 mlocate
     
  4. eva2000

    Yeah, nothing in that output suggests it's scripted, at least.
    You might need to install auditd via Centmin Mod's tools/auditd.sh addon script, as outlined at https://community.centminmod.com/th...td-support-added-in-latest-123-09beta01.9071/ so we can track down who/what accessed files, directories, user logins etc. Examples of how you can track file/directory accesses: https://community.centminmod.com/th...added-in-latest-123-09beta01.9071/#post-37761 and tracking sudo users: https://community.centminmod.com/th...sion-thread-for-123-09beta01.9089/#post-52814

    Before running it, you need to set the variable below in the persistent config file /etc/centminmod/custom_config.inc:
    Code (Text):
    AUDITD_ENABLE='y'

    To install and set up tools/auditd.sh, run:
    Code (Text):
    /usr/local/src/centminmod/tools/auditd.sh setup


    auditd logs only record entries from the time it was installed, so you probably need to wait for the next reboot, though you can do things like check all successful authentication attempts (SSH logins included):
    Code (Text):
    aureport -au -i --success
    

    successful login summary
    Code (Text):
    aureport -l --success --summary -i
    

    failed SSH logins
    Code (Text):
    aureport -au -i --failed
    

    The default custom tools/auditd.sh configuration already has a rule to track reboot and shutdown commands via assigned key name = power
    Code (Text):
    auditctl -l | egrep -i 'shutdown|reboot'
    -w /sbin/shutdown -p x -k power
    -w /sbin/reboot -p x -k power
    

    So there's no need to do the setup outlined at https://www.thegeekdiary.com/audit-rules-to-log-reboot-command-executions-in-centos-rhel/ though the example commands outlined there are similar.

    For example, you can see /sbin/reboot is assigned a key named power, so you can search auditd logs by key = power:
    Code (Text):
    ausearch -k power

    The full list of auditd rules set up by the tools/auditd.sh setup run can be seen via:
    Code (Text):
    auditctl -l
     
  5. yunos

    Thanks, I've installed auditd. But none of my users who have SSH accounts have sudo access. Only I do.
     
  6. eva2000

    auditd tracks all Linux users on the server, i.e. root and sudo users, so it can be used to figure out who ran the reboot command and when.
     
  7. yunos

    So my server got rebooted again, but this time I don't see anyone besides me who logged in to the server.

    ausearch -k power
    ----
    time->Tue Jan 26 00:06:53 2021
    type=CONFIG_CHANGE msg=audit(1611619613.014:24): auid=4294967295 ses=4294967295 op=add_rule key="power" list=4 res=1
    ----
    time->Tue Jan 26 00:06:53 2021
    type=CONFIG_CHANGE msg=audit(1611619613.014:25): auid=4294967295 ses=4294967295 op=add_rule key="power" list=4 res=1
    ----
    time->Tue Jan 26 00:06:53 2021
    type=CONFIG_CHANGE msg=audit(1611619613.014:26): auid=4294967295 ses=4294967295 op=add_rule key="power" list=4 res=1
    ----
    time->Tue Jan 26 00:06:53 2021
    type=CONFIG_CHANGE msg=audit(1611619613.014:27): auid=4294967295 ses=4294967295 op=add_rule key="power" list=4 res=1
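
All four records above are CONFIG_CHANGE entries with op=add_rule, stamped 00:06:53, which looks like auditd loading its rules right after the boot rather than anyone executing a watched binary. An execution of /sbin/reboot or /sbin/shutdown would normally show up as a SYSCALL record carrying the same key. A small sketch for counting only those records; the function name is hypothetical:

```shell
#!/bin/sh
# count_power_execs: read `ausearch -k power` output on stdin and count
# SYSCALL records, i.e. actual executions of a watched binary.
# CONFIG_CHANGE records (rules being added) are not counted.
# Matches both raw (key="power") and interpreted (key=power) output.
count_power_execs() {
    grep -cE '^type=SYSCALL .*key="?power"?'
}

# Against the live audit log (needs root and a running auditd):
#   ausearch -k power | count_power_execs
```

A count of 0 across a reboot would suggest the reboot was not triggered through the watched commands.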
     
  8. eva2000

    And the last command output for the most recent reboot?
     
  9. yunos

    reboot system boot 3.10.0-1160.6.1. Tue Jan 26 00:06 - 02:43 (02:36)
    The reboot happened at 1:07 AM.
    As usual, /var/crash is empty.
     
  10. eva2000

  11. yunos

    last -Fxn2 shutdown reboot
    Code (Text):
    reboot   system boot  3.10.0-1160.6.1. Tue Jan 26 00:06:49 2021 - Tue Jan 26 03:44:10 2021  (03:37)
    reboot   system boot  3.10.0-1160.6.1. Tue Jan 12 20:51:51 2021 - Tue Jan 26 03:44:10 2021 (13+06:52)
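
With `-x`, `last` also lists shutdown records, and a clean shutdown normally leaves one just before the next boot record. Two boot records in a row with no shutdown record between them hint that the machine went down without a clean shutdown. A sketch that labels each boot accordingly, assuming newest-first `last -Fx shutdown reboot` output on stdin; the function name and the labels are made up for illustration:

```shell
#!/bin/sh
# classify_boots: read `last -Fx shutdown reboot` output (newest first) on
# stdin and label each boot by how the previous session ended:
#   clean   - the next-older record is a shutdown record
#   unclean - the next-older record is another boot (no shutdown logged)
#   unknown - oldest record in the listing, nothing older to compare against
classify_boots() {
    awk '
        $1 == "reboot" || $1 == "shutdown" { rec[++n] = $1; line[n] = $0 }
        END {
            for (i = 1; i <= n; i++) {
                if (rec[i] != "reboot") continue
                if (i == n)                       verdict = "unknown"
                else if (rec[i + 1] == "shutdown") verdict = "clean"
                else                               verdict = "unclean"
                print verdict ": " line[i]
            }
        }'
}

# On a live system:
#   last -Fx shutdown reboot | classify_boots
```

Fed the two lines above, this labels the Jan 26 boot as unclean and the Jan 12 boot as unknown. The result is only as good as wtmp, so rotated or truncated history will hide older records.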
    



    ausearch -i -m system_boot,system_shutdown | tail -4
    Code (Text):
    type=SYSTEM_BOOT msg=audit(01/26/2021 00:06:53.859:393) : pid=1020 uid=root auid=unset ses=unset msg=' comm=systemd-update-utmp exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success'
    
    


    I'll create a service unit to monitor this, but do you have any theories about what might be wrong?
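
One possible shape for such a service unit, as a sketch only: a oneshot unit that stays "active" after boot and whose ExecStop hook runs when systemd stops services during a clean shutdown. The unit name, script name and log path below are all hypothetical. If the log gains no entry across a reboot, that points toward a reset that bypassed the normal shutdown path.

```shell
#!/bin/sh
# install_shutdown_logger UNIT_DIR BIN_DIR
# Writes a hook script and a oneshot unit whose ExecStop runs it.
# File names and the log path are hypothetical choices for this sketch.
install_shutdown_logger() {
    unit_dir="$1"
    bin_dir="$2"

    cat > "$bin_dir/log-shutdown.sh" <<'EOF'
#!/bin/sh
# Runs only when systemd stops this unit, i.e. on a clean shutdown or reboot.
{
    echo "=== clean shutdown at $(date) ==="
    who
    ps -eo pid,ppid,user,lstart,args --sort=start_time | tail -n 20
} >> /var/log/shutdown-trace.log 2>&1
EOF
    chmod 755 "$bin_dir/log-shutdown.sh"

    cat > "$unit_dir/shutdown-trace.service" <<EOF
[Unit]
Description=Record logged-in users and recent processes at shutdown

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=$bin_dir/log-shutdown.sh

[Install]
WantedBy=multi-user.target
EOF
}

# On a real host (as root), something like:
#   install_shutdown_logger /etc/systemd/system /usr/local/bin
#   systemctl daemon-reload
#   systemctl enable shutdown-trace.service
#   systemctl start shutdown-trace.service
```

The hook records who was logged in and the 20 most recently started processes, which is usually enough to spot a script or user that called for the reboot.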
     
  12. yunos

    For those who are curious, I never found the exact reason for the server reboots.
    Basically, I asked the provider to swap in new hardware to narrow down the issue. But in the new hardware environment, the server didn't detect the SSD's mount partition, rescue mode couldn't fix the issue, and they didn't know how to fix it in dracut mode. Our conversation consisted of them and me literally googling for solutions.

    Usually rebuilding the initramfs should work, but it couldn't detect the SSD yet again. They couldn't even provide me with the dracut error log because the server didn't detect the flash drives, so I couldn't find the root cause, and the provider was incapable of fixing it for me.
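
For reference, one common reason an initramfs fails on swapped hardware is that it was built host-only and lacks the new storage controller's driver. Whether that was the cause here is unknown. A sketch of a generic rebuild using dracut's `--force` and `--no-hostonly` options; the helper name is hypothetical and it only prints the command rather than running it:

```shell
#!/bin/sh
# rebuild_initramfs_cmd KVER: print (not run) a dracut command that rebuilds
# the initramfs for kernel KVER without host-only driver trimming.
rebuild_initramfs_cmd() {
    kver="$1"
    printf 'dracut --force --no-hostonly /boot/initramfs-%s.img %s\n' "$kver" "$kver"
}

# From a rescue chroot, pass the installed kernel's version (the directory
# name under /lib/modules) rather than the rescue kernel's `uname -r`.
rebuild_initramfs_cmd "$(uname -r)"
```

Printing first makes it easy to check the image path and kernel version before overwriting anything under /boot.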

    The issue got even worse when I asked for IPMI access: even the provider had issues setting that up. That was the last straw, and I ended up going to a new provider because of their incompetence.

    By the way, the provider is dedioutlet.com
     
  13. eva2000

    Ouch, yeah, probably best to move web hosts. Thanks for sharing the problematic web host so we can avoid it!