
Server suddenly went offline, no response to ping... what happened?

Discussion in 'System Administration' started by deltahf, Sep 2, 2019.

  1. deltahf

    deltahf Premium Member

    My dedicated CMM server was humming along just fine... then it went dark. No response to ping, no SSH, nothing. Contacted Hivelocity support chat (which is awesome, by the way) and they rebooted the server via console.

    It rebooted and came back to life as normal, but...

    Server monitoring and APM show no indication of any problems. No unusual bandwidth usage. Nothing odd in the logs I checked.

    What in the world happened? How can I even begin to troubleshoot this to ensure it won't happen again?
     
  2. eva2000

    eva2000 Administrator Staff Member

    There are many possible reasons for this. Did you check ping etc. from locations other than your own ISP connection and devices, i.e. third-party uptime checks?

    The most likely cause is a network connectivity issue at the web host, which the host would need to look into, so check your web host's service status/outage page. For example, I check my VPS servers' web hosts' outage/status trackers: VPS Provider Network Status.

    Another possibility is connectivity between your ISP and your server, e.g. sometimes my VPN servers lose connectivity and need reconnecting.

    For Centmin Mod internal server checks, also check the system logs (journalctl and /var/log/messages), the CSF Firewall log /var/log/lfd.log, etc. The first thing, though, is to get a more specific date/time for when the issue started and stopped, so you can inspect the logs around that time.
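
To narrow a syslog-style file down to the minutes around an outage, a small timestamp filter can help. This is only a minimal sketch and not part of Centmin Mod: `log_window` is a hypothetical helper name, and it assumes the traditional `Mon DD HH:MM:SS host ...` line layout used by /var/log/messages and a time window that stays within a single day.

```shell
# log_window FILE MONTH DAY START END   (hypothetical helper, for illustration)
# Print lines from a syslog-style FILE that are stamped MONTH DAY and whose
# HH:MM:SS field lies between START and END (inclusive).
# Assumes lines start with "Mon DD HH:MM:SS", e.g. "Sep  1 22:41:11 ...".
log_window() {
    awk -v mon="$2" -v day="$3" -v start="$4" -v end="$5" \
        '$1 == mon && $2 == day && $3 >= start && $3 <= end' "$1"
}

# Example call (the times here are placeholders, pick your own window):
#   log_window /var/log/messages Sep 1 21:45:00 22:00:00
```

For the systemd journal itself, `journalctl --since "YYYY-MM-DD HH:MM" --until "YYYY-MM-DD HH:MM"` gives a similar window without any helper.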

    Otherwise, you would need to hire someone to investigate ;) :D
     
  3. wmtech

    wmtech Active Member

    Depends.

    If it is a dedicated server that has been running for several years and has its own power supply, the most likely cause would be a failing power supply. If that is the case, the server will fail again at shorter and shorter intervals.

    Otherwise, there can be many reasons why a dedicated server crashes, and most of them are not easy to pin down. So I would recommend moving to a newer server if the problem occurs again.
     
  4. BamaStangGuy

    BamaStangGuy Active Member

    If it happens again I can take a look at it for you.
     
  5. deltahf

    deltahf Premium Member

    Yeah, it was offline for everyone (got lots of reports from users), including the Hivelocity support staff in the data center, so not a network issue.

    I do know the exact time, 21:52 UTC.

    In the messages log, I just see this. I guess the repeating "^@" indicates a reboot? Nothing unusual before it, just firewall block entries.

    Code (Text):
    ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ [long run of "^@" truncated]
    Sep  1 22:41:11 hv kernel: microcode: microcode updated early to revision 0xcc, date = 2019-04-01
    Sep  1 22:41:11 hv kernel: Initializing cgroup subsys cpuset
    Sep  1 22:41:11 hv kernel: Initializing cgroup subsys cpu
    Sep  1 22:41:11 hv kernel: Initializing cgroup subsys cpuacct
    Sep  1 22:41:11 hv kernel: Linux version 3.10.0-957.27.2.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Mon Jul 29 17:46:05 UTC 2019
    Sep  1 22:41:11 hv kernel: Command line: BOOT_IMAGE=/vmlinuz-3.10.0-957.27.2.el7.x86_64 root=/dev/mapper/vg1-root ro crashkernel=auto rd.lvm.lv=vg1/root rd.lvm.lv=vg1/swap rhgb quiet LANG=en_US.UTF-8
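
The "^@" sequences are how many viewers (less, vim, cat -v) display NUL bytes. A block of NULs sitting right before the first boot message is often seen after an unclean shutdown, where the log file was extended but the data never reached disk, so it is more a sign of a hard stop than of an orderly reboot. The sketch below counts and strips NUL bytes so the surrounding entries are easier to read; `count_nuls` and `strip_nuls` are hypothetical helper names used only for illustration.

```shell
# count_nuls FILE   (hypothetical helper)
# Print how many NUL (0x00) bytes FILE contains.
count_nuls() {
    # Keep only NUL bytes, then count them; the last tr drops wc's padding.
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# strip_nuls FILE   (hypothetical helper)
# Write FILE to stdout with all NUL bytes removed; FILE itself is untouched.
strip_nuls() {
    tr -d '\000' < "$1"
}

# Example calls:
#   count_nuls /var/log/messages
#   strip_nuls /var/log/messages | less
```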
    


    journalctl only seems to store info since the reboot, so I can't find any clues about what might have happened when it crashed:

    Code (Text):
    $ journalctl --list-boots
     0 617227989cd64655b2543a4908c26eb0 Sun 2019-09-01 22:41:11 UTC—Mon 2019-09-02 19:26:11 UTC


    Is that normal or can/should I configure that to save more info about previous boots?
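
On the question of keeping earlier boots: with systemd's default `Storage=auto`, journald only writes to disk if /var/log/journal exists, and otherwise keeps the journal under /run, which is lost on reboot. A single entry in `journalctl --list-boots` is consistent with that. The sketch below reports which `Storage=` value a journald.conf-style file sets; `journald_storage` is a hypothetical helper name, and it deliberately ignores drop-in directories such as journald.conf.d, so treat its answer as a first hint only.

```shell
# journald_storage [CONF]   (hypothetical helper)
# Print the Storage= value from a journald.conf-style file, or "auto"
# (the systemd default) when the key is absent, commented out, or the
# file cannot be read. Drop-in directories are NOT consulted.
journald_storage() {
    conf="${1:-/etc/systemd/journald.conf}"
    value=$(sed -n 's/^[[:space:]]*Storage[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p' \
        "$conf" 2>/dev/null | tail -n 1)
    echo "${value:-auto}"
}

# Commonly documented way to enable persistence when the mode is "auto"
# (run as root; shown as comments so nothing changes by accident):
#   mkdir -p /var/log/journal
#   systemctl restart systemd-journald
```

After the next reboot, `journalctl --list-boots` should then list more than one boot, and `journalctl -b -1` shows the previous one.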

    Yeah, it's just over a year old... hopefully too young for any hardware issues. :eek:
     
  6. jcat

    jcat Member

  7. deltahf

    deltahf Premium Member

    Thanks for the suggestion! I do have a /var/crash directory but it's empty, so I'm guessing kdump is not installed or configured. I will look into it for sure.
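
An empty /var/crash does not by itself show whether kdump is set up; it may simply mean no dump was ever captured. A rough first check is whether the kernel was booted with memory reserved for a crash kernel. The boot log earlier in the thread shows `crashkernel=auto` on the command line, so the reservation may already be requested. The sketch below extracts that parameter; `crashkernel_param` is a hypothetical helper name used for illustration.

```shell
# crashkernel_param [CMDLINE_FILE]   (hypothetical helper)
# Print the value of the crashkernel= boot parameter, or nothing if it is
# not set. Reads /proc/cmdline unless another file is given.
crashkernel_param() {
    # Put each boot parameter on its own line, then pick crashkernel=...
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^crashkernel=//p'
}

# Follow-up checks on a systemd-based system (comments only):
#   systemctl is-active kdump              # is the kdump service running?
#   cat /sys/kernel/kexec_crash_loaded     # 1 means a crash kernel is loaded
```

If the parameter is present but the kdump service is inactive, no dump will be written even when the kernel panics, which would match an empty /var/crash.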
     
  8. eva2000

    eva2000 Administrator Staff Member
