Want to subscribe to topics you're interested in?
Become a Member

Troubleshooting website and database performance, speed, slowdowns. PostgreSQL, MariaDB

Discussion in 'Web Development & Web Performance' started by MaximilianKohler, Jan 18, 2024.

  1. MaximilianKohler

    MaximilianKohler Member

    201
    7
    18
    Jun 23, 2023
    Ratings:
    +34
    Local Time:
    2:34 PM
    My server was running smoothly. My Xenforo forum (Mariadb) is fast and my listmonk (PostgreSQL db) has been faster with centmin + binary than it was with AWS EC2 + docker. Those are the only things running on my server, and my resource usage is low. However, I've started getting severe lag when opening up the listmonk UI to where it basically doesn't load at all and I have to try again later. I got a delay once when opening the server via Putty, but usually it opens right away and shows low CPU usage.

    Usually my forum refreshes without issue, but it's cached by Cloudflare. But when listmonk is lagging I refreshed my forum and admin page, and it was slow too (especially with ctrl+F5). So it definitely seems to be a server problem. Forum traffic hasn't spiked. Restarting my server (33 days uptime) did not fix it.

    So I ran the server spec calculator again and the recommendation it gave me was lower than the first time. With a noticeable difference being that Peak CPU Usage was 17.7% last time and this time it was 100%. And average and Peak CPU Steal are both 0% now, whereas before it was 32.65%.

    I found this thread that recommends the "cminfo top" command. Here's the log for it: pastebin.com/PjPjmHJD. I'm no expert, but nothing stood out to me.

    I was thinking about disabling the Cloudflare proxies as a troubleshooting step, but I don't think it's Cloudflare, and that's not secure. The main listmonk page demo.listmonk.app/ should be the only thing proxied by Cloudflare, and that page loads fine. After you login demo.listmonk.app/admin it's reading directly from the postgres database and that is where it's super slow/doesn't load.

    I got some emails today about Cloudflare slowdowns, but they were in Texas, and my issue seems to be only when the databases on the server are being accessed.

    I saw webpagetest.org mentioned in another thread and ran a test on my forum and listmonk main page and didn't see any major problems. https://catbox.moe/c/4latge

    I contacted Hetzner and they said everything looked normal and I should check my logs.

    Centmin Mod Log Files
    Code:
    tail -10 /usr/local/nginx/logs/error.log
    is incorrect https://files.catbox.moe/8s8zbq.jpg. It's similar for most of the log files listed there.

    error.log-20240117 only has warnings about the "listen ... http2" directive is deprecated, use the "http2" directive instead.

    I don't see anything unusual in the /home/nginx/domains/domainname.com/log/access.log for the listmonk or forum domains.

    /home/nginx/domains/domainname.com/log/error.log for listmonk does have some timed out notices.
    Code:
    2024/01/17 15:38:02 [error] 1754#1754: *16739 upstream timed out (110: Connection timed out) while reading response header from upstream, client: <MY IP>, server: listmonk.mysite.com, request: "GET /api/events?type=error HTTP/2.0", upstream: "http://127.0.0.1:9011/api/events?type=error", host: "listmonk.mysite.com", referrer: "https://listmonk.mysite.com/admin"
    2024/01/17 15:59:18 [error] 1755#1755: *17243 upstream timed out (110: Connection timed out) while reading response header from upstream, client: <MY IP>, server: listmonk.mysite.com, request: "GET /api/events?type=error HTTP/2.0", upstream: "http://127.0.0.1:9011/api/events?type=error", host: "listmonk.mysite.com", referrer: "https://listmonk.mysite.com/admin/settings/logs"
    2024/01/17 18:25:47 [error] 1754#1754: *19500 upstream timed out (110: Connection timed out) while reading response header from upstream, client: <MY IP>, server: listmonk.mysite.com, request: "GET /api/events?type=error HTTP/2.0", upstream: "http://127.0.0.1:9011/api/events?type=error", host: "listmonk.mysite.com", referrer: "https://listmonk.mysite.com/admin"
    
    I had a similar issue before, but that was with docker, and it was the firewall blocking docker. This error message is also a bit different. I did a search for it and the results seem to confirm it's a server issue.

    error.log is 0 bytes for my forum. The real log is "error.log-20240117" and contains some favicon errors which seems odd since Xenforo should be loading those from its db:
    Code:
    2024/01/16 07:27:44 [error] 12104#12104: *657920 open() "/home/nginx/domains/myforum.com/public/favicon.ico" failed (2: No such file or directory), client: <someone's IP>, server: myforum.com, request: "GET /favicon.ico HTTP/2.0", host: "myforum.com", referrer: "https://myforum.com/forums/thread123/?prefix_id=40"
    2024/01/16 11:41:55 [error] 12103#12103: *666394 open() "/home/nginx/domains/myforum.com/public/favicon.ico" failed (2: No such file or directory), client: <someone's IP>, server: myforum.com, request: "GET /favicon.ico HTTP/2.0", host: "myforum.com", referrer: "https://myforum.com/robots.txt"
    2024/01/17 01:36:38 [error] 1754#1754: *1812 open() "/home/nginx/domains/myforum.com/public/favicon.ico" failed (2: No such file or directory), client: <someone's IP>, server: myforum.com, request: "GET /favicon.ico HTTP/2.0", host: "myforum.com", referrer: "https://myforum.com/forums/thread123/?prefix_id=40"
    2024/01/17 02:18:05 [error] 1754#1754: *2500 open() "/home/nginx/domains/myforum.com/public/favicon.ico" failed (2: No such file or directory), client: <MY IP>, server: myforum.com, request: "GET /favicon.ico HTTP/2.0", host: "myforum.com", referrer: "https://myforum.com/admin.php"
    2024/01/17 04:08:46 [error] 1755#1755: *3931 open() "/home/nginx/domains/myforum.com/public/favicon.ico" failed (2: No such file or directory), client: <MY IP>, server: myforum.com, request: "GET /favicon.ico HTTP/2.0", host: "myforum.com", referrer: "https://myforum.com/admin.php"
    2024/01/17 04:33:35 [error] 1754#1754: *4191 open() "/home/nginx/domains/myforum.com/public/.well-known/assetlinks.json" failed (2: No such file or directory), client: <someone's IP>, server: myforum.com, request: "GET /.well-known/assetlinks.json HTTP/2.0", host: "myforum.com"
    In /var/log/php-fpm/www-error.log-20240107 I see
    Code:
    [07-Jan-2024 07:10:07] WARNING: [pool www] server reached max_children setting (30), consider raising it
    But only one from 10 days ago.

    www-php.error.log-20230908 doesn't have anything noteworthy.


    journalctl -u postgresql-15 --no-pager has nothing noteworthy.

    Code:
    systemctl status postgresql-15 -l
    ● postgresql-15.service - PostgreSQL 15 database server
       Loaded: loaded (/usr/lib/systemd/system/postgresql-15.service; enabled; vendor preset: disabled)
       Active: active (running) since Tue 2024-01-16 23:34:37 PST; 20h ago
         Docs: https://www.postgresql.org/docs/15/static/
     Main PID: 1663 (postmaster)
       CGroup: /system.slice/postgresql-15.service
               ├─ 1663 /usr/pgsql-15/bin/postmaster -D /var/lib/pgsql/15/data/
               ├─ 1702 postgres: logger
               ├─ 1732 postgres: checkpointer
               ├─ 1733 postgres: background writer
               ├─ 1747 postgres: walwriter
               ├─ 1748 postgres: autovacuum launcher
               ├─ 1749 postgres: logical replication launcher
               └─39508 postgres: listmonk listmonk ::1(23574) idle
    
    Jan 16 23:34:37 xxx systemd[1]: Starting PostgreSQL 15 database server...
    Jan 16 23:34:37 xxx postmaster[1663]: 2024-01-16 23:34:37.210 PST [1663] LOG:  redirecting log output to logging collector process
    Jan 16 23:34:37 xxx postmaster[1663]: 2024-01-16 23:34:37.210 PST [1663] HINT:  Future log output will appear in directory "log".
    Jan 16 23:34:37 xxx systemd[1]: Started PostgreSQL 15 database server.
    
    Didn't show me where the postgres logs were. This did (they were stderr), and this. I got a permission denied error, but another command "worked" but didn't tell me much:
    Code:
    listmonk=> SELECT pg_current_logfile();
    ERROR:  permission denied for function pg_current_logfile
    listmonk=>
    listmonk=> SHOW log_directory;
    ERROR:  must be superuser or have privileges of pg_read_all_settings to examine "log_directory"
    listmonk=> \! sudo lsof | grep postgres | grep 1w
    postmaste  1663        postgres    1w     FIFO                0,9       0t0      21627 pipe
    postmaste  1732        postgres    1w     FIFO                0,9       0t0      21627 pipe
    postmaste  1733        postgres    1w     FIFO                0,9       0t0      21627 pipe
    postmaste  1747        postgres    1w     FIFO                0,9       0t0      21627 pipe
    postmaste  1748        postgres    1w     FIFO                0,9       0t0      21627 pipe
    postmaste  1749        postgres    1w     FIFO                0,9       0t0      21627 pipe
    postmaste 39988        postgres    1w     FIFO                0,9       0t0      21627 pipe
    postmaste 40067        postgres    1w     FIFO                0,9       0t0      21627 pipe
    listmonk=>
    
    I tried switching to user postgres:
    List users:
    \du
    Change user:
    \c postgres

    But I still get the errors:
    Code:
    \du
                                       List of roles
     Role name |                         Attributes                         | Member of
    -----------+------------------------------------------------------------+-----------
     listmonk  |                                                            | {}
     postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
    
    listmonk=> \c postgres
    You are now connected to database "postgres" as user "listmonk".
    postgres=> SELECT pg_current_logfile();
    ERROR:  permission denied for function pg_current_logfile
    postgres=> SHOW log_directory;
    ERROR:  must be superuser or have privileges of pg_read_all_settings to examine "log_directory"
    
    I had to \q then login via sudo -i -u postgres.
    Code:
    postgres=# SELECT pg_current_logfile();
       pg_current_logfile
    ------------------------
     log/postgresql-Wed.log
    (1 row)
    
    postgres=# SHOW log_directory;
     log_directory
    ---------------
     log
    (1 row)
    
    postgres=# SHOW data_directory;
         data_directory
    ------------------------
     /var/lib/pgsql/15/data
    (1 row)
    
    So postgres logs are in /var/lib/pgsql/15/data/log. And they have nothing useful that I can tell: pastebin.com/paf89uR2
     
    Last edited: Jan 18, 2024
  2. MaximilianKohler

    MaximilianKohler Member

    201
    7
    18
    Jun 23, 2023
    Ratings:
    +34
    Local Time:
    2:34 PM
    My forum's admin panel is clearly very laggy as well. And even the main page demo.listmonk.app is starting to have trouble now.

    I got a reply from Hetzner saying it's not their fault and they won't help:
    Their forum is completely private so I can't even search it to see if there are other people having the same issue.
     
    Last edited: Jan 18, 2024
  3. MaximilianKohler

    MaximilianKohler Member

    201
    7
    18
    Jun 23, 2023
    Ratings:
    +34
    Local Time:
    2:34 PM
    Since this is an easy step I decided to create a new server from a snapshot but choose a location in another country. If it was still laggy after this it would mean that either Hetzner's servers worldwide are having issues, or it's some software issue, or it's Cloudflare.

    So far so good. But after restarting my server before it seemed to temporarily help as well. So I'm wondering if there might be some cache I may need to clear. But I'm not seeing any mention of that type of thing at Getting Started Guide - CentminMod.com LEMP Nginx web stack for CentOS or https://community.centminmod.com/threads/how-to-boost-centmin-mod-lemp-stack-performance.13776, etc.
     
  4. MaximilianKohler

    MaximilianKohler Member

    201
    7
    18
    Jun 23, 2023
    Ratings:
    +34
    Local Time:
    2:34 PM
    Well, switching server locations seems to have fixed it. 6 days later I got into the forum and it's all in German. From what I can tell, there aren't a bunch of other people having this problem. I posted on their forum (link for my reference) and the response I got was it was likely an issue that Hetzner missed.
    I'm wondering if anyone else has experienced this type of thing with any cloud provider?
     
  5. eva2000

    eva2000 Administrator Staff Member

    54,896
    12,240
    113
    May 24, 2014
    Brisbane, Australia
    Ratings:
    +18,811
    Local Time:
    7:34 AM
    Nginx 1.27.x
    MariaDB 10.x/11.4+
    Yeah that can happen especially for shared CPU VPS servers as opposed to dedicated CPU VPS servers. Usually, web host would move the client's VPS to another VPS host node or clientw would move themselves.
     
  6. MaximilianKohler

    MaximilianKohler Member

    201
    7
    18
    Jun 23, 2023
    Ratings:
    +34
    Local Time:
    2:34 PM
    I contacted Hetzner support again to let them know that it seems to be an issue on their end that they missed, and according to their own graphs it seems my server was down for hours before I contacted them yet they didn't seem to know and I didn't find out till a week later.

    When I posted on their forum about it people recommended I use 3rd party server monitoring tools. The recommendations were:

    Cheapest/simplest: UptimeRobot (hosted) or UptimeKuma (selfhosted).

    Another person mentioned https://updown.io/ which gives you 100k credits per month, which means you can check once a minute for free.

    Anyone have experience with these or other similar tools?
     
    Last edited: Jan 28, 2024
  7. eva2000

    eva2000 Administrator Staff Member

    54,896
    12,240
    113
    May 24, 2014
    Brisbane, Australia
    Ratings:
    +18,811
    Local Time:
    7:34 AM
    Nginx 1.27.x
    MariaDB 10.x/11.4+
  8. deltahf

    deltahf Premium Member Premium Member

    587
    265
    63
    Jun 8, 2014
    Ratings:
    +489
    Local Time:
    5:34 PM
    I recommend NewRelic and its PHP Application Monitoring. It can be tricky to configure and it has a confusing and intimidating interface, but once you get it setup I personally think it is an invaluable tool for troubleshooting and monitoring server infrastructure. I would feel totally lost and blind without it.

    You can see how I used NewRelic graphs and data to tune and monitor my WordPress and XenForo installs over the past few years here in this thread.

    NewRelic offers both server monitoring (processes, CPU load, network traffic, etc.) and "Application Monitoring". Keep in mind these are different things. There are a lot of "server monitoring" solutions out there, but Application Monitoring is more specialized and harder to find because it integrates with your applications and databases to give you a view into exactly what operations or parts of your applications are running slow or causing problems. It is often called "APM", so do some searches for that and you will find the other providers that offer these services.

    I use NewRelic because I like their pay-per-usage model and find it more affordable than the others, and you can fine-tune exactly how much data you want to ingest which will have a direct impact on your bill. I pay about $20-$30 USD per month for the service.