Wordpress How to use cron to load sitemap file?

Discussion in 'Blogs & CMS usage' started by Chuong Luong, Oct 31, 2020.

  1. Chuong Luong

    Chuong Luong Member

    Hi,

    I want to use cron to load my sitemap_index.xml file, so that the sitemap_index.xml can be cached on Cloudflare every hour.

    I set up cron like this:

    Code:
    35 */1 * * * wget -O - -q -t 1 https://mydomain.com/sitemap_index.xml > /dev/null 2>&1
    I checked the cron log and cron did run, but it seems the sitemap file did not load, because every hour when I check https://mydomain.com/sitemap_index.xml, Cloudflare always reports a cache MISS.

    I do have a page rule on Cloudflare to cache the sitemap for 1 hour.

    So, is my cron correct?

    Thanks.
     
  2. eva2000

    eva2000 Administrator Staff Member

    Cloudflare's CDN cache doesn't guarantee an asset stays in cache if there is little or no request traffic for it. But 1hr should be fine.

    You really don't want to cache sitemaps too long, especially if you have active updates and changes to your sitemaps. Cloudflare's cache is per datacenter, so such cronjobs will only populate the cache at the CF datacenter closest to your server's location. That won't help visitors or crawlers reaching your site through the 200+ other CF datacenters, who may hit a cache miss anyway. So if your server is in Los Angeles and you run the cronjob, the US West Coast Cloudflare datacenters may get a cache-populating request. But if you're in New York and check the file, it may be a cache miss, as the Cloudflare US East datacenters may not have populated their cache for that file.
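    One way to see this per-datacenter behaviour for yourself: Cloudflare ends the cf-ray response header with the code of the datacenter that served the request (for example "-YUL" in the headers posted further down this thread). A minimal sketch, assuming a POSIX shell with curl and awk installed; the script name cf-colo.sh is hypothetical:

```shell
#!/bin/sh
# Sketch: which Cloudflare datacenter (colo) answered this request?
# Cloudflare ends the cf-ray header value with the colo's code, e.g.
# "cf-ray: 5eb42eda4d03ca57-YUL" was served by the YUL (Montreal) datacenter.

# Read raw response headers on stdin and print the colo from cf-ray.
# Exits non-zero when there is no cf-ray header (not proxied by Cloudflare).
colo_from_headers() {
  awk '
    { sub(/\r$/, "") }                  # HTTP header lines end in CRLF
    tolower($0) ~ /^cf-ray:/ {
      n = split($0, parts, "-")         # the colo is the last "-" field
      print parts[n]; found = 1; exit
    }
    END { exit !found }'
}

# Usage: sh cf-colo.sh https://mydomain.com/sitemap_index.xml
if [ "$#" -gt 0 ]; then
  curl -sI "$1" | colo_from_headers
fi
```

    Run on the server, it should print the datacenter nearest that server; run from your own PC in another region, it may print a different code, and that datacenter keeps its own, separately populated cache.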

    You can check whether the file is cached via SSH on the server: run the wget command manually first, then do a curl header check in SSH for it too.
    Code (Text):
    wget -O - -q -t 1 https://mydomain.com/sitemap_index.xml
    

    Code (Text):
    curl -Ik https://mydomain.com/sitemap_index.xml
    

    That will check the Cloudflare datacenter nearest to your server.
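    The two manual steps above can be combined into one script that cron runs instead of the bare wget, logging what the nearest datacenter reports each hour. A sketch, assuming curl and awk are available; the script and log paths in the cron line below are hypothetical:

```shell
#!/bin/sh
# Sketch: warm a URL through Cloudflare, then report cf-cache-status
# (HIT, MISS, EXPIRED, DYNAMIC, ...) as seen by the nearest datacenter.
# Assumes curl is installed.

# Print the cf-cache-status value from raw response headers on stdin.
# Exits non-zero if the header is missing.
cf_cache_status() {
  awk '
    { sub(/\r$/, "") }
    tolower($0) ~ /^cf-cache-status:/ {
      v = substr($0, index($0, ":") + 1)
      gsub(/[ \t]/, "", v)              # drop surrounding spaces
      print v; found = 1; exit
    }
    END { exit !found }'
}

warm_and_check() {
  url=$1
  # 1st request: full GET with the body thrown away, like the wget cron job.
  curl -s -o /dev/null "$url"
  # 2nd request: headers only; should now be a HIT at the same datacenter.
  status=$(curl -sI "$url" | cf_cache_status) || status="none"
  printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$status" "$url"
}

if [ "$#" -gt 0 ]; then
  warm_and_check "$1"
fi
```

    Example crontab entry (paths are placeholders):

    35 * * * * sh /root/tools/warm-check.sh https://mydomain.com/sitemap_index.xml >> /root/tools/warm-check.log 2>&1

    A HIT in that log alongside a MISS in your browser usually means your browser is reaching a different Cloudflare datacenter than the server does.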

    Cloudflare Enterprise plans, though, do have Cache Prefetching from all 200+ Cloudflare datacenters to populate the caches. Enterprise plans also have Argo Tiered Cache enabled by default, so middle CF datacenters can act as a middle origin proxy: other CF datacenters can look up a request in those middle datacenters if it isn't in their own datacenter's cache. See Does Cloudflare Do Prefetching?

    But if you don't have Cloudflare Enterprise and need to populate more Cloudflare datacenter caches, you would need to do the cache pre-warming from many servers spread geographically across the internet. I do this for some non-Enterprise Cloudflare sites, as I already have 32+ VPS servers in many cities as part of my centminmod.com cluster, so I use those to pre-warm my caches. I also use the Cloudflare Enterprise plan to do cache prefetching.
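    The pre-warm job on each server could also walk the child sitemaps that sitemap_index.xml lists, since crawlers fetch those too. A minimal sketch, assuming each <loc>...</loc> entry sits on a single line and that grep supports -o (GNU and BSD grep both do):

```shell
#!/bin/sh
# Sketch: pre-warm a sitemap index and every child sitemap it lists, so the
# Cloudflare datacenter nearest to *this* server caches all of them.
# Run it from cron on each geographically separate server you have.

# Print each <loc> URL from sitemap XML on stdin, decoding &amp; back to &.
sitemap_locs() {
  grep -o '<loc>[^<]*</loc>' |
    sed -e 's#^<loc>##' -e 's#</loc>$##' -e 's#&amp;#\&#g'
}

prewarm() {
  index_url=$1
  # Fetching the index warms it; its <loc> entries are the child sitemaps.
  curl -s "$index_url" | sitemap_locs | while IFS= read -r url; do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    printf '%s %s\n' "$code" "$url"
  done
}

# Usage: sh prewarm.sh https://mydomain.com/sitemap_index.xml
if [ "$#" -gt 0 ]; then
  prewarm "$1"
fi
```

    A large site's index can list hundreds of child sitemaps, and every MISS still reaches PHP and MySQL on the origin, so adding a short sleep inside the loop may be gentler on a struggling server.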
     
  3. Chuong Luong

    Chuong Luong Member

    Thanks for the clarification. The cache is indeed only populated at the location close to my VPS.

    With your suggestion, each VPS is $5/month, so having 32+ VPS servers like yours just to do the pre-warming would be a waste. Am I correct? Or is there some $1 VPS out there? My site only has the Pro plan, so I can't use Cloudflare's cache prefetch ...
     
  4. eva2000

    eva2000 Administrator Staff Member

    Yeah, it would be a waste to use them for cache pre-warming alone. For me the cost is already fixed, as I have working sites across all the VPS servers, so it isn't costing me anything extra to add a cronjob to each VPS server.

    For most folks this isn't required, as a cache miss only costs the first visitor at each of the 200+ Cloudflare datacenters, and only on that first visit. The second visit is cached.
     
  5. Chuong Luong

    Chuong Luong Member

    In my case, the sitemap loads too slowly, so I want it to be cached before Googlebot crawls it. My Google Search Console shows lots of sub-sitemap errors because of the slow-loading sitemap :(
     
  6. eva2000

    eva2000 Administrator Staff Member

    45,633
    10,356
    113
    May 24, 2014
    Brisbane, Australia
    Ratings:
    +16,070
    Local Time:
    10:28 PM
    Nginx 1.19.x
    MariaDB 5.5/10.x
    Solve the problem rather than deal with the symptom = fix slow loading sitemap :)
     
  7. Chuong Luong

    Chuong Luong Member

    That's the only thing I haven't been able to solve so far, since I have little knowledge of PHP and SQL. My site has ~600-700k posts, and every time the sitemap (from Yoast SEO) is crawled, MySQL uses 100% CPU.
     
  8. eva2000

    eva2000 Administrator Staff Member

  9. Chuong Luong

    Chuong Luong Member

    31
    0
    6
    Aug 8, 2019
    Ratings:
    +2
    Local Time:
    7:28 PM
    I did a while ago. It seems indexing (Yoast's indexables) is what they use to tackle slow sitemap queries on large sites. But even with the indexable tables fully built (as in my case), the sitemap is still slow.
     
  10. rdan

    rdan Well-Known Member

    5,003
    1,201
    113
    May 25, 2014
    Ratings:
    +1,827
    Local Time:
    8:28 PM
    Mainline
    10.2
    On my 17M-post XenForo forum, I cache it on my server for 8 hours.

    Code:
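        ### Sitemap PHP
        # php_cache.conf is rdan's own custom include holding his
        # PHP-FPM fastcgi_cache settings (contents not shown)
        location ~ ^/sitemap\.php {
            include /usr/local/nginx/conf/php_cache.conf;
        }
    
        ### Sitemap XML
        # sitemap.xml / sitemap-N.xml: served from disk if the file exists,
        # otherwise passed to XenForo's index.php
        location ~* (sitemap|sitemap-[0-9]+)\.xml$ {
            try_files $uri /index.php?$uri&$args;
            include /usr/local/nginx/conf/php_cache.conf;
        }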
     
  11. eva2000

    eva2000 Administrator Staff Member

    45,633
    10,356
    113
    May 24, 2014
    Brisbane, Australia
    Ratings:
    +16,070
    Local Time:
    10:28 PM
    Nginx 1.19.x
    MariaDB 5.5/10.x
    Very nice use of PHP-FPM fastcgi_cache based caching (y):cool:
     
  12. Chuong Luong

    Chuong Luong Member

    Do you have any idea how to do this with WordPress?
     
  13. rdan

    rdan Well-Known Member

    5,003
    1,201
    113
    May 25, 2014
    Ratings:
    +1,827
    Local Time:
    8:28 PM
    Mainline
    10.2
    I think it's almost the same. What does the sitemap URL look like on WP?
     
  14. Chuong Luong

    Chuong Luong Member

    31
    0
    6
    Aug 8, 2019
    Ratings:
    +2
    Local Time:
    7:28 PM
    It's https://mydomain.com/sitemap_index.xml

    But there is no /usr/local/nginx/conf/php_cache.conf file on my VPS for WordPress. I just installed WordPress via centmin.sh menu option 22.
     
  15. eva2000

    eva2000 Administrator Staff Member

    45,633
    10,356
    113
    May 24, 2014
    Brisbane, Australia
    Ratings:
    +16,070
    Local Time:
    10:28 PM
    Nginx 1.19.x
    MariaDB 5.5/10.x
    You'd have to manually configure PHP-FPM fastcgi_cache and all the URL and cookie exclusions to prevent caching logged-in member data. Centmin Mod 123.09beta01's centmin.sh menu option 22 WordPress installer has PHP-FPM fastcgi_cache support in private development testing right now; see Wordpress - Differences between Wordpress regular install vs centmin.sh menu option 22 install.
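    To illustrate what "URL and cookie exclusions" means in practice, here is the bypass decision written as a small shell function. In a real setup these rules live in the nginx config; the specific URIs and cookie prefixes below are common WordPress examples chosen for this sketch, not the rules Centmin Mod's installer uses:

```shell
#!/bin/sh
# Sketch of the exclusion logic a WordPress fastcgi_cache setup needs.
# In production these rules belong in nginx config; this shell version only
# makes the decision easy to read and test. URI and cookie patterns are
# illustrative assumptions, not Centmin Mod's actual rule set.

# Usage: should_bypass_cache URI "COOKIE HEADER"
# Returns 0 (bypass the cache) or 1 (safe to serve/store a cached copy).
should_bypass_cache() {
  uri=$1 cookies=$2
  case "$uri" in
    # admin, login and XML-RPC endpoints, plus post previews
    /wp-admin/*|/wp-login.php*|/xmlrpc.php*|*preview=true*) return 0 ;;
  esac
  case "$cookies" in
    # WordPress logged-in session, password-protected post, commenter cookies
    *wordpress_logged_in_*|*wp-postpass_*|*comment_author_*) return 0 ;;
  esac
  return 1
}
```

    A crawler such as Googlebot sends no login cookie, so should_bypass_cache /sitemap_index.xml '' returns 1 and the sitemap may be served from cache, while a logged-in editor's request for any page bypasses it.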

    I demo'd the WordPress installer using PHP-FPM fastcgi_cache caching at https://blog.centminmod.com/2019/07/15/122/how-to-install-wordpress-on-centmin-mod-lemp-stack-guide/ when creating my blog for https://servermanager.guide/. But it's still in private testing, as there are bugs and gotchas, so it's not ready for prime-time support by me.

    From that blog write-up:
    If you're using Cloudflare, you can also cache sitemaps at the Cloudflare CDN level via page rules or Cloudflare Worker custom caching, which is what I currently do on my forums and WordPress site.

    Example:
    Code (Text):
    curl -I https://blog.centminmod.com/sitemap_index.xml
    HTTP/2 200 
    date: Sun, 01 Nov 2020 08:18:04 GMT
    content-type: text/xml; charset=UTF-8
    set-cookie: __cfduid=d9bfc938f6163f5a5efe16dea7020bf751604218684; expires=Tue, 01-Dec-20 08:18:04 GMT; path=/; domain=.centminmod.com; HttpOnly; SameSite=Lax
    cf-ray: 5eb42eda4d03ca57-YUL
    age: 7
    cache-control: public, max-age=120
    expires: Sun, 01 Nov 2020 08:20:04 GMT
    link: <https://blog.centminmod.com/sitemap_index.xml>; rel="canonical"
    strict-transport-security: max-age=31536000; includeSubdomains;
    vary: Accept-Encoding
    cf-cache-status: HIT
    cf-cachetime: 120
    cf-req-country: CA
    cf-request-id: 06247b9c6f0000ca57bd97f000000001
    cf-tls: TLSv1.3
    expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
    permissions-policy: accelerometer=(), camera=(), geolocation=(), gyroscope=(), magnetometer=(), microphone=(), payment=(), usb=()
    pragma: public
    referrer-policy: strict-origin-when-cross-origin
    x-content-type-options: nosniff
    x-frame-options: SAMEORIGIN
    x-powered-by: centminmod
    x-robots-tag: noindex
    x-ua-compatible: IE=edge
    x-xss-protection: 1; mode=block
    server: cloudflare
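    The age and cache-control lines above tell you how much longer this copy stays fresh: max-age=120 minus age 7 leaves about 113 seconds. A sketch that does the same arithmetic from curl -I output, assuming awk is available; it reads only the response's Cache-Control, so an edge TTL set separately by a page rule or Worker may differ:

```shell
#!/bin/sh
# Sketch: seconds of freshness left, computed as Cache-Control max-age
# minus Age (the basic HTTP freshness check, without the finer corrections
# the HTTP caching spec applies).

# Read raw response headers on stdin; print max-age - age in seconds.
# Exits non-zero if there is no Cache-Control max-age.
remaining_ttl() {
  awk '
    { sub(/\r$/, ""); line = tolower($0) }
    line ~ /^age:/ { age = substr(line, 5) + 0 }
    line ~ /^cache-control:/ && match(line, /[ ,:]max-age=[0-9]+/) {
      s = substr(line, RSTART, RLENGTH)
      sub(/.*max-age=/, "", s)
      maxage = s + 0; found = 1
    }
    END { if (!found) exit 1; print maxage - age }'
}

# Usage: sh ttl-left.sh https://blog.centminmod.com/sitemap_index.xml
if [ "$#" -gt 0 ]; then
  curl -sI "$1" | remaining_ttl
fi
```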
    
     
  16. Chuong Luong

    Chuong Luong Member

    31
    0
    6
    Aug 8, 2019
    Ratings:
    +2
    Local Time:
    7:28 PM
    So caching the sitemap on my VPS is a no-go for me, right? I am caching the sitemap on Cloudflare with page rules, but as you mentioned, not all Cloudflare edges fetch my sitemap. Is using Cloudflare Workers any different? Could you guide me?
     
  17. eva2000

    eva2000 Administrator Staff Member

    45,633
    10,356
    113
    May 24, 2014
    Brisbane, Australia
    Ratings:
    +16,070
    Local Time:
    10:28 PM
    Nginx 1.19.x
    MariaDB 5.5/10.x
    A Cloudflare Worker won't be any different from Cloudflare page rules. It's just that with a CF Worker I can set the minimum cache TTL below 30 mins, i.e. 120 seconds, which page rules can't do: the CF Edge Cache TTL minimum can't be set that low unless you're on Cloudflare Enterprise, which can go down to 1 second if needed. The CF Free minimum is 2hrs, Pro is 1hr and Business is 30 mins.

    I have Cloudflare free, pro, business and enterprise plan accounts. But with CF Worker I can do same caching regardless of CF plan type.

    On Cloudflare Enterprise Edge Cache TTL can be set as low as 1 second
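    The plan limits above amount to a simple lookup. A sketch encoding the minimums exactly as stated here (Nov 2020 values, which Cloudflare may since have changed) to check whether a page rule can use a given Edge Cache TTL:

```shell
#!/bin/sh
# Sketch: can a Cloudflare page rule set the Edge Cache TTL you want?
# Minimums per plan as stated in this thread (Nov 2020): Free 2h, Pro 1h,
# Business 30min, Enterprise 1s. These may have changed since.

# Print the minimum page-rule Edge Cache TTL (seconds) for a plan name.
min_edge_ttl() {
  case $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') in
    free) echo 7200 ;;
    pro) echo 3600 ;;
    business) echo 1800 ;;
    enterprise) echo 1 ;;
    *) return 1 ;;                      # unknown plan name
  esac
}

# Succeeds if a page rule on plan $1 can set an Edge Cache TTL of $2 seconds.
page_rule_allows_ttl() {
  min=$(min_edge_ttl "$1") || return 1
  [ "$2" -ge "$min" ]
}

# Usage: sh edge-ttl.sh pro 120
if [ "$#" -eq 2 ]; then
  if page_rule_allows_ttl "$1" "$2"; then
    echo "OK: a $1 plan page rule can use an Edge Cache TTL of ${2}s"
  else
    echo "Below the page rule minimum for plan '$1' (or unknown plan)"
  fi
fi
```

    For example, a Pro plan page rule can't use 120 seconds, which is the gap eva2000 fills with Worker custom caching.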

     
    Last edited: Nov 1, 2020
  18. Chuong Luong

    Chuong Luong Member

    31
    0
    6
    Aug 8, 2019
    Ratings:
    +2
    Local Time:
    7:28 PM
    Yep, I have that exact setup for the Cloudflare page rule (Cache Everything & Edge Cache TTL of 1 hour). However, now I want to cache the sitemap on the VPS, so that no matter which Cloudflare edge fetches it, the sitemap will still return faster.