
Cloudflare Tunnel Setup Guide - warnings, suggestions, and questions!

Discussion in 'System Administration' started by deltahf, May 31, 2021.

  1. deltahf

    deltahf Premium Member

    423
    186
    43
    Jun 8, 2014
    Ratings:
    +327
    Local Time:
    6:06 AM
    I just switched from Cloudflare Authenticated Origin Pulls to Cloudflare (Argo) Tunnel. Tunnel definitely feels like the future of using Cloudflare with your website and I would encourage everyone to start using it, too. Nevertheless, I have a few bruises and scars to show for it so I thought I would share a few things from my experience. :whistle:

    First of all, thank you as always to @eva2000 for his excellent Cloudflare Argo Tunnel on CentOS 7 Setup Guide. It's an awesome resource and I would not have been able to get it working without that guide.

    Now for my warnings and suggestions. In Step 3 of the setup guide:

    This is really, really important!

    I was editing /root/.cloudflared/config.yml and restarting cloudflared and could not figure out why it wasn't picking up the changes. Once I edited /etc/cloudflared/config.yml everything worked. This seems like a really strange and confusing design choice on Cloudflare's part. I don't understand why they would have two config files, one that is read during install and the other that is read during restarts...?
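A quick way to catch this is to compare the two copies before restarting. Here is a minimal sketch, assuming the two paths mentioned above; the function names are made up for illustration and nothing here is part of cloudflared itself.

```python
import hashlib
from pathlib import Path

# Paths taken from the post above; adjust if your install differs.
USER_CONFIG = Path("/root/.cloudflared/config.yml")
SERVICE_CONFIG = Path("/etc/cloudflared/config.yml")


def file_digest(path):
    """Return the SHA-256 hex digest of a file, or None if it cannot be read."""
    try:
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()
    except OSError:
        return None


def configs_in_sync(user_config, service_config):
    """True only when both files are readable and have identical contents."""
    user_digest = file_digest(user_config)
    service_digest = file_digest(service_config)
    return user_digest is not None and user_digest == service_digest


if __name__ == "__main__":
    if configs_in_sync(USER_CONFIG, SERVICE_CONFIG):
        print("config.yml copies match")
    else:
        print("config.yml copies differ or one is unreadable; "
              f"edit {SERVICE_CONFIG} for the service")
```

Run it as root after editing either file; if the copies differ, the one under /etc/cloudflared is the one the service picked up in my case.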

    I presume that most people will be going through the guide for the first time using "tun." subdomains or non-critical domains, just to get a feel for CF Tunnels before switching it on for our "live" domains, so we will be going back to edit the config.yml files frequently. I know you already mention it in the guide, @eva2000, but it might be worth making a more significant warning out of it. :)

    Next up is an important modification that needs to be made to /usr/local/nginx/conf/cloudflare.conf. This config file exists, of course, to replace or overwrite the Cloudflare IP on the incoming request with that of the real user. Mine looked like this:

    Code (Text):
    include /usr/local/nginx/conf/cloudflare_customips.conf;
    set_real_ip_from 173.245.48.0/20;
    set_real_ip_from 103.21.244.0/22;
    set_real_ip_from 103.22.200.0/22;
    set_real_ip_from 103.31.4.0/22;
    set_real_ip_from 141.101.64.0/18;
    set_real_ip_from 108.162.192.0/18;
    set_real_ip_from 190.93.240.0/20;
    set_real_ip_from 188.114.96.0/20;
    set_real_ip_from 197.234.240.0/22;
    set_real_ip_from 198.41.128.0/17;
    set_real_ip_from 162.158.0.0/15;
    set_real_ip_from 172.64.0.0/13;
    set_real_ip_from 131.0.72.0/22;
    set_real_ip_from 104.16.0.0/13;
    set_real_ip_from 104.24.0.0/14;
    #set_real_ip_from 2400:cb00::/32;
    #set_real_ip_from 2606:4700::/32;
    #set_real_ip_from 2803:f800::/32;
    #set_real_ip_from 2405:b500::/32;
    #set_real_ip_from 2405:8100::/32;
    #set_real_ip_from 2a06:98c0::/29;
    #set_real_ip_from 2c0f:f248::/32;
    real_ip_header CF-Connecting-IP;
    


    This basically means that if the request is coming from one of those listed Cloudflare IP addresses, overwrite it with the value presented in the "CF-Connecting-IP" header.

    The problem is that with Cloudflare Tunnel, it is handling all of the communication between the outside world and Nginx, so Nginx sees all of the traffic coming from 127.0.0.1 and none of those "set_real_ip_from" rules will ever match. I fixed this by adding another "set_real_ip_from 127.0.0.1/0;" line above the final line:

    Code (Text):
    include /usr/local/nginx/conf/cloudflare_customips.conf;
    set_real_ip_from 173.245.48.0/20;
    set_real_ip_from 103.21.244.0/22;
    set_real_ip_from 103.22.200.0/22;
    set_real_ip_from 103.31.4.0/22;
    set_real_ip_from 141.101.64.0/18;
    set_real_ip_from 108.162.192.0/18;
    set_real_ip_from 190.93.240.0/20;
    set_real_ip_from 188.114.96.0/20;
    set_real_ip_from 197.234.240.0/22;
    set_real_ip_from 198.41.128.0/17;
    set_real_ip_from 162.158.0.0/15;
    set_real_ip_from 172.64.0.0/13;
    set_real_ip_from 131.0.72.0/22;
    set_real_ip_from 104.16.0.0/13;
    set_real_ip_from 104.24.0.0/14;
    #set_real_ip_from 2400:cb00::/32;
    #set_real_ip_from 2606:4700::/32;
    #set_real_ip_from 2803:f800::/32;
    #set_real_ip_from 2405:b500::/32;
    #set_real_ip_from 2405:8100::/32;
    #set_real_ip_from 2a06:98c0::/29;
    #set_real_ip_from 2c0f:f248::/32;
    set_real_ip_from 127.0.0.1/0;
    real_ip_header CF-Connecting-IP;
    


    If you don't do this, all of your XenForo users will be reported as coming from 127.0.0.1!
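To make the matching logic concrete, here is a rough Python sketch of the idea behind set_real_ip_from and real_ip_header. It is a simplified model and not how Nginx is implemented; the function name, the shortened range list, and the 203.0.113.7 visitor address are assumptions for illustration.

```python
import ipaddress


def effective_client_ip(remote_addr, cf_connecting_ip, trusted_networks):
    """Mimic the basic idea of set_real_ip_from / real_ip_header.

    The header value is used only when the connecting address falls
    inside one of the trusted networks; otherwise it is ignored.
    """
    peer = ipaddress.ip_address(remote_addr)
    for network in trusted_networks:
        if peer in ipaddress.ip_network(network):
            return cf_connecting_ip or remote_addr
    return remote_addr


# A few of the Cloudflare ranges from the cloudflare.conf shown above.
CLOUDFLARE_ONLY = ["173.245.48.0/20", "103.21.244.0/22", "104.16.0.0/13"]
WITH_LOOPBACK = CLOUDFLARE_ONLY + ["127.0.0.1/32"]

# 203.0.113.7 is a documentation address standing in for a real visitor.
print(effective_client_ip("127.0.0.1", "203.0.113.7", CLOUDFLARE_ONLY))  # 127.0.0.1
print(effective_client_ip("127.0.0.1", "203.0.113.7", WITH_LOOPBACK))    # 203.0.113.7
```

With Tunnel in front, the connecting address is always the loopback address, so the header is only honored once loopback is in the trusted list.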

    Another tip: if you are editing your access_log file format, don't forget that Cloudflare includes the country code of requests in headers as well, so these can easily be included in your access_logs! Just include the $http_cf_ipcountry header in your log. Here is a modified cf_custom4 log format example:

    Code (Text):
    log_format cf_custom4 '$remote_addr $http_cf_ipcountry - $remote_user [$time_local] $request '
                 '"$status" $body_bytes_sent "$http_referer" '
                 '"$http_user_agent" "$http_x_forwarded_for" "$gzip_ratio" "$brotli_ratio"'
                 ' "$connection" "$connection_requests" "$request_time" $http_cf_ray '
                 '$ssl_protocol $ssl_cipher $http_content_length $http_content_encoding $request_length';
    


    This makes it just a bit easier to identify bad actors or suspicious requests when checking your logs.
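As a small illustration of what the extra field makes possible, this sketch counts requests per country code. It assumes the cf_custom4 layout above, where the country code is the second whitespace-separated field; the sample lines are invented and cut short after the byte count.

```python
from collections import Counter


def country_counts(log_lines):
    """Count requests per country code (second field of cf_custom4)."""
    counts = Counter()
    for line in log_lines:
        fields = line.split(None, 2)
        if len(fields) < 2:
            continue  # skip blank or truncated lines
        counts[fields[1]] += 1
    return counts


# Invented sample lines, shortened after the byte count.
sample = [
    '203.0.113.7 US - - [02/Jun/2021:09:14:24 +0000] GET / HTTP/2.0 "200" 5123',
    '198.51.100.9 GB - - [02/Jun/2021:09:14:25 +0000] GET /forums/ HTTP/2.0 "200" 8841',
    '203.0.113.8 US - - [02/Jun/2021:09:14:26 +0000] GET /login HTTP/2.0 "403" 312',
]
print(country_counts(sample).most_common())
```

To use it on a real log, pass an open file handle instead of the sample list.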


    Finally, a question from me: if all of the sites on a server are using CF Tunnel, can we go ahead and close ports 80 and 443 on the server?
     
  2. eva2000

    eva2000 Administrator Staff Member

    46,851
    10,627
    113
    May 24, 2014
    Brisbane, Australia
    Ratings:
    +16,493
    Local Time:
    8:06 PM
    Nginx 1.19.x
    MariaDB 5.5/10.x
    Probably because most guides for Cloudflare Tunnel run cloudflared from the command line rather than as a service, I think. The service install for cloudflared does not seem to be the primary focus of their documentation.

    You can set custom settings in the include file at
    /usr/local/nginx/conf/cloudflare_customips.conf which keeps tools/csfcf.sh cronjob from overwriting your own changes. But strangely I haven't had to do that for Wordpress installs like my Centmin Mod blog which runs with Cloudflare Argo Tunnel too at https://blog.centminmod.com/

    Oh, I noticed that in my /usr/local/nginx/conf/cloudflare.conf I am not using
    Code (Text):
    real_ip_header CF-Connecting-IP;
    

    but
    Code (Text):
    real_ip_header X-Forwarded-For;

    so as long as your Nginx vhost uncomments/enables the include file, then it should work
    Code (Text):
      # uncomment cloudflare.conf include if using cloudflare for
      # server and/or vhost site
      include /usr/local/nginx/conf/cloudflare.conf;
    


    Though it makes sense to add - it should be enough to just add it as 127.0.0.1
    Code (Text):
    set_real_ip_from 127.0.0.1;

    to /usr/local/nginx/conf/cloudflare_customips.conf
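The prefix length matters here. The sketch below uses Python's ipaddress module to show what "127.0.0.1/0" covers compared with a bare "127.0.0.1", assuming the prefix is applied the usual CIDR way. The visitor address is a documentation address used only for illustration.

```python
import ipaddress

# strict=False lets Python mask off the host bits, so the resulting
# network shows what the prefix length really covers.
wide = ipaddress.ip_network("127.0.0.1/0", strict=False)
narrow = ipaddress.ip_network("127.0.0.1")  # bare address means /32

visitor = ipaddress.ip_address("203.0.113.7")  # documentation address

print(wide)                 # 0.0.0.0/0
print(visitor in wide)      # True
print(narrow)               # 127.0.0.1/32
print(visitor in narrow)    # False
```

A /0 prefix covers every IPv4 address, so any host that can reach Nginx directly would be trusted to set the real IP header. The bare address limits that trust to the local cloudflared process.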

    Yup, just remove ports 80 and 443 from the comma-separated TCP_IN and TCP6_IN lists in CSF Firewall's /etc/csf/csf.conf config file (back up the file before editing), and then restart CSF Firewall:
    Code (Text):
    csf -ra

    DO NOT exit the existing SSH session yet. Keep it connected and try visiting your CF Argo Tunnel'd domain in browsers, on mobile devices, etc. Just be aware that this will block port 80 and 443 access for all Nginx vhosts, so every other Nginx vhost needs to be set up for Cloudflare Argo Tunnel too, including the Centmin Mod Nginx main hostname vhost.
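If you would rather script the list edit than do it by hand, here is a minimal sketch that works on a single line of text and does not touch the file. It assumes the setting looks like TCP_IN = "20,21,22,80,443"; the function name and the sample port list are made up for illustration.

```python
import re


def drop_ports(setting_line, ports_to_drop):
    """Return a csf.conf style line with the given ports removed.

    Expects a line shaped like: TCP_IN = "20,21,22,80,443"
    Entries that are not an exact match (for example ranges such as
    30000:35000) are left untouched.
    """
    match = re.match(r'^(\s*\w+\s*=\s*")([^"]*)(".*)$', setting_line, re.DOTALL)
    if match is None:
        return setting_line  # not a quoted list; leave as-is
    prefix, value, suffix = match.groups()
    kept = [p for p in value.split(",") if p.strip() not in ports_to_drop]
    return prefix + ",".join(kept) + suffix


line = 'TCP_IN = "20,21,22,25,53,80,110,143,443,465,587,993,995"'
print(drop_ports(line, {"80", "443"}))
```

Compare the result against your backup before writing anything back, and make sure the SSH port is still in the list.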
     
  3. deltahf

    Hmm. I might have been too quick to praise Tunnel!

    I started getting downtime notifications from UptimeRobot today, but my site loaded fine so I just brushed it off. Then I noticed our real-time traffic report in Google Analytics was about half of what it should be... then the tweets started coming in, asking me how much longer the site would be down? I asked for screenshots from the Twitter users and they were showing a 524 Cloudflare error from the Newark datacenter.

    Yet there were still hundreds of people online in GA Real-Time and the site loaded fine for me and my writer in the UK.

    I am really at a complete loss as to how to troubleshoot this aside from disabling CF Tunnel.

    The site is fine, the server is fine, yet some Cloudflare datacenters (and I don't even have a way to know which ones) are reporting trouble connecting to the server... but all connections are routed through Cloudflare's own network via Tunnel.

    What should I do? At this point I may have to just disable Tunnel.

    EDIT/UPDATE: I fixed this (for now) by restarting the cloudflared service. (Well, technically, "service cloudflared restart" would just hang. I had to "stop" it and then "start" it again to get it working.)

    Upon checking the log at /var/log/cloudflared.log, I could see it was full of errors:

    Code (Text):
    {"level":"error","error":"Unable to reach the origin service. The service may be down or it may not be responding to traffic from cloudflared: context canceled","cfRay":"658b9a9e5c0ef97d-YYZ","ingressRule":"0","originService":"https://www.domain.com:443","time":"2021-06-01T21:41:04Z"}
    {"level":"error","error":"Unable to reach the origin service. The service may be down or it may not be responding to traffic from cloudflared: context canceled","cfRay":"658b9a9ec0da39d2-SEA","ingressRule":"0","originService":"https://www.domain.com:443","time":"2021-06-01T21:41:04Z"}
    {"level":"error","error":"Unable to reach the origin service. The service may be down or it may not be responding to traffic from cloudflared: context canceled","cfRay":"658b9aa19e98595b-IAD","ingressRule":"0","originService":"https://www.domain.com:443","time":"2021-06-01T21:41:05Z"}
    {"level":"error","error":"Unable to reach the origin service. The service may be down or it may not be responding to traffic from cloudflared: context canceled","cfRay":"658b9a9ede47758f-DME","ingressRule":"0","originService":"https://www.domain.com:443","time":"2021-06-01T21:41:05Z"}
    {"level":"error","error":"Unable to reach the origin service. The service may be down or it may not be responding to traffic from cloudflared: context canceled","cfRay":"658b9a9fa9a318e5-FRA","ingressRule":"0","originService":"https://www.domain.com:443","time":"2021-06-01T21:41:05Z"}
    {"level":"error","error":"Unable to reach the origin service. The service may be down or it may not be responding to traffic from cloudflared: context canceled","cfRay":"658b9aa15dfae37a-SEA","ingressRule":"0","originService":"https://www.domain.com:443","time":"2021-06-01T21:41:05Z"}
    


    Of course I have replaced my domain with "domain.com" for this sample.

    These are good clues but I still have no idea why that would happen, or why the site would be completely accessible through some Cloudflare POPs but not from others.
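One way to see which POPs are affected is to tally the error entries by the suffix of the cfRay value. This is a rough sketch, assuming the JSON log format shown above and that the part after the last hyphen in cfRay is the POP code; the sample entries are trimmed copies of the ones in the log.

```python
import json
from collections import Counter


def errors_by_pop(log_lines):
    """Count error-level entries per Cloudflare POP code.

    The POP is taken from the suffix of the cfRay value,
    e.g. "658b9a9e5c0ef97d-YYZ" -> "YYZ".
    """
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        if entry.get("level") != "error" or "cfRay" not in entry:
            continue
        counts[entry["cfRay"].rpartition("-")[2]] += 1
    return counts


# Trimmed copies of entries from the log above.
sample = [
    '{"level":"error","cfRay":"658b9a9e5c0ef97d-YYZ","time":"2021-06-01T21:41:04Z"}',
    '{"level":"error","cfRay":"658b9a9ec0da39d2-SEA","time":"2021-06-01T21:41:04Z"}',
    '{"level":"error","cfRay":"658b9aa15dfae37a-SEA","time":"2021-06-01T21:41:05Z"}',
]
print(errors_by_pop(sample).most_common())
```

Pointing it at the open /var/log/cloudflared.log file gives a per-POP count to compare against visitor reports.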

    EDIT/UPDATE #2: A few hours later and the issue is cropping up again. An error from the SJC Cloudflare POP (the same as those included above) has appeared in the log and I just got another alert from UptimeRobot that the site is down. But it's still accessible for some of the others that previously had connectivity issues. It's almost like this is an issue that sort of "spreads", as more and more Cloudflare POPs are unable to access my server.

    I was theorizing that perhaps some kind of single-IP rate limiting in Nginx or CSF is getting triggered somewhere, but if all the traffic is coming from 127.0.0.1, that would not explain why only some Cloudflare POPs lose connectivity. I'm stumped.

    Oh, OK, this is really good to know.

    I actually changed the real_ip_header from "X-Forwarded-For" to "CF-Connecting-IP" while troubleshooting this (it didn't fix the problem).

    I chose to leave it as "CF-Connecting-IP" because of some advice I read here. Basically, X-Forwarded-For can present a list of multiple IP addresses which might cause some problems, while CF-Connecting-IP will only ever have one. I'm not sure it actually would make a difference, though.
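To show the difference, here is a small sketch of what a list-valued header looks like when parsed. As I understand the Nginx realip module docs, with recursive search off it takes the last address in the header, which is what this hypothetical helper mimics. The addresses are documentation ranges used for illustration.

```python
def last_forwarded_for(header_value):
    """Return the right-most address in an X-Forwarded-For style list."""
    parts = [p.strip() for p in header_value.split(",") if p.strip()]
    return parts[-1] if parts else None


# Documentation addresses standing in for a client and an extra proxy hop.
print(last_forwarded_for("203.0.113.7"))                # 203.0.113.7
print(last_forwarded_for("203.0.113.7, 198.51.100.2"))  # 198.51.100.2
```

With a single hop both headers give the same answer. Once another proxy appends itself to the list, the picked address is no longer the visitor.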
     
    Last edited: Jun 2, 2021
  4. eva2000

    Make sure you did the CSF Firewall steps outlined at https://blog.centminmod.com/2021/02/09/2250/how-to-setup-cloudflare-argo-tunnel-on-centos-7/
    That might be due to a service issue. Does a restart now work without issue? Otherwise, you might want to contact Cloudflare tech support.

    But also check Cloudflare Status to see if any outages are listed for Cloudflared/Tunnels.

    Maybe you are hitting the CF datacenter upgrades listed under Upcoming Datacenter Upgrades.
     
    Last edited: Jun 2, 2021
  5. deltahf

    OK, good advice!

    I did overlook the CSF firewall stuff as I had already whitelisted a bunch of Cloudflare IPs when I first set up the server for Cloudflare years ago, but I shouldn't have. I have just added those rules and restarted csf now.

    "service cloudflared restart" does also work. I will leave everything in place overnight and monitor it closely over the next few days!
     
  6. deltahf

    Unfortunately I woke up to more — and new — errors.

    Along with a few additional "Unable to reach the origin service" errors, I also now see "Lost connection with the edge" errors.

    Code (Text):
    {"level":"info","connIndex":0,"location":"TPA","time":"2021-06-02T05:06:47Z","message":"Connection 84ffb57f-5413-44a4-88d0-4d93f180865f registered"}
    {"level":"info","connIndex":1,"location":"DFW","time":"2021-06-02T05:06:47Z","message":"Connection 82551288-feef-4257-ab21-667c0a5c0020 registered"}
    {"level":"info","connIndex":2,"location":"TPA","time":"2021-06-02T05:06:48Z","message":"Connection c9b9b6ee-93f7-4f97-9339-6654fb38de4c registered"}
    {"level":"info","connIndex":3,"location":"DFW","time":"2021-06-02T05:06:49Z","message":"Connection a496f454-791b-460b-8885-e1e30a670045 registered"}
    {"level":"error","error":"Unable to reach the origin service. The service may be down or it may not be responding to traffic from cloudflared: EOF","cfRay":"658f92413fa9625f-OTP","ingressRule":"0","originService":"https://www.domain.com:443","time":"2021-06-02T09:14:24Z"}
    {"level":"error","error":"Unable to reach the origin service. The service may be down or it may not be responding to traffic from cloudflared: EOF","cfRay":"658f92414d6f627d-OTP","ingressRule":"0","originService":"https://www.domain.com:443","time":"2021-06-02T09:14:24Z"}
    {"level":"error","error":"Unable to reach the origin service. The service may be down or it may not be responding to traffic from cloudflared: EOF","cfRay":"658f9daf6d38a7b0-IST","ingressRule":"0","originService":"https://www.domain.com:443","time":"2021-06-02T09:22:13Z"}
    {"level":"error","error":"Unable to reach the origin service. The service may be down or it may not be responding to traffic from cloudflared: EOF","cfRay":"6590d0dbbb1fa7f8-IST","ingressRule":"0","originService":"https://www.domain.com:443","time":"2021-06-02T12:51:54Z"}
    {"level":"info","connIndex":1,"time":"2021-06-02T14:34:43Z","message":"Lost connection with the edge"}
    {"level":"error","connIndex":1,"error":"connection with edge closed","time":"2021-06-02T14:34:43Z","message":"Serve tunnel error"}
    {"level":"info","connIndex":1,"time":"2021-06-02T14:34:43Z","message":"Retrying connection in up to 1s seconds"}
    {"level":"info","connIndex":1,"time":"2021-06-02T14:34:43Z","message":"Unregistered tunnel connection"}
    {"level":"info","connIndex":1,"location":"DFW","time":"2021-06-02T14:34:43Z","message":"Connection 6f7f3f2b-fd68-4abe-8eb4-e53bbceac633 registered"}
    {"level":"info","connIndex":1,"time":"2021-06-02T14:59:59Z","message":"Lost connection with the edge"}
    {"level":"error","connIndex":1,"error":"connection with edge closed","time":"2021-06-02T14:59:59Z","message":"Serve tunnel error"}
    {"level":"info","connIndex":1,"time":"2021-06-02T14:59:59Z","message":"Unregistered tunnel connection"}
    {"level":"info","connIndex":1,"time":"2021-06-02T14:59:59Z","message":"Retrying connection in up to 1s seconds"}
    {"level":"info","connIndex":1,"location":"DFW","time":"2021-06-02T15:00:00Z","message":"Connection e9c8636c-1c14-44c0-9db9-e68fe885e208 registered"}
    {"level":"error","error":"Unable to reach the origin service. The service may be down or it may not be responding to traffic from cloudflared: EOF","cfRay":"6592393dfe161cd4-BUD","ingressRule":"0","originService":"https://www.domain.com:443","time":"2021-06-02T16:57:56Z"}
    


    Some of these manifested as "525" errors to my visitors.
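To check how long each edge disconnect lasted, this sketch pairs every "Lost connection with the edge" entry with the next "registered" entry for the same connIndex. It assumes the JSON format shown in the log above; the function names are made up, and the sample lines are copied from that log.

```python
import json
from datetime import datetime


def parse_time(value):
    """Parse the UTC timestamps in the log, e.g. 2021-06-02T14:34:43Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def reconnect_gaps(log_lines):
    """Return (connIndex, seconds) for each lost-then-registered pair."""
    lost_at = {}
    gaps = []
    for line in log_lines:
        entry = json.loads(line)
        index = entry.get("connIndex")
        message = entry.get("message", "")
        if message == "Lost connection with the edge":
            lost_at[index] = parse_time(entry["time"])
        elif message.endswith("registered") and index in lost_at:
            delta = parse_time(entry["time"]) - lost_at.pop(index)
            gaps.append((index, int(delta.total_seconds())))
    return gaps


sample = [
    '{"level":"info","connIndex":1,"time":"2021-06-02T14:34:43Z","message":"Lost connection with the edge"}',
    '{"level":"info","connIndex":1,"location":"DFW","time":"2021-06-02T14:34:43Z","message":"Connection 6f7f3f2b-fd68-4abe-8eb4-e53bbceac633 registered"}',
    '{"level":"info","connIndex":1,"time":"2021-06-02T14:59:59Z","message":"Lost connection with the edge"}',
    '{"level":"info","connIndex":1,"location":"DFW","time":"2021-06-02T15:00:00Z","message":"Connection e9c8636c-1c14-44c0-9db9-e68fe885e208 registered"}',
]
print(reconnect_gaps(sample))  # [(1, 0), (1, 1)]
```

For the two disconnects in this log the gaps are 0 and 1 second.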

    This has become an unexpectedly busy week for my site, so CF Tunnel has cost me a lot of traffic. I prefer to use it but I had no choice but to disable it today.
     
  7. eva2000

    Sorry to hear that. It is probably bad timing, as the CF status page lists a few events and datacenter upgrades, so maybe it is related?
     
  8. deltahf

    I'm really not sure. I would hope that Cloudflare's infrastructure would be resilient enough to route around any maintenance or downtime at specific POPs.

    I contacted Cloudflare Support with the details and full logs. They were friendly but unhelpful, and requested a HAR file of the 525 or 524 error. Of course, these are generated by the browser, but I never saw any of the error pages myself and I have no way of knowing which POPs will be affected, nor do I have the ability to connect to specific POPs easily, so that's not something I can easily provide or ask affected visitors to produce. And I really don't see how a map of HTTP requests from the POP to the connecting browser would be useful at all, as the problem is clearly inside my own server or Cloudflare's infrastructure.

    Years ago, when I first tried Cloudflare and long before I became so enthusiastic about it (I'm now a public investor in the company, too), I first gave up on it because of disparate reports of 524 errors (Cloudflare timed out waiting for a response from the origin server) from various users around the world, but never at the same time... Seeking support, I would always hear "oh, your server is probably struggling under load" or "oh, your server is probably losing connectivity", but I knew that was hogwash as I looked at my bare-metal dedicated server with <1.00 load in a top-tier datacenter, blasting out traffic to plenty of other users connecting through other POPs without a hitch... :ROFLMAO:

    It's interesting that now I am getting the exact same problems while experimenting with new Cloudflare technology. I can't help but be suspicious of something in my nginx configuration... the config file is quite old at this point, as I have basically just been patching and editing the same file for probably 10 years or so. But I have looked through it and I don't really see anything that could be causing problems.

    I am going to continue toying with this, running Tunnel on a "tun." subdomain of my main site. I also have a second site, with a newly-generated (last year via Centminmod) vhost.conf, running through the tunnel on the same server. I will be watching the cloudflared.log file to see if it has trouble maintaining its connection to Cloudflare without all the regular traffic running through it.

    I will keep this thread updated with any discoveries.