
Security Incident report on memory leak caused by Cloudflare parser bug

Discussion in 'All Internet & Web Performance News' started by pamamolf, Feb 24, 2017.

  1. pamamolf

    pamamolf Premium Member

    Hello :)

    It seems to be a security day today :)


    Last Friday, Tavis Ormandy from Google’s Project Zero contacted Cloudflare to report a security problem with our edge servers. He was seeing corrupted web pages being returned by some HTTP requests run through Cloudflare.

    It turned out that in some unusual circumstances, which I’ll detail below, our edge servers were running past the end of a buffer and returning memory that contained private information such as HTTP cookies, authentication tokens, HTTP POST bodies, and other sensitive data. And some of that data had been cached by search engines.

    For the avoidance of doubt, Cloudflare customer SSL private keys were not leaked. Cloudflare has always terminated SSL connections through an isolated instance of NGINX that was not affected by this bug.

    We quickly identified the problem and turned off three minor Cloudflare features (email obfuscation, Server-side Excludes and Automatic HTTPS Rewrites) that were all using the same HTML parser chain that was causing the leakage. At that point it was no longer possible for memory to be returned in an HTTP response.


    Because of the seriousness of such a bug, a cross-functional team from software engineering, infosec and operations formed in San Francisco and London to fully understand the underlying cause, to understand the effect of the memory leakage, and to work with Google and other search engines to remove any cached HTTP responses.

    Having a global team meant that, at 12 hour intervals, work was handed over between offices enabling staff to work on the problem 24 hours a day. The team has worked continuously to ensure that this bug and its consequences are fully dealt with. One of the advantages of being a service is that bugs can go from reported to fixed in minutes to hours instead of months. The industry standard time allowed to deploy a fix for a bug like this is usually three months; we were completely finished globally in under 7 hours with an initial mitigation in 47 minutes.

    The bug was serious because the leaked memory could contain private information and because it had been cached by search engines. We are disclosing this problem now as we are satisfied that search engine caches have now been cleared of sensitive information. We have also not discovered any evidence of malicious exploits of the bug or other reports of its existence.

    The greatest period of impact was between February 13 and February 18, with around 1 in every 3,300,000 HTTP requests through Cloudflare potentially resulting in memory leakage (that’s about 0.00003% of requests).

    We are grateful that it was found by one of the world’s top security research teams and reported to us.

    This blog post is rather long but, as is our tradition, we prefer to be open and technically detailed about problems that occur with our service.

    Parsing and modifying HTML on the fly
    Many of Cloudflare’s services rely on parsing and modifying HTML pages as they pass through our edge servers. For example, we can insert the Google Analytics tag, safely rewrite http:// links to https://, exclude parts of a page from bad bots, obfuscate email addresses, enable AMP, and more by modifying the HTML of a page.

    To modify the page, we need to read and parse the HTML to find elements that need changing. Since the very early days of Cloudflare, we’ve used a parser written using Ragel. A single .rl file contains an HTML parser used for all the on-the-fly HTML modifications that Cloudflare performs.

    About a year ago we decided that the Ragel parser had become too complex to maintain and we started to write a new parser, named cf-html, to replace it. This streaming parser works correctly with HTML5 and is much, much faster and easier to maintain.

    We first used this new parser for the Automatic HTTPS Rewrites feature and have been slowly migrating functionality that uses the old Ragel parser to cf-html.

    Both cf-html and the old Ragel parser are implemented as NGINX modules compiled into our NGINX builds. These NGINX filter modules parse buffers (blocks of memory) containing HTML responses, make modifications as necessary, and pass the buffers onto the next filter.
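
    As a rough illustration of that filter-module pattern, here is a minimal sketch (not Cloudflare’s actual code; the names are made up and the ngx_module_t boilerplate is omitted): an NGINX body filter receives a chain of buffers, may rewrite them, and hands the chain to the next filter.

    /* Hypothetical NGINX body filter module sketch. Only the filter
     * function and its registration into the filter chain are shown. */
    #include <ngx_config.h>
    #include <ngx_core.h>
    #include <ngx_http.h>

    static ngx_http_output_body_filter_pt  ngx_http_next_body_filter;

    static ngx_int_t
    ngx_http_example_body_filter(ngx_http_request_t *r, ngx_chain_t *in)
    {
        ngx_chain_t  *cl;

        for (cl = in; cl != NULL; cl = cl->next) {
            /* An HTML-rewriting filter would feed the bytes between
             * cl->buf->pos and cl->buf->last to its parser here and
             * modify them as needed before passing the chain on. */
        }

        /* Hand the (possibly modified) chain to the next filter. */
        return ngx_http_next_body_filter(r, in);
    }

    static ngx_int_t
    ngx_http_example_filter_init(ngx_conf_t *cf)
    {
        /* Splice this filter into the body filter chain at startup. */
        ngx_http_next_body_filter = ngx_http_top_body_filter;
        ngx_http_top_body_filter  = ngx_http_example_body_filter;

        return NGX_OK;
    }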

    It turned out that the underlying bug that caused the memory leak had been present in our Ragel-based parser for many years but no memory was leaked because of the way the internal NGINX buffers were used. Introducing cf-html subtly changed the buffering which enabled the leakage even though there were no problems in cf-html itself.
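
    The excerpt above does not spell out the root cause, but the class of bug is easy to picture: a scanner whose end-of-buffer test checks for exact equality can step over the end pointer and then keep emitting whatever memory happens to sit next to the buffer. A deliberately simplified, hypothetical sketch (not Cloudflare’s code):

    /* Demo of the general class of bug: '==' instead of '<' as the
     * end-of-input test. If the cursor ever advances more than one byte
     * near the end of the buffer, it overshoots pe and keeps reading. */
    #include <stdio.h>
    #include <string.h>

    /* 'hard_end' only bounds this demo so it terminates; a real overrun
     * keeps going until the process crashes or hits unmapped memory. */
    static void scan(const char *p, const char *pe, const char *hard_end)
    {
        while (p != pe && p < hard_end) {   /* BUG: '!=' should be '<' */
            if (*p == '<') {
                p += 2;                     /* consume "<x" in one step; at
                                               the end of input this jumps
                                               past pe */
                continue;
            }
            putchar(*p);                    /* once p has overshot pe, this
                                               echoes memory that was never
                                               part of the parsed HTML */
            p++;
        }
        putchar('\n');
    }

    int main(void)
    {
        /* One region of memory standing in for the heap: the first 6 bytes
         * are the HTML handed to the parser, the rest simulates adjacent
         * data belonging to an unrelated request. */
        char heap_like[64] = {0};
        memcpy(heap_like,     "hello<",               6);
        memcpy(heap_like + 6, "Cookie: session=abcd", 20);

        /* Prints "hello" followed by the "leaked" neighbouring bytes. */
        scan(heap_like, heap_like + 6, heap_like + sizeof(heap_like));
        return 0;
    }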

    Once we knew that the bug was being caused by the activation of cf-html (but before we knew why) we disabled the three features that caused it to be used. Every feature Cloudflare ships has a corresponding feature flag, which we call a ‘global kill’. We activated the Email Obfuscation global kill 47 minutes after receiving details of the problem and the Automatic HTTPS Rewrites global kill 3h05m later. The Email Obfuscation feature had been changed on February 13 and was the primary cause of the leaked memory, thus disabling it quickly stopped almost all memory leaks.
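
    As a hypothetical sketch of the ‘global kill’ idea (Cloudflare’s real mechanism isn’t described here; the names and types below are invented), each feature is gated on a flag that a configuration push can flip off fleet-wide, with no rebuild or redeploy:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* One flag per feature, updated by a configuration/control-plane push. */
    static atomic_bool email_obfuscation_enabled = true;

    static bool email_obfuscation_active(void)
    {
        return atomic_load(&email_obfuscation_enabled);
    }

    static void email_obfuscation_global_kill(void)
    {
        /* Takes effect on the next request the worker handles. */
        atomic_store(&email_obfuscation_enabled, false);
    }

    int main(void)
    {
        printf("active: %d\n", email_obfuscation_active());  /* 1 */
        email_obfuscation_global_kill();
        printf("active: %d\n", email_obfuscation_active());  /* 0 */
        return 0;
    }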

    Within a few seconds, those features were disabled worldwide. We confirmed we were not seeing memory leakage via test URIs and had Google double check that they saw the same thing.

    We then discovered that a third feature, Server-Side Excludes, was also vulnerable and did not have a global kill switch (it was so old it preceded the implementation of global kills). We implemented a global kill for Server-Side Excludes and deployed a patch to our fleet worldwide. From realizing Server-Side Excludes were a problem to deploying a patch took roughly three hours. However, Server-Side Excludes are rarely used and only activated for malicious IP addresses.


    More info at:

    Incident report on memory leak caused by Cloudflare parser bug
     
  2. eva2000

    eva2000 Administrator Staff Member

    Ouch, very serious bug! Thanks for the heads up!
     
  3. eva2000

    eva2000 Administrator Staff Member

    More info at 1139 - cloudflare: Cloudflare Reverse Proxies are Dumping Uninitialized Memory - project-zero - Monorail

    more serious than cloudflare states
     
    Last edited: Feb 24, 2017
  4. eva2000

    eva2000 Administrator Staff Member

    Cloudflare Data Leak: How to Secure Your Site

     
  5. eva2000

    eva2000 Administrator Staff Member

    Also GitHub - pirate/sites-using-cloudflare: List of domains using Cloudflare (potentially affected by the CloudBleed HTTPS traffic leak)

     
  6. eva2000

    eva2000 Administrator Staff Member

    http://www.bbc.com/news/technology-39077611

     
  7. eva2000

    eva2000 Administrator Staff Member

    Cloudflare's 'Cloudbleed' Bug Exposes Sensitive Data: Who Is Affected?

    who to believe :)
     
  8. eva2000

    eva2000 Administrator Staff Member

    http://gizmodo.com/everything-you-need-to-know-about-cloudbleed-the-lates-1792710616

     
  9. Sametto Chan

    Sametto Chan Member

    I just changed the API key for my CloudFlare account.
     
  10. eva2000

    eva2000 Administrator Staff Member

    Quantifying the Impact of "Cloudbleed"