LmCast :: Stay tuned in

Cloudflare Blames Outage on Internal Configuration Error

Recorded: Nov. 19, 2025, 5:03 p.m.

Original

Cloudflare Blames Outage on Internal Configuration Error

Initially thought to be a DDoS attack, the incident was actually due to a routine change in permissions that caused widespread software failure.

By Elizabeth Montalbano, Contributing Writer | November 19, 2025

Cloudflare blamed an outage that put major websites and services out of commission for several hours Tuesday on an internal configuration error, highlighting once again the issue of third-party interdependence in the cloud computing ecosystem.

The incident — which began at 11:20 UTC on Tuesday and affected sites such as X, Uber, Canva, and ChatGPT, among others — was initially thought to be a distributed denial-of-service (DDoS) attack, according to a blog post by Cloudflare founder and CEO Matthew Prince that broke down the technical aspects of the outage. However, the company discovered the issue was "a change to one of our database systems' permissions, which caused the database to output multiple entries into a 'feature file' used by our Bot Management system," Prince wrote. As a result of this change, the feature file doubled in size and was then propagated to all the machines in Cloudflare's network; the software running on those machines reads the file to keep the Bot Management system up to date with constantly shifting threats.
However, there is a limit on the size of feature files, which the new file surpassed, according to Prince. "That caused the software to fail," which in turn led websites running on Cloudflare to deliver "internal server error" messages to users, he wrote.

Technical Details

Cloudflare first noticed an issue in its network when the volume of 5xx HTTP error status codes began to spike well above baseline and then fluctuate wildly. This indicated that the system was failing because it was loading the incorrect feature file, Prince explained.

What was notable about the system's behavior was that it would repeatedly fail and then recover, which is "very unusual behavior for an internal error," he noted. That is why Cloudflare initially thought it was under DDoS attack.

What administrators eventually realized is that the misconfigured file was being regenerated every five minutes by a query running on a ClickHouse database cluster, which was being gradually updated to improve permissions management. "Bad data was only generated if the query ran on a part of the cluster which had been updated," Prince wrote. "As a result, every five minutes there was a chance of either a good or a bad set of configuration files being generated and rapidly propagated across the network."

This fluctuation made it unclear what was happening until eventually "every ClickHouse node was generating the bad configuration file and the fluctuation stabilized in the failing state," he observed.

Once the problem was pinpointed, Cloudflare resolved it by stopping the generation and propagation of the bad feature file, manually inserting a known-good file into its distribution queue, and then forcing a restart of its core proxy.

Resolution and Apology

Cloudflare had mostly resolved the incident by 14:30 UTC, at which point "core traffic was largely flowing as normal," according to the post; by 17:06 UTC, all systems at Cloudflare were once again functioning as normal.

Prince acknowledged the importance of Cloudflare to the Internet ecosystem and apologized for the incident, declaring that "any outage of any of our systems is unacceptable." The company serves as a content delivery network providing security and other services for about 20 percent of all websites on the Internet.

"That there was a period of time where our network was not able to route traffic is deeply painful to every member of our team," Prince wrote. "We know we let you down today."

Indeed, the company has not had an outage that caused the majority of core traffic to stop flowing through its network since 2019, though it has had minor incidents in the period between.

Outage Highlights Business Continuity

Unfortunately, network outages that cause downtime on critical business websites are still a fairly common occurrence. Last year a buggy CrowdStrike update knocked various systems offline, including payment and airline reservation systems, costing businesses an estimated $5.4 billion and spurring lawsuits against the company for lost revenue and other downstream effects. And on Oct. 20, AWS suffered a major outage, stemming from a DNS issue, that affected cloud service customers for much of the day.
These incidents once again call into question the vulnerability created by organizations' reliance on the Internet running smoothly and without a hitch for their business survival. Indeed, in an era when artificial intelligence (AI), quantum computing, and other advanced technologies are on the rise, the infrastructure that provides the networks powering those technologies is as critical as the electrical grid or water supply.

While the incident was not the result of a cyberattack, it shows the fragility of the Internet ecosystem and demonstrates the need for organizations to understand where interdependence exists. It also emphasizes the continued need for organizations to have business continuity and disaster recovery plans in place to provide backup for any third-party issues that could disrupt their websites, services, or other business activities.

Prince said Cloudflare is already working to harden its networks against future failures. These steps include hardening the ingestion of Cloudflare-generated configuration files in the same way that would be done for user-generated input; enabling more global kill switches for features; eliminating the ability for core dumps or other error reports to overwhelm system resources; and reviewing failure modes for error conditions across all core proxy modules.

Summarized
Cloudflare's recent outage, a significant disruption for numerous prominent websites and services, stemmed from an internal configuration error rather than the malicious cyberattack it was initially suspected to be. The incident, which began at 11:20 UTC on Tuesday, November 18, 2025, highlights critical vulnerabilities within the increasingly complex and interdependent nature of modern cloud computing ecosystems. This summary covers the technical details of the disruption, Cloudflare's immediate response, and the broader implications for businesses reliant on third-party services.

The outage began when a routine change to database permissions within Cloudflare's network caused a database query to emit duplicate entries into a "feature file" used by Cloudflare's Bot Management system. The file doubled in size as a result, exceeding a hard size limit built into the software that consumes it. That software then failed, leading to the widespread delivery of "internal server error" messages to users accessing websites that rely on Cloudflare's services. The company's founder and CEO, Matthew Prince, detailed the technical aspects of the failure, emphasizing the system's unusual behavior: it repeatedly failed and then recovered. That fail-and-recover pattern initially led Cloudflare to suspect a distributed denial-of-service (DDoS) attack.
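
To make the described failure mode concrete, here is a minimal sketch of a loader that enforces a hard cap on feature entries and fails outright when a doubled file exceeds it. This is illustrative only; the cap, the names, and the file representation are assumptions, not details of Cloudflare's actual software.

    # Illustrative sketch only (hypothetical names and limit), showing how a
    # hard cap on feature entries turns a doubled file into an outright failure.
    MAX_FEATURES = 200  # assumed hard limit compiled into the consuming software

    class FeatureFileError(Exception):
        pass

    def load_features(entries):
        if len(entries) > MAX_FEATURES:
            # Duplicate rows from the changed database query double the entry
            # count, trip this check, and the module refuses to run.
            raise FeatureFileError(f"{len(entries)} entries exceeds {MAX_FEATURES}")
        return entries

    normal_file = [f"feature_{i}" for i in range(150)]
    duplicated_file = normal_file * 2   # the "doubled" file from the outage

    load_features(normal_file)          # loads fine
    try:
        load_features(duplicated_file)
    except FeatureFileError as err:
        # In the real incident, an unhandled failure of this kind surfaced to
        # users as HTTP 5xx "internal server error" responses.
        print("bot management module failed:", err)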

The core of the issue lay in how the faulty file was generated. The feature file was regenerated every five minutes by a query running on a ClickHouse database cluster, and that cluster was in the middle of a gradual update intended to improve permissions management. Bad data was produced only when the query happened to run on a portion of the cluster that had already been updated, so every five minutes either a good or a bad configuration file was generated and rapidly propagated across the entire network. The resulting alternation between failure and recovery obscured the cause until every ClickHouse node had been updated, at which point only bad files were produced and the system settled into a stable failing state. Cloudflare's response involved stopping the generation and propagation of the bad file, manually inserting a known-good file into the distribution queue, and then restarting its core proxy. The incident began at 11:20 UTC; core traffic was largely flowing normally again by 14:30 UTC, and all systems were fully restored by 17:06 UTC.
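
As a rough illustration of that fluctuation, the hypothetical simulation below treats each five-minute generation cycle as a weighted coin flip that depends on how much of the cluster has been migrated. The migration schedule, fraction, and names are assumptions for illustration, not details taken from Cloudflare's post-mortem.

    # Hypothetical simulation (not Cloudflare code) of the five-minute cycle:
    # the generating query lands on either a migrated or an unmigrated part of
    # the ClickHouse cluster, so good and bad feature files alternate at random
    # until the migration is complete and only bad files remain.
    import random

    def generate_feature_file(migrated_fraction):
        # Queries against migrated nodes return duplicate rows (a "bad" file);
        # queries against unmigrated nodes still return the expected set.
        return "bad" if random.random() < migrated_fraction else "good"

    for cycle in range(10):                 # each loop stands in for five minutes
        migrated = min(1.0, cycle * 0.15)   # assumed gradual rollout
        result = generate_feature_file(migrated)
        print(f"cycle {cycle}: propagating a {result} feature file network-wide")

Once the assumed migrated fraction reaches 1.0, every cycle yields a bad file, which corresponds to the point where, in Prince's words, the fluctuation "stabilized in the failing state."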

The incident underscores the systemic risks associated with relying on third-party service providers. The immediate impact extended far beyond Cloudflare itself, affecting sites such as X, Uber, Canva, and ChatGPT and demonstrating the interconnectedness of internet infrastructure. Furthermore, the event reinforces the importance of business continuity and disaster recovery planning. Organizations that depend on Cloudflare or similar services must proactively anticipate potential disruptions and establish robust plans to mitigate their impact.

The outage also highlights the critical need for organizations to understand the dependencies within their own IT environments. The reliance on services like Cloudflare has created a situation where a single point of failure can trigger a cascading effect, impacting a vast range of websites and applications. This incident provides a valuable lesson in the necessity of thorough risk assessments, diverse service providers, and strategies for transitioning smoothly to alternative solutions in the event of unforeseen disruptions. As Cloudflare itself noted, it had not suffered an outage that stopped the majority of core traffic from flowing through its network since 2019, a reminder that such incidents are rare but severe when they do occur.

Finally, the Cloudflare outage serves as a stark reminder that the cloud's promise of scalability and efficiency comes with inherent vulnerabilities. As technological advancements accelerate, including AI, quantum computing, and increasingly complex network architectures, the resilience of these systems becomes paramount. Cloudflare's stated follow-up work reflects a proactive approach to mitigating future risks: hardening the ingestion of its own internally generated configuration files as if they were user input, enabling more global kill switches for features, preventing core dumps and other error reports from overwhelming system resources, and reviewing failure modes across its core proxy modules.
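
A hedged sketch of what that hardening direction might look like in practice appears below: validate an internally generated configuration as strictly as user input, honor a global kill switch, and fall back to the last known-good copy instead of failing hard. Every name, limit, and mechanism here is an assumption for illustration rather than a description of Cloudflare's implementation.

    # Hedged sketch of the hardening direction described above. All names,
    # limits, and the kill-switch mechanism are hypothetical.
    MAX_FEATURES = 200
    BOT_MGMT_KILL_SWITCH = False   # operators could flip this network-wide

    def is_valid(entries):
        # Treat the internally generated file like untrusted input.
        return 0 < len(entries) <= MAX_FEATURES and len(set(entries)) == len(entries)

    def apply_config(new_entries, last_known_good):
        if BOT_MGMT_KILL_SWITCH:
            return last_known_good           # feature disabled globally
        if not is_valid(new_entries):
            # Reject the oversized file and keep serving traffic with the
            # previous configuration rather than propagating a failure.
            return last_known_good
        return new_entries

    last_good = [f"feature_{i}" for i in range(150)]
    oversized = last_good * 2                # duplicated entries, over the cap
    print("active entries:", len(apply_config(oversized, last_good)))  # prints 150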