FINAL UPDATE – Post-Mortem Released: Cloudflare has released the detailed post-mortem for the November 18 event. The outage was caused by an internal software error triggered by a database permission change, not a cyberattack. Below is the technical breakdown of exactly what went wrong.
TL;DR – The Summary
- Start Time: 11:20 UTC – Significant traffic delivery failures began immediately following a database update.
- The Root Cause: A permission change to a ClickHouse database caused a “feature file” (used for Bot Management) to double in size due to duplicate rows (see the sketch after this list).
- The Failure: The file grew beyond a hard-coded limit (200 features) in the new “FL2” proxy engine, causing the Rust-based code to crash (panic).
- Resolution: 17:06 UTC – All systems fully restored (main traffic recovered by 14:30 UTC).
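
The post-mortem’s actual query and schema are not reproduced here, so the Rust sketch below is purely illustrative (all names and counts are assumptions). It only demonstrates the mechanism in the root-cause bullet: a generator that emits one feature per metadata row produces a file twice as large once every row starts coming back twice.

```rust
// Hypothetical sketch of how duplicate metadata rows double a generated feature file.
// Table/column names and counts are illustrative, not Cloudflare's actual schema.

#[derive(Clone, Debug)]
struct ColumnRow {
    table: String,
    column: String,
}

/// Naive generator: one feature per metadata row, trusting whatever the query returned.
fn build_feature_file(rows: &[ColumnRow]) -> Vec<String> {
    rows.iter()
        .map(|r| format!("{}.{}", r.table, r.column))
        .collect()
}

fn main() {
    // Before the permission change: each column is returned once (count is illustrative).
    let before: Vec<ColumnRow> = (0..120)
        .map(|i| ColumnRow {
            table: "bot_features".into(),
            column: format!("f{i}"),
        })
        .collect();

    // After the change: the same metadata becomes visible a second time, so the
    // query returns every row twice and the generated file doubles in size.
    let mut after = before.clone();
    after.extend(before.clone());

    println!("features before: {}", build_feature_file(&before).len()); // 120
    println!("features after:  {}", build_feature_file(&after).len());  // 240 -- past a 200-feature limit
}
```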
The Technical Details: A “Panic” in the Proxy
The outage was a classic “cascading failure” scenario. Here is the simplified chain of events from the report:
- The Trigger (11:05 UTC): Engineers applied a permission change to a ClickHouse database cluster to improve security. This inadvertently caused a query to return duplicate rows.
- The Bloat: This bad data flowed into a configuration file used by the Bot Management system, causing it to exceed its expected size.
- The Crash: Cloudflare’s proxy software (specifically the FL2 engine written in Rust) had a memory preallocation limit of 200 features. When the bloated file exceeded this limit, the code triggered a panic (specifically, “called Result::unwrap() on an Err value”), causing the service to fail with HTTP 500 errors. A minimal sketch of this failure pattern follows this list.
- The Confusion: To make matters worse, Cloudflare’s external Status Page also went down (returning 504 Gateway Timeouts) due to a coincidence, leading engineers to initially suspect a massive coordinated cyberattack.
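
For readers unfamiliar with Rust panics, here is a minimal, self-contained sketch of that failure pattern. The names (load_features, FeatureConfig, MAX_FEATURES) are assumptions for illustration, not Cloudflare’s FL2 code; only the 200-feature limit and the unwrap-on-error panic message come from the report.

```rust
// Illustrative Rust sketch -- names and structure are hypothetical, not FL2 source.
// It reproduces the failure pattern: a hard limit on feature count, an error when a
// bloated file exceeds it, and a caller that unwraps the error and panics.

const MAX_FEATURES: usize = 200; // hard preallocation limit cited in the post-mortem

struct FeatureConfig {
    features: Vec<String>,
}

#[derive(Debug)]
enum ConfigError {
    TooManyFeatures { got: usize, limit: usize },
}

/// Parse a newline-delimited feature file, rejecting anything over the limit.
fn load_features(raw: &str) -> Result<FeatureConfig, ConfigError> {
    let features: Vec<String> = raw.lines().map(|l| l.to_string()).collect();
    if features.len() > MAX_FEATURES {
        // On November 18 the duplicated rows pushed the real file past this branch.
        return Err(ConfigError::TooManyFeatures {
            got: features.len(),
            limit: MAX_FEATURES,
        });
    }
    Ok(FeatureConfig { features })
}

fn main() {
    // Simulate a feature file bloated past the limit by duplicate rows.
    let bloated: String = (0..240).map(|i| format!("feature_{i}\n")).collect();

    // The fatal pattern: assuming the load can never fail. This panics with
    // "called `Result::unwrap()` on an `Err` value" -- the message quoted in the
    // post-mortem -- and in the proxy that surfaced to users as HTTP 500 errors.
    let config = load_features(&bloated).unwrap();
    println!("loaded {} features", config.features.len());
}
```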
Official Timeline (UTC)
| Time (UTC) | Status | Event Description |
|---|---|---|
| 17:06 | Resolved | All services restored. Remaining long-tail services were restarted and full operations resumed. |
| 14:30 | Remediating | Main impact resolved. A known-good configuration file was manually deployed and core traffic began flowing normally. |
| 13:37 | Identified | Engineers identified the Bot Management feature file as the trigger and stopped automatic propagation of the bad file. |
| 13:05 | Mitigating | A bypass was implemented for Workers KV and Access to route around the failing proxy engine, reducing error rates. |
| 11:20 | Outage Starts | The network began experiencing significant failures to deliver core traffic. |
| 11:05 | Trigger | Database access control change deployed. |
Final Thoughts
Cloudflare’s CEO Matthew Prince was direct in the post-mortem: “We know we let you down today.” The company has identified the specific code path that failed and is implementing “global kill switches” for features to prevent a single configuration file from taking down the network in the future.
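
The post-mortem does not describe how those kill switches will be implemented, so the following is only a rough sketch of the general idea under assumed names: gate the feature behind a globally controllable flag, and never let a rejected configuration file replace the last known-good one.

```rust
// Hypothetical sketch of a per-feature kill switch plus last-known-good fallback.
// None of these names come from Cloudflare's code; they only illustrate the idea
// of containing a bad configuration file instead of letting it crash the proxy.

use std::sync::atomic::{AtomicBool, Ordering};

static BOT_MANAGEMENT_ENABLED: AtomicBool = AtomicBool::new(true); // global kill switch

struct FeatureConfig {
    features: Vec<String>,
}

/// Validate a candidate feature file; reject anything over the limit.
fn validate(raw: &str, limit: usize) -> Option<FeatureConfig> {
    let features: Vec<String> = raw.lines().map(|l| l.to_string()).collect();
    if features.len() > limit {
        return None;
    }
    Some(FeatureConfig { features })
}

/// Reload the feature file without ever letting a bad file replace a good one.
fn reload(current: FeatureConfig, new_raw: &str) -> FeatureConfig {
    if !BOT_MANAGEMENT_ENABLED.load(Ordering::Relaxed) {
        // Operators flipped the kill switch: skip the feature entirely.
        return current;
    }
    match validate(new_raw, 200) {
        Some(fresh) => fresh,
        None => {
            eprintln!("feature file rejected; keeping last known-good configuration");
            current
        }
    }
}

fn main() {
    let good = FeatureConfig { features: vec!["feature_0".to_string()] };
    let bloated: String = (0..240).map(|i| format!("feature_{i}\n")).collect();
    let cfg = reload(good, &bloated);
    println!("serving with {} features", cfg.features.len()); // still 1: the bad file was contained
}
```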
Read the full technical post-mortem: Cloudflare Blog: 18 November 2025 Outage