PJFP.com

Pursuit of Joy, Fulfillment, and Purpose

Cloudflare Down November 18, 2025: Massive Global Outage Takes X (Twitter), ChatGPT, Discord, Spotify, League of Legends & Thousands of Websites Offline

FINAL UPDATE – Post-Mortem Released: Cloudflare has released the detailed post-mortem for the November 18 event. The outage was caused by an internal software error triggered by a database permission change, not a cyberattack. Below is the technical breakdown of exactly what went wrong.


TL;DR – The Summary

  • Start Time: 11:20 UTC – Significant traffic delivery failures began roughly 15 minutes after a database permission change was deployed.
  • The Root Cause: A permission change to a ClickHouse database caused a “feature file” (used by the Bot Management system) to double in size due to duplicate rows.
  • The Failure: The file grew beyond a hard-coded limit (200 features) in the new “FL2” proxy engine, causing the Rust-based code to crash (panic).
  • Resolution: 17:06 UTC – All systems fully restored (main traffic had recovered by 14:30 UTC).

The Technical Details: A “Panic” in the Proxy

The outage was a classic “cascading failure” scenario. Here is the simplified chain of events from the report:

  • The Trigger (11:05 UTC): Engineers applied a permission change to a ClickHouse database cluster to improve security. This inadvertently caused a query to return duplicate rows.
  • The Bloat: The bad data flowed into the configuration file used by the Bot Management system, causing it to exceed its expected size.
  • The Crash: Cloudflare’s proxy software (specifically the FL2 engine, written in Rust) preallocates memory for a maximum of 200 features. When the bloated file exceeded that limit, the code triggered a panic (“called Result::unwrap() on an Err value”), causing the service to fail with HTTP 500 errors. A simplified sketch of this failure mode follows this list.
  • The Confusion: To make matters worse, Cloudflare’s external Status Page coincidentally went down at the same time (returning 504 Gateway Timeout errors) for unrelated reasons, leading engineers to initially suspect a massive coordinated cyberattack.
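
To make the failure mode concrete, here is a minimal, self-contained Rust sketch of the same shape of bug. It is illustrative only: the 200-feature limit matches the figure in the post-mortem, but the file format, type names, and function names are hypothetical, not Cloudflare’s actual FL2 code.

```rust
// Illustrative sketch: how a bloated feature file can trip a hard
// preallocation limit and turn into a process-wide panic.
// The limit of 200 matches the post-mortem; everything else is hypothetical.

const MAX_FEATURES: usize = 200;

struct Feature {
    name: String,
}

/// Parse a newline-delimited "feature file" into a fixed-capacity buffer.
/// Returns an error instead of growing past the preallocated limit.
fn load_features(file_contents: &str) -> Result<Vec<Feature>, String> {
    let mut features = Vec::with_capacity(MAX_FEATURES);
    for line in file_contents.lines().filter(|l| !l.trim().is_empty()) {
        if features.len() >= MAX_FEATURES {
            return Err(format!(
                "feature file exceeds preallocated limit of {MAX_FEATURES} entries"
            ));
        }
        features.push(Feature { name: line.trim().to_string() });
    }
    Ok(features)
}

fn main() {
    // A healthy file: 150 distinct features, comfortably under the limit.
    let good_file: String = (0..150).map(|i| format!("feature_{i}\n")).collect();

    // The failure mode: the same rows emitted twice (duplicate query results),
    // doubling the file to 300 entries and blowing past the limit.
    let bloated_file = format!("{good_file}{good_file}");

    let features = load_features(&good_file).unwrap();
    println!("loaded {} features (first: {})", features.len(), features[0].name);

    // Calling .unwrap() on the Err value panics and takes the worker down,
    // which is the same shape of failure described in the post-mortem.
    let features = load_features(&bloated_file).unwrap();
    println!("never reached: {}", features.len());
}
```

The panic message produced by the last step (“called `Result::unwrap()` on an `Err` value”) is the one quoted in the post-mortem; in a long-running proxy, an unhandled panic like this brings down the whole worker rather than degrading a single feature.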

Official Timeline (UTC)

Time (UTC) | Status        | Event Description
17:06      | Resolved      | All services restored. Remaining long-tail services restarted and full operations resumed.
14:30      | Remediating   | Main impact resolved. A known-good configuration file was manually deployed; core traffic began flowing normally.
13:37      | Identified    | Engineers identified the Bot Management feature file as the trigger and stopped automatic propagation of the bad file.
13:05      | Mitigating    | A bypass was implemented for Workers KV and Cloudflare Access to route around the failing proxy engine, reducing error rates.
11:20      | Outage starts | The network began experiencing significant failures to deliver core traffic.
11:05      | Trigger       | The database access-control change was deployed.

Final Thoughts

Cloudflare’s CEO Matthew Prince was direct in the post-mortem: “We know we let you down today.” The company has identified the specific code path that failed and is implementing “global kill switches” for features to prevent a single configuration file from taking down the network in the future.
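
Cloudflare has not published what those kill switches will look like, but the general pattern is well established: gate each feature behind a globally distributable flag, validate any candidate configuration before applying it, and fall back to the last known-good configuration when either check fails. Below is a minimal Rust sketch of that pattern; every name, type, and validation rule is hypothetical, not Cloudflare’s implementation.

```rust
// Illustrative sketch of a "global kill switch" pattern: a single flag can
// stop a freshly generated configuration from being applied, so a bad file
// degrades one feature instead of crashing the proxy. All names hypothetical.

use std::sync::atomic::{AtomicBool, Ordering};

/// Global switch an operator can flip without redeploying code.
static BOT_MANAGEMENT_ENABLED: AtomicBool = AtomicBool::new(true);

const MAX_FEATURES: usize = 200;

#[derive(Clone)]
struct BotConfig {
    features: Vec<String>,
}

/// Validate a candidate config before it replaces the one in use.
fn validate(candidate: &BotConfig) -> Result<(), String> {
    if candidate.features.len() > MAX_FEATURES {
        return Err(format!(
            "candidate has {} features, limit is {MAX_FEATURES}",
            candidate.features.len()
        ));
    }
    Ok(())
}

/// Decide which config to serve traffic with on this refresh cycle.
fn next_config(current: BotConfig, candidate: BotConfig) -> BotConfig {
    // Kill switch off: keep the last-known-good config and skip the feature.
    if !BOT_MANAGEMENT_ENABLED.load(Ordering::Relaxed) {
        eprintln!("bot management disabled by kill switch; keeping current config");
        return current;
    }
    // Bad candidate: log and keep serving the last-known-good config
    // instead of panicking.
    match validate(&candidate) {
        Ok(()) => candidate,
        Err(e) => {
            eprintln!("rejected candidate config: {e}");
            current
        }
    }
}

fn main() {
    let current = BotConfig { features: (0..150).map(|i| format!("feature_{i}")).collect() };
    let bloated = BotConfig { features: (0..300).map(|i| format!("feature_{i}")).collect() };

    // The bloated candidate is rejected; traffic keeps flowing on the old config.
    let active = next_config(current.clone(), bloated);
    println!("serving with {} features", active.features.len());

    // An operator flips the global switch: even a valid candidate is skipped.
    BOT_MANAGEMENT_ENABLED.store(false, Ordering::Relaxed);
    let active = next_config(active, current);
    println!("serving with {} features", active.features.len());
}
```

The point of this kind of design is blast-radius control: a malformed feature file temporarily degrades Bot Management instead of taking down the proxy that carries all customer traffic.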

Read the full technical post-mortem: Cloudflare Blog: 18 November 2025 Outage