For about eleven minutes, a small number of users saw someone else’s error page. This is what happened and how we caught it.
The unkeyed header
Our CDN was configured to vary cache keys on path and query string, but not on an internal debug header one client library set on retries. A 500 response generated under that header got cached and served to everyone hitting the same path afterward.
Why it was hard to see
The bug only manifested for requests carrying that specific header, which almost nothing in our own testing sent — it came from one third-party SDK. Synthetic monitoring never tripped it.
What changed
We now require every cache-control header change to list the full vary set explicitly in the PR description, and added a canary check that replays real traffic samples against cache config changes before they roll out fleet-wide.