Previous incidents
API is degraded
Resolved Jan 30 at 03:44am PST
TL;DR
The API experienced intermittent downtime after the database ran out of memory, which exhausted database connections and caused subsequent call failures. A forced deployment using direct database connections, along with capacity adjustments, restored service.
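As a rough illustration of the capacity-adjustment side of the fix, here is a minimal sketch of a bounded API-side connection pool, assuming a Python service using psycopg2; the DSN, pool sizes, and helper name are illustrative, not our actual configuration.

```python
# Hypothetical sketch: cap how many database connections each API pod can hold
# open, so a traffic spike cannot exhaust the database. Values are illustrative.
from psycopg2 import pool

DB_DSN = "postgresql://api:secret@db.internal:5432/app"  # placeholder DSN

# maxconn bounds the connections this pod can open at once.
db_pool = pool.ThreadedConnectionPool(minconn=2, maxconn=20, dsn=DB_DSN)

def run_query(sql, params=None):
    """Borrow a connection, run one query, and always return the connection."""
    conn = db_pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()
    finally:
        db_pool.putconn(conn)
```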
Timeline
2:09AM: Alerts triggered for API unavailability (503 errors) and frequent pod crashes.
2:40AM: A switch to a backup deployment showed temporary stability, but pods continued to restart and out-of-memory errors began appearing.
3:27AM...
API is down
Resolved Jan 29 at 09:24am PST
TL;DR
A failed deployment of Supabase's connection pooler, Supavisor, in one region caused all database connections to fail. Since API pods rely on a successful database health check at startup, none of them could start. The workaround was to bypass the pooler and connect directly to the database, which restored service.
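For illustration, here is a minimal sketch of that startup health check with a pooler-to-direct fallback, assuming psycopg2; both DSNs, the timeout, and the function names are placeholders rather than our real endpoints or code.

```python
# Hypothetical startup check: prefer the pooler, fall back to a direct
# connection if the pooler is down. DSNs are placeholders.
import psycopg2

POOLER_DSN = "postgresql://app@pooler.example.com:6543/postgres"  # via Supavisor
DIRECT_DSN = "postgresql://app@db.example.com:5432/postgres"      # direct to Postgres

def database_healthy(dsn, timeout=3):
    """Return True if we can open a connection and run a trivial query."""
    try:
        conn = psycopg2.connect(dsn, connect_timeout=timeout)
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT 1")
        finally:
            conn.close()
        return True
    except psycopg2.OperationalError:
        return False

def pick_dsn():
    """Choose which DSN the pod should start with, or refuse to start."""
    if database_healthy(POOLER_DSN):
        return POOLER_DSN
    if database_healthy(DIRECT_DSN):
        return DIRECT_DSN
    raise RuntimeError("database unreachable; refusing to start")
```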
Timeline
8:08am PST, Jan 29: Monitoring detects Postgres errors.
8:13am: The provider’s status page reports a failed connection pooler deployment. (Due to subscri...
Updates to DB are failing
Resolved Jan 21 at 05:23am PST
TL;DR
A configuration error caused the production database to switch to read-only mode, blocking write operations and eventually leading to an API outage. Restarting the database restored service.
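For context, the read-only state is visible from any ordinary session, so it can be probed directly. Below is a minimal sketch of the kind of check that would have flagged the flip sooner, assuming psycopg2; the DSN and function name are placeholders.

```python
# Hypothetical monitoring probe: alert if the database, as seen through the
# pooler, reports read-only defaults. DSN and wiring to alerting are omitted.
import psycopg2

DSN = "postgresql://monitor@pooler.example.com:6543/postgres"

def writes_enabled(dsn=DSN):
    """Return False if the session defaults to read-only or the DB is a standby."""
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW default_transaction_read_only")
            read_only_default = cur.fetchone()[0] == "on"
            cur.execute("SELECT pg_is_in_recovery()")
            in_recovery = cur.fetchone()[0]
        return not (read_only_default or in_recovery)
    finally:
        conn.close()
```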
Timeline
5:03:04am: A SQL client connected to the production database via the connection pooler, inadvertently setting the database to read-only.
5:05am: Write operations began failing.
5:18am: The API went down due to accumulated errors.
~5:23am: The team initiated a database restart.
5:...
Calls not connecting for `weekly` channel
Resolved Jan 13 at 08:49am PST
TL;DR: The scaler failed and we didn't have enough workers
Root Cause
During a weekly deployment, Redis IP addresses changed. This prevented our scaling system from finding the queue, leaving us stuck at a fixed number of workers instead of scaling up as needed. We resolved the issue by temporarily moving traffic to our daily environment.
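As an illustration of the fix direction, here is a minimal sketch of a queue-depth scaling check that connects to Redis by hostname on every cycle instead of caching an IP; the host, queue key, thresholds, and function name are all illustrative.

```python
# Hypothetical autoscaler check: look the queue up by hostname each cycle so a
# Redis IP change after a deploy does not strand us at a fixed worker count.
import redis

REDIS_HOST = "redis.weekly.internal"   # resolved fresh on each connect
QUEUE_KEY = "calls:pending"
CALLS_PER_WORKER = 50

def desired_workers(current_workers, min_workers=4, max_workers=64):
    """Derive a worker count from queue depth; hold steady if Redis is unreachable."""
    try:
        client = redis.Redis(host=REDIS_HOST, port=6379, socket_timeout=2)
        backlog = client.llen(QUEUE_KEY)
    except redis.exceptions.ConnectionError:
        # If Redis cannot be reached, don't scale blindly in either direction.
        return current_workers
    return max(min_workers, min(max_workers, backlog // CALLS_PER_WORKER + 1))
```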
Timeline
Jan 11, 5:12 PM: Deploy started
Jan 13, 6:00 AM: Calls started failing due to scaling issues
Jan 13, 8:45 AM: Resolved by moving traffic to daily
Ja...
OpenAI API is degraded
Resolved Dec 11 at 08:00pm PST
Resolved: https://status.openai.com/