API is down
Resolved
Nov 07 at 06:11pm PST
Misconfiguration on networking cluster. Resolved now.
Here's what happened:
Summary
On November 7, 2024, from 5:59 PM to 6:10 PM PT, our API service experienced an outage due to an unintended configuration change. During this period, new API calls were unable to initiate, though existing connections remained largely unaffected.
Impact
- Duration: 11 minutes
- Service returned 521 errors for new inbound API calls
- Existing API calls remained stable
- Service was fully restored at 6:10 PM PT
Root Cause
The incident occurred when a configuration intended for our staging environment was accidentally applied to production during a routine debugging session. This resulted in the deletion of a critical API gateway configuration.
Timeline
- 5:59 PM PT - Accidental deletion of production configuration during staging environment debugging
- 6:00 PM PT - Monitoring systems detected service degradation
- 6:08 PM PT - Engineering team identified root cause
- 6:09 PM PT - Fix deployed (configuration restored)
- 6:10 PM PT - Full service recovery confirmed
Changes we've implemented
- Changing namespace to include cluster name.
networking
>networking-staging
andnetworking-production
. This forces you to specify the environment while running kubectl commands. - Preventing deletion of resources that would never be expected to be deleted using Kubernetes deletion webhook.
If working on realtime distributed systems excites you, consider applying: https://jobs.ashbyhq.com/vapi/295f5269-1bb5-4740-81fa-9716adc32ad5
Affected services
Vapi API
Created
Nov 07 at 06:09pm PST
API is down. We're investigating. Updates to follow.
Affected services
Vapi API