Back to overview
Downtime

API is down

Nov 07 at 06:09pm PST
Affected services
Vapi API

Resolved
Nov 07 at 06:11pm PST

Misconfiguration on networking cluster. Resolved now.

Here's what happened:

Summary

On November 7, 2024, from 5:59 PM to 6:10 PM PT, our API service experienced an outage due to an unintended configuration change. During this period, new API calls were unable to initiate, though existing connections remained largely unaffected.

Impact

  • Duration: 11 minutes
  • Service returned 521 errors for new inbound API calls
  • Existing API calls remained stable
  • Service was fully restored at 6:10 PM PT

Root Cause

The incident occurred when a configuration intended for our staging environment was accidentally applied to production during a routine debugging session. This resulted in the deletion of a critical API gateway configuration.

Timeline

  • 5:59 PM PT - Accidental deletion of production configuration during staging environment debugging
  • 6:00 PM PT - Monitoring systems detected service degradation
  • 6:08 PM PT - Engineering team identified root cause
  • 6:09 PM PT - Fix deployed (configuration restored)
  • 6:10 PM PT - Full service recovery confirmed

Changes we've implemented

  1. Changing namespace to include cluster name. networking > networking-staging and networking-production. This forces you to specify the environment while running kubectl commands.
  2. Preventing deletion of resources that would never be expected to be deleted using Kubernetes deletion webhook.

If working on realtime distributed systems excites you, consider applying: https://jobs.ashbyhq.com/vapi/295f5269-1bb5-4740-81fa-9716adc32ad5

Created
Nov 07 at 06:09pm PST

API is down. We're investigating. Updates to follow.