Degraded

Phone calls are degraded

Nov 11 at 04:58pm PST
Affected services
Vapi API

Resolved
Nov 11 at 05:03pm PST

TL;DR: Our API gateway rejected WebSocket requests

Summary

On November 11, 2024, from 4:22 PM to 5:05 PM PST, our WebSocket-based calls experienced disruption due to a configuration issue in our API gateway. This affected both inbound and outbound phone calls in one of our production clusters.

Impact

  • Duration: 43 minutes
  • Affected services: WebSocket-based phone calls
  • Affected connections received HTTP 404 errors
  • Service was fully restored by routing traffic to our backup cluster

Root Cause

A control plane issue in our API gateway triggered an unnecessary reload of plugin configurations. The reload failed because an authentication token had expired, which left the WebSocket routing system in a degraded state.
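To illustrate the failure mode, here is a minimal sketch of one common defensive pattern: keep the last known good configuration when a reload fails. This is not our gateway's actual code. The `Gateway` class, the `ReloadError` exception, the `fetch_with_expired_token` helper, and the `/call` route are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


class ReloadError(Exception):
    """Raised when a new plugin configuration cannot be fetched (hypothetical)."""


@dataclass
class Gateway:
    # Hypothetical in-memory view of the active plugin configuration.
    active_config: Dict[str, str]

    def reload(self, fetch_config: Callable[[], Dict[str, str]]) -> bool:
        """Try to load a new config; keep the current one on failure.

        Returns True if the new config was applied, False otherwise.
        """
        try:
            new_config = fetch_config()
        except ReloadError:
            # Fail safe: routing keeps using the last known good config.
            return False
        self.active_config = new_config
        return True


def fetch_with_expired_token() -> Dict[str, str]:
    # Simulates the control plane rejecting an expired credential.
    raise ReloadError("authentication token expired")


gateway = Gateway(active_config={"websocket_route": "/call"})
applied = gateway.reload(fetch_with_expired_token)
```

In this sketch a failed reload leaves `active_config` untouched, so existing routes would keep working while the credential problem is investigated.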

Timeline

4:22 PM PST - Initial service degradation began
4:53 PM PST - Issue identified through customer reports
5:05 PM PST - Full service restored by failing over to backup cluster
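The intervals implied by the timeline can be checked with a short calculation. The timestamps are taken from the entries above; the variable names and the `minutes` helper are illustrative only.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %I:%M %p"

# Timestamps from the timeline, all in PST on the same day.
started = datetime.strptime("2024-11-11 4:22 PM", FMT)
identified = datetime.strptime("2024-11-11 4:53 PM", FMT)
restored = datetime.strptime("2024-11-11 5:05 PM", FMT)


def minutes(delta: timedelta) -> int:
    """Convert a timedelta to whole minutes."""
    return int(delta.total_seconds() // 60)


time_to_detect = minutes(identified - started)    # 31 minutes
time_to_restore = minutes(restored - identified)  # 12 minutes
total_duration = minutes(restored - started)      # 43 minutes
```

Most of the 43-minute duration (31 minutes) passed before the issue was identified. Once it was identified, failover took 12 minutes.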

Changes we've implemented

  1. Fixed the underlying control plane issue that triggered unnecessary plugin reloads
  2. Implemented authentication token rotation to prevent credential expiration issues
  3. Enhanced monitoring systems to improve detection of WebSocket routing failures
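As a rough sketch of items 2 and 3, the checks below show one way to decide when a token is due for rotation and how a synthetic probe might classify a WebSocket handshake response. These are assumptions for illustration, not a description of our production tooling. The 24-hour margin and the names `needs_rotation` and `classify_handshake` are hypothetical. The only protocol fact relied on is that a successful WebSocket upgrade returns HTTP 101.

```python
from datetime import datetime, timedelta, timezone

# Assumed safety margin: rotate well before the token actually expires.
ROTATION_MARGIN = timedelta(hours=24)


def needs_rotation(expires_at: datetime, now: datetime,
                   margin: timedelta = ROTATION_MARGIN) -> bool:
    """Return True if the token expires within the safety margin."""
    return expires_at - now <= margin


def classify_handshake(status_code: int) -> str:
    """Classify the HTTP status returned by a WebSocket upgrade attempt."""
    if status_code == 101:  # Switching Protocols: upgrade succeeded
        return "ok"
    if status_code == 404:  # the symptom seen in this incident
        return "routing_failure"
    return "other_failure"


now = datetime(2024, 11, 11, 16, 0, tzinfo=timezone.utc)
soon = now + timedelta(hours=2)
later = now + timedelta(days=30)
```

A probe that alerts on `routing_failure` would surface this class of problem without waiting for customer reports.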

If you enjoy realtime distributed systems, consider applying: https://jobs.ashbyhq.com/vapi/295f5269-1bb5-4740-81fa-9716adc32ad5

Created
Nov 11 at 04:58pm PST

We're investigating.