Phone calls are degraded
Resolved
Nov 11 at 05:03pm PST
TL;DR: Our API gateway rejected WebSocket requests
Summary
On November 11, 2024, from 4:22 PM to 5:05 PM PST, our WebSocket-based calls experienced disruption due to a configuration issue in our API gateway. This affected both inbound and outbound phone calls in one of our production clusters.
Impact
- Duration: 43 minutes
- Affected services: WebSocket-based phone calls
- The system returned 404 errors for affected connections
- Service was fully restored by routing traffic to our backup cluster
Root Cause
A control plane issue in our API gateway triggered a reload of plugin configurations. The reload failed because of an expired authentication token, which left the WebSocket routing system in a degraded state.
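To illustrate the kind of credential check that can catch this failure mode before a reload is attempted, here is a minimal sketch. It is not the gateway's actual code: the `Token` record, the `needs_rotation` helper, and the 24-hour safety margin are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Token:
    """Hypothetical credential record; not the gateway's real data model."""
    name: str
    expires_at: datetime


def needs_rotation(token: Token, now: datetime,
                   margin: timedelta = timedelta(hours=24)) -> bool:
    """Return True if the token is already expired or expires within `margin`.

    The 24-hour default margin is an assumed value, not a documented setting.
    """
    return now + margin >= token.expires_at


# Example usage with made-up expiry times.
now = datetime(2024, 11, 11, 16, 22, tzinfo=timezone.utc)
stale = Token("control-plane", now - timedelta(minutes=1))
fresh = Token("control-plane", now + timedelta(days=30))
```

With these made-up values, `needs_rotation(stale, now)` is true and `needs_rotation(fresh, now)` is false, so a scheduled job could rotate or alert on the first token before any reload depends on it.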
Timeline
4:22 PM PST - Initial service degradation began
4:53 PM PST - Issue identified through customer reports
5:05 PM PST - Full service restored by failing over to backup cluster
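The intervals implied by the timeline above can be worked out directly. The sketch below uses only the three timestamps listed; the variable and function names are illustrative.

```python
from datetime import datetime

FMT = "%I:%M %p"  # 12-hour clock, e.g. "4:22 PM"

# Timestamps taken from the timeline (all PST, same day).
started = datetime.strptime("4:22 PM", FMT)
identified = datetime.strptime("4:53 PM", FMT)
restored = datetime.strptime("5:05 PM", FMT)


def minutes_between(earlier: datetime, later: datetime) -> int:
    """Whole minutes elapsed between two timestamps."""
    return int((later - earlier).total_seconds() // 60)


time_to_detect = minutes_between(started, identified)     # 31
time_to_mitigate = minutes_between(identified, restored)  # 12
total_duration = minutes_between(started, restored)       # 43
```

Of the 43 minutes of impact, 31 elapsed before the issue was identified and 12 between identification and failover.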
Changes we've implemented
- Fixed the underlying control plane issue that triggered unnecessary plugin reloads
- Implemented authentication token rotation to prevent credential expiration issues
- Enhanced monitoring systems to improve detection of WebSocket routing failures
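As one possible shape for the monitoring improvement listed above, the sketch below tracks recent WebSocket handshake status codes and flags an elevated failure rate. It is a simplified illustration under assumed parameters, not our production monitoring: the `HandshakeMonitor` class, the window size, and the threshold are hypothetical. The only protocol fact it relies on is that a successful WebSocket upgrade returns HTTP 101.

```python
from collections import deque

SWITCHING_PROTOCOLS = 101  # HTTP status of a successful WebSocket upgrade


class HandshakeMonitor:
    """Hypothetical rolling-window monitor for handshake outcomes."""

    def __init__(self, window: int = 100, threshold: float = 0.05) -> None:
        # Keep only the most recent `window` status codes.
        self.statuses = deque(maxlen=window)
        self.threshold = threshold  # assumed alerting threshold

    def record(self, status: int) -> None:
        """Record the HTTP status returned for one handshake attempt."""
        self.statuses.append(status)

    def failure_rate(self) -> float:
        """Fraction of recent handshakes that did not return 101."""
        if not self.statuses:
            return 0.0
        failures = sum(1 for s in self.statuses if s != SWITCHING_PROTOCOLS)
        return failures / len(self.statuses)

    def should_alert(self) -> bool:
        """True when the recent failure rate exceeds the threshold."""
        return self.failure_rate() > self.threshold
```

A synthetic probe could call `record` with the status of each test handshake, so that a burst of 404 responses like the one in this incident would trip `should_alert` without waiting for customer reports.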
If you enjoy realtime distributed systems, consider applying: https://jobs.ashbyhq.com/vapi/295f5269-1bb5-4740-81fa-9716adc32ad5
Affected services
Vapi API
Created
Nov 11 at 04:58pm PST
We're investigating.
Affected services
Vapi API