Degraded
We are seeing degraded service from Deepgram
Mar 11 at 12:30am PDT
Affected services
Deepgram
Resolved
Mar 11 at 12:59am PDT
TL;DR
An application-level bug reached production and caused a spike in pipeline-error-deepgram-returning-502-network-error errors, resulting in roughly 1.48K failed calls.
Timeline in PDT
- 12:03am - Rollout to prod1 containing the offending change is started
- 12:13am - Rollout to prod1 is complete
- 12:25am - A huddle in #eng-scale is started
- 12:43am - Rollback to prod3 is started
- 12:55am - Rollback to prod3 is complete
Root Cause
- An application-level bug related to the Deepgram Numerals setting caused WebSocket connections to return a non-101 status code. This was masked as a pipeline-error-deepgram-returning-502-network-error error, initially leading us to believe it was a Deepgram issue.
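A successful WebSocket upgrade is signalled by HTTP status 101 (Switching Protocols), so any other status on the handshake is a failure whose origin still needs to be determined. The sketch below shows one way the handshake status could be sorted into categories so that a request-side problem is not reported as a provider outage. It is illustrative only: the category names, the `classify_handshake` helper, and the mapping of 4xx to "our request" and 5xx to "provider" are assumptions, and the actual status code returned during this incident is not stated above.

```python
from enum import Enum
from typing import Optional

SWITCHING_PROTOCOLS = 101  # status of a successful WebSocket upgrade


class HandshakeFailure(Enum):
    # Hypothetical category names, chosen for illustration only.
    CLIENT_REQUEST = "transcriber-request-rejected"
    PROVIDER = "transcriber-provider-unavailable"
    UNKNOWN = "transcriber-handshake-unknown"


def classify_handshake(status_code: int) -> Optional[HandshakeFailure]:
    """Return None if the upgrade succeeded, otherwise a failure category."""
    if status_code == SWITCHING_PROTOCOLS:
        return None
    if 400 <= status_code < 500:
        # Assumption: a 4xx means our own request (for example a bad
        # setting) was rejected, so the fault is likely on our side.
        return HandshakeFailure.CLIENT_REQUEST
    if 500 <= status_code < 600:
        # Assumption: a 5xx points at the provider or the network path.
        return HandshakeFailure.PROVIDER
    return HandshakeFailure.UNKNOWN
```

With a split like this, a rejected request and a provider outage would surface under different error names instead of a single 502-style label.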
Impact
There were roughly 1.48K pipeline-error-deepgram-returning-502-network-error errors, each corresponding to a call that failed because of this issue.
What went poorly?
- The monitor did not fire early enough to trigger the Canary Manager’s rollback
- We did not roll back immediately upon noticing the correlation between the error-count increase and the start of the canary rollout
- We were misled by the error name
What went well?
- The monitor caught the issue and alerted us shortly after rollout completion
- Multiple team members responded promptly, initiating a huddle in #eng-scale
Remediation
- Increase the sensitivity of the pipeline error monitor
- Investigate and resolve the application bug
- Refactor Deepgram error categorization to clearly indicate issues that do not originate from Deepgram
- Refactor Canary Manager to use direct DD metrics instead of relying on monitor alerts
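As a rough illustration of the last item, a rollback decision can be computed directly from call and error counts instead of waiting for a monitor to change state. The sketch below is not the Canary Manager's actual logic: `WindowStats`, `should_roll_back`, and all thresholds are hypothetical, and fetching the counts from the metrics backend is left out.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Call and error counts for one deployment over a time window."""
    calls: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Avoid division by zero when no traffic has arrived yet.
        return self.errors / self.calls if self.calls else 0.0


def should_roll_back(
    baseline: WindowStats,
    canary: WindowStats,
    min_calls: int = 50,      # hypothetical: minimum canary traffic to judge
    min_rate: float = 0.01,   # hypothetical: ignore error rates below 1%
    max_ratio: float = 2.0,   # hypothetical: allowed canary/baseline ratio
) -> bool:
    """Decide whether the canary's error rate warrants a rollback."""
    if canary.calls < min_calls:
        return False  # not enough data to decide
    if canary.error_rate < min_rate:
        return False  # error rate too low to matter
    return canary.error_rate > max_ratio * baseline.error_rate


# Usage: baseline at 0.5% errors, canary at 20% errors.
decision = should_roll_back(WindowStats(1000, 5), WindowStats(200, 40))
```

Comparing the canary against a baseline measured over the same window means the check can run while the rollout is still in progress, rather than after a monitor threshold has been crossed.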
If working on realtime distributed systems excites you, consider applying: https://jobs.ashbyhq.com/vapi/295f5269-1bb5-4740-81fa-9716adc32ad5
Created
Mar 11 at 12:30am PDT
Assistants that use Deepgram for transcription are unresponsive. Consider using another transcription model.