Degraded

We are seeing degraded service from Deepgram

Mar 11 at 12:30am PDT
Affected services
Deepgram

Resolved
Mar 11 at 12:59am PDT

TL;DR

An application-level bug reached production and caused a spike in pipeline-error-deepgram-returning-502-network-error errors. Roughly 1.48K calls failed as a result.

Timeline in PDT

  • 12:03am - Rollout to prod1 containing the offending change is started
  • 12:13am - Rollout to prod1 is complete
  • 12:25am - A huddle in #eng-scale is started
  • 12:43am - Rollback to prod3 is started
  • 12:55am - Rollback to prod3 is complete

Root Cause

  • An application-level bug related to the Deepgram Numerals setting caused WebSocket connections to return a non-101 status code. The failure surfaced as a pipeline-error-deepgram-returning-502-network-error error, which initially led us to believe the problem was on Deepgram's side.
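The masking described above can be avoided by categorizing a failed WebSocket handshake by the status code actually received. The following is a minimal sketch, not the production code: the category names and the `classify_handshake` function are hypothetical, and this report does not state which status code the failing connections returned.

```python
from enum import Enum
from typing import Optional


class HandshakeResult(Enum):
    """Hypothetical categories; the real error names are not given in this report."""
    OK = "ok"
    NETWORK_ERROR = "network-error"          # no HTTP response was received at all
    REQUEST_REJECTED = "request-rejected"    # 4xx: most likely our own request or settings
    PROVIDER_ERROR = "provider-error"        # 5xx: most likely the provider's side
    UNEXPECTED_STATUS = "unexpected-status"  # anything else that is not 101


def classify_handshake(status_code: Optional[int]) -> HandshakeResult:
    """Map the HTTP status of a WebSocket upgrade attempt to a category.

    A successful upgrade returns 101 Switching Protocols. `None` means the
    connection failed before any HTTP response arrived.
    """
    if status_code is None:
        return HandshakeResult.NETWORK_ERROR
    if status_code == 101:
        return HandshakeResult.OK
    if 400 <= status_code <= 499:
        return HandshakeResult.REQUEST_REJECTED
    if 500 <= status_code <= 599:
        return HandshakeResult.PROVIDER_ERROR
    return HandshakeResult.UNEXPECTED_STATUS
```

With a split like this, a handshake rejected because of a bad setting on our side would be reported separately from a true network failure, rather than both being reported under one provider-named error.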

Impact

We recorded 1.48K pipeline-error-deepgram-returning-502-network-error errors, each corresponding to a call that failed because of this issue.

What went poorly?

  • The monitor did not fire early enough to trigger the Canary Manager’s rollback
  • We did not roll back immediately upon noticing the correlation between the error-count increase and the start of the canary rollout
  • The error name pointed at Deepgram, which misdirected our initial investigation

What went well?

  • The monitor caught the issue and alerted us shortly after rollout completion
  • Multiple team members responded promptly, initiating a huddle in #eng-scale

Remediation

  • Increase the sensitivity of the pipeline error monitor
  • Investigate and resolve the application bug
  • Refactor Deepgram error categorization so that issues not caused by Deepgram are clearly identified as such
  • Refactor Canary Manager to use direct DD metrics instead of relying on monitor alerts
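As an illustration of the last item, a canary check that works on metrics directly could compare the canary's error rate with the baseline's over the same time window, instead of waiting for a monitor to alert. This is a rough sketch under stated assumptions: `WindowCounts`, `should_rollback`, and all thresholds are hypothetical, and fetching the counts from the metrics backend is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowCounts:
    """Call and error counts for one deployment over one time window (hypothetical shape)."""
    calls: int
    errors: int

    @property
    def error_rate(self) -> float:
        # An empty window is treated as a 0.0 error rate.
        return self.errors / self.calls if self.calls else 0.0


def should_rollback(
    baseline: WindowCounts,
    canary: WindowCounts,
    min_calls: int = 50,            # assumed: ignore canaries with too little traffic
    min_error_rate: float = 0.02,   # assumed: absolute floor before acting
    max_ratio: float = 3.0,         # assumed: tolerated canary/baseline rate ratio
) -> bool:
    """Return True when the canary looks clearly worse than the baseline."""
    if canary.calls < min_calls:
        return False  # not enough data to decide yet
    if canary.error_rate < min_error_rate:
        return False  # errors are below the absolute floor
    if baseline.error_rate == 0.0:
        return True   # baseline is clean while the canary is above the floor
    return canary.error_rate / baseline.error_rate >= max_ratio
```

For example, `should_rollback(WindowCounts(1000, 5), WindowCounts(200, 30))` returns `True`, because the canary's 15% error rate is far above both the floor and three times the baseline's 0.5%. Evaluating this on a short interval during the rollout would let the rollback decision be made before the rollout completes.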


Created
Mar 11 at 12:30am PDT

Assistants that use Deepgram for transcription are unresponsive. Consider using another transcription model.