Back to overview
Degraded

API degradation

May 02 at 05:44pm PDT
Affected services
Vapi API
Vapi API [Weekly]

Resolved
May 02 at 06:43pm PDT

RCA for May 2nd User error in manual rollout

Root cause:

  • User error in kicking off a manual rollout, driven by unblocking a release
  • Due to this, load balancer was pointed at an invalid backend cluster

Timeline

  • 5:24pm PT: Engineer flagged blocked rollout, Infra engineer identified transient error that auto-blocked rollout
  • 5:31pm PT: Infra engineer triggered manual rollout on behalf of engineer, to unblock release
  • 5:43pm PT: On-call was paged with issue in rollout manager, engineering team internally escalated downtime
  • 5:45pm PT: Infra engineer fixed misconfigured rollout and confirmed load balancer was correctly pointed
  • 5:50pm PT: Engineering team manually tested API and calls were working again

Impact

  • Calls, API and dashboard were down or degraded for up to 15 minutes
  • User experience was disrupted temporarily; Issue reported internally and by self-serve users

What went wrong?

  • We rushed through a manual rollout, which is gated to Infra team
  • Manual rollout tools did not catch user error

What went well?

  • Our pagers flagged this issue
  • Team responded quickly and was able to mitigate
  • Status page was put up proactively

Action Items:

  • Update manual deployment tools to avoid such user error [Done]
  • Expand rollout auto-blocking mechanism to incorporate other pages [Done]
  • Better documentation for rollout/rollback steps
  • Further lock down manual deployment, gate behind approval by 1 more infra eng

Updated
May 02 at 05:54pm PDT

We identified the root cause of the issue in a bad deployment. The team rolled out a fix. API is fully operational again.

Created
May 02 at 05:44pm PDT

Some API endpoints may be unavailable. Team is working on implementing a fix.