Degraded
API degradation
May 3, 2025 at 12:44am UTC
Affected services
Vapi API
Vapi API [Weekly]
Resolved
May 3, 2025 at 1:43am UTC
RCA for May 2nd: User error in manual rollout
Root cause:
- User error while kicking off a manual rollout that was meant to unblock a release
- As a result, the load balancer was pointed at an invalid backend cluster
Timeline
- 5:24pm PT: An engineer flagged a blocked rollout; an Infra engineer identified a transient error that had auto-blocked it
- 5:31pm PT: The Infra engineer triggered a manual rollout on the engineer's behalf to unblock the release
- 5:43pm PT: On-call was paged about an issue in the rollout manager, and the engineering team escalated the downtime internally
- 5:45pm PT: The Infra engineer fixed the misconfigured rollout and confirmed the load balancer was pointed correctly
- 5:50pm PT: The engineering team manually tested the API and confirmed calls were working again
Impact
- Calls, the API, and the dashboard were down or degraded for up to 15 minutes
- User experience was temporarily disrupted; the issue was reported both internally and by self-serve users
What went wrong?
- We rushed through a manual rollout, an operation restricted to the Infra team
- Manual rollout tools did not catch the user error
What went well?
- Our pagers flagged this issue
- The team responded quickly and was able to mitigate
- The status page was updated proactively
Action Items:
- Update manual deployment tools to prevent this kind of user error [Done]
- Expand the rollout auto-blocking mechanism to incorporate other pages [Done]
- Improve documentation for rollout/rollback steps
- Further lock down manual deployment by gating it behind approval from one additional Infra engineer
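To illustrate the first and last action items, here is a minimal sketch of a pre-flight check a manual rollout tool could run before touching the load balancer. It is not Vapi's actual tooling: `RolloutRequest`, `preflight_errors`, the cluster names, and the engineer names are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RolloutRequest:
    """A manual rollout request (hypothetical shape, for illustration only)."""
    requested_by: str
    target_cluster: str
    approvers: frozenset = field(default_factory=frozenset)


def preflight_errors(request, healthy_clusters, infra_team, required_approvals=1):
    """Return the reasons to block the rollout; an empty list means it may proceed."""
    errors = []

    # Refuse to point the load balancer at a backend that is not known to be healthy.
    if request.target_cluster not in healthy_clusters:
        errors.append(
            f"target cluster {request.target_cluster!r} is not a known healthy backend"
        )

    # Count only approvals from Infra engineers other than the requester.
    valid = {
        a for a in request.approvers
        if a in infra_team and a != request.requested_by
    }
    if len(valid) < required_approvals:
        errors.append(
            f"need {required_approvals} approval(s) from another Infra engineer, "
            f"got {len(valid)}"
        )
    return errors


if __name__ == "__main__":
    # Assumed example data, not real cluster or engineer names.
    healthy = {"cluster-a", "cluster-b"}
    infra = {"alice", "bob"}
    request = RolloutRequest("alice", "cluster-z")
    for problem in preflight_errors(request, healthy, infra):
        print("BLOCKED:", problem)
```

Returning a list of reasons rather than raising on the first failure lets the tool show the operator every problem at once, which matters when a rollout is being pushed through under time pressure.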
Updated
May 3, 2025 at 12:54am UTC
We traced the root cause to a bad deployment. The team rolled out a fix, and the API is fully operational again.
Created
May 3, 2025 at 12:44am UTC
Some API endpoints may be unavailable. The team is working on a fix.