Back to overview
Degraded
API degradation
May 02 at 05:44pm PDT
Affected services
Vapi API
Vapi API [Weekly]
Resolved
May 02 at 06:43pm PDT
RCA for May 2nd User error in manual rollout
Root cause:
- User error in kicking off a manual rollout, driven by unblocking a release
- Due to this, load balancer was pointed at an invalid backend cluster
Timeline
- 5:24pm PT: Engineer flagged blocked rollout, Infra engineer identified transient error that auto-blocked rollout
- 5:31pm PT: Infra engineer triggered manual rollout on behalf of engineer, to unblock release
- 5:43pm PT: On-call was paged with issue in rollout manager, engineering team internally escalated downtime
- 5:45pm PT: Infra engineer fixed misconfigured rollout and confirmed load balancer was correctly pointed
- 5:50pm PT: Engineering team manually tested API and calls were working again
Impact
- Calls, API and dashboard were down or degraded for up to 15 minutes
- User experience was disrupted temporarily; Issue reported internally and by self-serve users
What went wrong?
- We rushed through a manual rollout, which is gated to Infra team
- Manual rollout tools did not catch user error
What went well?
- Our pagers flagged this issue
- Team responded quickly and was able to mitigate
- Status page was put up proactively
Action Items:
- Update manual deployment tools to avoid such user error [Done]
- Expand rollout auto-blocking mechanism to incorporate other pages [Done]
- Better documentation for rollout/rollback steps
- Further lock down manual deployment, gate behind approval by 1 more infra eng
Affected services
Vapi API
Vapi API [Weekly]
Updated
May 02 at 05:54pm PDT
We identified the root cause of the issue in a bad deployment. The team rolled out a fix. API is fully operational again.
Affected services
Vapi API
Vapi API [Weekly]
Created
May 02 at 05:44pm PDT
Some API endpoints may be unavailable. Team is working on implementing a fix.
Affected services
Vapi API
Vapi API [Weekly]