Calls are intermittently ending abruptly
Resolved
Mar 14 at 04:01pm PDT
TL;DR
Calls ended abruptly because call-worker containers exceeded their memory limit, were OOMKilled, and restarted.
Timeline (Pacific Time)
- March 13th 3:47am: Issue raised regarding calls ending without a call-ended-reason.
- 1:57pm: Call-worker memory usage identified as exceeding the 2GB limit.
- 3:29pm: Confirmation received that another customer experienced the same issue.
- 4:30pm: Changes implemented to increase memory request and limit on call-workers.
- March 14th 12:27pm: Changes deployed.
Root Cause
Call-workers exceeded Kubernetes-set memory limits, causing containers to restart unexpectedly and terminate ongoing calls. Since call-workers maintain call state internally, calls could not be recovered, leading to abrupt terminations.
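For illustration, the failure mode above can be spotted from a pod's status object. The sketch below is not our internal tooling. It assumes a pod object shaped like the output of `kubectl get pod -o json`, where Kubernetes records the last termination reason and restart count per container. The pod and container names are hypothetical.

```python
from __future__ import annotations

from typing import Any


def oom_killed_containers(pod: dict[str, Any]) -> list[tuple[str, int]]:
    """Return (container name, restart count) for every container whose
    most recent termination reason was OOMKilled."""
    results = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for status in statuses:
        # lastState.terminated is only present if the container has
        # been terminated at least once.
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            results.append((status["name"], status.get("restartCount", 0)))
    return results


# Hypothetical pod object, trimmed to the fields the function reads.
example_pod = {
    "metadata": {"name": "call-worker-abc123"},
    "status": {
        "containerStatuses": [
            {
                "name": "call-worker",
                "restartCount": 3,
                "lastState": {
                    "terminated": {"reason": "OOMKilled", "exitCode": 137}
                },
            },
            {"name": "sidecar", "restartCount": 0, "lastState": {}},
        ]
    },
}

print(oom_killed_containers(example_pod))
```

Because the restart happens inside the container runtime, the call state held in the worker's memory is gone by the time this status is visible, which is why affected calls could not be recovered.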
Impact
1705 call-workers exceeded the 2GB memory threshold, causing 1705 abrupt call terminations.
What went poorly?
- Issue identified only after user notification.
- The fix required a code change rather than immediate manual intervention, delaying remediation.
- Release complications prevented a quick deployment; the fix shipped the following day.
- Investigation took 10 hours, and remediation required an additional 3 hours.
What went well?
- Effective communication allowed identification and planning of the fix once the issue was understood.
Remediation
- Increase memory requests and limits on call-workers.
- Implement monitoring for call-worker memory usage exceeding limits.
- Implement monitoring for call-worker container restarts.
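The two monitoring items above could be sketched roughly as follows. This is a minimal illustration, not the monitoring we deployed. The `WorkerSample` type, the `find_alerts` function, the 80% warning ratio, and the reading of 2GB as 2 GiB are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class WorkerSample:
    """One point-in-time reading for a call-worker (hypothetical shape)."""
    name: str
    memory_bytes: int
    memory_limit_bytes: int
    restart_count: int


def find_alerts(
    previous: dict[str, WorkerSample],
    current: dict[str, WorkerSample],
    warn_ratio: float = 0.8,  # assumed threshold, tune to taste
) -> list[str]:
    """Flag workers that are close to their memory limit or that have
    restarted since the previous sample."""
    alerts = []
    for name, sample in current.items():
        ratio = sample.memory_bytes / sample.memory_limit_bytes
        if ratio >= warn_ratio:
            alerts.append(f"{name}: memory at {ratio:.0%} of limit")
        prev = previous.get(name)
        if prev is not None and sample.restart_count > prev.restart_count:
            alerts.append(
                f"{name}: restart count rose from "
                f"{prev.restart_count} to {sample.restart_count}"
            )
    return alerts


previous = {
    "w1": WorkerSample("w1", 1 * GIB, 2 * GIB, 0),
    "w2": WorkerSample("w2", 1 * GIB, 2 * GIB, 1),
}
current = {
    "w1": WorkerSample("w1", 9 * GIB // 5, 2 * GIB, 0),  # about 1.8 GiB
    "w2": WorkerSample("w2", GIB // 2, 2 * GIB, 2),      # restarted once
}

for alert in find_alerts(previous, current):
    print(alert)
```

Alerting below the limit rather than at it gives time to act before a container is killed. The restart check catches the cases where memory climbed too quickly for the first alert to help.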
Affected services
Vapi API
Vapi API [Weekly]
Created
Mar 14 at 12:11pm PDT
We are currently experiencing higher memory usage in our call workers which may be causing calls to end abruptly.
Our team is actively investigating and working to resolve the issue promptly.
We apologize for any inconvenience this may cause and appreciate your patience. Further updates will be provided by 2pm PST.