Increased call start errors due to Vapi fault transport errors + Twilio timeouts
Resolved
Mar 10 at 07:18pm PDT
RCA: vapifault-transport-never-connected errors caused call failures
Date: 03/10/2025
Summary:
A recent update to our production environment increased the memory usage of one of our core
call-processing services. This led to an unintended triggering of our automated process restart
mechanism, resulting in a brief period of call failures. The issue was resolved by adjusting the
memory threshold for these restarts.
Timeline:
1. 5:50am A few calls began failing to start due to
vapifault-transport-never-connected errors.
2. 6:40am Call failures started to increase, causing a partial outage of call starts. Our
monitoring detected the increase and paged the on-call engineer. Some Discord users and
Slack customers began reporting errors.
3. 6:55am - 7:20am Investigated causes for failures. Shifted the calls to a previous cluster,
but calls were still failing.
4. 7:35am We identified the root cause of the failures and scoped out a fix.
5. 7:58am The hotfix was completely deployed and the failures stopped. The incident was
resolved at this point.
Root Cause:
A recent production update increased the memory requirements of our call-processing service.
As a result, an internal safeguard—designed to restart processes exceeding a set memory
threshold—was activated more frequently than anticipated.
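The interaction described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the function names, the threshold values, and the memory samples are illustrative assumptions, not Vapi's actual code or configuration. It only shows how a fixed threshold that never fired before an update can start firing often once baseline memory usage rises, and how raising the threshold stops the restarts.

```python
def should_restart(rss_mb: float, threshold_mb: float) -> bool:
    """Return True when a process's resident memory exceeds the restart threshold."""
    return rss_mb > threshold_mb


def count_restarts(samples_mb, threshold_mb):
    """Count how many memory samples would trigger the restart safeguard."""
    return sum(1 for sample in samples_mb if should_restart(sample, threshold_mb))


# Hypothetical memory samples (MB) for one worker process before the update.
before_update = [410, 430, 455, 470, 440]
# Assumed effect of the update: the same workload uses 80 MB more memory.
after_update = [sample + 80 for sample in before_update]

OLD_THRESHOLD_MB = 500  # hypothetical original restart threshold
NEW_THRESHOLD_MB = 600  # hypothetical raised threshold

restarts_before = count_restarts(before_update, OLD_THRESHOLD_MB)
restarts_after = count_restarts(after_update, OLD_THRESHOLD_MB)
restarts_fixed = count_restarts(after_update, NEW_THRESHOLD_MB)

print(restarts_before, restarts_after, restarts_fixed)
```

With these made-up numbers, the old threshold triggers no restarts before the update, four restarts after it, and none once the threshold is raised.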
Remediation:
1. Threshold Adjustment: We have increased the memory threshold that triggers a
process restart to better handle higher usage.
2. Enhanced Monitoring: We are implementing additional alerts to detect similar issues
earlier.
3. Process Review: We are further examining our restart protocols to reduce unnecessary
service interruptions during periods of high demand.
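As one possible shape for the additional alerting mentioned in item 2, the sketch below flags an unusual burst of process restarts within a sliding time window. The class name `RestartRateAlert`, the window length, and the restart limit are all hypothetical and chosen only for illustration; the report does not describe how the actual alerts are implemented.

```python
from collections import deque


class RestartRateAlert:
    """Flags when the number of restarts inside a sliding window exceeds a limit."""

    def __init__(self, window_seconds: float, max_restarts: int):
        self.window_seconds = window_seconds
        self.max_restarts = max_restarts
        self._events = deque()  # timestamps of recent restarts, oldest first

    def record_restart(self, timestamp: float) -> bool:
        """Record a restart and return True if the alert should fire."""
        self._events.append(timestamp)
        # Drop restarts that have fallen out of the sliding window.
        while self._events and timestamp - self._events[0] > self.window_seconds:
            self._events.popleft()
        return len(self._events) > self.max_restarts


# Hypothetical policy: alert on more than 3 restarts within 5 minutes.
alert = RestartRateAlert(window_seconds=300, max_restarts=3)
fired = [alert.record_restart(t) for t in (0, 60, 120, 180, 900)]
print(fired)  # [False, False, False, True, False]
```

Alerting on restart rate rather than on call failures alone could surface this class of problem before it becomes visible to callers, though the right window and limit would depend on normal restart behavior.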
Affected services
Vapi API
Updated
Mar 10 at 08:12am PDT
The issue has been patched and we are monitoring the fix. We will follow up with a detailed RCA soon.
Affected services
Vapi API
Created
Mar 10 at 07:09am PDT
We are seeing an increased number of 31920 errors in Twilio calls. The team is investigating and mitigating the issue.
Affected services
Vapi API