Vapi workers not connecting due to lack of workers
Resolved
Mar 17 at 08:56pm PDT
TL;DR
Weekly Cluster customers saw vapifault-transport-never-connected errors because call workers did not scale fast enough to meet demand.
Timeline in PDT
- 7:00am - Customers report an increased number of vapifault-transport-never-connected errors. A degradation incident is posted on BetterStack.
- 7:30am - The issue resolves as call workers scale up to meet demand.
Root Cause
- Call workers did not scale fast enough on the weekly cluster
Impact
34 calls failed with vapifault-transport-never-connected errors as a result of this issue.
What went poorly?
- We were unable to detect the issue before customers did
What went well?
- The solution was straightforward → Pre-scaling workers on the Weekly Cluster
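As a rough illustration of what pre-scaling means in practice, the sketch below computes a minimum worker count to keep warm ahead of an expected peak. The function name, the per-worker call capacity, and the headroom factor are all hypothetical; they are not taken from Vapi's actual configuration.

```python
import math


def prescaled_worker_floor(
    expected_peak_calls: int,
    calls_per_worker: int,
    headroom: float = 0.25,
) -> int:
    """Return the minimum number of call workers to keep running.

    All inputs are illustrative assumptions:
    - expected_peak_calls: forecast of concurrent calls at peak
    - calls_per_worker: concurrent calls one worker can serve
    - headroom: extra fraction of capacity held in reserve
    """
    if calls_per_worker <= 0:
        raise ValueError("calls_per_worker must be positive")
    if expected_peak_calls < 0 or headroom < 0:
        raise ValueError("expected_peak_calls and headroom must be non-negative")

    # Capacity to provision: the forecast peak plus the reserve.
    target_calls = expected_peak_calls * (1 + headroom)

    # Round up so partial workers never leave calls unserved.
    return math.ceil(target_calls / calls_per_worker)
```

Setting this value as the autoscaler's minimum means a traffic spike up to the forecast peak is absorbed by workers that already exist, rather than waiting for new ones to start.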
Remediation
- Pre-scale workers on all clusters to prevent vapifault errors
- Increase the size of worker nodes so that more call workers fit per node, which helps scaling
- Increase the sensitivity of pipeline error monitors and/or add a dedicated monitor for vapifault errors
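A dedicated vapifault monitor could look something like the sliding-window counter below. This is a minimal sketch, not a description of the actual monitoring stack: the class name, the `ended_reason` parameter, and the threshold and window values are hypothetical and chosen only for illustration.

```python
from collections import deque


class VapifaultMonitor:
    """Alert when too many vapifault errors occur within a time window."""

    def __init__(self, threshold: int = 5, window_seconds: float = 300.0):
        # Hypothetical defaults; tune against real traffic.
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps: deque = deque()

    def record(self, ended_reason: str, now: float) -> bool:
        """Record one finished call and return True if an alert should fire.

        ended_reason: error string for the call (assumed field name)
        now: current time in seconds, passed in so the logic is testable
        """
        # Drop errors that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()

        # Only vapifault-* errors count toward the threshold.
        if ended_reason.startswith("vapifault-"):
            self._timestamps.append(now)

        return len(self._timestamps) >= self.threshold
```

Counting only `vapifault-` errors keeps the signal separate from general pipeline errors, so a small burst like the one in this incident is not hidden by unrelated noise.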
If working on realtime distributed systems excites you, consider applying: https://jobs.ashbyhq.com/vapi/295f5269-1bb5-4740-81fa-9716adc32ad5
Affected services
Vapi API [Weekly]
Updated
Mar 14 at 01:00pm PDT
We have investigated and resolved this issue by pre-scaling the impacted cluster to handle a higher volume of traffic. We will follow up with an RCA.
Updated
Mar 14 at 07:30am PDT
This issue resolved itself as more workers were created. We are investigating further to provide a longer-term remediation and will post an update.
Created
Mar 14 at 07:00am PDT
Workers did not scale to meet an increase in demand, resulting in vapifault-transport-never-connected errors.