Degraded

Vapi workers not connecting due to lack of workers

Mar 14 at 07:00am PDT
Affected services
Vapi API [Weekly]

Resolved
Mar 17 at 08:56pm PDT

TL;DR

Weekly Cluster customers saw vapifault-transport-never-connected errors because call workers did not scale fast enough to meet demand.

Timeline in PDT

  • 7:00am - Customers report an increased number of vapifault-transport-never-connected errors. A degradation incident is posted on BetterStack.
  • 7:30am - The issue resolves as call workers scale up to meet demand.

Root Cause

  • Call workers did not scale fast enough on the weekly cluster

Impact

There were 34 instances of vapifault-transport-never-connected errors, meaning 34 calls failed due to the issue.

What went poorly?

  • We were unable to detect the issue before customers did

What went well?

  • The solution was straightforward: pre-scaling workers on the Weekly Cluster
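
As a rough illustration of what pre-scaling means here, the sketch below keeps a fixed floor of warm workers plus a percentage of headroom above current demand, so capacity exists before a spike instead of being created in reaction to it. Every name and number in it (desired_workers, calls_per_worker, headroom_pct, min_warm) is a hypothetical placeholder and does not describe Vapi's actual autoscaler.

```python
def desired_workers(active_calls: int,
                    calls_per_worker: int = 1,
                    headroom_pct: int = 30,
                    min_warm: int = 50) -> int:
    """Return a target worker count that stays ahead of current demand.

    All defaults are illustrative assumptions, not real cluster settings.
    """
    if active_calls < 0:
        raise ValueError("active_calls must not be negative")
    if calls_per_worker <= 0:
        raise ValueError("calls_per_worker must be positive")

    # Workers needed for the current load (integer ceiling division).
    needed = (active_calls + calls_per_worker - 1) // calls_per_worker

    # Add headroom on top of current need, rounding up.
    with_headroom = (needed * (100 + headroom_pct) + 99) // 100

    # Never drop below the pre-scaled floor, even when traffic is idle.
    return max(min_warm, with_headroom)
```

With these placeholder defaults, an idle cluster still holds 50 warm workers, and 100 active calls would target 130 workers.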

Remediation

  • Pre-scale workers on all clusters to prevent vapifault errors
  • Increase the size of worker nodes to aid in scaling, by allowing more call workers to fit per node
  • Increase the sensitivity of pipeline error monitors and add a dedicated monitor for vapifault errors
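
A dedicated monitor for vapifault errors could take the shape of a sliding-window counter, sketched below under stated assumptions. The class name, the "vapifault-" prefix match, the threshold, and the window length are all hypothetical; a real monitor would more likely live in the alerting system than in application code.

```python
from collections import deque


class ErrorRateMonitor:
    """Signal when `threshold` or more matching errors fall inside the window.

    Threshold and window values are illustrative assumptions only.
    """

    def __init__(self, error_prefix: str = "vapifault-",
                 threshold: int = 5, window_s: float = 300.0):
        self.error_prefix = error_prefix
        self.threshold = threshold
        self.window_s = window_s
        self._timestamps = deque()  # times of matching errors, oldest first

    def record(self, error_code: str, now: float) -> bool:
        """Record one call error at time `now` (seconds); return True to alert."""
        if error_code.startswith(self.error_prefix):
            self._timestamps.append(now)

        # Evict errors that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] > self.window_s:
            self._timestamps.popleft()

        return len(self._timestamps) >= self.threshold
```

Passing the timestamp in explicitly keeps the sketch deterministic and easy to test; a lower threshold or shorter window makes the monitor more sensitive, at the cost of more false alarms.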

If working on realtime distributed systems excites you, consider applying: https://jobs.ashbyhq.com/vapi/295f5269-1bb5-4740-81fa-9716adc32ad5

Updated
Mar 14 at 01:00pm PDT

We have investigated and resolved this issue by pre-scaling the impacted cluster to handle a higher volume of traffic. We will update with an RCA.

Updated
Mar 14 at 07:30am PDT

This issue resolved itself as more workers were created. We are investigating further to provide a longer-term remediation and will update.

Created
Mar 14 at 07:00am PDT

Workers did not scale to meet an increase in demand, resulting in vapifault-transport-never-connected errors.