Resolved
Mar 17 at 08:56pm PDT

TL;DR

Customers on the Weekly cluster saw vapifault-transport-never-connected errors because call workers did not scale fast enough to meet demand.

Timeline in PDT

7:00am - Customers report an increased number of vapifault-transport-never-connected errors. A degradation incident is posted on BetterStack.
7:30am - The issue resolves as call workers scale up to meet demand.

Impact

34 calls failed with vapifault-transport-never-connected errors during the incident.

Remediation

Prescaling workers on all clusters to prevent vapifault errors
Increasing the size of worker nodes so that more call workers fit per node, which aids scaling
Increasing the sensitivity of pipeline error monitors and adding a dedicated monitor for vapifault errors
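
As a rough illustration of what a dedicated monitor for vapifault errors could look like, the sketch below counts occurrences of one error code inside a sliding time window and signals when the count reaches a threshold. It is not the monitor that was deployed: the class name `VapifaultMonitor`, the 5-minute window, and the threshold of 5 are all hypothetical values chosen for the example.

```python
from collections import deque


class VapifaultMonitor:
    """Sliding-window counter that flags bursts of one error code (illustrative only)."""

    def __init__(self, error_code: str, window_seconds: float, threshold: int):
        self.error_code = error_code
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._timestamps = deque()  # times of matching errors still inside the window

    def record(self, code: str, timestamp: float) -> bool:
        """Record one call error and return True if the alert should fire."""
        if code == self.error_code:
            self._timestamps.append(timestamp)
        # Drop events that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


# Hypothetical settings: alert on 5 matching errors within 5 minutes.
monitor = VapifaultMonitor(
    error_code="vapifault-transport-never-connected",
    window_seconds=300,
    threshold=5,
)
```

A monitor keyed to a single error code can use a much lower threshold than a general pipeline error monitor, since it does not have to tolerate unrelated background errors.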

If working on realtime distributed systems excites you, consider applying: https://jobs.ashbyhq.com/vapi/295f5269-1bb5-4740-81fa-9716adc32ad5

Updated
Mar 14 at 01:00pm PDT

We have investigated and resolved this issue by prescaling the impacted cluster to handle a higher volume of traffic. We will update with an RCA.
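
To make the idea of prescaling concrete, here is a minimal capacity sketch. It assumes a worker pool is sized from an expected peak of concurrent calls plus a headroom fraction, and that larger nodes hold more call workers each. The function names and every number are hypothetical and do not reflect the actual cluster configuration.

```python
import math


def workers_needed(peak_concurrent_calls: int, calls_per_worker: int, headroom: float) -> int:
    """Workers to keep warm so that peak demand plus headroom fits without scaling up."""
    return math.ceil(peak_concurrent_calls * (1 + headroom) / calls_per_worker)


def nodes_needed(workers: int, workers_per_node: int) -> int:
    """Nodes required to host the given number of workers."""
    return math.ceil(workers / workers_per_node)


# Hypothetical figures: 200 peak concurrent calls, 1 call per worker, 25% headroom.
workers = workers_needed(peak_concurrent_calls=200, calls_per_worker=1, headroom=0.25)
print(workers)                                      # 250
print(nodes_needed(workers, workers_per_node=10))   # 25 smaller nodes
print(nodes_needed(workers, workers_per_node=25))   # 10 larger nodes
```

Under these assumptions, larger nodes mean fewer node provisioning steps for the same number of workers, which is one way larger nodes can aid scaling.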

Updated
Mar 14 at 07:30am PDT

This issue resolved itself as more workers were created. We are investigating further to provide a longer-term remediation and will post an update.

Created
Mar 14 at 07:00am PDT

Workers did not scale to meet an increase in demand, resulting in vapifault-transport-never-connected errors.