providerfault-transport errors

May 13 at 10:29am PDT
Affected services
Vapi API
Vapi API [Weekly]

Resolved
May 13 at 10:29am PDT

RCA: Providerfault-transport-never-connected

Summary

During a surge in inbound call traffic, two distinct errors were observed: "vapifault-transport-worker-not-available" and "providerfault-transport-never-connected". This report covers the root cause of the "providerfault-transport-never-connected" errors that occurred during the increased call volume.

Timeline of Events (PT)

  • 10:26 AM: Significant spike in inbound call volume.
  • 10:26 – 10:40 AM: Intermittent HTTP 520 errors returned by CDN for inbound call endpoints (46 calls impacted).
  • 11:00 AM – 12:00 PM: Infrastructure intermittently failed to establish transport connections despite successfully picking up calls (172 calls impacted).
  • 12:00 PM: Call volume returned to normal; errors ceased.

Root Cause Analysis

1. HTTP 520 Errors at CDN

  • High load triggered intermittent HTTP 520 errors for critical endpoints.
  • Internal tracing confirmed that the API produced successful responses that were not properly relayed back, indicating a fault in network layers external to the core services.
  • An investigation with the network provider is ongoing to identify the underlying cause.
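
The tracing comparison described above can be illustrated with a minimal sketch. It assumes two hypothetical mappings from request ID to HTTP status, one taken from origin-side traces and one from CDN edge logs; the function name, the request IDs, and the data shapes are illustrative only and are not taken from the actual tooling.

```python
def find_unrelayed_successes(origin_status, edge_status):
    """Return request IDs where the origin answered 2xx but the edge returned 520.

    Both arguments are hypothetical dicts of request ID -> HTTP status code.
    """
    return sorted(
        request_id
        for request_id, status in origin_status.items()
        if 200 <= status < 300 and edge_status.get(request_id) == 520
    )


# Made-up sample data, for illustration only.
origin = {"req-1": 200, "req-2": 201, "req-3": 500}
edge = {"req-1": 520, "req-2": 201, "req-3": 520}

print(find_unrelayed_successes(origin, edge))  # ['req-1']
```

A request that appears in the result succeeded at the API but failed somewhere between the origin and the client, which is the pattern the tracing surfaced.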

2. Resource Exhaustion on Proxy Service

  • During peak load, the proxy service responsible for handling call connections exhausted available CPU and memory resources (observed usage ~1.27 CPU cores and 1.2 GB RAM).
  • Insufficient resource allocation led to failed transport connections.
  • Logs showed degraded pod performance, including failures in auxiliary tasks like recording uploads.
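
As a rough illustration of how the observed usage relates to exhaustion, the sketch below computes a utilization ratio against per-pod limits. The observed figures (1.27 CPU cores, 1.2 GB RAM) come from this report; the limit values, the 0.9 threshold, and all names are assumptions, since the actual proxy allocation is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceSample:
    """One usage reading for a proxy pod."""
    cpu_cores: float
    memory_gb: float


@dataclass(frozen=True)
class ResourceLimits:
    """Hypothetical per-pod limits; the report does not state the real values."""
    cpu_cores: float
    memory_gb: float


def saturation(sample: ResourceSample, limits: ResourceLimits) -> float:
    """Return the highest utilization ratio across CPU and memory."""
    return max(
        sample.cpu_cores / limits.cpu_cores,
        sample.memory_gb / limits.memory_gb,
    )


def is_saturated(sample: ResourceSample, limits: ResourceLimits,
                 threshold: float = 0.9) -> bool:
    """Flag the pod once either resource reaches the assumed threshold."""
    return saturation(sample, limits) >= threshold


# Observed peak from the report; the limits are assumed for illustration.
peak = ResourceSample(cpu_cores=1.27, memory_gb=1.2)
assumed_limits = ResourceLimits(cpu_cores=1.25, memory_gb=1.25)

print(is_saturated(peak, assumed_limits))  # True
```

Taking the maximum across resources reflects that running out of either CPU or memory is enough to degrade the pod.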

What Went Wrong?

  • Misclassification of Errors: The failures were classified internally as external provider faults rather than recognized as infrastructure capacity issues.
  • Insufficient Monitoring: Lack of alerts and monitoring for proxy resource saturation conditions.
  • Load-Testing Gap: Prior load tests did not replicate proxy resource constraints encountered in production scenarios.
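
One way the misclassification could be avoided is to consult proxy saturation before attributing a never-connected failure to the provider. The sketch below is a hypothetical illustration, not the production classifier: the "providerfault" and "vapifault" prefixes come from the error names in this report, while the function, its parameter, and the 0.9 threshold are assumed.

```python
PROVIDER_FAULT = "providerfault"
VAPI_FAULT = "vapifault"


def classify_never_connected(proxy_saturation: float,
                             threshold: float = 0.9) -> str:
    """Choose a fault prefix for a call whose transport never connected.

    proxy_saturation is a hypothetical utilization ratio for the proxy at
    the time of the failure (1.0 means fully used); threshold is assumed.
    """
    if proxy_saturation >= threshold:
        # The proxy had no headroom, so attribute the failure to infrastructure.
        return VAPI_FAULT
    return PROVIDER_FAULT


print(classify_never_connected(1.02))  # vapifault
print(classify_never_connected(0.35))  # providerfault
```

The same saturation signal could also drive an alert, which would address the monitoring gap listed above.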