Downtime

Voice calls dropped and dashboard unavailable

May 21, 2026 at 1:55pm UTC
Affected services
Vapi API
Vapi API [Weekly]
Vapi Dashboard

Resolved
May 22, 2026 at 3:03am UTC

Incident Report: Database Outage (Log Collector Misconfiguration)

What Happened

Vapi experienced a major service outage that caused voice calls to fail and the dashboard to become unavailable. The cause was a failure in an audit log collector in the Vapi production database. The triggering event was an apply_config change that our database provider executed at 6:44 AM PT. A misconfiguration in the project-wide telemetry settings caused Postgres processes to become stuck writing to syslog when accepting new connections. This exhausted the connection pool and left the database unable to accept traffic, including from within the pod itself.
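The failure mode described above can be illustrated with a toy model. This is a minimal sketch, not Postgres or the provider's code: `ToyDatabase`, `simulate`, the 100-slot limit, and the 500 connection attempts are all hypothetical values chosen for illustration. It assumes each accepted connection holds a slot while a log write runs, and that a blocked write never releases that slot.

```python
from dataclasses import dataclass


@dataclass
class ToyDatabase:
    """Toy model of a server with a fixed number of connection slots."""

    max_connections: int
    log_write_blocks: bool  # True models a log write that never returns
    in_use: int = 0
    rejected: int = 0

    def accept(self) -> bool:
        """Try to accept one connection; return True if a slot was granted."""
        if self.in_use >= self.max_connections:
            self.rejected += 1
            return False
        self.in_use += 1  # slot is held while the accept-time log write runs
        if not self.log_write_blocks:
            self.in_use -= 1  # write returned, so the slot is released
        return True


def simulate(attempts: int, max_connections: int, log_write_blocks: bool) -> ToyDatabase:
    """Run a burst of connection attempts against a fresh toy database."""
    db = ToyDatabase(max_connections=max_connections, log_write_blocks=log_write_blocks)
    for _ in range(attempts):
        db.accept()
    return db


healthy = simulate(attempts=500, max_connections=100, log_write_blocks=False)
stuck = simulate(attempts=500, max_connections=100, log_write_blocks=True)
print(healthy.in_use, healthy.rejected)  # 0 0
print(stuck.in_use, stuck.rejected)      # 100 400
```

In the stuck case every slot is occupied by a connection that never finishes its log write, so all later attempts are refused regardless of where they come from. This is consistent with the database refusing traffic even from inside its own pod.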

We notified our provider's support line at 8:10 AM PT. Our database provider identified the root cause at 10:03 AM PT. The mitigation was to disable the OTEL connection and restart the endpoint, after which the system returned to a normal state. A fix to the audit collectors was subsequently published and confirmed stable.

Customer Impact

  • Service availability: Major outage. Vapi's voice services were unavailable during the incident window, affecting 100% of customers from 7:12 AM PT until 11:49 AM PT, when the incident was marked as resolved.
  • The Vapi dashboard was also unavailable during that time.

Timeline (PT)

Time             Event
6:44 AM          Our database provider executes an apply_config change, triggering the incident.
7:12 AM          Vapi begins to observe call degradation.
7:22 AM          The team begins its investigation.
7:43 AM          Vapi updates the status page to notify customers of observed degradation.
7:53 AM          Vapi updates the status page to confirm a full outage of both voice calls and the dashboard.
8:10 AM          Vapi suspects production database behavior as the source of the problem and notifies the database provider's support team. Initial investigation begins; a large spike in waiting-status connections is observed.
8:16 AM          Internal escalation with the database provider for increased urgency.
8:29 AM          Vapi confirms production database behavior as the source of the problem on the status page and continues to collaborate with the database provider on mitigation.
9:35–10:00 AM    A brief recovery is observed after restarting the database, but degradation reappears after services are scaled back up.
10:03 AM         Database provider identifies the root cause: Postgres processes stuck on syslog writes during connection acceptance, caused by a misconfigured project-wide telemetry_setting for log collectors.
10:03 AM+        OTEL connection disabled; endpoint restarted; system returns to normal state.
10:38 AM         Vapi increases traffic back on the weekly environment and confirms that service is restored.
11:09 AM         Vapi increases traffic back on the daily environment, confirms service is restored, and moves to a monitoring stage.
11:49 AM         Vapi marks the incident as resolved.
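
For readers who want the elapsed times between key events, the arithmetic can be sketched as follows. The clock times are copied from the timeline above; the dictionary keys and the `minutes_between` helper are hypothetical names introduced only for this example, and all times are assumed to fall on the same day in the same time zone.

```python
from datetime import datetime

# Clock times copied from the timeline (same day, same time zone).
TIMELINE = {
    "config_applied": "6:44 AM",
    "degradation_observed": "7:12 AM",
    "provider_notified": "8:10 AM",
    "root_cause_identified": "10:03 AM",
    "resolved": "11:49 AM",
}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from one timeline event to another."""
    fmt = "%I:%M %p"
    t0 = datetime.strptime(TIMELINE[start], fmt)
    t1 = datetime.strptime(TIMELINE[end], fmt)
    return int((t1 - t0).total_seconds() // 60)


print(minutes_between("config_applied", "degradation_observed"))      # 28
print(minutes_between("degradation_observed", "provider_notified"))   # 58
print(minutes_between("provider_notified", "root_cause_identified"))  # 113
print(minutes_between("degradation_observed", "resolved"))            # 277
```

The last figure, 277 minutes (4 hours 37 minutes), is the customer impact window from first observed degradation to resolution.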

Updated
May 21, 2026 at 6:32pm UTC

Service has returned to normal operating levels. Call success metrics have recovered and remained stable for 30 minutes across both daily and weekly channels, and all platform functionality has been restored. We’re continuing to monitor closely and will provide further updates if anything changes. We will update the status page with an incident report within 12 hours. Thank you for your patience.
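
A "stable for 30 minutes" condition like the one described above could be checked along these lines. This is only a sketch under stated assumptions: the `stable_for` function, the per-minute sampling, and the 0.99 success-rate threshold are hypothetical and are not taken from Vapi's monitoring.

```python
from typing import Sequence


def stable_for(samples: Sequence[float], window: int = 30, threshold: float = 0.99) -> bool:
    """True if the most recent `window` per-minute samples all meet `threshold`."""
    if len(samples) < window:
        return False  # not enough history to claim stability yet
    return all(rate >= threshold for rate in samples[-window:])


# Hypothetical call success rates, one sample per minute.
recovering = [0.40] * 10 + [0.995] * 29  # only 29 good minutes so far
recovered = [0.40] * 10 + [0.995] * 30   # a full 30 good minutes

print(stable_for(recovering))  # False
print(stable_for(recovered))   # True
```

Requiring every sample in the window to pass, rather than the average, means a single bad minute resets the clock, which is the more conservative choice after a recovery that has already regressed once.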

Updated
May 21, 2026 at 6:09pm UTC

Services are recovering across both our weekly and daily clusters, and all metrics are trending positive. Our DB provider has identified and confirmed the root cause and we have applied an initial remediation. Our DB provider team remains actively engaged with us as we scale load back up. We are continuing to monitor closely and will provide updates as we have them.

Updated
May 21, 2026 at 6:06pm UTC

Our weekly cluster is still showing recovery and calls are going through. We are still monitoring the situation. We have shifted our focus to our daily cluster.

Updated
May 21, 2026 at 5:41pm UTC

Our weekly cluster has seen recovery over the last 20 minutes. We are still monitoring the situation, as calls may fail again.
We are now moving on to fixing calls in our daily clusters.

We will post updates when we have new information to share or in 30 minutes.

Updated
May 21, 2026 at 5:13pm UTC

Our daily cluster is still experiencing a full outage. Weekly is seeing some recovery.

Updated
May 21, 2026 at 5:11pm UTC

We are still in a degraded state on weekly and working on fully resolving the issue.
Our daily cluster is still out.

Updated
May 21, 2026 at 4:41pm UTC

We are seeing some recovery in Voice Calls and the Dashboard and are continuing to monitor the situation. We will post updates as we have news or in 30 minutes.

Updated
May 21, 2026 at 4:29pm UTC

Our DB provider has escalated to the highest level. Their most senior architect is now directly involved in identifying the fix. We are collaborating closely on resolution.
We will post an update as we have news or in 30 minutes.

Updated
May 21, 2026 at 3:59pm UTC

Our DB provider has confirmed the config change that we identified as the cause of our DB outage, which is causing voice calls to drop and our dashboard to fail to load.
We are collaborating with our provider on an eventual fix or workaround. They have escalated this issue to the highest level of urgency on their side.
We will post an update as we have news or in 30 minutes.

Updated
May 21, 2026 at 3:27pm UTC

We are still investigating a complete outage in Voice Calls. Our DB provider applied a configuration change at 6:44 AM PT that is causing our DB to be completely unavailable. We are working with them to get our DBs back up.

We do not have an ETA or resolution yet; however, our provider has escalated the issue internally.

We will post an update as we learn more or in 30 minutes.

Created
May 21, 2026 at 1:55pm UTC

We are investigating an incident causing voice calls to drop. We will publish updates as we get more information or in 30 minutes.