API is degraded
Resolved
Oct 09 at 09:24am PDT
We're back.
RCA:
* At 9:15am PT: We were alerted by a big spike in request aborted
.
* By 9:20am: We identified the root cause was head of line blocking on the API pods (some requests were taking too long, blocking other requests)
* By 9:25am: We scaled and restarted the api pods. Everything reverted to normal.
Action Items:
* We'll be setting a hard query timeout and returning timeout on ones that exceed. Eg. GET /assistant?limit=1000. (statementtimeout)
* We'll be making API pods aware of the health of their own DB connection, so it can restart gracefully.
* We'll be lowering how long each API pod can hold a DB connection so it can't monopolize time (idletimeout).
Affected services
Vapi API
Created
Oct 09 at 09:18am PDT
We're investigating.
Affected services
Vapi API