Retry and Concurrency
How Inngest handles failures, step-level durability, retry configuration, and concurrency limits.
Inngest provides step-level durability out of the box. Each step.run() call is a checkpoint: if a later step fails, retries start from the failed step, not from the beginning. This is the core reason Inngest was chosen over simpler alternatives like BullMQ (see Inngest Decision).
Step-Level Durability
Consider the index-content function with four steps:
Step 1: chunk-content -- success, result cached
Step 2: generate-context -- success, result cached
Step 3: embed-chunks -- FAILURE (Gemini API timeout)
Step 4: store-and-sync -- never reached
On retry, Inngest replays the cached results for steps 1 and 2, then re-executes step 3. This avoids repeating the Claude Haiku calls in step 2, which would otherwise waste both time and tokens.
How It Works
Each step.run() call:
- Checks if a cached result exists for this step ID in this execution
- If cached, returns the cached value immediately (no re-execution)
- If not cached, executes the function body, stores the result, and returns it
Step IDs must be unique within a function. Inngest uses the string passed to step.run("step-name", ...) as the identifier.
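The check-then-cache behavior described above can be sketched with a small in-memory simulation. This is not the Inngest SDK: createStepRunner, StepCache, and the stand-in step bodies are hypothetical names used only to show why a retry skips steps that already succeeded.

```typescript
// Illustrative simulation only: this models the memoization idea,
// it is NOT how the Inngest SDK or server is implemented.
type StepCache = Map<string, unknown>;

function createStepRunner(cache: StepCache) {
  return {
    async run<T>(id: string, fn: () => Promise<T> | T): Promise<T> {
      if (cache.has(id)) {
        // Cached result for this step ID: return it without calling fn.
        return cache.get(id) as T;
      }
      // Not cached: execute the body. If it throws, nothing is stored,
      // so the next attempt re-executes this step.
      const result = await fn();
      cache.set(id, result);
      return result;
    },
  };
}

// Hypothetical stand-in for the four index-content steps.
// `calls` records which step bodies actually executed.
async function indexContent(
  cache: StepCache,
  calls: string[],
  embedFails: boolean,
): Promise<string> {
  const step = createStepRunner(cache);

  const chunks = await step.run("chunk-content", () => {
    calls.push("chunk-content");
    return ["a", "b"];
  });

  const contexts = await step.run("generate-context", () => {
    calls.push("generate-context");
    return chunks.map((c) => `ctx:${c}`);
  });

  const vectors = await step.run("embed-chunks", () => {
    calls.push("embed-chunks");
    if (embedFails) throw new Error("Gemini API timeout");
    return contexts.map((c) => c.length);
  });

  return step.run("store-and-sync", () => {
    calls.push("store-and-sync");
    return `stored ${vectors.length} vectors`;
  });
}
```

Calling indexContent once with embedFails set to true and then again with the same cache and embedFails set to false runs chunk-content and generate-context only once; the second call resumes at embed-chunks.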
Retry Configuration
Retries are configured per function in the createFunction options:
| Function | Retries | Effective Attempts | Rationale |
|---|---|---|---|
| index-content | 2 | 3 (1 initial + 2 retries) | External API calls (Haiku, Gemini) can have transient failures |
| welcome-email | 3 | 4 (1 initial + 3 retries) | Email delivery can be flaky; more retries are cheap |
Inngest uses exponential backoff between retries by default. The backoff schedule is managed by the Inngest server, not the SDK.
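To make the "effective attempts" column and the general shape of exponential backoff concrete, here is a minimal sketch. The base delay and cap are hypothetical values chosen for illustration; they are not Inngest's actual schedule, which the Inngest server controls.

```typescript
// Total attempts = 1 initial run + the configured number of retries.
function effectiveAttempts(retries: number): number {
  return 1 + retries;
}

// Generic exponential backoff: the delay doubles with each retry up to a cap.
// baseMs and maxMs are hypothetical defaults, NOT Inngest's real values.
function backoffDelayMs(
  retryNumber: number,
  baseMs: number = 1000,
  maxMs: number = 60000,
): number {
  if (!Number.isInteger(retryNumber) || retryNumber < 1) {
    throw new RangeError("retryNumber must be a positive integer");
  }
  return Math.min(baseMs * 2 ** (retryNumber - 1), maxMs);
}
```

With these assumed defaults, the first three retries would wait 1000, 2000, and 4000 ms; the real intervals may differ.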
When Retries Are Exhausted
If all retries fail, the function is marked as failed in the Inngest dashboard. There is no dead-letter queue or automatic alerting configured yet. Failed functions can be manually retried from the Inngest dashboard.
Concurrency Control
Concurrency limits prevent a function from overwhelming external APIs or database connections:
export const indexContent = inngest.createFunction(
  {
    id: "index-content",
    triggers: [{ event: "search/content.created" }],
    retries: 2,
    concurrency: [{ limit: 5 }],
  },
  async ({ event, step }) => {
    /* ... */
  },
);
The concurrency: [{ limit: 5 }] setting means at most 5 instances of index-content run simultaneously across the entire application. This protects:
- The Gemini embedding API from rate limiting
- The Cloud SQL connection pool from exhaustion
- The single VM from CPU saturation
The welcome-email function has no concurrency limit (default is unlimited) because it performs no heavy computation.
Concurrency Scope
The current concurrency configuration uses a global limit (all executions of index-content share the same counter). Inngest also supports key-based concurrency for per-tenant or per-user limits:
// Not yet used, but available if needed:
concurrency: [{ limit: 2, key: "event.data.organizationId" }],
This would allow at most 2 concurrent indexing jobs per organization, useful if tenant fairness becomes a concern.
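The difference between a global counter and a per-key counter can be modeled in a few lines. This is an illustrative in-memory model, not how the Inngest server enforces limits; ConcurrencyGate and its methods are hypothetical names.

```typescript
// Illustrative model only: one running-count per key, each capped at `limit`.
// With a single shared key this behaves like the global limit; with one key
// per organization it behaves like key-based concurrency.
class ConcurrencyGate {
  private readonly limit: number;
  private readonly running = new Map<string, number>();

  constructor(limit: number) {
    this.limit = limit;
  }

  // Reserves a slot and returns true if the key is under its limit.
  tryAcquire(key: string = "global"): boolean {
    const current = this.running.get(key) ?? 0;
    if (current >= this.limit) return false;
    this.running.set(key, current + 1);
    return true;
  }

  // Frees one slot for the key when a run finishes.
  release(key: string = "global"): void {
    const current = this.running.get(key) ?? 0;
    if (current <= 1) {
      this.running.delete(key);
    } else {
      this.running.set(key, current - 1);
    }
  }
}
```

In this model, new ConcurrencyGate(5) used with the default key corresponds to the current global setting, while new ConcurrencyGate(2) called with an organization ID as the key corresponds to the per-organization variant: one busy organization cannot consume another organization's slots.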
Failure Modes
Transient API failures
External APIs (Claude Haiku, Gemini) can return 429 (rate limit) or 5xx errors. Step-level retries handle these automatically. The 2-retry configuration for index-content provides 3 total attempts with exponential backoff.
Database connection errors
The store-and-sync step writes to PostgreSQL via withTenantContext. If the database is temporarily unavailable, this step retries. The prior steps (chunk, context, embed) are not re-executed.
Inngest server unavailability
If the Inngest server is down when an MCP tool calls emitContentCreated, the event is lost. The emitter uses fire-and-forget semantics: await client.send(...) will throw, but by then the MCP tool's database write has already completed, so the research artifact is stored but not indexed. Re-indexing requires manually re-emitting the event or triggering a backfill.
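One way to make this failure mode visible is to catch the send error and record which content was not indexed. The sketch below is an assumption, not the existing emitter: EventSender stands in for the Inngest client's send method, and emitContentCreatedSafely, the contentId field, and the onFailure callback are hypothetical.

```typescript
// Hypothetical event shape; only the event name and organizationId
// appear in this document, contentId is assumed for illustration.
interface ContentCreatedEvent {
  name: "search/content.created";
  data: { contentId: string; organizationId: string };
}

// Stand-in for the Inngest client's send method (injected so the
// sketch runs without the SDK).
type EventSender = (event: ContentCreatedEvent) => Promise<unknown>;

interface EmitResult {
  emitted: boolean;
  error?: string;
}

// Attempts to emit the event. On failure it reports the content ID via
// onFailure (for example, to log it or queue it for a later backfill)
// instead of letting the error propagate to the caller.
async function emitContentCreatedSafely(
  send: EventSender,
  data: ContentCreatedEvent["data"],
  onFailure: (contentId: string, message: string) => void,
): Promise<EmitResult> {
  try {
    await send({ name: "search/content.created", data });
    return { emitted: true };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    onFailure(data.contentId, message);
    return { emitted: false, error: message };
  }
}
```

Whether to swallow the error like this or surface it to the MCP tool is a design choice; the sketch only shows that the unindexed content ID can be captured at the point of failure.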
Large content
Very large documents produce many chunks, which means many Haiku calls in step 2 (sequential within the step) and a large batch embedding call in step 3. The concurrency limit of 5 caps how many such documents can be processed at once, which bounds the load on the VM.
Monitoring
Failed and slow functions are visible in the Inngest dashboard. Access it via SSH tunnel:
gcloud compute ssh trovella-vm -- -L 8288:localhost:8288
# Then open http://localhost:8288
The dashboard shows:
- Function run history (success, failure, in-progress)
- Step-level execution details and cached results
- Retry attempts and backoff timing
- Event payload inspection
For production observability, see Infrastructure -- Observability.