Retry and Concurrency

How Inngest handles failures, step-level durability, retry configuration, and concurrency limits.

Inngest provides step-level durability out of the box. Each step.run() call is a checkpoint: if a later step fails, retries start from the failed step, not from the beginning. This is the core reason Inngest was chosen over simpler alternatives like BullMQ (see Inngest Decision).

Step-Level Durability

Consider the index-content function with four steps:

Step 1: chunk-content       -- success, result cached
Step 2: generate-context    -- success, result cached
Step 3: embed-chunks        -- FAILURE (Gemini API timeout)
Step 4: store-and-sync      -- never reached

On retry, Inngest replays the cached results for steps 1 and 2, then re-executes step 3. This avoids repeating the Claude Haiku calls made in step 2, which would waste both time and tokens.

How It Works

Each step.run() call:

Checks if a cached result exists for this step ID in this execution
If cached, returns the cached value immediately (no re-execution)
If not cached, executes the function body, stores the result, and returns it
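The lookup-then-execute behavior above can be modeled with a small in-memory cache keyed by step ID. This is a conceptual sketch only, not Inngest's implementation: the real SDK persists step state on the Inngest server, its step.run() is asynchronous, and `StepMemo`, `indexContent`, and the injected failure are hypothetical names used for illustration. The model is synchronous for brevity.

```typescript
// Conceptual model of step-level memoization (NOT the Inngest SDK).
// Each attempt re-runs the whole handler; completed steps are replayed from the cache.
class StepMemo {
  private readonly cache = new Map<string, unknown>();
  readonly executions: string[] = []; // records which step bodies actually ran

  run<T>(id: string, fn: () => T): T {
    if (this.cache.has(id)) {
      // Cached: return the stored result without re-executing the body.
      return this.cache.get(id) as T;
    }
    this.executions.push(id);
    const result = fn(); // if this throws, nothing is cached for this id
    this.cache.set(id, result);
    return result;
  }
}

// Hypothetical handler mirroring the four index-content steps.
function indexContent(step: StepMemo, failEmbed: boolean): string {
  const chunks = step.run("chunk-content", () => ["first chunk", "second chunk"]);
  const contexts = step.run("generate-context", () => chunks.map((c) => `context for ${c}`));
  const vectors = step.run("embed-chunks", () => {
    if (failEmbed) throw new Error("simulated Gemini API timeout");
    return contexts.map((c) => [c.length]); // fake one-dimensional "embeddings"
  });
  return step.run("store-and-sync", () => `stored ${vectors.length} vectors`);
}

const memo = new StepMemo();
try {
  indexContent(memo, true); // attempt 1: embed-chunks throws
} catch (err) {
  console.log(`attempt 1 failed: ${(err as Error).message}`);
}
const outcome = indexContent(memo, false); // attempt 2: steps 1 and 2 come from the cache
console.log(outcome); // stored 2 vectors
// generate-context appears once in the log; embed-chunks appears twice.
console.log(memo.executions);
```

Only the failed step and the steps after it execute on the second attempt, which is the property that keeps step 2's Haiku calls from being repeated.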

Step IDs must be unique within a function. Inngest uses the string passed to step.run("step-name", ...) as the identifier.

Retry Configuration

Retries are configured per function in the createFunction options:

Function	Retries	Effective Attempts	Rationale
`index-content`	2	3 (1 initial + 2 retries)	External API calls (Haiku, Gemini) can have transient failures
`welcome-email`	3	4 (1 initial + 3 retries)	Email delivery can be flaky; more retries are cheap

Inngest uses exponential backoff between retries by default. The backoff schedule is managed by the Inngest server, not the SDK.
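The attempt arithmetic in the table (attempts = 1 initial + retries) and the general shape of exponential backoff can be sketched as follows. The base delay, doubling factor, and cap are illustrative assumptions; Inngest's actual schedule is chosen by its server and is not reproduced here. `effectiveAttempts` and `backoffScheduleMs` are hypothetical helpers.

```typescript
// Total attempts = 1 initial attempt + configured retries.
function effectiveAttempts(retries: number): number {
  return 1 + retries;
}

// Generic exponential backoff: the delay before retry n (1-based) is base * 2^(n - 1), capped.
// ASSUMPTION: baseMs and capMs are illustrative values, not Inngest's real schedule.
function backoffScheduleMs(retries: number, baseMs = 1000, capMs = 60000): number[] {
  const delays: number[] = [];
  for (let n = 1; n <= retries; n++) {
    delays.push(Math.min(baseMs * 2 ** (n - 1), capMs));
  }
  return delays;
}

console.log(effectiveAttempts(2), backoffScheduleMs(2)); // 3 [ 1000, 2000 ] (index-content)
console.log(effectiveAttempts(3), backoffScheduleMs(3)); // 4 [ 1000, 2000, 4000 ] (welcome-email)
```

Production schedules commonly add random jitter so that many failed runs do not retry in lockstep; the sketch omits it to keep the numbers deterministic.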

When Retries Are Exhausted

If all retries fail, the function is marked as failed in the Inngest dashboard. There is no dead-letter queue or automatic alerting configured yet. Failed functions can be manually retried from the Inngest dashboard.

Concurrency Control

Concurrency limits prevent a function from overwhelming external APIs or database connections:

export const indexContent = inngest.createFunction(
  {
    id: "index-content",
    triggers: [{ event: "search/content.created" }],
    retries: 2,
    concurrency: [{ limit: 5 }],
  },
  async ({ event, step }) => {
    /* ... */
  },
);

The concurrency: [{ limit: 5 }] setting means at most 5 instances of index-content run simultaneously across the entire application. This protects:

The Gemini embedding API from rate limiting
The Cloud SQL connection pool from exhaustion
The single VM from CPU saturation
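A global limit of 5 behaves like a counting semaphore shared by every run of the function. The in-process sketch below only models that behavior; Inngest enforces the limit on its server and queues excess runs there, not in application memory. `Semaphore`, `sleep`, and `simulateRuns` are hypothetical names.

```typescript
// Counting semaphore: at most `limit` holders at once; extra callers wait in FIFO order.
class Semaphore {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  async acquire(): Promise<void> {
    if (this.active < this.limit) {
      this.active++;
      return;
    }
    // Wait until a releasing holder hands its slot over to us.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // transfer the slot directly; `active` stays unchanged
    } else {
      this.active--;
    }
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Launch `runs` simulated index-content runs and record the peak number running at once.
async function simulateRuns(runs: number, limit: number): Promise<number> {
  const sem = new Semaphore(limit);
  let running = 0;
  let peak = 0;
  await Promise.all(
    Array.from({ length: runs }, async () => {
      await sem.acquire();
      running++;
      peak = Math.max(peak, running);
      try {
        await sleep(10); // stand-in for the chunk/context/embed/store work
      } finally {
        running--;
        sem.release();
      }
    }),
  );
  return peak;
}

simulateRuns(20, 5).then((peak) => console.log(`peak concurrent runs: ${peak}`)); // peak concurrent runs: 5
```

Twenty queued runs still complete, but never more than five at a time, which is what keeps Gemini, the connection pool, and the VM within their budgets.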

The welcome-email function sets no concurrency limit (the default is unlimited) because it performs no heavy computation.

Concurrency Scope

The current concurrency configuration uses a global limit (all executions of index-content share the same counter). Inngest also supports key-based concurrency for per-tenant or per-user limits:

// Not yet used, but available if needed (an entry in the createFunction options object):
concurrency: [{ limit: 2, key: "event.data.organizationId" }],

This would allow at most 2 concurrent indexing jobs per organization, useful if tenant fairness becomes a concern.
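Key-based concurrency amounts to keeping a separate counter for each evaluated key. The sketch below models `limit: 2` keyed on `organizationId` with a synchronous admission check. `KeyedLimiter`, `tryStart`, `finish`, and the sample events are hypothetical; in Inngest the key expression is evaluated against the event by the server, and runs over the limit are queued rather than rejected as they are here.

```typescript
// Per-key concurrency: each key (here, an organization ID) gets its own counter.
class KeyedLimiter {
  private readonly active = new Map<string, number>();
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Returns true if a run for `key` may start now; false means it would have to wait.
  tryStart(key: string): boolean {
    const current = this.active.get(key) ?? 0;
    if (current >= this.limit) return false;
    this.active.set(key, current + 1);
    return true;
  }

  // Marks one run for `key` as finished, freeing a slot for that key only.
  finish(key: string): void {
    const current = this.active.get(key) ?? 0;
    if (current <= 1) this.active.delete(key);
    else this.active.set(key, current - 1);
  }
}

const limiter = new KeyedLimiter(2);
// Hypothetical payloads: org-a submits three documents, org-b submits one.
const sampleEvents = [
  { data: { organizationId: "org-a" } },
  { data: { organizationId: "org-a" } },
  { data: { organizationId: "org-a" } },
  { data: { organizationId: "org-b" } },
];
const admitted = sampleEvents.map((e) => limiter.tryStart(e.data.organizationId));
console.log(admitted); // [ true, true, false, true ]
```

The third org-a document waits even though only three runs are active overall, while org-b is unaffected; that isolation is the tenant-fairness property described above.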

Failure Modes

Transient API failures

External APIs (Claude Haiku, Gemini) can return 429 (rate limit) or 5xx errors. Step-level retries handle these automatically. The 2-retry configuration for index-content provides 3 total attempts with exponential backoff.

Database connection errors

The store-and-sync step writes to PostgreSQL via withTenantContext. If the database is temporarily unavailable, this step retries. The prior steps (chunk, context, embed) are not re-executed.

Inngest server unavailability

If the Inngest server is down when an MCP tool calls emitContentCreated, the event is lost. The emit is effectively fire-and-forget, since nothing retries a failed send: await client.send(...) will throw, but by then the MCP tool's database write has already completed. The research artifact is therefore stored but not indexed. Re-indexing requires manually re-emitting the event or triggering a backfill.
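The failure ordering described above, and one possible shape for a backfill, can be sketched with in-memory stand-ins. Everything here is hypothetical: `saveArtifact`, `sendEvent`, and `backfill` stand in for the MCP tool's database write, `client.send(...)`, and a backfill mechanism that does not yet exist; the real emitContentCreated is not shown in this document.

```typescript
type ContentCreatedEvent = { name: "search/content.created"; data: { artifactId: string } };

const artifacts = new Set<string>(); // stand-in for the research artifact table
const indexed = new Set<string>(); // stand-in for "index-content has run for this artifact"
let inngestUp = false; // simulate the Inngest server being down

// Hypothetical stand-in for client.send(...).
async function sendEvent(event: ContentCreatedEvent): Promise<void> {
  if (!inngestUp) throw new Error("Inngest server unavailable");
  indexed.add(event.data.artifactId); // pretend index-content ran successfully
}

// The database write commits first; the emit happens afterwards and may throw.
async function saveArtifact(id: string): Promise<void> {
  artifacts.add(id); // committed regardless of what happens next
  await sendEvent({ name: "search/content.created", data: { artifactId: id } });
}

// Hypothetical backfill: re-emit the event for every stored but unindexed artifact.
async function backfill(): Promise<number> {
  const missing = Array.from(artifacts).filter((id) => !indexed.has(id));
  for (const id of missing) {
    await sendEvent({ name: "search/content.created", data: { artifactId: id } });
  }
  return missing.length;
}

async function demo(): Promise<void> {
  await saveArtifact("doc-1").catch((err: Error) => console.log(`emit failed: ${err.message}`));
  console.log(artifacts.has("doc-1"), indexed.has("doc-1")); // true false
  inngestUp = true; // the Inngest server comes back
  console.log(`re-emitted ${await backfill()} event(s)`); // re-emitted 1 event(s)
  console.log(indexed.has("doc-1")); // true
}

const done = demo();
```

The key point is the ordering: because the write is not rolled back when the send fails, a backfill only needs to find rows that exist but were never indexed.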

Large content

Very large documents produce many chunks, which means many Haiku calls in step 2 (sequential within the step) and a large batch embedding call in step 3. The concurrency limit of 5 caps how many documents are processed at the same time, which bounds the combined load when several large documents arrive together and keeps them from overwhelming the VM.
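The load bound here is simple arithmetic: Haiku calls scale with chunk count, and because step 2 makes its calls one after another, each run has at most one Haiku call in flight, so the concurrency limit also bounds in-flight Haiku calls. The fixed-size character chunking, chunk size, and document lengths below are hypothetical; the real chunking strategy lives in the chunk-content step and is not specified in this document.

```typescript
// ASSUMPTION: fixed-size chunking by character count (illustrative only).
function chunkCount(lengthChars: number, chunkSizeChars: number): number {
  return Math.ceil(lengthChars / chunkSizeChars);
}

// Step 2 makes one Haiku call per chunk, sequentially, so each run has at most one call
// in flight; the concurrency limit caps how many runs execute at once.
function haikuLoad(docLengths: number[], chunkSizeChars: number, concurrencyLimit: number) {
  const totalCalls = docLengths.reduce((sum, len) => sum + chunkCount(len, chunkSizeChars), 0);
  const peakInFlight = Math.min(concurrencyLimit, docLengths.length);
  return { totalCalls, peakInFlight };
}

// Eight documents of 200000 characters each, 2000-character chunks, limit 5.
console.log(haikuLoad(new Array<number>(8).fill(200000), 2000, 5)); // { totalCalls: 800, peakInFlight: 5 }
```

The total work (800 calls) is unchanged by the limit; what the limit controls is how much of it happens at once.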

Monitoring

Failed and slow function runs are visible in the Inngest dashboard. Access it through an SSH tunnel:

gcloud compute ssh trovella-vm -- -L 8288:localhost:8288
# Then open http://localhost:8288

The dashboard shows:

Function run history (success, failure, in-progress)
Step-level execution details and cached results
Retry attempts and backoff timing
Event payload inspection

For production observability, see Infrastructure -- Observability.