Trovella Wiki

ADR-006: Background Jobs -- Inngest (Self-Hosted)

Decision record for choosing Inngest as the durable execution engine over Temporal, BullMQ, and DIY approaches.

Status: Accepted
Date: 2026-03-21 (Week 0 Decision Sprint), deferred 2026-03-27 (TRO-12), reinstated 2026-03-30 (TRO-56)
Deciders: Kyle Olson (Solo Founder)

Decision

Use Inngest open-source Dev Server (Go binary), self-hosted on the Compute Engine VM, with state stored in the existing Cloud SQL PostgreSQL instance. The dashboard is accessible only via SSH tunnel. The Inngest server communicates with the Next.js app through the /api/inngest route served by the SDK (v4), authenticated with a signing key.

Context

The research engine architecture requires durable multi-step workflows: document chunking, contextual retrieval via Claude Haiku, Gemini embedding, pgvector insert, and Typesense sync. Each step can fail independently and needs retry capability without re-executing prior steps.
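The retry requirement can be illustrated with a minimal sketch of step-level memoization. This is a conceptual model, not the Inngest SDK: StepStore, runStep, and indexDocument are hypothetical names, an in-memory Map stands in for persisted workflow state, and the "embedding" is a stand-in computation. Each completed step's result is recorded under its ID, so a retry reuses recorded results instead of re-executing the work.

```typescript
// Conceptual model of step-level durability (NOT the Inngest SDK).
// All names are hypothetical; a Map stands in for persisted workflow state.
type StepStore = Map<string, unknown>;

function runStep<T>(store: StepStore, id: string, fn: () => T): T {
  if (store.has(id)) {
    // Step completed in an earlier attempt: reuse its recorded result.
    return store.get(id) as T;
  }
  const result = fn(); // may throw; nothing is recorded on failure
  store.set(id, result);
  return result;
}

// Counters make it observable how often each step body actually runs.
const calls = { chunk: 0, embed: 0 };

function indexDocument(store: StepStore, text: string, failEmbed: boolean): number[] {
  const chunks = runStep(store, "chunk", () => {
    calls.chunk += 1;
    return text.split(". ");
  });
  return runStep(store, "embed", () => {
    calls.embed += 1;
    if (failEmbed) throw new Error("embedding API unavailable");
    return chunks.map((c) => c.length); // stand-in for real embeddings
  });
}

const store: StepStore = new Map();
let firstAttemptFailed = false;
try {
  // First attempt: chunking succeeds, embedding fails.
  indexDocument(store, "First chunk. Second chunk", true);
} catch {
  firstAttemptFailed = true;
}
// Retry with the same store: chunking is skipped, only embedding re-runs.
const vectors = indexDocument(store, "First chunk. Second chunk", false);
```

With job-level retries only, the second attempt would have re-run chunking as well; here the chunk step body runs once across both attempts.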

Inngest was initially scaffolded in TRO-12 with only a placeholder welcome-email function. The founder challenged this: "I don't like bringing in this many external platforms before we hit the scaling stage of the company." The system was nearly removed, then reinstated when TRO-56 (self-hosted Inngest) and TRO-58 (hybrid search pipeline) provided concrete workloads. This follows the "progressive complexity" principle: infrastructure earns its place through real use cases.

Alternatives Considered

Temporal

Most mature durable execution platform with a strong TypeScript SDK. Rejected because Temporal Cloud costs $200+/month and self-hosting requires a multi-service cluster. Inngest provides equivalent step-level durability at $0 with a single binary.

BullMQ + Redis

Well-established queue with simple semantics. Rejected because it provides only job-level retries (not step-level), so partial failure recovery would have to be built manually, and because it would add Redis as a dependency for the queue alone. Retained as the upgrade path if Inngest limits are hit at roughly 500K-1M executions/month.

Google Cloud Workflows

GCP-native and managed. Rejected because workflow definitions are YAML-based (poor for AI agent code generation) and there is no waitForEvent equivalent for human-in-the-loop research checkpoints.

DIY (fire-and-forget + custom retry)

Zero dependencies. This was the interim approach after TRO-12 deferred Inngest. It worked for trivial tasks but was insufficient for the multi-step search indexing pipeline where each step can fail independently.

Implementation Details

Self-hosted on VM, not Inngest Cloud

The Go binary runs as a Docker container on the shared Compute Engine VM. It stores workflow state in the existing Cloud SQL PostgreSQL instance, so there is no separate database or Redis dependency. Cost is $0 incremental.

Dashboard via SSH tunnel only

The Inngest dashboard is not exposed publicly. Access it with:

gcloud compute ssh trovella-vm -- -L 8288:localhost:8288
# Then open http://localhost:8288

SDK v4 signing key requirement

Inngest SDK v4 silently returns 500 errors if neither INNGEST_DEV=1 nor a valid INNGEST_SIGNING_KEY is set. In production, the signing, event, and API keys are generated, stored in GCP Secret Manager, and passed to the binary via CLI flags (inngest start --event-key <key> --signing-key <key>). The binary does not read these from environment variables.
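Because the failure mode is silent, one option is to check the configuration at app startup and fail loudly. The guard below is a hypothetical sketch, not part of the Inngest SDK: checkInngestEnv and the Env type are invented for illustration, and it only checks that a key is present, not that it is valid.

```typescript
// Hypothetical startup guard; not part of the Inngest SDK.
// It checks only for presence of the settings, not key validity.
type Env = Record<string, string | undefined>;

function checkInngestEnv(env: Env): "dev" | "signed" {
  // Dev mode: the SDK does not require a signing key.
  if (env.INNGEST_DEV === "1") return "dev";

  const key = env.INNGEST_SIGNING_KEY;
  if (key !== undefined && key.trim() !== "") return "signed";

  // Neither setting is present: surface the problem at boot
  // instead of as opaque 500 responses later.
  throw new Error(
    "Inngest misconfigured: set INNGEST_DEV=1 or provide INNGEST_SIGNING_KEY",
  );
}
```

In a Node.js app this could be called once during startup as checkInngestEnv(process.env), so a missing key stops the deploy rather than surfacing as failed callbacks.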

Middleware bypass for /api/inngest

Inngest calls the app over HTTP for function registration and step callbacks. These requests carry no user session (they are authenticated by signing key instead), so the auth middleware must not block them. /api/inngest is in the public routes list in proxy.ts, alongside /api/auth and /api/mcp.
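A public-route check of this kind might look like the sketch below. The actual contents of proxy.ts are not shown in this record, so PUBLIC_ROUTE_PREFIXES and isPublicRoute are hypothetical names, and the prefix-matching rule is an assumption.

```typescript
// Hypothetical sketch; the real proxy.ts is not shown in this record.
const PUBLIC_ROUTE_PREFIXES = ["/api/auth", "/api/inngest", "/api/mcp"] as const;

function isPublicRoute(pathname: string): boolean {
  // Match the route itself or any sub-path, but not look-alike
  // prefixes such as "/api/inngest-admin".
  return PUBLIC_ROUTE_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(prefix + "/"),
  );
}
```

Matching on a path-segment boundary rather than a bare string prefix keeps an unrelated route from becoming public just because its name starts the same way.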

Consequences

Positive

  • Step-level durability: a failure during embedding does not re-execute chunking
  • $0 incremental cost: self-hosted on existing VM, state stored in existing Cloud SQL
  • Event-driven architecture: MCP tools are decoupled from processing pipelines
  • Built-in dashboard for workflow monitoring, step inspection, and retry controls
  • Concurrency control prevents overwhelming external APIs

Negative

  • Additional Docker container on the VM (lightweight Go binary)
  • SDK v4 quirks: signing key requirement is not well-documented; step.run() returns any in strict TypeScript
  • Cognitive overhead of an additional platform, even when self-hosted and free

Risks

  • Inngest project health -- venture-funded startup; migration to BullMQ + Redis would require rewriting workflow definitions (mitigated by small function surface area)
  • Cloud SQL connection pooling -- the Go binary maintains its own pool alongside the web app (mitigated by Cloud SQL Enterprise Plus upgrade path)
  • Single-VM bottleneck -- CPU-heavy workflows could impact web performance (mitigated by per-function concurrency limits)
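The concurrency-limit mitigation can be pictured as a simple counting gate. This is a conceptual sketch only: Inngest applies its limits through function configuration, and createConcurrencyGate is a hypothetical name that does not correspond to any SDK call.

```typescript
// Conceptual model of a per-function concurrency limit (NOT the Inngest SDK).
function createConcurrencyGate(limit: number) {
  let active = 0;
  return {
    // Returns true and takes a slot if one is free; otherwise false,
    // signalling that the caller should queue the run for later.
    tryAcquire(): boolean {
      if (active >= limit) return false;
      active += 1;
      return true;
    },
    // Frees a slot when a run finishes; never drops below zero.
    release(): void {
      if (active > 0) active -= 1;
    },
    activeCount(): number {
      return active;
    },
  };
}

// Example: at most two embedding runs at a time.
const embedGate = createConcurrencyGate(2);
const firstRun = embedGate.tryAcquire();
const secondRun = embedGate.tryAcquire();
const thirdRun = embedGate.tryAcquire(); // rejected: limit reached
embedGate.release(); // one run finishes
const retriedRun = embedGate.tryAcquire();
```

Capping in-flight runs per function bounds both the load on external APIs and the share of VM resources a single workflow can take.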

References

  • Linear: TRO-12 (initial scaffold), TRO-56 (self-hosted Inngest on VM), TRO-58 (hybrid search -- first real workload)
  • Related: ADR-008 (Compute -- VM + Docker Compose + Caddy), ADR-009 (Search -- Typesense + pgvector hybrid)
