Trovella Wiki

Docker Containers

Container architecture, health checks, the multi-stage Docker build, and how production and local development stacks differ.

Production Stack

The production stack is defined in infra/docker-compose.prod.yml and runs on the Compute Engine VM at /opt/trovella/.

Container Dependency Graph

caddy
  depends_on:
    web (service_healthy)
    typesense (service_started)

web
  depends_on:
    cloud-sql-proxy (service_started)
    inngest (service_started)

Caddy does not start until the web container passes its health check (service_healthy). The web container waits only for the Cloud SQL Proxy and Inngest containers to start (service_started); it does not wait for them to be healthy.
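In Compose syntax, the graph above corresponds to the long form of depends_on, where each dependency carries a condition. The fragment below is a minimal sketch of just those keys; every other key of each service (image, volumes, env_file, and so on) is omitted, and the real file may order or group things differently.

```yaml
# Sketch of the dependency graph only; not the full production file.
services:
  caddy:
    depends_on:
      web:
        condition: service_healthy   # wait until web's healthcheck passes
      typesense:
        condition: service_started   # wait only for the container to start

  web:
    depends_on:
      cloud-sql-proxy:
        condition: service_started
      inngest:
        condition: service_started
```

Because service_started does not imply readiness, the web container may begin booting before the proxy or Inngest can accept connections.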

Container Details

caddy (caddy:2-alpine)

Off-the-shelf Caddy image. Mounts the Caddyfile read-only and persists TLS certificates in named volumes (caddy-data, caddy-config). Exposes ports 80, 443, and 443/udp (HTTP/3) to the host.
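A minimal sketch of what such a service definition could look like. The container paths (/etc/caddy/Caddyfile, /data, /config) are the defaults of the official caddy image; the host-side path ./Caddyfile is an assumption, since the actual location is not stated here.

```yaml
services:
  caddy:
    image: caddy:2-alpine
    ports:
      - "80:80"
      - "443:443"
      - "443:443/udp"                          # HTTP/3 runs over QUIC (UDP)
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro    # host path is an assumption
      - caddy-data:/data                       # certificates live here
      - caddy-config:/config

volumes:
  caddy-data:
  caddy-config:
```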

web (us-central1-docker.pkg.dev/trovella-shared/trovella/web:latest)

Custom-built Next.js standalone image. Reads all configuration from /opt/trovella/.env via env_file. Exposes port 3000 only to the Docker network (not the host). Includes a health check:

healthcheck:
  test:
    [
      "CMD",
      "node",
      "-e",
      "fetch('http://localhost:3000/api/health').then(r => { if (!r.ok) process.exit(1) }).catch(() => process.exit(1))",
    ]
  interval: 10s
  timeout: 5s
  retries: 3
  start_period: 15s

The health check calls /api/health, which verifies database, Redis, and Typesense connectivity.

cloud-sql-proxy (gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.21.2)

Google's official Cloud SQL Auth Proxy. Authenticates to Cloud SQL using the VM service account's IAM credentials. The connection name is passed via the CLOUD_SQL_CONNECTION_NAME environment variable from .env. Listens on port 5432 on the Docker network.

command:
  - ${CLOUD_SQL_CONNECTION_NAME}
  - --address=0.0.0.0
  - --port=5432

typesense (typesense/typesense:27.1)

Off-the-shelf Typesense image. Data persists in a named volume (typesense-data). The API key is set via the TYPESENSE_API_KEY environment variable. Only exposed on the Docker network (port 8108).
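As an illustration, a service along these lines would match that description. This is a sketch under assumptions: the /data mount point and the TYPESENSE_DATA_DIR setting are not taken from the production file, and the real definition may pass options as command-line flags instead.

```yaml
services:
  typesense:
    image: typesense/typesense:27.1
    environment:
      TYPESENSE_API_KEY: ${TYPESENSE_API_KEY}   # interpolated from .env
      TYPESENSE_DATA_DIR: /data                 # assumption: data directory
    volumes:
      - typesense-data:/data
    expose:
      - "8108"      # reachable on the Docker network; no host port is published

volumes:
  typesense-data:
```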

inngest (inngest/inngest:latest)

Self-hosted Inngest server (Go binary), run in production mode (inngest start, not inngest dev). Event and signing keys are passed as CLI flags, and the SDK URL points at the web container:

command: >
  inngest start
  --host 0.0.0.0
  --port 8288
  --event-key ${INNGEST_EVENT_KEY}
  --signing-key ${INNGEST_SIGNING_KEY}
  --sdk-url http://web:3000/api/inngest

For details on Inngest function definitions and job patterns, see Data & Storage -- Background Jobs.

Restart Policy

All containers use restart: unless-stopped, so Docker restarts them after a crash and again when the Docker daemon comes back up after a VM reboot (maintenance, upgrade). Only containers that were stopped explicitly stay down. This is how the system recovers after a machine type upgrade.

Named Volumes

Volume          Container  Purpose
caddy-data      caddy      TLS certificates and OCSP staples
caddy-config    caddy      Caddy runtime configuration
typesense-data  typesense  Search index data

The web container is stateless. The Cloud SQL Proxy and Inngest containers store no local data.

Docker Build Pipeline

The Next.js app image is built by the build-push CI job using a multi-stage Dockerfile at apps/web/Dockerfile.

Build Stages

base (node:22-alpine + pnpm 10.32.1)
  |
  v
deps (pnpm install --frozen-lockfile, monorepo package.json files only)
  |
  v
builder (pnpm turbo build --filter=@repo/web)
  |
  v
runner (standalone server.js, ~150 MB final image)

base -- Enables corepack and activates pnpm. Shared by deps and builder.

deps -- Copies only pnpm-workspace.yaml, pnpm-lock.yaml, and every package's package.json for cache-efficient dependency installation. No source code is copied in this stage.

builder -- Copies the full source tree and runs the Turbo build filtered to @repo/web. Build-time arguments for NEXT_PUBLIC_* variables are injected here:

ARG NEXT_PUBLIC_BETTER_AUTH_URL=https://trovella.ai
ARG NEXT_PUBLIC_SENTRY_DSN
ARG SENTRY_AUTH_TOKEN

runner -- Minimal production image. Copies only the Next.js standalone output (server.js), static assets, and public files. Runs as a non-root nextjs user (UID 1001). Final image size is approximately 150 MB.

Image Registry

Images are pushed to Google Artifact Registry:

us-central1-docker.pkg.dev/trovella-shared/trovella/web

Each build is tagged with both the Git commit SHA and latest. The VM always pulls latest. For rollback, the commit SHA tag provides a stable reference to any previous build.
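One way to make a rollback explicit is to parameterize the tag in the Compose file. The sketch below is hypothetical, not the project's documented mechanism: WEB_IMAGE_TAG is an invented variable name, and the `:-latest` default keeps the current behavior when it is unset.

```yaml
services:
  web:
    # WEB_IMAGE_TAG is a hypothetical variable; it falls back to "latest".
    image: us-central1-docker.pkg.dev/trovella-shared/trovella/web:${WEB_IMAGE_TAG:-latest}
```

With a definition like this, setting WEB_IMAGE_TAG to a known-good commit SHA before running docker compose up -d would pin the web container to that build.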

Build-Time vs Runtime Configuration

Variable                     When Set    How
NEXT_PUBLIC_BETTER_AUTH_URL  Build time  Docker ARG in builder stage
NEXT_PUBLIC_SENTRY_DSN       Build time  Docker ARG, read from Secret Manager in CI
DATABASE_URL                 Runtime     .env file on VM, synced from Secret Manager
ANTHROPIC_API_KEY            Runtime     .env file on VM
All other secrets            Runtime     .env file on VM

NEXT_PUBLIC_* variables must be set at build time because Next.js inlines them into the client bundle. All other configuration is runtime, read from the .env file that sync-secrets-vm.sh generates.
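The split can be seen side by side in a Compose-style sketch. This is an illustration only: the CI job is not described as building through Compose, and the build context (the monorepo root) is an assumption. Values under build.args are baked into the image at build time, while env_file is read each time the container starts.

```yaml
services:
  web:
    build:
      context: .                          # assumption: monorepo root
      dockerfile: apps/web/Dockerfile
      args:                               # build time: inlined into the bundle
        NEXT_PUBLIC_BETTER_AUTH_URL: https://trovella.ai
        NEXT_PUBLIC_SENTRY_DSN: ${NEXT_PUBLIC_SENTRY_DSN}
    env_file:
      - .env                              # runtime: DATABASE_URL and other secrets
```

Changing a NEXT_PUBLIC_* value therefore requires a rebuild, whereas changing a runtime value only requires regenerating .env and restarting the container.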

Local Development Stack

The local stack (docker-compose.yml at the monorepo root) provides the same services with development-friendly defaults:

Difference    Production                             Local
Database      Cloud SQL via Auth Proxy               Local pgvector/pgvector:pg18 container
Redis         Upstash (managed)                      Local redis:8-alpine container
Email         Resend (deferred)                      Mailpit (localhost:8025 for UI)
Inngest mode  inngest start (production)             inngest dev (development, no auth)
Next.js       Docker container with standalone build pnpm dev on the host (Turbopack)
Ports         Only 80/443 via Caddy                  All services on host ports

Start with pnpm docker:up and stop with pnpm docker:down. The Next.js dev server runs on the host machine (not in a container), connecting to the containerized services.
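A rough sketch of what the local-only services might look like, based on the images named in the table above. The service names, the Postgres password, the Mailpit image name (axllent/mailpit), and all host port mappings other than 8025 are assumptions rather than values taken from the actual docker-compose.yml.

```yaml
services:
  postgres:
    image: pgvector/pgvector:pg18
    environment:
      POSTGRES_PASSWORD: postgres   # assumption: throwaway local password
    ports:
      - "5432:5432"

  redis:
    image: redis:8-alpine
    ports:
      - "6379:6379"

  mailpit:
    image: axllent/mailpit          # assumption: image name
    ports:
      - "8025:8025"                 # web UI
      - "1025:1025"                 # SMTP
```

Publishing every port to the host is what lets pnpm dev, running outside Docker, reach these services on localhost.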

Adding a New Service Container

  1. Add the service to infra/docker-compose.prod.yml with restart: unless-stopped
  2. Add a corresponding entry to docker-compose.yml for local development
  3. If the service needs secrets, add them to infra/sync-secrets-vm.sh and infra/environments/prod/main.tf
  4. If the service needs a persistent volume, add a named volume
  5. If the web container needs to reach the service, add a depends_on relationship
  6. Update the Compute Overview memory budget table
  7. Deploy: the CI pipeline SCPs docker-compose.prod.yml to the VM and runs docker compose up -d --remove-orphans
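Steps 1, 4, and 5 might translate into a fragment like the following. Everything named example-* is hypothetical and stands in for the real service, image, and volume.

```yaml
services:
  example-service:                  # hypothetical service name
    image: example/service:1.0      # hypothetical image
    restart: unless-stopped         # step 1
    env_file:
      - .env                        # secrets added in step 3
    volumes:
      - example-data:/data          # step 4

  web:
    depends_on:
      example-service:              # step 5
        condition: service_started

volumes:
  example-data:
```

If the new service defines its own healthcheck, service_healthy could be used as the condition instead, at the cost of delaying web startup until the check passes.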
