Deploy Pipeline

Step-by-step breakdown of what happens when code is merged to main, from CI validation through container restart.

Every merge to main triggers four CI jobs that run the code through quality gates, build a Docker image, optionally migrate the database, and deploy to the production VM. This page walks through each stage.

Pipeline Stages

merge PR to main
  |
  v
quality (15 min) ---------> migrate-prod (5 min, conditional)
  |                              |
  v                              v
build-push (15 min) -------> deploy-prod (10 min)
                                 |
                                 v
                           health check passes

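The deploy-prod job depends on all three upstream jobs (quality, build-push, migrate-prod): quality and build-push must pass, and migrate-prod must either pass or be skipped because no migration paths changed. build-push and migrate-prod run in parallel after quality completes. The durations in the diagram are job timeouts, not typical run times; the Timeline section lists both.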

Stage 1: Quality Checks

The quality job runs the same checks you run locally with pnpm ci:check:

pnpm format:check -- Prettier formatting
pnpm turbo lint --affected -- ESLint across affected packages
pnpm dep-cruise -- dependency graph validation
pnpm lint:dead-code -- Knip dead code detection
pnpm lint:duplication -- jscpd copy-paste detection
pnpm turbo typecheck --affected -- TypeScript compilation
pnpm db:migrate -- apply migrations to an ephemeral CI Postgres
pnpm turbo test --affected -- Vitest test suite
pnpm turbo build --affected -- Next.js production build

The job uses service containers (Postgres 18 + Redis 8 + Typesense 27.1) for integration tests. See Delivery -- Pipeline for the full CI pipeline architecture.
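
For illustration, the fail-fast behavior of a gate sequence like the one above can be sketched in shell. This assumes ci:check stops at the first failing step, which this page does not state. run_gates is a hypothetical helper, not something taken from the repo.

```shell
# Hypothetical helper: run each command string in order and stop at the
# first one that exits non-zero, so later gates never run on a broken tree.
run_gates() {
  for gate in "$@"; do
    printf '==> %s\n' "$gate"
    if ! sh -c "$gate"; then
      printf 'FAILED: %s\n' "$gate" >&2
      return 1
    fi
  done
}
```

It would be called with the checks in their listed order, for example run_gates "pnpm format:check" "pnpm turbo lint --affected".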

Stage 2: Production Migration (Conditional)

The migrate-prod job only runs when files in these paths change:

packages/db/src/migrations/**
packages/db/src/schema/**
packages/db/src/seed/**

When triggered, the job authenticates to GCP via Workload Identity Federation, starts a Cloud SQL Auth Proxy, applies migrations, verifies the migration count, and seeds reference data. For the full details, see Data & Storage -- CI Deployment.

Migrations always complete before the new application code is deployed. The deploy-prod job depends on migrate-prod, ensuring the database schema is updated first.
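
The path condition above can be approximated in shell as follows. How the workflow actually evaluates it (a paths filter, a git diff step, or something else) is not stated on this page, so needs_migration and the sample file names in the usage note are hypothetical.

```shell
# Hypothetical helper: succeed if any path read from stdin lies under one of
# the three directories that trigger migrate-prod. In a case pattern, "*"
# also matches "/", so nested files match just like the "**" globs above.
needs_migration() {
  while IFS= read -r file; do
    case "$file" in
      packages/db/src/migrations/*|packages/db/src/schema/*|packages/db/src/seed/*)
        return 0 ;;
    esac
  done
  return 1
}
```

A caller might pipe a changed-file list into it, for example git diff --name-only "$BASE" "$HEAD" | needs_migration, where BASE and HEAD are placeholders for the two commits being compared.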

Stage 3: Docker Build and Push

The build-push job builds the Next.js application into a standalone Docker image using a multi-stage Dockerfile.

Build Stages

Stage	Base Image	Purpose
`base`	`node:22-alpine`	Enable pnpm via corepack
`deps`	base	Install all dependencies with `--frozen-lockfile`
`builder`	base	Run `pnpm turbo build --filter=@repo/web`
`runner`	`node:22-alpine`	Copy standalone output, run as non-root `nextjs` user

The final image is approximately 150 MB and contains only the Next.js standalone server.js, static assets, and public files.

Build Arguments

Three values are injected at build time:

Arg	Source	Purpose
`NEXT_PUBLIC_BETTER_AUTH_URL`	Hardcoded `https://trovella.ai`	Auth callback base URL (inlined by Next.js)
`NEXT_PUBLIC_SENTRY_DSN`	GCP Secret Manager	Sentry error reporting DSN (inlined by Next.js)
`SENTRY_AUTH_TOKEN`	GitHub secret	Sentry source map upload token

NEXT_PUBLIC_* variables must be set at build time because Next.js inlines their values into the client bundle during the build. Changing them in the container's runtime environment has no effect on the already-built bundle.

Image Tags

Each build pushes two tags to Artifact Registry (us-central1-docker.pkg.dev/trovella-shared/trovella/web):

<commit-sha> -- immutable tag for rollback targeting
latest -- mutable tag pulled by docker compose pull on the VM
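
As a small sketch, the two references for a given commit can be derived like this. The registry path is the one quoted above; image_refs is a hypothetical helper name.

```shell
REGISTRY_IMAGE="us-central1-docker.pkg.dev/trovella-shared/trovella/web"

# Hypothetical helper: print the two image references pushed for a commit.
# The first line is the immutable reference, the second the mutable one.
image_refs() {
  printf '%s:%s\n' "$REGISTRY_IMAGE" "$1"
  printf '%s:latest\n' "$REGISTRY_IMAGE"
}
```

Calling image_refs with a commit SHA prints the SHA-tagged reference first, which is the one a rollback would target.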

Build Caching

The build uses GitHub Actions cache (type=gha) with Buildx. Layer caching means most deploys only rebuild the builder and runner stages (the deps stage is cached when pnpm-lock.yaml hasn't changed).

Stage 4: Deploy to Production VM

The deploy-prod job connects to the production VM through an IAP SSH tunnel and performs five steps:

Step 1: Copy files to the VM

gcloud compute scp \
  infra/docker-compose.prod.yml \
  infra/Caddyfile \
  infra/sync-secrets-vm.sh \
  trovella-prod-vm:~ \
  --tunnel-through-iap

These files come from the repository checkout in the CI job rather than from whatever is already on the VM, so every deploy ships the latest Compose configuration, Caddy config, and secret sync script.

Step 2: Move files into position

The SCP'd files land in ~/ and are moved to /opt/trovella/, the working directory for all Docker Compose operations.
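
A minimal sketch of this step, with the source and destination passed as arguments so it can be tried against scratch directories. move_into_place is a hypothetical name, and the real job may need elevated permissions to write under /opt, which this page does not cover.

```shell
# Hypothetical helper: move the three uploaded deploy files from the
# directory where scp left them into the Compose working directory.
move_into_place() {
  src=$1
  dest=$2
  mkdir -p "$dest" || return 1
  for f in docker-compose.prod.yml Caddyfile sync-secrets-vm.sh; do
    mv -f "$src/$f" "$dest/$f" || return 1
  done
}
```

On the VM the equivalent call would be move_into_place "$HOME" /opt/trovella.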

Step 3: Sync secrets

sync-secrets-vm.sh reads each secret from GCP Secret Manager and writes it to /opt/trovella/.env. The script:

Reads 14 secrets (API keys, database URL, auth secrets, etc.)
Writes static config values (NODE_ENV, HOSTNAME, PORT, service URLs)
Rewrites DATABASE_URL to route through the cloud-sql-proxy container instead of the public Cloud SQL IP
Atomically replaces the .env file (write to temp, mv over)
Sets the file's permissions to 600 (owner read/write only)

See Infrastructure -- Secrets for the full secret management flow.
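
The URL rewrite and the atomic replace can be sketched as below. This is not the contents of sync-secrets-vm.sh. Both function names are hypothetical, port 5432 is an assumption, and the sketch sets the mode before the rename (rather than after) so the file is never visible with looser permissions.

```shell
# Hypothetical helper: point a Postgres URL at the cloud-sql-proxy container,
# keeping credentials, database name, and query string. Port 5432 is assumed.
rewrite_db_url() {
  printf '%s\n' "$1" | sed -E 's#^(.*@)[^/@]+(/.*)$#\1cloud-sql-proxy:5432\2#'
}

# Hypothetical helper: write stdin to "$1" atomically. The temp file lives in
# the same directory as the target so the final mv is a rename, not a copy.
write_env_atomically() {
  target=$1
  tmp=$(mktemp "$(dirname "$target")/.env.XXXXXX") || return 1
  if cat > "$tmp" && chmod 600 "$tmp" && mv -f "$tmp" "$target"; then
    return 0
  fi
  rm -f "$tmp"
  return 1
}
```

Because readers only ever see the old file or the complete new one, a container that starts mid-sync cannot pick up a half-written .env.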

Step 4: Pull and restart containers

docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d --remove-orphans
docker image prune -f

docker compose pull fetches the new web:latest image from Artifact Registry. up -d recreates only the containers whose image or configuration changed and leaves the others running, and --remove-orphans removes containers for services that are no longer in the compose file. docker image prune -f then deletes dangling images, such as the previous web image that lost its latest tag.

Step 5: Health check

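The web container has a Docker health check that polls http://localhost:3000/api/health every 10 seconds with a 15-second start period. Docker marks the container as healthy once the check passes. Caddy's depends_on condition for web (service_healthy) makes Compose hold Caddy back until then, so Caddy does not start proxying to a web container that is still booting. depends_on governs start order only; it does not make Caddy health-aware on a per-request basis.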
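
Docker runs the health check itself, but a script that wants to confirm a deploy from outside the container could mimic the same polling loop. The sketch below is one way to do that. wait_until_healthy is a hypothetical helper, and the attempt count in the usage note is an assumption because this page does not give a retry limit.

```shell
# Hypothetical helper: retry a check command until it succeeds or the
# attempts run out. Always makes at least one attempt.
# usage: wait_until_healthy <attempts> <interval_seconds> <command...>
wait_until_healthy() {
  attempts=$1
  interval=$2
  shift 2
  i=1
  while :; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$interval"
  done
}
```

On the VM this might be invoked as wait_until_healthy 6 10 curl -fsS http://localhost:3000/api/health, using the 10-second interval described above.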

Container Dependencies

The docker-compose.prod.yml defines a dependency chain:

caddy
  depends_on: web (service_healthy), typesense (service_started)

web
  depends_on: cloud-sql-proxy (service_started), inngest (service_started)

Compose starts Caddy only after the web container passes its health check and the Typesense container has started. The web container waits for the database proxy and Inngest containers to start, but not to become healthy, because both conditions are service_started.

Concurrency Control

The CI workflow uses GitHub's concurrency feature:

concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true

If a new push to main arrives while a deploy is in progress, the in-progress run is canceled and the new one takes over. This prevents race conditions where two deploys try to update the VM simultaneously.

Timeline

Job	Typical Duration	Timeout
quality	3--5 min	15 min
migrate-prod (when triggered)	1--2 min	5 min
build-push	3--8 min	15 min
deploy-prod	1--2 min	10 min
Total (quality, then build-push, then deploy-prod)	7--15 min
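
The Timeout column also bounds how long a single run can take. A quick calculation, assuming the timeouts apply per job and that build-push and migrate-prod overlap as shown in the pipeline diagram:

```shell
# Job timeouts in minutes, taken from the Timeout column above.
quality=15
migrate=5
build=15
deploy=10

# build-push and migrate-prod run in parallel, so only the slower one counts.
max() { if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi; }
parallel=$(max "$build" "$migrate")

worst_case=$((quality + parallel + deploy))
echo "worst-case run: ${worst_case} min"
```

The result is 40 minutes (15 + 15 + 10). A run that goes beyond that would suggest a job timeout is not being enforced.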