Deploy Pipeline
Step-by-step breakdown of what happens when code is merged to main, from CI validation through container restart.
Every merge to main triggers four CI jobs that run the code through quality gates, build a Docker image, optionally migrate the database, and deploy to the production VM. This page walks through each stage.
Pipeline Stages
merge PR to main
       |
       v
quality (15 min) ---------> migrate-prod (5 min, conditional)
       |                          |
       v                          v
build-push (15 min) -------> deploy-prod (10 min)
                                  |
                                  v
                          health check passes
The deploy-prod job requires all three upstream jobs (quality, build-push, migrate-prod) to pass before it runs. build-push and migrate-prod run in parallel after quality completes. The durations in the diagram are job timeouts, not typical run times (see Timeline).
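The dependency rules above can be modeled as a small graph. The sketch below is illustrative only: the `needs` map mirrors the job names on this page, while `scheduleWaves` is a hypothetical helper and not part of the actual workflow file.

```typescript
// Job dependencies as described on this page (job -> jobs it waits for).
const needs: Record<string, string[]> = {
  quality: [],
  "build-push": ["quality"],
  "migrate-prod": ["quality"],
  "deploy-prod": ["quality", "build-push", "migrate-prod"],
};

// Hypothetical helper: group jobs into waves. Jobs in the same wave have all
// of their dependencies satisfied and can run in parallel.
function scheduleWaves(graph: Record<string, string[]>): string[][] {
  const done = new Set<string>();
  const waves: string[][] = [];
  while (done.size < Object.keys(graph).length) {
    const ready = Object.keys(graph)
      .filter((job) => !done.has(job) && graph[job].every((dep) => done.has(dep)))
      .sort();
    if (ready.length === 0) throw new Error("dependency cycle");
    ready.forEach((job) => done.add(job));
    waves.push(ready);
  }
  return waves;
}

const waves = scheduleWaves(needs);
// waves: [["quality"], ["build-push", "migrate-prod"], ["deploy-prod"]]
```

The second wave holds both build-push and migrate-prod, which matches the parallel step in the diagram.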
Stage 1: Quality Checks
The quality job runs the same checks you run locally with pnpm ci:check:
- pnpm format:check -- Prettier formatting
- pnpm turbo lint --affected -- ESLint across all packages
- pnpm dep-cruise -- dependency graph validation
- pnpm lint:dead-code -- Knip dead code detection
- pnpm lint:duplication -- jscpd copy-paste detection
- pnpm turbo typecheck --affected -- TypeScript compilation
- pnpm db:migrate -- apply migrations to an ephemeral CI Postgres
- pnpm turbo test --affected -- Vitest test suite
- pnpm turbo build --affected -- Next.js production build
The job uses service containers (Postgres 18 + Redis 8 + Typesense 27.1) for integration tests. See Delivery -- Pipeline for the full CI pipeline architecture.
Stage 2: Production Migration (Conditional)
The migrate-prod job only runs when files in these paths change:
- packages/db/src/migrations/**
- packages/db/src/schema/**
- packages/db/src/seed/**
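As a rough illustration of the path condition, the sketch below checks a list of changed files against the three watched directories. The function name `shouldRunMigrateProd` is hypothetical, and the real workflow may detect changes differently (for example with a path-filter step); only the directory names come from this page.

```typescript
// Directory prefixes taken from the list above; the trailing ** in each
// pattern matches everything below the directory.
const MIGRATION_PATHS = [
  "packages/db/src/migrations/",
  "packages/db/src/schema/",
  "packages/db/src/seed/",
];

// Hypothetical helper: true when any changed file sits under a watched prefix.
function shouldRunMigrateProd(changedFiles: string[]): boolean {
  return changedFiles.some((file) =>
    MIGRATION_PATHS.some((prefix) => file.startsWith(prefix)),
  );
}
```

For example, a change to packages/db/src/schema/users.ts (an invented file name) would trigger the job, while a change limited to packages/db/src/index.ts would not.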
When triggered, the job authenticates to GCP via Workload Identity Federation, starts a Cloud SQL Auth Proxy, applies migrations, verifies the migration count, and seeds reference data. For the full details, see Data & Storage -- CI Deployment.
Migrations always complete before the new application code is deployed. The deploy-prod job depends on migrate-prod, ensuring the database schema is updated first.
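Because migrate-prod is conditional, the deploy gate has to decide what a skipped migration means. The sketch below shows one possible rule, under the assumption that a skipped migrate-prod counts as passing; this page does not spell out how skips are handled, and `canDeploy` and its types are hypothetical names.

```typescript
type JobResult = "success" | "failure" | "cancelled" | "skipped";

interface UpstreamResults {
  quality: JobResult;
  buildPush: JobResult;
  migrateProd: JobResult;
}

// Hypothetical gate. ASSUMPTION: "skipped" is acceptable for migrate-prod
// (no schema changes in this merge), but not for the other two jobs.
function canDeploy(results: UpstreamResults): boolean {
  const migrationOk =
    results.migrateProd === "success" || results.migrateProd === "skipped";
  return results.quality === "success" && results.buildPush === "success" && migrationOk;
}
```

Under this rule a failed migration always blocks the deploy, which preserves the schema-first ordering described above.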
Stage 3: Docker Build and Push
The build-push job builds the Next.js application into a standalone Docker image using a multi-stage Dockerfile.
Build Stages
| Stage | Base Image | Purpose |
|---|---|---|
| base | node:22-alpine | Enable pnpm via corepack |
| deps | base | Install all dependencies with --frozen-lockfile |
| builder | base | Run pnpm turbo build --filter=@repo/web |
| runner | node:22-alpine | Copy standalone output, run as non-root nextjs user |
The final image is approximately 150 MB and contains only the Next.js standalone server.js, static assets, and public files.
Build Arguments
Three values are injected at build time:
| Arg | Source | Purpose |
|---|---|---|
| NEXT_PUBLIC_BETTER_AUTH_URL | Hardcoded https://trovella.ai | Auth callback base URL (inlined by Next.js) |
| NEXT_PUBLIC_SENTRY_DSN | GCP Secret Manager | Sentry error reporting DSN (inlined by Next.js) |
| SENTRY_AUTH_TOKEN | GitHub secret | Sentry source map upload token |
NEXT_PUBLIC_* variables must be set at build time because Next.js inlines them into the client bundle. Runtime environment variables cannot override them.
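To see why a runtime variable cannot change an inlined value, consider this toy stand-in for the bundler. It is not how Next.js is implemented; `inlinePublicEnv` is a hypothetical function that only mimics the effect of build-time substitution on a line of client source.

```typescript
// Toy model of build-time inlining: every process.env.NEXT_PUBLIC_* reference
// in the source text is replaced by a string literal from the build environment.
// References without a build-time value are left untouched in this sketch.
function inlinePublicEnv(source: string, buildEnv: Record<string, string>): string {
  return source.replace(/process\.env\.(NEXT_PUBLIC_[A-Z0-9_]+)/g, (match, name: string) =>
    name in buildEnv ? JSON.stringify(buildEnv[name]) : match,
  );
}

const clientSource = "const authUrl = process.env.NEXT_PUBLIC_BETTER_AUTH_URL;";
const bundled = inlinePublicEnv(clientSource, {
  NEXT_PUBLIC_BETTER_AUTH_URL: "https://trovella.ai",
});
// bundled === 'const authUrl = "https://trovella.ai";'
```

After substitution the bundle contains a plain string literal, so nothing is left to read from the environment when the container starts.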
Image Tags
Each build pushes two tags to Artifact Registry (us-central1-docker.pkg.dev/trovella-shared/trovella/web):
- <commit-sha> -- immutable tag for rollback targeting
- latest -- mutable tag pulled by docker compose pull on the VM
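The two tags can be derived from the commit SHA alone. A minimal sketch, assuming the repository path shown above; `imageTags` is a hypothetical helper, and the SHA in the usage note is a made-up value.

```typescript
// Artifact Registry repository from this page.
const REPOSITORY = "us-central1-docker.pkg.dev/trovella-shared/trovella/web";

// Hypothetical helper: the full image references pushed for one build.
function imageTags(commitSha: string): string[] {
  return [`${REPOSITORY}:${commitSha}`, `${REPOSITORY}:latest`];
}
```

For example, imageTags("abc1234") yields one reference ending in :abc1234 and one ending in :latest. A rollback would target the SHA tag, since latest moves with every build.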
Build Caching
The build uses GitHub Actions cache (type=gha) with Buildx. Layer caching means most deploys only rebuild the builder and runner stages (the deps stage is cached when pnpm-lock.yaml hasn't changed).
Stage 4: Deploy to Production VM
The deploy-prod job connects to the production VM via IAP SSH tunnel and performs five steps:
Step 1: Copy files to the VM
gcloud compute scp \
infra/docker-compose.prod.yml \
infra/Caddyfile \
infra/sync-secrets-vm.sh \
trovella-prod-vm:~ \
--tunnel-through-iap
These files come from the repository checkout in the CI job, so every deploy ships the current compose configuration, Caddy config, and secret sync script rather than reusing whatever is already on the VM.
Step 2: Move files into position
The SCP'd files land in ~/ and are moved to /opt/trovella/, the working directory for all Docker Compose operations.
Step 3: Sync secrets
sync-secrets-vm.sh reads the secrets from GCP Secret Manager and writes them to /opt/trovella/.env. The script:
- Reads 14 secrets (API keys, database URL, auth secrets, etc.)
- Writes static config values (NODE_ENV, HOSTNAME, PORT, service URLs)
- Rewrites DATABASE_URL to route through the cloud-sql-proxy container instead of the public Cloud SQL IP
- Atomically replaces the .env file (write to temp, mv over)
- Sets chmod 600 on the result
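The DATABASE_URL rewrite and the .env rendering can be sketched as follows. The real logic lives in the sync-secrets-vm.sh shell script; the TypeScript below is only an illustration, the helper names are hypothetical, and it assumes that only the hostname changes (the script may also adjust the port or other parts of the URL).

```typescript
// Hypothetical helper: point a Postgres URL at the proxy container while
// keeping credentials, port, and database name intact.
function routeThroughProxy(databaseUrl: string, proxyHost = "cloud-sql-proxy"): string {
  const url = new URL(databaseUrl);
  url.hostname = proxyHost;
  return url.toString();
}

// Hypothetical helper: render KEY=value lines for the .env file.
function renderEnvFile(values: Record<string, string>): string {
  return (
    Object.entries(values)
      .map(([key, value]) => `${key}=${value}`)
      .join("\n") + "\n"
  );
}

// Example values only; the address and credentials are placeholders.
const rewritten = routeThroughProxy("postgresql://user:secret@203.0.113.10:5432/app");
const envFile = renderEnvFile({ NODE_ENV: "production", DATABASE_URL: rewritten });
```

Writing the rendered text to a temporary file and renaming it over .env is what makes the replacement atomic: readers see either the old file or the new one, never a half-written mix.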
See Infrastructure -- Secrets for the full secret management flow.
Step 4: Pull and restart containers
docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d --remove-orphans
docker image prune -f
docker compose pull fetches the new web:latest image from Artifact Registry. up -d --remove-orphans recreates only the containers whose image or configuration changed, and the --remove-orphans flag removes containers for services no longer in the compose file. docker image prune -f then deletes dangling images left behind by the pull.
Step 5: Health check
The web container has a Docker health check that polls http://localhost:3000/api/health every 10 seconds with a 15-second start period. Docker marks the container as healthy once a check passes. Caddy's depends_on condition (service_healthy) holds Caddy's startup until then, so Caddy does not begin serving before the web container is healthy. depends_on governs startup order only; it is not a per-request routing check.
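The interaction between the interval and the start period can be traced with a small simulation. This is a simplified model of Docker's documented health-check behavior, not the project's configuration: the interval and start period come from this page, `retries: 3` is Docker's default and is an assumption here, probe duration is ignored, and `simulateHealth` is a hypothetical name.

```typescript
type HealthStatus = "starting" | "healthy" | "unhealthy";

interface HealthCheckConfig {
  intervalSeconds: number; // 10, from this page
  startPeriodSeconds: number; // 15, from this page
  retries: number; // ASSUMPTION: Docker's default of 3
}

// Returns the container status after each probe. Failures inside the start
// period are not counted unless a probe has already succeeded.
function simulateHealth(probeResults: boolean[], cfg: HealthCheckConfig): HealthStatus[] {
  let status: HealthStatus = "starting";
  let consecutiveFailures = 0;
  let hasSucceeded = false;
  return probeResults.map((ok, i) => {
    const elapsed = (i + 1) * cfg.intervalSeconds; // first probe runs one interval after start
    if (ok) {
      hasSucceeded = true;
      consecutiveFailures = 0;
      status = "healthy";
    } else if (hasSucceeded || elapsed > cfg.startPeriodSeconds) {
      consecutiveFailures += 1;
      if (consecutiveFailures >= cfg.retries) status = "unhealthy";
    }
    return status;
  });
}

const cfg: HealthCheckConfig = { intervalSeconds: 10, startPeriodSeconds: 15, retries: 3 };
const slowStart = simulateHealth([false, false, true], cfg);
// slowStart: ["starting", "starting", "healthy"]
```

In this trace the failure at 10 seconds falls inside the start period and is ignored, the failure at 20 seconds counts once, and the success at 30 seconds marks the container healthy.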
Container Dependencies
The docker-compose.prod.yml defines a dependency chain:
caddy
  depends_on: web (service_healthy), typesense (service_started)
web
  depends_on: cloud-sql-proxy (service_started), inngest (service_started)
Caddy does not start until the web container passes its health check. The web container waits for the database proxy and Inngest to start (but not necessarily be healthy -- these are service_started).
Concurrency Control
The CI workflow uses GitHub's concurrency feature:
concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true
If a new push to main arrives while a deploy is in progress, the in-progress run is canceled and the new one takes over. This prevents race conditions where two deploys try to update the VM simultaneously.
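The effect of cancel-in-progress can be illustrated with a simplified model. The group name follows the ci-${{ github.ref }} pattern above; the `Run` shape, the `resolveRuns` helper, and the timings are hypothetical, and the model ignores queueing details of the real GitHub Actions scheduler.

```typescript
interface Run {
  id: string;
  ref: string; // value of github.ref, for example "refs/heads/main"
  startMin: number; // minute at which the run starts
  durationMin: number; // how long the run would take if left alone
}

// Hypothetical model: a run is cancelled when a newer run in the same
// concurrency group starts before it would have finished.
function resolveRuns(runs: Run[]): Record<string, "completed" | "cancelled"> {
  const sorted = [...runs].sort((a, b) => a.startMin - b.startMin);
  const outcome: Record<string, "completed" | "cancelled"> = {};
  sorted.forEach((run, i) => {
    const group = `ci-${run.ref}`;
    const superseded = sorted
      .slice(i + 1)
      .some(
        (later) =>
          `ci-${later.ref}` === group && later.startMin < run.startMin + run.durationMin,
      );
    outcome[run.id] = superseded ? "cancelled" : "completed";
  });
  return outcome;
}

const outcome = resolveRuns([
  { id: "a", ref: "refs/heads/main", startMin: 0, durationMin: 8 },
  { id: "b", ref: "refs/heads/main", startMin: 3, durationMin: 8 },
  { id: "c", ref: "refs/heads/feature", startMin: 4, durationMin: 8 },
]);
// outcome: { a: "cancelled", b: "completed", c: "completed" }
```

Run c is unaffected because its ref places it in a different group, so only runs on the same ref supersede each other.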
Timeline
| Job | Typical Duration | Timeout |
|---|---|---|
| quality | 3--5 min | 15 min |
| migrate-prod (when triggered) | 1--2 min | 5 min |
| build-push | 3--8 min | 15 min |
| deploy-prod | 1--2 min | 10 min |
| Total (quality, then build-push, then deploy-prod) | 7--15 min | |