ADR-016: Environment Strategy

Decision record for local Docker hybrid development, three GCP projects for IAM isolation, and GCP Secret Manager.

Status: Accepted
Date: 2026-03-21 (Week 0 Decision Sprint), implemented 2026-03-24 through 2026-03-30
Deciders: Kyle Olson (Solo Founder)

Decision

Use a hybrid environment strategy. In local development, Next.js runs natively for fast hot reload, while infrastructure services (PostgreSQL, Redis, Typesense, Inngest, Mailpit) run in Docker Compose. Production runs entirely in Docker Compose on a single Compute Engine VM. Three GCP projects provide IAM isolation between environments. Secrets live in GCP Secret Manager and sync to .env files via shell scripts.

Context

A solo founder building with AI coding agents needs an environment strategy that balances three concerns:

Fast local feedback -- running the app on Docker's filesystem on Windows slows Turbopack HMR from sub-100ms to multiple seconds
Service parity -- local Docker services must match production versions exactly
Minimal operational overhead -- a solo founder cannot manage multiple deployment platforms

The three-project GCP structure emerged from a security question: "Can you remind me why we need this shared project?" The answer: IAM isolation. If Terraform state and CI service accounts lived in trovella-prod, a compromised CI pipeline could modify its own permissions in the production project. The shared project creates a boundary between CI infrastructure and production resources.

Alternatives Considered

Fully Dockerized Development (All Services + App)

A single docker compose up starts everything and is the closest match to production. Rejected because Docker's filesystem on Windows slows Turbopack's hot module replacement from sub-100ms to multiple seconds, a dealbreaker for a development workflow where fast feedback is critical.

Fully Native Development (No Docker)

Maximum performance, no Docker overhead. Rejected because it requires installing and pinning exact versions of PostgreSQL, Redis, Typesense, and Inngest natively. It also offers no clean teardown and invites version drift between sessions and port conflicts with other local services.

Single GCP Project

Simpler, no cross-project IAM grants. Rejected because the CI service account would have permissions in the same project as production resources. A compromised pipeline could modify its own IAM, access secrets directly, or modify the VM.

Doppler / HashiCorp Vault for Secrets

Purpose-built secret management with rotation and sync. Rejected because GCP Secret Manager is included in the free tier and integrates natively with IAM and Workload Identity Federation, avoiding additional cost and vendor dependencies.

Implementation Summary

The full implementation is described across the child pages in this topic:

Project Layout -- the three-project structure and what lives where
Terraform Structure -- directory-per-environment with GCS backend
Secret Provisioning -- empty shells in Terraform, values via sync scripts
Workload Identity Federation -- keyless CI/CD authentication

Three categories of environment variables exist in production:

Static values -- non-sensitive, hardcoded in sync-secrets-vm.sh: NODE_ENV=production, TYPESENSE_URL=http://typesense:8108, INNGEST_BASE_URL=http://inngest:8288
Runtime secrets -- pulled from Secret Manager: DATABASE_URL, ANTHROPIC_API_KEY, OAuth credentials, signing keys
Build-time public values -- NEXT_PUBLIC_* values baked into the Docker image via --build-arg in CI: NEXT_PUBLIC_BETTER_AUTH_URL, NEXT_PUBLIC_SENTRY_DSN
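The split between the three categories can be sketched in a few lines. This is an illustration in Python, not the contents of sync-secrets-vm.sh (which is a shell script); the function names classify and render_env, the injected fetch_secret callable, and the rule that any name that is neither static nor NEXT_PUBLIC_* counts as a runtime secret are all assumptions made for the example. Only the variable names and static values come from the list above.

```python
from typing import Callable

# Static, non-sensitive values named in this ADR.
STATIC_VALUES = {
    "NODE_ENV": "production",
    "TYPESENSE_URL": "http://typesense:8108",
    "INNGEST_BASE_URL": "http://inngest:8288",
}

# Runtime secrets named explicitly in this ADR. The real list also covers
# OAuth credentials and signing keys, whose names are not given here.
RUNTIME_SECRETS = ["DATABASE_URL", "ANTHROPIC_API_KEY"]


def classify(name: str) -> str:
    """Return the category a variable name falls into."""
    if name.startswith("NEXT_PUBLIC_"):
        return "build-time"  # baked into the image in CI, never synced
    if name in STATIC_VALUES:
        return "static"
    # Assumption: everything else is treated as a runtime secret.
    return "runtime-secret"


def render_env(fetch_secret: Callable[[str], str]) -> str:
    """Build .env text from static values plus fetched runtime secrets.

    fetch_secret is a hypothetical stand-in for a Secret Manager lookup;
    values are written unquoted, which a real script may need to revisit.
    """
    lines = [f"{key}={value}" for key, value in STATIC_VALUES.items()]
    for name in RUNTIME_SECRETS:
        lines.append(f"{name}={fetch_secret(name)}")
    return "\n".join(lines) + "\n"
```

Passing the lookup in as a callable keeps the sketch testable without GCP access: a dictionary lookup such as render_env(lambda name: fake_values[name]) is enough to see the resulting file. NEXT_PUBLIC_* names never appear in the output because they are already fixed at image build time.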

Consequences

Positive

True database parity: same PostgreSQL 18 + pgvector locally and in Cloud SQL
Fast local development: native Turbopack with sub-100ms hot reload
IAM isolation: compromised CI cannot modify its own permissions
Secret safety: credentials flow through Secret Manager with audit logging
One-line local setup: pnpm docker:up && pnpm db:reset
Zero-cost secret management: GCP Secret Manager free tier

Negative

Two sync scripts to maintain (sync-secrets.sh for local, sync-secrets-vm.sh for VM) -- both must be updated whenever a secret is added
.env placement confusion: Next.js loads from apps/trovella/, not the monorepo root (caused hours of debugging during initial setup)
No staging environment until ~Month 2 -- production is the only live environment
Manual secret value creation: Terraform creates empty secret shells; actual values are set by hand via gcloud secrets versions add
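The two-script maintenance burden could be reduced with a small drift check. The sketch below is a hypothetical helper, not part of the repository: it assumes each script mentions a variable as NAME=... somewhere on a line (for example inside an echo), which may not match how sync-secrets.sh and sync-secrets-vm.sh are actually written. The names env_names and drift and the ignore parameter are invented for the example.

```python
import re

# Assumption: variables appear as UPPER_SNAKE_CASE names followed by "=".
# Shell-local variables written the same way will also be picked up.
ENV_NAME = re.compile(r"\b([A-Z][A-Z0-9_]*)=")


def env_names(script_text: str) -> set[str]:
    """Collect every NAME= occurrence found in a script's text."""
    return set(ENV_NAME.findall(script_text))


def drift(
    local_script: str,
    vm_script: str,
    ignore: frozenset[str] = frozenset(),
) -> dict[str, list[str]]:
    """Report names present in one script but absent from the other.

    ignore lists names expected to differ, such as static values that
    only the VM script writes.
    """
    local = env_names(local_script) - ignore
    vm = env_names(vm_script) - ignore
    return {
        "missing_from_vm": sorted(local - vm),
        "missing_from_local": sorted(vm - local),
    }
```

A check like this could read both files and fail when either list is non-empty, with NODE_ENV, TYPESENSE_URL, and INNGEST_BASE_URL passed in ignore if they are written only on the VM. Whether that belongs in the quality pipeline is a separate decision this ADR does not make.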

Risks

IPv6 Terraform failures -- Terraform's Go runtime preferentially uses IPv6, which fails on networks with flaky IPv6 connectivity. The workaround (Windows netsh to prefer IPv4) is manual and machine-specific.
Single-environment deployment -- every merge to main deploys directly to production. Mitigated by the quality pipeline gating deployment.
Docker Hub rate limiting in CI -- GitHub Actions runners share Docker Hub pull limits, causing transient CI failures.

References

Full ADR-016 text
Related: ADR-001 (Database), ADR-008 (Compute -- VM + Docker Compose + Caddy), ADR-012 (CI/CD)
Linear: TRO-11 (Cloud Infrastructure), TRO-8 (Database setup), TRO-54 (VM infrastructure), TRO-16 (Staging -- deferred)