Trovella Wiki

ADR-012: CI/CD Pipeline -- Build, Test, Deploy

Decision record for the CI/CD pipeline structure, parallel jobs, local CI parity, and dependency automation.

Status: Accepted
Date: 2026-03-25 (initial pipeline in TRO-7); parallelized 2026-04-01; ci:check local mirror added 2026-04-03
Deciders: Kyle Olson (Solo Founder)

Decision

  • CI platform: GitHub Actions with cancel-in-progress concurrency
  • Pipeline structure: Five jobs -- quality (10 checks), docs (3 checks), build-push (Docker image), migrate-prod (Cloud SQL migrations), deploy-prod (VM deployment)
  • Deploy gate: deploy-prod requires quality, build-push, and migrate-prod to all pass; docs runs independently and does not gate deployment
  • Local parity: pnpm ci:check runs the same quality checks locally that CI runs remotely
  • Docker builds: BuildKit with GitHub Actions cache layer; images pushed to Google Artifact Registry tagged with commit SHA + latest
  • Database migrations: Cloud SQL Auth Proxy in CI (not IP allowlisting)
  • VM deployment: SCP files + SSH via IAP tunnel, ~1 minute
  • Dependency automation: Renovate with weekend schedule, dev dependencies automerge, runtime dependencies require manual merge
  • Pre-commit: Husky + lint-staged for Prettier and ESLint auto-fix on staged files
  • Branch protection: CI status check required on main, force pushes blocked, no direct pushes
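The deploy gate in the list above can be sketched as a small predicate. This is only an illustration: the job names come from this record, while the types and the canDeploy helper are hypothetical and are not part of the actual workflow file.

```typescript
// Illustrative model of the deploy gate. Job names are from this ADR;
// JobName, JobResult, and canDeploy are hypothetical names for this sketch.
type JobName = "quality" | "docs" | "build-push" | "migrate-prod";
type JobResult = "success" | "failure";

// deploy-prod gates on these three jobs. docs is deliberately absent,
// because documentation checks do not block deployment.
const DEPLOY_PREREQUISITES: JobName[] = ["quality", "build-push", "migrate-prod"];

function canDeploy(results: Record<JobName, JobResult>): boolean {
  // Every prerequisite must have succeeded; the docs result is never read.
  return DEPLOY_PREREQUISITES.every((job) => results[job] === "success");
}

// A failing docs job leaves the gate open...
const docsFailed: Record<JobName, JobResult> = {
  quality: "success",
  docs: "failure",
  "build-push": "success",
  "migrate-prod": "success",
};
const deployWithDocsFailure: boolean = canDeploy(docsFailed);

// ...while a failing quality job closes it.
const qualityFailed: Record<JobName, JobResult> = { ...docsFailed, docs: "success", quality: "failure" };
const deployWithQualityFailure: boolean = canDeploy(qualityFailed);
```

In the real pipeline this rule is expressed through the deploy-prod job's needs list rather than through application code.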

Context

The CI/CD pipeline was one of the first things built (TRO-7, Week 0) because the founder wanted automated quality gates from day one. The initial version was minimal -- format, lint, typecheck, test, build -- running sequentially in a single job. It took about 1 minute.

Over Phase 0, the pipeline grew as quality checks were added: dependency-cruiser for package boundary validation, Knip for dead code detection, jscpd for code duplication, RLS integration tests requiring a live Postgres database, Typesense service containers for search tests, and a documentation quality job. By Phase 0's end, the monolithic pipeline took 11 minutes.

The restructuring split the pipeline into parallel jobs. quality and build-push run concurrently (since build-push does not depend on quality checks for the Docker build itself -- only deploy-prod gates on both). The docs job was separated because prose linting should not block production deploys.

A separate problem emerged: the local development workflow did not match CI. The founder found that code with Knip violations consistently passed local checks but failed in CI. Root cause: the local pre-commit workflow only ran lint + typecheck + test, missing 5 of the 10 CI checks. The pnpm ci:check script was created to mirror CI, and CLAUDE.md was updated to require all AI agents to run it before every commit.

Decision Drivers

  1. Fast feedback on PRs -- developers and AI agents need results in a few minutes, not the 11 minutes the sequential pipeline took
  2. No path to production without quality gates -- deployment must be impossible when any quality check fails
  3. Local/CI parity -- what passes locally must pass in CI; divergence wastes time and erodes trust
  4. Cost-conscious -- GitHub Actions free tier (2,000 minutes/month) must be sufficient for a solo developer
  5. Security posture preserved -- deployment must not require exposing SSH ports or database IPs

Alternatives Considered

Monolithic single-job pipeline vs parallel jobs

The original pipeline ran all checks sequentially in one job (simpler YAML, one failure point). At 11 minutes, it was the longest wait in the development loop.

Why parallel won: Splitting into parallel jobs reduced wall time from 11 minutes to ~3.5 minutes. The quality job still runs its checks sequentially (later checks depend on earlier ones), but build-push runs in parallel because the Docker build is independent. A roughly 3x speedup justified the additional YAML complexity.

Cloud SQL IP allowlisting vs Auth Proxy for migrations

GitHub Actions runners have ephemeral IPs from a large shared pool. Allowlisting the full range would undermine the security benefit of restricting Cloud SQL to the VM's static IP.

Why Auth Proxy won: Cloud SQL stays reachable only from known, authenticated sources. The proxy adds ~15 seconds of setup (download binary, start, health check), which is negligible in a CI pipeline. Authentication uses Workload Identity Federation -- keyless, no stored credentials.

Full Renovate automerge vs devDeps-only automerge

The initial Renovate configuration automerged all patch/minor updates. The founder pushed back: runtime dependencies ship to production, and a minor version bump could change behavior in ways tests do not cover.

Revised policy (three tiers):

  1. DevDependency patch/minor -- automerge when CI passes (affects dev environment only)
  2. Runtime dependency (any version) -- labeled dependency-runtime, requires manual merge (ships to production)
  3. Major updates (any dep type) -- labeled dependency-major, requires manual review (often contains breaking changes)
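The three tiers can be read as a decision function over dependency type and update type. The sketch below is a model of the policy, not the Renovate configuration itself; decideUpdate and its types are hypothetical, and it assumes a major update to a runtime dependency receives both labels, which this record does not state.

```typescript
// Illustrative model of the three-tier policy. The labels come from this ADR;
// the function and type names are hypothetical.
type DepType = "devDependencies" | "dependencies";
type UpdateType = "patch" | "minor" | "major";

interface UpdateDecision {
  automerge: boolean;
  labels: string[];
}

function decideUpdate(depType: DepType, updateType: UpdateType): UpdateDecision {
  const labels: string[] = [];
  // Tier 2: anything that ships to production needs a manual merge.
  if (depType === "dependencies") labels.push("dependency-runtime");
  // Tier 3: major updates need manual review regardless of dependency type.
  if (updateType === "major") labels.push("dependency-major");
  // Tier 1: only unlabeled updates (devDependency patch/minor) automerge,
  // and in practice only once CI has passed.
  return { automerge: labels.length === 0, labels };
}

const devMinor = decideUpdate("devDependencies", "minor");   // automerge, no labels
const runtimePatch = decideUpdate("dependencies", "patch");  // manual, dependency-runtime
const devMajor = decideUpdate("devDependencies", "major");   // manual, dependency-major
```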

Key Implementation Decisions

Docs job does not gate deployment

Documentation quality failures (prose lint violations, broken links, stale docs) do not block production deploys. A typo in a guide is not worth holding back a security fix. The docs job still runs on every PR so failures are visible, just not blocking.

Expiring TODO comments

The unicorn/expiring-todo-comments ESLint rule requires every TODO, FIXME, or HACK comment to include an expiration date [YYYY-MM-DD] or a Linear ticket [TRO-NNN]. When the date passes, ESLint promotes the comment to an error and CI fails. ignoreDatesOnPullRequests: true means expiration is only checked on main, not on PR branches.
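The behavior described above can be approximated with a small checker. This is a simplified sketch and not the plugin's implementation: checkTodo, its return values, and the two regular expressions are hypothetical, and the sample comments and dates are made up for illustration.

```typescript
// Simplified model of the expiring-TODO policy described in this ADR.
// All names here are hypothetical; the real check is done by the ESLint rule.
type TodoStatus = "ok" | "expired" | "missing-condition";

const DATE_TAG = /\[(\d{4}-\d{2}-\d{2})\]/; // e.g. [2026-05-01]
const TICKET_TAG = /\[TRO-\d+\]/;           // e.g. [TRO-123]

function checkTodo(comment: string, today: string, isPullRequest: boolean): TodoStatus {
  const dateMatch = DATE_TAG.exec(comment);
  if (dateMatch) {
    // Mirrors ignoreDatesOnPullRequests: true -- dates are not enforced on PR branches.
    if (isPullRequest) return "ok";
    // ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return today > dateMatch[1] ? "expired" : "ok";
  }
  // Without a date, a Linear ticket reference is the other accepted condition.
  return TICKET_TAG.test(comment) ? "ok" : "missing-condition";
}

const onMain = checkTodo("// TODO [2026-01-01]: remove shim", "2026-02-01", false); // "expired"
const onPr = checkTodo("// TODO [2026-01-01]: remove shim", "2026-02-01", true);    // "ok"
const withTicket = checkTodo("// FIXME [TRO-123]: handle retries", "2026-02-01", false); // "ok"
const bare = checkTodo("// TODO: clean this up later", "2026-02-01", false); // "missing-condition"
```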

Branch protection

  • Required status check: quality must pass before merge
  • No force pushes to main
  • No direct pushes to main (all changes must go through a PR)
  • No "skip CI" escape hatch -- the founder's position is that if something is urgent enough to skip CI, the CI check should be fixed

CODEOWNERS (planned, deferred to Phase 2)

CODEOWNERS would require founder approval for changes to CI workflow files, enforcement configs, auth code, schema, and CLAUDE.md. Deferred because the founder reviews all PRs manually during Phase 1 (solo developer). The motivation: a Claude Code session once removed the dep-cruise CI step to unblock a failing deployment instead of fixing the violation.

Consequences

Positive

  • Fast CI feedback -- ~3.5 minutes instead of 11 minutes
  • No path around quality gates -- deploy-prod requires all three prerequisite jobs; branch protection prevents direct pushes
  • Local/CI parity -- pnpm ci:check eliminates the "works locally, fails in CI" class of issues
  • Secure deployment -- no public SSH, no exposed database IPs, no long-lived service account keys
  • Self-cleaning technical debt -- expiring TODOs ensure every shortcut has a timer

Negative

  • Service container fragility -- Docker Hub rate limiting causes transient CI failures. No resolution yet beyond re-running the failed job.
  • No staging environment -- the pipeline deploys directly to production. Staging is planned for Month 2.
  • Migration job complexity -- Cloud SQL Auth Proxy setup is the most complex part of the pipeline and difficult to debug when it fails.
  • Pre-commit gaps -- the hook only runs Prettier and ESLint --fix. Typecheck, dead code, dependency violations, and test failures are not caught at commit time.
  • Docs job is advisory-only -- documentation degradation requires manual attention.
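To make the pre-commit gap concrete, a lint-staged configuration of roughly the following shape would produce the behavior described. This is a hypothetical sketch: the glob patterns and the variable name are assumptions, and only the two tools (ESLint with auto-fix and Prettier) come from this record.

```typescript
// Hypothetical lint-staged configuration. A real lint-staged config file
// would export this object; the glob patterns are assumptions.
const lintStagedConfig: Record<string, string[]> = {
  // Staged source files: auto-fix lint issues, then format.
  "*.{ts,tsx,js,jsx}": ["eslint --fix", "prettier --write"],
  // Other staged text files: format only.
  "*.{json,md,yml,yaml}": ["prettier --write"],
};

// Nothing here runs typecheck, dead-code detection, dependency validation,
// or tests. Those are left to pnpm ci:check and the CI quality job.
const sourceFileCommands: string[] = lintStagedConfig["*.{ts,tsx,js,jsx}"];
```

Keeping the hook this small keeps commits fast; the cost is that the slower checks only fail later, which is the gap noted above.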

Risks

  • Free tier exhaustion -- mitigated by cancel-in-progress and --affected filtering; the free tier covers roughly 570 runs per month
  • DevDependency automerge -- an automerged update could pass CI yet still break local development; mitigated by the weekend schedule
  • Docker Hub rate limiting escalation -- planned migration to pre-pull images from Artifact Registry
  • Single deploy target -- no blue-green or canary; bad deploys affect all users immediately; mitigated by ~1 minute rollback time
  • CODEOWNERS deferred -- AI agents can modify CI/enforcement files without mandatory review until Phase 2
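The ~570 runs/month figure in the free-tier risk is consistent with dividing the 2,000-minute allowance by the ~3.5-minute run time. The sketch below reproduces that arithmetic under the assumption that one run consumes about 3.5 billed minutes; because GitHub bills each job separately and rounds up per job, the real capacity with parallel jobs may be lower. The constant and function names are hypothetical.

```typescript
// Back-of-the-envelope capacity estimate. Both inputs come from this ADR;
// treating wall time as billed time is an assumption of this sketch.
const FREE_TIER_MINUTES_PER_MONTH = 2000;
const MINUTES_PER_RUN = 3.5;

function runsPerMonth(budgetMinutes: number, minutesPerRun: number): number {
  // Only whole runs count, so round down.
  return Math.floor(budgetMinutes / minutesPerRun);
}

const capacity: number = runsPerMonth(FREE_TIER_MINUTES_PER_MONTH, MINUTES_PER_RUN); // 571
```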

Validation

Rule | Enforcement
All quality checks pass before deploy | deploy-prod job needs: [quality, build-push, migrate-prod]
No direct pushes to main | GitHub branch protection
Local checks mirror CI | pnpm ci:check (format, lint, dep-cruise, dead-code, duplication, typecheck, test)
Migrations skip when unnecessary | dorny/paths-filter on schema/migration/seed paths
Every TODO has an expiration | unicorn/expiring-todo-comments ESLint rule
Runtime dependency updates require review | Renovate dependency-runtime label + automerge: false
