ADR-014: Testing Infrastructure
Decision record for Vitest, mutation testing, real PostgreSQL for RLS tests, and AI-assisted quality enforcement.
Status: Accepted
Date: 2026-03-30 (test-audit CLI); coverage baseline 2026-04-01; Postgres CI service added 2026-03-25
Deciders: Kyle Olson (Solo Founder)
Linear: TRO-72 (test-audit CLI), TRO-8 (RLS implementation + CI Postgres), TRO-10 (critical RLS bypass bug)
Problem
AI agents write most of the code and tests. They exhibit a specific failure mode: syntactically correct tests that execute all code paths but verify nothing. A test that calls a function and asserts toBeDefined() on the result achieves coverage without catching bugs.
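A minimal sketch of this failure mode, written as plain TypeScript rather than Vitest so it runs standalone. The function `applyDiscount` and both check functions are hypothetical names, not code from the repository. The implementation contains a deliberate bug, yet the existence-only check still passes.

```typescript
// Hypothetical function under test, with a deliberate bug:
// it adds the discount instead of subtracting it.
function applyDiscount(priceCents: number, percent: number): number {
  return priceCents + Math.round((priceCents * percent) / 100);
}

// Existence-only check, equivalent in spirit to expect(result).toBeDefined().
// It executes every line of applyDiscount but verifies nothing about the value.
function existenceCheckPasses(): boolean {
  const result = applyDiscount(1000, 10);
  return result !== undefined;
}

// Behavioral check: asserts the actual business outcome.
function behavioralCheckPasses(): boolean {
  return applyDiscount(1000, 10) === 900;
}

console.log(existenceCheckPasses()); // true, despite the bug
console.log(behavioralCheckPasses()); // false, the bug is caught
```

Both checks produce the same line coverage for `applyDiscount`; only the second one fails when the arithmetic is wrong.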
Three problems needed solving simultaneously:
- Quantitative measurement -- aggregate coverage data across a pnpm monorepo where each package runs its own Vitest instance
- Quality verification beyond coverage -- prove tests detect bugs, not just run code
- Workflow enforcement -- ensure AI agents follow test-first development and produce behavioral tests
Decisions
Test Runner: Vitest (not Jest)
Vitest provides native TypeScript support, native ESM, near-instant cold starts, and first-class monorepo support via workspace configs. It shares Vite's transformation pipeline, already used in the build toolchain.
Jest is deferred to Phase 2, when React Native development begins. Vitest is the better fit for Phase 1 because of its native TypeScript and ESM support and its faster feedback cycle for agentic development.
Quality Signal: Mutation Testing (not Coverage Thresholds)
Traditional coverage thresholds (e.g., "80% line coverage required") were descoped. Mutation testing via StrykerJS is the primary quality signal.
Coverage thresholds are strictly weaker than mutation testing -- they measure execution, not verification. A file with 100% coverage but surviving mutations has weak tests. A file with 80% coverage but zero surviving mutations has strong tests.
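The difference can be sketched with a toy mutation run. This is an illustration of the idea only, not how StrykerJS is implemented: the mutants are written by hand, and every name here (`Impl`, `weakSuite`, `strongSuite`, `mutationScore`) is hypothetical.

```typescript
type Impl = (a: number, b: number) => number;
type Test = (impl: Impl) => boolean; // true means the test passes

// Original implementation plus three hand-written mutants.
const original: Impl = (a, b) => Math.max(a, b);
const mutants: Impl[] = [
  (a, b) => Math.min(a, b), // max swapped for min
  (a, _b) => a, // always returns the first argument
  (_a, b) => b, // always returns the second argument
];

// Weak suite: executes the code, checks only that a number comes back.
const weakSuite: Test[] = [(impl) => typeof impl(1, 2) === "number"];

// Strong suite: checks the result for both argument orders.
const strongSuite: Test[] = [
  (impl) => impl(1, 2) === 2,
  (impl) => impl(2, 1) === 2,
];

// A mutant is killed when at least one test fails against it.
function mutationScore(suite: Test[], candidates: Impl[]): number {
  const killed = candidates.filter((m) => suite.some((t) => !t(m))).length;
  return killed / candidates.length;
}

// Both suites pass against the original and execute every line of it.
console.log(weakSuite.every((t) => t(original))); // true
console.log(strongSuite.every((t) => t(original))); // true

console.log(mutationScore(weakSuite, mutants)); // 0
console.log(mutationScore(strongSuite, mutants)); // 1
```

Both suites reach full coverage of `original`, so a coverage threshold cannot tell them apart; the mutation score can.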
Three factors drove the descoping of coverage gates:
- Six packages have 0% coverage at baseline. Thresholds require noisy exclusions or a ratchet approach.
- Mutation testing is strictly stronger. It directly measures whether tests catch bugs.
- Agentic workflow timing mismatch. The agent that opens a PR has exited by the time CI comments arrive. Quality enforcement during development (via /test-write) is more effective.
RLS Tests: Real PostgreSQL (not Mocked)
RLS integration tests run against a real pgvector/pgvector:pg18 container, with real RLS policies and real withTenantContext() calls. Mocks were rejected because:
- A mock cannot verify that a PostgreSQL RLS policy actually works
- The critical RLS bug in TRO-10 -- where tenantProcedure passed the bare db pool instead of the transaction-scoped tx, silently bypassing all RLS policies -- would not have been caught by mocked tests
- Mocked RLS tests give false confidence; the whole point of RLS is database-level enforcement regardless of application bugs
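To make the tx-versus-db distinction concrete, here is a rough sketch of what a helper like withTenantContext() could look like. The real implementation is not shown in this record, so everything below is an assumption: the `Queryable` interface, the `app.tenant_id` setting name, the `documents` table, and the use of PostgreSQL's `set_config(..., true)` to scope the tenant id to the transaction. `RecordingConnection` exists only to show the statement order without a database; it is not a substitute for the real-PostgreSQL tests this decision requires.

```typescript
// Minimal query interface so the sketch runs without a database driver.
interface Queryable {
  query(sql: string, params?: unknown[]): Promise<unknown>;
}

// Hypothetical shape of withTenantContext: open a transaction, set a
// transaction-local tenant id, and hand the callback the transaction handle.
// `conn` is assumed to be a single dedicated connection, not a pool.
async function withTenantContext<T>(
  conn: Queryable,
  tenantId: string,
  fn: (tx: Queryable) => Promise<T>,
): Promise<T> {
  await conn.query("BEGIN");
  try {
    // set_config(name, value, is_local = true) lasts only for this transaction.
    await conn.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await fn(conn); // the callback must query through this handle
    await conn.query("COMMIT");
    return result;
  } catch (err) {
    await conn.query("ROLLBACK");
    throw err;
  }
}

// Records statements in order; used only to inspect what the helper sends.
class RecordingConnection implements Queryable {
  readonly log: string[] = [];
  async query(sql: string): Promise<unknown> {
    this.log.push(sql);
    return [];
  }
}

const demo = new RecordingConnection();
withTenantContext(demo, "tenant-a", (tx) => tx.query("SELECT id FROM documents")).then(
  // Logs BEGIN, the set_config call, the callback's query, then COMMIT.
  () => console.log(demo.log),
);
```

Under these assumptions, a query issued through a bare pool instead of `tx` may run on a different connection, outside the transaction in which the tenant setting applies. Only a test against real PostgreSQL can show whether the policies still hold in that case.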
Custom CLI Tool: trovella-test-audit
A monorepo-aware CLI at tools/test-audit/ with four commands (coverage, report, map, mutate). Uses istanbul-lib-coverage for cross-package coverage merging and StrykerJS for mutation testing. See Test Audit CLI for the full reference.
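The cross-package merge can be pictured with a simplified model. This sketch does not use istanbul-lib-coverage and does not reproduce its data format; the `FileCoverage` type keeps only a statement hit-count map (loosely modeled on the `s` field of Istanbul's per-file JSON), and the file paths are hypothetical.

```typescript
// Simplified per-file record: statement id -> hit count.
interface FileCoverage {
  path: string;
  s: Record<string, number>;
}
type CoverageMap = Record<string, FileCoverage>;

// Merge per-package maps; hit counts for the same file and statement are summed.
function mergeCoverage(maps: CoverageMap[]): CoverageMap {
  const merged: CoverageMap = {};
  for (const map of maps) {
    for (const [path, file] of Object.entries(map)) {
      let target = merged[path];
      if (!target) {
        target = { path, s: {} };
        merged[path] = target;
      }
      for (const [id, hits] of Object.entries(file.s)) {
        target.s[id] = (target.s[id] ?? 0) + hits;
      }
    }
  }
  return merged;
}

// Percentage of statements executed at least once, across all files.
function statementCoverage(map: CoverageMap): number {
  let total = 0;
  let covered = 0;
  for (const file of Object.values(map)) {
    for (const hits of Object.values(file.s)) {
      total += 1;
      if (hits > 0) covered += 1;
    }
  }
  return total === 0 ? 0 : (100 * covered) / total;
}

// Two hypothetical packages: one partially tested, one with no tests.
const apiPackage: CoverageMap = {
  "packages/api/src/a.ts": { path: "packages/api/src/a.ts", s: { "0": 1, "1": 0 } },
};
const dbPackage: CoverageMap = {
  "packages/db/src/b.ts": { path: "packages/db/src/b.ts", s: { "0": 0, "1": 0 } },
};

const mergedCoverage = mergeCoverage([apiPackage, dbPackage]);
console.log(statementCoverage(mergedCoverage)); // 25
```

The merged figure counts the untested package's statements in the denominator, which is why an aggregate view differs from averaging per-package percentages.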
AI-Assisted Testing Skills
Two Claude Code skills enforce quality during development:
- /test-write -- TDD with pre-mortem fragility catalogue (10 categories), anti-rationalization table, and mutation verification loops targeting 75%+ mutation score (85%+ for RLS/auth/CASL)
- /test-review -- five-dimension scoring (coverage, behavioral focus, completeness, isolation, mutation resilience), automatic CRITICAL severity for untested RLS/auth/CASL code
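The two mutation-score targets can be expressed as a small check. The skills enforce these targets through prompts, so the code below is only an illustration of the rule; the path patterns used to detect RLS/auth/CASL files and both function names are assumptions.

```typescript
// Path fragments treated as security-critical. These patterns are assumptions
// and deliberately naive (for example, /auth/ also matches "author").
const CRITICAL_PATTERNS: RegExp[] = [/rls/i, /auth/i, /casl/i];

// Target mutation score in percent: 85 for RLS/auth/CASL code, 75 otherwise.
function requiredMutationScore(filePath: string): number {
  return CRITICAL_PATTERNS.some((p) => p.test(filePath)) ? 85 : 75;
}

// True when the share of killed mutants reaches the target for this file.
function meetsTarget(filePath: string, killed: number, total: number): boolean {
  if (total === 0) return false; // no mutants generated, so nothing was verified
  return (100 * killed) / total >= requiredMutationScore(filePath);
}

console.log(meetsTarget("packages/ui/src/format.ts", 8, 10)); // true (80 >= 75)
console.log(meetsTarget("packages/db/src/rls/policies.ts", 8, 10)); // false (80 < 85)
```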
Consequences
Positive
- Mutation testing provides concrete, measurable test quality beyond coverage
- The map command lets agents read only relevant test files, preserving context window
- Quality enforcement happens during development when an agent is present to act on it
- RLS tests prove tenant isolation at the database level
Negative
- Mutation testing is slow -- unsuitable for CI gating (minutes per package)
- The StrykerJS pnpm symlink workaround is fragile and depends on pnpm internals
- Six packages still have zero test files at baseline
- Skill effectiveness depends on AI model quality -- prompt-enforced, not code-enforced
Risks
- StrykerJS ecosystem stability (smaller community than Jest/Vitest)
- V8 coverage format compatibility with istanbul-lib-coverage
- AI-generated test quality drift as codebase grows
Validation
| Rule | Enforcement |
|---|---|
| All packages have Vitest config with passWithNoTests: true | Manual -- required when adding new packages |
| RLS tests run against real PostgreSQL in CI | pgvector/pgvector:pg18 service container in CI |
| Coverage data aggregated across packages | trovella-test-audit coverage merges per-package JSON |
| Test quality measured beyond coverage | trovella-test-audit mutate runs StrykerJS |
| AI agents follow TDD workflow | /test-write skill (prompt enforcement) |
| Critical areas flagged as high severity | /test-review severity triggers for RLS, auth, CASL |
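For the first rule in the table, a per-package config might look like the following. This is a minimal sketch rather than the repository's actual file: only `passWithNoTests: true` comes from this record, and the file name `vitest.config.ts` is assumed.

```typescript
// vitest.config.ts (assumed file name) for a package that has no tests yet.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Let the test run succeed when no test files are found, so packages
    // at the zero-test baseline do not fail the monorepo-wide run.
    passWithNoTests: true,
  },
});
```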
References
- Full ADR: docs/architecture/decisions/014-testing-infrastructure.md
- CLI tool source: tools/test-audit/src/
- TDD skill: .claude/skills/test-write/SKILL.md
- Review skill: .claude/skills/test-review/SKILL.md
- CI pipeline: .github/workflows/ci.yml