Seed Data
The tiered seeding system for reference data, development data, research fixtures, and search embeddings.
Trovella uses a tiered seeding system that controls which seed data is loaded in which environment. The seed orchestrator (packages/db/src/seed.ts) calls each tier in order and gates development-only data behind a NODE_ENV check.
Seed Tiers
| Tier | File | Environments | Idempotent? | Purpose |
|---|---|---|---|---|
| Reference | seeds/reference.ts | All (dev, staging, prod) | Yes | Lookup data the app depends on |
| Development | seeds/development.ts | Dev, staging only | No (truncates first) | Test users, orgs, memberships |
| Research | seeds/research.ts | Dev, staging only | Depends | Research plans, steps, artifacts for testing |
| Search | seeds/search.ts | Dev, staging only | Yes | Document chunks with pre-computed embeddings |
Orchestrator
The seed entry point is packages/db/src/seed.ts:
import { db } from "./client";
import { seedDevelopmentData } from "./seeds/development";
import { seedReferenceData } from "./seeds/reference";
import { seedResearchData } from "./seeds/research";
import { seedSearchData } from "./seeds/search";

async function main() {
  // Tier 1 runs in every environment.
  await seedReferenceData(db);

  // Tiers 2-4 are skipped when NODE_ENV is "production".
  if (process.env["NODE_ENV"] !== "production") {
    await seedDevelopmentData(db);
    await seedResearchData(db);
    await seedSearchData(db);
  }
}
In production (NODE_ENV=production), only seedReferenceData runs. The CI migrate-prod job enforces this by setting NODE_ENV=production explicitly.
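The gating rule can be isolated as a pure function, which makes it easy to check without a database. The sketch below is illustrative only: `tiersForEnv` and the `SeedTier` type are hypothetical names, not part of the orchestrator.

```typescript
// Hypothetical helper: these names are illustrative, not from packages/db/src/seed.ts.
type SeedTier = "reference" | "development" | "research" | "search";

// Returns the tiers that would run for a given NODE_ENV value,
// mirroring the `!== "production"` check in the orchestrator.
function tiersForEnv(nodeEnv: string | undefined): SeedTier[] {
  const tiers: SeedTier[] = ["reference"];
  if (nodeEnv !== "production") {
    tiers.push("development", "research", "search");
  }
  return tiers;
}

console.log(tiersForEnv("production")); // only the reference tier
console.log(tiersForEnv(undefined)); // all four tiers
```

Note that an unset NODE_ENV counts as non-production under this check, so the development tiers (including the truncating one) run by default.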
Tier 1: Reference Data
Reference data is lookup/registry data that the application requires to function. It must be idempotent -- safe to run multiple times without duplication.
Currently includes:
- AI model registry (seeds/ai-models.ts): 4 models (Claude Opus 4.6, Sonnet 4.6, Haiku 4.5, Gemini Embedding 2) with pricing records. Uses ON CONFLICT DO NOTHING for idempotence.
await db.insert(aiModel).values([...]).onConflictDoNothing();
await db.insert(aiModelPricing).values([...]).onConflictDoNothing();
Adding New Reference Data
- Create a new seed function in packages/db/src/seeds/
- Make it idempotent (use ON CONFLICT DO NOTHING or upsert patterns)
- Call it from seedReferenceData() in seeds/reference.ts
- Test: pnpm db:seed should be safe to run repeatedly without errors or duplicates
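To make the idempotence requirement concrete, here is a minimal in-memory model of what ON CONFLICT DO NOTHING does on a primary-key conflict. It is a sketch, not the project's seed code: the `ModelRow` shape, the `insertIgnoringConflicts` helper, and the example IDs are all hypothetical.

```typescript
// In-memory stand-in for a table with a primary key; all names are hypothetical.
interface ModelRow {
  id: string;
  displayName: string;
}

// Inserts rows whose id is not already present and leaves existing rows untouched.
// Returns the number of rows actually inserted.
function insertIgnoringConflicts(table: Map<string, ModelRow>, rows: ModelRow[]): number {
  let inserted = 0;
  for (const row of rows) {
    if (!table.has(row.id)) {
      table.set(row.id, row);
      inserted += 1;
    }
  }
  return inserted;
}

const table = new Map<string, ModelRow>();
const rows: ModelRow[] = [
  { id: "example-model-a", displayName: "Example Model A" },
  { id: "example-model-b", displayName: "Example Model B" },
];

const firstRun = insertIgnoringConflicts(table, rows);
const secondRun = insertIgnoringConflicts(table, rows);
console.log(firstRun, secondRun, table.size); // 2 0 2
```

One consequence of this behavior is that existing rows are never updated, so a changed value in the seed file will not reach a database that already has the row. An upsert is the better fit when reference rows must pick up edits.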
Tier 2: Development Data
Development data creates a predictable local environment for manual testing and RLS integration tests. It uses fixed, deterministic IDs so tests can reference them directly.
The seed truncates all auth-related tables first, then inserts fresh data:
// Fixed IDs for deterministic dev data
export const USER_ALICE_ID = "dev_user_alice_001";
export const USER_BOB_ID = "dev_user_bob_002";
export const ORG_ACME_ID = "dev_org_acme_001";
export const ORG_ALICE_PERSONAL_ID = "dev_org_alice_personal_002";
Creates:
- 2 users: Alice (alice@dev.trovella.com) and Bob (bob@dev.trovella.com)
- 2 organizations: Acme Corp (company type) and Alice's Space (personal type)
- 3 memberships: Alice owns both orgs; Bob is a member of Acme Corp
These IDs are imported by other seed tiers and by RLS integration tests.
Warning: Truncation
Development seeding runs TRUNCATE CASCADE on auth tables before inserting. This destroys any manually created data in local development. Use pnpm db:seed only when you want a clean slate.
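For readers unfamiliar with the statement involved, the sketch below builds a TRUNCATE ... CASCADE statement for a list of tables. It is an illustration under assumptions: `quoteIdent`, `buildTruncateSql`, and the table names passed in are hypothetical, and the actual development seed may issue the statement differently.

```typescript
// Quotes an identifier for PostgreSQL by wrapping it in double quotes
// and doubling any embedded double quotes.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Builds one TRUNCATE statement covering all given tables.
// CASCADE also truncates tables that reference them through foreign keys.
function buildTruncateSql(tables: string[]): string {
  if (tables.length === 0) {
    throw new Error("No tables to truncate");
  }
  return `TRUNCATE TABLE ${tables.map(quoteIdent).join(", ")} CASCADE;`;
}

// Table names below are placeholders for this example.
console.log(buildTruncateSql(["user", "organization", "member"]));
// TRUNCATE TABLE "user", "organization", "member" CASCADE;
```

Because CASCADE follows foreign keys, rows in tables that are not listed can be removed as well, which is why the warning above applies to any data linked to the auth tables.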
Tier 3: Research Data
Research data creates sample research plans, steps, and artifacts for testing the research engine. It depends on the development data IDs (Alice, Acme Corp).
Creates:
- 3 research plans: completed deep dive, in-progress execution, and a failed plan
- 12 plan steps across the 3 plans, in various statuses
- 4 skill executions: deep research, scan, research routing, and a failed execution
- 5 research artifacts: analysis, source list, synthesis, finding, and comparison
- Audit logs and MCP tool call logs for the completed plan
The research seed uses an adaptive org selection pattern: if a real (non-dev) organization exists in the database, it uses that org and its first member as the owner instead of the dev defaults (Acme Corp and Alice). This ensures seed data is visible through RLS in the admin UI during development:
const realOrgs = await db
  .select({ id: organization.id })
  .from(organization)
  .where(sql`id NOT LIKE 'dev_%'`)
  .limit(1);

let orgId = ORG_ACME_ID;
let userId = USER_ALICE_ID;
if (realOrgs[0]) {
  orgId = realOrgs[0].id;
  // resolve first member of that org...
}
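The selection logic can also be expressed as a pure function over already-fetched rows, which is convenient for testing. This is a sketch under assumptions: `pickSeedOwner`, the `OrgRow` and `MemberRow` shapes, and the fallback to dev defaults when a real org has no members are hypothetical, since the member lookup is elided in the excerpt above.

```typescript
// Fixed dev IDs, as defined in the development seed.
const ORG_ACME_ID = "dev_org_acme_001";
const USER_ALICE_ID = "dev_user_alice_001";

// Hypothetical row shapes for this sketch; the real schema may differ.
interface OrgRow {
  id: string;
}
interface MemberRow {
  organizationId: string;
  userId: string;
}

// Uses the first non-dev org and its first member when both exist.
// Otherwise falls back to the dev defaults (assumed behavior).
function pickSeedOwner(orgs: OrgRow[], members: MemberRow[]): { orgId: string; userId: string } {
  const realOrg = orgs.find((org) => !org.id.startsWith("dev_"));
  if (realOrg) {
    const firstMember = members.find((m) => m.organizationId === realOrg.id);
    if (firstMember) {
      return { orgId: realOrg.id, userId: firstMember.userId };
    }
  }
  return { orgId: ORG_ACME_ID, userId: USER_ALICE_ID };
}

// With only dev orgs present, the dev defaults are used.
console.log(pickSeedOwner([{ id: ORG_ACME_ID }], []));
```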
Tier 4: Search Data
Search data loads pre-computed vector embeddings from a JSON fixture file (seeds/fixtures/seed-embeddings.json) into the document_chunk table. This avoids calling the embedding API during seeding.
const fixturePath = resolve(dir, "./fixtures/seed-embeddings.json");
// ...
await db.insert(documentChunk).values(rows).onConflictDoNothing();
If the fixture file doesn't exist, the seed skips gracefully with a message:
Search: skipping — no seed-embeddings.json fixture found.
Run: pnpm tsx scripts/generate-seed-embeddings.ts
The insert uses ON CONFLICT DO NOTHING, so rerunning the search seed does not duplicate chunks.
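The skip-if-missing behavior might look roughly like the following. This is a sketch, not the contents of seeds/search.ts: `loadEmbeddingFixture` and the `EmbeddingRow` shape are hypothetical, and the real fixture format is not shown in this document.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Hypothetical row shape; the real fixture format may differ.
interface EmbeddingRow {
  id: string;
  content: string;
  embedding: number[];
}

// Returns the parsed fixture rows, or null when the file is missing so the
// caller can log a skip message instead of failing.
function loadEmbeddingFixture(fixturePath: string): EmbeddingRow[] | null {
  if (!existsSync(fixturePath)) {
    return null;
  }
  const parsed: unknown = JSON.parse(readFileSync(fixturePath, "utf8"));
  if (!Array.isArray(parsed)) {
    throw new Error(`Expected an array of rows in ${fixturePath}`);
  }
  return parsed as EmbeddingRow[];
}

const fixtureRows = loadEmbeddingFixture("./fixtures/does-not-exist.json");
if (fixtureRows === null) {
  console.log("Search: skipping, no fixture found.");
}
```

Returning null for a missing file while still throwing on a malformed one keeps the optional-fixture case quiet without hiding a corrupted fixture.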
Commands
| Command | What It Does |
|---|---|
| pnpm db:seed | Run all seed tiers (reference + dev/research/search if not production) |
| pnpm db:reset | Run db:migrate then db:seed -- full reset from scratch |
| pnpm db:seed-research | Run only the research seed (separate entry point) |
Seed Data in CI
CI Quality Job (Ephemeral Database)
The quality job runs pnpm db:migrate against an ephemeral PostgreSQL container but does not run seeds. RLS tests use their own test data setup.
CI migrate-prod Job (Production)
After migrations are applied to production:
NODE_ENV=production pnpm db:seed
Only reference data runs. Development users and test data are never seeded into production.
File Layout
packages/db/src/
  seed.ts                    -- Orchestrator (entry point)
  seed-research.ts           -- Standalone research seed entry point
  seeds/
    reference.ts             -- Tier 1: calls ai-models.ts
    ai-models.ts             -- AI model registry + pricing
    development.ts           -- Tier 2: test users, orgs, memberships
    research.ts              -- Tier 3: research plans, artifacts
    search.ts                -- Tier 4: document chunks with embeddings
    fixtures/
      seed-embeddings.json   -- Pre-computed embedding vectors
ID Conventions
| Context | Prefix | Example |
|---|---|---|
| Dev users | dev_user_ | dev_user_alice_001 |
| Dev orgs | dev_org_ | dev_org_acme_001 |
| Dev memberships | dev_member_ | dev_member_001 |
| Seed plans | seed_plan_ | seed_plan_competitor_analysis_001 |
| Seed steps | seed_step_ | seed_step_1a_search |
| Seed artifacts | seed_artifact_ | seed_artifact_analysis_001 |
| Test data (in tests) | test_ | test_user_001 |
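The prefixes make it easy to identify and clean up seed data. In a SQL LIKE pattern an unescaped underscore matches any single character, so escape it to match the prefix literally:
DELETE FROM research_plan WHERE id LIKE 'seed\_%';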
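The same prefix conventions can be checked in application or test code. The helper below is a hypothetical sketch: `classifyId` and `ID_PREFIXES` are illustrative names, and only the prefix strings come from the table above.

```typescript
// Prefixes taken from the ID conventions table; the helper itself is hypothetical.
const ID_PREFIXES = {
  dev: "dev_",
  seed: "seed_",
  test: "test_",
} as const;

type IdOrigin = keyof typeof ID_PREFIXES | "real";

// Classifies an ID by its prefix. startsWith compares literally, so the
// underscore is not treated as a wildcard the way it is in SQL LIKE.
function classifyId(id: string): IdOrigin {
  for (const [origin, prefix] of Object.entries(ID_PREFIXES)) {
    if (id.startsWith(prefix)) {
      return origin as IdOrigin;
    }
  }
  return "real";
}

console.log(classifyId("seed_plan_competitor_analysis_001")); // seed
console.log(classifyId("dev_user_alice_001")); // dev
```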
Related
- Schema Design — Reference Data -- conventions for reference/lookup tables
- Development Workflow -- when to run seeds during development
- CI Deployment -- how seeds run in production