Trovella Wiki

Seed Data

The tiered seeding system for reference data, development data, research fixtures, and search embeddings.

Trovella uses a tiered seeding system that controls which seed data is loaded in each environment. The seed orchestrator (packages/db/src/seed.ts) calls each tier in order and runs the development, research, and search tiers only when NODE_ENV is not production.

Seed Tiers

| Tier | File | Environments | Idempotent? | Purpose |
| --- | --- | --- | --- | --- |
| Reference | seeds/reference.ts | All (dev, staging, prod) | Yes | Lookup data the app depends on |
| Development | seeds/development.ts | Dev, staging only | No (truncates first) | Test users, orgs, memberships |
| Research | seeds/research.ts | Dev, staging only | Depends | Research plans, steps, artifacts for testing |
| Search | seeds/search.ts | Dev, staging only | Yes | Document chunks with pre-computed embeddings |

Orchestrator

The seed entry point is packages/db/src/seed.ts:

```typescript
import { db } from "./client";
import { seedDevelopmentData } from "./seeds/development";
import { seedReferenceData } from "./seeds/reference";
import { seedResearchData } from "./seeds/research";
import { seedSearchData } from "./seeds/search";

async function main() {
  await seedReferenceData(db);

  if (process.env["NODE_ENV"] !== "production") {
    await seedDevelopmentData(db);
    await seedResearchData(db);
    await seedSearchData(db);
  }
}
```

In production (NODE_ENV=production), only seedReferenceData runs. This is enforced in the CI migrate-prod job which sets NODE_ENV=production explicitly.
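
The gating rule can be restated as a small pure function, which makes the behavior for each NODE_ENV value easy to check. This is only an illustrative sketch: `tiersForEnv` and the `Tier` type are hypothetical names, not part of the orchestrator.

```typescript
// Hypothetical helper: lists the tiers that would run for a given NODE_ENV value.
type Tier = "reference" | "development" | "research" | "search";

function tiersForEnv(nodeEnv: string | undefined): Tier[] {
  // Reference data runs in every environment.
  const tiers: Tier[] = ["reference"];

  // Same comparison as the orchestrator: anything other than the exact
  // string "production" also gets the development-only tiers.
  if (nodeEnv !== "production") {
    tiers.push("development", "research", "search");
  }
  return tiers;
}

console.log(tiersForEnv("production"));
console.log(tiersForEnv("staging"));
console.log(tiersForEnv(undefined));
```

Because the check is a strict inequality, an unset NODE_ENV is treated as non-production, so the development tiers (including the truncation step) run unless the variable is set explicitly.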

Tier 1: Reference Data

Reference data is lookup/registry data that the application requires to function. It must be idempotent -- safe to run multiple times without duplication.

Currently includes:

  • AI model registry (seeds/ai-models.ts): 4 models (Claude Opus 4.6, Sonnet 4.6, Haiku 4.5, Gemini Embedding 2) with pricing records. Uses ON CONFLICT DO NOTHING for idempotence:

```typescript
await db.insert(aiModel).values([...]).onConflictDoNothing();
await db.insert(aiModelPricing).values([...]).onConflictDoNothing();
```

Adding New Reference Data

  1. Create a new seed function in packages/db/src/seeds/
  2. Make it idempotent (use ON CONFLICT DO NOTHING or upsert patterns)
  3. Call it from seedReferenceData() in seeds/reference.ts
  4. Test: pnpm db:seed should be safe to run repeatedly without errors or duplicates
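
The idempotence requirement in steps 2 and 4 can be sketched without a database. The example below assumes an in-memory stand-in for a table with a primary key: `FakeTable`, `seedExampleLookup`, and `isIdempotent` are hypothetical names, and the rows are placeholders. It shows the property a reference seed should have: running it a second time leaves the row count unchanged.

```typescript
// In-memory stand-in for a table with a primary key (not the real database).
type Row = { id: string; name: string };

class FakeTable {
  private rows = new Map<string, Row>();

  // Mimics INSERT ... ON CONFLICT DO NOTHING: rows whose key already
  // exists are skipped, and the existing row is left untouched.
  insertOnConflictDoNothing(values: Row[]): void {
    for (const row of values) {
      if (!this.rows.has(row.id)) this.rows.set(row.id, row);
    }
  }

  count(): number {
    return this.rows.size;
  }
}

// Hypothetical reference seed; the IDs and names are placeholders.
async function seedExampleLookup(table: FakeTable): Promise<void> {
  table.insertOnConflictDoNothing([
    { id: "example_a", name: "Example A" },
    { id: "example_b", name: "Example B" },
  ]);
}

// Runs a seed twice and reports whether the second run changed the row count.
async function isIdempotent(
  seed: (table: FakeTable) => Promise<void>,
  table: FakeTable,
): Promise<boolean> {
  await seed(table);
  const afterFirstRun = table.count();
  await seed(table);
  return table.count() === afterFirstRun;
}

isIdempotent(seedExampleLookup, new FakeTable()).then((ok) => {
  console.log(ok ? "seed is idempotent" : "seed created duplicates");
});
```

Against the real database, the equivalent check is to run pnpm db:seed twice and compare row counts in the affected tables.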

Tier 2: Development Data

Development data creates a predictable local environment for manual testing and RLS integration tests. It uses fixed, deterministic IDs so tests can reference them directly.

The seed truncates all auth-related tables first, then inserts fresh data keyed by these exported constants:

```typescript
// Fixed IDs for deterministic dev data
export const USER_ALICE_ID = "dev_user_alice_001";
export const USER_BOB_ID = "dev_user_bob_002";
export const ORG_ACME_ID = "dev_org_acme_001";
export const ORG_ALICE_PERSONAL_ID = "dev_org_alice_personal_002";
```

Creates:

  • 2 users: Alice (alice@dev.trovella.com) and Bob (bob@dev.trovella.com)
  • 2 organizations: Acme Corp (company type) and Alice's Space (personal type)
  • 3 memberships: Alice owns both orgs, Bob is a member of Acme Corp

These IDs are imported by other seed tiers and by RLS integration tests.
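
As an illustration of how a test might lean on the fixed IDs, the sketch below writes the three memberships out as data and derives which orgs each user belongs to. The `Membership` shape, the role values, and `visibleOrgIds` are assumptions made for this example and do not reflect the real schema. It also assumes that RLS visibility follows membership.

```typescript
// Fixed IDs copied from the development seed.
const USER_ALICE_ID = "dev_user_alice_001";
const USER_BOB_ID = "dev_user_bob_002";
const ORG_ACME_ID = "dev_org_acme_001";
const ORG_ALICE_PERSONAL_ID = "dev_org_alice_personal_002";

// Assumed shape: field names and role values are illustrative only.
type Membership = { userId: string; orgId: string; role: "owner" | "member" };

// Alice owns both orgs; Bob is a member of Acme Corp.
const DEV_MEMBERSHIPS: Membership[] = [
  { userId: USER_ALICE_ID, orgId: ORG_ACME_ID, role: "owner" },
  { userId: USER_ALICE_ID, orgId: ORG_ALICE_PERSONAL_ID, role: "owner" },
  { userId: USER_BOB_ID, orgId: ORG_ACME_ID, role: "member" },
];

// Orgs a user belongs to. An RLS test could compare the rows a query
// returns for that user against this list.
function visibleOrgIds(userId: string): string[] {
  return DEV_MEMBERSHIPS.filter((m) => m.userId === userId).map((m) => m.orgId);
}

console.log(visibleOrgIds(USER_ALICE_ID));
console.log(visibleOrgIds(USER_BOB_ID));
```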

Warning: Truncation

Development seeding runs TRUNCATE CASCADE on auth tables before inserting. This destroys any manually created data in local development. Use pnpm db:seed only when you want a clean slate.

Tier 3: Research Data

Research data creates sample research plans, steps, and artifacts for testing the research engine. It depends on the development data IDs (Alice, Acme Corp).

Creates:

  • 3 research plans: completed deep dive, in-progress execution, and a failed plan
  • 12 plan steps across the 3 plans, in various statuses
  • 4 skill executions: deep research, scan, research routing, and a failed execution
  • 5 research artifacts: analysis, source list, synthesis, finding, and comparison
  • Audit logs and MCP tool call logs for the completed plan

The research seed selects its organization adaptively: if a real (non-dev) organization exists in the database, it uses that org and its first member as the owner; otherwise it falls back to Acme Corp and Alice. This keeps seed data visible through RLS in the admin UI during development:

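```typescript
// Note: "_" is a single-character wildcard in LIKE, so this pattern matches
// "dev" followed by any character, not only the literal prefix "dev_".
const realOrgs = await db
  .select({ id: organization.id })
  .from(organization)
  .where(sql`id NOT LIKE 'dev_%'`)
  .limit(1);

// Default to the development seed's org and user
let orgId = ORG_ACME_ID;
let userId = USER_ALICE_ID;

if (realOrgs[0]) {
  orgId = realOrgs[0].id;
  // resolve first member of that org...
}
```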
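
The selection logic can also be expressed as a pure function over already-loaded rows, which is easier to unit test than the query. This is a sketch under stated assumptions: `pickSeedOwner` and the `Org` and `Member` shapes are hypothetical. The fallback when a real org has no members is also an assumption, since the excerpt above elides that step.

```typescript
const ORG_ACME_ID = "dev_org_acme_001";
const USER_ALICE_ID = "dev_user_alice_001";

// Assumed minimal shapes; the real tables have more columns.
type Org = { id: string };
type Member = { organizationId: string; userId: string };
type SeedOwner = { orgId: string; userId: string };

const DEV_DEFAULT: SeedOwner = { orgId: ORG_ACME_ID, userId: USER_ALICE_ID };

function pickSeedOwner(orgs: Org[], members: Member[]): SeedOwner {
  // startsWith matches the literal prefix "dev_" exactly.
  const realOrg = orgs.find((org) => !org.id.startsWith("dev_"));
  if (!realOrg) return DEV_DEFAULT;

  const firstMember = members.find((m) => m.organizationId === realOrg.id);
  // Assumption: if the real org has no members, fall back to the dev
  // defaults so the org and the owner always belong together.
  if (!firstMember) return DEV_DEFAULT;

  return { orgId: realOrg.id, userId: firstMember.userId };
}

console.log(pickSeedOwner([{ id: "dev_org_acme_001" }], []));
console.log(
  pickSeedOwner(
    [{ id: "dev_org_acme_001" }, { id: "org_real_1" }],
    [{ organizationId: "org_real_1", userId: "user_real_1" }],
  ),
);
```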

Tier 4: Search Data

Search data loads pre-computed vector embeddings from a JSON fixture file (seeds/fixtures/seed-embeddings.json) into the document_chunk table. This avoids calling the embedding API during seeding.

```typescript
const fixturePath = resolve(dir, "./fixtures/seed-embeddings.json");
// ...
await db.insert(documentChunk).values(rows).onConflictDoNothing();
```

If the fixture file doesn't exist, the seed skips gracefully with a message:

```text
Search: skipping — no seed-embeddings.json fixture found.
Run: pnpm tsx scripts/generate-seed-embeddings.ts
```

The insert uses ON CONFLICT DO NOTHING, so re-running the search seed does not duplicate chunks.
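
The skip-if-missing behavior might look roughly like the sketch below. `loadFixtureRows` is a hypothetical helper, the fixture is assumed to be a JSON array, and the log wording here is a paraphrase rather than the seed's actual message.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Returns the parsed fixture rows, or null when the fixture file is absent.
// Assumption: the fixture is a JSON array; element fields are not modeled here.
function loadFixtureRows(fixturePath: string): unknown[] | null {
  if (!existsSync(fixturePath)) {
    // Skip quietly instead of failing the whole seed run.
    console.log(`Search: skipping, no fixture found at ${fixturePath}`);
    return null;
  }

  const parsed: unknown = JSON.parse(readFileSync(fixturePath, "utf8"));
  if (!Array.isArray(parsed)) {
    throw new Error(`Expected a JSON array in ${fixturePath}`);
  }
  return parsed;
}

const rows = loadFixtureRows("./fixtures/seed-embeddings.json");
console.log(rows === null ? "nothing to insert" : `${rows.length} rows loaded`);
```

Returning null for a missing file keeps the skip distinct from an empty fixture, which returns an empty array.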

Commands

| Command | What It Does |
| --- | --- |
| pnpm db:seed | Run all seed tiers (reference + dev/research/search if not production) |
| pnpm db:reset | Run db:migrate then db:seed -- full reset from scratch |
| pnpm db:seed-research | Run only the research seed (separate entry point) |

Seed Data in CI

CI Quality Job (Ephemeral Database)

The quality job runs pnpm db:migrate against an ephemeral PostgreSQL container but does not run seeds. RLS tests use their own test data setup.

CI migrate-prod Job (Production)

After migrations are applied to production:

```shell
NODE_ENV=production pnpm db:seed
```

Only reference data runs. Development users and test data are never seeded into production.

File Layout

```text
packages/db/src/
  seed.ts                          -- Orchestrator (entry point)
  seed-research.ts                 -- Standalone research seed entry point
  seeds/
    reference.ts                   -- Tier 1: calls ai-models.ts
    ai-models.ts                   -- AI model registry + pricing
    development.ts                 -- Tier 2: test users, orgs, memberships
    research.ts                    -- Tier 3: research plans, artifacts
    search.ts                      -- Tier 4: document chunks with embeddings
    fixtures/
      seed-embeddings.json         -- Pre-computed embedding vectors
```

ID Conventions

| Context | Prefix | Example |
| --- | --- | --- |
| Dev users | dev_user_ | dev_user_alice_001 |
| Dev orgs | dev_org_ | dev_org_acme_001 |
| Dev memberships | dev_member_ | dev_member_001 |
| Seed plans | seed_plan_ | seed_plan_competitor_analysis_001 |
| Seed steps | seed_step_ | seed_step_1a_search |
| Seed artifacts | seed_artifact_ | seed_artifact_analysis_001 |
| Test data (in tests) | test_ | test_user_001 |

The prefixes make it easy to identify and clean up seed data:

```sql
DELETE FROM research_plan WHERE id LIKE 'seed_%';
```
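
The same prefixes can be checked in application code. The sketch below maps each prefix from the table above to its context, using a hypothetical `classifyId` helper. One detail to keep in mind: in SQL LIKE patterns, `_` matches any single character, so `'seed_%'` also matches an ID such as `seedling_42`. The `startsWith` check here matches the literal prefix only.

```typescript
// Prefixes from the ID conventions table, paired with their context.
const ID_PREFIXES = [
  ["dev_user_", "Dev users"],
  ["dev_org_", "Dev orgs"],
  ["dev_member_", "Dev memberships"],
  ["seed_plan_", "Seed plans"],
  ["seed_step_", "Seed steps"],
  ["seed_artifact_", "Seed artifacts"],
  ["test_", "Test data"],
] as const;

// Hypothetical helper: returns the context for a seeded ID, or null when
// the ID carries none of the known prefixes.
function classifyId(id: string): string | null {
  for (const [prefix, context] of ID_PREFIXES) {
    if (id.startsWith(prefix)) return context;
  }
  return null;
}

console.log(classifyId("dev_user_alice_001"));
console.log(classifyId("seed_plan_competitor_analysis_001"));
console.log(classifyId("seedling_42"));
```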
