Table Conventions

Column types, naming patterns, shared column helpers, indexing strategies, and JSONB typing used across all Trovella tables.

Every table in the schema follows a consistent set of conventions for column types, naming, timestamps, and indexing. These conventions exist to keep the schema predictable for both human developers and AI agents.

Primary Keys

All primary keys are text("id").primaryKey(). This matches Better Auth's ID format (text, not UUID) and avoids ::uuid casting at every join boundary.

export const researchPlan = pgTable("research_plan", {
  id: text("id").primaryKey(),
  // ...
});

IDs are generated by the application layer (Better Auth for auth tables, nanoid or similar for feature tables). The database does not generate IDs.

Shared Column Helpers

Two column spread objects in packages/db/src/schema/base.ts provide standard timestamp and tenant columns.

`tenantColumns`

Used by all tenant-scoped tables. Adds organization_id, created_at, and updated_at:

export const tenantColumns = {
  organizationId: text("organization_id").notNull(),
  createdAt: timestamp("created_at", { withTimezone: true }).defaultNow().notNull(),
  updatedAt: timestamp("updated_at", { withTimezone: true }).defaultNow().notNull(),
} as const;

Usage in a table definition:

import { tenantColumns, tenantPolicies } from "./base";

export const myTable = pgTable(
  "my_table",
  {
    id: text("id").primaryKey(),
    ...tenantColumns,
    name: text("name").notNull(),
  },
  (table) => [...tenantPolicies("my_table", "organization_id")],
).enableRLS();

`baseColumns`

Used by global (non-tenant) tables. Adds only created_at and updated_at:

export const baseColumns = {
  createdAt: timestamp("created_at", { withTimezone: true }).defaultNow().notNull(),
  updatedAt: timestamp("updated_at", { withTimezone: true }).defaultNow().notNull(),
} as const;

When to Use Which

Table type	Spread	RLS	Example
Tenant-scoped feature table	`tenantColumns`	Yes	`researchPlan`, `aiUsage`
Global reference table	`baseColumns`	No	`aiModel`
Better Auth table	Neither (manual)	Varies	`user`, `member`

Better Auth tables (user, session, account, verification, organization) define their own timestamp columns without withTimezone: true because Better Auth's plugin generates the schema. The member and invitation tables are an exception -- they are tenant-scoped and use tenantPolicies() but define their columns manually to match Better Auth's expectations.

Timestamp Conventions

All timestamps use timestamp("field", { withTimezone: true }) with two standard columns:

created_at -- set once at row creation via .defaultNow().notNull()
updated_at -- set at creation via .defaultNow().notNull(), updated by application code (not a database trigger)

Optional temporal columns follow the same { withTimezone: true } pattern:

completedAt: timestamp("completed_at", { withTimezone: true }),
startedAt: timestamp("started_at", { withTimezone: true }),
expiresAt: timestamp("expires_at", { withTimezone: true }),
revokedAt: timestamp("revoked_at", { withTimezone: true }),
lastUsedAt: timestamp("last_used_at", { withTimezone: true }),

There is no automatic updated_at trigger. The application is responsible for updating this column. This is a deliberate choice -- it avoids hidden side effects in transactions and makes update timing explicit in the code.

Column Types by Use Case

Use case	Drizzle type	PostgreSQL type	Notes
Primary key	`text("id")`	`TEXT`	Application-generated
Foreign key	`text("field_id")`	`TEXT`	Always with `.references()`
Short text	`text("name")`	`TEXT`	No `varchar` -- PostgreSQL treats them identically
Long text	`text("content")`	`TEXT`	No length limits
Boolean	`boolean("active")`	`BOOLEAN`
Integer	`integer("count")`	`INTEGER`
Decimal (money)	`numeric("cost", { precision: 10, scale: 6 })`	`NUMERIC(10,6)`	Returns string in JS -- use for financial precision
Decimal (score)	`real("confidence")`	`REAL`	4-byte float, returns native JS number
Date (no time)	`date("effective_from")`	`DATE`	Used for pricing effective dates
Enum	`myEnum("field")`	Custom enum	See Enums
Semi-structured data	`jsonb("data").$type<T>()`	`JSONB`	See below
Embedding vector	`halfvec("embedding")`	`halfvec(1536)`	Custom type, see below
UUID (rare)	`uuid("correlation_id")`	`UUID`	Only for correlation IDs with `.defaultRandom()`

`numeric` vs `real` for Decimal Values

Drizzle's numeric() always returns JavaScript strings because PostgreSQL NUMERIC can exceed Number.MAX_SAFE_INTEGER. This means code working with numeric columns must parse values explicitly.

For financial data (cost tracking), numeric is correct -- precision matters.

For confidence scores and other approximate values, real (4-byte float) returns native JavaScript numbers. The schema switched from numeric to real for confidence columns during early development to avoid unnecessary string parsing.

JSONB Column Typing

JSONB columns use Drizzle's .$type<T>() to provide TypeScript type safety without database-level schema enforcement:

// Typed object
supportedFeatures: jsonb("supported_features")
  .$type<{
    thinking?: boolean;
    vision?: boolean;
    streaming?: boolean;
    batch?: boolean;
    toolUse?: boolean;
    structuredOutputs?: boolean;
  }>()
  .notNull()
  .default({}),

// Typed array
thinkingBlocks: jsonb("thinking_blocks")
  .$type<Array<{ thinking: string; tokens?: number }>>()
  .default([]),

// Flexible record
resultSummary: jsonb("result_summary").$type<Record<string, unknown>>(),

The convention is:

Use a specific type when the shape is well-known and stable (e.g., supportedFeatures)
Use Record<string, unknown> when the shape varies by context (e.g., resultSummary, eventData, metadata)
Use string | unknown[] when the data can be either format (e.g., systemPrompt in aiCallDetails)
Validate JSONB data at the application layer with Zod, not in the database

Vector Columns (pgvector)

The document_chunk table stores embedding vectors using a custom Drizzle type for pgvector's halfvec:

const halfvec = customType<{ data: number[]; driverParam: string }>({
  dataType() {
    return "halfvec(1536)";
  },
  toDriver(value: number[]): string {
    return `[${value.join(",")}]`;
  },
  fromDriver(value: unknown): number[] {
    const str = String(value);
    return str.slice(1, -1).split(",").map(Number);
  },
});

Key design choices:

halfvec instead of vector: Half-precision (2 bytes/dimension) saves ~50% storage and memory with negligible recall loss on normalized embeddings
Dimension 1536: Matches Gemini Embedding 2's output at the MRL-truncated size, balancing cost and quality over the full 3072 dimensions
HNSW indexes (added via raw SQL in migrations): Better recall-latency than IVFFlat without periodic retraining

Naming Conventions

Table names: snake_case, singular noun or noun phrase. Examples: research_plan, plan_step, ai_usage, document_chunk.

Column names: snake_case in SQL, camelCase in TypeScript (Drizzle maps automatically). Examples: organization_id in SQL becomes organizationId in TypeScript.

Index names: tableName_columnName_idx for regular indexes, tableName_columnName_uidx for unique indexes.

index("ai_usage_organizationId_idx").on(table.organizationId),
uniqueIndex("ai_model_apiModelId_uidx").on(table.apiModelId),

Foreign key column names: Match the referenced table name with _id suffix. Examples: user_id, plan_id, model_id, source_artifact_id.

Indexing Strategy

Every tenant-scoped table indexes organization_id as a standalone column. This is critical for RLS performance -- the policy evaluates organization_id = (SELECT current_setting('app.tenant_id')) on every row access, and the index makes that comparison use an index scan instead of a sequential scan.

Beyond the tenant index, each table indexes columns used in:

Foreign key lookups: Every FK column gets an index (e.g., plan_step_planId_idx)
Status filtering: Status/enum columns used in queries (e.g., research_plan_status_idx)
Time-range queries: created_at is indexed on tables queried by time range (e.g., ai_usage_createdAt_idx)
Unique lookups: Unique constraints with unique indexes (e.g., ai_model_apiModelId_uidx)

Composite indexes are used sparingly. The document_chunk table has one for the polymorphic source reference:

index("document_chunk_source_idx").on(table.sourceTable, table.sourceId),