Embedding Generation

Gemini Embedding 2 generates 1536-dimensional vectors for document chunks, stored as halfvec in pgvector.

The embedding step converts text into 1536-dimensional vectors that capture semantic meaning. These vectors are stored in pgvector for cosine similarity search and enable the semantic leg of the hybrid search pipeline.
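pgvector's `<=>` operator orders results by cosine distance, which is 1 minus cosine similarity. The TypeScript sketch below can be handy when reasoning about scores in tests or debugging. The name `cosineDistance` is hypothetical and not part of packages/ai.

```typescript
// Hypothetical helper: cosine distance (1 - cosine similarity), the quantity
// pgvector's `<=>` operator sorts by. Smaller means more similar.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    // Cosine similarity is undefined for zero vectors.
    return NaN;
  }
  // Clamp to [-1, 1] to absorb floating-point rounding.
  const similarity = Math.min(1, Math.max(-1, dot / (Math.sqrt(normA) * Math.sqrt(normB))));
  return 1 - similarity;
}
```

Identical directions give a distance near 0, orthogonal vectors give 1, and opposite vectors give 2.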

Model Configuration

Setting	Value
Model	`gemini-embedding-2-preview` (API), `gemini-embedding-002` (internal tracking)
Output dimensions	1536 (via MRL truncation from the native 3072)
Task type (indexing)	`RETRIEVAL_DOCUMENT`
Task type (querying)	`RETRIEVAL_QUERY`
Feature tag (indexing)	`hybrid-search-index`
Feature tag (querying)	`hybrid-search-query`

The model supports Matryoshka Representation Learning (MRL): the most important information is concentrated in the leading dimensions, so the native 3072-dimensional embedding can be truncated to 1536 or 768 dimensions while keeping most of its quality. Trovella uses 1536 dimensions as a balance between recall quality and storage efficiency. Vectors are stored as halfvec(1536) in PostgreSQL. Half-precision floats use 2 bytes per dimension instead of 4, which roughly halves storage with negligible recall loss on normalized embeddings.
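The API performs MRL truncation server-side when `outputDimensionality` is set, so Trovella does not truncate vectors itself. The sketch below only illustrates what truncation amounts to: keep a prefix, then L2-normalize it. Google's embedding documentation recommends normalizing outputs smaller than 3072 dimensions; whether that applies to this preview model is an assumption worth checking. The storage formula follows the pgvector README (4 bytes per dimension for vector, 2 for halfvec, plus 8 bytes each). Both function names are hypothetical.

```typescript
// Illustrative only: Gemini truncates server-side via `outputDimensionality`.
function truncateAndNormalize(vec: number[], dims: number): number[] {
  if (dims > vec.length) {
    throw new Error(`Cannot truncate ${vec.length} dims to ${dims}`);
  }
  // MRL front-loads information, so truncation means keeping a prefix.
  const prefix = vec.slice(0, dims);
  const norm = Math.sqrt(prefix.reduce((sum, x) => sum + x * x, 0));
  // Rescale the prefix back to unit length.
  return norm === 0 ? prefix : prefix.map((x) => x / norm);
}

// Per-value storage for pgvector column types, per the pgvector README.
function pgvectorBytes(dims: number, type: "vector" | "halfvec"): number {
  return (type === "vector" ? 4 : 2) * dims + 8;
}
```

For 1536 dimensions this gives 6152 bytes as vector and 3080 bytes as halfvec, which is the roughly 50% saving described above.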

Task Types

Gemini embedding uses asymmetric task types -- the model is told whether it's embedding a document or a query, and optimizes the vector accordingly:

RETRIEVAL_DOCUMENT -- used during indexing. The embedding is optimized for being found by queries.
RETRIEVAL_QUERY -- used during search. The embedding is optimized for finding relevant documents.

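This asymmetry matters: a document chunk embedded with RETRIEVAL_DOCUMENT generally matches a query embedded with RETRIEVAL_QUERY better than it would if both sides used the same task type. Mixing them up (for example, embedding search queries with RETRIEVAL_DOCUMENT) raises no error; it can quietly degrade retrieval quality instead.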
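One way to keep the two directions from being mixed up is to derive the task type and feature tag from a single "direction" value. The sketch below is hypothetical (`buildEmbedConfig` and `EmbedConfigSketch` are not part of packages/ai); the task types and feature tags come from the Model Configuration table, and the rule that only RETRIEVAL_DOCUMENT carries a title comes from the EmbedOptions definition.

```typescript
type EmbeddingTaskType = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";
type Direction = "index" | "query";

interface EmbedConfigSketch {
  feature: string;
  taskType: EmbeddingTaskType;
  title?: string;
}

// Hypothetical helper: pick the half of the asymmetric pair that matches the
// direction, together with the matching feature tag.
function buildEmbedConfig(direction: Direction, title?: string): EmbedConfigSketch {
  if (direction === "index") {
    const config: EmbedConfigSketch = {
      feature: "hybrid-search-index",
      taskType: "RETRIEVAL_DOCUMENT",
    };
    // A title only applies to RETRIEVAL_DOCUMENT embeddings.
    if (title !== undefined) config.title = title;
    return config;
  }
  // Queries never carry a title.
  return { feature: "hybrid-search-query", taskType: "RETRIEVAL_QUERY" };
}
```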

Core Function

The embedContent function in packages/ai/src/embedding.ts is the primary embedding API:

async function embedContent(ctx: AIContext, opts: EmbedOptions): Promise<EmbedResult>;

It accepts multiple texts in a single call (batched embedding) and returns one vector per input text, in input order. The indexing pipeline passes all chunks from a single document in one call, which costs one network round trip instead of one per chunk.
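If a very large document ever produced more chunks than one request accepts, the inputs could be split into batches and the results concatenated so that output `i` still belongs to input `i`. This is a sketch under assumptions: the batch limit of 100 is a placeholder to verify against current Gemini API limits, and `embedInBatches` and its `embedBatch` callback are hypothetical.

```typescript
// Hypothetical: embed texts in fixed-size batches, preserving input order.
async function embedInBatches(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  maxBatchSize = 100, // assumed per-request limit; verify against API docs
): Promise<number[][]> {
  if (maxBatchSize < 1) throw new Error("maxBatchSize must be >= 1");
  const out: number[][] = [];
  // Batches run sequentially, which keeps ordering trivial and limits load.
  for (let i = 0; i < texts.length; i += maxBatchSize) {
    const batch = texts.slice(i, i + maxBatchSize);
    const vectors = await embedBatch(batch);
    if (vectors.length !== batch.length) {
      throw new Error(`Expected ${batch.length} vectors, got ${vectors.length}`);
    }
    out.push(...vectors);
  }
  return out;
}
```

Running batches sequentially is a deliberate simplification; parallel batches would be faster but would need explicit index bookkeeping and rate-limit handling.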

Input

interface EmbedOptions {
  contents: string[]; // Texts to embed (one per chunk)
  feature: string; // Feature tag for usage tracking
  taskType?: EmbeddingTaskType; // Default: RETRIEVAL_DOCUMENT
  title?: string; // Document title (used with RETRIEVAL_DOCUMENT)
  outputDimensionality?: number; // 768, 1536, or 3072
  correlationId?: string; // Links to other AI calls in the same run
}
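Before calling the API, the options could be normalized: apply the documented RETRIEVAL_DOCUMENT default, reject dimensions other than 768, 1536, or 3072, and drop a title that would be ignored for queries. This sketch is hypothetical (`resolveEmbedOptions` is not the actual embedContent internals), and its `EmbeddingTaskType` union is reduced to the two task types this page uses.

```typescript
type EmbeddingTaskType = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";

interface EmbedOptions {
  contents: string[];
  feature: string;
  taskType?: EmbeddingTaskType;
  title?: string;
  outputDimensionality?: number;
  correlationId?: string;
}

const ALLOWED_DIMENSIONS = new Set([768, 1536, 3072]);

// Hypothetical normalization step applied before an API call.
function resolveEmbedOptions(opts: EmbedOptions): EmbedOptions & { taskType: EmbeddingTaskType } {
  if (opts.contents.length === 0) {
    throw new Error("contents must contain at least one text");
  }
  if (opts.outputDimensionality !== undefined && !ALLOWED_DIMENSIONS.has(opts.outputDimensionality)) {
    throw new Error(`Unsupported outputDimensionality: ${opts.outputDimensionality}`);
  }
  const taskType = opts.taskType ?? "RETRIEVAL_DOCUMENT"; // documented default
  // The title is only used with RETRIEVAL_DOCUMENT, so drop it otherwise.
  const title = taskType === "RETRIEVAL_DOCUMENT" ? opts.title : undefined;
  return { ...opts, taskType, title };
}
```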

Output

interface EmbedResult {
  embeddings: number[][]; // One vector per input text
  usageId: string; // ID of the ai_usage record
  correlationId?: string; // Passed through from input
}
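Because callers pair `embeddings[i]` with `contents[i]` by position, a defensive check on the result can catch a silent mismatch before anything is written to the database. The `assertEmbedResult` guard below is a hypothetical sketch, not an existing function.

```typescript
interface EmbedResult {
  embeddings: number[][];
  usageId: string;
  correlationId?: string;
}

// Hypothetical guard: confirms embeddings[i] can safely be paired with contents[i].
function assertEmbedResult(result: EmbedResult, inputCount: number, dims: number): void {
  if (result.embeddings.length !== inputCount) {
    throw new Error(`Expected ${inputCount} embeddings, got ${result.embeddings.length}`);
  }
  result.embeddings.forEach((vec, i) => {
    if (vec.length !== dims) {
      throw new Error(`Embedding ${i} has ${vec.length} dims, expected ${dims}`);
    }
    // NaN or Infinity would corrupt similarity scores downstream.
    if (!vec.every(Number.isFinite)) {
      throw new Error(`Embedding ${i} contains a non-finite value`);
    }
  });
}
```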

Convenience Helpers

Two shorthand functions wrap embedContent for common use cases:

embedDocument(ctx, text, feature, correlationId?) -- Embeds a single text with RETRIEVAL_DOCUMENT task type. Used by the indexing pipeline.
embedQuery(ctx, query, feature, correlationId?) -- Embeds a single text with RETRIEVAL_QUERY task type. Used by the search query path.
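The helpers could plausibly be thin wrappers like the sketch below. Assumptions: they return the single vector rather than the full EmbedResult, they request 1536 dimensions, and `AIContext` is an opaque placeholder. The `embedContent` here is a stand-in that records its options and returns zero vectors so the block runs without the Gemini API; it is not the real implementation.

```typescript
type EmbeddingTaskType = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";
type AIContext = { orgId?: string }; // placeholder; the real AIContext is not shown here

interface EmbedOptions {
  contents: string[];
  feature: string;
  taskType?: EmbeddingTaskType;
  title?: string;
  outputDimensionality?: number;
  correlationId?: string;
}

interface EmbedResult {
  embeddings: number[][];
  usageId: string;
  correlationId?: string;
}

let lastOptions: EmbedOptions | undefined;

// Stand-in for embedContent: records options, returns zero vectors.
async function embedContent(_ctx: AIContext, opts: EmbedOptions): Promise<EmbedResult> {
  lastOptions = opts;
  const dims = opts.outputDimensionality ?? 1536;
  return {
    embeddings: opts.contents.map(() => new Array<number>(dims).fill(0)),
    usageId: "usage-stub",
    correlationId: opts.correlationId,
  };
}

// Sketch: wrap one text, unwrap one vector (assumed return shape).
async function embedDocument(
  ctx: AIContext,
  text: string,
  feature: string,
  correlationId?: string,
): Promise<number[]> {
  const result = await embedContent(ctx, {
    contents: [text],
    feature,
    taskType: "RETRIEVAL_DOCUMENT",
    outputDimensionality: 1536, // assumed; not documented for the helpers
    correlationId,
  });
  return result.embeddings[0];
}

async function embedQuery(
  ctx: AIContext,
  query: string,
  feature: string,
  correlationId?: string,
): Promise<number[]> {
  const result = await embedContent(ctx, {
    contents: [query],
    feature,
    taskType: "RETRIEVAL_QUERY",
    outputDimensionality: 1536, // must match the indexed halfvec(1536) column
    correlationId,
  });
  return result.embeddings[0];
}
```

Whatever the real signatures are, the query path has to request the same dimensionality as the index; a 768-dimensional query vector cannot be compared against a halfvec(1536) column.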

Usage in the Pipeline

The embedding step runs as Step 3 of the index-content Inngest function:

// From apps/web/src/inngest/functions/index-content.ts
const embeddings: number[][] = await step.run("embed-chunks", async () => {
  const texts = contextualChunks.map((c) => c.embeddedText);
  const result = await embedContent(ctx, {
    contents: texts,
    feature: "hybrid-search-index",
    taskType: "RETRIEVAL_DOCUMENT",
    title: data.title,
    outputDimensionality: 1536,
    correlationId,
  });
  return result.embeddings;
});

All chunks from a single document are embedded in one API call. The document title is passed because the task type is RETRIEVAL_DOCUMENT, the only task type that uses it; it gives the model context about the document's subject.
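After the step returns, each vector must be paired with its chunk by position before being written to the halfvec(1536) column. pgvector accepts vector values as a bracketed text literal such as `[0.1,0.2,0.3]`, so one way to pass them through a SQL driver is to serialize them. The sketch below is hypothetical: the row shape, `toVectorLiteral`, and `pairChunksWithEmbeddings` are illustrations, since the storage code is not shown on this page.

```typescript
interface ContextualChunk {
  embeddedText: string; // same field the embed-chunks step reads
}

interface ChunkRow {
  text: string;
  embedding: string; // pgvector text literal, e.g. "[0.1,0.2,0.3]"
}

// pgvector (including halfvec) parses a bracketed, comma-separated literal.
function toVectorLiteral(vec: number[]): string {
  return `[${vec.join(",")}]`;
}

// Hypothetical: pair chunk i with embedding i, failing loudly on a mismatch.
function pairChunksWithEmbeddings(chunks: ContextualChunk[], embeddings: number[][]): ChunkRow[] {
  if (chunks.length !== embeddings.length) {
    throw new Error(`Got ${embeddings.length} embeddings for ${chunks.length} chunks`);
  }
  return chunks.map((chunk, i) => ({
    text: chunk.embeddedText,
    embedding: toVectorLiteral(embeddings[i]),
  }));
}
```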

Cost Tracking

Every embedding call is recorded in ai_usage via the recordEmbedding function. Since the Gemini embedding API does not always return token counts, the system estimates input tokens at ~1 token per 4 characters. The recorded data includes:

Input token estimate
Number of embeddings generated
Latency in milliseconds
Feature tag and model ID
Correlation ID linking to other AI calls in the same indexing run
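The token heuristic and latency measurement are simple enough to sketch. Both functions are hypothetical; in particular, rounding up per text (rather than over the combined length) is an assumption. Note that JavaScript's `length` counts UTF-16 code units, which approximates characters for most text.

```typescript
// Hypothetical: ~1 token per 4 characters, rounded up per input text.
function estimateInputTokens(texts: string[]): number {
  return texts.reduce((sum, t) => sum + Math.ceil(t.length / 4), 0);
}

// Hypothetical wrapper measuring wall-clock latency of an async call in ms.
async function timed<T>(fn: () => Promise<T>): Promise<{ value: T; latencyMs: number }> {
  const start = Date.now();
  const value = await fn();
  return { value, latencyMs: Date.now() - start };
}
```

For example, inputs of 4, 5, and 0 characters are estimated at 1 + 2 + 0 = 3 tokens.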

Client Singleton

The Gemini client is a module-level singleton initialized on first use from the GOOGLE_AI_API_KEY environment variable. A resetGeminiClient() function exists for testing.
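A lazy module-level singleton with a reset hook can be sketched as follows. To keep the block runnable without Node typings, the environment lookup is injected as `readEnv` (in Node you would pass `(name) => process.env[name]`) and the client is a placeholder type. `getGeminiClient` and `GeminiClientLike` are hypothetical names; only `resetGeminiClient` and `GOOGLE_AI_API_KEY` come from this page.

```typescript
// Placeholder for the real Gemini SDK client type.
interface GeminiClientLike {
  apiKey: string;
}

let client: GeminiClientLike | null = null;

// Hypothetical lazy singleton: built on first call, reused afterwards.
function getGeminiClient(
  readEnv: (name: string) => string | undefined,
  create: (apiKey: string) => GeminiClientLike = (apiKey) => ({ apiKey }),
): GeminiClientLike {
  if (client === null) {
    const apiKey = readEnv("GOOGLE_AI_API_KEY");
    if (!apiKey) {
      throw new Error("GOOGLE_AI_API_KEY is not set");
    }
    client = create(apiKey);
  }
  return client;
}

// Test hook: the next getGeminiClient() call builds a fresh client.
function resetGeminiClient(): void {
  client = null;
}
```

Tests can call `resetGeminiClient()` between cases so that each one sees a client built from its own (mocked) environment.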

Pre-Computed Seed Embeddings

To avoid calling the Gemini API every time seed data is loaded (which would incur costs on every pnpm db:reset), seed embeddings are pre-computed once and committed to the repository at packages/db/src/seeds/fixtures/seed-embeddings.json. During seeding, the Inngest indexing trigger is a no-op, so no embedding calls are made.

Key File

packages/ai/src/embedding.ts -- the embedContent, embedDocument, and embedQuery functions, EMBEDDING_MODEL constant, and EmbedOptions/EmbedResult types.