Trovella Wiki

Embedding Generation

Gemini Embedding 2 generates 1536-dimensional vectors for document chunks, stored as halfvec in pgvector.

The embedding step converts text into 1536-dimensional vectors that capture semantic meaning. These vectors are stored in pgvector for cosine similarity search and enable the semantic leg of the hybrid search pipeline.
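Cosine similarity is the dot product of two vectors divided by the product of their lengths. pgvector's `<=>` operator returns cosine *distance*, which is 1 minus that value, so smaller means more similar. The minimal sketch below computes both quantities. The function names are illustrative and do not come from the Trovella codebase.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Range [-1, 1]; 1 means same direction.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Cosine distance, the quantity pgvector's <=> operator orders by: 1 - similarity.
function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b);
}

// Same direction -> distance 0; orthogonal -> distance 1.
console.log(cosineDistance([1, 0], [2, 0])); // 0
console.log(cosineDistance([1, 0], [0, 3])); // 1
```

Vector length does not affect the result, only direction does. This is why the metric suits embeddings, where direction encodes meaning.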

Model Configuration

Setting                  Value
Model                    gemini-embedding-2-preview (API), gemini-embedding-002 (internal tracking)
Output dimensions        1536 (via MRL truncation from the native 3072)
Task type (indexing)     RETRIEVAL_DOCUMENT
Task type (querying)     RETRIEVAL_QUERY
Feature tag (indexing)   hybrid-search-index
Feature tag (querying)   hybrid-search-query

The model supports Matryoshka Representation Learning (MRL), meaning embeddings can be truncated to 768, 1536, or 3072 dimensions. Trovella uses 1536 dimensions as a balance between recall quality and storage efficiency. Vectors are stored as halfvec(1536) in PostgreSQL -- half-precision floats that use 2 bytes per dimension instead of 4, saving ~50% memory with negligible recall loss on normalized embeddings.
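MRL truncation means keeping the leading dimensions of an embedding. The sketch below is illustrative only, because in Trovella the API returns 1536 dimensions directly via outputDimensionality. It assumes the caller already holds a full-length vector and rescales the truncated prefix to unit length, since the prefix of a unit vector is generally shorter than 1. The sketch also works through the storage arithmetic, using pgvector's documented per-value sizes of 4 × dimensions + 8 bytes for vector and 2 × dimensions + 8 bytes for halfvec. The helper names are hypothetical.

```typescript
// Keep the first `dims` components of an MRL embedding, then rescale to unit length.
function truncateMrl(embedding: number[], dims: number): number[] {
  if (dims <= 0 || dims > embedding.length) {
    throw new Error(`cannot truncate ${embedding.length} dims to ${dims}`);
  }
  const head = embedding.slice(0, dims);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) throw new Error("truncated vector is all zeros");
  return head.map((x) => x / norm);
}

// pgvector storage per stored value: 4 bytes/dim for vector, 2 bytes/dim for halfvec, plus 8 bytes.
function vectorBytes(dims: number): number {
  return 4 * dims + 8;
}
function halfvecBytes(dims: number): number {
  return 2 * dims + 8;
}

const dims = 1536;
console.log(vectorBytes(dims)); // 6152
console.log(halfvecBytes(dims)); // 3080
```

At 1536 dimensions, halfvec needs 3,080 bytes per embedding versus 6,152 bytes for vector. This is the roughly 50% saving described above.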

Task Types

Gemini embedding uses asymmetric task types -- the model is told whether it's embedding a document or a query, and optimizes the vector accordingly:

  • RETRIEVAL_DOCUMENT -- used during indexing. The embedding is optimized to be found by queries.
  • RETRIEVAL_QUERY -- used during search. The embedding is optimized to find relevant documents.

This asymmetry is important: a document chunk embedded with RETRIEVAL_DOCUMENT will match better against a query embedded with RETRIEVAL_QUERY than if both used the same task type.
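In practice, the asymmetry comes down to which task type goes into each request. The sketch below builds request parameters shaped like the `@google/genai` SDK's `ai.models.embedContent({ model, contents, config })` call, where `config` carries `taskType`, `title`, and `outputDimensionality`. The `buildEmbedRequest` function and its `kind` parameter are hypothetical and do not reproduce Trovella's embedContent. The title is attached only for documents because the Gemini API applies `title` only with RETRIEVAL_DOCUMENT.

```typescript
type EmbeddingTaskType = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";

// Hypothetical request shape, modeled on the @google/genai embedContent parameters.
interface EmbedRequest {
  model: string;
  contents: string[];
  config: {
    taskType: EmbeddingTaskType;
    outputDimensionality: number;
    title?: string;
  };
}

// Indexing embeds documents; search embeds queries.
function buildEmbedRequest(
  kind: "document" | "query",
  contents: string[],
  title?: string,
): EmbedRequest {
  const taskType: EmbeddingTaskType =
    kind === "document" ? "RETRIEVAL_DOCUMENT" : "RETRIEVAL_QUERY";
  return {
    model: "gemini-embedding-2-preview",
    contents,
    config: {
      taskType,
      outputDimensionality: 1536,
      // Only attach the title for documents, the one task type it applies to.
      ...(kind === "document" && title !== undefined ? { title } : {}),
    },
  };
}

const indexReq = buildEmbedRequest("document", ["chunk one", "chunk two"], "Onboarding Guide");
const queryReq = buildEmbedRequest("query", ["how do I onboard?"], "ignored");
console.log(indexReq.config.taskType, indexReq.config.title); // RETRIEVAL_DOCUMENT Onboarding Guide
console.log(queryReq.config.taskType, queryReq.config.title); // RETRIEVAL_QUERY undefined
```

Routing both paths through one builder keeps the two sides symmetric in every respect except the task type, which is the part that should differ.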

Core Function

The embedContent function in packages/ai/src/embedding.ts is the primary embedding API:

```typescript
// Signature only; the implementation is async and resolves to an EmbedResult.
declare function embedContent(ctx: AIContext, opts: EmbedOptions): Promise<EmbedResult>;
```

It accepts multiple texts in a single call (batched embedding) and returns one vector per input. The indexing pipeline passes all chunks from a single document in one call, so indexing a document costs one API round trip rather than one per chunk.
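Because a batched call returns one vector per input, the caller has to pair vectors back with their chunks. The hedged sketch below does this by index, assuming the vectors come back in input order. It fails loudly on a count or dimension mismatch instead of silently attaching a vector to the wrong chunk. The `Chunk` type and `pairEmbeddings` helper are hypothetical, not part of packages/ai.

```typescript
// Hypothetical chunk shape; Trovella's real chunk type may differ.
interface Chunk {
  id: string;
  text: string;
}

interface EmbeddedChunk extends Chunk {
  embedding: number[];
}

// Pair each chunk with the vector at the same index (assumes input order is preserved).
function pairEmbeddings(chunks: Chunk[], embeddings: number[][], dims: number): EmbeddedChunk[] {
  if (embeddings.length !== chunks.length) {
    throw new Error(`expected ${chunks.length} embeddings, got ${embeddings.length}`);
  }
  return chunks.map((chunk, i) => {
    const embedding = embeddings[i];
    if (!embedding || embedding.length !== dims) {
      throw new Error(`chunk ${chunk.id}: expected a ${dims}-dim vector`);
    }
    return { ...chunk, embedding };
  });
}

const paired = pairEmbeddings(
  [{ id: "c1", text: "first" }, { id: "c2", text: "second" }],
  [[0.1, 0.2], [0.3, 0.4]],
  2,
);
console.log(paired.map((c) => c.id));
```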

Input

```typescript
interface EmbedOptions {
  contents: string[]; // Texts to embed (one per chunk)
  feature: string; // Feature tag for usage tracking
  taskType?: EmbeddingTaskType; // Default: RETRIEVAL_DOCUMENT
  title?: string; // Document title (used with RETRIEVAL_DOCUMENT)
  outputDimensionality?: number; // 768, 1536, or 3072
  correlationId?: string; // Links to other AI calls in the same run
}
```
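The defaults and limits noted in the comments (RETRIEVAL_DOCUMENT when taskType is omitted, dimensions restricted to 768, 1536, or 3072) can be checked up front, before any network call. The minimal sketch below works under those assumptions. `resolveEmbedOptions` and the empty-contents check are hypothetical. EmbeddingTaskType is narrowed here to the two values this page uses, and outputDimensionality is left unset when omitted because this page does not state its default.

```typescript
// Narrowed to the two task types this page uses; the real type may have more members.
type EmbeddingTaskType = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";

interface EmbedOptions {
  contents: string[];
  feature: string;
  taskType?: EmbeddingTaskType;
  title?: string;
  outputDimensionality?: number;
  correlationId?: string;
}

const ALLOWED_DIMENSIONS = new Set([768, 1536, 3072]);

// Hypothetical pre-flight step: apply the task-type default and reject bad input early.
function resolveEmbedOptions(
  opts: EmbedOptions,
): EmbedOptions & { taskType: EmbeddingTaskType } {
  if (opts.contents.length === 0) {
    throw new Error("contents must contain at least one text");
  }
  if (
    opts.outputDimensionality !== undefined &&
    !ALLOWED_DIMENSIONS.has(opts.outputDimensionality)
  ) {
    throw new Error(`unsupported outputDimensionality: ${opts.outputDimensionality}`);
  }
  // Documented default: omitting taskType means document (indexing) embeddings.
  const taskType = opts.taskType ?? "RETRIEVAL_DOCUMENT";
  return { ...opts, taskType };
}
```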

Output

```typescript
interface EmbedResult {
  embeddings: number[][]; // One vector per input text
  usageId: string; // ID of the ai_usage record
  correlationId?: string; // Passed through from input
}
```

Convenience Helpers

Two shorthand functions wrap embedContent for common use cases:

  • embedDocument(ctx, text, feature, correlationId?) -- Embeds a single text with RETRIEVAL_DOCUMENT task type. Used by the indexing pipeline.
  • embedQuery(ctx, query, feature, correlationId?) -- Embeds a single text with RETRIEVAL_QUERY task type. Used by the search query path.
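Both helpers can be thin wrappers that fix the task type and unwrap the single vector. In the sketch below, `embedContent` is a local fake so the example runs offline. `AIContext` is a placeholder, the `embedOne` helper is hypothetical, and the `number[]` return type is an assumption, since this page gives only the helpers' parameters.

```typescript
type EmbeddingTaskType = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";

// Placeholder types; the real AIContext, EmbedOptions, and EmbedResult live in packages/ai.
type AIContext = Record<string, unknown>;

interface EmbedOptions {
  contents: string[];
  feature: string;
  taskType?: EmbeddingTaskType;
  correlationId?: string;
}

interface EmbedResult {
  embeddings: number[][];
  usageId: string;
  correlationId?: string;
}

// Fake embedContent for illustration: encodes each text's length and whether it was a query.
async function embedContent(_ctx: AIContext, opts: EmbedOptions): Promise<EmbedResult> {
  const queryFlag = opts.taskType === "RETRIEVAL_QUERY" ? 1 : 0;
  return {
    embeddings: opts.contents.map((text) => [text.length, queryFlag]),
    usageId: "fake-usage-id",
    correlationId: opts.correlationId,
  };
}

// Hypothetical shared wrapper: embed one text with a fixed task type, return its vector.
async function embedOne(
  ctx: AIContext,
  text: string,
  feature: string,
  taskType: EmbeddingTaskType,
  correlationId?: string,
): Promise<number[]> {
  const result = await embedContent(ctx, { contents: [text], feature, taskType, correlationId });
  const vector = result.embeddings[0];
  if (!vector) throw new Error("embedContent returned no vector");
  return vector;
}

async function embedDocument(
  ctx: AIContext,
  text: string,
  feature: string,
  correlationId?: string,
): Promise<number[]> {
  return embedOne(ctx, text, feature, "RETRIEVAL_DOCUMENT", correlationId);
}

async function embedQuery(
  ctx: AIContext,
  query: string,
  feature: string,
  correlationId?: string,
): Promise<number[]> {
  return embedOne(ctx, query, feature, "RETRIEVAL_QUERY", correlationId);
}
```

Fixing the task type inside each helper means a call site cannot accidentally embed a search query with RETRIEVAL_DOCUMENT.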

Usage in the Pipeline

The embedding step runs as Step 3 of the index-content Inngest function:

```typescript
// From apps/web/src/inngest/functions/index-content.ts
// Excerpt: step, ctx, data, correlationId, and contextualChunks come from the enclosing function.
const embeddings: number[][] = await step.run("embed-chunks", async () => {
  const texts = contextualChunks.map((c) => c.embeddedText);
  const result = await embedContent(ctx, {
    contents: texts,
    feature: "hybrid-search-index",
    taskType: "RETRIEVAL_DOCUMENT",
    title: data.title,
    outputDimensionality: 1536,
    correlationId,
  });
  return result.embeddings;
});
```

All chunks from a single document are embedded in one API call. The document title is passed because the task type is RETRIEVAL_DOCUMENT, the only task type for which Gemini uses a title; it gives the model context about the document's subject.

Cost Tracking

Every embedding call is recorded in ai_usage via the recordEmbedding function. Since the Gemini embedding API does not always return token counts, the system estimates input tokens at ~1 token per 4 characters. The recorded data includes:

  • Input token estimate
  • Number of embeddings generated
  • Latency in milliseconds
  • Feature tag and model ID
  • Correlation ID linking to other AI calls in the same indexing run
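The ~4 characters per token heuristic is simple arithmetic. The sketch below assumes the estimate is taken over the combined length of all texts in the call and rounded up; the real recordEmbedding may round or aggregate differently. The `EmbeddingUsage` field names are hypothetical stand-ins for the data listed above, not the actual ai_usage columns.

```typescript
// ~1 token per 4 characters: the fallback when the API omits token counts.
const CHARS_PER_TOKEN = 4;

// String.length counts UTF-16 code units, so this is only an approximation.
function estimateInputTokens(texts: string[]): number {
  const totalChars = texts.reduce((sum, text) => sum + text.length, 0);
  return Math.ceil(totalChars / CHARS_PER_TOKEN);
}

// Hypothetical record shape; the real ai_usage columns may differ.
interface EmbeddingUsage {
  inputTokensEstimate: number;
  embeddingCount: number;
  latencyMs: number;
  feature: string;
  modelId: string;
  correlationId?: string;
}

function buildEmbeddingUsage(
  texts: string[],
  embeddingCount: number,
  latencyMs: number,
  feature: string,
  correlationId?: string,
): EmbeddingUsage {
  return {
    inputTokensEstimate: estimateInputTokens(texts),
    embeddingCount,
    latencyMs,
    feature,
    modelId: "gemini-embedding-002", // internal tracking ID from the configuration table
    correlationId,
  };
}

// 1,000 + 2,000 + 1,002 = 4,002 characters -> ceil(1000.5) = 1001 tokens.
console.log(estimateInputTokens(["a".repeat(1000), "b".repeat(2000), "c".repeat(1002)])); // 1001
```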

Client Singleton

The Gemini client is a module-level singleton initialized on first use from the GOOGLE_AI_API_KEY environment variable. A resetGeminiClient() function exists for testing.
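The lazy module-level singleton pattern can be sketched generically. Here the environment is passed in and `FakeClient` replaces the SDK client so the example runs without credentials. With `@google/genai`, the construction step would likely be `new GoogleGenAI({ apiKey })`. `getGeminiClient` and `FakeClient` are hypothetical names; only resetGeminiClient is named on this page.

```typescript
// Hypothetical stand-in for the SDK client.
class FakeClient {
  readonly apiKey: string;
  constructor(apiKey: string) {
    this.apiKey = apiKey;
  }
}

// Module-level slot: empty until first use, then shared by every caller.
let client: FakeClient | undefined;

function getGeminiClient(env: Record<string, string | undefined>): FakeClient {
  if (client === undefined) {
    const apiKey = env["GOOGLE_AI_API_KEY"];
    if (!apiKey) {
      throw new Error("GOOGLE_AI_API_KEY is not set");
    }
    client = new FakeClient(apiKey);
  }
  return client;
}

// Test hook: forget the cached instance so the next call builds a fresh client.
function resetGeminiClient(): void {
  client = undefined;
}
```

Initializing on first use, rather than at import time, means modules that never embed anything do not need the API key. The reset hook lets tests swap keys or mocks between cases.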

Pre-Computed Seed Embeddings

To avoid calling the Gemini API every time seed data is loaded (which would incur costs on every pnpm db:reset), seed embeddings are pre-computed once and committed to the repository at packages/db/src/seeds/fixtures/seed-embeddings.json. The Inngest indexing trigger is a no-op during seeding.

Key File

packages/ai/src/embedding.ts -- the embedContent, embedDocument, and embedQuery functions, EMBEDDING_MODEL constant, and EmbedOptions/EmbedResult types.
