Embedding Generation
Gemini Embedding 2 generates 1536-dimensional vectors for document chunks, stored as halfvec in pgvector.
The embedding step converts text into 1536-dimensional vectors that capture semantic meaning. These vectors are stored in pgvector for cosine similarity search and enable the semantic leg of the hybrid search pipeline.
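For readers unfamiliar with the metric, here is a minimal sketch of cosine similarity over two embedding vectors. The function names are illustrative and not part of the codebase; in production the comparison happens inside PostgreSQL, where pgvector's `<=>` operator returns cosine distance (1 minus similarity).

```typescript
// Cosine similarity between two equal-length vectors (illustrative helper).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Cosine distance, the quantity pgvector's <=> operator computes.
function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b);
}
```

Smaller distance means a closer semantic match, so a similarity search orders results by ascending distance.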
Model Configuration
| Setting | Value |
|---|---|
| Model | gemini-embedding-2-preview (API), gemini-embedding-002 (internal tracking) |
| Output dimensions | 1536 (via MRL truncation from the native 3072) |
| Task type (indexing) | RETRIEVAL_DOCUMENT |
| Task type (querying) | RETRIEVAL_QUERY |
| Feature tag (indexing) | hybrid-search-index |
| Feature tag (querying) | hybrid-search-query |
The model supports Matryoshka Representation Learning (MRL), meaning embeddings can be truncated to 768, 1536, or 3072 dimensions. Trovella uses 1536 dimensions as a balance between recall quality and storage efficiency. Vectors are stored as halfvec(1536) in PostgreSQL -- half-precision floats that use 2 bytes per dimension instead of 4, saving ~50% memory with negligible recall loss on normalized embeddings.
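The storage saving can be checked with a little arithmetic. The sketch below uses pgvector's documented sizes (4 bytes per dimension for `vector`, 2 for `halfvec`, plus 8 bytes of overhead each). It also illustrates what MRL truncation means conceptually: keep the leading values and rescale to unit length. Both function names are hypothetical, and the client-side truncation is for illustration only, since the pipeline requests 1536 dimensions from the API directly via `outputDimensionality`.

```typescript
// Per-value storage in pgvector: bytesPerDimension * dimensions + 8 bytes overhead.
function vectorStorageBytes(dimensions: number, type: "vector" | "halfvec"): number {
  const bytesPerDimension = type === "halfvec" ? 2 : 4;
  return bytesPerDimension * dimensions + 8;
}

// Illustrative MRL-style truncation: keep the first `dimensions` values,
// then rescale so the shortened vector has unit length again.
function truncateAndNormalize(embedding: number[], dimensions: number): number[] {
  if (dimensions <= 0 || dimensions > embedding.length) {
    throw new RangeError(`Cannot truncate ${embedding.length} values to ${dimensions}`);
  }
  const head = embedding.slice(0, dimensions);
  const norm = Math.sqrt(head.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? head : head.map((v) => v / norm);
}

const fullPrecision = vectorStorageBytes(1536, "vector"); // 6152
const halfPrecision = vectorStorageBytes(1536, "halfvec"); // 3080
console.log(`vector(1536): ${fullPrecision} bytes, halfvec(1536): ${halfPrecision} bytes`);
```

At 1536 dimensions the half-precision column needs 3080 bytes per row instead of 6152, which is where the roughly 50% figure comes from.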
Task Types
Gemini embedding uses asymmetric task types -- the model is told whether it's embedding a document or a query, and optimizes the vector accordingly:
- RETRIEVAL_DOCUMENT -- used during indexing. The embedding is optimized for being found by queries.
- RETRIEVAL_QUERY -- used during search. The embedding is optimized for finding relevant documents.
This asymmetry is important: a document chunk embedded with RETRIEVAL_DOCUMENT will match better against a query embedded with RETRIEVAL_QUERY than if both used the same task type.
Core Function
The embedContent function in packages/ai/src/embedding.ts is the primary embedding API:
async function embedContent(ctx: AIContext, opts: EmbedOptions): Promise<EmbedResult>;
It accepts multiple texts in a single call (batched embedding) and returns one vector per input. The indexing pipeline passes all chunks from a single document in one call, which is more efficient than per-chunk calls.
Input
interface EmbedOptions {
  contents: string[]; // Texts to embed (one per chunk)
  feature: string; // Feature tag for usage tracking
  taskType?: EmbeddingTaskType; // Default: RETRIEVAL_DOCUMENT
  title?: string; // Document title (used with RETRIEVAL_DOCUMENT)
  outputDimensionality?: number; // 768, 1536, or 3072
  correlationId?: string; // Links to other AI calls in the same run
}
Output
interface EmbedResult {
  embeddings: number[][]; // One vector per input text
  usageId: string; // ID of the ai_usage record
  correlationId?: string; // Passed through from input
}
Convenience Helpers
Two shorthand functions wrap embedContent for common use cases:
- embedDocument(ctx, text, feature, correlationId?) -- Embeds a single text with the RETRIEVAL_DOCUMENT task type. Used by the indexing pipeline.
- embedQuery(ctx, query, feature, correlationId?) -- Embeds a single text with the RETRIEVAL_QUERY task type. Used by the search query path.
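The following is a hypothetical sketch of how such helpers could wrap `embedContent`; it is not the actual source. The `embedContent` here is a stand-in stub that records its options and returns dummy vectors, `AIContext` is reduced to a placeholder type, and the `number[]` return type and the shared `embedWithTaskType` function are assumptions made for illustration.

```typescript
type EmbeddingTaskType = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";
type AIContext = Record<string, unknown>; // placeholder for the real context type

interface EmbedOptions {
  contents: string[];
  feature: string;
  taskType?: EmbeddingTaskType;
  correlationId?: string;
}

interface EmbedResult {
  embeddings: number[][];
  usageId: string;
}

// Stand-in stub: records the options it receives and returns dummy vectors.
const recordedCalls: EmbedOptions[] = [];
async function embedContent(_ctx: AIContext, opts: EmbedOptions): Promise<EmbedResult> {
  recordedCalls.push(opts);
  return { embeddings: opts.contents.map(() => [0, 0, 0]), usageId: "stub-usage-id" };
}

// Hypothetical shared implementation: embed one text and unwrap the single vector.
async function embedWithTaskType(
  ctx: AIContext,
  text: string,
  feature: string,
  taskType: EmbeddingTaskType,
  correlationId?: string,
): Promise<number[]> {
  const result = await embedContent(ctx, { contents: [text], feature, taskType, correlationId });
  const [vector] = result.embeddings;
  if (vector === undefined) throw new Error("embedContent returned no embedding");
  return vector;
}

function embedDocument(ctx: AIContext, text: string, feature: string, correlationId?: string): Promise<number[]> {
  return embedWithTaskType(ctx, text, feature, "RETRIEVAL_DOCUMENT", correlationId);
}

function embedQuery(ctx: AIContext, query: string, feature: string, correlationId?: string): Promise<number[]> {
  return embedWithTaskType(ctx, query, feature, "RETRIEVAL_QUERY", correlationId);
}
```

Fixing the task type inside each helper keeps call sites from accidentally embedding a query with the document task type, or the reverse.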
Usage in the Pipeline
The embedding step runs as Step 3 of the index-content Inngest function:
// From apps/web/src/inngest/functions/index-content.ts
const embeddings: number[][] = await step.run("embed-chunks", async () => {
  const texts = contextualChunks.map((c) => c.embeddedText);
  const result = await embedContent(ctx, {
    contents: texts,
    feature: "hybrid-search-index",
    taskType: "RETRIEVAL_DOCUMENT",
    title: data.title,
    outputDimensionality: 1536,
    correlationId,
  });
  return result.embeddings;
});
All chunks from a single document are embedded in one API call. The title parameter applies to the RETRIEVAL_DOCUMENT task type and helps the model understand the document's subject.
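Because `embedContent` returns one vector per input, the returned embeddings line up by index with the chunk texts that were sent. A minimal sketch of pairing them back up before storage might look like the following. The function and type names are hypothetical and the later pipeline steps are not shown in this section. The bracketed text literal is the input format pgvector accepts for vector values.

```typescript
interface ChunkWithEmbedding {
  text: string;
  embedding: number[];
}

// Pair each chunk text with the vector at the same index, failing loudly on
// a count or dimension mismatch rather than storing misaligned rows.
function pairChunksWithEmbeddings(
  texts: string[],
  embeddings: number[][],
  expectedDimensions: number = 1536,
): ChunkWithEmbedding[] {
  if (texts.length !== embeddings.length) {
    throw new Error(`Expected ${texts.length} embeddings, got ${embeddings.length}`);
  }
  return texts.map((text, i) => {
    const embedding = embeddings[i];
    if (embedding === undefined || embedding.length !== expectedDimensions) {
      throw new Error(`Embedding ${i} does not have ${expectedDimensions} dimensions`);
    }
    return { text, embedding };
  });
}

// Serialize a vector as a pgvector text literal, e.g. "[0.1,0.2,0.3]".
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}
```

A literal produced this way can be bound as a query parameter and cast to `halfvec(1536)` on insert.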
Cost Tracking
Every embedding call is recorded in ai_usage via the recordEmbedding function. Since the Gemini embedding API does not always return token counts, the system estimates input tokens at ~1 token per 4 characters. The recorded data includes:
- Input token estimate
- Number of embeddings generated
- Latency in milliseconds
- Feature tag and model ID
- Correlation ID linking to other AI calls in the same indexing run
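The token estimate described above is simple enough to sketch. This is an assumed implementation, not the code in recordEmbedding: the function name is hypothetical, and summing characters across all inputs before dividing, as well as rounding up, are illustrative choices.

```typescript
// Heuristic from the text: roughly 1 token per 4 characters.
const CHARS_PER_TOKEN = 4;

// Estimate input tokens for a batch of texts when the API reports no count.
// Note: string length counts UTF-16 code units, so this is only approximate.
function estimateInputTokens(contents: string[]): number {
  const totalChars = contents.reduce((sum, text) => sum + text.length, 0);
  return Math.ceil(totalChars / CHARS_PER_TOKEN);
}
```

For example, a batch of two chunks with 4 and 8 characters yields an estimate of 3 tokens.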
Client Singleton
The Gemini client is a module-level singleton initialized on first use from the GOOGLE_AI_API_KEY environment variable. A resetGeminiClient() function exists for testing.
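The lazy-singleton pattern described here can be sketched as follows. Only `resetGeminiClient` and the `GOOGLE_AI_API_KEY` variable name come from this document. `StandInGeminiClient`, `getGeminiClient`, and passing the environment as a parameter (rather than reading the process environment directly) are assumptions made so the example is self-contained.

```typescript
// Stand-in for the real Gemini SDK client.
class StandInGeminiClient {
  readonly apiKey: string;
  constructor(apiKey: string) {
    this.apiKey = apiKey;
  }
}

// Module-level cache: undefined until the first successful call.
let client: StandInGeminiClient | undefined;

// Create the client on first use, then return the same instance afterwards.
function getGeminiClient(env: Record<string, string | undefined>): StandInGeminiClient {
  if (client === undefined) {
    const apiKey = env["GOOGLE_AI_API_KEY"];
    if (!apiKey) {
      throw new Error("GOOGLE_AI_API_KEY is not set");
    }
    client = new StandInGeminiClient(apiKey);
  }
  return client;
}

// Clear the cached instance so tests can start from a clean state.
function resetGeminiClient(): void {
  client = undefined;
}
```

Once a client exists, later calls ignore the environment entirely, which is why tests that change the key need the reset function.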
Pre-Computed Seed Embeddings
To avoid calling the Gemini API every time seed data is loaded (which would incur costs on every pnpm db:reset), seed embeddings are pre-computed once and committed to the repository at packages/db/src/seeds/fixtures/seed-embeddings.json. The Inngest indexing trigger no-ops during seeding.
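Committed fixtures can drift from the schema, for instance if the output dimensionality changes. A small check such as the one below could catch that before seeding. The fixture shape assumed here (a JSON object mapping an ID to its vector) and the function name are hypothetical, since the actual layout of seed-embeddings.json is not described in this section.

```typescript
// Assumed fixture shape: { "<id>": [numbers...] }.
type SeedEmbeddingFixture = Record<string, number[]>;

// Return the IDs whose vectors have the wrong length or contain non-finite values.
function findInvalidSeedEmbeddings(
  fixture: SeedEmbeddingFixture,
  expectedDimensions: number = 1536,
): string[] {
  const invalid: string[] = [];
  for (const id of Object.keys(fixture)) {
    const vector = fixture[id];
    const ok =
      Array.isArray(vector) &&
      vector.length === expectedDimensions &&
      vector.every((v) => typeof v === "number" && isFinite(v));
    if (!ok) invalid.push(id);
  }
  return invalid;
}
```

An empty result means every fixture vector matches the halfvec(1536) column it will be loaded into.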
Key File
packages/ai/src/embedding.ts -- the embedContent, embedDocument, and embedQuery functions, EMBEDDING_MODEL constant, and EmbedOptions/EmbedResult types.