Contextual Retrieval

Claude Haiku generates a 2-3 sentence context prefix for each chunk before embedding, reducing retrieval failures by ~49%.

Contextual retrieval is Anthropic's technique for improving chunk-based search. The problem: when a document is split into chunks, each chunk loses the context of the full document. A chunk saying "the company's revenue grew 15%" is hard to match to a query such as "Apple financial performance", because nothing in the chunk says which company it refers to.

The solution: before embedding each chunk, an LLM generates a short context prefix that situates the chunk within its source document. The prefix is prepended to the original text, and the combined text is what gets embedded and indexed. Anthropic's research reports that applying this to both embeddings and BM25 keyword search reduces retrieval failures by approximately 49%.

How It Works

For each chunk in a multi-chunk document, Claude Haiku receives the full document and the specific chunk, then generates a 2-3 sentence context prefix:

Full document  ─┐
                ├─→  Claude Haiku  ─→  "This chunk is from a Q3 2025 earnings analysis
Specific chunk ─┘                       of Apple Inc. It discusses revenue growth in
                                        the Services segment."

The context prefix is prepended to the original chunk text with a double newline separator:

embeddedText = contextPrefix + "\n\n" + originalText

This combined embeddedText is what gets:

Embedded by Gemini (stored as a vector in pgvector)
Indexed in Typesense (used for BM25 keyword matching)

The original text and context prefix are stored separately in the document_chunk table for display and debugging.
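The assembly step described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the field names follow the excerpt from index-content.ts shown later on this page, while the ChunkRecord type and the buildChunkRecord helper are hypothetical names introduced here.

```typescript
// Hypothetical record shape; field names mirror the index-content.ts excerpt.
interface ChunkRecord {
  index: number;
  originalText: string;
  contextPrefix: string;
  embeddedText: string;
}

// Assumed helper: joins prefix and chunk with a double newline separator.
// An empty prefix (as used for single-chunk documents) leaves the text unchanged.
function buildChunkRecord(
  index: number,
  originalText: string,
  contextPrefix: string,
): ChunkRecord {
  const embeddedText =
    contextPrefix === "" ? originalText : `${contextPrefix}\n\n${originalText}`;
  return { index, originalText, contextPrefix, embeddedText };
}

const record = buildChunkRecord(
  0,
  "The company's revenue grew 15%.",
  "This chunk is from a Q3 2025 earnings analysis of Apple Inc.",
);
console.log(record.embeddedText);
```

Keeping originalText and contextPrefix alongside embeddedText means the displayed text never includes the generated prefix, while the vector and keyword indexes both see the enriched version.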

System Prompt

The Haiku call uses a focused system prompt:

You are an expert at providing concise context for document chunks used in
search retrieval.

Given a full document and a specific chunk from that document, write a short
(2-3 sentence) context that situates the chunk within the overall document.
Include key entities, topics, and relationships that would help a search system
match this chunk to relevant queries.

Respond with ONLY the context text -- no labels, prefixes, or formatting.

The user message provides the full document in <document> tags and the chunk in <chunk> tags.
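One way the user message could be assembled is sketched below. The source only states that the document and chunk are wrapped in <document> and <chunk> tags; the exact whitespace, the tag order, and the buildUserMessage name are assumptions made for illustration.

```typescript
// Assumed layout: only the tag names come from the description above.
function buildUserMessage(fullDocument: string, chunkText: string): string {
  return [
    "<document>",
    fullDocument,
    "</document>",
    "",
    "<chunk>",
    chunkText,
    "</chunk>",
  ].join("\n");
}

const userMessage = buildUserMessage(
  "Apple Inc. Q3 2025 earnings analysis ...",
  "The company's revenue grew 15%.",
);
console.log(userMessage);
```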

Model and Configuration

Setting	Value
Model	`claude-haiku-4-5`
Max tokens	200
Thinking	Disabled
Feature tag	`hybrid-search-contextual-retrieval`
Document truncation	First 150,000 characters (~37k tokens)

The 150k character truncation keeps the document portion of the prompt within Haiku's context window. At ~4 characters per token, this is approximately 37,500 tokens -- well within the 200k token limit. Virtually all research documents are shorter than 150,000 characters, so truncation rarely applies.
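The truncation and the token estimate can be checked with a short sketch. The constant and function names are hypothetical, and the 4-characters-per-token ratio is the rough heuristic used in the text, not a real tokenizer.

```typescript
const MAX_DOCUMENT_CHARS = 150_000; // truncation limit from the table above
const CHARS_PER_TOKEN = 4; // rough heuristic, not an actual token count

// Keeps only the first maxChars characters (UTF-16 code units) of the document.
function truncateDocument(
  text: string,
  maxChars: number = MAX_DOCUMENT_CHARS,
): string {
  return text.length > maxChars ? text.slice(0, maxChars) : text;
}

// Estimates the token count from the character count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

const longDocument = "x".repeat(400_000);
const truncated = truncateDocument(longDocument);
console.log(truncated.length, estimateTokens(truncated)); // 150000 37500
```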

Single-Chunk Optimization

When a document produces only one chunk (content under ~2048 characters), contextual retrieval is skipped entirely. The rationale: a single chunk already contains the full document, so there is no lost context to restore. This saves one Haiku API call per short document.

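// From index-content.ts, Step 2 (excerpt; the `continue` implies this runs inside the loop over chunks)
if (chunks.length === 1) {
  // Single-chunk document: skip the Haiku call and embed the text as-is.
  results.push({
    index: chunk.index,
    originalText: chunk.text,
    contextPrefix: "",
    embeddedText: chunk.text,
  });
  continue;
}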
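To show where the excerpt above sits in the overall flow, here is a self-contained sketch of the surrounding loop. It is an assumption about the structure, not a copy of index-content.ts: the Chunk and ContextualizedChunk types, the contextualizeChunks function, and the injected ContextGenerator (a stand-in for the Haiku call) are all hypothetical.

```typescript
interface Chunk {
  index: number;
  text: string;
}

interface ContextualizedChunk {
  index: number;
  originalText: string;
  contextPrefix: string;
  embeddedText: string;
}

// Hypothetical stand-in for the Haiku call: returns the context prefix for a chunk.
type ContextGenerator = (
  fullDocument: string,
  chunkText: string,
) => Promise<string>;

async function contextualizeChunks(
  fullDocument: string,
  chunks: Chunk[],
  generateContext: ContextGenerator,
): Promise<ContextualizedChunk[]> {
  const results: ContextualizedChunk[] = [];
  for (const chunk of chunks) {
    // Single-chunk documents skip the model call entirely.
    if (chunks.length === 1) {
      results.push({
        index: chunk.index,
        originalText: chunk.text,
        contextPrefix: "",
        embeddedText: chunk.text,
      });
      continue;
    }
    const contextPrefix = await generateContext(fullDocument, chunk.text);
    results.push({
      index: chunk.index,
      originalText: chunk.text,
      contextPrefix,
      embeddedText: `${contextPrefix}\n\n${chunk.text}`,
    });
  }
  return results;
}

// Usage with a stub generator instead of a real model call.
const stubGenerator: ContextGenerator = async () =>
  "Stub context for illustration.";
contextualizeChunks(
  "full document text",
  [
    { index: 0, text: "first chunk" },
    { index: 1, text: "second chunk" },
  ],
  stubGenerator,
).then((rows) => console.log(rows.map((row) => row.embeddedText)));
```

Injecting the generator keeps the loop testable without network access; the sketch processes chunks sequentially, which may differ from how the real pipeline schedules its calls.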

Cost Tracking

Each Haiku call is automatically recorded in ai_usage by the chatCompletion function in @repo/ai. The correlationId links the Haiku call to the corresponding Gemini embedding call for the same indexing run, enabling per-document cost analysis.

The feature tag hybrid-search-contextual-retrieval allows filtering AI usage specifically for contextual retrieval costs, separate from other Haiku usage in the system.
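The per-document cost analysis described above amounts to grouping usage rows by correlationId, optionally filtered by feature tag. The sketch below works on in-memory rows; the UsageRow shape (including the costMicroUsd field) and the "example-embedding" tag are hypothetical, since the actual ai_usage columns are not shown here.

```typescript
// Hypothetical row shape; the real ai_usage schema may differ.
interface UsageRow {
  feature: string;
  correlationId: string | null;
  costMicroUsd: number; // integer micro-dollars, assumed for exact sums
}

// Sums cost per correlationId, optionally restricted to one feature tag.
function costByCorrelationId(
  rows: UsageRow[],
  feature?: string,
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    if (row.correlationId === null) continue;
    if (feature !== undefined && row.feature !== feature) continue;
    const previous = totals.get(row.correlationId) ?? 0;
    totals.set(row.correlationId, previous + row.costMicroUsd);
  }
  return totals;
}

const rows: UsageRow[] = [
  { feature: "hybrid-search-contextual-retrieval", correlationId: "run-1", costMicroUsd: 120 },
  { feature: "hybrid-search-contextual-retrieval", correlationId: "run-1", costMicroUsd: 30 },
  { feature: "example-embedding", correlationId: "run-1", costMicroUsd: 50 }, // hypothetical tag
  { feature: "hybrid-search-contextual-retrieval", correlationId: "run-2", costMicroUsd: 70 },
];

// Total cost per indexing run across all features.
console.log(costByCorrelationId(rows));
// Contextual retrieval cost only.
console.log(costByCorrelationId(rows, "hybrid-search-contextual-retrieval"));
```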

Key File

packages/ai/src/contextual-retrieval.ts -- defines the generateChunkContext function. It takes an AIContext, the full document text, the chunk text, and an optional correlationId, and returns { context: string, usageId: string }.