Reasoning Overview

Architecture of the @repo/ai wrapper package -- the single entry point for all LLM operations, with automatic usage tracking and multi-provider support.

Role in the System

The @repo/ai package is the sole entry point for all AI/LLM operations in the monorepo. No other package may import @anthropic-ai/sdk or @google/genai directly -- this is enforced by ESLint no-restricted-imports rules defined in packages/config-eslint/restrictions.js and validated by dependency-cruiser.

The package provides three categories of AI operations:

Chat/Reasoning (Anthropic Claude) -- non-streaming completions, streaming completions, and async batch processing
Embeddings (Google Gemini) -- vector generation for hybrid search indexing and retrieval
Chunking & Contextual Retrieval -- text splitting and LLM-generated context prefixes for improved search quality

Every call, regardless of category, automatically records usage to the database for cost attribution and debugging.

Package Position in the Layer Hierarchy

@repo/ai sits in the infra layer:

app (apps/web)
  -> service (@repo/api, @repo/mcp)
    -> infra (@repo/ai, @repo/auth, @repo/cache, @repo/logger, @repo/search)
      -> core (@repo/db)
        -> leaf (@repo/utils, @repo/validators, @repo/design-tokens)

The package depends on @repo/db (for writing usage records) and @repo/logger (for structured logging). It exports the AIHelper interface consumed by @repo/api (tRPC routers) and @repo/mcp (MCP tool handlers).

Entry Points

There are two ways to access AI operations, depending on your calling context:

Inside tRPC Routers: `ctx.ai`

The tRPC middleware creates an AIHelper instance pre-bound with the current user's organizationId, userId, and logger. All methods on ctx.ai automatically attribute usage to the correct tenant and user.

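// In a tRPC router. `router` and `authorizedProcedure` are assumed to come
// from the project's tRPC setup (import omitted here).
import { z } from "zod";

export const chatRouter = router({
  sendMessage: authorizedProcedure
    .input(z.object({ message: z.string() }))
    .mutation(async ({ ctx, input }) => {
      // ctx.ai is already bound to the caller's organization and user.
      const response = await ctx.ai.chat({
        messages: [{ role: "user", content: input.message }],
        system: "You are a research assistant.",
        feature: "research-chat", // label used for cost attribution
      });
      return response;
    }),
});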

Outside tRPC: `createAIHelper()`

MCP tools, Inngest jobs, and other non-tRPC contexts create their own helper:

import { createAIHelper } from "@repo/ai";
import { createJobLogger } from "@repo/logger";

// jobId, organizationId, userId, and chunk are supplied by the calling job or tool.
const logger = createJobLogger("index-content", { jobId });
const ai = createAIHelper(organizationId, userId, logger);

const response = await ai.chat({
  messages: [{ role: "user", content: chunk }],
  feature: "mcp-extract-data",
});
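
Both entry points yield a helper that is bound to one tenant and user. The following is a minimal sketch of how such binding could work with a closure, not the package's actual implementation. All names here (`createBoundHelperSketch`, `usageLog`, `fakeModelCall`, the simplified option types) are hypothetical, the provider call is faked so the code runs offline, and an in-memory array stands in for the database writes done through @repo/db.

```typescript
// Hypothetical, simplified types; the real AIHelper has more methods and options.
interface UsageRecord {
  organizationId: string;
  userId: string;
  feature: string;
  inputChars: number;
}

interface ChatOptionsSketch {
  messages: { role: "user" | "assistant"; content: string }[];
  feature: string; // required, used for cost attribution
}

interface BoundHelperSketch {
  chat(opts: ChatOptionsSketch): Promise<string>;
}

// Stand-in for the usage table (assumption: the real package writes to the database).
const usageLog: UsageRecord[] = [];

// Stand-in for the provider call so the sketch runs without network access.
async function fakeModelCall(opts: ChatOptionsSketch): Promise<string> {
  return `echo: ${opts.messages.map((m) => m.content).join(" ")}`;
}

function createBoundHelperSketch(
  organizationId: string,
  userId: string,
): BoundHelperSketch {
  return {
    async chat(opts) {
      const reply = await fakeModelCall(opts);
      // Tenant and user come from the closure, so callers cannot omit them.
      usageLog.push({
        organizationId,
        userId,
        feature: opts.feature,
        inputChars: opts.messages.reduce((n, m) => n + m.content.length, 0),
      });
      return reply;
    },
  };
}
```

Because the identifiers are captured once at creation time, every later call on the helper is attributed to the same tenant without the caller repeating them.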

AIHelper API Surface

The bound AIHelper interface exposes these methods:

Method	Returns	Description
`chat(opts)`	`Promise<Message>`	Non-streaming completion. Returns the full Anthropic `Message`.
`stream(opts)`	`StreamResult`	Streaming completion. Returns `{ events, finalMessage }`.
`submitBatch(opts)`	`Promise<BatchResult>`	Submit requests for async processing at 50% cost.
`pollBatch(batchDbId, anthropicBatchId, feature)`	`Promise<{ status, results? }>`	Check batch status and process completed results.
`cancelBatch(batchDbId, anthropicBatchId)`	`Promise<void>`	Cancel a running batch.
`embed(opts)`	`Promise<EmbedResult>`	Generate embeddings via Gemini. Configurable task type and dimensions.
`embedDocument(text, feature, correlationId?)`	`Promise<{ embedding, usageId }>`	Shorthand: embed for indexing (`RETRIEVAL_DOCUMENT`).
`embedQuery(query, feature, correlationId?)`	`Promise<{ embedding, usageId }>`	Shorthand: embed for search (`RETRIEVAL_QUERY`).
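
The two embedding shorthands in the table differ only in the task type they pass along. The sketch below illustrates that delegation under stated assumptions: `embedSketch`, `embedDocumentSketch`, `embedQuerySketch`, and the option shape are hypothetical stand-ins, and the Gemini call is replaced by a dummy vector so the code runs offline. Only the task type names `RETRIEVAL_DOCUMENT` and `RETRIEVAL_QUERY` come from the table above.

```typescript
type TaskTypeSketch = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";

// Hypothetical option shape; the real EmbedOptions may differ.
interface EmbedOptionsSketch {
  text: string;
  taskType: TaskTypeSketch;
  feature: string;
  correlationId?: string | undefined;
}

interface EmbedResultSketch {
  embedding: number[];
  usageId: string;
}

// Records each call so the delegation can be inspected.
const embedCalls: EmbedOptionsSketch[] = [];

// Stand-in for the provider call: returns a dummy three-dimensional vector.
async function embedSketch(opts: EmbedOptionsSketch): Promise<EmbedResultSketch> {
  embedCalls.push(opts);
  return { embedding: [opts.text.length, 0, 0], usageId: `usage_${embedCalls.length}` };
}

// Indexing shorthand: fixes the task type to RETRIEVAL_DOCUMENT.
function embedDocumentSketch(text: string, feature: string, correlationId?: string) {
  return embedSketch({ text, taskType: "RETRIEVAL_DOCUMENT", feature, correlationId });
}

// Search shorthand: fixes the task type to RETRIEVAL_QUERY.
function embedQuerySketch(query: string, feature: string, correlationId?: string) {
  return embedSketch({ text: query, taskType: "RETRIEVAL_QUERY", feature, correlationId });
}
```

Passing the same `correlationId` to related calls, as in `embedDocumentSketch(chunk, "hybrid-search-contextual-retrieval", id)`, is what would let the recorded usage rows be grouped later.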

Module Structure

The package is organized into focused modules:

Module	Purpose
`context.ts`	`AIHelper` interface and `createAIHelper()` factory
`client.ts`	Singleton Anthropic SDK client
`config.ts`	Model IDs, default token limits
`types.ts`	Shared types: `ChatOptions`, `AIContext`, `StreamResult`, `BatchOptions`
`request.ts`	`buildRequestParams()` -- translates `ChatOptions` to Anthropic API params
`completion.ts`	`chatCompletion()` -- non-streaming calls
`stream.ts`	`streamChat()` -- streaming with async iterable events
`batch.ts`	`submitBatch()`, `pollBatchResults()`, `cancelBatch()`
`embedding.ts`	Gemini embedding client, `embedContent()`, `embedDocument()`, `embedQuery()`
`recording.ts`	`recordCompletion()`, `recordFailure()`, `recordEmbedding()` -- database writes
`extraction.ts`	Helpers for extracting thinking blocks, tool use details, server tool usage
`errors.ts`	`AIError` class and `mapAnthropicError()` -- structured error mapping
`chunking.ts`	`chunkText()` -- recursive character splitter
`contextual-retrieval.ts`	`generateChunkContext()` -- Haiku-powered context prefix generation
`utils.ts`	`toStoredParams()` -- extracts storable fields from API params
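
The module table describes `chunkText()` as a recursive character splitter. For readers unfamiliar with the technique, here is a generic sketch of it. It is not the package's code: the function name `chunkTextSketch`, its signature, and the separator order are assumptions chosen for illustration.

```typescript
// Generic recursive character splitter (illustrative; not the real chunkText()).
// Tries the coarsest separator first and recurses with finer ones only for
// pieces that are still longer than maxChars.
function chunkTextSketch(
  text: string,
  maxChars: number,
  separators: string[] = ["\n\n", "\n", " "],
): string[] {
  if (maxChars < 1) throw new Error("maxChars must be at least 1");
  if (text.length <= maxChars) return text.length > 0 ? [text] : [];

  const [sep, ...finer] = separators;
  if (sep === undefined) {
    // No separators left: fall back to fixed-width slices.
    const slices: string[] = [];
    for (let i = 0; i < text.length; i += maxChars) {
      slices.push(text.slice(i, i + maxChars));
    }
    return slices;
  }

  const chunks: string[] = [];
  let current = "";
  for (const piece of text.split(sep)) {
    const candidate = current === "" ? piece : current + sep + piece;
    if (candidate.length <= maxChars) {
      current = candidate; // piece still fits into the chunk being built
      continue;
    }
    if (current !== "") chunks.push(current);
    if (piece.length <= maxChars) {
      current = piece; // start a new chunk with this piece
    } else {
      // Piece alone is too large: split it with the finer separators.
      chunks.push(...chunkTextSketch(piece, maxChars, finer));
      current = "";
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

For example, `chunkTextSketch("aaa bbb ccc", 7)` returns `["aaa bbb", "ccc"]`: the text has no newlines, so the splitter falls through to the space separator and packs as many words as fit into each chunk.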

SDK Boundary Rule

The SDK boundary is a hard architectural constraint:

Only @repo/ai may import @anthropic-ai/sdk and @google/genai
All other packages import from @repo/ai exclusively
Anthropic SDK types (like Message) are re-exported through @repo/ai's index for downstream use
Violation detection: ESLint no-restricted-imports for editor feedback, dependency-cruiser for CI gate

This ensures every AI call flows through the wrapper, which means usage tracking cannot be accidentally bypassed.
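
To make the editor-side check concrete, the following sketch shows one way a `no-restricted-imports` rule for the two SDKs could be written. The actual contents of packages/config-eslint/restrictions.js are not shown in this document, so the constant name `restrictedSdkImports` and the messages are hypothetical, and the dependency-cruiser configuration is not covered here.

```typescript
// Hypothetical rule shape; the real rules in restrictions.js may differ.
const sdkImportMessage = "Import from @repo/ai instead of using the SDK directly.";

const restrictedSdkImports = {
  "no-restricted-imports": [
    "error",
    {
      // Exact package names.
      paths: [
        { name: "@anthropic-ai/sdk", message: sdkImportMessage },
        { name: "@google/genai", message: sdkImportMessage },
      ],
      // Deep imports such as "@anthropic-ai/sdk/resources".
      patterns: [
        {
          group: ["@anthropic-ai/sdk/*", "@google/genai/*"],
          message: sdkImportMessage,
        },
      ],
    },
  ],
} as const;
```

A rule like this would need to be switched off for files inside @repo/ai itself, for example through a per-package override, since that package is the one place where the SDK imports are allowed.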

Key Gotchas

Streaming requires consuming both outputs. stream() returns { events, finalMessage }. Usage is only recorded when finalMessage resolves. If you only iterate events without awaiting finalMessage, the usage record is never written.
Batch submission is asynchronous. submitBatch() returns immediately with a batch ID and does not wait for results. You must call pollBatch() separately (typically from an Inngest polling job) to check for completion and process results.
Embedding token counts are estimates. Gemini does not always return token counts in embedding responses. The package estimates ~1 token per 4 characters, a rough approximation for typical English text that can be less accurate for code or non-English content.
feature is required on every call. The TypeScript signature enforces this at compile time. The string is used for cost attribution (e.g., "research-chat", "hybrid-search-contextual-retrieval", "mcp-extract-data").
Correlation IDs link multi-step workflows. When a pipeline needs Haiku context generation followed by Gemini embedding, pass the same correlationId to both calls. This enables cost analysis across related operations.
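
The streaming gotcha above is easiest to avoid with a small consumer helper that always awaits `finalMessage` after draining `events`. The sketch below shows that pattern against a fake stream. `StreamResultSketch`, `fakeStream`, and `consumeStream` are hypothetical names, and events are simplified to plain strings, whereas the real `StreamResult` carries Anthropic stream events.

```typescript
// Simplified stand-in for StreamResult: events are plain text chunks here.
interface StreamResultSketch {
  events: AsyncIterable<string>;
  finalMessage: Promise<{ text: string }>;
}

// Fake stream for illustration: finalMessage resolves once all events were yielded.
function fakeStream(parts: string[]): StreamResultSketch {
  let finish: (message: { text: string }) => void = () => {};
  const finalMessage = new Promise<{ text: string }>((resolve) => {
    finish = resolve; // the executor runs synchronously, so finish is set here
  });

  async function* events(): AsyncGenerator<string> {
    for (const part of parts) yield part;
    finish({ text: parts.join("") });
  }

  return { events: events(), finalMessage };
}

// Consumer pattern: drain events, then always await finalMessage.
async function consumeStream(
  result: StreamResultSketch,
  onText: (chunk: string) => void,
): Promise<{ text: string }> {
  for await (const chunk of result.events) {
    onText(chunk);
  }
  // Per the note above, usage is recorded only when finalMessage resolves,
  // so this await must not be skipped.
  return result.finalMessage;
}
```

Routing every streaming call site through one helper of this kind keeps the `finalMessage` await from being forgotten when a caller only cares about incremental output.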
