Reasoning Overview
Architecture of the @repo/ai wrapper package -- the single entry point for all LLM operations, with automatic usage tracking and multi-provider support.
Role in the System
The @repo/ai package is the sole entry point for all AI/LLM operations in the monorepo. No other package may import @anthropic-ai/sdk or @google/genai directly -- this is enforced by ESLint no-restricted-imports rules defined in packages/config-eslint/restrictions.js and validated by dependency-cruiser.
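As a rough illustration of the ESLint half of this rule, a no-restricted-imports entry could look like the sketch below. The contents of restrictions.js are not shown on this page, so the constant names, the message text, and the deep-import patterns are assumptions; only the rule name and the two SDK package names come from the text above.

```typescript
// Hypothetical sketch; the real packages/config-eslint/restrictions.js may differ.
const RESTRICTED_SDKS = ["@anthropic-ai/sdk", "@google/genai"];
const MESSAGE = "Import AI functionality from @repo/ai instead.";

const aiSdkRestriction = {
  "no-restricted-imports": [
    "error",
    {
      // Block the bare package specifiers.
      paths: RESTRICTED_SDKS.map((name) => ({ name, message: MESSAGE })),
      // Block deep imports such as "@anthropic-ai/sdk/resources".
      patterns: [
        {
          group: RESTRICTED_SDKS.map((name) => `${name}/*`),
          message: MESSAGE,
        },
      ],
    },
  ],
};
```

The wrapper package itself would presumably need this rule switched off through an override, since it is the one place where the SDK imports are legitimate.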
The package provides three categories of AI operations:
- Chat/Reasoning (Anthropic Claude) -- non-streaming completions, streaming completions, and async batch processing
- Embeddings (Google Gemini) -- vector generation for hybrid search indexing and retrieval
- Chunking & Contextual Retrieval -- text splitting and LLM-generated context prefixes for improved search quality
Every call, regardless of category, automatically records usage to the database for cost attribution and debugging.
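One way to picture automatic recording is a wrapper that times the provider call and writes a usage row whether the call succeeds or fails. The sketch below is illustrative only: withUsageRecording, the UsageRecord shape, and the in-memory usageLog are hypothetical stand-ins, not the package's recording code or database schema.

```typescript
// Hypothetical record shape for illustration; not the real schema.
interface UsageRecord {
  feature: string;
  ok: boolean;
  durationMs: number;
  error?: string;
}

// Stand-in for the database table that stores usage rows.
const usageLog: UsageRecord[] = [];

// Runs `call`, then records exactly one usage row on success or on failure.
async function withUsageRecording<T>(
  feature: string,
  call: () => Promise<T>,
): Promise<T> {
  const startedAt = Date.now();
  try {
    const result = await call();
    usageLog.push({ feature, ok: true, durationMs: Date.now() - startedAt });
    return result;
  } catch (err) {
    usageLog.push({
      feature,
      ok: false,
      durationMs: Date.now() - startedAt,
      error: err instanceof Error ? err.message : String(err),
    });
    throw err; // The caller still sees the original failure.
  }
}
```

Keeping the recording inside the wrapper, rather than asking callers to do it, is what makes the tracking hard to skip by accident.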
Package Position in the Layer Hierarchy
@repo/ai sits in the infra layer:
app (apps/web)
-> service (@repo/api, @repo/mcp)
-> infra (@repo/ai, @repo/auth, @repo/cache, @repo/logger, @repo/search)
-> core (@repo/db)
-> leaf (@repo/utils, @repo/validators, @repo/design-tokens)
The package depends on @repo/db (for writing usage records) and @repo/logger (for structured logging). It exports the AIHelper interface consumed by @repo/api (tRPC routers) and @repo/mcp (MCP tool handlers).
Entry Points
There are two ways to access AI operations, depending on your calling context:
Inside tRPC Routers: ctx.ai
The tRPC middleware creates an AIHelper instance pre-bound with the current user's organizationId, userId, and logger. All methods on ctx.ai automatically attribute usage to the correct tenant and user.
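```typescript
// In a tRPC router. `router` and `authorizedProcedure` come from the app's tRPC setup.
import { z } from "zod";

export const chatRouter = router({
  sendMessage: authorizedProcedure
    .input(z.object({ message: z.string() }))
    .mutation(async ({ ctx, input }) => {
      // ctx.ai is already bound to the caller's organization and user.
      const response = await ctx.ai.chat({
        messages: [{ role: "user", content: input.message }],
        system: "You are a research assistant.",
        feature: "research-chat",
      });
      return response;
    }),
});
```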
Outside tRPC: createAIHelper()
MCP tools, Inngest jobs, and other non-tRPC contexts create their own helper:
```typescript
import { createAIHelper } from "@repo/ai";
import { createJobLogger } from "@repo/logger";

// jobId, organizationId, userId, and chunk are supplied by the calling job or tool.
const logger = createJobLogger("index-content", { jobId });
const ai = createAIHelper(organizationId, userId, logger);

const response = await ai.chat({
  messages: [{ role: "user", content: chunk }],
  feature: "mcp-extract-data",
});
```
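Both entry points hand back a helper with tenant context already attached, so callers never pass organizationId or userId per call. A minimal sketch of that binding pattern follows, assuming a closure-based factory. The names createBoundHelper, Provider, and the call shapes are hypothetical and far simpler than the real createAIHelper().

```typescript
// Hypothetical, simplified shapes; the real AIHelper has many more methods.
interface Logger {
  info(message: string, meta?: Record<string, unknown>): void;
}
interface ChatCall {
  feature: string;
  prompt: string;
}
interface AttributedCall extends ChatCall {
  organizationId: string;
  userId: string;
}

// Stand-in for whatever actually talks to the model provider.
type Provider = (call: AttributedCall) => Promise<string>;

// Captures tenant context once; every method on the result reuses it.
function createBoundHelper(
  organizationId: string,
  userId: string,
  logger: Logger,
  provider: Provider,
) {
  return {
    async chat(call: ChatCall): Promise<string> {
      // Attribution is added here, not by the caller.
      const attributed: AttributedCall = { ...call, organizationId, userId };
      logger.info("ai.chat", { feature: call.feature, organizationId });
      return provider(attributed);
    },
  };
}
```

Because the identifiers live in the closure, a caller cannot attribute usage to the wrong tenant by passing a stale or mistyped ID.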
AIHelper API Surface
The bound AIHelper interface exposes these methods:
| Method | Returns | Description |
|---|---|---|
| chat(opts) | Promise<Message> | Non-streaming completion. Returns the full Anthropic Message. |
| stream(opts) | StreamResult | Streaming completion. Returns { events, finalMessage }. |
| submitBatch(opts) | Promise<BatchResult> | Submit requests for async processing at 50% cost. |
| pollBatch(batchDbId, anthropicBatchId, feature) | Promise<{ status, results? }> | Check batch status and process completed results. |
| cancelBatch(batchDbId, anthropicBatchId) | Promise<void> | Cancel a running batch. |
| embed(opts) | Promise<EmbedResult> | Generate embeddings via Gemini. Configurable task type and dimensions. |
| embedDocument(text, feature, correlationId?) | Promise<{ embedding, usageId }> | Shorthand: embed for indexing (RETRIEVAL_DOCUMENT). |
| embedQuery(query, feature, correlationId?) | Promise<{ embedding, usageId }> | Shorthand: embed for search (RETRIEVAL_QUERY). |
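Because submitBatch() returns before any results exist, something has to keep calling pollBatch() until the batch finishes. The sketch below shows a generic bounded poll loop against a stub. The status values "in_progress" and "ended", the waitForBatch name, and the options object are assumptions; this page does not list the real status strings.

```typescript
// Assumed status values; the real pollBatch() may report different ones.
type BatchStatus = "in_progress" | "ended";
interface PollResult<R> {
  status: BatchStatus;
  results?: R[];
}
type PollFn<R> = () => Promise<PollResult<R>>;

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the batch ends or the attempt budget runs out.
async function waitForBatch<R>(
  poll: PollFn<R>,
  opts: { intervalMs: number; maxAttempts: number },
): Promise<R[]> {
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    const { status, results } = await poll();
    if (status === "ended") return results ?? [];
    // Wait between attempts, but not after the last one.
    if (attempt < opts.maxAttempts) await sleep(opts.intervalMs);
  }
  throw new Error(`Batch still running after ${opts.maxAttempts} polls`);
}
```

With the real helper, the poll argument would be something like () => ai.pollBatch(batchDbId, anthropicBatchId, feature). In practice a scheduled job is likely a better fit than an in-process loop, since batches can take a long time to complete.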
Module Structure
The package is organized into focused modules:
| Module | Purpose |
|---|---|
| context.ts | AIHelper interface and createAIHelper() factory |
| client.ts | Singleton Anthropic SDK client |
| config.ts | Model IDs, default token limits |
| types.ts | Shared types: ChatOptions, AIContext, StreamResult, BatchOptions |
| request.ts | buildRequestParams() -- translates ChatOptions to Anthropic API params |
| completion.ts | chatCompletion() -- non-streaming calls |
| stream.ts | streamChat() -- streaming with async iterable events |
| batch.ts | submitBatch(), pollBatchResults(), cancelBatch() |
| embedding.ts | Gemini embedding client, embedContent(), embedDocument(), embedQuery() |
| recording.ts | recordCompletion(), recordFailure(), recordEmbedding() -- database writes |
| extraction.ts | Helpers for extracting thinking blocks, tool use details, server tool usage |
| errors.ts | AIError class and mapAnthropicError() -- structured error mapping |
| chunking.ts | chunkText() -- recursive character splitter |
| contextual-retrieval.ts | generateChunkContext() -- Haiku-powered context prefix generation |
| utils.ts | toStoredParams() -- extracts storable fields from API params |
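For readers unfamiliar with the term, a recursive character splitter tries coarse separators first (paragraph breaks), and falls back to finer ones (lines, words, then raw characters) only for pieces that are still too long. The sketch below shows the general technique. It is not the package's chunkText(): the separator list, the splitRecursive name, and the character-based size limit are assumptions, and features such as chunk overlap are left out.

```typescript
// Separators tried in order, coarsest first. Assumed defaults for illustration.
const SEPARATORS = ["\n\n", "\n", " ", ""];

function splitRecursive(
  text: string,
  maxChars: number,
  separators: string[] = SEPARATORS,
): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  if (text.length <= maxChars) return text.length > 0 ? [text] : [];

  const [sep, ...finer] = separators;
  // No usable separator left: hard-cut into fixed-size slices.
  if (sep === undefined || sep === "") {
    const slices: string[] = [];
    for (let i = 0; i < text.length; i += maxChars) {
      slices.push(text.slice(i, i + maxChars));
    }
    return slices;
  }

  const chunks: string[] = [];
  let current = "";
  for (const piece of text.split(sep)) {
    // Greedily pack pieces into the current chunk while they fit.
    const candidate = current === "" ? piece : current + sep + piece;
    if (candidate.length <= maxChars) {
      current = candidate;
      continue;
    }
    if (current !== "") chunks.push(current);
    if (piece.length <= maxChars) {
      current = piece;
    } else {
      // The piece alone is too long: recurse with finer separators.
      chunks.push(...splitRecursive(piece, maxChars, finer));
      current = "";
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

Every chunk this returns is at most maxChars long, and text is only cut mid-word when no whitespace boundary fits within the limit.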
SDK Boundary Rule
The SDK boundary is a hard architectural constraint:
- Only @repo/ai may import @anthropic-ai/sdk and @google/genai
- All other packages import from @repo/ai exclusively
- Anthropic SDK types (like Message) are re-exported through @repo/ai's index for downstream use
- Violation detection: ESLint no-restricted-imports for editor feedback, dependency-cruiser for CI gate
This ensures every AI call flows through the wrapper, which means usage tracking cannot be accidentally bypassed.
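For the CI gate, a dependency-cruiser forbidden rule expressing the same boundary might look like the sketch below. The rule name, the packages/ai path, and the exact regular expressions are assumptions; the project's actual dependency-cruiser configuration is not shown on this page.

```typescript
// Hypothetical dependency-cruiser rule; the name and paths are assumptions.
const noDirectAiSdk = {
  name: "no-direct-ai-sdk",
  comment: "Only @repo/ai may depend on the Anthropic and Gemini SDKs.",
  severity: "error",
  // Applies to every module outside the wrapper package.
  from: { pathNot: "^packages/ai/" },
  // Matches the two SDK packages once resolved into node_modules.
  to: { path: "node_modules/(@anthropic-ai/sdk|@google/genai)/" },
};

const dependencyCruiserConfig = { forbidden: [noDirectAiSdk] };
```

ESLint catches the violation while the code is being written; a rule like this one catches anything that slips past, for example a file excluded from linting.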
Key Gotchas
- Streaming requires consuming both outputs. stream() returns { events, finalMessage }. Usage is only recorded when finalMessage resolves. If you only iterate events without awaiting finalMessage, the usage record is never written.
- Batch is fire-and-forget. submitBatch() returns immediately with a batch ID. You must call pollBatch() separately (typically from an Inngest polling job) to check for completion and process results.
- Embedding token counts are estimates. Gemini does not always return token counts in embedding responses. When they are missing, the package estimates roughly 1 token per 4 characters, a common rule of thumb for English text.
- feature is required on every call. The TypeScript signature enforces this at compile time. The string is used for cost attribution (e.g., "research-chat", "hybrid-search-contextual-retrieval", "mcp-extract-data").
- Correlation IDs link multi-step workflows. When a pipeline needs Haiku context generation followed by Gemini embedding, pass the same correlationId to both calls. This enables cost analysis across related operations.
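The streaming gotcha is the easiest one to trip over, so here is a sketch of the calling pattern: iterate events, then always await finalMessage. It runs against a stub rather than the real stream(). The fakeStream and consumeStream names and the event and message shapes are assumptions; only the { events, finalMessage } pair comes from the text above.

```typescript
// Simplified stand-in for StreamResult; the real event and message types differ.
interface StubStreamResult {
  events: AsyncIterable<string>;
  finalMessage: Promise<{ text: string }>;
}

// Stub that emits text deltas and resolves with the assembled message.
function fakeStream(parts: string[]): StubStreamResult {
  async function* emit(): AsyncGenerator<string> {
    for (const part of parts) yield part;
  }
  return {
    events: emit(),
    finalMessage: Promise.resolve({ text: parts.join("") }),
  };
}

// Forwards each delta to the UI, then awaits the final message.
async function consumeStream(
  result: StubStreamResult,
  onDelta: (delta: string) => void,
): Promise<{ text: string }> {
  for await (const delta of result.events) {
    onDelta(delta);
  }
  // Always await finalMessage, even if the deltas were all you needed.
  return await result.finalMessage;
}
```

Wrapping both steps in one function like this makes it harder for a caller to iterate the events and forget the final await.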
What to Read Next
- AI Integration Decision (ADR-007) -- why Claude + Gemini, why a thin wrapper, why automatic tracking
- Model Selection -- which model to use for what, configuration defaults, and override patterns
- Prompt Patterns -- ChatOptions in depth, thinking, caching, tools, structured output
- Usage Tracking -- how every call is recorded, cost calculation, and the recording pipeline
- Batch Processing -- async batch lifecycle, database records, and polling patterns