Trovella Wiki

Reasoning Overview

Architecture of the @repo/ai wrapper package -- the single entry point for all LLM operations, with automatic usage tracking and multi-provider support.

Role in the System

The @repo/ai package is the sole entry point for all AI/LLM operations in the monorepo. No other package may import @anthropic-ai/sdk or @google/genai directly -- this is enforced by ESLint no-restricted-imports rules defined in packages/config-eslint/restrictions.js and validated by dependency-cruiser.

The package provides three categories of AI operations:

  1. Chat/Reasoning (Anthropic Claude) -- non-streaming completions, streaming completions, and async batch processing
  2. Embeddings (Google Gemini) -- vector generation for hybrid search indexing and retrieval
  3. Chunking & Contextual Retrieval -- text splitting and LLM-generated context prefixes for improved search quality
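Category 3 combines an LLM call with an embedding call. The sketch below is a minimal illustration of that contextual-retrieval idea. It assumes an injected `generateContext` function standing in for the Haiku-backed `generateChunkContext()` and an `embedDocument` function shaped like the one in the AIHelper API. The `indexChunk` helper, the prefix format (context, blank line, chunk), and both function signatures are hypothetical. The point is that the embedding is computed over the context-prefixed text, while the original chunk is kept for display.

```typescript
// Hypothetical signatures: the real generateChunkContext() and embedDocument() may differ.
type GenerateContext = (
  sourceDoc: string,
  chunk: string,
  correlationId: string,
) => Promise<string>;

type EmbedDocument = (
  text: string,
  feature: string,
  correlationId?: string,
) => Promise<{ embedding: number[]; usageId: string }>;

interface IndexedChunk {
  text: string; // original chunk, returned to users in search results
  contextualized: string; // context prefix + chunk, the text that gets embedded
  embedding: number[];
}

// Generates a short situating context for the chunk, prepends it, and embeds the
// combined text. Both calls share one correlationId so their costs can be grouped.
async function indexChunk(
  sourceDoc: string,
  chunk: string,
  correlationId: string,
  generateContext: GenerateContext,
  embedDocument: EmbedDocument,
): Promise<IndexedChunk> {
  const context = await generateContext(sourceDoc, chunk, correlationId);
  const contextualized = `${context}\n\n${chunk}`;
  const { embedding } = await embedDocument(
    contextualized,
    "hybrid-search-contextual-retrieval",
    correlationId,
  );
  return { text: chunk, contextualized, embedding };
}
```

Keeping `text` and `contextualized` separate lets search match on the richer prefixed text without showing the generated prefix to users.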

Every provider call (chat, streaming, batch, embedding, and contextual-retrieval generation) automatically records usage to the database for cost attribution and debugging.
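The recording guarantee can be pictured as a wrapper that writes one usage row per call, whether the call succeeds or fails. This is a minimal sketch. It assumes a caller-supplied `UsageSink` in place of the real `recordCompletion()` and `recordFailure()` database writes in recording.ts. `withUsageRecording`, `UsageRecord`, and the field names are hypothetical.

```typescript
// Hypothetical shapes: the real recordCompletion()/recordFailure() signatures
// in recording.ts are not documented on this page.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

interface UsageRecord {
  feature: string;
  status: "success" | "failure";
  usage?: Usage;
  error?: string;
}

// Stand-in for the database writer.
type UsageSink = (record: UsageRecord) => Promise<void>;

// Runs a provider call and writes exactly one usage record for it,
// whether the call succeeds or throws.
async function withUsageRecording<T extends { usage: Usage }>(
  feature: string,
  sink: UsageSink,
  call: () => Promise<T>,
): Promise<T> {
  let result: T;
  try {
    result = await call();
  } catch (err) {
    await sink({
      feature,
      status: "failure",
      error: err instanceof Error ? err.message : String(err),
    });
    throw err; // the caller still sees the original error
  }
  // Recorded outside the try block, so a failing sink is not misreported as a failed call.
  await sink({ feature, status: "success", usage: result.usage });
  return result;
}
```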

Package Position in the Layer Hierarchy

@repo/ai sits in the infra layer:

app (apps/web)
  -> service (@repo/api, @repo/mcp)
    -> infra (@repo/ai, @repo/auth, @repo/cache, @repo/logger, @repo/search)
      -> core (@repo/db)
        -> leaf (@repo/utils, @repo/validators, @repo/design-tokens)

The package depends on @repo/db (for writing usage records) and @repo/logger (for structured logging). It exports the AIHelper interface consumed by @repo/api (tRPC routers) and @repo/mcp (MCP tool handlers).
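The hierarchy can be expressed as a rank check. The sketch below is minimal and assumes the rule is "depend only on your own layer or a lower one." Same-layer edges are allowed here because @repo/ai itself depends on @repo/logger, which is also infra. The actual enforcement lives in the dependency-cruiser configuration, which this page does not show. `canDepend` and the maps below are hypothetical.

```typescript
// Layer ranks from the hierarchy diagram; higher rank = closer to the app.
const LAYER_RANK = { leaf: 0, core: 1, infra: 2, service: 3, app: 4 } as const;
type Layer = keyof typeof LAYER_RANK;

// Package-to-layer map copied from the diagram (internal packages only).
const PACKAGE_LAYER: Record<string, Layer> = {
  "apps/web": "app",
  "@repo/api": "service",
  "@repo/mcp": "service",
  "@repo/ai": "infra",
  "@repo/auth": "infra",
  "@repo/cache": "infra",
  "@repo/logger": "infra",
  "@repo/search": "infra",
  "@repo/db": "core",
  "@repo/utils": "leaf",
  "@repo/validators": "leaf",
  "@repo/design-tokens": "leaf",
};

// Assumed rule: a package may depend on its own layer or any lower layer.
// External packages are out of scope and rejected as unknown.
function canDepend(from: string, to: string): boolean {
  const fromLayer = PACKAGE_LAYER[from];
  const toLayer = PACKAGE_LAYER[to];
  if (fromLayer === undefined || toLayer === undefined) {
    throw new Error(`Unknown package: ${fromLayer === undefined ? from : to}`);
  }
  return LAYER_RANK[toLayer] <= LAYER_RANK[fromLayer];
}
```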

Entry Points

There are two ways to access AI operations, depending on your calling context:

Inside tRPC Routers: ctx.ai

The tRPC middleware creates an AIHelper instance pre-bound with the current user's organizationId, userId, and logger. All methods on ctx.ai automatically attribute usage to the correct tenant and user.

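// In a tRPC router
import { z } from "zod";
// `router` and `authorizedProcedure` come from the app's tRPC setup (import path not shown).

export const chatRouter = router({
  sendMessage: authorizedProcedure
    .input(z.object({ message: z.string() }))
    .mutation(async ({ ctx, input }) => {
      // ctx.ai is already bound to the caller's organizationId and userId
      const response = await ctx.ai.chat({
        messages: [{ role: "user", content: input.message }],
        system: "You are a research assistant.",
        feature: "research-chat",
      });
      return response;
    }),
});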

Outside tRPC: createAIHelper()

MCP tools, Inngest jobs, and other non-tRPC contexts create their own helper:

import { createAIHelper } from "@repo/ai";
import { createJobLogger } from "@repo/logger";

// jobId, organizationId, userId, and chunk are assumed to be in scope
// (e.g. from the job payload or tool input).
const logger = createJobLogger("index-content", { jobId });
const ai = createAIHelper(organizationId, userId, logger);

const response = await ai.chat({
  messages: [{ role: "user", content: chunk }],
  feature: "mcp-extract-data",
});
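Both entry points rest on the same idea: capture the tenant and user once, then hand out methods that close over them. The sketch below shows a minimal version of that binding. It uses a hypothetical `createBoundHelper` that takes an injected low-level `rawChat` function, so it runs without the Anthropic SDK. The real `createAIHelper(organizationId, userId, logger)` signature differs, and its internals are not shown on this page.

```typescript
// Hypothetical, simplified types: the real ChatOptions and AIContext live in types.ts.
interface ChatOptions {
  messages: { role: "user" | "assistant"; content: string }[];
  system?: string;
  feature: string;
}

interface AIContext {
  organizationId: string;
  userId: string;
}

// Stand-in for a low-level function that takes context explicitly (cf. chatCompletion()).
type RawChat = (ctx: AIContext, opts: ChatOptions) => Promise<string>;

interface BoundHelper {
  chat(opts: ChatOptions): Promise<string>;
}

// Captures tenant and user once; every call made through the returned object
// carries them, so call sites cannot forget attribution.
function createBoundHelper(
  organizationId: string,
  userId: string,
  rawChat: RawChat,
): BoundHelper {
  const ctx: AIContext = { organizationId, userId };
  return {
    chat: (opts) => rawChat(ctx, opts),
  };
}
```

Because the context lives in the closure, call sites cannot pass or override it per call. This matches how ctx.ai attributes usage without any per-call arguments.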

AIHelper API Surface

The bound AIHelper interface exposes these methods:

| Method | Returns | Description |
|---|---|---|
| chat(opts) | Promise<Message> | Non-streaming completion. Returns the full Anthropic Message. |
| stream(opts) | StreamResult | Streaming completion. Returns { events, finalMessage }. |
| submitBatch(opts) | Promise<BatchResult> | Submit requests for async processing at 50% cost. |
| pollBatch(batchDbId, anthropicBatchId, feature) | Promise<{ status, results? }> | Check batch status and process completed results. |
| cancelBatch(batchDbId, anthropicBatchId) | Promise<void> | Cancel a running batch. |
| embed(opts) | Promise<EmbedResult> | Generate embeddings via Gemini. Configurable task type and dimensions. |
| embedDocument(text, feature, correlationId?) | Promise<{ embedding, usageId }> | Shorthand: embed for indexing (RETRIEVAL_DOCUMENT). |
| embedQuery(query, feature, correlationId?) | Promise<{ embedding, usageId }> | Shorthand: embed for search (RETRIEVAL_QUERY). |
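The two shorthands are thin wrappers that fix the Gemini task type. The sketch below is minimal and assumes an injected `embed` function in place of the real Gemini-backed one. `makeShorthands`, the `EmbedOptions` field names, and the rounding in `estimateTokens` (the ~4-characters-per-token fallback from the gotchas below) are hypothetical.

```typescript
// Hypothetical shapes: the real EmbedResult and embed() options may differ.
type TaskType = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";

interface EmbedOptions {
  text: string;
  taskType: TaskType;
  feature: string;
  correlationId?: string;
  dimensions?: number;
}

interface EmbedResult {
  embedding: number[];
  usageId: string;
  inputTokens: number;
}

type Embed = (opts: EmbedOptions) => Promise<EmbedResult>;

// Fallback token estimate (~4 characters per token), used when the provider
// omits a count. Rounding up is an assumption.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Builds the two shorthands on top of a general embed(): each one only fixes
// the task type and trims the result to { embedding, usageId }.
function makeShorthands(embed: Embed) {
  return {
    async embedDocument(text: string, feature: string, correlationId?: string) {
      const { embedding, usageId } = await embed({
        text,
        taskType: "RETRIEVAL_DOCUMENT",
        feature,
        correlationId,
      });
      return { embedding, usageId };
    },
    async embedQuery(query: string, feature: string, correlationId?: string) {
      const { embedding, usageId } = await embed({
        text: query,
        taskType: "RETRIEVAL_QUERY",
        feature,
        correlationId,
      });
      return { embedding, usageId };
    },
  };
}
```

Gemini optimizes embeddings differently for the document and query task types. Using the matching shorthand on each side of a search therefore matters more than it looks.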

Module Structure

The package is organized into focused modules:

| Module | Purpose |
|---|---|
| context.ts | AIHelper interface and createAIHelper() factory |
| client.ts | Singleton Anthropic SDK client |
| config.ts | Model IDs, default token limits |
| types.ts | Shared types: ChatOptions, AIContext, StreamResult, BatchOptions |
| request.ts | buildRequestParams() -- translates ChatOptions to Anthropic API params |
| completion.ts | chatCompletion() -- non-streaming calls |
| stream.ts | streamChat() -- streaming with async iterable events |
| batch.ts | submitBatch(), pollBatchResults(), cancelBatch() |
| embedding.ts | Gemini embedding client, embedContent(), embedDocument(), embedQuery() |
| recording.ts | recordCompletion(), recordFailure(), recordEmbedding() -- database writes |
| extraction.ts | Helpers for extracting thinking blocks, tool use details, server tool usage |
| errors.ts | AIError class and mapAnthropicError() -- structured error mapping |
| chunking.ts | chunkText() -- recursive character splitter |
| contextual-retrieval.ts | generateChunkContext() -- Haiku-powered context prefix generation |
| utils.ts | toStoredParams() -- extracts storable fields from API params |
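chunkText() is described only as a recursive character splitter. The sketch below is a minimal version of that technique. It assumes a separator ladder of paragraph, line, space, then single character; a size limit measured in characters; and no overlap between chunks. The real function's options and defaults are not shown on this page.

```typescript
// Assumed separator ladder: paragraphs, lines, words, then single characters.
const DEFAULT_SEPARATORS = ["\n\n", "\n", " ", ""];

// Recursive character splitter (sketch): split on the coarsest separator present,
// greedily merge pieces up to chunkSize characters, and recurse with finer
// separators on any piece that is still too long. No overlap between chunks.
function chunkText(
  text: string,
  chunkSize: number,
  separators: string[] = DEFAULT_SEPARATORS,
): string[] {
  if (chunkSize < 1) throw new Error("chunkSize must be at least 1");
  if (text.length === 0) return [];
  if (text.length <= chunkSize) return [text];

  const sepIndex = separators.findIndex((s) => text.includes(s));
  if (sepIndex === -1) {
    // No separator left: hard-split at the size limit.
    const parts: string[] = [];
    for (let i = 0; i < text.length; i += chunkSize) parts.push(text.slice(i, i + chunkSize));
    return parts;
  }
  const sep = separators[sepIndex];
  const finer = separators.slice(sepIndex + 1);
  const pieces = (sep === "" ? Array.from(text) : text.split(sep)).filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const piece of pieces) {
    if (piece.length > chunkSize) {
      // Too big on its own: flush what we have, then split it more finely.
      if (current) chunks.push(current);
      current = "";
      chunks.push(...chunkText(piece, chunkSize, finer));
      continue;
    }
    const candidate = current ? current + sep + piece : piece;
    if (candidate.length <= chunkSize) {
      current = candidate;
    } else {
      chunks.push(current);
      current = piece;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Trying coarse separators first keeps paragraphs and sentences intact whenever they fit. The function falls back to word or character boundaries only for oversized pieces.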

SDK Boundary Rule

The SDK boundary is a hard architectural constraint:

  • Only @repo/ai may import @anthropic-ai/sdk and @google/genai
  • All other packages import from @repo/ai exclusively
  • Anthropic SDK types (like Message) are re-exported through @repo/ai's index for downstream use
  • Violation detection: ESLint no-restricted-imports for editor feedback, dependency-cruiser as the CI gate

This ensures every AI call flows through the wrapper, which means usage tracking cannot be accidentally bypassed.
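For editor feedback, the boundary can be expressed with ESLint's built-in `no-restricted-imports` rule, which accepts exact `paths` and glob `patterns`. The object below is an illustrative guess at what the rule entry could look like, written as a flat-config object. The actual contents of packages/config-eslint/restrictions.js are not shown here, and the `packages/ai/**` exemption path is an assumption.

```typescript
// Illustrative guess: the real packages/config-eslint/restrictions.js is not shown here.
const SDK_MESSAGE =
  "Provider SDKs may only be imported inside @repo/ai. Import from @repo/ai instead.";

// Rule entry for ESLint's built-in no-restricted-imports rule.
const aiSdkRestriction = {
  "no-restricted-imports": [
    "error",
    {
      // Exact module names.
      paths: [
        { name: "@anthropic-ai/sdk", message: SDK_MESSAGE },
        { name: "@google/genai", message: SDK_MESSAGE },
      ],
      // Deep imports under either package.
      patterns: [{ group: ["@anthropic-ai/sdk/*", "@google/genai/*"], message: SDK_MESSAGE }],
    },
  ],
} as const;

// Flat-config object applying the rule everywhere except the wrapper package.
// "packages/ai/**" is an assumed location for @repo/ai.
const restrictionConfig = {
  files: ["**/*.ts", "**/*.tsx"],
  ignores: ["packages/ai/**"],
  rules: aiSdkRestriction,
};
```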

Key Gotchas

  1. Streaming requires consuming both outputs. stream() returns { events, finalMessage }. Usage is only recorded when finalMessage resolves. If you only iterate events without awaiting finalMessage, the usage record is never written.

  2. Batching is asynchronous. submitBatch() returns immediately with a batch ID rather than results. You must call pollBatch() separately (typically from an Inngest polling job) to check for completion and process results.

  3. Embedding token counts are estimates. Gemini does not always return token counts in embedding responses, so the package falls back to an estimate of ~1 token per 4 characters. That ratio is a common rule of thumb for English text; it can be noticeably off for code or non-English text.

  4. feature is required on every call. The TypeScript signature enforces this at compile time. The string is used for cost attribution (e.g., "research-chat", "hybrid-search-contextual-retrieval", "mcp-extract-data").

  5. Correlation IDs link multi-step workflows. When a pipeline needs Haiku context generation followed by Gemini embedding, pass the same correlationId to both calls. This enables cost analysis across related operations.
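The first gotcha in practice: drain `events` for incremental output, then await `finalMessage`. The sketch below is minimal. It uses simplified, hypothetical event and message types (the real `StreamResult` carries Anthropic SDK types) and a fake stream so it runs without the SDK. `consumeStream` and `fakeStream` are illustrative names.

```typescript
// Simplified, hypothetical types: the real StreamResult wraps Anthropic SDK types.
interface StreamEvent {
  type: "text_delta";
  text: string;
}

interface FinalMessage {
  text: string;
  usage: { inputTokens: number; outputTokens: number };
}

interface StreamResult {
  events: AsyncIterable<StreamEvent>;
  finalMessage: Promise<FinalMessage>;
}

// Forward deltas as they arrive, then await finalMessage. Per the gotcha above,
// usage is only recorded once finalMessage resolves, so the await is not optional.
async function consumeStream(
  result: StreamResult,
  onDelta: (text: string) => void,
): Promise<FinalMessage> {
  for await (const event of result.events) {
    if (event.type === "text_delta") onDelta(event.text);
  }
  return await result.finalMessage;
}

// Fake stream so the sketch runs without the SDK.
function fakeStream(parts: string[]): StreamResult {
  async function* events(): AsyncGenerator<StreamEvent> {
    for (const text of parts) yield { type: "text_delta", text };
  }
  return {
    events: events(),
    finalMessage: Promise.resolve({
      text: parts.join(""),
      usage: { inputTokens: 12, outputTokens: parts.length },
    }),
  };
}
```

Awaiting `finalMessage` also turns a failed stream into an exception at the call site. Without the await, the failure would become an unhandled promise rejection.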
