Model Selection
Available AI models, their capabilities and defaults, and how to choose the right model for each use case.
Claude Models (Chat/Reasoning)
Trovella uses three Anthropic Claude models. The model is selected per-call via the model field on ChatOptions. If omitted, the default is Sonnet 4.6.
Model Defaults
| Model | ID | Default maxTokens | Streaming maxTokens | Thinking |
|---|---|---|---|---|
| Sonnet 4.6 (default) | claude-sonnet-4-6 | 16,000 | 64,000 | Adaptive (on) |
| Opus 4.6 | claude-opus-4-6 | 16,000 | 64,000 | Adaptive (on) |
| Haiku 4.5 | claude-haiku-4-5 | 8,192 | 16,000 | Off (not supported) |
These defaults are defined in packages/ai/src/config.ts:
export type ModelId = "claude-opus-4-6" | "claude-sonnet-4-6" | "claude-haiku-4-5";
export const DEFAULT_MODEL: ModelId = "claude-sonnet-4-6";
export const MODEL_MAX_TOKENS: Record<ModelId, number> = {
  "claude-opus-4-6": 16_000,
  "claude-sonnet-4-6": 16_000,
  "claude-haiku-4-5": 8_192,
};
export const MODEL_STREAM_MAX_TOKENS: Record<ModelId, number> = {
  "claude-opus-4-6": 64_000,
  "claude-sonnet-4-6": 64_000,
  "claude-haiku-4-5": 16_000,
};
Streaming gets higher token limits because there is no HTTP timeout risk -- the connection stays open via SSE.
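As a rough illustration of how these tables could combine with a per-call override, the sketch below resolves an effective token limit. The helper `resolveMaxTokens` and the `TokenLimitInput` shape are hypothetical names introduced for this example; the actual resolution logic in @repo/ai may differ.

```typescript
// Sketch only: resolveMaxTokens and TokenLimitInput are hypothetical,
// not part of @repo/ai. The constants mirror packages/ai/src/config.ts.
type ModelId = "claude-opus-4-6" | "claude-sonnet-4-6" | "claude-haiku-4-5";

const DEFAULT_MODEL: ModelId = "claude-sonnet-4-6";

const MODEL_MAX_TOKENS: Record<ModelId, number> = {
  "claude-opus-4-6": 16_000,
  "claude-sonnet-4-6": 16_000,
  "claude-haiku-4-5": 8_192,
};

const MODEL_STREAM_MAX_TOKENS: Record<ModelId, number> = {
  "claude-opus-4-6": 64_000,
  "claude-sonnet-4-6": 64_000,
  "claude-haiku-4-5": 16_000,
};

interface TokenLimitInput {
  model?: ModelId; // falls back to DEFAULT_MODEL
  maxTokens?: number; // explicit per-call override
  stream?: boolean; // true for SSE streaming calls
}

function resolveMaxTokens({
  model = DEFAULT_MODEL,
  maxTokens,
  stream = false,
}: TokenLimitInput): number {
  // An explicit override always wins (assumed behavior).
  if (maxTokens !== undefined) return maxTokens;
  // Otherwise pick the streaming or non-streaming default for the model.
  return stream ? MODEL_STREAM_MAX_TOKENS[model] : MODEL_MAX_TOKENS[model];
}
```

Under these assumptions, a call with no options resolves to the Sonnet non-streaming default, and the same call with `stream: true` resolves to the higher streaming limit.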
When to Use Each Model
Sonnet 4.6 -- the default for most operations. Use when you need good reasoning quality at moderate cost. This is the right choice for research chat, data extraction, and general-purpose completions.
Opus 4.6 -- reserved for tasks requiring the highest reasoning quality. Use for complex multi-step analysis, nuanced document synthesis, or tasks where Sonnet produces inadequate results. The cost difference is significant (see the pricing table in Data & Storage -- Reference Data), so use it deliberately.
Haiku 4.5 -- optimized for high-volume, lightweight tasks where speed and cost matter more than reasoning depth. The primary use case today is contextual retrieval: generating 2-3 sentence context prefixes for document chunks before embedding. Haiku does not support thinking mode. Use for classification, extraction of structured data from well-defined formats, and other tasks where a smaller model suffices.
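One way to keep this guidance in a single place is a small lookup from task category to model. The categories and the `pickModel` helper below are hypothetical and only illustrate the recommendations above; they do not describe an existing API.

```typescript
// Sketch only: TaskKind, MODEL_FOR_TASK, and pickModel are hypothetical names.
type ModelId = "claude-opus-4-6" | "claude-sonnet-4-6" | "claude-haiku-4-5";

type TaskKind =
  | "research-chat"
  | "data-extraction"
  | "deep-analysis"
  | "document-synthesis"
  | "contextual-retrieval"
  | "classification";

const MODEL_FOR_TASK: Record<TaskKind, ModelId> = {
  // General-purpose work: good reasoning at moderate cost.
  "research-chat": "claude-sonnet-4-6",
  "data-extraction": "claude-sonnet-4-6",
  // Highest reasoning quality, at significantly higher cost.
  "deep-analysis": "claude-opus-4-6",
  "document-synthesis": "claude-opus-4-6",
  // High-volume, lightweight tasks where speed and cost dominate.
  "contextual-retrieval": "claude-haiku-4-5",
  classification: "claude-haiku-4-5",
};

function pickModel(task: TaskKind): ModelId {
  return MODEL_FOR_TASK[task];
}
```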
Overriding Defaults Per-Call
Pass model, maxTokens, or thinking in the ChatOptions to override defaults:
const response = await ctx.ai.chat({
  messages: [{ role: "user", content: complexAnalysisPrompt }],
  feature: "research-deep-analysis",
  model: "claude-opus-4-6", // Override model
  maxTokens: 32_000, // Override token limit
});

const classification = await ctx.ai.chat({
  messages: [{ role: "user", content: classificationPrompt }],
  feature: "research-classify",
  model: "claude-haiku-4-5",
  thinking: false, // Explicitly off (Haiku default)
});
Thinking Mode
Thinking (extended reasoning) is enabled by default on Sonnet and Opus as { type: "adaptive" }. This means the model decides whether to use a thinking step based on the complexity of the request.
To disable thinking for a specific call (e.g., simple classification where thinking adds latency):
const response = await ctx.ai.chat({
  messages,
  feature: "simple-extraction",
  thinking: false,
});
The effort level controls thinking depth. Default is "high":
const response = await ctx.ai.chat({
  messages,
  feature: "quick-summary",
  effort: "low", // Options: "low" | "medium" | "high" | "max"
});
Haiku 4.5 does not support thinking. Even when thinking is not explicitly set to false, the buildRequestParams() function in request.ts skips adding thinking params for Haiku.
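The rules in this section can be summarized as a small decision function. This is a sketch under stated assumptions, not the actual buildRequestParams() code: `resolveThinking` and its input and output shapes are hypothetical, and it assumes effort is only relevant when thinking is enabled.

```typescript
// Sketch only: resolveThinking, ThinkingInput, and ResolvedThinking are hypothetical.
type ModelId = "claude-opus-4-6" | "claude-sonnet-4-6" | "claude-haiku-4-5";
type Effort = "low" | "medium" | "high" | "max";

interface ThinkingInput {
  model?: ModelId;
  thinking?: false | { type: "adaptive" };
  effort?: Effort;
}

interface ResolvedThinking {
  thinking?: { type: "adaptive" };
  effort?: Effort;
}

function resolveThinking({
  model = "claude-sonnet-4-6",
  thinking,
  effort = "high",
}: ThinkingInput): ResolvedThinking {
  // Haiku 4.5 does not support thinking, so no thinking params are added.
  if (model === "claude-haiku-4-5") return {};
  // The caller explicitly disabled thinking for this call.
  if (thinking === false) return {};
  // Sonnet and Opus default to adaptive thinking with "high" effort.
  return { thinking: thinking ?? { type: "adaptive" }, effort };
}
```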
Gemini Embedding Model
Trovella uses a single embedding model: Gemini Embedding 2 (gemini-embedding-2-preview).
| Property | Value |
|---|---|
| API model ID | gemini-embedding-2-preview |
| Database model ID | gemini-embedding-002 |
| Native dimensions | 3,072 |
| Configured dimensions | 1,536 (MRL-truncated) |
| Storage type | halfvec(1536) in PostgreSQL |
Task Types
The Gemini embedding API accepts a taskType hint that optimizes the embedding for specific use cases:
| Task Type | When to Use |
|---|---|
| RETRIEVAL_DOCUMENT | Indexing content for search (default) |
| RETRIEVAL_QUERY | Embedding a user's search query |
| SEMANTIC_SIMILARITY | Comparing two texts for similarity |
| CLASSIFICATION | Text classification tasks |
| CLUSTERING | Grouping related content |
| QUESTION_ANSWERING | Q&A retrieval |
| FACT_VERIFICATION | Checking factual claims |
In practice, the search pipeline uses two task types:
- RETRIEVAL_DOCUMENT during indexing (via embedDocument())
- RETRIEVAL_QUERY during search (via embedQuery())
Using different task types for indexing vs querying improves retrieval quality because the model can optimize the vector space differently for "what this document is about" vs "what the user is looking for."
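The split between indexing and querying can be sketched as two thin wrappers that fix the task type. The `makeEmbedders` factory and the `EmbedFn` signature are hypothetical stand-ins used only for illustration; the real embedDocument() and embedQuery() implementations may be structured differently.

```typescript
// Sketch only: EmbedFn and makeEmbedders are hypothetical; the injected
// function stands in for whatever actually calls the embedding API.
type TaskType =
  | "RETRIEVAL_DOCUMENT"
  | "RETRIEVAL_QUERY"
  | "SEMANTIC_SIMILARITY"
  | "CLASSIFICATION"
  | "CLUSTERING"
  | "QUESTION_ANSWERING"
  | "FACT_VERIFICATION";

type EmbedFn = (text: string, taskType: TaskType) => Promise<number[]>;

function makeEmbedders(embed: EmbedFn) {
  return {
    // Indexing path: describes "what this document is about".
    embedDocument: (text: string) => embed(text, "RETRIEVAL_DOCUMENT"),
    // Search path: describes "what the user is looking for".
    embedQuery: (text: string) => embed(text, "RETRIEVAL_QUERY"),
  };
}
```

Fixing the task type inside the wrappers means callers cannot accidentally index documents with the query task type or the reverse.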
MRL Truncation
Gemini Embedding 2 supports Matryoshka Representation Learning, which means the first N dimensions of the full 3072-dimensional vector are a valid lower-dimensional embedding. Truncating to 1536 dimensions halves storage and index memory while retaining most of the quality. The outputDimensionality parameter controls this:
const result = await ctx.ai.embed({
  contents: ["text to embed"],
  feature: "hybrid-search-index",
  outputDimensionality: 1536, // Default via the search pipeline
});
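To make the truncation idea concrete, the sketch below performs it client-side: keep the first N dimensions, then rescale to unit length. In the pipeline the API does the truncation via outputDimensionality, so this is purely illustrative. The helper names are hypothetical, and the re-normalization step is an assumption (it is commonly applied to truncated MRL vectors before cosine or dot-product comparison). The storage estimate assumes roughly 2 bytes per dimension for halfvec and ignores fixed per-value overhead.

```typescript
// Sketch only: truncateEmbedding and approxHalfvecBytes are hypothetical helpers.
function truncateEmbedding(full: readonly number[], dims: number): number[] {
  if (!Number.isInteger(dims) || dims <= 0 || dims > full.length) {
    throw new RangeError(`dims must be an integer in 1..${full.length}`);
  }
  // MRL: the leading dimensions form a usable lower-dimensional embedding.
  const head = full.slice(0, dims);
  // Rescale to unit length (assumed step, see the note above).
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? head : head.map((x) => x / norm);
}

// Approximate halfvec payload size: 2 bytes per dimension.
function approxHalfvecBytes(dims: number): number {
  return dims * 2;
}
```

Under that estimate, a 1,536-dimension vector takes about 3,072 bytes versus about 6,144 bytes at the native 3,072 dimensions, which is the halving described above.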
Model Registry (Database)
The runtime code in @repo/ai uses hardcoded model IDs and token limits for API calls. However, cost estimation uses the database-backed model registry:
- ai_model -- stores model metadata (provider, context window, supported features, active flag)
- ai_model_pricing -- stores pricing per type (input, output, cache read/write, batch variants, web search, code execution) with effectiveFrom/effectiveUntil date ranges
When Anthropic or Google changes pricing, a new ai_model_pricing row is inserted with the new effective date. The old row gets an effectiveUntil value. Cost calculations in recording.ts query the active pricing at call time.
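The effective-date lookup can be sketched in memory as follows. The `PricingRow` shape, its field names other than effectiveFrom/effectiveUntil, and the `findActivePricing` helper are assumptions made for illustration; the real query in recording.ts runs against the database and may treat range boundaries differently. Any prices used with this sketch are placeholders, not real rates.

```typescript
// Sketch only: PricingRow and findActivePricing are hypothetical.
interface PricingRow {
  modelId: string;
  pricingType: string; // e.g. "input" or "output"
  price: number; // placeholder unit, not a real rate
  effectiveFrom: Date;
  effectiveUntil: Date | null; // null means the row is still active
}

function findActivePricing(
  rows: readonly PricingRow[],
  modelId: string,
  pricingType: string,
  at: Date,
): PricingRow | undefined {
  const t = at.getTime();
  return rows.find(
    (row) =>
      row.modelId === modelId &&
      row.pricingType === pricingType &&
      // Assumed half-open range: effectiveFrom <= at < effectiveUntil.
      row.effectiveFrom.getTime() <= t &&
      (row.effectiveUntil === null || t < row.effectiveUntil.getTime()),
  );
}
```

Treating the range as half-open keeps the old and new rows from both matching at the exact moment a price change takes effect.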
See Data & Storage -- Reference Data for the full schema and seeding details.
Environment Variables
| Variable | Purpose | Where |
|---|---|---|
| ANTHROPIC_API_KEY | Claude API authentication | Environment Variables |
| GOOGLE_AI_API_KEY | Gemini Embedding API authentication | Environment Variables |
Both are stored in GCP Secret Manager and synced to the VM at deploy time. Local development uses values from apps/web/.env.
ADR-007: AI Integration -- Claude API + Gemini Embeddings
Decision record for choosing a thin @repo/ai wrapper around Anthropic Claude (chat) and Gemini Embedding 2 (vectors) with automatic usage tracking.
Prompt Patterns
ChatOptions anatomy, system prompts, prompt caching, tool use, structured output, and streaming event handling.