Model Selection

Available AI models, their capabilities and defaults, and how to choose the right model for each use case.

Claude Models (Chat/Reasoning)

Trovella uses three Anthropic Claude models. The model is selected per-call via the model field on ChatOptions. If omitted, the default is Sonnet 4.6.

Model Defaults

Model	ID	Default maxTokens	Streaming maxTokens	Thinking
Sonnet 4.6 (default)	`claude-sonnet-4-6`	16,000	64,000	Adaptive (on)
Opus 4.6	`claude-opus-4-6`	16,000	64,000	Adaptive (on)
Haiku 4.5	`claude-haiku-4-5`	8,192	16,000	Off (not supported)

These defaults are defined in packages/ai/src/config.ts:

export type ModelId = "claude-opus-4-6" | "claude-sonnet-4-6" | "claude-haiku-4-5";

export const DEFAULT_MODEL: ModelId = "claude-sonnet-4-6";

export const MODEL_MAX_TOKENS: Record<ModelId, number> = {
  "claude-opus-4-6": 16_000,
  "claude-sonnet-4-6": 16_000,
  "claude-haiku-4-5": 8_192,
};

export const MODEL_STREAM_MAX_TOKENS: Record<ModelId, number> = {
  "claude-opus-4-6": 64_000,
  "claude-sonnet-4-6": 64_000,
  "claude-haiku-4-5": 16_000,
};

Streaming calls get higher token limits because the connection stays open via SSE while tokens arrive, so a long generation does not risk an HTTP timeout.
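As a minimal sketch of how the two limit tables combine with a per-call override: `resolveMaxTokens` and the `stream` flag are hypothetical names used for illustration, not confirmed exports of packages/ai. The constants are copied from the config above so the block is self-contained.

```typescript
// Sketch only: resolveMaxTokens is a hypothetical helper, not part of the documented config.
type ModelId = "claude-opus-4-6" | "claude-sonnet-4-6" | "claude-haiku-4-5";

const DEFAULT_MODEL: ModelId = "claude-sonnet-4-6";

const MODEL_MAX_TOKENS: Record<ModelId, number> = {
  "claude-opus-4-6": 16_000,
  "claude-sonnet-4-6": 16_000,
  "claude-haiku-4-5": 8_192,
};

const MODEL_STREAM_MAX_TOKENS: Record<ModelId, number> = {
  "claude-opus-4-6": 64_000,
  "claude-sonnet-4-6": 64_000,
  "claude-haiku-4-5": 16_000,
};

interface TokenLimitInput {
  model?: ModelId; // falls back to DEFAULT_MODEL when omitted
  maxTokens?: number; // explicit per-call override
  stream?: boolean; // assumed flag: true for SSE streaming calls
}

function resolveMaxTokens({
  model = DEFAULT_MODEL,
  maxTokens,
  stream = false,
}: TokenLimitInput): number {
  // An explicit override always wins over the table defaults.
  if (maxTokens !== undefined) return maxTokens;
  return stream ? MODEL_STREAM_MAX_TOKENS[model] : MODEL_MAX_TOKENS[model];
}
```

Under these assumptions, a call with no options resolves to the Sonnet non-streaming limit, and the same call with `stream: true` resolves to the streaming limit.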

When to Use Each Model

Sonnet 4.6 -- the default for most operations. Use when you need good reasoning quality at moderate cost. This is the right choice for research chat, data extraction, and general-purpose completions.

Opus 4.6 -- reserved for tasks requiring the highest reasoning quality. Use for complex multi-step analysis, nuanced document synthesis, or tasks where Sonnet produces inadequate results. The cost difference is significant (see the pricing table in Data & Storage -- Reference Data), so choose it deliberately rather than by default.

Haiku 4.5 -- optimized for high-volume, lightweight tasks where speed and cost matter more than reasoning depth. The primary use case today is contextual retrieval: generating 2-3 sentence context prefixes for document chunks before embedding. Haiku does not support thinking mode. Use for classification, extraction of structured data from well-defined formats, and other tasks where a smaller model suffices.
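The guidance above could be encoded as a small lookup. This is an illustrative sketch only: the `TaskKind` labels and the `pickModel` function are hypothetical and do not exist in the codebase; the mapping simply restates the recommendations in this section.

```typescript
// Sketch only: TaskKind and pickModel are hypothetical names for illustration.
type ModelId = "claude-opus-4-6" | "claude-sonnet-4-6" | "claude-haiku-4-5";

type TaskKind =
  | "contextual-retrieval"
  | "classification"
  | "structured-extraction"
  | "research-chat"
  | "general-completion"
  | "multi-step-analysis"
  | "document-synthesis";

function pickModel(task: TaskKind, sonnetWasInadequate = false): ModelId {
  switch (task) {
    // High-volume, lightweight work where speed and cost dominate.
    case "contextual-retrieval":
    case "classification":
    case "structured-extraction":
      return "claude-haiku-4-5";
    // Work that needs the highest reasoning quality.
    case "multi-step-analysis":
    case "document-synthesis":
      return "claude-opus-4-6";
    // Everything else starts on the default and escalates only when needed.
    case "research-chat":
    case "general-completion":
      return sonnetWasInadequate ? "claude-opus-4-6" : "claude-sonnet-4-6";
  }
}
```

The second parameter models the "tasks where Sonnet produces inadequate results" case: the caller escalates to Opus explicitly instead of defaulting to it.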

Overriding Defaults Per-Call

Pass model, maxTokens, or thinking in the ChatOptions to override defaults:

const response = await ctx.ai.chat({
  messages: [{ role: "user", content: complexAnalysisPrompt }],
  feature: "research-deep-analysis",
  model: "claude-opus-4-6", // Override model
  maxTokens: 32_000, // Override token limit
});

const classification = await ctx.ai.chat({
  messages: [{ role: "user", content: classificationPrompt }],
  feature: "research-classify",
  model: "claude-haiku-4-5",
  thinking: false, // Explicitly off (Haiku default)
});

Thinking Mode

Thinking (extended reasoning) is enabled by default on Sonnet and Opus as { type: "adaptive" }. This means the model decides whether to use a thinking step based on the complexity of the request.

To disable thinking for a specific call (e.g., simple classification where thinking adds latency):

const response = await ctx.ai.chat({
  messages,
  feature: "simple-extraction",
  thinking: false,
});

The effort level controls thinking depth. Default is "high":

const response = await ctx.ai.chat({
  messages,
  feature: "quick-summary",
  effort: "low", // Options: "low" | "medium" | "high" | "max"
});

Haiku 4.5 does not support thinking. Callers do not need to pass thinking: false for Haiku: even when the option is left unset, the buildRequestParams() function in request.ts skips adding thinking params for Haiku.
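The rules in this section can be summarized in a short sketch. It is not the actual `buildRequestParams()` implementation, whose signature is not shown here: `resolveThinking`, its input and output shapes, and the choice to drop `effort` when thinking is off are all assumptions made for illustration.

```typescript
// Sketch only: resolveThinking is a hypothetical stand-in, not the real buildRequestParams().
type ModelId = "claude-opus-4-6" | "claude-sonnet-4-6" | "claude-haiku-4-5";
type Effort = "low" | "medium" | "high" | "max";
type ThinkingOption = false | { type: "adaptive" };

interface ThinkingInput {
  model: ModelId;
  thinking?: ThinkingOption;
  effort?: Effort;
}

interface ThinkingParams {
  thinking?: { type: "adaptive" };
  effort?: Effort;
}

// Models that accept thinking params, per the Model Defaults table.
const THINKING_CAPABLE: ReadonlySet<ModelId> = new Set<ModelId>([
  "claude-opus-4-6",
  "claude-sonnet-4-6",
]);

function resolveThinking({
  model,
  thinking,
  effort = "high",
}: ThinkingInput): ThinkingParams {
  // Haiku never receives thinking params; an explicit `false` disables them elsewhere.
  if (!THINKING_CAPABLE.has(model) || thinking === false) return {};
  // Assumption: effort is only sent alongside thinking.
  return { thinking: thinking ?? { type: "adaptive" }, effort };
}
```

With these assumptions, a Sonnet call with no options yields adaptive thinking at "high" effort, while any Haiku call yields no thinking params at all.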

Gemini Embedding Model

Trovella uses a single embedding model: Gemini Embedding 2 (gemini-embedding-2-preview).

Property	Value
API model ID	`gemini-embedding-2-preview`
Database model ID	`gemini-embedding-002`
Native dimensions	3,072
Configured dimensions	1,536 (MRL-truncated)
Storage type	`halfvec(1536)` in PostgreSQL

Task Types

The Gemini embedding API accepts a taskType hint that optimizes the embedding for specific use cases:

Task Type	When to Use
`RETRIEVAL_DOCUMENT`	Indexing content for search (default)
`RETRIEVAL_QUERY`	Embedding a user's search query
`SEMANTIC_SIMILARITY`	Comparing two texts for similarity
`CLASSIFICATION`	Text classification tasks
`CLUSTERING`	Grouping related content
`QUESTION_ANSWERING`	Q&A retrieval
`FACT_VERIFICATION`	Checking factual claims

In practice, the search pipeline uses two task types:

RETRIEVAL_DOCUMENT during indexing (via embedDocument())
RETRIEVAL_QUERY during search (via embedQuery())

Using different task types for indexing vs querying improves retrieval quality because the model can optimize the vector space differently for "what this document is about" vs "what the user is looking for."
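The asymmetric pairing can be sketched as two request builders. The actual signatures of `embedDocument()` and `embedQuery()` are not shown in this page, so the builder names, the `EmbedRequest` shape, and the presence of a `taskType` field on the embed options are assumptions for illustration.

```typescript
// Sketch only: these builders and the EmbedRequest shape are hypothetical.
type TaskType =
  | "RETRIEVAL_DOCUMENT"
  | "RETRIEVAL_QUERY"
  | "SEMANTIC_SIMILARITY"
  | "CLASSIFICATION"
  | "CLUSTERING"
  | "QUESTION_ANSWERING"
  | "FACT_VERIFICATION";

interface EmbedRequest {
  contents: string[];
  feature: string;
  taskType: TaskType;
  outputDimensionality: number;
}

const CONFIGURED_DIMENSIONS = 1536;

// Indexing side: many chunks, embedded as documents.
function buildDocumentEmbedRequest(chunks: string[], feature: string): EmbedRequest {
  return {
    contents: chunks,
    feature,
    taskType: "RETRIEVAL_DOCUMENT",
    outputDimensionality: CONFIGURED_DIMENSIONS,
  };
}

// Search side: a single user query, embedded as a query.
function buildQueryEmbedRequest(query: string, feature: string): EmbedRequest {
  return {
    contents: [query],
    feature,
    taskType: "RETRIEVAL_QUERY",
    outputDimensionality: CONFIGURED_DIMENSIONS,
  };
}
```

Both builders use the same output dimensionality, since query and document vectors must have matching dimensions to be compared.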

MRL Truncation

Gemini Embedding 2 supports Matryoshka Representation Learning, which means the first N dimensions of the full 3072-dimensional vector are a valid lower-dimensional embedding. Truncating to 1536 dimensions halves storage and index memory while retaining most of the quality. The outputDimensionality parameter controls this:

const result = await ctx.ai.embed({
  contents: ["text to embed"],
  feature: "hybrid-search-index",
  outputDimensionality: 1536, // Default via the search pipeline
});
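To make the MRL property concrete, here is a client-side illustration. It is not how the search pipeline truncates (that happens through outputDimensionality); `truncateEmbedding` and `halfvecBytes` are hypothetical names. Re-normalizing after truncation is a common practice for cosine similarity and is an assumption here, not something this page specifies. The storage arithmetic assumes 2 bytes per dimension for halfvec and ignores fixed per-value overhead.

```typescript
// Sketch only: illustrates the MRL prefix property; not the pipeline's code path.
function truncateEmbedding(full: readonly number[], dims: number): number[] {
  if (!Number.isInteger(dims) || dims <= 0 || dims > full.length) {
    throw new RangeError(`dims must be an integer in 1..${full.length}`);
  }
  // The first `dims` components form a valid lower-dimensional embedding.
  const head = full.slice(0, dims);
  // Rescale to unit length so cosine and dot-product scores stay comparable.
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? head : head.map((x) => x / norm);
}

// Approximate payload size of one halfvec value (2 bytes per dimension).
function halfvecBytes(dims: number): number {
  return 2 * dims;
}
```

Under that assumption, a 3,072-dimension vector takes about 6,144 bytes as halfvec and a 1,536-dimension vector about 3,072 bytes, which is the halving described above.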

Model Registry (Database)

The runtime code in @repo/ai uses hardcoded model IDs and token limits for API calls. However, cost estimation uses the database-backed model registry:

ai_model -- stores model metadata (provider, context window, supported features, active flag)
ai_model_pricing -- stores pricing per type (input, output, cache read/write, batch variants, web search, code execution) with effectiveFrom/effectiveUntil date ranges

When Anthropic or Google changes pricing, a new ai_model_pricing row is inserted with the new effective date. The old row gets an effectiveUntil value. Cost calculations in recording.ts query the active pricing at call time.
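The effective-date lookup could look roughly like the following. This is a sketch, not the query in recording.ts: the `PricingRow` shape (apart from effectiveFrom and effectiveUntil), the `findActivePricing` name, and the half-open interval convention (from inclusive, until exclusive) are assumptions.

```typescript
// Sketch only: PricingRow fields other than effectiveFrom/effectiveUntil are hypothetical.
interface PricingRow {
  modelId: string;
  pricingType: string; // e.g. "input" or "output"
  price: number; // unit is whatever the registry defines
  effectiveFrom: Date;
  effectiveUntil: Date | null; // null means the row is still current
}

function findActivePricing(
  rows: readonly PricingRow[],
  modelId: string,
  pricingType: string,
  at: Date,
): PricingRow | undefined {
  const t = at.getTime();
  return rows.find(
    (row) =>
      row.modelId === modelId &&
      row.pricingType === pricingType &&
      // Assumed convention: [effectiveFrom, effectiveUntil)
      row.effectiveFrom.getTime() <= t &&
      (row.effectiveUntil === null || t < row.effectiveUntil.getTime()),
  );
}
```

With a half-open interval, closing the old row and opening the new row at the same timestamp leaves exactly one active row at every instant.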

See Data & Storage -- Reference Data for the full schema and seeding details.

Environment Variables

Variable	Purpose	Where
`ANTHROPIC_API_KEY`	Claude API authentication	Environment Variables
`GOOGLE_AI_API_KEY`	Gemini Embedding API authentication	Environment Variables

Both are stored in GCP Secret Manager and synced to the VM at deploy time. Local development uses values from apps/web/.env.