Trovella Wiki

ADR-007: AI Integration -- Claude API + Gemini Embeddings

Decision record for choosing a thin @repo/ai wrapper around Anthropic Claude (chat) and Gemini Embedding 2 (vectors) with automatic usage tracking.

Status: Accepted Date: 2026-03-21 (Week 0 Decision Sprint), modified 2026-03-28 (TRO-22), modified 2026-04-02 (TRO-58 embeddings) Deciders: Kyle Olson (Solo Founder)

Decision

  • Chat/reasoning: Anthropic Claude API via @repo/ai wrapper -- Sonnet 4.6 (default), Opus 4.6 (complex tasks), Haiku 4.5 (high-volume lightweight tasks)
  • Embeddings: Gemini Embedding 2 at 1536 dimensions (MRL-truncated from 3072) via @repo/ai
  • Wrapper pattern: Thin wrapper -- Anthropic SDK types flow through, no custom abstraction layer
  • Usage tracking: Automatic on every call -- ai_usage (tokens, cost, latency) + ai_call_details (full request/response JSONB)
  • Model registry: Database tables (ai_model, ai_model_pricing) with effective dates -- not hardcoded config
  • Multi-provider: Single-provider (Anthropic) for chat at MVP; multi-provider deferred until a second provider is actually needed
  • SDK boundary: Only @repo/ai may import @anthropic-ai/sdk or @google/genai -- enforced via ESLint no-restricted-imports

Context

AI-powered research is Trovella's core product. Every feature -- research plans, data extraction, contextual retrieval, document generation -- makes LLM calls. This creates two requirements traditional SaaS apps don't have:

  1. Cost visibility -- AI API costs are the primary variable expense; every call must be tracked with feature attribution for metering and cost optimization
  2. Single entry point -- if any package can import the Anthropic SDK directly, calls bypass usage tracking, billing data is wrong, and logging is missing

The MCP-first architecture (ADR-010) also shapes this decision. Users do heavy LLM work inside their subsidized Claude/ChatGPT subscriptions via Trovella's MCP server -- Trovella enhances that work through structure, memory, and skills rather than competing on raw LLM access. This means Trovella's own API usage is for internal operations (contextual retrieval, extraction, summarization), not wholesale chat relay.

Decision Drivers

  1. Usage tracking on every call -- cost attribution by feature is required for metering and business decisions
  2. Single wrapper package -- prevent direct SDK imports that bypass tracking
  3. Thin abstraction -- Anthropic SDK types flow through; no premature multi-provider interface until a second provider is actually needed
  4. Embedding quality for retrieval -- benchmarks matter more than price; retrieval accuracy directly affects product quality
  5. Database-backed model registry -- prices change, models are added/deprecated; config must be queryable and versioned

Alternatives Considered

Wrapper Abstraction Level

Provider-agnostic abstraction -- a custom interface hiding all provider types behind a Trovella abstraction. Rejected because it violates the progressive complexity principle. The provider column in ai_model accommodates future providers without requiring the wrapper abstraction today. Kyle's note: "I choose option A for decision 1, but know that we will expand to use more AI model vendors in the future."

Vercel AI SDK -- pre-built provider switching and React streaming hooks. Rejected because it adds another streaming abstraction that conflicts with the existing tRPC streaming pattern, and the complexity is not justified for single-provider use.

Embedding Model

Cohere Embed v4 -- 128K context window, $0.12/M tokens. Initially recommended but rejected after benchmark analysis. MTEB Retrieval score was 57.6 vs Gemini's 67.7 (a 10-point gap). Research showed the 128K context is counterproductive for retrieval: chunking to 512 tokens improved retrieval by +24.5% vs whole-document embedding.

OpenAI text-embedding-3-large -- well-known, 3072 dimensions, $0.13/M tokens. Rejected due to the lowest MTEB Overall score (64.6) and the cross-cloud dependency it would introduce into a GCP-committed project.

Usage Tracking Approach

Event-based tracking (consumers track their own usage) -- simpler wrapper, consumers decide what to log. Rejected because a single missed tracking call corrupts billing data. Defense in depth: the wrapper records usage before returning results.

Key Implementation Decisions

Automatic usage recording on every call

chatCompletion(), streamChat(), submitBatch(), embed(), and all variants automatically write to ai_usage and ai_call_details before returning results. Failed calls write a 0-token record for debugging. The feature field is required on every call for cost attribution. See Usage Tracking for the recording pipeline details.

Model registry as database tables with effective dates

Kyle caught that hardcoded pricing would break: "The model registry needs to be a table. Any of those figures could change at any time." The ai_model table stores model metadata; ai_model_pricing stores pricing by type with effectiveFrom/effectiveUntil date ranges. Cost estimates are calculated at query time using the active pricing record. See Data & Storage -- Reference Data for the schema.

Full request/response stored in ai_call_details

Kyle overrode the initial recommendation to exclude prompts: "I do want prompts and responses stored along with thinking block content and details about tool use." The ai_call_details table stores system prompt, full request/response JSONB, thinking blocks, and tool use details. A deferred ticket (TRO-49) covers archival to GCS when storage costs justify it.

ctx.ai on authorizedProcedure

Inside tRPC routes, ctx.ai provides methods pre-bound with the user's organizationId, userId, and logger. Outside tRPC, use createAIHelper(organizationId, userId, logger). See Reasoning Overview for the entry point patterns.

Separate functions for streaming vs non-streaming

chatCompletion() returns a complete message. streamChat() returns { events, finalMessage }. The streaming function must await finalMessage for usage to be recorded. A single overloaded function was rejected because the implementations diverge enough internally that overloaded return types would be harder to reason about.

Gemini embeddings at 1536 dimensions (MRL-truncated)

Gemini Embedding 2 outputs 3072 dimensions natively but supports Matryoshka Representation Learning (MRL) for truncation to 768/1536/3072. Trovella uses 1536 as the balance between quality and storage cost, stored as halfvec(1536) in PostgreSQL. Full 3072 dimensions were rejected because they double storage and index memory with diminishing quality returns.

Contextual retrieval with Haiku from day one

Each document chunk gets a 2-3 sentence context prefix generated by Claude Haiku before embedding. This implements Anthropic's contextual retrieval technique (~49% retrieval failure reduction). Haiku and Gemini embedding calls are linked via correlationId for cost analysis. See Search & Retrieval -- Indexing for the full pipeline.

Consequences

Positive

  • Complete cost visibility -- every AI call is recorded with feature attribution, model, tokens, latency, and estimated cost
  • Single entry point enforced -- ESLint + dependency-cruiser prevent any package from importing AI SDKs directly
  • Multi-provider ready without premature abstraction -- ai_model.provider column supports adding providers without rewriting the wrapper
  • Best-in-class embeddings -- Gemini Embedding 2 scored 10 MTEB retrieval points above the nearest competitor
  • Admin observability -- full admin UI at /admin/ai-logs with dashboard and call detail drill-down. See Application -- AI Logs View.

Negative

  • Two AI providers to manage -- Anthropic for chat + Google for embeddings means two API keys, two SDKs, two billing accounts
  • Streaming usage requires await finalMessage -- if a consumer only reads events and does not await, usage is not recorded
  • ai_call_details storage growth -- full JSONB for every call will grow quickly; GCS archival (TRO-49) is deferred

Risks

  • Anthropic API pricing changes -- mitigated by database-backed pricing registry with effective dates
  • Single-provider dependency for chat -- if Anthropic has an outage, all chat features are down; the thin wrapper makes adding a fallback provider a code change, not an architectural change
  • Embedding model quality regression -- mitigated by the Inngest-based indexing pipeline which can re-process content in batch

Validation

RuleEnforcement
Only @repo/ai imports @anthropic-ai/sdk and @google/genaiESLint no-restricted-imports + dependency-cruiser
feature field required on all AI callsTypeScript compile error if omitted
Usage recorded on every callWrapper writes ai_usage before returning; 0-token on failure
tRPC routes use ctx.ai not createAIHelperESLint no-restricted-imports in router directories
Model pricing is currentai_model_pricing has effective date columns; admin updates without code changes

References

  • Linear: TRO-22 (AI API Layer), TRO-58 (Hybrid search -- embedding selection), TRO-49 (AI call content retention, deferred)
  • Related: ADR-001 (Database -- halfvec(1536) storage), ADR-006 (Inngest -- indexing pipeline), ADR-009 (Search -- hybrid fusion)

On this page