Prompt Patterns

ChatOptions anatomy, system prompts, prompt caching, tool use, structured output, and streaming event handling.

Every AI call in Trovella passes through a ChatOptions object that gets translated into Anthropic API parameters by buildRequestParams(). This page documents the full options surface, common patterns, and how they map to the underlying API.

ChatOptions Reference

interface ChatOptions {
  // Required
  messages: Anthropic.MessageParam[];
  feature: string;

  // Model and generation
  model?: ModelId; // Default: "claude-sonnet-4-6"
  maxTokens?: number; // Default: per model (see Model Selection)
  temperature?: number; // 0-1, undefined = model default
  topP?: number; // Nucleus sampling (alternative to temperature)
  topK?: number; // Top-k sampling
  stopSequences?: string[];

  // System prompt
  system?: string | Anthropic.TextBlockParam[];

  // Thinking (extended reasoning)
  thinking?: boolean; // Default: true (Sonnet/Opus), false (Haiku)
  effort?: "low" | "medium" | "high" | "max";

  // Tool use
  tools?: Anthropic.Messages.ToolUnion[];
  toolChoice?: Anthropic.ToolChoice;

  // Prompt caching
  cacheControl?: boolean | { type: "ephemeral"; ttl?: "5m" | "1h" };

  // Structured output
  outputFormat?: {
    type: "json_schema";
    schema: Record<string, unknown>;
  };

  // Metadata and advanced
  endUserMetadataId?: string; // Anthropic abuse detection
  betas?: string[]; // Beta feature headers
  container?: string; // Code execution container reuse
  correlationId?: string; // Link related calls
}

System Prompts

System prompts can be passed as a plain string or as content blocks with cache control:

Plain String

await ctx.ai.chat({
  messages: [{ role: "user", content: "Summarize this document." }],
  system: "You are a research assistant specializing in market analysis.",
  feature: "research-summarize",
});

Content Blocks with Cache Control

For long system prompts that should be cached (e.g., instructions shared across multiple calls), pass an array of TextBlockParam objects and set cache_control on the block to cache:

await ctx.ai.chat({
  messages: [{ role: "user", content: userQuery }],
  system: [
    {
      type: "text",
      text: longInstructionSet,
      cache_control: { type: "ephemeral" },
    },
  ],
  feature: "research-chat",
});

The system prompt is stored separately in ai_call_details.systemPrompt regardless of format, so it is searchable in the admin UI.

Prompt Caching

Prompt caching reduces cost and latency for repeated prefixes. Two mechanisms are available:

Top-Level Auto-Caching

Set cacheControl: true to enable automatic caching on the last cacheable block:

await ctx.ai.chat({
  messages,
  system: longSystemPrompt,
  feature: "research-chat",
  cacheControl: true,
});

This maps to cache_control: { type: "ephemeral" } on the API request. The default TTL is 5 minutes. To request a longer TTL:

cacheControl: { type: "ephemeral", ttl: "1h" }
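The two accepted shapes of cacheControl can be reduced to a single API value. The following is a minimal sketch of that normalization, not the actual Trovella code: the names CacheControlOption, ApiCacheControl, and normalizeCacheControl are hypothetical and exist only for illustration.

```typescript
// Hypothetical types and helper; only the option shapes come from ChatOptions.
type CacheTtl = "5m" | "1h";
type CacheControlOption = boolean | { type: "ephemeral"; ttl?: CacheTtl };
type ApiCacheControl = { type: "ephemeral"; ttl?: CacheTtl };

function normalizeCacheControl(
  option: CacheControlOption | undefined,
): ApiCacheControl | undefined {
  // undefined or false: send no cache_control at all.
  if (option === undefined || option === false) return undefined;
  // true: shorthand for the default ephemeral cache (5-minute TTL).
  if (option === true) return { type: "ephemeral" };
  // Object form: pass through, keeping an explicit ttl such as "1h".
  return option;
}
```

Treating false the same as an omitted option is an assumption of this sketch.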

Fine-Grained Cache Control

For control over which specific blocks are cached, pass cache_control directly in the system or messages content blocks (as shown in the content blocks example above). This is preferred when the system prompt has both static (cacheable) and dynamic (per-request) sections.
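As an illustration of the static/dynamic split, the sketch below places the cacheable instructions first and the per-request context last, so that only the first block carries cache_control. The TextBlock interface is a local stand-in for the SDK's TextBlockParam, and buildSystemBlocks is a hypothetical helper rather than part of Trovella.

```typescript
// Local stand-in for the SDK's text block type, so the sketch is self-contained.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral"; ttl?: "5m" | "1h" };
}

// Hypothetical helper: static instructions first (cached),
// per-request context last (not cached).
function buildSystemBlocks(
  staticInstructions: string,
  dynamicContext: string,
): TextBlock[] {
  return [
    // The cache breakpoint covers the prefix up to and including this block.
    { type: "text", text: staticInstructions, cache_control: { type: "ephemeral" } },
    // Content after the breakpoint can change per request without
    // invalidating the cached prefix.
    { type: "text", text: dynamicContext },
  ];
}
```

The returned array can be passed as the system field of ChatOptions. Ordering matters: anything that varies per request should come after the last cached block.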

Cache Metrics

Cache hits and writes are tracked automatically:

ai_usage.cacheReadTokens -- tokens served from cache (reduced cost)
ai_usage.cacheWriteTokens -- tokens written to cache (one-time cost)

These are visible in the AI Logs dashboard.
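One way to read these two columns together is as a share of input tokens served from cache. The sketch below is an assumption-heavy illustration: the UsageRow shape and the inputTokens field (taken here to count only uncached input tokens) are hypothetical, and only cacheReadTokens and cacheWriteTokens come from this page.

```typescript
// Hypothetical row shape; inputTokens is an assumed column holding
// uncached input tokens only.
interface UsageRow {
  inputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

// Fraction of all prompt tokens that were served from cache.
function cacheReadShare(row: UsageRow): number {
  const total = row.inputTokens + row.cacheReadTokens + row.cacheWriteTokens;
  return total === 0 ? 0 : row.cacheReadTokens / total;
}
```

A share that stays near zero across repeated calls with the same prefix suggests the cached prefix is changing between requests or expiring before reuse.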

Tool Use

Tools are passed via the tools field. Both custom tools and server-side tools (web search, code execution) are supported.

Custom Tools

await ctx.ai.chat({
  messages: [{ role: "user", content: "Extract the key entities from this text." }],
  feature: "research-extract",
  tools: [
    {
      name: "record_entity",
      description: "Record a discovered entity with its type and context.",
      input_schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          entityType: { type: "string", enum: ["person", "company", "location"] },
          context: { type: "string" },
        },
        required: ["name", "entityType"],
      },
    },
  ],
  toolChoice: { type: "auto" },
});
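When the model calls a custom tool, the call arrives as a tool_use content block in the response. The sketch below shows one way to pull record_entity calls out of such a response. It assumes the value returned by ctx.ai.chat() exposes Anthropic-style content blocks. ContentBlock, RecordedEntity, isRecordedEntity, and collectEntities are hypothetical names defined locally so the example runs on its own.

```typescript
// Local stand-ins for the SDK's content block types.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface RecordedEntity {
  name: string;
  entityType: "person" | "company" | "location";
  context?: string;
}

// Runtime check mirroring the tool's input_schema; tool input is untyped JSON.
function isRecordedEntity(value: unknown): value is RecordedEntity {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const type = v["entityType"];
  return (
    typeof v["name"] === "string" &&
    (type === "person" || type === "company" || type === "location") &&
    (v["context"] === undefined || typeof v["context"] === "string")
  );
}

// Collect every valid record_entity call; skip other tools and invalid input.
function collectEntities(content: ContentBlock[]): RecordedEntity[] {
  const entities: RecordedEntity[] = [];
  for (const block of content) {
    if (
      block.type === "tool_use" &&
      block.name === "record_entity" &&
      isRecordedEntity(block.input)
    ) {
      entities.push(block.input);
    }
  }
  return entities;
}
```

Validating the input at runtime is a deliberate choice in this sketch: the schema guides the model, but the caller still receives plain JSON.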

Server-Side Tools (Web Search, Code Execution)

Anthropic's server-side tools are passed as ToolUnion entries. The response content blocks include server_tool_use, web_search_tool_result, web_fetch_tool_result, and code execution results. The extraction.ts module extracts these blocks so they can be recorded:

// Server tool usage is tracked separately in ai_usage,
// in the webSearchRequests and webFetchRequests columns

Tool use details (name, input, output, type) are extracted by extractToolUseDetails() and stored in ai_call_details.toolUseDetails for debugging via the admin UI.

Structured Output

Force the model to respond with JSON matching a specific schema:

const response = await ctx.ai.chat({
  messages: [{ role: "user", content: documentText }],
  system: "Extract structured data from the provided document.",
  feature: "mcp-extract-data",
  outputFormat: {
    type: "json_schema",
    schema: {
      type: "object",
      properties: {
        title: { type: "string" },
        date: { type: "string", format: "date" },
        entities: {
          type: "array",
          items: { type: "object", properties: { name: { type: "string" } } },
        },
      },
      required: ["title"],
    },
  },
});

The buildRequestParams() function maps this to the Anthropic API's output_config.format field.
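With structured output, the JSON arrives as text in the response and still has to be parsed by the caller. The sketch below shows one way to do that for the schema above. It assumes the response exposes Anthropic-style content blocks. ResponseBlock, ExtractedData, and parseExtractedData are hypothetical names, not part of Trovella.

```typescript
// Minimal local view of a response content block.
interface ResponseBlock {
  type: string;
  text?: string;
}

// Mirrors the example schema: only title is required.
interface ExtractedData {
  title: string;
  date?: string;
  entities?: { name?: string }[];
}

function parseExtractedData(content: ResponseBlock[]): ExtractedData {
  // Take the first text block; thinking or tool blocks are skipped.
  const textBlock = content.find(
    (block) => block.type === "text" && typeof block.text === "string",
  );
  if (!textBlock || textBlock.text === undefined) {
    throw new Error("No text block in response");
  }
  const parsed: unknown = JSON.parse(textBlock.text);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Expected a JSON object");
  }
  const obj = parsed as Record<string, unknown>;
  // Only the required field is checked here; optional fields are trusted.
  if (typeof obj["title"] !== "string") {
    throw new Error("Missing required field: title");
  }
  return obj as unknown as ExtractedData;
}
```

The check on title is intentionally light. A schema validation library would be a reasonable substitute where stricter guarantees are needed.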

Streaming Events

When using ctx.ai.stream(), the returned events async iterable yields typed ChatStreamEvent objects:

Event Type	Fields	When
`message_start`	`messageId`	Stream begins
`text`	`text`	Text content delta
`thinking`	`thinking`	Thinking content delta
`tool_use_start`	`toolName`, `toolId`	Tool use block begins
`input_json`	`partialJson`	Incremental tool input JSON
`content_block_stop`	`index`	A content block finishes

The first text or thinking event marks time-to-first-token (TTFT) for latency tracking.

Consuming a Stream

const { events, finalMessage } = ctx.ai.stream({
  messages,
  system: "You are a research assistant.",
  feature: "research-chat",
});

// Yield events to the client (e.g., via tRPC subscription or SSE)
for await (const event of events) {
  if (event.type === "text") {
    sendToClient(event.text);
  }
  if (event.type === "thinking") {
    // Optionally show thinking to the user
  }
}

// IMPORTANT: await finalMessage for usage to be recorded
const message = await finalMessage;

Failing to await finalMessage means the usage record is never written. This is documented as a key gotcha in the Reasoning Overview.
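The event types in the table can also be folded into a final result on the consumer side. The sketch below accumulates text deltas and reassembles tool input from input_json fragments. The ChatStreamEvent union is reconstructed from the table and may not match the real type exactly. collectStream and ToolCall are hypothetical, and the sketch assumes that input_json events between a tool_use_start and the next content_block_stop belong to that tool call.

```typescript
// Event union reconstructed from the table above (an approximation).
type ChatStreamEvent =
  | { type: "message_start"; messageId: string }
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "tool_use_start"; toolName: string; toolId: string }
  | { type: "input_json"; partialJson: string }
  | { type: "content_block_stop"; index: number };

interface ToolCall {
  toolId: string;
  toolName: string;
  input: unknown;
}

async function collectStream(
  events: AsyncIterable<ChatStreamEvent>,
): Promise<{ text: string; toolCalls: ToolCall[] }> {
  let text = "";
  const toolCalls: ToolCall[] = [];
  // The tool call currently being streamed, if any.
  let current: { toolId: string; toolName: string; json: string } | null = null;

  for await (const event of events) {
    switch (event.type) {
      case "text":
        text += event.text;
        break;
      case "tool_use_start":
        current = { toolId: event.toolId, toolName: event.toolName, json: "" };
        break;
      case "input_json":
        // Fragments are not valid JSON on their own; concatenate first.
        if (current) current.json += event.partialJson;
        break;
      case "content_block_stop":
        if (current) {
          toolCalls.push({
            toolId: current.toolId,
            toolName: current.toolName,
            // A tool call with no arguments streams no fragments.
            input: current.json ? JSON.parse(current.json) : {},
          });
          current = null;
        }
        break;
    }
  }
  return { text, toolCalls };
}
```

In real use, the events iterable from ctx.ai.stream() would be passed in, and finalMessage would still need to be awaited afterwards.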

Contextual Retrieval Prompt

The generateChunkContext() function in contextual-retrieval.ts uses a specific prompt pattern for generating search context prefixes:

const CONTEXT_PROMPT = `You are an expert at providing concise context for document chunks
used in search retrieval.

Given a full document and a specific chunk from that document, write a short (2-3 sentence)
context that situates the chunk within the overall document. Include key entities, topics,
and relationships that would help a search system match this chunk to relevant queries.

Respond with ONLY the context text — no labels, prefixes, or formatting.`;

Key design decisions for this call:

Haiku model -- uses claude-haiku-4-5 for cost efficiency at high volume
200 max tokens -- context prefixes should be short; this caps generation
Thinking disabled -- simple task, no benefit from extended reasoning
Feature tag -- "hybrid-search-contextual-retrieval" for cost attribution
Document truncation -- full document is truncated to 150K characters (~37K tokens) to stay within Haiku's context window
Correlation ID -- links this Haiku call to the subsequent Gemini embedding call

The user message wraps the document and chunk in XML tags:

<document>{fullDocument}</document>
<chunk>{chunk}</chunk>
Provide a brief context for this chunk within the document.
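Combining the truncation rule with the XML wrapping, the user message could be assembled roughly as follows. This is a sketch, not the code in contextual-retrieval.ts: buildContextUserMessage and MAX_DOCUMENT_CHARS are hypothetical names, and both keeping the head of the document when truncating and joining the parts with newlines are assumptions.

```typescript
// 150K characters, per the truncation rule described above.
const MAX_DOCUMENT_CHARS = 150000;

// Hypothetical helper: wraps the (possibly truncated) document and the chunk
// in XML tags, followed by the instruction line.
function buildContextUserMessage(fullDocument: string, chunk: string): string {
  // Assumption: truncation keeps the start of the document.
  const truncated =
    fullDocument.length > MAX_DOCUMENT_CHARS
      ? fullDocument.slice(0, MAX_DOCUMENT_CHARS)
      : fullDocument;
  return [
    `<document>${truncated}</document>`,
    `<chunk>${chunk}</chunk>`,
    "Provide a brief context for this chunk within the document.",
  ].join("\n");
}
```

Note that String.prototype.slice counts UTF-16 code units, so the limit is approximate for text outside the Basic Multilingual Plane.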

Request Parameter Building

All ChatOptions fields are translated to Anthropic API parameters by buildRequestParams() in request.ts. This function is shared across chatCompletion(), streamChat(), and submitBatch().

Key transformations:

Model defaults -- if model is omitted, uses claude-sonnet-4-6
Token limits -- streaming calls get MODEL_STREAM_MAX_TOKENS, non-streaming get MODEL_MAX_TOKENS
Thinking -- added as { type: "adaptive" } unless explicitly disabled or the model is Haiku
Effort -- mapped to output_config.effort on the API request
Structured output -- mapped to output_config.format
Cache control -- mapped to cache_control on the API request
Container -- passed through for code execution sandbox reuse
Beta headers -- sent as anthropic-beta header, not in the request body
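Several of the transformations listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the real buildRequestParams(): the function name buildParamsSketch, the SketchOptions type, the Haiku check by model-name prefix, and especially the token limit numbers are placeholders, since the actual limits are defined elsewhere. Cache control and container handling are omitted for brevity.

```typescript
interface SketchOptions {
  model?: string;
  maxTokens?: number;
  thinking?: boolean;
  effort?: "low" | "medium" | "high" | "max";
  outputFormat?: { type: "json_schema"; schema: Record<string, unknown> };
  betas?: string[];
}

const DEFAULT_MODEL = "claude-sonnet-4-6";
// PLACEHOLDER numbers for illustration only; not the real per-model limits.
const MODEL_MAX_TOKENS: Record<string, number> = {
  "claude-sonnet-4-6": 8192,
  "claude-haiku-4-5": 4096,
};
const MODEL_STREAM_MAX_TOKENS: Record<string, number> = {
  "claude-sonnet-4-6": 16384,
  "claude-haiku-4-5": 8192,
};
const FALLBACK_MAX_TOKENS = 4096; // also a placeholder

function buildParamsSketch(options: SketchOptions, streaming: boolean) {
  const model = options.model ?? DEFAULT_MODEL;
  const limits = streaming ? MODEL_STREAM_MAX_TOKENS : MODEL_MAX_TOKENS;
  // Assumption: Haiku models are recognized by their name prefix.
  const isHaiku = model.startsWith("claude-haiku");

  const body: Record<string, unknown> = {
    model,
    // Assumption: an explicit maxTokens overrides the per-model default.
    max_tokens: options.maxTokens ?? limits[model] ?? FALLBACK_MAX_TOKENS,
  };

  // Thinking is adaptive unless explicitly disabled or the model is Haiku.
  if (options.thinking !== false && !isHaiku) {
    body["thinking"] = { type: "adaptive" };
  }

  // Effort and structured output both live under output_config.
  const outputConfig: Record<string, unknown> = {};
  if (options.effort) outputConfig["effort"] = options.effort;
  if (options.outputFormat) outputConfig["format"] = options.outputFormat;
  if (Object.keys(outputConfig).length > 0) body["output_config"] = outputConfig;

  // Betas travel as a comma-separated header, not in the request body.
  const headers: Record<string, string> = {};
  if (options.betas && options.betas.length > 0) {
    headers["anthropic-beta"] = options.betas.join(",");
  }

  return { body, headers };
}
```

Returning the body and headers separately mirrors the last item in the list: beta flags never appear in the JSON payload.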

After building params, toStoredParams() in utils.ts extracts the subset of fields stored in ai_call_details for debugging. This includes the system prompt in its original format (string or content blocks).