Prompt Patterns
ChatOptions anatomy, system prompts, prompt caching, tool use, structured output, and streaming event handling.
Every AI call in Trovella passes through a ChatOptions object that gets translated into Anthropic API parameters by buildRequestParams(). This page documents the full options surface, common patterns, and how they map to the underlying API.
ChatOptions Reference
```typescript
import type Anthropic from "@anthropic-ai/sdk";

interface ChatOptions {
  // Required
  messages: Anthropic.MessageParam[];
  feature: string;

  // Model and generation
  model?: ModelId; // Default: "claude-sonnet-4-6"
  maxTokens?: number; // Default: per model (see Model Selection)
  temperature?: number; // 0-1, undefined = model default
  topP?: number; // Nucleus sampling (alternative to temperature)
  topK?: number; // Top-k sampling
  stopSequences?: string[];

  // System prompt
  system?: string | Anthropic.TextBlockParam[];

  // Thinking (extended reasoning)
  thinking?: boolean; // Default: true (Sonnet/Opus), false (Haiku)
  effort?: "low" | "medium" | "high" | "max";

  // Tool use
  tools?: Anthropic.Messages.ToolUnion[];
  toolChoice?: Anthropic.ToolChoice;

  // Prompt caching
  cacheControl?: boolean | { type: "ephemeral"; ttl?: "5m" | "1h" };

  // Structured output
  outputFormat?: {
    type: "json_schema";
    schema: Record<string, unknown>;
  };

  // Metadata and advanced
  endUserMetadataId?: string; // Anthropic abuse detection
  betas?: string[]; // Beta feature headers
  container?: string; // Code execution container reuse
  correlationId?: string; // Link related calls
}
```
System Prompts
System prompts can be passed as a plain string or as an array of text content blocks, which can carry cache control:
Plain String
```typescript
await ctx.ai.chat({
  messages: [{ role: "user", content: "Summarize this document." }],
  system: "You are a research assistant specializing in market analysis.",
  feature: "research-summarize",
});
```
Content Blocks with Cache Control
For long system prompts that should be cached (e.g., shared instructions across multiple calls), pass an array of TextBlockParam with cache_control:
```typescript
await ctx.ai.chat({
  messages: [{ role: "user", content: userQuery }],
  system: [
    {
      type: "text",
      text: longInstructionSet,
      cache_control: { type: "ephemeral" },
    },
  ],
  feature: "research-chat",
});
```
The system prompt is stored separately in ai_call_details.systemPrompt regardless of format, so it is searchable in the admin UI.
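Because system can be either a string or an array of text blocks, any code that needs one plain string (for example, when indexing the prompt for search) has to handle both shapes. The following is a minimal sketch of that normalization; flattenSystemPrompt() is a hypothetical helper, and SystemTextBlock is a local stand-in for the relevant fields of Anthropic's TextBlockParam, not the code that actually populates ai_call_details.systemPrompt.

```typescript
// Local stand-in for Anthropic.TextBlockParam (only the fields used here).
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral"; ttl?: "5m" | "1h" };
}

type SystemPrompt = string | SystemTextBlock[];

// Hypothetical helper: collapse either system prompt shape into plain text.
// Cache markers are dropped because they do not change the prompt's content.
function flattenSystemPrompt(system: SystemPrompt | undefined): string {
  if (system === undefined) return "";
  if (typeof system === "string") return system;
  return system.map((block) => block.text).join("\n\n");
}
```

Joining blocks with a blank line keeps block boundaries visible in the flattened text without altering any block's wording.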
Prompt Caching
Prompt caching reduces cost and latency for repeated prefixes. Two mechanisms are available:
Top-Level Auto-Caching
Set cacheControl: true to enable automatic caching on the last cacheable block:
```typescript
await ctx.ai.chat({
  messages,
  system: longSystemPrompt,
  feature: "research-chat",
  cacheControl: true,
});
```
This maps to cache_control: { type: "ephemeral" } on the API request. The default TTL is 5 minutes. To request a longer TTL:
```typescript
cacheControl: { type: "ephemeral", ttl: "1h" }
```
Fine-Grained Cache Control
For finer control over which blocks are cached, set cache_control directly on the content blocks in system or messages (as shown in the content blocks example above). This is preferred when the system prompt has both static (cacheable) and dynamic (per-request) sections.
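A system prompt with a static prefix and a per-request suffix can be assembled so that only the static block carries a cache breakpoint. This is a sketch under stated assumptions: buildSystemBlocks() is a hypothetical helper, and the block type is redeclared locally so the example runs without the SDK. The returned array is what you would pass as system.

```typescript
// Local stand-in for Anthropic.TextBlockParam (only the fields used here).
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral"; ttl?: "5m" | "1h" };
}

// Hypothetical helper: stable instructions go first and are marked as a cache
// breakpoint; per-request context follows without cache_control. The cached
// prefix ends at the marked block, so changing the dynamic section on each
// call leaves the cached prefix reusable.
function buildSystemBlocks(
  staticInstructions: string,
  dynamicContext: string,
  ttl: "5m" | "1h" = "5m",
): SystemTextBlock[] {
  const blocks: SystemTextBlock[] = [
    {
      type: "text",
      text: staticInstructions,
      cache_control: { type: "ephemeral", ttl },
    },
  ];
  if (dynamicContext.length > 0) {
    blocks.push({ type: "text", text: dynamicContext });
  }
  return blocks;
}
```

Ordering matters here: placing dynamic text before the breakpoint would change the cached prefix on every call and defeat the cache.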
Cache Metrics
Cache hits and writes are tracked automatically:
- ai_usage.cacheReadTokens -- tokens served from cache (reduced cost)
- ai_usage.cacheWriteTokens -- tokens written to cache (one-time cost)
These are visible in the AI Logs dashboard.
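The two cache columns are easiest to interpret relative to the rest of the prompt. As a rough sketch, assuming a hypothetical inputTokens column alongside them and following Anthropic's usage accounting (where input_tokens excludes cache reads and cache writes), a per-call cache hit ratio can be computed as follows. UsageRow and cacheHitRatio() are illustrative names only.

```typescript
// Hypothetical row shape: cacheReadTokens and cacheWriteTokens are the
// documented ai_usage columns; inputTokens is assumed for this sketch.
interface UsageRow {
  inputTokens: number; // uncached input tokens (excludes cache reads/writes)
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

// Fraction of the prompt that was served from cache on this call.
function cacheHitRatio(row: UsageRow): number {
  const totalPrompt = row.inputTokens + row.cacheReadTokens + row.cacheWriteTokens;
  return totalPrompt === 0 ? 0 : row.cacheReadTokens / totalPrompt;
}
```

The first call that writes a prefix reports a ratio near zero; later calls within the TTL should approach the share of the prompt that is cached.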
Tool Use
Tools are passed via the tools field. Both custom tools and server-side tools (web search, code execution) are supported.
Custom Tools
```typescript
await ctx.ai.chat({
  messages: [{ role: "user", content: "Extract the key entities from this text." }],
  feature: "research-extract",
  tools: [
    {
      name: "record_entity",
      description: "Record a discovered entity with its type and context.",
      input_schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          entityType: { type: "string", enum: ["person", "company", "location"] },
          context: { type: "string" },
        },
        required: ["name", "entityType"],
      },
    },
  ],
  toolChoice: { type: "auto" },
});
```
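With toolChoice: { type: "auto" }, the model decides whether to call the tool, so a response can contain zero or more tool_use blocks alongside text. A minimal sketch of collecting the record_entity calls follows. It declares simplified local block types instead of importing the SDK, and Entity, isEntity(), and collectEntities() are hypothetical names. Because tool input is generated by the model, it is checked against the schema before use.

```typescript
// Simplified shapes of response content blocks (subset of the SDK types).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

type EntityType = "person" | "company" | "location";

interface Entity {
  name: string;
  entityType: EntityType;
  context?: string;
}

const ENTITY_TYPES: readonly string[] = ["person", "company", "location"];

// Narrow untrusted tool input to the record_entity input schema.
function isEntity(input: unknown): input is Entity {
  if (typeof input !== "object" || input === null) return false;
  const { name, entityType, context } = input as Record<string, unknown>;
  return (
    typeof name === "string" &&
    typeof entityType === "string" &&
    ENTITY_TYPES.includes(entityType) &&
    (context === undefined || typeof context === "string")
  );
}

// Collect every well-formed record_entity call from the response content.
function collectEntities(content: ContentBlock[]): Entity[] {
  const entities: Entity[] = [];
  for (const block of content) {
    if (block.type !== "tool_use" || block.name !== "record_entity") continue;
    const input: unknown = block.input;
    if (isEntity(input)) entities.push(input);
  }
  return entities;
}
```

Malformed calls are skipped here; a full tool loop would instead return a tool_result with an error so the model can retry.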
Server-Side Tools (Web Search, Code Execution)
Anthropic's server-side tools are passed as ToolUnion entries. The response content can then include server_tool_use, web_search_tool_result, and web_fetch_tool_result blocks, as well as code execution results. The extraction.ts module extracts these blocks so they can be recorded:
```typescript
// Server tool usage is tracked separately in ai_usage
// webSearchRequests, webFetchRequests columns
```
Tool use details (name, input, output, type) are extracted by extractToolUseDetails() and stored in ai_call_details.toolUseDetails for debugging via the admin UI.
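Server tool calls and their results arrive as separate content blocks linked by an ID: a server_tool_use block carries an id, and the matching result block carries the same value in tool_use_id. The sketch below pairs them into name/input/output/type records in the spirit of extractToolUseDetails(). The ToolUseDetail shape and pairToolUses() name are assumptions, and the block types are simplified local declarations rather than the SDK's.

```typescript
// Simplified subset of response content blocks relevant to server tools.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "server_tool_use"; id: string; name: string; input: unknown }
  | { type: "web_search_tool_result"; tool_use_id: string; content: unknown }
  | { type: "web_fetch_tool_result"; tool_use_id: string; content: unknown };

// Hypothetical record shape for debugging output.
interface ToolUseDetail {
  type: "server_tool_use";
  name: string;
  input: unknown;
  output: unknown; // undefined when no matching result block was found
}

function pairToolUses(content: ContentBlock[]): ToolUseDetail[] {
  // Index result blocks by the id of the tool call they answer.
  const results = new Map<string, unknown>();
  for (const block of content) {
    if (block.type === "web_search_tool_result" || block.type === "web_fetch_tool_result") {
      results.set(block.tool_use_id, block.content);
    }
  }
  // Emit one record per tool call, in the order the calls appeared.
  const details: ToolUseDetail[] = [];
  for (const block of content) {
    if (block.type === "server_tool_use") {
      details.push({
        type: block.type,
        name: block.name,
        input: block.input,
        output: results.get(block.id),
      });
    }
  }
  return details;
}
```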
Structured Output
Force the model to respond with JSON matching a specific schema:
```typescript
const response = await ctx.ai.chat({
  messages: [{ role: "user", content: documentText }],
  system: "Extract structured data from the provided document.",
  feature: "mcp-extract-data",
  outputFormat: {
    type: "json_schema",
    schema: {
      type: "object",
      properties: {
        title: { type: "string" },
        date: { type: "string", format: "date" },
        entities: {
          type: "array",
          items: { type: "object", properties: { name: { type: "string" } } },
        },
      },
      required: ["title"],
    },
  },
});
```
The buildRequestParams() function maps this to the Anthropic API's output_config.format field.
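With a JSON schema output format, the structured result arrives as JSON text in the response content, so the caller still parses it. A minimal sketch, assuming the response exposes an Anthropic-style content array with the JSON in the first text block; ExtractedDocument and parseExtracted() are hypothetical names mirroring the schema above, and the runtime check is deliberately shallow (it only re-verifies the required title field).

```typescript
type TextBlock = { type: "text"; text: string };
type ContentBlock = TextBlock | { type: "thinking"; thinking: string };

// Hypothetical type mirroring the schema passed in outputFormat.
interface ExtractedDocument {
  title: string;
  date?: string;
  entities?: { name?: string }[];
}

// Parse the structured-output JSON from the first text block.
function parseExtracted(content: ContentBlock[]): ExtractedDocument {
  const textBlock = content.find((block): block is TextBlock => block.type === "text");
  if (!textBlock) throw new Error("No text block in response");
  const parsed: unknown = JSON.parse(textBlock.text);
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as { title?: unknown }).title !== "string"
  ) {
    throw new Error("Response does not match the expected schema");
  }
  return parsed as ExtractedDocument;
}
```

Thinking blocks (when thinking is enabled) are skipped because only text blocks carry the structured payload.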
Streaming Events
When using ctx.ai.stream(), the returned events async iterable yields typed ChatStreamEvent objects:
| Event Type | Fields | When |
|---|---|---|
| message_start | messageId | Stream begins |
| text | text | Text content delta |
| thinking | thinking | Thinking content delta |
| tool_use_start | toolName, toolId | Tool use block begins |
| input_json | partialJson | Incremental tool input JSON |
| content_block_stop | index | A content block finishes |
The first text or thinking event marks time-to-first-token (TTFT), which is recorded for latency tracking.
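One way to measure TTFT as described is to timestamp the first text or thinking event seen on the stream. The sketch below wraps any async iterable of events; StreamEvent is a simplified local stand-in for ChatStreamEvent, and measureTtft() is a hypothetical helper rather than the actual recording code. Note that timing here starts when iteration begins, not when the request is sent.

```typescript
// Simplified stand-in for ChatStreamEvent (a subset of the event types).
type StreamEvent =
  | { type: "message_start"; messageId: string }
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "content_block_stop"; index: number };

// Pass events through unchanged, calling onFirstToken exactly once with the
// milliseconds elapsed between the start of iteration and the first
// text or thinking delta.
async function* measureTtft(
  events: AsyncIterable<StreamEvent>,
  onFirstToken: (ms: number) => void,
  now: () => number = () => Date.now(),
): AsyncGenerator<StreamEvent> {
  const start = now();
  let recorded = false;
  for await (const event of events) {
    if (!recorded && (event.type === "text" || event.type === "thinking")) {
      recorded = true;
      onFirstToken(now() - start);
    }
    yield event;
  }
}
```

Injecting the clock through now keeps the helper testable without real delays.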
Consuming a Stream
```typescript
const { events, finalMessage } = ctx.ai.stream({
  messages,
  system: "You are a research assistant.",
  feature: "research-chat",
});

// Forward events to the client (e.g., via tRPC subscription or SSE).
// sendToClient is a placeholder for whatever transport you use.
for await (const event of events) {
  if (event.type === "text") {
    sendToClient(event.text);
  } else if (event.type === "thinking") {
    // Optionally show thinking to the user
  }
}

// IMPORTANT: await finalMessage for usage to be recorded
const message = await finalMessage;
```
Failing to await finalMessage means the usage record is never written. This is documented as a key gotcha in the Reasoning Overview.
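Because a consumer can stop early (for example, when forwarding to a disconnected client throws), a try/finally keeps the finalMessage await on every exit path. This is a sketch: consumeStream() is a hypothetical helper, and StreamResult only mirrors the { events, finalMessage } shape shown above. Whether finalMessage still settles after an early exit depends on how stream() handles cancellation, which this page does not specify.

```typescript
// Mirrors the { events, finalMessage } object returned by ctx.ai.stream().
interface StreamResult<E, M> {
  events: AsyncIterable<E>;
  finalMessage: Promise<M>;
}

// Forward events to a sink, then await finalMessage even if the sink throws
// part-way through, so the usage record is still written before any error
// propagates to the caller.
async function consumeStream<E, M>(
  result: StreamResult<E, M>,
  send: (event: E) => void,
): Promise<M> {
  try {
    for await (const event of result.events) {
      send(event);
    }
  } finally {
    // Runs on normal completion and when send() throws.
    await result.finalMessage;
  }
  return result.finalMessage;
}
```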
Contextual Retrieval Prompt
The generateChunkContext() function in contextual-retrieval.ts uses a specific prompt pattern for generating search context prefixes:
```typescript
const CONTEXT_PROMPT = `You are an expert at providing concise context for document chunks
used in search retrieval.
Given a full document and a specific chunk from that document, write a short (2-3 sentence)
context that situates the chunk within the overall document. Include key entities, topics,
and relationships that would help a search system match this chunk to relevant queries.
Respond with ONLY the context text — no labels, prefixes, or formatting.`;
```
Key design decisions for this call:
- Haiku model -- uses claude-haiku-4-5 for cost efficiency at high volume
- 200 max tokens -- context prefixes should be short; this caps generation
- Thinking disabled -- simple task, no benefit from extended reasoning
- Feature tag -- "hybrid-search-contextual-retrieval" for cost attribution
- Document truncation -- the full document is truncated to 150K characters (~37K tokens) to stay within Haiku's context window
- Correlation ID -- links this Haiku call to the subsequent Gemini embedding call
The user message wraps the document and chunk in XML tags:
```text
<document>{fullDocument}</document>
<chunk>{chunk}</chunk>
Provide a brief context for this chunk within the document.
```
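Putting the template and the truncation rule together, the user message can be assembled as below. This is a sketch: buildContextMessage() and MAX_DOCUMENT_CHARS are hypothetical names, and the 150,000-character limit is taken from the truncation note in the list above rather than from contextual-retrieval.ts itself.

```typescript
// Hypothetical constant: character budget for the document, per the
// truncation rule (150K characters, roughly 37K tokens at ~4 chars/token).
const MAX_DOCUMENT_CHARS = 150_000;

// Wrap the (possibly truncated) document and the chunk in XML tags,
// following the user message template.
function buildContextMessage(fullDocument: string, chunk: string): string {
  const doc =
    fullDocument.length > MAX_DOCUMENT_CHARS
      ? fullDocument.slice(0, MAX_DOCUMENT_CHARS)
      : fullDocument;
  return [
    `<document>${doc}</document>`,
    `<chunk>${chunk}</chunk>`,
    "Provide a brief context for this chunk within the document.",
  ].join("\n");
}
```

Truncating by characters is a cheap proxy for a token budget; it avoids a tokenizer call per chunk at the cost of some imprecision.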
Request Parameter Building
All ChatOptions are translated to Anthropic API parameters by buildRequestParams() in request.ts. This function is shared across chatCompletion(), streamChat(), and submitBatch().
Key transformations:
- Model defaults -- if model is omitted, uses claude-sonnet-4-6
- Token limits -- when maxTokens is omitted, streaming calls get MODEL_STREAM_MAX_TOKENS and non-streaming calls get MODEL_MAX_TOKENS
- Thinking -- added as { type: "adaptive" } unless explicitly disabled or the model is Haiku
- Effort -- mapped to output_config.effort on the API request
- Structured output -- mapped to output_config.format
- Cache control -- mapped to cache_control on the API request
- Container -- passed through for code execution sandbox reuse
- Beta headers -- sent as the anthropic-beta header, not in the request body
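The transformations in the list above can be sketched as one mapping function. Everything here is an assumption for illustration: the option subset, the token limits (placeholders, not the real per-model MODEL_MAX_TOKENS and MODEL_STREAM_MAX_TOKENS values), and the { body, headers } return shape. Only the API field names (thinking, output_config, cache_control, anthropic-beta) come from the list.

```typescript
type ModelId = "claude-sonnet-4-6" | "claude-haiku-4-5" | (string & {});

// Hypothetical subset of ChatOptions covered by this sketch.
interface ChatOptionsSubset {
  model?: ModelId;
  maxTokens?: number;
  thinking?: boolean;
  effort?: "low" | "medium" | "high" | "max";
  outputFormat?: { type: "json_schema"; schema: Record<string, unknown> };
  cacheControl?: boolean | { type: "ephemeral"; ttl?: "5m" | "1h" };
  betas?: string[];
}

// Placeholder limits for illustration only; the real values are per model.
const MAX_TOKENS = 8192;
const STREAM_MAX_TOKENS = 32000;

function buildParamsSketch(opts: ChatOptionsSubset, streaming: boolean) {
  const model = opts.model ?? "claude-sonnet-4-6";
  const body: Record<string, unknown> = {
    model,
    max_tokens: opts.maxTokens ?? (streaming ? STREAM_MAX_TOKENS : MAX_TOKENS),
  };

  // Thinking is added unless explicitly disabled or the model is Haiku.
  if (opts.thinking !== false && !model.startsWith("claude-haiku")) {
    body["thinking"] = { type: "adaptive" };
  }

  // Effort and structured output share output_config.
  const outputConfig: Record<string, unknown> = {};
  if (opts.effort) outputConfig["effort"] = opts.effort;
  if (opts.outputFormat) outputConfig["format"] = opts.outputFormat;
  if (Object.keys(outputConfig).length > 0) body["output_config"] = outputConfig;

  // true becomes the default ephemeral marker; an object passes through.
  if (opts.cacheControl === true) body["cache_control"] = { type: "ephemeral" };
  else if (opts.cacheControl) body["cache_control"] = opts.cacheControl;

  // Betas travel as a comma-separated header, never in the body.
  const headers: Record<string, string> = {};
  if (opts.betas && opts.betas.length > 0) headers["anthropic-beta"] = opts.betas.join(",");

  return { body, headers };
}
```

Keeping betas out of the body matters because unknown body fields are rejected by the API, while the header is the documented channel for opting into beta features.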
After building params, toStoredParams() in utils.ts extracts the subset of fields stored in ai_call_details for debugging. This includes the system prompt in its original format (string or content blocks).