Usage Tracking

How every AI call is recorded to the database -- the recording pipeline, cost calculation, failure recording, and the data model.

Every AI call -- successful or failed, chat or embedding -- writes a usage record to the database. This is automatic; callers do not need to opt in. The recording pipeline is the enforcement mechanism for cost visibility and debugging.

Recording Pipeline

Completion Recording (`recordCompletion`)

Called after every successful chatCompletion() or streamChat() call, and for each succeeded result in a batch.

The function performs these steps in a single database transaction:

1. Extract metadata from the response:
   - Thinking blocks (including redacted thinking) via extractThinkingBlocks()
   - Tool use details (custom tools, server tools, web search/fetch results, code execution results) via extractToolUseDetails()
   - Server tool usage counts (web search requests, web fetch requests) via extractServerToolUsage()
   - Container info (sandbox ID and expiry, if present)
2. Calculate estimated cost by looking up active pricing from ai_model_pricing:
   - Input token cost
   - Output token cost
   - Cache read token cost
   - Cache write token cost
   - Web search per-request cost (if applicable)
3. Write the ai_usage row with token counts, cost estimate, latency, feature, model, and stop reason
4. Write the ai_call_details row with the system prompt, full request params, full response content, thinking blocks, and tool use details

Both inserts happen in the same transaction -- either both succeed or neither does.
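The all-or-nothing behaviour can be illustrated with a minimal sketch. It uses a synchronous in-memory stand-in rather than the real database client, and every name in it (FakeDb, Tx, recordCompletionSketch, the trimmed row shapes) is hypothetical and chosen only for illustration; the real recordCompletion is presumably asynchronous and writes far more columns.

```typescript
// Hypothetical, trimmed-down row shapes; the real tables have many more columns.
interface UsageRow {
  id: string;
  feature: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
}

interface CallDetailsRow {
  usageId: string;
  request: unknown;
  response: unknown;
}

interface Tx {
  usage: UsageRow[];
  details: CallDetailsRow[];
}

// In-memory stand-in for a database. Rows are staged on `tx` and copied into
// the committed arrays only if the callback returns without throwing.
class FakeDb {
  readonly usage: UsageRow[] = [];
  readonly details: CallDetailsRow[] = [];

  transaction(fn: (tx: Tx) => void): void {
    const tx: Tx = { usage: [], details: [] };
    fn(tx); // a throw here skips the commit below
    this.usage.push(...tx.usage);
    this.details.push(...tx.details);
  }
}

function recordCompletionSketch(
  db: FakeDb,
  usage: UsageRow,
  buildDetails: () => Omit<CallDetailsRow, "usageId">,
): void {
  db.transaction((tx) => {
    tx.usage.push(usage);
    // If building the details row fails, the staged usage row is discarded too.
    tx.details.push({ usageId: usage.id, ...buildDetails() });
  });
}
```

With a real client the same shape would apply: both inserts go through the transaction handle, and any error inside the callback rolls back the usage row as well.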

Failure Recording (`recordFailure`)

Called when chatCompletion() or streamChat() throws an error. It:

- Writes an ai_usage row with 0 tokens, stopReason: "error", and the measured latency
- Writes an ai_call_details row with the full request params (so you can replay the failed call) and an empty response with the error code as the stop reason
- Logs the failure at error level with the feature, model, error code, and message

Failure recording itself is wrapped in a .catch() so that a recording failure does not mask the original AI error thrown to the caller.
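A minimal sketch of that guard, assuming a wrapper around the AI call. The wrapper name, its parameters, and the injectable logger are hypothetical and not taken from the codebase; only the .catch() pattern itself is described above.

```typescript
// Hypothetical wrapper: `call` is the AI request and `recordFailure` persists
// the failure rows. Neither the names nor the signature come from the real code.
async function withFailureRecording<T>(
  call: () => Promise<T>,
  recordFailure: (error: unknown) => Promise<void>,
  log: (message: string, error: unknown) => void = console.error,
): Promise<T> {
  try {
    return await call();
  } catch (error) {
    // Log and swallow any error from the recorder so it cannot replace `error`.
    await recordFailure(error).catch((recordingError: unknown) => {
      log("failed to record AI failure", recordingError);
    });
    throw error; // the caller always sees the original AI error
  }
}
```

Even if the database is unavailable while the failure is being recorded, the promise still rejects with the error from the AI call itself.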

Embedding Recording (`recordEmbedding`)

Called after every embedContent(), embedDocument(), or embedQuery() call. Writes a single ai_usage row with:

- Estimated input tokens, calculated as ceil(totalChars / 4) because Gemini does not always return token counts
- Embedding count (number of vectors generated)
- stopReason: "end_turn" (embeddings always complete)

No ai_call_details row is written; embedding requests are simple enough that the usage row suffices.
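The token estimate is simple enough to show directly. The function name below is hypothetical, and the sketch assumes totalChars is the summed string length of all inputs (UTF-16 code units in JavaScript), which the source does not spell out.

```typescript
// Rough token estimate: about one token per four characters, rounded up.
// Assumption: `totalChars` is the sum of `string.length` over all inputs.
function estimateEmbeddingInputTokens(texts: string[]): number {
  const totalChars = texts.reduce((sum, text) => sum + text.length, 0);
  return Math.ceil(totalChars / 4);
}
```

For example, two inputs of 5 and 6 characters total 11 characters, which rounds up to 3 estimated tokens.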

Data Model

Usage data is stored across two tenant-scoped tables. See Data & Storage -- Schema Design for the full schema definitions and Reference Data for the global model registry.

`ai_usage` -- Per-Call Metrics

Column	Type	Purpose
`id`	`text` (UUID)	Primary key
`organizationId`	`text`	Tenant scope (RLS)
`userId`	`text`	Who made the call
`feature`	`text`	Cost attribution tag (e.g., `"research-chat"`)
`messageId`	`text`	Anthropic message ID (null for embeddings/failures)
`batchId`	`text`	Links to `ai_batch` for batch-processed calls
`model`	`text`	Model used (e.g., `"claude-sonnet-4-6"`)
`inputTokens`	`integer`	Input tokens consumed
`outputTokens`	`integer`	Output tokens generated
`cacheReadTokens`	`integer`	Tokens served from prompt cache
`cacheWriteTokens`	`integer`	Tokens written to prompt cache
`stopReason`	`enum`	`end_turn`, `max_tokens`, `stop_sequence`, `tool_use`, `pause_turn`, `refusal`, `error`
`stopSequence`	`text`	The stop sequence matched (if applicable)
`latencyMs`	`integer`	Total call duration
`ttftMs`	`integer`	Time to first token (streaming only)
`streaming`	`boolean`	Whether this was a streaming call
`estimatedCost`	`numeric`	Dollar cost estimate from model pricing
`embeddingCount`	`integer`	Number of embeddings generated (embed calls only)
`webSearchRequests`	`integer`	Server-side web search invocations
`webFetchRequests`	`integer`	Server-side web fetch invocations
`containerId`	`text`	Code execution container ID
`containerExpiresAt`	`timestamp`	Container expiry time
`correlationId`	`text`	Links related calls in multi-step workflows

`ai_call_details` -- Full Request/Response

Column	Type	Purpose
`id`	`text` (UUID)	Primary key
`usageId`	`text`	FK to `ai_usage`
`organizationId`	`text`	Tenant scope (RLS)
`systemPrompt`	`text` or `jsonb`	System prompt in its original format
`request`	`jsonb`	Full request params (messages, tools, config)
`response`	`jsonb`	Full response (content blocks, stop reason)
`thinkingBlocks`	`jsonb`	Array of `{ thinking, tokens?, redacted? }`
`toolUseDetails`	`jsonb`	Array of `{ toolName, toolId, input, output, type }`
`cacheControlEnabled`	`boolean`	Whether prompt caching was active

Cost Calculation

Estimated cost is calculated by calculateEstimatedCost() in recording.ts. The function:

1. Queries all ai_model_pricing rows for the model
2. Filters to rows whose effectiveFrom/effectiveUntil window contains the current date
3. Calculates cost per pricing type (token prices are per million tokens; the web search price is per 1,000 requests):

   input cost    = (inputTokens / 1,000,000) * inputPrice
   output cost   = (outputTokens / 1,000,000) * outputPrice
   cache read    = (cacheReadTokens / 1,000,000) * cacheReadPrice
   cache write   = (cacheWriteTokens / 1,000,000) * cacheWritePrice
   web search    = (webSearchRequests / 1,000) * webSearchPrice

4. Returns the sum, or null if no active pricing is found

Embedding cost calculation (calculateEmbeddingCost) follows the same pattern but only uses input token pricing.
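The calculation can be sketched as a pure function over already-loaded pricing rows. The row shape here is an assumption: it puts every price on one row and treats a null effectiveUntil as still active, whereas the real ai_model_pricing table may store one row per pricing type. The function and interface names are hypothetical.

```typescript
// Assumed shape: all prices on a single row; the real table may differ.
interface PricingRow {
  inputPrice: number; // dollars per million tokens
  outputPrice: number;
  cacheReadPrice: number;
  cacheWritePrice: number;
  webSearchPrice: number; // dollars per 1,000 requests
  effectiveFrom: Date;
  effectiveUntil: Date | null; // assumption: null means "still active"
}

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
  webSearchRequests: number;
}

function estimateCostSketch(
  usage: TokenUsage,
  rows: PricingRow[],
  now: Date = new Date(),
): number | null {
  const t = now.getTime();
  // Keep only pricing whose effective window contains `now`.
  const active = rows.find(
    (row) =>
      row.effectiveFrom.getTime() <= t &&
      (row.effectiveUntil === null || t <= row.effectiveUntil.getTime()),
  );
  if (active === undefined) return null; // no active pricing for this model

  const perMillion = (count: number, price: number): number =>
    (count / 1_000_000) * price;

  return (
    perMillion(usage.inputTokens, active.inputPrice) +
    perMillion(usage.outputTokens, active.outputPrice) +
    perMillion(usage.cacheReadTokens, active.cacheReadPrice) +
    perMillion(usage.cacheWriteTokens, active.cacheWritePrice) +
    (usage.webSearchRequests / 1_000) * active.webSearchPrice
  );
}
```

Returning null rather than 0 when pricing is missing keeps "unknown cost" distinguishable from "free" in later aggregation.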

The estimated cost is stored as a numeric string in the database. It is an estimate because:

- Gemini token counts are approximated (~1 token per 4 chars)
- Pricing may not be seeded for new models
- Cache pricing depends on whether the cache hit was from a previous call in the same window

Content Extraction

The extraction.ts module provides helpers that parse Anthropic response content blocks into structured data for storage:

`extractThinkingBlocks(response)`

Returns an array of thinking entries. Handles both normal thinking blocks and redacted thinking (which Anthropic returns when safety filtering triggers):

[
  { thinking: "Let me analyze this step by step...", tokens: 1234 },
  { thinking: "[redacted by Anthropic]", redacted: true },
];
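A minimal sketch of how such an extractor might look. The block shapes are simplified local types modelled on Anthropic's thinking and redacted_thinking content blocks, and the function name is hypothetical. The optional tokens field is left out because the source does not say where that count comes from.

```typescript
// Simplified local types; the real SDK types carry more fields.
type ContentBlock =
  | { type: "thinking"; thinking: string }
  | { type: "redacted_thinking"; data: string }
  | { type: "text"; text: string };

interface ThinkingEntry {
  thinking: string;
  tokens?: number;
  redacted?: boolean;
}

function extractThinkingBlocksSketch(response: {
  content: ContentBlock[];
}): ThinkingEntry[] {
  const entries: ThinkingEntry[] = [];
  for (const block of response.content) {
    if (block.type === "thinking") {
      entries.push({ thinking: block.thinking });
    } else if (block.type === "redacted_thinking") {
      // The encrypted payload is not human-readable, so store a placeholder.
      entries.push({ thinking: "[redacted by Anthropic]", redacted: true });
    }
    // Other block types (text, tool use, ...) are ignored here.
  }
  return entries;
}
```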

`extractToolUseDetails(response)`

Returns an array of tool use details for all tool-related content blocks:

[
  { toolName: "record_entity", toolId: "toolu_01...", input: {...}, output: null, type: "user_tool" },
  { toolName: "web_search", toolId: "srvtoolu_01...", input: null, output: {...}, type: "web_search_result" },
]

Supported block types: tool_use, server_tool_use, web_search_tool_result, web_fetch_tool_result, code_execution_tool_result, bash_code_execution_tool_result, text_editor_code_execution_tool_result, tool_search_tool_result.
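As a rough illustration, the sketch below handles only two of the supported block types and maps them to the record shape shown above. The local block types are simplified, the function name is hypothetical, and the mapping of tool_use to type "user_tool" is inferred from the sample output rather than confirmed by the source.

```typescript
// Simplified local types covering two of the supported block types.
type ToolBlock =
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "web_search_tool_result"; tool_use_id: string; content: unknown }
  | { type: "text"; text: string };

interface ToolUseDetail {
  toolName: string;
  toolId: string;
  input: unknown;
  output: unknown;
  type: string;
}

function extractToolUseDetailsSketch(response: {
  content: ToolBlock[];
}): ToolUseDetail[] {
  const details: ToolUseDetail[] = [];
  for (const block of response.content) {
    switch (block.type) {
      case "tool_use":
        // A custom tool call: input is known, output arrives in a later turn.
        details.push({
          toolName: block.name,
          toolId: block.id,
          input: block.input,
          output: null,
          type: "user_tool",
        });
        break;
      case "web_search_tool_result":
        // A server tool result: only the output is present on this block.
        details.push({
          toolName: "web_search",
          toolId: block.tool_use_id,
          input: null,
          output: block.content,
          type: "web_search_result",
        });
        break;
      default:
        break; // non-tool blocks are skipped
    }
  }
  return details;
}
```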

`extractServerToolUsage(response)`

Returns counts of server-side tool invocations from the usage field:

{ webSearchRequests: 3, webFetchRequests: 1 }
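A sketch of the lookup, assuming the counts live under usage.server_tool_use with snake_case keys as in the Anthropic API response. The field names, the function name, and the choice to default missing counts to 0 are assumptions for illustration.

```typescript
// Assumed response shape; only the fields used here are modelled.
interface ResponseUsage {
  server_tool_use?: {
    web_search_requests?: number;
    web_fetch_requests?: number;
  } | null;
}

interface ServerToolUsage {
  webSearchRequests: number;
  webFetchRequests: number;
}

function extractServerToolUsageSketch(response: {
  usage: ResponseUsage;
}): ServerToolUsage {
  const serverToolUse = response.usage.server_tool_use;
  return {
    // Assumption: a missing counter means the tool was never invoked.
    webSearchRequests: serverToolUse?.web_search_requests ?? 0,
    webFetchRequests: serverToolUse?.web_fetch_requests ?? 0,
  };
}
```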

Stop Reason Mapping

The Anthropic API returns various stop reasons. mapStopReason() normalizes them to the enum values stored in ai_usage.stopReason:

API Value	Stored Value	Meaning
`end_turn`	`end_turn`	Model finished naturally
`max_tokens`	`max_tokens`	Hit token limit
`stop_sequence`	`stop_sequence`	Hit a stop sequence
`tool_use`	`tool_use`	Model wants to use a tool
`pause_turn`	`pause_turn`	Long-running turn paused (e.g., during server tool use); can be continued
`refusal`	`refusal`	Model refused the request
`null` or unknown	`error`	Unexpected stop
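The table amounts to a pass-through for known values with a fallback to "error". A minimal sketch follows; the function and constant names are hypothetical and the real mapStopReason() may be written differently.

```typescript
// Values that are stored unchanged.
const KNOWN_STOP_REASONS = [
  "end_turn",
  "max_tokens",
  "stop_sequence",
  "tool_use",
  "pause_turn",
  "refusal",
] as const;

type StoredStopReason = (typeof KNOWN_STOP_REASONS)[number] | "error";

function mapStopReasonSketch(
  apiValue: string | null | undefined,
): StoredStopReason {
  // Known values pass through; null, undefined, and anything else become "error".
  const known = KNOWN_STOP_REASONS.find((reason) => reason === apiValue);
  return known ?? "error";
}
```

Falling back to "error" means a stop reason added to the API later is still recorded, although it will be indistinguishable from a real failure until the enum is extended.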

Correlation IDs

The correlationId field links related AI calls across a multi-step workflow. The primary use case today is contextual retrieval, where a single document chunk generates two linked calls:

1. Claude Haiku generates the context prefix (feature: "hybrid-search-contextual-retrieval")
2. Gemini embeds the context + chunk (feature: "hybrid-search-index")

Both calls share the same correlationId (a UUID), enabling cost analysis queries like "what is the total cost per document chunk, including both context generation and embedding?"
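That kind of analysis can be sketched in memory over rows already loaded from ai_usage. The function name and the trimmed row type are hypothetical. estimatedCost is treated as a numeric string, as described under Cost Calculation, and rows without a correlation ID or a cost are skipped.

```typescript
// Trimmed view of an ai_usage row; only the columns needed here.
interface UsageCostRow {
  correlationId: string | null;
  feature: string;
  estimatedCost: string | null; // numeric string, null when pricing was missing
}

function totalCostByCorrelation(rows: UsageCostRow[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    if (row.correlationId === null || row.estimatedCost === null) continue;
    const previous = totals.get(row.correlationId) ?? 0;
    totals.set(row.correlationId, previous + Number(row.estimatedCost));
  }
  return totals;
}
```

In production the same grouping would more likely be a SUM(...) GROUP BY on correlationId in the database; the in-memory version only shows which rows combine.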

Observability

All usage data is accessible through:

- AI Logs Admin View -- dashboard with KPI cards, filterable call log, and detail drill-down
- Structured logs -- every recording writes an info-level log with key metrics (model, feature, tokens, cost, latency)
- Database queries -- ai_usage and ai_call_details are standard tenant-scoped tables queryable via Drizzle
