Usage Tracking
How every AI call is recorded to the database -- the recording pipeline, cost calculation, failure recording, and the data model.
Every AI call -- successful or failed, chat or embedding -- writes a usage record to the database. This is automatic; callers do not need to opt in. The recording pipeline is the enforcement mechanism for cost visibility and debugging.
Recording Pipeline
Completion Recording (recordCompletion)
Called after every successful chatCompletion() or streamChat() call, and once for each successful result in a batch.
The function performs these steps in a single database transaction:
- Extract metadata from the response:
  - Thinking blocks (including redacted thinking) via extractThinkingBlocks()
  - Tool use details (custom tools, server tools, web search/fetch results, code execution results) via extractToolUseDetails()
  - Server tool usage counts (web search requests, web fetch requests) via extractServerToolUsage()
  - Container info (sandbox ID and expiry, if present)
- Calculate estimated cost by looking up active pricing from ai_model_pricing:
  - Input token cost
  - Output token cost
  - Cache read token cost
  - Cache write token cost
  - Web search per-request cost (if applicable)
- Write ai_usage row with token counts, cost estimate, latency, feature, model, and stop reason
- Write ai_call_details row with the system prompt, full request params, full response content, thinking blocks, and tool use details
Both inserts happen in the same transaction -- either both succeed or neither does.
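The all-or-nothing behavior can be illustrated with a toy in-memory store. This is a minimal sketch, not the actual recording code: InMemoryStore, RecordingTx, recordCompletionSketch, and the trimmed row shapes are all hypothetical, and the real implementation relies on a database transaction rather than staged arrays.

```typescript
// Hypothetical, trimmed row shapes; the real tables have many more columns.
interface UsageInsert { id: string; model: string; inputTokens: number; outputTokens: number }
interface DetailsInsert { usageId: string; request: unknown; response: unknown }

interface RecordingTx {
  insertUsage(row: UsageInsert): void;
  insertCallDetails(row: DetailsInsert): void;
}

// Toy store: writes are staged and only committed if the callback returns normally.
class InMemoryStore {
  readonly usage: UsageInsert[] = [];
  readonly details: DetailsInsert[] = [];

  transaction(work: (tx: RecordingTx) => void): void {
    const stagedUsage: UsageInsert[] = [];
    const stagedDetails: DetailsInsert[] = [];
    work({
      insertUsage: (row) => { stagedUsage.push(row); },
      insertCallDetails: (row) => { stagedDetails.push(row); },
    });
    // Reached only when work() did not throw, so both tables change together.
    this.usage.push(...stagedUsage);
    this.details.push(...stagedDetails);
  }
}

function recordCompletionSketch(
  store: InMemoryStore,
  usage: UsageInsert,
  details: Omit<DetailsInsert, "usageId">,
): void {
  store.transaction((tx) => {
    tx.insertUsage(usage);
    tx.insertCallDetails({ ...details, usageId: usage.id });
  });
}
```

If the second insert throws, the staged usage row is discarded as well, which mirrors the guarantee described above.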
Failure Recording (recordFailure)
Called when chatCompletion() or streamChat() throws an error. It:
- Writes ai_usage with 0 tokens, stopReason: "error", and the measured latency
- Writes ai_call_details with the full request params (so you can replay the failed call) and an empty response with the error code as the stop reason
- Logs the failure at error level with the feature, model, error code, and message
Failure recording itself is wrapped in a .catch() so that a recording failure does not mask the original AI error thrown to the caller.
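The pattern of swallowing recording errors while rethrowing the original one might look like the sketch below. The wrapper name callWithFailureRecording, the FailureInfo shape, and the use of the error's name as the error code are assumptions for illustration; only the .catch() behavior comes from the description above.

```typescript
// Hypothetical shape; the real recordFailure() writes to the database.
type FailureInfo = { feature: string; model: string; latencyMs: number; errorCode: string };

async function callWithFailureRecording<T>(
  call: () => Promise<T>,
  recordFailure: (info: FailureInfo) => Promise<void>,
  meta: { feature: string; model: string },
): Promise<T> {
  const startedAt = Date.now();
  try {
    return await call();
  } catch (err) {
    // Assumption: the error name stands in for the error code.
    const errorCode = err instanceof Error ? err.name : "unknown";
    // Swallow recording errors so the caller still sees the original AI error.
    await recordFailure({ ...meta, latencyMs: Date.now() - startedAt, errorCode }).catch(
      (recordingErr) => console.error("failed to record AI failure", recordingErr),
    );
    throw err;
  }
}
```

Even when the recorder itself rejects (for example, because the database is unavailable), the caller receives the error thrown by the AI call, not the recording error.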
Embedding Recording (recordEmbedding)
Called after every embedContent(), embedDocument(), or embedQuery() call. Writes a single ai_usage row with:
- Estimated input tokens (calculated as ceil(totalChars / 4) since Gemini does not always return token counts)
- Embedding count (number of vectors generated)
- stopReason: "end_turn" (embeddings always complete)
- No ai_call_details row (embedding requests are simple enough that the usage row suffices)
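
The token estimate is simple arithmetic. A minimal sketch, assuming the character count is summed across every text in the call and measured with JavaScript's string length (UTF-16 code units); the function name estimateEmbeddingTokens is hypothetical:

```typescript
// Approximates input tokens as ceil(totalChars / 4) across all texts in one embed call.
function estimateEmbeddingTokens(texts: string[]): number {
  const totalChars = texts.reduce((sum, text) => sum + text.length, 0);
  return Math.ceil(totalChars / 4);
}
```

For example, two texts of 5 and 6 characters total 11 characters, which rounds up to 3 estimated tokens.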
Data Model
Usage data is stored across two tenant-scoped tables. See Data & Storage -- Schema Design for the full schema definitions and Reference Data for the global model registry.
ai_usage -- Per-Call Metrics
| Column | Type | Purpose |
|---|---|---|
| id | text (UUID) | Primary key |
| organizationId | text | Tenant scope (RLS) |
| userId | text | Who made the call |
| feature | text | Cost attribution tag (e.g., "research-chat") |
| messageId | text | Anthropic message ID (null for embeddings/failures) |
| batchId | text | Links to ai_batch for batch-processed calls |
| model | text | Model used (e.g., "claude-sonnet-4-6") |
| inputTokens | integer | Input tokens consumed |
| outputTokens | integer | Output tokens generated |
| cacheReadTokens | integer | Tokens served from prompt cache |
| cacheWriteTokens | integer | Tokens written to prompt cache |
| stopReason | enum | end_turn, max_tokens, stop_sequence, tool_use, pause_turn, refusal, error |
| stopSequence | text | The stop sequence matched (if applicable) |
| latencyMs | integer | Total call duration |
| ttftMs | integer | Time to first token (streaming only) |
| streaming | boolean | Whether this was a streaming call |
| estimatedCost | numeric | Dollar cost estimate from model pricing |
| embeddingCount | integer | Number of embeddings generated (embed calls only) |
| webSearchRequests | integer | Server-side web search invocations |
| webFetchRequests | integer | Server-side web fetch invocations |
| containerId | text | Code execution container ID |
| containerExpiresAt | timestamp | Container expiry time |
| correlationId | text | Links related calls in multi-step workflows |
ai_call_details -- Full Request/Response
| Column | Type | Purpose |
|---|---|---|
| id | text (UUID) | Primary key |
| usageId | text | FK to ai_usage |
| organizationId | text | Tenant scope (RLS) |
| systemPrompt | text or jsonb | System prompt in its original format |
| request | jsonb | Full request params (messages, tools, config) |
| response | jsonb | Full response (content blocks, stop reason) |
| thinkingBlocks | jsonb | Array of { thinking, tokens?, redacted? } |
| toolUseDetails | jsonb | Array of { toolName, toolId, input, output, type } |
| cacheControlEnabled | boolean | Whether prompt caching was active |
Cost Calculation
Estimated cost is calculated by calculateEstimatedCost() in recording.ts. The function:
- Queries all ai_model_pricing rows for the model
- Filters to rows where the current date falls within effectiveFrom and effectiveUntil
- Calculates cost per pricing type:
    input cost  = (inputTokens / 1,000,000) * inputPrice
    output cost = (outputTokens / 1,000,000) * outputPrice
    cache read  = (cacheReadTokens / 1,000,000) * cacheReadPrice
    cache write = (cacheWriteTokens / 1,000,000) * cacheWritePrice
    web search  = (webSearchRequests / 1,000) * webSearchPrice
- Returns the sum, or null if no active pricing is found
Embedding cost calculation (calculateEmbeddingCost) follows the same pattern but only uses input token pricing.
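The calculation steps listed above can be sketched as a pure function over pricing rows that have already been loaded. The PricingRow shape (one row per pricing type, with a nullable effectiveUntil meaning "still active") and the name estimateCost are assumptions for illustration; the actual ai_model_pricing schema and calculateEstimatedCost() signature may differ.

```typescript
type PricingType = "input" | "output" | "cache_read" | "cache_write" | "web_search";

// Assumed row shape: one row per pricing type and effective window.
interface PricingRow {
  pricingType: PricingType;
  price: number; // dollars per 1,000,000 tokens, or per 1,000 web searches
  effectiveFrom: Date;
  effectiveUntil: Date | null; // assumption: null means the price is still active
}

interface UsageCounts {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
  webSearchRequests: number;
}

function estimateCost(rows: PricingRow[], usage: UsageCounts, now: Date = new Date()): number | null {
  const t = now.getTime();
  // Keep only rows whose effective window contains "now".
  const active = rows.filter(
    (row) =>
      row.effectiveFrom.getTime() <= t &&
      (row.effectiveUntil === null || t <= row.effectiveUntil.getTime()),
  );
  if (active.length === 0) return null;

  // Billable quantity per pricing type, in the units the price is quoted in.
  const quantity: Record<PricingType, number> = {
    input: usage.inputTokens / 1_000_000,
    output: usage.outputTokens / 1_000_000,
    cache_read: usage.cacheReadTokens / 1_000_000,
    cache_write: usage.cacheWriteTokens / 1_000_000,
    web_search: usage.webSearchRequests / 1_000,
  };
  return active.reduce((sum, row) => sum + quantity[row.pricingType] * row.price, 0);
}
```

An embedding variant under the same assumptions would pass only inputTokens and leave the other counts at zero. Note that this sketch uses floating-point numbers, whereas the stored value is a numeric string.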
The estimated cost is stored as a numeric string in the database. It is an estimate because:
- Gemini token counts are approximated (~1 token per 4 chars)
- Pricing may not be seeded for new models
- Cache pricing depends on whether the cache hit was from a previous call in the same window
Content Extraction
The extraction.ts module provides helpers that parse Anthropic response content blocks into structured data for storage:
extractThinkingBlocks(response)
Returns an array of thinking entries. Handles both normal thinking blocks and redacted thinking (which Anthropic returns when safety filtering triggers):
[
{ thinking: "Let me analyze this step by step...", tokens: 1234 },
{ thinking: "[redacted by Anthropic]", redacted: true },
];
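A minimal sketch of how such an extraction could work, assuming the function receives the response's content array and that thinking arrives as blocks of type "thinking" or "redacted_thinking". The name extractThinkingSketch and the local ContentBlock type are hypothetical, and the sketch does not populate the optional tokens field because it makes no assumption about where that count comes from.

```typescript
// Local stand-in for the subset of content blocks this sketch cares about.
type ContentBlock =
  | { type: "thinking"; thinking: string; signature: string }
  | { type: "redacted_thinking"; data: string }
  | { type: "text"; text: string };

interface ThinkingEntry {
  thinking: string;
  tokens?: number;
  redacted?: boolean;
}

function extractThinkingSketch(content: ContentBlock[]): ThinkingEntry[] {
  const entries: ThinkingEntry[] = [];
  for (const block of content) {
    if (block.type === "thinking") {
      entries.push({ thinking: block.thinking });
    } else if (block.type === "redacted_thinking") {
      // The encrypted payload is not stored as text; a placeholder marks the entry.
      entries.push({ thinking: "[redacted by Anthropic]", redacted: true });
    }
  }
  return entries;
}
```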
extractToolUseDetails(response)
Returns an array of tool use details for all tool-related content blocks:
[
{ toolName: "record_entity", toolId: "toolu_01...", input: {...}, output: null, type: "user_tool" },
{ toolName: "web_search", toolId: "srvtoolu_01...", input: null, output: {...}, type: "web_search_result" },
]
Supported block types: tool_use, server_tool_use, web_search_tool_result, web_fetch_tool_result, code_execution_tool_result, bash_code_execution_tool_result, text_editor_code_execution_tool_result, tool_search_tool_result.
extractServerToolUsage(response)
Returns counts of server-side tool invocations from the usage field:
{ webSearchRequests: 3, webFetchRequests: 1 }
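As a rough sketch, the counts could be read from the response's usage object like this. The field names server_tool_use, web_search_requests, and web_fetch_requests are assumed here, as is the choice to default missing counts to 0; serverToolUsageSketch is a hypothetical name.

```typescript
// Assumed subset of the usage object; every field is optional in this sketch.
interface UsageLike {
  server_tool_use?: { web_search_requests?: number; web_fetch_requests?: number } | null;
}

function serverToolUsageSketch(usage: UsageLike): { webSearchRequests: number; webFetchRequests: number } {
  return {
    // Assumption: a missing block or missing counter is recorded as 0.
    webSearchRequests: usage.server_tool_use?.web_search_requests ?? 0,
    webFetchRequests: usage.server_tool_use?.web_fetch_requests ?? 0,
  };
}
```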
Stop Reason Mapping
The Anthropic API returns various stop reasons. mapStopReason() normalizes them to the enum values stored in ai_usage.stopReason:
| API Value | Stored Value | Meaning |
|---|---|---|
| end_turn | end_turn | Model finished naturally |
| max_tokens | max_tokens | Hit token limit |
| stop_sequence | stop_sequence | Hit a stop sequence |
| tool_use | tool_use | Model wants to use a tool |
| pause_turn | pause_turn | Multi-turn pause |
| refusal | refusal | Model refused the request |
| null or unknown | error | Unexpected stop |
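
The table amounts to an allow-list with a fallback. A minimal sketch of that mapping, with the hypothetical name mapStopReasonSketch (the real mapStopReason() may be structured differently):

```typescript
const STORED_STOP_REASONS = [
  "end_turn",
  "max_tokens",
  "stop_sequence",
  "tool_use",
  "pause_turn",
  "refusal",
  "error",
] as const;

type StoredStopReason = (typeof STORED_STOP_REASONS)[number];

function mapStopReasonSketch(apiValue: string | null | undefined): StoredStopReason {
  // Known values pass through unchanged; null and anything unrecognized become "error".
  const known = STORED_STOP_REASONS.find((reason) => reason === apiValue);
  return known ?? "error";
}
```

Falling back to "error" keeps the stored enum closed even if the API introduces a stop reason this code does not know about.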
Correlation IDs
The correlationId field links related AI calls across a multi-step workflow. The primary use case today is contextual retrieval, where a single document chunk generates two linked calls:
- Claude Haiku generates the context prefix (feature: "hybrid-search-contextual-retrieval")
- Gemini embeds the context + chunk (feature: "hybrid-search-index")
Both calls share the same correlationId (a UUID), enabling cost analysis queries like "what is the total cost per document chunk, including both context generation and embedding?"
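That kind of analysis reduces to grouping usage rows by correlationId and summing their estimated costs. The sketch below does this in memory over rows that have already been fetched; the trimmed CorrelatedUsageRow shape and the name costByCorrelationId are hypothetical, and a real query would more likely aggregate in SQL.

```typescript
// Trimmed, hypothetical view of an ai_usage row.
interface CorrelatedUsageRow {
  correlationId: string | null;
  feature: string;
  estimatedCost: string | null; // numeric values are stored as strings
}

function costByCorrelationId(rows: CorrelatedUsageRow[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    // Skip uncorrelated calls and calls with no pricing available.
    if (row.correlationId === null || row.estimatedCost === null) continue;
    const previous = totals.get(row.correlationId) ?? 0;
    totals.set(row.correlationId, previous + Number(row.estimatedCost));
  }
  return totals;
}
```

Each map entry then holds the combined cost of the context-generation call and the embedding call for one chunk.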
Observability
All usage data is accessible through:
- AI Logs Admin View -- dashboard with KPI cards, filterable call log, and detail drill-down
- Structured logs -- every recording writes an info-level log with key metrics (model, feature, tokens, cost, latency)
- Database queries -- ai_usage and ai_call_details are standard tenant-scoped tables queryable via Drizzle