Trovella Wiki

Usage Tracking

How every AI call is recorded to the database -- the recording pipeline, cost calculation, failure recording, and the data model.

Every AI call -- successful or failed, chat or embedding -- writes a usage record to the database. This is automatic; callers do not need to opt in. The recording pipeline is the enforcement mechanism for cost visibility and debugging.

Recording Pipeline

Completion Recording (recordCompletion)

Called after every successful chatCompletion() or streamChat() call, and for each succeeded result in a batch.

The function performs these steps, with the two writes in steps 3 and 4 sharing a single database transaction:

  1. Extract metadata from the response:

    • Thinking blocks (including redacted thinking) via extractThinkingBlocks()
    • Tool use details (custom tools, server tools, web search/fetch results, code execution results) via extractToolUseDetails()
    • Server tool usage counts (web search requests, web fetch requests) via extractServerToolUsage()
    • Container info (sandbox ID and expiry, if present)
  2. Calculate estimated cost by looking up active pricing from ai_model_pricing:

    • Input token cost
    • Output token cost
    • Cache read token cost
    • Cache write token cost
    • Web search per-request cost (if applicable)
  3. Write ai_usage row with token counts, cost estimate, latency, feature, model, and stop reason

  4. Write ai_call_details row with the system prompt, full request params, full response content, thinking blocks, and tool use details

Both inserts happen in the same transaction -- either both succeed or neither does.
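
The both-or-neither behavior can be illustrated with a small in-memory stand-in for the database. This is only a sketch: InMemoryDb, its transaction() method, the row shapes, and recordCompletionSketch() are hypothetical names invented for the example, and the real pipeline relies on an actual database transaction rather than staged array copies.

```typescript
// Hypothetical in-memory stand-in for the database; not the real recording code.
type UsageRow = { id: string; model: string; inputTokens: number; outputTokens: number };
type DetailsRow = { usageId: string; request: unknown; response: unknown };
type Tx = { usage: UsageRow[]; details: DetailsRow[] };

class InMemoryDb {
  usage: UsageRow[] = [];
  details: DetailsRow[] = [];

  // Runs fn against staged copies and commits only if fn does not throw.
  transaction(fn: (tx: Tx) => void): void {
    const tx: Tx = { usage: [...this.usage], details: [...this.details] };
    fn(tx); // a throw here leaves this.usage and this.details untouched
    this.usage = tx.usage;
    this.details = tx.details;
  }
}

function recordCompletionSketch(
  db: InMemoryDb,
  usage: UsageRow,
  details: Omit<DetailsRow, "usageId">,
  failDetailsInsert = false, // test hook that simulates a failing second insert
): void {
  db.transaction((tx) => {
    tx.usage.push(usage);
    if (failDetailsInsert) {
      throw new Error("simulated ai_call_details insert failure");
    }
    tx.details.push({ usageId: usage.id, ...details });
  });
}
```

If the second insert throws, the staged usage row is discarded along with it, so no orphaned ai_usage row is left behind.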

Failure Recording (recordFailure)

Called when chatCompletion() or streamChat() throws an error. The failure recording:

  • Writes ai_usage with 0 tokens, stopReason: "error", and the measured latency
  • Writes ai_call_details with the full request params (so you can replay the failed call) and an empty response with the error code as the stop reason
  • Logs the failure at error level with the feature, model, error code, and message

Failure recording itself is wrapped in a .catch() so that a recording failure does not mask the original AI error thrown to the caller.
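
A minimal sketch of that pattern follows. The wrapper name callWithFailureRecording(), the FailureInfo shape, and the simplified recordFailure signature are assumptions made for illustration; the real functions take richer arguments.

```typescript
// Hypothetical, simplified signatures; not the real chatCompletion()/recordFailure() API.
type FailureInfo = { feature: string; model: string; latencyMs: number; message: string };

async function callWithFailureRecording<T>(
  call: () => Promise<T>,
  recordFailure: (info: FailureInfo) => Promise<void>,
  meta: { feature: string; model: string },
): Promise<T> {
  const startedAt = Date.now();
  try {
    return await call();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Swallow recording errors so the caller always sees the original AI error.
    await recordFailure({ ...meta, latencyMs: Date.now() - startedAt, message }).catch(
      (recordingErr: unknown) => {
        console.error("failure recording failed", recordingErr);
      },
    );
    throw err; // rethrow the original error, never the recording error
  }
}
```

Even when the recording promise rejects (for example, because the database is unavailable), the caller still receives the original error object.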

Embedding Recording (recordEmbedding)

Called after every embedContent(), embedDocument(), or embedQuery() call. Writes a single ai_usage row with:

  • Estimated input tokens (calculated as ceil(totalChars / 4) since Gemini does not always return token counts)
  • Embedding count (number of vectors generated)
  • stopReason: "end_turn" (embeddings always complete)

No ai_call_details row is written for embeddings; the requests are simple enough that the usage row suffices.
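
The token estimate is simple enough to sketch directly. The function name estimateEmbeddingTokens() is hypothetical, and counting characters with String.length (UTF-16 code units) is an assumption about how totalChars is measured.

```typescript
// Approximates token usage at roughly one token per four characters.
function estimateEmbeddingTokens(texts: string[]): number {
  // Assumption: totalChars is the summed length of every input string.
  const totalChars = texts.reduce((sum, text) => sum + text.length, 0);
  return Math.ceil(totalChars / 4);
}
```

For example, estimateEmbeddingTokens(["hello", "world!"]) covers 11 characters and therefore returns 3.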

Data Model

Usage data is stored across two tenant-scoped tables. See Data & Storage -- Schema Design for the full schema definitions and Reference Data for the global model registry.

ai_usage -- Per-Call Metrics

| Column | Type | Purpose |
| --- | --- | --- |
| id | text (UUID) | Primary key |
| organizationId | text | Tenant scope (RLS) |
| userId | text | Who made the call |
| feature | text | Cost attribution tag (e.g., "research-chat") |
| messageId | text | Anthropic message ID (null for embeddings/failures) |
| batchId | text | Links to ai_batch for batch-processed calls |
| model | text | Model used (e.g., "claude-sonnet-4-6") |
| inputTokens | integer | Input tokens consumed |
| outputTokens | integer | Output tokens generated |
| cacheReadTokens | integer | Tokens served from prompt cache |
| cacheWriteTokens | integer | Tokens written to prompt cache |
| stopReason | enum | end_turn, max_tokens, stop_sequence, tool_use, pause_turn, refusal, error |
| stopSequence | text | The stop sequence matched (if applicable) |
| latencyMs | integer | Total call duration |
| ttftMs | integer | Time to first token (streaming only) |
| streaming | boolean | Whether this was a streaming call |
| estimatedCost | numeric | Dollar cost estimate from model pricing |
| embeddingCount | integer | Number of embeddings generated (embed calls only) |
| webSearchRequests | integer | Server-side web search invocations |
| webFetchRequests | integer | Server-side web fetch invocations |
| containerId | text | Code execution container ID |
| containerExpiresAt | timestamp | Container expiry time |
| correlationId | text | Links related calls in multi-step workflows |

ai_call_details -- Full Request/Response

| Column | Type | Purpose |
| --- | --- | --- |
| id | text (UUID) | Primary key |
| usageId | text | FK to ai_usage |
| organizationId | text | Tenant scope (RLS) |
| systemPrompt | text or jsonb | System prompt in its original format |
| request | jsonb | Full request params (messages, tools, config) |
| response | jsonb | Full response (content blocks, stop reason) |
| thinkingBlocks | jsonb | Array of { thinking, tokens?, redacted? } |
| toolUseDetails | jsonb | Array of { toolName, toolId, input, output, type } |
| cacheControlEnabled | boolean | Whether prompt caching was active |

Cost Calculation

Estimated cost is calculated by calculateEstimatedCost() in recording.ts. The function:

  1. Queries all ai_model_pricing rows for the model
  2. Filters to rows where the current date falls within effectiveFrom and effectiveUntil
  3. Calculates cost per pricing type:

       input cost    = (inputTokens / 1,000,000) * inputPrice
       output cost   = (outputTokens / 1,000,000) * outputPrice
       cache read    = (cacheReadTokens / 1,000,000) * cacheReadPrice
       cache write   = (cacheWriteTokens / 1,000,000) * cacheWritePrice
       web search    = (webSearchRequests / 1,000) * webSearchPrice

  4. Returns the sum, or null if no active pricing is found

Embedding cost calculation (calculateEmbeddingCost) follows the same pattern but only uses input token pricing.
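
The arithmetic above can be sketched as a pure function over pricing rows that have already been loaded. The PricingRow shape (one row per pricing type with a single price column), the treatment of a null effectiveUntil as "still active", and the name calculateEstimatedCostSketch() are assumptions for illustration; the real ai_model_pricing columns may differ.

```typescript
// Hypothetical row shape; the real ai_model_pricing schema may differ.
type PricingType = "input" | "output" | "cache_read" | "cache_write" | "web_search";

interface PricingRow {
  pricingType: PricingType;
  price: number; // dollars per 1,000,000 tokens, or per 1,000 web search requests
  effectiveFrom: Date;
  effectiveUntil: Date | null; // assumption: null means the price is still active
}

interface CallUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
  webSearchRequests: number;
}

function calculateEstimatedCostSketch(
  rows: PricingRow[],
  usage: CallUsage,
  now: Date = new Date(),
): number | null {
  const t = now.getTime();
  // Keep only rows whose effective window contains the current time.
  const active = rows.filter(
    (row) =>
      row.effectiveFrom.getTime() <= t &&
      (row.effectiveUntil === null || t <= row.effectiveUntil.getTime()),
  );
  if (active.length === 0) return null; // no active pricing found

  // A pricing type without an active row contributes nothing to the sum.
  const priceOf = (type: PricingType): number =>
    active.find((row) => row.pricingType === type)?.price ?? 0;

  return (
    (usage.inputTokens / 1_000_000) * priceOf("input") +
    (usage.outputTokens / 1_000_000) * priceOf("output") +
    (usage.cacheReadTokens / 1_000_000) * priceOf("cache_read") +
    (usage.cacheWriteTokens / 1_000_000) * priceOf("cache_write") +
    (usage.webSearchRequests / 1_000) * priceOf("web_search")
  );
}
```

With illustrative prices of 3 (input), 15 (output), and 10 (web search), a call with 1,000,000 input tokens, 2,000,000 output tokens, and 500 web searches comes to 3 + 30 + 5 = 38.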

The estimated cost is stored as a numeric string in the database. It is an estimate because:

  • Gemini token counts are approximated (~1 token per 4 chars)
  • Pricing may not be seeded for new models
  • Cache pricing depends on whether the cache hit was from a previous call in the same window

Content Extraction

The extraction.ts module provides helpers that parse Anthropic response content blocks into structured data for storage:

extractThinkingBlocks(response)

Returns an array of thinking entries. Handles both normal thinking blocks and redacted thinking (which Anthropic returns when safety filtering triggers):

[
  { thinking: "Let me analyze this step by step...", tokens: 1234 },
  { thinking: "[redacted by Anthropic]", redacted: true },
];
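
A rough sketch of how such a helper could walk the content blocks is shown here. The ContentBlock union models only the fields needed for the example (the block type names thinking and redacted_thinking follow the Anthropic Messages API), the function name is hypothetical, and the tokens field is left out because the source does not say how it is derived.

```typescript
// Minimal block shapes for illustration; real responses carry more fields.
type ContentBlock =
  | { type: "thinking"; thinking: string }
  | { type: "redacted_thinking"; data: string }
  | { type: "text"; text: string };

type ThinkingEntry = { thinking: string; tokens?: number; redacted?: boolean };

function extractThinkingBlocksSketch(content: ContentBlock[]): ThinkingEntry[] {
  const entries: ThinkingEntry[] = [];
  for (const block of content) {
    if (block.type === "thinking") {
      entries.push({ thinking: block.thinking });
    } else if (block.type === "redacted_thinking") {
      // The redacted payload is opaque, so only a placeholder is stored.
      entries.push({ thinking: "[redacted by Anthropic]", redacted: true });
    }
    // Text and other block types are not thinking and are skipped.
  }
  return entries;
}
```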

extractToolUseDetails(response)

Returns an array of tool use details for all tool-related content blocks:

[
  { toolName: "record_entity", toolId: "toolu_01...", input: {...}, output: null, type: "user_tool" },
  { toolName: "web_search", toolId: "srvtoolu_01...", input: null, output: {...}, type: "web_search_result" },
]

Supported block types: tool_use, server_tool_use, web_search_tool_result, web_fetch_tool_result, code_execution_tool_result, bash_code_execution_tool_result, text_editor_code_execution_tool_result, tool_search_tool_result.
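
As an illustration, the mapping for the two block types shown in the example output might look like the following. This sketch covers only tool_use and web_search_tool_result; the block shapes are trimmed-down assumptions, the function name is hypothetical, and the type labels for the remaining block types are not documented here, so they are omitted.

```typescript
// Trimmed-down block shapes; only two of the supported block types are modeled.
type ToolBlock =
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "web_search_tool_result"; tool_use_id: string; content: unknown }
  | { type: "text"; text: string };

type ToolUseDetail = {
  toolName: string;
  toolId: string;
  input: unknown;
  output: unknown;
  type: string;
};

function extractToolUseDetailsSketch(content: ToolBlock[]): ToolUseDetail[] {
  const details: ToolUseDetail[] = [];
  for (const block of content) {
    switch (block.type) {
      case "tool_use":
        // Custom tool call: the input is known, the output arrives in a later turn.
        details.push({ toolName: block.name, toolId: block.id, input: block.input, output: null, type: "user_tool" });
        break;
      case "web_search_tool_result":
        // Server tool result: the output is the result content, there is no input on this block.
        details.push({ toolName: "web_search", toolId: block.tool_use_id, input: null, output: block.content, type: "web_search_result" });
        break;
      default:
        break; // text blocks carry no tool details
    }
  }
  return details;
}
```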

extractServerToolUsage(response)

Returns counts of server-side tool invocations from the usage field:

{ webSearchRequests: 3, webFetchRequests: 1 }
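
A minimal sketch of this helper, assuming the counters live under usage.server_tool_use as web_search_requests and web_fetch_requests, and assuming missing counters default to 0 (the source does not state the default). The function and interface names are hypothetical.

```typescript
// Assumed shape of the usage field; only the server-tool counters are modeled.
interface UsageField {
  server_tool_use?: {
    web_search_requests?: number;
    web_fetch_requests?: number;
  } | null;
}

function extractServerToolUsageSketch(usage: UsageField): {
  webSearchRequests: number;
  webFetchRequests: number;
} {
  return {
    // Assumption: an absent counter is recorded as 0 rather than null.
    webSearchRequests: usage.server_tool_use?.web_search_requests ?? 0,
    webFetchRequests: usage.server_tool_use?.web_fetch_requests ?? 0,
  };
}
```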

Stop Reason Mapping

The Anthropic API returns various stop reasons. mapStopReason() normalizes them to the enum values stored in ai_usage.stopReason:

| API Value | Stored Value | Meaning |
| --- | --- | --- |
| end_turn | end_turn | Model finished naturally |
| max_tokens | max_tokens | Hit token limit |
| stop_sequence | stop_sequence | Hit a stop sequence |
| tool_use | tool_use | Model wants to use a tool |
| pause_turn | pause_turn | Multi-turn pause |
| refusal | refusal | Model refused the request |
| null or unknown | error | Unexpected stop |
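
The mapping is a pass-through for known values with "error" as the fallback. A minimal sketch (the name mapStopReasonSketch() and the constant are hypothetical, not the actual implementation):

```typescript
// Values that are stored unchanged; anything else becomes "error".
const STORED_STOP_REASONS = [
  "end_turn",
  "max_tokens",
  "stop_sequence",
  "tool_use",
  "pause_turn",
  "refusal",
] as const;

type StoredStopReason = (typeof STORED_STOP_REASONS)[number] | "error";

function mapStopReasonSketch(apiValue: string | null | undefined): StoredStopReason {
  const known = STORED_STOP_REASONS.find((reason) => reason === apiValue);
  return known ?? "error"; // null, undefined, and unrecognized values
}
```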

Correlation IDs

The correlationId field links related AI calls across a multi-step workflow. The primary use case today is contextual retrieval, where a single document chunk generates two linked calls:

  1. Claude Haiku generates the context prefix (feature: "hybrid-search-contextual-retrieval")
  2. Gemini embeds the context + chunk (feature: "hybrid-search-index")

Both calls share the same correlationId (a UUID), enabling cost analysis queries like "what is the total cost per document chunk, including both context generation and embedding?"
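
That kind of analysis amounts to grouping usage rows by correlationId and summing estimatedCost. The sketch below does this in memory over rows that have already been fetched; the UsageCostRow shape and the function name are hypothetical, and in practice the same grouping would more likely be done in SQL.

```typescript
// Subset of ai_usage columns needed for the aggregation.
interface UsageCostRow {
  correlationId: string | null;
  feature: string;
  estimatedCost: string | null; // numeric columns are stored as strings
}

function totalCostByCorrelationId(rows: UsageCostRow[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    // Skip uncorrelated calls and calls without a cost estimate.
    if (row.correlationId === null || row.estimatedCost === null) continue;
    const previous = totals.get(row.correlationId) ?? 0;
    totals.set(row.correlationId, previous + Number(row.estimatedCost));
  }
  return totals;
}
```

For a chunk whose context generation and embedding calls share one correlationId, the map entry for that ID is the combined estimated cost of both calls.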

Observability

All usage data is accessible through:

  • AI Logs Admin View -- dashboard with KPI cards, filterable call log, and detail drill-down
  • Structured logs -- every recording writes an info-level log with key metrics (model, feature, tokens, cost, latency)
  • Database queries -- ai_usage and ai_call_details are standard tenant-scoped tables queryable via Drizzle
