ADR-010: MCP-First Architecture + Plan Engine

Decision record for choosing an MCP-first research architecture with a custom PostgreSQL-backed plan engine instead of server-side LLM orchestration or an off-the-shelf workflow framework.

Status: Accepted
Date: 2026-03-21 (product spec, TRO-53 maturity analysis); implemented 2026-03-28 (TRO-18 MCP server) and 2026-04-01 (TRO-66 plan engine)
Deciders: Kyle Olson (Solo Founder)

Decision

Use an MCP-first architecture where Trovella acts as the project manager and external AI platforms (Claude Code, ChatGPT) do all research reasoning. The research workflow runs on a PostgreSQL-backed plan engine: a state machine with 6 plan states, 6 step states, and pure-function transition logic, rather than a server-side LLM orchestration framework.

Key choices:

  • Transport: Streamable HTTP via Next.js route (/api/mcp), stateless per-request
  • Auth: PAT (trov_ prefix, SHA-256 hashed) at MVP; OAuth 2.1 deferred
  • Plan engine: Pure functions in packages/mcp/src/plan-engine/, no framework
  • Tools: 18 MCP tools across 5 categories
  • Execution model: Pull-based -- AI platform calls get_next_step, does the work, reports back
  • Audit: Two immutable logs -- plan_audit_log (business events) + mcp_tool_call_log (every tool call)
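The PAT choice above (trov_ prefix, SHA-256 hashed) can be sketched as follows. This is a minimal illustration, not the code in the repository: the helper names (issueToken, hashToken, verifyToken), the token length, and the hex encoding are all assumptions. The only details taken from this record are the prefix and the hash algorithm.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Prefix taken from this ADR; everything else below is an assumption.
const PAT_PREFIX = "trov_";

// Hash a raw token. Only this digest would be stored, never the raw token.
function hashToken(rawToken: string): string {
  return createHash("sha256").update(rawToken, "utf8").digest("hex");
}

// Hypothetical issuance helper: prefix plus 32 random bytes, hex-encoded.
function issueToken(): { rawToken: string; tokenHash: string } {
  const rawToken = PAT_PREFIX + randomBytes(32).toString("hex");
  return { rawToken, tokenHash: hashToken(rawToken) };
}

// Verify a presented token against a stored digest in constant time.
function verifyToken(presented: string, storedHash: string): boolean {
  if (!presented.startsWith(PAT_PREFIX)) return false;
  const a = Buffer.from(hashToken(presented), "hex");
  const b = Buffer.from(storedHash, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Because each request carries the token and the server stores only the digest, a check like this fits the stateless per-request transport: no session has to be kept between calls.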

Context

Trovella's core product is AI-enhanced research. The fundamental question was: where does the LLM reasoning happen?

AI companies (Anthropic, OpenAI, Google) subsidize consumer subscriptions with VC funding -- users get 10-12x more compute through their $20/month subscriptions than they could purchase via API. Server-side API calls cost $3-15/M tokens; Claude Code reasoning is free to Trovella. This is not a cost optimization -- it is the founding product hypothesis.

The founder explored a traditional web chat UI approach in a separate POC project before Trovella. That experience made the cost problem visceral: replicating research workflows via direct API calls would cost ~$2,500/month per power user vs $0 with MCP-first.
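As a rough sanity check on these figures, the snippet below works out the monthly token volume that ~$2,500 would buy at the quoted $3-15/M prices. It treats the quoted range as a blended per-token price, which is a simplifying assumption; the record does not state the actual token volumes behind the estimate.

```typescript
// Figures quoted in this ADR.
const monthlyApiCostUsd = 2500; // ~$2,500/month per power user
const pricePerMillionTokensUsd = { low: 3, high: 15 }; // $3-15/M tokens

// Millions of tokens that a given spend buys at a given price per million.
function impliedMillionTokens(costUsd: number, pricePerMillion: number): number {
  return costUsd / pricePerMillion;
}

const atHighPrice = impliedMillionTokens(monthlyApiCostUsd, pricePerMillionTokensUsd.high);
const atLowPrice = impliedMillionTokens(monthlyApiCostUsd, pricePerMillionTokensUsd.low);

// prints "167M to 833M tokens/month"
console.log(`${atHighPrice.toFixed(0)}M to ${atLowPrice.toFixed(0)}M tokens/month`);
```

Under that assumption, the estimate corresponds to a power user consuming somewhere between roughly 167 million and 833 million tokens per month.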

The MCP server is not a feature of the product -- it is the primary delivery mechanism. The web UI handles onboarding, preferences, and admin dashboards, but day-to-day value delivery happens inside Claude Code and ChatGPT via MCP.

Alternatives Considered

Traditional Web Chat UI with Server-Side LLM Calls

Offers full control over UX and a simpler mental model. Rejected because Trovella bears the full cost of LLM API calls, at $3-15 per million tokens across every user session. At power user levels, that is ~$2,500/month per user vs $0 with MCP-first. Users also must learn a new interface instead of using their preferred AI tool.

LangGraph / Push-Based Orchestration Framework

Mature graph-based workflow orchestration with built-in state management. Rejected because push-based models assume the server calls agents and pays for API tokens -- the inverse of Trovella's model where the AI platform pulls work from the server.

XState for State Machine

Formal state machine with visual editor, guards, actions, and parallel states. Deferred, not rejected. The current pure-function implementation is sufficient for ~10 transitions and more transparent to AI code generation. XState is the planned upgrade path when complexity demands it (10+ step types or 5+ branching conditions).

Standalone MCP Server (apps/mcp-server on Cloud Run)

Offers independent scaling and an independent deployment lifecycle. Rejected because hosting the MCP server as a Next.js route matches the existing monorepo pattern (@repo/api -> /api/trpc, @repo/mcp -> /api/mcp), and a standalone service is premature for MVP. It can be extracted later if scaling demands it.

Implementation

Three-Layer Architecture

  • Skills (the instructor): Markdown files in .claude/skills/research/ -- not code, not server-side
  • AI Platform (the researcher): Does all reasoning, web searches, synthesis -- zero server LLM calls for the research phase
  • MCP Tools (the project manager): Stores plans, provides instructions, validates results, enforces structure

Pull-Based Execution

The AI platform calls get_next_step; the server returns instructions; the AI platform does the reasoning and calls submit_step_result; the server validates the result and advances the state machine. The server never initiates work.
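The loop can be sketched with an in-memory stand-in for the server. This is an illustration only: the Step and StepResult shapes, createFakePlanServer, and runPullLoop are hypothetical, and the real tools are invoked over MCP rather than as local functions. Only the tool names get_next_step and submit_step_result come from this record.

```typescript
// Hypothetical payload shapes; the real tool schemas are not shown in this ADR.
type Step = { id: string; instructions: string };
type StepResult = { stepId: string; output: string };

interface PlanServer {
  getNextStep(): Step | null; // stands in for get_next_step
  submitStepResult(result: StepResult): void; // stands in for submit_step_result
  accepted(): readonly StepResult[];
}

// In-memory stand-in for the server: it only ever responds to calls.
function createFakePlanServer(steps: readonly Step[]): PlanServer {
  let cursor = 0;
  const results: StepResult[] = [];
  return {
    getNextStep: () => steps[cursor] ?? null,
    submitStepResult(result) {
      const expected = steps[cursor];
      if (expected === undefined || expected.id !== result.stepId) {
        throw new Error(`unexpected result for step ${result.stepId}`);
      }
      if (result.output.trim() === "") throw new Error("empty result rejected");
      results.push(result); // validated, so record it and advance
      cursor += 1;
    },
    accepted: () => results,
  };
}

// The client (the AI platform) drives the loop until no step is left.
function runPullLoop(server: PlanServer, reason: (step: Step) => string): number {
  let completed = 0;
  for (let step = server.getNextStep(); step !== null; step = server.getNextStep()) {
    server.submitStepResult({ stepId: step.id, output: reason(step) });
    completed += 1;
  }
  return completed;
}

const demoServer = createFakePlanServer([
  { id: "s1", instructions: "collect sources" },
  { id: "s2", instructions: "synthesize findings" },
]);
const completedSteps = runPullLoop(demoServer, (step) => `notes for: ${step.instructions}`);
console.log(completedSteps); // prints 2
```

All control flow sits on the client side. If the client disconnects between calls, the server keeps its cursor and the next get_next_step call resumes from the same step.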

Plan Engine as Pure Functions

Four modules in packages/mcp/src/plan-engine/:

  • plan-transitions.ts -- valid transitions, transitionPlan/Step, derivePlanStatus
  • branching.ts -- safe expression evaluator (comparison operators only, no eval())
  • stall-detection.ts -- identifies steps stuck in in_progress beyond a threshold
  • index.ts -- public API re-exports
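A pure-function transition module in this style might look like the sketch below. The state names and the transition map are hypothetical: this record names only in_progress, and the real maps live in the State Machines page. transitionStep is a reading of "transitionPlan/Step", and the derivePlanStatus rules shown here are assumed for illustration.

```typescript
// Hypothetical step states; only "in_progress" is named in this ADR.
type StepStatus =
  | "pending" | "in_progress" | "awaiting_review"
  | "completed" | "failed" | "skipped";

// Allowed moves kept as plain data so the map can be inspected and tested.
const STEP_TRANSITIONS: Record<StepStatus, readonly StepStatus[]> = {
  pending: ["in_progress", "skipped"],
  in_progress: ["awaiting_review", "completed", "failed"],
  awaiting_review: ["completed", "failed"],
  completed: [],
  failed: ["pending"],
  skipped: [],
};

type TransitionResult =
  | { ok: true; status: StepStatus }
  | { ok: false; error: string };

// Pure: no I/O and no mutation; the caller persists the returned status.
function transitionStep(from: StepStatus, to: StepStatus): TransitionResult {
  return STEP_TRANSITIONS[from].includes(to)
    ? { ok: true, status: to }
    : { ok: false, error: `invalid step transition: ${from} -> ${to}` };
}

// Hypothetical plan-level statuses derived from the step statuses.
type DerivedPlanStatus = "not_started" | "active" | "failed" | "completed";

function derivePlanStatus(steps: readonly StepStatus[]): DerivedPlanStatus {
  if (steps.some((s) => s === "failed")) return "failed";
  if (steps.length > 0 && steps.every((s) => s === "completed" || s === "skipped")) {
    return "completed";
  }
  if (steps.every((s) => s === "pending")) return "not_started";
  return "active";
}
```

Returning a result object instead of throwing keeps the function total, so a tool handler can turn an invalid transition into a structured error for the AI platform.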

See State Machines for the full transition maps, Branching for expression evaluation, and Stall Detection for threshold behavior.
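To make the branching and stall-detection ideas concrete, here is a minimal sketch of each. The expression grammar (one `field op literal` comparison), the function names, and the row shape are assumptions; the real behavior is described in the Branching and Stall Detection pages. What this record does state is that branching uses comparison operators only with no eval(), and that stalls are steps in in_progress beyond a threshold.

```typescript
type Scalar = string | number | boolean;

// One comparison: identifier, operator, literal. Anything else is rejected.
const CONDITION = /^\s*([A-Za-z_]\w*)\s*(==|!=|>=|<=|>|<)\s*(\S.*?)\s*$/;

function parseLiteral(raw: string): Scalar {
  if (raw === "true") return true;
  if (raw === "false") return false;
  const quoted = /^(["'])(.*)\1$/.exec(raw);
  if (quoted) return quoted[2] ?? "";
  const n = Number(raw);
  if (!Number.isNaN(n)) return n;
  throw new Error(`unsupported literal: ${raw}`);
}

// Evaluates by pattern match and lookup only; nothing is executed as code.
function evaluateCondition(expr: string, context: Record<string, Scalar>): boolean {
  const m = CONDITION.exec(expr);
  if (!m) throw new Error(`unsupported expression: ${expr}`);
  const field = m[1] ?? "";
  const op = m[2] ?? "";
  const right = parseLiteral(m[3] ?? "");
  if (!Object.prototype.hasOwnProperty.call(context, field)) {
    throw new Error(`unknown field: ${field}`);
  }
  const left = context[field];
  if (op === "==") return left === right;
  if (op === "!=") return left !== right;
  if (typeof left !== "number" || typeof right !== "number") {
    throw new Error(`ordering comparison needs numbers: ${expr}`);
  }
  switch (op) {
    case ">": return left > right;
    case "<": return left < right;
    case ">=": return left >= right;
    case "<=": return left <= right;
    default: throw new Error(`unsupported operator: ${op}`);
  }
}

// Hypothetical row shape for stall detection.
type StepRow = { id: string; status: string; startedAt: Date | null };

// Steps that have been in_progress for longer than thresholdMs as of `now`.
function findStalledSteps(steps: readonly StepRow[], now: Date, thresholdMs: number): StepRow[] {
  return steps.filter(
    (s) =>
      s.status === "in_progress" &&
      s.startedAt !== null &&
      now.getTime() - s.startedAt.getTime() > thresholdMs,
  );
}
```

Passing `now` as a parameter instead of reading the clock inside the function keeps stall detection pure and easy to test.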

18 MCP Tools

| Category | Tools |
| --- | --- |
| Plan Management (5) | create_research_plan, get_next_step, get_plan_status, modify_plan, list_active_plans |
| Step Execution (4) | submit_step_result, get_step_context, request_user_review, submit_user_decision |
| Knowledge Storage (3) | store_research, search_sources, store_research_output |
| Skill Tracking (3) | log_skill_execution, submit_subagent_report, submit_research_feedback |
| Data Extraction + Utility (3) | extract_data, get_research_context, ping |

Two-Tier Audit Logging

  • plan_audit_log -- 8 business event types, written inside the transaction via writeAuditLog()
  • mcp_tool_call_log -- every tool call except ping, fire-and-forget via withToolCallLogging() middleware
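The second tier can be pictured as a wrapper around each tool handler. The sketch below is an assumption about shape only: the real withToolCallLogging() signature, the log entry fields, and the writer are not shown in this record. It illustrates the two stated behaviors, skipping ping and writing the log without awaiting it.

```typescript
// Hypothetical types; the real middleware signature is not given in this ADR.
type ToolHandler<A, R> = (args: A) => Promise<R>;
type ToolCallLogEntry = { tool: string; ok: boolean; durationMs: number };
type LogWriter = (entry: ToolCallLogEntry) => Promise<void>;

function withToolCallLogging<A, R>(
  tool: string,
  handler: ToolHandler<A, R>,
  writeLog: LogWriter,
): ToolHandler<A, R> {
  // ping is excluded from the tool call log, so it is returned unwrapped.
  if (tool === "ping") return handler;

  return async (args: A): Promise<R> => {
    const started = Date.now();
    let ok = false;
    try {
      const result = await handler(args);
      ok = true;
      return result;
    } finally {
      // Fire-and-forget: the write is not awaited and its failure is swallowed,
      // so logging can neither delay nor break the tool response.
      void writeLog({ tool, ok, durationMs: Date.now() - started }).catch(() => {});
    }
  };
}
```

The contrast with the first tier is deliberate. A business event written inside the transaction commits or rolls back with the state change, while a tool call log entry is best-effort and may be lost if the write fails.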

Consequences

Positive

  • $0 LLM cost for primary use case -- all reasoning happens in the user's subscription
  • Zero adoption friction -- users stay in their preferred AI platform
  • Cross-session durability -- PostgreSQL state survives disconnects, stalls, partial failures
  • Human-in-the-loop -- request_user_review / submit_user_decision enable checkpoints
  • Platform-agnostic -- MCP protocol works with Claude Code, ChatGPT, Gemini

Negative

  • MCP protocol maturity -- emerging standard, protocol changes could require tool rewrites
  • No web UI research path at MVP -- users without MCP support cannot use research features
  • Economics depend on AI providers continuing to subsidize consumer subscriptions
  • Three-layer architecture has cognitive overhead for developers and AI agents
  • Pull-based testing requires an actual MCP client driving the loop

Risks

  • MCP protocol adoption stalls -- mitigated by adoption from Anthropic, OpenAI, Google as of early 2026
  • Plan engine complexity growth -- mitigated by XState upgrade path at explicit trigger
  • Growing server-side LLM surface -- mitigated by @repo/ai usage tracking and the principle that server-side calls handle targeted operations, not reasoning

References

  • Architecture: Research Engine -- full Mermaid diagram of the plan flow
  • packages/mcp/CLAUDE.md -- package conventions
  • Linear: TRO-18 (MCP server), TRO-66 (Plan engine), TRO-53 (Research maturity), TRO-67 (8-phase workflow), TRO-101 (Tool call logging)
  • Related: ADR-002 (Auth/PATs), ADR-006 (Inngest), ADR-007 (AI wrapper), ADR-009 (Search)
