ADR-010: MCP-First Architecture + Plan Engine

Decision record for choosing an MCP-first research architecture with a custom PostgreSQL-backed plan engine over server-side LLM orchestration or off-the-shelf workflow frameworks.

Status: Accepted
Date: 2026-03-21 (product spec, TRO-53 maturity analysis); implemented 2026-03-28 (TRO-18 MCP server) and 2026-04-01 (TRO-66 plan engine)
Deciders: Kyle Olson (Solo Founder)

Decision

Use an MCP-first architecture where Trovella acts as the project manager and external AI platforms (Claude Code, ChatGPT) do all reasoning. The research workflow uses a PostgreSQL-backed plan engine state machine with 6 plan states, 6 step states, and pure-function transition logic -- not a server-side LLM orchestration framework.

Key choices:

Transport: Streamable HTTP via Next.js route (/api/mcp), stateless per-request
Auth: PAT (trov_ prefix, SHA-256 hashed) at MVP; OAuth 2.1 deferred
Plan engine: Pure functions in packages/mcp/src/plan-engine/, no framework
Tools: 18 MCP tools across 5 categories
Execution model: Pull-based -- AI platform calls get_next_step, does the work, reports back
Audit: Two immutable logs -- plan_audit_log (business events) + mcp_tool_call_log (every tool call)
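
As an illustration of the PAT choice above, the following is a minimal sketch of issuing and verifying a SHA-256 hashed token with the trov_ prefix. The function names (generatePat, hashPat, verifyPat), the 32-byte token length, and the hex encoding are assumptions made for the example, not the actual implementation.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// The ADR only states that tokens carry a "trov_" prefix and are SHA-256 hashed;
// everything else here (names, token length, hex encoding) is hypothetical.
const PAT_PREFIX = "trov_";

/** SHA-256 hex digest of the raw token. Only this value would be persisted. */
function hashPat(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

/** Generate a new token; the raw token is shown to the user once, the hash is stored. */
function generatePat(): { token: string; hash: string } {
  const token = PAT_PREFIX + randomBytes(32).toString("hex");
  return { token, hash: hashPat(token) };
}

/** Compare a presented token against a stored hash in constant time. */
function verifyPat(presented: string, storedHash: string): boolean {
  if (!presented.startsWith(PAT_PREFIX)) return false;
  const a = Buffer.from(hashPat(presented), "hex");
  const b = Buffer.from(storedHash, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Because the transport is stateless per request, a check along these lines would presumably run on every call to /api/mcp rather than once per session.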

Context

Trovella's core product is AI-enhanced research. The fundamental question was: where does the LLM reasoning happen?

AI companies (Anthropic, OpenAI, Google) subsidize consumer subscriptions with VC funding -- users get 10-12x more compute through a $20/month subscription than the same spend would buy via API. Server-side API calls cost $3-15/M tokens; reasoning done inside the user's Claude Code subscription costs Trovella nothing. This is not a cost optimization -- it is the founding product hypothesis.

The founder explored a traditional web chat UI approach in a separate POC project before Trovella. That experience made the cost problem visceral: replicating research workflows via direct API calls would cost ~$2,500/month per power user vs $0 with MCP-first.

The MCP server is not a feature of the product -- it is the primary delivery mechanism. The web UI handles onboarding, preferences, and admin dashboards, but day-to-day value delivery happens inside Claude Code and ChatGPT via MCP.

Alternatives Considered

Traditional Web Chat UI with Server-Side LLM Calls

Full control over UX, simpler mental model. Rejected because Trovella would bear the full cost of LLM API calls ($3-15/M tokens) for every user session. At power user levels, that is ~$2,500/month per user vs $0 with MCP-first. Users would also have to learn a new interface instead of using their preferred AI tool.

LangGraph / Push-Based Orchestration Framework

Mature graph-based workflow orchestration with built-in state management. Rejected because push-based models assume the server calls agents and pays for API tokens -- the inverse of Trovella's model where the AI platform pulls work from the server.

XState for State Machine

Formal state machine with visual editor, guards, actions, and parallel states. Deferred, not rejected. The current pure-function implementation is sufficient for ~10 transitions and more transparent to AI code generation. XState is the planned upgrade path when complexity demands it (10+ step types or 5+ branching conditions).

Standalone MCP Server (apps/mcp-server on Cloud Run)

Independent scaling and deployment lifecycle. Rejected because hosting the MCP server inside the Next.js app matches the existing monorepo pattern (@repo/api -> /api/trpc, @repo/mcp -> /api/mcp), and a separate service is premature for MVP. It can be extracted later if scaling demands it.

Implementation

Three-Layer Architecture

Skills (the instructor): Markdown files in .claude/skills/research/ -- not code, not server-side
AI Platform (the researcher): Does all reasoning, web searches, synthesis -- zero server LLM calls for the research phase
MCP Tools (the project manager): Stores plans, provides instructions, validates results, enforces structure

Pull-Based Execution

The AI platform calls get_next_step and the server returns instructions. The AI platform does the reasoning, then calls submit_step_result; the server validates the result and advances the state machine. The server never initiates work.
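
The loop described above can be sketched with an in-memory stand-in for the server. This is an illustration under stated assumptions: the Step shape, the status names other than in_progress, and the InMemoryPlanServer and runClientLoop names are hypothetical, and the real tools are invoked over MCP rather than as method calls.

```typescript
// Hypothetical payload shape: the ADR names the tools but not their payloads.
type Step = {
  id: string;
  instructions: string;
  status: "pending" | "in_progress" | "completed";
};

class InMemoryPlanServer {
  private steps: Step[];

  constructor(steps: Step[]) {
    this.steps = steps.map((s) => ({ ...s }));
  }

  // Stands in for get_next_step: hand out the first pending step, or null when none remain.
  getNextStep(): Step | null {
    const next = this.steps.find((s) => s.status === "pending");
    if (!next) return null;
    next.status = "in_progress";
    return { ...next };
  }

  // Stands in for submit_step_result: validate the result, then advance the step.
  submitStepResult(stepId: string, result: string): void {
    const step = this.steps.find((s) => s.id === stepId);
    if (!step || step.status !== "in_progress") {
      throw new Error(`step ${stepId} is not in progress`);
    }
    if (result.trim() === "") throw new Error("empty result rejected");
    step.status = "completed";
  }

  allCompleted(): boolean {
    return this.steps.every((s) => s.status === "completed");
  }
}

// The client (the AI platform) drives the loop; the server only answers calls.
function runClientLoop(
  server: InMemoryPlanServer,
  reason: (instructions: string) => string,
): number {
  let handled = 0;
  for (let step = server.getNextStep(); step !== null; step = server.getNextStep()) {
    server.submitStepResult(step.id, reason(step.instructions));
    handled++;
  }
  return handled;
}
```

Note that nothing in the server class schedules or pushes work: if the client stops calling, the plan simply waits in its last persisted state.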

Plan Engine as Pure Functions

Four modules in packages/mcp/src/plan-engine/:

plan-transitions.ts -- valid transitions, transitionPlan/Step, derivePlanStatus
branching.ts -- safe expression evaluator (comparison operators only, no eval())
stall-detection.ts -- identifies steps stuck in in_progress beyond a threshold
index.ts -- public API re-exports
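
To make the "pure functions, no framework" idea concrete, here is a minimal sketch of a transition map with a transition function and a derived plan status. The state names are placeholders (the real engine defines six plan states and six step states that this record does not list), and the signatures of transitionStep and derivePlanStatus are assumptions.

```typescript
// Placeholder state names, not the engine's actual six-state sets.
type StepStatus = "pending" | "in_progress" | "completed" | "failed";
type PlanStatus = "active" | "completed" | "failed";

// The transition map is plain data: each state lists the states it may move to.
const STEP_TRANSITIONS: Record<StepStatus, readonly StepStatus[]> = {
  pending: ["in_progress"],
  in_progress: ["completed", "failed"],
  completed: [],
  failed: ["pending"], // illustrative retry path
};

type TransitionResult =
  | { ok: true; status: StepStatus }
  | { ok: false; error: string };

// Pure: no I/O and no mutation. The caller decides whether to persist the result.
function transitionStep(current: StepStatus, next: StepStatus): TransitionResult {
  return STEP_TRANSITIONS[current].includes(next)
    ? { ok: true, status: next }
    : { ok: false, error: `invalid step transition: ${current} -> ${next}` };
}

// The plan status is derived from step statuses instead of being tracked separately.
function derivePlanStatus(steps: readonly StepStatus[]): PlanStatus {
  if (steps.some((s) => s === "failed")) return "failed";
  if (steps.length > 0 && steps.every((s) => s === "completed")) return "completed";
  return "active";
}
```

Keeping the map as data means the set of legal transitions can be read, tested, and diffed without running anything.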

See State Machines for the full transition maps, Branching for expression evaluation, and Stall Detection for threshold behavior.
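
The "comparison operators only, no eval()" constraint on branching can be illustrated with a small evaluator. This is a sketch, not the contents of branching.ts: the grammar (one identifier, one operator, one literal), the evaluateCondition name, and the context shape are all assumptions.

```typescript
type Scalar = string | number | boolean;
type Context = Record<string, Scalar>;

// Hypothetical grammar: "<identifier> <operator> <literal>", e.g. "source_count >= 3".
// Anything that does not match this shape is rejected rather than interpreted.
const EXPRESSION = /^\s*([A-Za-z_][A-Za-z0-9_]*)\s*(==|!=|>=|<=|>|<)\s*(.+?)\s*$/;

function parseLiteral(raw: string): Scalar {
  if (/^-?\d+(\.\d+)?$/.test(raw)) return Number(raw);
  if (raw === "true") return true;
  if (raw === "false") return false;
  const quoted = /^(["'])(.*)\1$/.exec(raw);
  if (quoted) return quoted[2] ?? "";
  throw new Error(`unsupported literal: ${raw}`);
}

function evaluateCondition(expression: string, context: Context): boolean {
  const match = EXPRESSION.exec(expression);
  if (!match) throw new Error(`unsupported expression: ${expression}`);
  const [, field = "", op = "", rawLiteral = ""] = match;
  if (!Object.prototype.hasOwnProperty.call(context, field)) {
    throw new Error(`unknown field: ${field}`);
  }
  const left = context[field];
  const right = parseLiteral(rawLiteral);

  // Equality works on any scalar; values of different types are simply unequal.
  if (op === "==") return left === right;
  if (op === "!=") return left !== right;

  // Ordering is restricted to numbers in this sketch.
  if (typeof left !== "number" || typeof right !== "number") {
    throw new Error(`operator ${op} requires numeric operands`);
  }
  switch (op) {
    case ">":
      return left > right;
    case ">=":
      return left >= right;
    case "<":
      return left < right;
    default:
      return left <= right; // "<=" is the only operator left
  }
}
```

Since the input is matched against a fixed pattern and never executed, a string such as a function call fails to parse instead of running.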

18 MCP Tools

Category	Tools
Plan Management (5)	`create_research_plan`, `get_next_step`, `get_plan_status`, `modify_plan`, `list_active_plans`
Step Execution (4)	`submit_step_result`, `get_step_context`, `request_user_review`, `submit_user_decision`
Knowledge Storage (3)	`store_research`, `search_sources`, `store_research_output`
Skill Tracking (3)	`log_skill_execution`, `submit_subagent_report`, `submit_research_feedback`
Data Extraction + Utility (3)	`extract_data`, `get_research_context`, `ping`

Two-Tier Audit Logging

plan_audit_log -- 8 business event types, written inside the transaction via writeAuditLog()
mcp_tool_call_log -- every tool call except ping, fire-and-forget via withToolCallLogging() middleware
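
The fire-and-forget tier can be sketched as a wrapper around a tool handler. The signature below is an assumption (the real withToolCallLogging() may take different arguments), and the LogSink type is a hypothetical stand-in for an insert into mcp_tool_call_log.

```typescript
type ToolHandler<A, R> = (args: A) => Promise<R>;
type ToolCallRecord = { tool: string; ok: boolean; durationMs: number };

// Hypothetical sink; in the real system this would write a row to mcp_tool_call_log.
type LogSink = (record: ToolCallRecord) => Promise<void>;

function withToolCallLogging<A, R>(
  tool: string,
  handler: ToolHandler<A, R>,
  sink: LogSink,
): ToolHandler<A, R> {
  // ping is excluded from the tool call log, so it is returned unwrapped.
  if (tool === "ping") return handler;

  return async (args: A): Promise<R> => {
    const started = Date.now();
    let ok = false;
    try {
      const result = await handler(args);
      ok = true;
      return result;
    } finally {
      // Fire-and-forget: the write is not awaited and its failures are swallowed,
      // so a logging problem can neither delay nor fail the tool call itself.
      const record = { tool, ok, durationMs: Date.now() - started };
      void Promise.resolve()
        .then(() => sink(record))
        .catch(() => {});
    }
  };
}
```

The trade-off is the opposite of the plan_audit_log tier: a write inside the transaction cannot be lost without also rolling back the business change, while a fire-and-forget write can be dropped silently but never blocks a response.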

Consequences

Positive

$0 LLM cost for primary use case -- all reasoning happens in the user's subscription
Zero adoption friction -- users stay in their preferred AI platform
Cross-session durability -- PostgreSQL state survives disconnects, stalls, partial failures
Human-in-the-loop -- request_user_review / submit_user_decision enable checkpoints
Platform-agnostic -- MCP protocol works with Claude Code, ChatGPT, Gemini

Negative

MCP protocol maturity -- emerging standard, protocol changes could require tool rewrites
No web UI research path at MVP -- users without MCP support cannot use research features
Economics assumption depends on continued AI subscription subsidies
Three-layer architecture has cognitive overhead for developers and AI agents
Pull-based testing requires an actual MCP client driving the loop

Risks

MCP protocol adoption stalls -- mitigated by adoption from Anthropic, OpenAI, Google as of early 2026
Plan engine complexity growth -- mitigated by XState upgrade path at explicit trigger
Growing server-side LLM surface -- mitigated by @repo/ai usage tracking and the principle that server-side calls handle targeted operations, not reasoning

References

Architecture: Research Engine -- full Mermaid diagram of the plan flow
packages/mcp/CLAUDE.md -- package conventions
Linear: TRO-18 (MCP server), TRO-66 (Plan engine), TRO-53 (Research maturity), TRO-67 (8-phase workflow), TRO-101 (Tool call logging)
Related: ADR-002 (Auth/PATs), ADR-006 (Inngest), ADR-007 (AI wrapper), ADR-009 (Search)