Execution Flow

End-to-end lifecycle of a research skill invocation -- from user prompt through interview, routing, planning, step execution, output delivery, and feedback capture.

Every research request follows the same six-phase lifecycle. The phases are sequential -- each must complete before the next begins. The entire flow is driven by skill prompts that instruct the AI platform what to do, with MCP tools providing persistence and structure.

Phase 1: Interview

The router skill (/research) parses the user's message and extracts eight elements:

Element	Description	When extracted
`topic`	The core research subject	Always
`researchType`	`general_topic`, `competitive_analysis`, or `decision_support`	Always
`scope`	Breadth and depth boundaries	Always
`depthSignals`	Words indicating desired depth ("quick", "thorough", etc.)	Always
`decisionContext`	What decision this research informs	If applicable
`knownContext`	What the user already knows	If provided
`outputMediaType`	Preferred output format	Asked if not detectable
`outputFormattingInstructions`	Styling, layout, or length requirements	If provided
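The eight elements can be pictured as one typed record. This is a hypothetical TypeScript shape, not the skill's actual schema, because the router extracts these elements inside a prompt rather than in code. `InterviewExtraction` and `needsMediaTypeQuestion` are illustrative names, and typing `depthSignals` as a string array is an assumption.

```typescript
type ResearchType = "general_topic" | "competitive_analysis" | "decision_support";

// Hypothetical shape; field names follow the table, types are assumptions.
interface InterviewExtraction {
  topic: string;                          // always extracted
  researchType: ResearchType;             // always extracted
  scope: string;                          // always extracted
  depthSignals: string[];                 // always extracted; may be empty
  decisionContext?: string;               // if applicable
  knownContext?: string;                  // if the user provides it
  outputMediaType?: string;               // asked for if not detectable
  outputFormattingInstructions?: string;  // if the user provides it
}

// outputMediaType is the one element the skill asks about when it
// cannot detect it from the message.
function needsMediaTypeQuestion(x: InterviewExtraction): boolean {
  return x.outputMediaType === undefined;
}

const example: InterviewExtraction = {
  topic: "open-source vector databases",
  researchType: "competitive_analysis",
  scope: "self-hosted options only",
  depthSignals: ["quick"],
};
console.log(needsMediaTypeQuestion(example)); // true
```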

Clarification Rules

The skill follows strict rules about when to ask follow-up questions:

Proceed without asking when the question is focused and self-contained, the depth is clear from context, or there is enough information to build a plan. When proceeding without questions, the skill stores a skipQuestionsReason explaining why.

Ask 1-2 questions when the topic is ambiguous, the scope is unclear, or decision criteria are missing. Never more than 2. The question-answer exchanges are stored as clarifyingQuestions: [{ question, answer }].
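The two clarification rules reduce to a small decision plus a hard cap. In this minimal sketch, the `ClarityAssessment` flags are hypothetical stand-ins for judgments the skill prompt asks the model to make. Only the two-question limit and the `clarifyingQuestions` / `skipQuestionsReason` metadata come from the rules above.

```typescript
// Hypothetical summary of the model's judgment about the request.
interface ClarityAssessment {
  topicAmbiguous: boolean;
  scopeUnclear: boolean;
  decisionCriteriaMissing: boolean;
}

interface QA { question: string; answer: string; }

type ClarificationMetadata =
  | { clarifyingQuestions: QA[] }
  | { skipQuestionsReason: string };

const MAX_QUESTIONS = 2;

// Ask only when something is unclear, and never more than two questions.
function planClarification(a: ClarityAssessment, candidates: string[]): string[] {
  const shouldAsk = a.topicAmbiguous || a.scopeUnclear || a.decisionCriteriaMissing;
  return shouldAsk ? candidates.slice(0, MAX_QUESTIONS) : [];
}

// Either the question-answer exchanges or the reason questions were skipped.
function clarificationMetadata(exchanges: QA[], skipReason: string): ClarificationMetadata {
  return exchanges.length > 0
    ? { clarifyingQuestions: exchanges.slice(0, MAX_QUESTIONS) }
    : { skipQuestionsReason: skipReason };
}
```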

Phase 2: Route

Based on the interview extraction, the router classifies the request as scan or deep. The user never sees this decision. See Routing Logic for the full classification criteria.

After deciding, the router composes a routingRationale string explaining the full decision logic -- how many options, how many dimensions, ambiguity level, depth signals, and the final reasoning. This rationale is stored in skill execution metadata for later analysis of routing accuracy.
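As an illustration, a rationale covering the listed factors could be assembled as shown below. The `RoutingFactors` fields, the three-level ambiguity scale, and the output format are all assumptions. The real router writes the rationale as free text, and the classification criteria themselves are documented in Routing Logic, not here.

```typescript
// Hypothetical inputs; the router's real rationale is free text.
interface RoutingFactors {
  optionCount: number;
  dimensionCount: number;
  ambiguity: "low" | "medium" | "high";
  depthSignals: string[];
  mode: "scan" | "deep";
  reasoning: string;
}

// Joins each factor into one semicolon-separated rationale string.
function composeRoutingRationale(f: RoutingFactors): string {
  const signals = f.depthSignals.length > 0 ? f.depthSignals.join(", ") : "none";
  return [
    `Options: ${f.optionCount}`,
    `Dimensions: ${f.dimensionCount}`,
    `Ambiguity: ${f.ambiguity}`,
    `Depth signals: ${signals}`,
    `Decision: ${f.mode} -- ${f.reasoning}`,
  ].join("; ");
}
```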

Phase 3: Initialize and Check Prior Research

Two things happen in sequence:

Tracking Initialization

The router calls log_skill_execution to create a tracking record:

log_skill_execution({
  skillName: "research",
  status: "started",
  metadata: {
    topic, researchType, scope, depthSignals,
    decisionContext, knownContext,
    outputMediaType, outputFormattingInstructions,
    clarifyingQuestions, // or skipQuestionsReason when no questions were asked
    routingRationale,
    delegatedTo: "scan", // or "deep"
    originalQuery: "<user's message, up to 5000 chars>"
  }
})

This returns an executionId that is used for all subsequent tracking updates.

Prior Research Check

The skill calls search_sources({ query: "<key terms>", limit: 5 }) to find existing research artifacts. The number of artifacts found, combined with the user's decision when any are found, determines one of four paths:

Artifacts found	User decision	What happens
0	(automatic)	Proceed silently to planning
1-5	Use existing	Session ends -- skill execution marked completed with `closedReason`
1-5	Refresh	Proceed to planning with the user's additional context
1-5	Unique	Proceed to planning as new research

When artifacts are found, each is presented with its original research prompt, date, and summary. The user's decision and the full exchange are stored in priorResearchCheck metadata.
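The prior-research table amounts to a small decision function. This sketch uses hypothetical names (`resolvePriorResearch`, `PriorDecision`) and an illustrative `closedReason` string. It encodes only the four outcomes listed.

```typescript
type PriorDecision = "use_existing" | "refresh" | "unique";

type PriorOutcome =
  | { action: "plan"; mode: "new" | "refresh"; extraContext?: string }
  | { action: "end_session"; closedReason: string };

function resolvePriorResearch(
  artifactCount: number,
  decision?: PriorDecision,
  extraContext?: string,
): PriorOutcome {
  // No artifacts: proceed silently to planning.
  if (artifactCount === 0) return { action: "plan", mode: "new" };
  switch (decision) {
    case "use_existing": // session ends; the reason text is illustrative
      return { action: "end_session", closedReason: "user chose to reuse existing research" };
    case "refresh":
      return { action: "plan", mode: "refresh", extraContext };
    case "unique":
      return { action: "plan", mode: "new" };
    default:
      throw new Error("artifacts found: the user must choose use_existing, refresh, or unique");
  }
}
```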

Phase 4: Plan Creation

The skill designs a research plan based on the research type and routed mode. Plan structure varies by type (see Skill Definitions for templates).

The skill calls create_research_plan with:

name -- formatted as [Scan] <topic> or [Deep] <topic>
researchQuestion -- the core question from the interview
steps -- ordered list of { stepType, instructions } entries
branchingConditions -- quality gates (deep only)
planDesignRationale -- explanation of why this plan structure was chosen
outputFormattingNotes -- initial ideas for output layout
sessionId -- for audit tracking and for auto-linking the plan to its skill execution
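Here is a hypothetical helper that assembles these arguments, including the `[Scan]` / `[Deep]` name prefix and the deep-only `branchingConditions`. The element types, such as `branchingConditions` as a string array, are assumptions and not the tool's documented schema.

```typescript
type StepType = "search" | "extract" | "analyze" | "critique" | "synthesize" | "checkpoint" | "custom";

interface PlanStep { stepType: StepType; instructions: string; }

// Hypothetical argument shape for create_research_plan.
interface CreateResearchPlanArgs {
  name: string;
  researchQuestion: string;
  steps: PlanStep[];
  branchingConditions?: string[]; // deep only; element type assumed
  planDesignRationale: string;
  outputFormattingNotes?: string;
  sessionId: string;
}

function buildPlanArgs(
  mode: "scan" | "deep",
  topic: string,
  rest: Omit<CreateResearchPlanArgs, "name" | "branchingConditions">,
  branchingConditions: string[] = [],
): CreateResearchPlanArgs {
  const args: CreateResearchPlanArgs = {
    ...rest,
    name: `[${mode === "scan" ? "Scan" : "Deep"}] ${topic}`,
  };
  // Quality gates apply to deep plans only.
  if (mode === "deep") args.branchingConditions = branchingConditions;
  return args;
}
```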

Auto-Linking

When create_research_plan receives a sessionId, the server automatically finds the most recent skill execution in that session whose status is started, and updates it with the planId and status: "executing". This removes the need for a separate log_skill_execution call, which the AI platform might forget to make.

The auto-link logic in packages/mcp/src/tools/create-research-plan.ts:

Query skill_execution WHERE claude_code_session_id = sessionId AND status = 'started' ORDER BY created_at DESC LIMIT 1
If found: update plan_id, status = 'executing', merge planDesignRationale into metadata
If not found: no-op (plan still works, just not linked to a skill execution)
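The same rule can be expressed over an in-memory array. This is a sketch and not the code in create-research-plan.ts. The `SkillExecutionRow` type borrows the column names from the query above, while the numeric `created_at` and the rest of the shape are assumptions.

```typescript
// Hypothetical row shape; column names mirror the query above.
interface SkillExecutionRow {
  id: string;
  claude_code_session_id: string;
  status: "started" | "executing" | "completed" | "failed";
  created_at: number; // epoch millis, assumed for easy ordering
  plan_id: string | null;
  metadata: Record<string, unknown>;
}

function autoLink(
  rows: SkillExecutionRow[],
  sessionId: string,
  planId: string,
  planDesignRationale: string,
): SkillExecutionRow | undefined {
  // WHERE session matches AND status = 'started' ORDER BY created_at DESC LIMIT 1
  const target = rows
    .filter((r) => r.claude_code_session_id === sessionId && r.status === "started")
    .sort((a, b) => b.created_at - a.created_at)[0];
  if (!target) return undefined; // no-op: the plan still works, just unlinked
  target.plan_id = planId;
  target.status = "executing";
  target.metadata = { ...target.metadata, planDesignRationale }; // merge, keep existing keys
  return target;
}
```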

Phase 5: Step Execution

The skill enters the pull-based execution loop. This is the core research cycle:

LOOP:
  response = get_next_step({ planId })

  IF response.status == "plan_complete": EXIT LOOP
  IF response.status == "awaiting_review": EXIT LOOP (wait for user)
  IF response.status == "plan_failed": EXIT LOOP (handle error)

  step = response.step

  IF step.stepOrder > 1:
    context = get_step_context({ planId, stepId })

  // AI platform does the actual research work here
  // (searches, reads, analyzes, reasons, writes)

  submit_step_result({
    planId, stepId,
    result: <structured>,
    confidence: <0-1>,
    stepExecutionReport: { thinking, webSearches, webFetches, otherToolCalls, subagents },
    outputFormattingNotes: <optional>
  })

  // Store significant outputs
  IF meaningful output:
    store_research({ artifactType, title, content, contentText, confidence, planId, stepId })

END LOOP
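Here is a synchronous, in-memory simulation that makes the pull model concrete. `FakePlanServer` and its methods are hypothetical stand-ins for get_next_step, get_step_context, and submit_step_result. The real tools are asynchronous MCP calls and return richer payloads. What it shows is that the client never decides which step comes next: it asks the server, does the work, and submits.

```typescript
interface Step { stepId: string; stepOrder: number; instructions: string; }

type NextStepResponse =
  | { status: "step"; step: Step }
  | { status: "plan_complete" | "awaiting_review" | "plan_failed" };

class FakePlanServer {
  steps: Step[];
  results = new Map<string, string>();

  constructor(steps: Step[]) {
    this.steps = [...steps].sort((a, b) => a.stepOrder - b.stepOrder);
  }

  // Stand-in for get_next_step: the first step without a submitted result.
  getNextStep(): NextStepResponse {
    const pending = this.steps.find((s) => !this.results.has(s.stepId));
    return pending ? { status: "step", step: pending } : { status: "plan_complete" };
  }

  // Stand-in for get_step_context: results of all earlier steps.
  getStepContext(stepOrder: number): string[] {
    return this.steps
      .filter((s) => s.stepOrder < stepOrder && this.results.has(s.stepId))
      .map((s) => this.results.get(s.stepId) as string);
  }

  // Stand-in for submit_step_result.
  submitStepResult(stepId: string, result: string): void {
    this.results.set(stepId, result);
  }
}

function runLoop(server: FakePlanServer, doWork: (instructions: string, context: string[]) => string): string {
  for (;;) {
    const response = server.getNextStep();
    // plan_complete, awaiting_review, and plan_failed all exit the loop
    if (response.status !== "step") return response.status;
    const { step } = response;
    const context = step.stepOrder > 1 ? server.getStepContext(step.stepOrder) : [];
    server.submitStepResult(step.stepId, doWork(step.instructions, context));
  }
}
```

Because the server owns step ordering, a client that stops mid-loop can simply ask again, which is what cross-session resume relies on.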

Step Types and AI Platform Behavior

Each step type instructs the AI platform to adopt a different analytical perspective:

Step Type	What the AI platform does
`search`	Investigate using the model's own knowledge and web search, and check `search_sources` for existing artifacts
`extract`	Use `extract_data` for structured extraction or structure data manually
`analyze`	Review prior steps, identify patterns, assess confidence
`critique`	Challenge the analysis -- missing info, biases, alternative interpretations
`synthesize`	Combine all findings into the final deliverable
`checkpoint`	Present findings to the user via `request_user_review`, pause for input
`custom`	Follow step-specific instructions (typically follow-up on checkpoint feedback)
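Only three rows of the step-type table name an MCP tool explicitly. This sketch records just those associations and the checkpoint pause. The function names are hypothetical, and other capabilities (web search, reasoning) are left out because the table does not tie them to a named tool.

```typescript
type StepType =
  | "search"
  | "extract"
  | "analyze"
  | "critique"
  | "synthesize"
  | "checkpoint"
  | "custom";

// MCP tools that the step-type table names for each step type.
function namedToolsFor(stepType: StepType): string[] {
  switch (stepType) {
    case "search":
      return ["search_sources"];      // looks for existing artifacts
    case "extract":
      return ["extract_data"];        // structured extraction
    case "checkpoint":
      return ["request_user_review"]; // presents findings to the user
    default:
      return [];                      // analyze, critique, synthesize, custom
  }
}

// Only checkpoint steps pause execution to wait for user input.
function pausesForUser(stepType: StepType): boolean {
  return stepType === "checkpoint";
}
```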

Step Execution Reports

Every step submission includes a stepExecutionReport that captures the AI platform's complete reasoning process:

thinking -- reasoning narrative, hypotheses, decisions, discarded approaches, confidence factors
webSearches -- exact queries, result counts, which results were used and why
webFetches -- URLs fetched, content summaries, extracted facts
otherToolCalls -- other tool invocations with inputs and results
subagents -- subagent delegations with task descriptions and linked report IDs

This is the primary observability mechanism for research quality. All fields are required even if empty (use empty arrays, not omitted fields).
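One way to guarantee that no field is omitted is to start every report from a fully populated empty value. In this sketch, the top-level field names come from the list above. The element shapes, and `thinking` as a plain string, are assumptions. `emptyReport` and `missingReportFields` are hypothetical helpers.

```typescript
// Top-level names from the documentation; element shapes are assumed.
interface StepExecutionReport {
  thinking: string;
  webSearches: { query: string; resultCount: number; usedResults: string[]; reason: string }[];
  webFetches: { url: string; summary: string; extractedFacts: string[] }[];
  otherToolCalls: { tool: string; input: unknown; result: unknown }[];
  subagents: { task: string; reportId: string }[];
}

// Every field present, arrays empty rather than omitted.
function emptyReport(thinking = ""): StepExecutionReport {
  return { thinking, webSearches: [], webFetches: [], otherToolCalls: [], subagents: [] };
}

const REQUIRED = ["thinking", "webSearches", "webFetches", "otherToolCalls", "subagents"] as const;

// Names of required fields missing from an untrusted payload.
function missingReportFields(payload: object): string[] {
  const p = payload as Record<string, unknown>;
  return REQUIRED.filter((k) => p[k] === undefined);
}
```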

Dynamic Plan Modification

During execution, the AI platform can modify the plan when circumstances change:

Add steps -- when gaps are discovered (especially after critique)
Remove steps -- when a pending step becomes irrelevant
Reorder steps -- when execution order should change
Update instructions -- when step context has changed
Fail a step -- when a step cannot be completed (plan continues)

Every modification requires a modificationRationale explaining why the change was needed.
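The five modification kinds map naturally onto a discriminated union in which every variant carries the rationale. These shapes are hypothetical, since the actual modify_plan arguments are not documented on this page. The validator only enforces the rationale rule stated above.

```typescript
// Hypothetical modification shapes; every variant requires a rationale.
type PlanModification =
  | { kind: "add_step"; stepType: string; instructions: string; afterStepId?: string; modificationRationale: string }
  | { kind: "remove_step"; stepId: string; modificationRationale: string }
  | { kind: "reorder_steps"; stepIds: string[]; modificationRationale: string }
  | { kind: "update_instructions"; stepId: string; instructions: string; modificationRationale: string }
  | { kind: "fail_step"; stepId: string; modificationRationale: string };

// Rejects any modification whose rationale is blank.
function validateModification(m: PlanModification): void {
  if (m.modificationRationale.trim() === "") {
    throw new Error(`${m.kind} requires a non-empty modificationRationale`);
  }
}
```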

Phase 6: Output Delivery and Feedback

When get_next_step returns plan_complete, the skill invokes /research-output to generate the final deliverable.

Delivery Flow

Load full research context via get_research_context
Generate formatted output according to the user's media type preference
Store the deliverable via store_research_output for later retrieval
Present to the user
Ask for feedback: "How did this turn out? Anything I should have done differently?"
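The delivery flow is a fixed sequence. This sketch uses hypothetical synchronous bindings (`OutputTools`, `deliver`) in place of the real asynchronous tool calls. It also reflects the output skill's error handling described under Error Handling: a failed store is swallowed so the user still receives the output.

```typescript
// Hypothetical synchronous bindings; the real tool calls are asynchronous.
interface OutputTools {
  getResearchContext(planId: string): string;
  storeResearchOutput(planId: string, content: string): void;
  present(content: string): void;
  ask(question: string): void;
}

type Formatter = (context: string, mediaType: string) => string;

function deliver(planId: string, mediaType: string, tools: OutputTools, format: Formatter): void {
  const context = tools.getResearchContext(planId); // 1. load full context
  const output = format(context, mediaType);         // 2. format for the user's media type
  try {
    tools.storeResearchOutput(planId, output);       // 3. store for later retrieval
  } catch {
    // A failed store must not block delivery; present anyway.
  }
  tools.present(output);                             // 4. present to the user
  tools.ask("How did this turn out? Anything I should have done differently?"); // 5. ask for feedback
}
```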

Feedback Collection

Two feedback moments bracket the end of the research experience:

Initial feedback -- captured right after delivery. The skill waits for the user's response and records their satisfaction level, feedback summary, improvement suggestions, and whether follow-up research was requested.

Closing feedback -- captured when the conversation moves away from this research topic, recording signals accumulated during the post-delivery discussion. Skipped if the initial feedback already captured everything.

Both are persisted via submit_research_feedback to the research_feedback table.
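Here is a hypothetical record shape and skip rule for closing feedback. The field names and the 1-5 satisfaction scale are assumptions, not the research_feedback table's actual columns. Treating "captured everything" as "no new signals since the initial feedback" is a simplification.

```typescript
// Hypothetical record; not the research_feedback table's real columns.
interface FeedbackRecord {
  moment: "initial" | "closing";
  satisfaction: 1 | 2 | 3 | 4 | 5; // scale is an assumption
  summary: string;
  improvementSuggestions: string[];
  followUpRequested: boolean;
}

function shouldCaptureClosing(initial: FeedbackRecord | undefined, signalsSinceInitial: string[]): boolean {
  // No initial feedback: the closing moment is the only record.
  if (initial === undefined) return true;
  // Otherwise capture only if the later discussion surfaced something new.
  return signalsSinceInitial.length > 0;
}
```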

Tracking Completion

After output delivery and feedback, the skill updates the execution record:

log_skill_execution({
  executionId,
  skillName: "research",
  status: "completed",
  metadata: { stepsCompleted, artifactsStored, checkpointFeedback }
})
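Putting together the status values used across these phases gives a small state machine for a skill execution. This is an inference from the text (started, executing, completed, failed), not the server's documented transition table. The `canTransition` helper is hypothetical.

```typescript
type ExecutionStatus = "started" | "executing" | "completed" | "failed";

// Inferred transitions: auto-linking moves started -> executing,
// "Use existing" ends a started execution as completed, and any
// non-terminal execution can be logged as failed.
const ALLOWED: Record<ExecutionStatus, ExecutionStatus[]> = {
  started: ["executing", "completed", "failed"],
  executing: ["completed", "failed"],
  completed: [],
  failed: [],
};

function canTransition(from: ExecutionStatus, to: ExecutionStatus): boolean {
  return ALLOWED[from].includes(to);
}
```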

Error Handling

If any step fails during execution:

Check get_plan_status({ planId }) to diagnose the failure
If recoverable: use modify_plan to skip or replace the step, continue the loop
If not recoverable: present what was gathered so far, indicate where research was interrupted
Always log the failure: log_skill_execution({ executionId, status: "failed", errorMessage })

The output skill has its own error handling: if store_research_output fails, the content is still presented to the user. If submit_research_feedback fails, the failure is logged silently so the user experience is not disrupted.

Cross-Session Resume

Research plans persist server-side, enabling resume across sessions:

Call list_active_plans() to find non-terminal plans
Present active plans and let the user choose
Call get_research_context({ planId, sessionId }) to load full state (writes session_resumed audit log)
Call get_next_step({ planId }) to pick up where it left off
Continue the execution loop

This works across AI tools -- research started in Claude Code can be resumed from any MCP-compatible client.
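The resume steps can be sketched as a single function. `ResumeTools` is a hypothetical synchronous binding for list_active_plans, get_research_context, and get_next_step, and the user's choice is modeled as a callback.

```typescript
// Hypothetical synchronous bindings for the MCP tools named above.
interface PlanSummary { planId: string; name: string; }

interface ResumeTools {
  listActivePlans(): PlanSummary[];
  getResearchContext(args: { planId: string; sessionId: string }): unknown;
  getNextStep(args: { planId: string }): { status: string };
}

// Returns the next-step response, or undefined if there is nothing to
// resume or the user declines to pick a plan.
function resume(
  tools: ResumeTools,
  sessionId: string,
  choose: (plans: PlanSummary[]) => string | undefined,
): { status: string } | undefined {
  const plans = tools.listActivePlans();            // non-terminal plans only
  if (plans.length === 0) return undefined;
  const planId = choose(plans);                     // user picks a plan
  if (planId === undefined) return undefined;
  tools.getResearchContext({ planId, sessionId });  // server writes session_resumed
  return tools.getNextStep({ planId });             // continue the loop from here
}
```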

Related

Skill Definitions -- plan templates and step types per skill
Routing Logic -- how the router decides between scan and deep
Lifecycle Tracking -- database schema and admin API
Plan Orchestration -- the state machine governing plan and step transitions
Tool Protocol -- MCP authentication, tool catalog, and logging middleware