Trovella Wiki

Debug Endpoint

The debugSearch tRPC procedure that returns keyword, semantic, and fused results separately for search quality analysis.

The debugSearch procedure exposes the internals of the query pipeline by returning keyword, semantic, and fused results as three separate arrays. It powers the Hybrid Search Admin View and is useful for diagnosing ranking issues, comparing search engine contributions, and tuning relevance.

Aspect        | search                              | debugSearch
Response      | Single FusedResult[]                | Three arrays: keyword, semantic, fused
Filters       | Full filter set (8 optional fields) | Only sourceTable filter
Authorization | read on ResearchArtifact            | read on Organization (admin only)
Feature tag   | "hybrid-search-query"               | "hybrid-search-debug"
Use case      | Production search                   | Search quality debugging

The debugSearch procedure uses a separate AI feature tag (hybrid-search-debug) so debug embedding costs are tracked independently from production search costs in the ai_usage table.
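To illustrate why a separate tag helps, here is a minimal sketch that totals usage cost per feature tag. The row shape (featureTag, costUsd) and the helper name are assumptions made for this example; the actual columns of the ai_usage table are not shown on this page.

```typescript
// Hypothetical row shape: the real ai_usage columns may differ.
interface AiUsageRow {
  featureTag: string; // e.g. "hybrid-search-query" or "hybrid-search-debug"
  costUsd: number;    // assumed per-call cost column
}

// Sums cost per feature tag so debug and production spend stay separate.
function costByFeatureTag(rows: AiUsageRow[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    totals.set(row.featureTag, (totals.get(row.featureTag) ?? 0) + row.costUsd);
  }
  return totals;
}
```

With distinct tags, a burst of debugging sessions shows up under "hybrid-search-debug" rather than inflating the production search total.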

Implementation

The procedure runs the same pipeline as search but returns the intermediate results:

debugSearch: authorizedProcedure
  .input(
    z.object({
      query: z.string().min(1).max(1000),
      limit: z.number().min(1).max(50).default(20),
      sourceTable: z.enum([
        "research_artifact", "research_output", "extraction_result",
      ]).optional(),
    }),
  )
  .query(async ({ ctx, input }) => {
    // Admin only: the caller needs read access on Organization.
    if (ctx.ability.cannot("read", "Organization")) {
      throw new TRPCError({ code: "FORBIDDEN" });
    }

    const queryType = classifyQuery(input.query);
    // Embed under the debug feature tag so the cost is tracked separately.
    const { embedding } = await ctx.ai.embedQuery(
      input.query, "hybrid-search-debug"
    );

    // Run both engines in parallel, each capped at exactly `limit` results.
    const [kwResults, semResults] = await Promise.all([
      keywordSearch({
        query: input.query,
        organizationId: ctx.organizationId,
        limit: input.limit,
        sourceTable: input.sourceTable,
      }),
      semanticSearch(ctx.db, embedding, input.limit, input.sourceTable),
    ]);

    // ... map to RankedResult[] ...

    // Fuse the two ranked lists, then return them alongside the fused list.
    const fused = reciprocalRankFusion(kwRanked, semRanked, input.limit);

    return { queryType, keyword: kwRanked, semantic: semRanked, fused };
  }),

Unlike search, which requests limit * 2 candidates from each engine, debugSearch requests exactly limit from each. This keeps the three columns comparable in the admin UI.

Response Shape

{
  queryType: "keyword" | "balanced" | "semantic";
  keyword: RankedResult[];   // BM25 results with score and rank
  semantic: RankedResult[];  // pgvector results with similarity and rank
  fused: FusedResult[];      // Merged results with RRF score and source flags
}

Each RankedResult includes the raw score from its source engine and the rank position. The FusedResult adds rrfScore, inKeyword, and inSemantic for provenance tracking.
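The fusion step can be sketched as follows. This is a generic reciprocal rank fusion, not the project's actual reciprocalRankFusion: the id field, the 1-based rank convention, the function name, and the smoothing constant k = 60 are all assumptions made for illustration. Only rrfScore, inKeyword, and inSemantic come from the description above.

```typescript
// Hypothetical shapes; `id` and 1-based `rank` are assumptions.
interface RankedResult {
  id: string;
  score: number; // raw engine score (BM25 or vector similarity)
  rank: number;  // position within its own engine's list
}

interface FusedResult {
  id: string;
  rrfScore: number;
  inKeyword: boolean;
  inSemantic: boolean;
}

// Generic RRF: each list adds 1 / (k + rank) for every document it contains.
function reciprocalRankFusionSketch(
  keyword: RankedResult[],
  semantic: RankedResult[],
  limit: number,
  k = 60, // assumed smoothing constant, a commonly used default
): FusedResult[] {
  const byId = new Map<string, FusedResult>();

  const accumulate = (list: RankedResult[], flag: "inKeyword" | "inSemantic") => {
    for (const r of list) {
      const entry: FusedResult =
        byId.get(r.id) ??
        { id: r.id, rrfScore: 0, inKeyword: false, inSemantic: false };
      entry.rrfScore += 1 / (k + r.rank);
      entry[flag] = true; // record which engine contributed this document
      byId.set(r.id, entry);
    }
  };

  accumulate(keyword, "inKeyword");
  accumulate(semantic, "inSemantic");

  // Highest fused score first, truncated to the requested limit.
  return Array.from(byId.values())
    .sort((a, b) => b.rrfScore - a.rrfScore)
    .slice(0, limit);
}
```

A document found by both engines accumulates two contributions, so it tends to outrank a document that only one engine returned, even if that one ranked it first.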

The stats Endpoint

The stats procedure complements debugSearch by providing index-level metrics:

stats: authorizedProcedure.query(async ({ ctx }) => {
  if (ctx.ability.cannot("read", "Organization")) {
    throw new TRPCError({ code: "FORBIDDEN" });
  }

  const rows = await ctx.db
    .select({
      sourceTable: documentChunk.sourceTable,
      count: count(),
    })
    .from(documentChunk)
    .groupBy(documentChunk.sourceTable);

  const totalChunks = rows.reduce((sum, r) => sum + r.count, 0);
  return { totalChunks, bySourceTable: rows };
});

This returns the total number of indexed chunks and the breakdown by source table. The admin view uses this to display KPI cards and the "Chunks by Content Type" table.
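As a rough illustration of how a client could turn this response into table rows, the sketch below adds a percentage share to each source table. The interface names, the sharePct field, and the withShare helper are hypothetical; only totalChunks and bySourceTable come from the procedure above.

```typescript
// Shapes mirror the stats return value; the type names are assumptions.
interface SourceTableCount {
  sourceTable: string;
  count: number;
}

interface StatsResponse {
  totalChunks: number;
  bySourceTable: SourceTableCount[];
}

// Adds each table's share of all chunks and sorts the largest first.
function withShare(stats: StatsResponse) {
  return stats.bySourceTable
    .map((row) => ({
      ...row,
      // Guard against an empty index to avoid dividing by zero.
      sharePct: stats.totalChunks === 0 ? 0 : (row.count / stats.totalChunks) * 100,
    }))
    .sort((a, b) => b.count - a.count);
}
```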

Use debugSearch when:

  • Investigating why a specific document ranks high or low
  • Comparing how keyword and semantic engines interpret the same query
  • Evaluating whether a query benefits more from BM25 or vector similarity
  • Testing filter behavior on a new content type
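For the first two cases, a small helper can split a debug response into documents found by both engines, by keyword only, and by semantic only. This is a sketch under assumptions: the id field on RankedResult and the compareEngines name are hypothetical and not part of the documented API.

```typescript
// Hypothetical shapes; `id` is an assumed identifier field.
interface RankedResult {
  id: string;
  score: number;
  rank: number;
}

interface DebugLists {
  keyword: RankedResult[];
  semantic: RankedResult[];
}

// Splits results by which engine returned them and pairs up the ranks.
function compareEngines(res: DebugLists) {
  const semanticRank = new Map<string, number>(
    res.semantic.map((r): [string, number] => [r.id, r.rank]),
  );
  const keywordIds = new Set(res.keyword.map((r) => r.id));

  const both = res.keyword
    .filter((r) => semanticRank.has(r.id))
    .map((r) => ({
      id: r.id,
      keywordRank: r.rank,
      semanticRank: semanticRank.get(r.id) as number,
    }));
  const keywordOnly = res.keyword
    .filter((r) => !semanticRank.has(r.id))
    .map((r) => r.id);
  const semanticOnly = res.semantic
    .filter((r) => !keywordIds.has(r.id))
    .map((r) => r.id);

  return { both, keywordOnly, semanticOnly };
}
```

A large gap between keywordRank and semanticRank for the same document is a hint that the two engines read the query differently.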

Use search for all production application code -- it returns only the fused results and supports the full filter set.
