Trovella Wiki

Hybrid Search View

Admin view for debugging hybrid search queries and monitoring search index health.

The Hybrid Search view (/admin/search) provides two tabs: a Search Debugger for testing queries and comparing ranking algorithms, and an Indexing & Health dashboard for monitoring the vector index.

Search Debugger Tab

The debugger lets you submit a search query and see the results from each stage of the hybrid search pipeline side by side.

How It Works

When you type a query and click Search, the hybridSearch.debugSearch procedure:

  1. Classifies the query type (via classifyQuery from @repo/search)
  2. Generates a query embedding (via ctx.ai.embedQuery)
  3. Runs keyword search (Typesense BM25) and semantic search (pgvector) in parallel
  4. Fuses results using Reciprocal Rank Fusion (RRF, k=60)
  5. Returns all three result sets separately
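The steps above can be sketched as follows. This is a minimal TypeScript outline, not the router's actual code. The `DebugSearchDeps` interface and its members (`classify`, `embed`, `keywordSearch`, `semanticSearch`, `fuse`) are hypothetical stand-ins for `classifyQuery`, `ctx.ai.embedQuery`, the Typesense and pgvector queries, and RRF. The point is the control flow: classification and embedding happen first, then the two searches run concurrently via `Promise.all`.

```typescript
// Hypothetical shapes; the real procedure's types are not documented on this page.
interface Hit {
  id: string;
  sourceTable: string;
  score: number;
}

interface DebugSearchDeps {
  classify: (query: string) => string;
  embed: (query: string) => Promise<number[]>;
  keywordSearch: (query: string) => Promise<Hit[]>;
  semanticSearch: (embedding: number[]) => Promise<Hit[]>;
  fuse: (keyword: Hit[], semantic: Hit[]) => Hit[];
}

interface DebugSearchResult {
  queryType: string;
  keyword: Hit[];
  semantic: Hit[];
  fused: Hit[];
}

async function debugSearch(query: string, deps: DebugSearchDeps): Promise<DebugSearchResult> {
  // 1. Classify the query type.
  const queryType = deps.classify(query);
  // 2. Embed the query for the semantic leg.
  const embedding = await deps.embed(query);
  // 3. Run keyword and semantic search concurrently.
  const [keyword, semantic] = await Promise.all([
    deps.keywordSearch(query),
    deps.semanticSearch(embedding),
  ]);
  // 4. Fuse the two ranked lists.
  const fused = deps.fuse(keyword, semantic);
  // 5. Return every result set separately so the UI can show them side by side.
  return { queryType, keyword, semantic, fused };
}
```

Keeping the three result sets separate, rather than returning only the fused list, is what lets the debugger compare the retrievers against the fused ranking.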

Result Panels

The debugger displays three side-by-side panels:

Keyword (BM25)

Results from Typesense full-text search. Each result shows:

  • Rank position
  • Source table badge (e.g., artifact, output, extraction)
  • Text snippet (first 120 characters)
  • BM25 score

Semantic (pgvector)

Results from vector similarity search using the query embedding against document_chunk.embedding (a 1536-dimension halfvec column with an HNSW inner-product index). Each result shows:

  • Rank position
  • Source table badge
  • Embedded text snippet (first 200 characters)
  • Similarity score (the inner product, which equals cosine similarity when embeddings are unit-normalized)
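pgvector's `<#>` operator returns the negative inner product, so the panel's score is the inner product itself. That equals cosine similarity only when both vectors have unit length. The plain TypeScript sketch below (no database involved) checks this arithmetic; whether the stored embeddings are normalized is an assumption, not something this page states.

```typescript
// Mirrors pgvector's <#> operator: negative inner product (smaller = more similar).
function negativeInnerProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return -dot;
}

// The debugger's score: (embedding <#> query) * -1, i.e. the raw inner product.
function similarity(embedding: number[], query: number[]): number {
  return negativeInnerProduct(embedding, query) * -1;
}

// Cosine similarity divides the inner product by both vector norms.
function cosine(a: number[], b: number[]): number {
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return -negativeInnerProduct(a, b) / (norm(a) * norm(b));
}

// Scale a vector to unit length.
function normalize(v: number[]): number[] {
  const n = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return v.map((x) => x / n);
}
```

For `[3, 4]` and `[4, 3]` the raw inner product is 24 while the cosine similarity is 0.96; after normalizing both vectors the two scores agree.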

Fused (RRF)

The merged result set after Reciprocal Rank Fusion. Each result shows:

  • Rank position
  • Source table badge
  • Text snippet
  • RRF score
  • Source indicators: KW badge if the result appeared in keyword results, SEM badge if it appeared in semantic results
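Reciprocal Rank Fusion scores each result as the sum of 1/(k + rank) over every list it appears in, with k = 60 here. A minimal sketch, assuming 1-based ranks and that keyword and semantic results can be matched by a shared `id`; how the real implementation keys results across Typesense and pgvector, and how it breaks ties, is not documented on this page. The `inKeyword` and `inSemantic` flags correspond to the KW and SEM badges.

```typescript
interface RankedItem {
  id: string; // Assumed join key shared by both lists.
}

interface FusedItem {
  id: string;
  rrfScore: number;
  inKeyword: boolean; // Drives the KW badge.
  inSemantic: boolean; // Drives the SEM badge.
}

function reciprocalRankFusion(
  keyword: RankedItem[],
  semantic: RankedItem[],
  k = 60,
): FusedItem[] {
  const fused = new Map<string, FusedItem>();
  const add = (list: RankedItem[], source: "keyword" | "semantic") => {
    list.forEach((item, index) => {
      const rank = index + 1; // 1-based rank within this list
      const entry =
        fused.get(item.id) ?? { id: item.id, rrfScore: 0, inKeyword: false, inSemantic: false };
      entry.rrfScore += 1 / (k + rank);
      if (source === "keyword") entry.inKeyword = true;
      else entry.inSemantic = true;
      fused.set(item.id, entry);
    });
  };
  add(keyword, "keyword");
  add(semantic, "semantic");
  // Highest fused score first.
  return [...fused.values()].sort((a, b) => b.rrfScore - a.rrfScore);
}
```

With k = 60, a result ranked 1st in one list and absent from the other scores 1/61 ≈ 0.0164, while one ranked 2nd in both scores 2/62 ≈ 0.0323, so agreement between the two retrievers outweighs a single top rank.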

After searching, badges above the panels show the query classification type and the count of results in each category.

Indexing & Health Tab

This tab monitors the state of the search index and shows the current vector storage configuration.

KPI Cards

Four summary cards:

Card | Metric | Source
Total Chunks | Count of indexed document chunks | hybridSearch.stats
Source Tables | Number of distinct content types indexed | hybridSearch.stats
AI Calls (All) | Total AI API calls across the system | aiLogs.summary
Total AI Cost | Cumulative AI spending (includes search embeddings) | aiLogs.summary

The stats procedure queries document_chunk grouped by source_table.
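The first two KPI cards can be derived from those grouped rows. A sketch, assuming `stats` returns one `{ sourceTable, count }` row per table (hypothetical field names), equivalent to `SELECT source_table, COUNT(*) FROM document_chunk GROUP BY source_table`:

```typescript
// Hypothetical row shape for hybridSearch.stats output.
interface ChunkCountRow {
  sourceTable: string;
  count: number;
}

interface IndexKpis {
  totalChunks: number;
  sourceTables: number;
}

function computeIndexKpis(rows: ChunkCountRow[]): IndexKpis {
  return {
    // Total Chunks: sum of the per-table counts.
    totalChunks: rows.reduce((sum, row) => sum + row.count, 0),
    // Source Tables: number of distinct source_table values that have chunks.
    sourceTables: new Set(rows.map((row) => row.sourceTable)).size,
  };
}
```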

Chunks by Content Type

A table showing the distribution of indexed chunks across content types:

Column | Content
Content Type | Human-readable label (Research Artifacts, Research Outputs, Extraction Results)
Chunks | Count of chunks for that type
% of Total | Percentage of all indexed chunks
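A sketch of how each table row could be built from the grouped counts. The mapping from source table values (`artifact`, `output`, `extraction`, as shown on the result badges) to the labels above is an assumption, as are the row shapes and the one-decimal rounding.

```typescript
interface ChunkCountRow {
  sourceTable: string;
  count: number;
}

interface ContentTypeRow {
  label: string;
  chunks: number;
  percentOfTotal: number; // Rounded to one decimal place.
}

// Assumed mapping from source_table values to display labels.
const CONTENT_TYPE_LABELS: Record<string, string> = {
  artifact: "Research Artifacts",
  output: "Research Outputs",
  extraction: "Extraction Results",
};

function buildContentTypeRows(rows: ChunkCountRow[]): ContentTypeRow[] {
  const total = rows.reduce((sum, row) => sum + row.count, 0);
  return rows.map((row) => ({
    // Fall back to the raw table name for unmapped types.
    label: CONTENT_TYPE_LABELS[row.sourceTable] ?? row.sourceTable,
    chunks: row.count,
    // Guard against division by zero on an empty index.
    percentOfTotal: total === 0 ? 0 : Math.round((row.count / total) * 1000) / 10,
  }));
}
```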

Vector Storage Configuration

A reference card showing the current vector storage setup:

Setting | Value
Embedding Model | Gemini Embedding 2
Dimensions | 1536 (halfvec)
Index Type | HNSW (inner product)
Estimated Memory | Calculated from total chunks (chunks × 1536 × 2 bytes)
Contextual Retrieval | Claude Haiku 4.5
Fusion Algorithm | RRF (k=60)
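The Estimated Memory value is plain arithmetic: halfvec stores each dimension as a 2-byte float, so every chunk costs 3,072 bytes of vector data. A sketch of the formula; it covers raw vector data only, since the HNSW graph and per-row overhead add to the real footprint. The MiB formatting is an illustrative choice, not necessarily what the card displays.

```typescript
const DIMENSIONS = 1536;
const BYTES_PER_HALF_FLOAT = 2;

// chunks x 1536 x 2 bytes, as described for the Estimated Memory setting.
function estimateVectorBytes(totalChunks: number): number {
  return totalChunks * DIMENSIONS * BYTES_PER_HALF_FLOAT;
}

// Illustrative formatting in mebibytes (1 MiB = 1,048,576 bytes).
function formatMiB(bytes: number): string {
  return `${(bytes / 1024 / 1024).toFixed(1)} MiB`;
}
```

For example, 100,000 chunks come to 307,200,000 bytes, which `formatMiB` renders as "293.0 MiB".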

Backend: hybridSearch Router

The router at packages/api/src/routers/hybrid-search.ts provides four procedures:

The first procedure is the production hybrid search endpoint used by application features. It accepts filters for sourceTable, artifactType, mediaType, planId, userId, and date range, and requires read ability on ResearchArtifact.

debugSearch

The admin-only endpoint used by the Search Debugger. Returns keyword, semantic, and fused results as separate arrays. Requires read ability on Organization.

typeAhead

Prefix search with typo tolerance for autocomplete use cases. Delegates to typeAheadSearch from @repo/search.
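Typo-tolerant prefix matching is handled by `typeAheadSearch` (backed by Typesense), so the code below is not that implementation. It only illustrates the idea: a candidate matches if some prefix of it is within `maxTypos` edits (Levenshtein distance) of what the user has typed so far. `prefixEditDistance` and `matchesPrefixWithTypos` are hypothetical helpers.

```typescript
// Smallest edit distance between `query` and any prefix of `candidate`.
function prefixEditDistance(query: string, candidate: string): number {
  const q = query.toLowerCase();
  const c = candidate.toLowerCase();
  // prev[j] = edit distance between q[0..i) and c[0..j); start with i = 0.
  let prev = Array.from({ length: c.length + 1 }, (_, j) => j);
  for (let i = 1; i <= q.length; i++) {
    const curr = new Array<number>(c.length + 1);
    curr[0] = i;
    for (let j = 1; j <= c.length; j++) {
      const substitution = prev[j - 1] + (q[i - 1] === c[j - 1] ? 0 : 1);
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, substitution);
    }
    prev = curr;
  }
  // The last row holds distances to every prefix of the candidate; take the best.
  return Math.min(...prev);
}

function matchesPrefixWithTypos(query: string, candidate: string, maxTypos = 1): boolean {
  return prefixEditDistance(query, candidate) <= maxTypos;
}
```

Typing "reserch" still matches "Research Artifacts" with one allowed typo, because inserting a single "a" turns it into the prefix "research".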

stats

Returns chunk counts grouped by source table. Requires read ability on Organization.

Semantic Search Implementation

The semantic search runs a raw SQL query against document_chunk using the pgvector <#> inner product distance operator:

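-- $vectorStr: the query embedding as a pgvector text literal, e.g. '[0.012,-0.034,...]'
-- $limit: maximum number of chunks to return
SELECT id, source_table, source_id, embedded_text,
       (embedding <#> $vectorStr::halfvec(1536)) * -1 AS similarity
FROM document_chunk
ORDER BY embedding <#> $vectorStr::halfvec(1536)
LIMIT $limit;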

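pgvector's <#> operator returns the negative inner product, so multiplying it by -1 yields the inner product as a similarity score (higher is better). Ordering ascending by <#> puts the most similar chunks first and matches the HNSW inner-product index. Optional source_table filtering is applied when specified.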
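A minimal sketch of assembling this query with the optional `source_table` filter, using node-postgres-style positional parameters (`$1`, `$2`, ...). The `buildSemanticQuery` and `toVectorLiteral` helpers and their option names are hypothetical; the router's actual query construction may differ (for example, it may use an ORM's SQL template).

```typescript
interface SemanticQueryOptions {
  embedding: number[];
  limit: number;
  sourceTable?: string; // Optional source_table filter.
}

interface ParameterizedQuery {
  text: string;
  values: Array<string | number>;
}

// pgvector accepts vectors as text literals such as '[0.1,0.2,0.3]'.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

function buildSemanticQuery(opts: SemanticQueryOptions): ParameterizedQuery {
  if (opts.embedding.length !== 1536) {
    throw new Error(`expected 1536 dimensions, got ${opts.embedding.length}`);
  }
  // $1 is always the query vector.
  const values: Array<string | number> = [toVectorLiteral(opts.embedding)];
  let where = "";
  if (opts.sourceTable !== undefined) {
    values.push(opts.sourceTable);
    where = `WHERE source_table = $${values.length}`;
  }
  values.push(opts.limit);
  const limitParam = `$${values.length}`;
  const text = [
    "SELECT id, source_table, source_id, embedded_text,",
    "       (embedding <#> $1::halfvec(1536)) * -1 AS similarity",
    "FROM document_chunk",
    where,
    "ORDER BY embedding <#> $1::halfvec(1536)",
    `LIMIT ${limitParam}`,
  ]
    .filter((line) => line !== "") // Drop the WHERE line when no filter is set.
    .join("\n");
  return { text, values };
}
```

The returned `{ text, values }` pair is the shape that `pool.query(text, values)` from the `pg` package accepts. Passing the vector and filter as parameters, rather than interpolating them into the SQL string, avoids injection.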

Component Structure

search-admin-content.tsx
  |-- Tabs
      |-- Search Debugger tab
      |   |-- search-debugger.tsx
      |       |-- Query input form
      |       |-- Three-column ResultPanel layout
      |           |-- Keyword (BM25) results table
      |           |-- Semantic (pgvector) results table
      |           |-- Fused (RRF) results table with source indicators
      |
      |-- Indexing & Health tab
          |-- search-indexing-dashboard.tsx
              |-- IndexingKPICards (4 cards)
              |-- Chunks by Content Type table
              |-- Vector Storage configuration card
