Hybrid Search View

Admin view for debugging hybrid search queries and monitoring search index health.

The Hybrid Search view (/admin/search) provides two tabs: a Search Debugger for testing queries and comparing ranking algorithms, and an Indexing & Health dashboard for monitoring the vector index.

Search Debugger Tab

The debugger lets you submit a search query and see the results from each stage of the hybrid search pipeline side by side.

How It Works

When you type a query and click Search, the hybridSearch.debugSearch procedure:

Classifies the query type (via classifyQuery from @repo/search)
Generates a query embedding (via ctx.ai.embedQuery)
Runs keyword search (Typesense BM25) and semantic search (pgvector) in parallel
Fuses results using Reciprocal Rank Fusion (RRF, k=60)
Returns all three result sets separately
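
The steps above can be sketched as follows. This is a minimal illustration, not the router's actual code: the DebugSearchDeps interface, its function signatures, and the Hit shape are assumptions made for the example. Only the order of the steps and the parallel keyword/semantic stage come from the description above.

```typescript
// Hypothetical shapes for illustration; the real types live in the router and @repo/search.
interface Hit {
  id: string;
  score: number;
}

// Each pipeline stage is injected so the sketch stays self-contained.
interface DebugSearchDeps {
  classifyQuery: (query: string) => string;
  embedQuery: (query: string) => Promise<number[]>;
  keywordSearch: (query: string) => Promise<Hit[]>;
  semanticSearch: (embedding: number[]) => Promise<Hit[]>;
  fuse: (keyword: Hit[], semantic: Hit[]) => Hit[];
}

async function debugSearchSketch(query: string, deps: DebugSearchDeps) {
  // 1. Classify the query type.
  const queryType = deps.classifyQuery(query);
  // 2. Generate the query embedding (used by the semantic stage).
  const embedding = await deps.embedQuery(query);
  // 3. Run keyword and semantic search in parallel.
  const [keyword, semantic] = await Promise.all([
    deps.keywordSearch(query),
    deps.semanticSearch(embedding),
  ]);
  // 4. Fuse the two ranked lists.
  const fused = deps.fuse(keyword, semantic);
  // 5. Return all three result sets separately.
  return { queryType, keyword, semantic, fused };
}
```

Returning the three arrays separately, rather than only the fused list, is what allows the debugger to render the panels side by side.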

Result Panels

The debugger displays three side-by-side panels:

Keyword (BM25)

Results from Typesense full-text search. Each result shows:

Rank position
Source table badge (e.g., artifact, output, extraction)
Text snippet (first 120 characters)
BM25 score

Semantic (pgvector)

Results from vector similarity search, comparing the query embedding against document_chunk.embedding (halfvec, 1536 dimensions, HNSW inner-product index). Each result shows:

Rank position
Source table badge
Embedded text snippet (first 200 characters)
Similarity score (the inner product, which equals cosine similarity only when the embeddings are unit-normalized)

Fused (RRF)

The merged result set after Reciprocal Rank Fusion. Each result shows:

Rank position
Source table badge
Text snippet
RRF score
Source indicators: KW badge if the result appeared in keyword results, SEM badge if it appeared in semantic results
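
As a rough sketch of how a fused list with KW/SEM indicators could be produced, the function below applies the standard RRF formula, score = sum of 1 / (k + rank) over every list a result appears in, with k = 60. The RankedHit and FusedHit shapes and the function name are hypothetical, and the use of 1-based ranks is an assumption; the actual fusion code may differ.

```typescript
// Hypothetical result shape; only the id and the list order matter for RRF.
interface RankedHit {
  id: string;
}

interface FusedHit {
  id: string;
  rrfScore: number;
  inKeyword: boolean; // would drive the KW badge
  inSemantic: boolean; // would drive the SEM badge
}

// Assumes ids are unique within each input list.
function fuseWithRrf(keyword: RankedHit[], semantic: RankedHit[], k = 60): FusedHit[] {
  const fused = new Map<string, FusedHit>();

  const accumulate = (hits: RankedHit[], source: "inKeyword" | "inSemantic") => {
    hits.forEach((hit, index) => {
      const entry: FusedHit =
        fused.get(hit.id) ?? { id: hit.id, rrfScore: 0, inKeyword: false, inSemantic: false };
      entry.rrfScore += 1 / (k + index + 1); // index is 0-based, rank is 1-based
      entry[source] = true;
      fused.set(hit.id, entry);
    });
  };

  accumulate(keyword, "inKeyword");
  accumulate(semantic, "inSemantic");

  // Highest fused score first.
  return Array.from(fused.values()).sort((a, b) => b.rrfScore - a.rrfScore);
}
```

For example, with keyword results [a, b] and semantic results [b, c], b ranks first with 1/62 + 1/61, followed by a with 1/61 and c with 1/62. Only b would carry both badges.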

After a search, badges above the panels show the query classification type and the number of results in each panel.

Indexing & Health Tab

This tab monitors the state of the search index and provides configuration visibility.

KPI Cards

Four summary cards:

Card	Metric	Source
Total Chunks	Count of indexed document chunks	`hybridSearch.stats`
Source Tables	Number of distinct content types indexed	`hybridSearch.stats`
AI Calls (All)	Total AI API calls across the system	`aiLogs.summary`
Total AI Cost	Cumulative AI spending (includes search embeddings)	`aiLogs.summary`

The stats procedure queries document_chunk grouped by source_table.

Chunks by Content Type

A table showing the distribution of indexed chunks across content types:

Column	Content
Content Type	Human-readable label (Research Artifacts, Research Outputs, Extraction Results)
Chunks	Count of chunks for that type
% of Total	Percentage of all indexed chunks
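
The rows of this table can be derived from the grouped counts roughly as shown below. This is a sketch under assumptions: the ChunkCount shape, the function name, and the mapping from source_table values to labels (inferred from the badge examples artifact, output, and extraction) are hypothetical.

```typescript
// Hypothetical row shape for the grouped counts returned by the stats procedure.
interface ChunkCount {
  sourceTable: string;
  count: number;
}

// Labels from the table above; the source_table keys are assumed, not confirmed.
const CONTENT_TYPE_LABELS: Record<string, string> = {
  artifact: "Research Artifacts",
  output: "Research Outputs",
  extraction: "Extraction Results",
};

function toDistributionRows(counts: ChunkCount[]) {
  const total = counts.reduce((sum, row) => sum + row.count, 0);
  return counts.map((row) => ({
    // Fall back to the raw source_table value when no label is known.
    contentType: CONTENT_TYPE_LABELS[row.sourceTable] ?? row.sourceTable,
    chunks: row.count,
    // Guard against an empty index to avoid dividing by zero.
    percentOfTotal: total === 0 ? 0 : (row.count / total) * 100,
  }));
}
```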

Vector Storage Configuration

A reference card showing the current vector storage setup:

Setting	Value
Embedding Model	Gemini Embedding 2
Dimensions	1536 (halfvec)
Index Type	HNSW (inner product)
Estimated Memory	Calculated from total chunks (chunks x 1536 x 2 bytes)
Contextual Retrieval	Claude Haiku 4.5
Fusion Algorithm	RRF (k=60)
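
The Estimated Memory formula can be worked through as below. The function names and the MiB formatting are illustrative assumptions; only the chunks x 1536 x 2 bytes formula comes from the card.

```typescript
const DIMENSIONS = 1536;
const BYTES_PER_HALFVEC_DIMENSION = 2; // halfvec stores each dimension as a 16-bit float

// Raw vector data only, exactly as in the formula on the card.
function estimateVectorBytes(totalChunks: number): number {
  return totalChunks * DIMENSIONS * BYTES_PER_HALFVEC_DIMENSION;
}

// Hypothetical display helper; the dashboard's actual unit formatting is not documented here.
function formatMiB(bytes: number): string {
  return `${(bytes / (1024 * 1024)).toFixed(1)} MiB`;
}

// 100,000 chunks -> 307,200,000 bytes -> "293.0 MiB"
console.log(formatMiB(estimateVectorBytes(100_000)));
```

Note that the formula counts the vector values only. Per-row overhead and the HNSW graph itself are not part of this estimate.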

Backend: hybridSearch Router

The router at packages/api/src/routers/hybrid-search.ts provides four procedures:

`search`

The production hybrid search endpoint used by application features. Accepts filters for sourceTable, artifactType, mediaType, planId, userId, and date range. Requires read ability on ResearchArtifact.

`debugSearch`

The admin-only endpoint used by the Search Debugger. Returns keyword, semantic, and fused results as separate arrays. Requires read ability on Organization.

`typeAhead`

Prefix search with typo tolerance for autocomplete use cases. Delegates to typeAheadSearch from @repo/search.

`stats`

Returns chunk counts grouped by source table. Requires read ability on Organization.

Semantic Search Implementation

The semantic search runs a raw SQL query against document_chunk using the pgvector <#> inner product distance operator:

SELECT id, source_table, source_id, embedded_text,
       (embedding <#> $vectorStr::halfvec(1536)) * -1 AS similarity
FROM document_chunk
ORDER BY embedding <#> $vectorStr::halfvec(1536)
LIMIT $limit

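pgvector's <#> operator returns the negative inner product, so that an ascending ORDER BY puts the closest vectors first. The query multiplies that value by -1 to recover a similarity score where higher is better. An optional source_table filter is applied when specified.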
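
A parameterized variant of this query, including the optional source_table filter, might be assembled as in the sketch below. The helper names and the $1-style positional placeholders (as used by node-postgres) are assumptions for illustration; the router may build and execute its SQL differently.

```typescript
// Serialize an embedding to pgvector's text format, e.g. "[0.5,-1]".
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

// Hypothetical helper: returns SQL text plus positional values.
function buildSemanticQuery(embedding: number[], limit: number, sourceTable?: string) {
  const values: (string | number)[] = [toVectorLiteral(embedding), limit];
  let where = "";
  if (sourceTable !== undefined) {
    values.push(sourceTable);
    where = `WHERE source_table = $${values.length}`; // becomes $3
  }
  const text = [
    "SELECT id, source_table, source_id, embedded_text,",
    "       (embedding <#> $1::halfvec(1536)) * -1 AS similarity",
    "FROM document_chunk",
    where,
    "ORDER BY embedding <#> $1::halfvec(1536)",
    "LIMIT $2",
  ]
    .filter((line) => line !== "") // drop the WHERE line when no filter is set
    .join("\n");
  return { text, values };
}
```

Passing the vector and the filter as bound values rather than interpolating them into the SQL string keeps user-supplied input out of the query text.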

Component Structure

search-admin-content.tsx
  |-- Tabs
      |-- Search Debugger tab
      |   |-- search-debugger.tsx
      |       |-- Query input form
      |       |-- Three-column ResultPanel layout
      |           |-- Keyword (BM25) results table
      |           |-- Semantic (pgvector) results table
      |           |-- Fused (RRF) results table with source indicators
      |
      |-- Indexing & Health tab
          |-- search-indexing-dashboard.tsx
              |-- IndexingKPICards (4 cards)
              |-- Chunks by Content Type table
              |-- Vector Storage configuration card

Related

Admin Dashboard Overview -- all admin views and shared patterns
Drizzle Patterns -- query patterns used in the search router
Authorization -- CASL checks on Organization and ResearchArtifact subjects