Hybrid Search View
Admin view for debugging hybrid search queries and monitoring search index health.
The Hybrid Search view (/admin/search) provides two tabs: a Search Debugger for testing queries and comparing ranking algorithms, and an Indexing & Health dashboard for monitoring the vector index.
Search Debugger Tab
The debugger lets you submit a search query and see the results from each stage of the hybrid search pipeline side by side.
How It Works
When you type a query and click Search, the hybridSearch.debugSearch procedure:
- Classifies the query type (via classifyQuery from @repo/search)
- Generates a query embedding (via ctx.ai.embedQuery)
- Runs keyword search (Typesense BM25) and semantic search (pgvector) in parallel
- Fuses results using Reciprocal Rank Fusion (RRF, k=60)
- Returns all three result sets separately
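The steps above can be sketched as a small orchestration function. This is a minimal illustration, not the router's actual code: the DebugDeps interface, the Hit shape, and debugSearchSketch are hypothetical names, and the real classifyQuery and ctx.ai.embedQuery signatures may differ.

```typescript
// Hypothetical shapes for illustration; the real types in @repo/search may differ.
interface Hit {
  id: string;
  score: number;
}

// Dependencies are injected so the sketch stays self-contained.
interface DebugDeps {
  classifyQuery: (query: string) => string;
  embedQuery: (query: string) => Promise<number[]>;
  keywordSearch: (query: string) => Promise<Hit[]>;
  semanticSearch: (embedding: number[]) => Promise<Hit[]>;
  fuse: (keyword: Hit[], semantic: Hit[]) => Hit[];
}

async function debugSearchSketch(query: string, deps: DebugDeps) {
  // 1. Classify the query type.
  const queryType = deps.classifyQuery(query);

  // 2. Generate the query embedding.
  const embedding = await deps.embedQuery(query);

  // 3. Run keyword and semantic search in parallel.
  const [keyword, semantic] = await Promise.all([
    deps.keywordSearch(query),
    deps.semanticSearch(embedding),
  ]);

  // 4. Fuse the two ranked lists (RRF in the real pipeline).
  const fused = deps.fuse(keyword, semantic);

  // 5. Return all three result sets separately.
  return { queryType, keyword, semantic, fused };
}
```

Returning the three arrays side by side, rather than only the fused list, is what lets the debugger render each stage in its own panel.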
Result Panels
The debugger displays three side-by-side panels:
Keyword (BM25)
Results from Typesense full-text search. Each result shows:
- Rank position
- Source table badge (e.g., artifact, output, extraction)
- Text snippet (first 120 characters)
- BM25 score
Semantic (pgvector)
Results from vector similarity search, which compares the query embedding against document_chunk.embedding (halfvec, 1536 dimensions, HNSW inner product index). Each result shows:
- Rank position
- Source table badge
- Embedded text snippet (first 200 characters)
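- Cosine similarity score (computed as the negated inner product distance, which matches cosine similarity only when the embeddings are unit-normalized)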
Fused (RRF)
The merged result set after Reciprocal Rank Fusion. Each result shows:
- Rank position
- Source table badge
- Text snippet
- RRF score
- Source indicators: KW badge if the result appeared in keyword results, SEM badge if it appeared in semantic results
After searching, badges above the panels show the query classification type and the count of results in each category.
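For reference, RRF scores each document as the sum of 1 / (k + rank) over every list it appears in. The sketch below shows that calculation with k=60 and also records which list each result came from, mirroring the KW and SEM badges. The function and type names are hypothetical, and it assumes each input list contains unique IDs ordered best-first; the fusion code in the actual pipeline may be organized differently.

```typescript
type Source = "KW" | "SEM";

interface FusedResult {
  id: string;
  rrfScore: number;
  sources: Source[];
}

// Fuse two ranked ID lists with Reciprocal Rank Fusion.
// Assumes each list is ordered best-first and has no duplicate IDs.
function reciprocalRankFusion(
  keywordIds: string[],
  semanticIds: string[],
  k = 60,
): FusedResult[] {
  const byId = new Map<string, FusedResult>();

  const addList = (ids: string[], source: Source): void => {
    ids.forEach((id, index) => {
      const entry = byId.get(id) ?? { id, rrfScore: 0, sources: [] };
      // Ranks are 1-based, so the top result contributes 1 / (k + 1).
      entry.rrfScore += 1 / (k + index + 1);
      entry.sources.push(source);
      byId.set(id, entry);
    });
  };

  addList(keywordIds, "KW");
  addList(semanticIds, "SEM");

  // Highest fused score first.
  return Array.from(byId.values()).sort((a, b) => b.rrfScore - a.rrfScore);
}
```

A document that appears in both lists accumulates two contributions, so it tends to outrank documents found by only one retriever. That is why results carrying both badges usually sit near the top of the fused panel.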
Indexing & Health Tab
This tab monitors the state of the search index and provides configuration visibility.
KPI Cards
Four summary cards:
| Card | Metric | Source |
|---|---|---|
| Total Chunks | Count of indexed document chunks | hybridSearch.stats |
| Source Tables | Number of distinct content types indexed | hybridSearch.stats |
| AI Calls (All) | Total AI API calls across the system | aiLogs.summary |
| Total AI Cost | Cumulative AI spending (includes search embeddings) | aiLogs.summary |
The stats procedure queries document_chunk grouped by source_table.
Chunks by Content Type
A table showing the distribution of indexed chunks across content types:
| Column | Content |
|---|---|
| Content Type | Human-readable label (Research Artifacts, Research Outputs, Extraction Results) |
| Chunks | Count of chunks for that type |
| % of Total | Percentage of all indexed chunks |
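The rows in this table can be derived from the per-table counts with a small transformation. The sketch below is illustrative only: the ChunkCount shape, the toDistribution name, and the mapping from source table names to labels are assumptions inferred from the examples on this page, not the dashboard's actual code.

```typescript
interface ChunkCount {
  sourceTable: string;
  count: number;
}

interface DistributionRow {
  contentType: string;
  chunks: number;
  percentOfTotal: number;
}

// Assumed mapping from source_table values to human-readable labels.
const CONTENT_TYPE_LABELS: Record<string, string> = {
  artifact: "Research Artifacts",
  output: "Research Outputs",
  extraction: "Extraction Results",
};

function toDistribution(counts: ChunkCount[]): DistributionRow[] {
  const total = counts.reduce((sum, c) => sum + c.count, 0);
  return counts.map((c) => ({
    // Fall back to the raw table name when no label is known.
    contentType: CONTENT_TYPE_LABELS[c.sourceTable] ?? c.sourceTable,
    chunks: c.count,
    // Guard against division by zero on an empty index.
    percentOfTotal: total === 0 ? 0 : (c.count / total) * 100,
  }));
}
```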
Vector Storage Configuration
A reference card showing the current vector storage setup:
| Setting | Value |
|---|---|
| Embedding Model | Gemini Embedding 2 |
| Dimensions | 1536 (halfvec) |
| Index Type | HNSW (inner product) |
| Estimated Memory | Calculated from total chunks (chunks x 1536 x 2 bytes) |
| Contextual Retrieval | Claude Haiku 4.5 |
| Fusion Algorithm | RRF (k=60) |
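The Estimated Memory formula is simple enough to express directly. The sketch below applies chunks x 1536 x 2 bytes as stated in the table; the function and constant names are hypothetical. Note that this covers the raw halfvec data only, so the real footprint of an HNSW index will be larger.

```typescript
const EMBEDDING_DIMENSIONS = 1536;
// halfvec stores each component as a 16-bit float.
const BYTES_PER_COMPONENT = 2;

// Estimated bytes of raw vector data for a given number of chunks.
function estimateVectorBytes(totalChunks: number): number {
  return totalChunks * EMBEDDING_DIMENSIONS * BYTES_PER_COMPONENT;
}

// Convenience conversion for display.
function bytesToMiB(bytes: number): number {
  return bytes / (1024 * 1024);
}
```

For example, 100,000 chunks works out to 307,200,000 bytes of vector data, or roughly 293 MiB.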
Backend: hybridSearch Router
The router at packages/api/src/routers/hybrid-search.ts provides four procedures:
search
The production hybrid search endpoint used by application features. Accepts filters for sourceTable, artifactType, mediaType, planId, userId, and date range. Requires read ability on ResearchArtifact.
debugSearch
The admin-only endpoint used by the Search Debugger. Returns keyword, semantic, and fused results as separate arrays. Requires read ability on Organization.
typeAhead
Prefix search with typo tolerance for autocomplete use cases. Delegates to typeAheadSearch from @repo/search.
stats
Returns chunk counts grouped by source table. Requires read ability on Organization.
Semantic Search Implementation
The semantic search runs a raw SQL query against document_chunk using the pgvector <#> inner product distance operator:
SELECT id, source_table, source_id, embedded_text,
(embedding <#> $vectorStr::halfvec(1536)) * -1 AS similarity
FROM document_chunk
ORDER BY embedding <#> $vectorStr::halfvec(1536)
LIMIT $limit
The inner product distance is negated to convert to a similarity score (higher is better). Optional source_table filtering is applied when specified.
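One way to assemble this query with the optional source_table filter is to build the SQL text and a positional parameter list together. The sketch below is an assumption about how that could look, not the router's actual code: buildSemanticQuery and SemanticQuery are hypothetical names, and it uses numbered placeholders ($1, $2, ...) in place of the $vectorStr and $limit shorthand shown above. The embedding is serialized as a bracketed, comma-separated literal, which is the text format pgvector accepts for vector types.

```typescript
interface SemanticQuery {
  text: string;
  values: (string | number)[];
}

function buildSemanticQuery(
  embedding: number[],
  limit: number,
  sourceTable?: string,
): SemanticQuery {
  // $1 is the query vector in pgvector's text format, e.g. "[0.1,0.2]".
  const values: (string | number)[] = [`[${embedding.join(",")}]`];

  // Add the filter only when a source table was requested.
  let where = "";
  if (sourceTable !== undefined) {
    values.push(sourceTable);
    where = `WHERE source_table = $${values.length}`;
  }

  values.push(limit);
  const text = [
    "SELECT id, source_table, source_id, embedded_text,",
    "       (embedding <#> $1::halfvec(1536)) * -1 AS similarity",
    "FROM document_chunk",
    where,
    // Ascending order on <#> puts the largest inner product first.
    "ORDER BY embedding <#> $1::halfvec(1536)",
    `LIMIT $${values.length}`,
  ]
    .filter((line) => line !== "")
    .join("\n");

  return { text, values };
}
```

Keeping the ORDER BY on the bare `embedding <#> ...` expression, rather than on the negated similarity alias, matches the query above and is the form an HNSW inner product index can serve.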
Component Structure
search-admin-content.tsx
|-- Tabs
    |-- Search Debugger tab
    |   |-- search-debugger.tsx
    |       |-- Query input form
    |       |-- Three-column ResultPanel layout
    |           |-- Keyword (BM25) results table
    |           |-- Semantic (pgvector) results table
    |           |-- Fused (RRF) results table with source indicators
    |
    |-- Indexing & Health tab
        |-- search-indexing-dashboard.tsx
            |-- IndexingKPICards (4 cards)
            |-- Chunks by Content Type table
            |-- Vector Storage configuration card
Related Pages
- Admin Dashboard Overview -- all admin views and shared patterns
- Drizzle Patterns -- query patterns used in the search router
- Authorization -- CASL checks on Organization and ResearchArtifact subjects