Semantic Search (pgvector)
Vector similarity search using pgvector halfvec embeddings, the HNSW index, and tenant-scoped SQL queries.
Semantic search finds documents by meaning rather than exact keyword matches. The user's query is embedded into a 1536-dimensional vector using Gemini, then compared against stored document embeddings in pgvector using inner product distance.
Query-Time Embedding
Before the vector search can run, the query text must be converted to an embedding. This happens via ctx.ai.embedQuery() in the tRPC handler:
const { embedding } = await ctx.ai.embedQuery(input.query, "hybrid-search-query");
Under the hood, this calls embedQuery() in packages/ai/src/embedding.ts, which:
- Sends the query to Gemini Embedding 2 (
gemini-embedding-2-preview) - Uses
RETRIEVAL_QUERYtask type (asymmetric retrieval -- optimized for short queries against long documents) - Returns a 1536-dimensional vector (MRL truncation from the model's native 3072 dimensions)
- Records the call to
ai_usagewith feature tag"hybrid-search-query"for cost tracking
The RETRIEVAL_QUERY task type is distinct from the RETRIEVAL_DOCUMENT task type used during indexing. Gemini applies different internal transformations for each, which improves recall in asymmetric retrieval scenarios.
The SQL Query
The semantic search function in packages/api/src/routers/hybrid-search.ts runs raw SQL against the tenant-scoped database transaction:
SELECT id, source_table, source_id, embedded_text,
(embedding <#> $vectorStr::halfvec(1536)) * -1 AS similarity
FROM document_chunk
WHERE source_table = $sourceTable -- optional filter
ORDER BY embedding <#> $vectorStr::halfvec(1536)
LIMIT $limit
Operator: <#>
The <#> operator computes negative inner product distance. For normalized vectors (which Gemini embeddings are), negative inner product is equivalent to cosine distance. The result is negated (* -1) to produce a similarity score where higher values indicate greater similarity.
Why Not <=> (Cosine Distance)?
The HNSW index on document_chunk uses halfvec_ip_ops (inner product operations). Using the <#> operator matches the index, so PostgreSQL uses the HNSW index for the ORDER BY. If you used <=> (cosine distance operator), the query would fall back to a sequential scan because the index was built for a different operator class.
The Cast: ::halfvec(1536)
The query embedding is passed as a string [0.1,0.2,...] and cast to halfvec(1536) at query time. This matches the column type exactly. Using halfvec (half-precision, 2 bytes per dimension) instead of vector (full precision, 4 bytes) halves memory usage with negligible recall loss on normalized embeddings.
HNSW Index
The vector index is defined in migration 0009_add-document-chunk-table.sql:
CREATE INDEX IF NOT EXISTS "document_chunk_embedding_idx"
ON "document_chunk"
USING hnsw ("embedding" halfvec_ip_ops);
HNSW (Hierarchical Navigable Small World) is an approximate nearest neighbor index. It trades a small amount of recall accuracy for dramatically faster search compared to exact nearest neighbor (sequential scan).
| Parameter | Value | Notes |
|---|---|---|
| Index type | HNSW | Approximate nearest neighbor |
| Operator class | halfvec_ip_ops | Inner product distance on halfvec columns |
m | 16 (default) | Max connections per node -- higher = better recall, more memory |
ef_construction | 64 (default) | Build-time search width -- higher = better index quality, slower build |
These defaults are adequate for the current scale (< 200k vectors). If recall degrades as the index grows, increasing ef_construction or m requires a full index rebuild.
Tenant Isolation
The semantic search runs through ctx.db, which is the tenant-scoped transaction from withTenantContext. PostgreSQL RLS policies on document_chunk automatically filter results to the current organization:
CREATE POLICY "document_chunk_select" ON "document_chunk"
AS PERMISSIVE FOR SELECT TO "authenticated"
USING ("organization_id" = (SELECT current_setting('app.tenant_id')));
No application-level WHERE organization_id = ... clause is needed. The RLS policy is invisible to the query but ensures cross-tenant data never leaks.
Result Mapping
The raw SQL results are mapped to RankedResult objects for the fusion step:
const semRanked: RankedResult[] = semanticResults.map((r, idx) => ({
id: r.id,
sourceTable: r.sourceTable,
sourceId: r.sourceId,
title: "", // semantic results don't carry title from the SQL query
rank: idx + 1,
score: r.similarity,
textSnippet: r.embeddedText,
}));
The title is empty for semantic results because the raw SQL query selects only embedded_text, not title. After fusion, the fused result inherits the title from the keyword result if the same document appeared in both sets.
The embeddedText snippet is truncated to 200 characters in the semanticSearch function:
embeddedText: String(row["embedded_text"]).slice(0, 200),
The document_chunk Schema
The pgvector side of the dual-write stores chunks in the document_chunk table, defined in packages/db/src/schema/search.ts:
| Column | Type | Purpose |
|---|---|---|
id | text PK | Unique chunk identifier |
organization_id | text | Tenant scoping (RLS filtered) |
source_table | chunk_source_table enum | Which source table: artifact, output, extraction |
source_id | text | ID of the source record |
chunk_index | integer | Zero-based position within the source document |
original_text | text | Raw text before contextual prefix |
context_prefix | text | LLM-generated context (may be null for single-chunk docs) |
embedded_text | text | Full text that was embedded (prefix + original) |
embedding | halfvec(1536) | The vector embedding |
token_count | integer | Approximate token count |
correlation_id | text | Links to ai_usage for cost tracing |
Related Pages
- Query Pipeline Overview -- how semantic search fits into the full pipeline
- Drizzle Patterns -- Vector Search -- the raw SQL pattern used here
- Indexing -- Embedding -- how document embeddings are generated (write path counterpart)
- Tenant Isolation -- Enforcement -- RLS policy mechanics