Trovella Wiki

Semantic Search (pgvector)

Vector similarity search using pgvector halfvec embeddings, the HNSW index, and tenant-scoped SQL queries.

Semantic search finds documents by meaning rather than exact keyword matches. The user's query is embedded into a 1536-dimensional vector using Gemini, then compared against stored document embeddings in pgvector using inner product distance.

Query-Time Embedding

Before the vector search can run, the query text must be converted to an embedding. This happens via ctx.ai.embedQuery() in the tRPC handler:

const { embedding } = await ctx.ai.embedQuery(input.query, "hybrid-search-query");

Under the hood, this calls embedQuery() in packages/ai/src/embedding.ts, which:

  1. Sends the query to Gemini Embedding 2 (gemini-embedding-2-preview)
  2. Uses RETRIEVAL_QUERY task type (asymmetric retrieval -- optimized for short queries against long documents)
  3. Returns a 1536-dimensional vector (MRL truncation from the model's native 3072 dimensions)
  4. Records the call to ai_usage with feature tag "hybrid-search-query" for cost tracking

The RETRIEVAL_QUERY task type is distinct from the RETRIEVAL_DOCUMENT task type used during indexing. Gemini applies different internal transformations for each, which improves recall in asymmetric retrieval scenarios.

The SQL Query

The semantic search function in packages/api/src/routers/hybrid-search.ts runs raw SQL against the tenant-scoped database transaction:

SELECT id, source_table, source_id, embedded_text,
       (embedding <#> $vectorStr::halfvec(1536)) * -1 AS similarity
FROM document_chunk
WHERE source_table = $sourceTable  -- optional filter
ORDER BY embedding <#> $vectorStr::halfvec(1536)
LIMIT $limit

Operator: <#>

The <#> operator computes negative inner product distance. For normalized vectors (which Gemini embeddings are), negative inner product is equivalent to cosine distance. The result is negated (* -1) to produce a similarity score where higher values indicate greater similarity.

Why Not <=> (Cosine Distance)?

The HNSW index on document_chunk uses halfvec_ip_ops (inner product operations). Using the <#> operator matches the index, so PostgreSQL uses the HNSW index for the ORDER BY. If you used <=> (cosine distance operator), the query would fall back to a sequential scan because the index was built for a different operator class.

The Cast: ::halfvec(1536)

The query embedding is passed as a string [0.1,0.2,...] and cast to halfvec(1536) at query time. This matches the column type exactly. Using halfvec (half-precision, 2 bytes per dimension) instead of vector (full precision, 4 bytes) halves memory usage with negligible recall loss on normalized embeddings.

HNSW Index

The vector index is defined in migration 0009_add-document-chunk-table.sql:

CREATE INDEX IF NOT EXISTS "document_chunk_embedding_idx"
  ON "document_chunk"
  USING hnsw ("embedding" halfvec_ip_ops);

HNSW (Hierarchical Navigable Small World) is an approximate nearest neighbor index. It trades a small amount of recall accuracy for dramatically faster search compared to exact nearest neighbor (sequential scan).

ParameterValueNotes
Index typeHNSWApproximate nearest neighbor
Operator classhalfvec_ip_opsInner product distance on halfvec columns
m16 (default)Max connections per node -- higher = better recall, more memory
ef_construction64 (default)Build-time search width -- higher = better index quality, slower build

These defaults are adequate for the current scale (< 200k vectors). If recall degrades as the index grows, increasing ef_construction or m requires a full index rebuild.

Tenant Isolation

The semantic search runs through ctx.db, which is the tenant-scoped transaction from withTenantContext. PostgreSQL RLS policies on document_chunk automatically filter results to the current organization:

CREATE POLICY "document_chunk_select" ON "document_chunk"
  AS PERMISSIVE FOR SELECT TO "authenticated"
  USING ("organization_id" = (SELECT current_setting('app.tenant_id')));

No application-level WHERE organization_id = ... clause is needed. The RLS policy is invisible to the query but ensures cross-tenant data never leaks.

Result Mapping

The raw SQL results are mapped to RankedResult objects for the fusion step:

const semRanked: RankedResult[] = semanticResults.map((r, idx) => ({
  id: r.id,
  sourceTable: r.sourceTable,
  sourceId: r.sourceId,
  title: "", // semantic results don't carry title from the SQL query
  rank: idx + 1,
  score: r.similarity,
  textSnippet: r.embeddedText,
}));

The title is empty for semantic results because the raw SQL query selects only embedded_text, not title. After fusion, the fused result inherits the title from the keyword result if the same document appeared in both sets.

The embeddedText snippet is truncated to 200 characters in the semanticSearch function:

embeddedText: String(row["embedded_text"]).slice(0, 200),

The document_chunk Schema

The pgvector side of the dual-write stores chunks in the document_chunk table, defined in packages/db/src/schema/search.ts:

ColumnTypePurpose
idtext PKUnique chunk identifier
organization_idtextTenant scoping (RLS filtered)
source_tablechunk_source_table enumWhich source table: artifact, output, extraction
source_idtextID of the source record
chunk_indexintegerZero-based position within the source document
original_texttextRaw text before contextual prefix
context_prefixtextLLM-generated context (may be null for single-chunk docs)
embedded_texttextFull text that was embedded (prefix + original)
embeddinghalfvec(1536)The vector embedding
token_countintegerApproximate token count
correlation_idtextLinks to ai_usage for cost tracing

On this page