Tuning Guide

Practical guidance for adjusting search relevance -- RRF parameters, chunk sizing, embedding choices, and debugging workflows.

This guide covers how to diagnose and improve search relevance in Trovella. It assumes you have read the Relevance Overview and understand the three scoring layers (BM25, cosine similarity, RRF).

Before You Tune

Relevance problems usually fall into three categories. Identify which you are dealing with before changing parameters.

1. Indexing Problems

Symptoms: relevant content does not appear in either result set (keyword or semantic). The content exists in the database but is not being found.

This is not a relevance issue -- it is an indexing issue. Check:

Was the content indexed? Look at the Indexing & Health tab in the Search Debugger for chunk counts.
Did the Inngest index-content job succeed? Check the Inngest dashboard.
Is the organization_id filter correct? Typesense filters by organization_id, and pgvector queries run through tenant-scoped RLS.

See Indexing for the write path.

2. Ranking Problems

Symptoms: relevant content appears in one or both result sets but is ranked lower than less relevant content.

This is the true relevance problem. Tuning parameters (below) can help.

3. Coverage Problems

Symptoms: relevant content appears in one result set but not the other, so its fused score is too low to rank well.

This is a signal mismatch. The content matches on one dimension (keywords or meaning) but not the other. This is expected for some queries -- RRF handles it gracefully -- but consistent one-sided results may indicate an embedding quality issue or keyword vocabulary mismatch.
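One way to check whether one-sided results are consistent rather than occasional is to measure the overlap between the two result sets over a batch of queries. The helper below is a minimal sketch; summarizeOverlap and its return shape are hypothetical and not part of the Trovella codebase. It assumes you have copied the ranked IDs from the keyword and semantic panels.

```typescript
// Hypothetical helper, not part of the Trovella codebase.
// Summarizes how much two ranked result lists overlap.
interface OverlapSummary {
  both: string[];         // IDs returned by both sources
  keywordOnly: string[];  // IDs returned only by keyword search
  semanticOnly: string[]; // IDs returned only by semantic search
  oneSidedRatio: number;  // share of unique IDs that appear in exactly one source
}

function summarizeOverlap(keywordIds: string[], semanticIds: string[]): OverlapSummary {
  const keywordSet = new Set(keywordIds);
  const semanticSet = new Set(semanticIds);

  const both = Array.from(keywordSet).filter((id) => semanticSet.has(id));
  const keywordOnly = Array.from(keywordSet).filter((id) => !semanticSet.has(id));
  const semanticOnly = Array.from(semanticSet).filter((id) => !keywordSet.has(id));

  const uniqueCount = both.length + keywordOnly.length + semanticOnly.length;
  const oneSidedRatio =
    uniqueCount === 0 ? 0 : (keywordOnly.length + semanticOnly.length) / uniqueCount;

  return { both, keywordOnly, semanticOnly, oneSidedRatio };
}

// "a" is keyword-only, "d" is semantic-only, "b" and "c" appear in both.
const summary = summarizeOverlap(["a", "b", "c"], ["b", "c", "d"]);
console.log(summary.oneSidedRatio); // 0.5 (2 of 4 unique IDs are one-sided)
```

A ratio that stays close to 1 across many queries suggests the two engines rarely agree, which is the pattern worth investigating.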

Tunable Parameters

RRF k Constant

Where: reciprocalRankFusion() in packages/search/src/fusion.ts, 4th parameter.

Current value: 60 (Cormack et al. 2009 default).

When to change: The k value rarely needs adjustment. Lower k amplifies rank differences (rank 1 vs rank 5 matters more); higher k flattens them (appearing in both sources matters more). The default of 60 is deliberately conservative.

Scenario	Direction (Range to Try)	Rationale
Top-ranked results are wrong, but the right results are in the top 10	Lower k (30-50)	Amplifies rank signal to push truly top-ranked results higher
Single-source results dominate over dual-source results	Higher k (70-100)	Strengthens the dual-source appearance boost
Results from both sources are all roughly equally relevant	Leave at 60	The default handles this case well
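The effect of k is easier to see with numbers. The sketch below assumes the standard RRF formula from Cormack et al., where each source contributes 1 / (k + rank) for a 1-based rank. rrfSketch is a hypothetical stand-in; the real reciprocalRankFusion() signature is not shown in this guide and may differ.

```typescript
// Illustrative only: a stand-in for reciprocalRankFusion(), whose real
// signature is not shown in this guide. Ranks are 1-based.
function rrfSketch(rankedLists: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      // Each source that returned the ID adds 1 / (k + rank) to its score.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// "solo" is rank 1 in one source only; "dual" is rank 5 in both sources.
const keywordRanks = ["solo", "x1", "x2", "x3", "dual"];
const semanticRanks = ["y1", "y2", "y3", "y4", "dual"];

const atK60 = rrfSketch([keywordRanks, semanticRanks], 60);
const atK1 = rrfSketch([keywordRanks, semanticRanks], 1);

// At k = 60: dual = 2/65, solo = 1/61, so the dual-source result wins.
console.log((atK60.get("dual") ?? 0) > (atK60.get("solo") ?? 0)); // true
// At an exaggerated k = 1: dual = 2/6, solo = 1/2, so rank 1 wins instead.
console.log((atK1.get("solo") ?? 0) > (atK1.get("dual") ?? 0)); // true
```

The value k = 1 is far outside the ranges in the table and is used here only to make the flip visible. Within 30-100 the effect is the same in kind but much smaller.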

How to test: The debugSearch procedure always runs with the default k and does not accept an override. To experiment, temporarily modify the call in hybridSearch.search or add a k parameter to the debug endpoint input schema.
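If you add a k parameter to the debug input, keeping it optional preserves the default behavior for existing callers. The following is a rough sketch in plain TypeScript; DebugSearchInput, DEFAULT_RRF_K, and resolveK are hypothetical names, and the real input schema (and whatever validation library it uses) is not shown in this guide.

```typescript
// Hypothetical input shape; the real debugSearch schema is not shown here.
interface DebugSearchInput {
  query: string;
  limit: number;
  k?: number; // optional RRF constant override, for experiments only
}

const DEFAULT_RRF_K = 60;

// Returns the k to pass to fusion: the override if present, else the default.
function resolveK(input: DebugSearchInput): number {
  const k = input.k ?? DEFAULT_RRF_K;
  if (!isFinite(k) || k <= 0) {
    throw new RangeError(`k must be a positive finite number, got ${k}`);
  }
  return k;
}

console.log(resolveK({ query: "billing export", limit: 10 }));        // 60
console.log(resolveK({ query: "billing export", limit: 10, k: 30 })); // 30
```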

Result Set Size (Limit Multiplier)

Where: hybridSearch.search in packages/api/src/routers/hybrid-search.ts.

Current value: input.limit * 2 for both keyword and semantic searches.

The router fetches 2x the requested results from each source before fusion. This ensures RRF has enough candidates to work with, since some results from each source will be unique.

Multiplier	Tradeoff
1x	Faster, but fusion has fewer candidates. Results that appear only in one source may be cut off before fusion.
2x (current)	Good balance. Fusion typically has 1.5-2x the requested results to work with.
3-4x	Slower (more results to fetch and fuse), but better recall for difficult queries where relevant content ranks lower.
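The tradeoff in the table can be reproduced with a small simulation. In this sketch each source is a pre-ranked array standing in for a real keyword or semantic search call, and fuse is a simplified RRF assuming the standard 1 / (k + rank) formula. hybridSketch is a hypothetical name and is not the actual hybridSearch.search code.

```typescript
// Simplified RRF: returns IDs sorted by fused score, highest first.
function fuse(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, i) => scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1)));
  }
  const entries: Array<[string, number]> = [];
  scores.forEach((score, id) => entries.push([id, score]));
  return entries.sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// Takes limit * multiplier candidates from each source, fuses, then trims.
function hybridSketch(
  keyword: string[],
  semantic: string[],
  limit: number,
  multiplier: number,
): string[] {
  const fetchSize = limit * multiplier;
  const fused = fuse([keyword.slice(0, fetchSize), semantic.slice(0, fetchSize)]);
  return fused.slice(0, limit);
}

// "c" is rank 3 in both sources, just outside a 1x fetch when limit = 2.
const keywordSource = ["a", "b", "c", "d"];
const semanticSource = ["e", "f", "c", "g"];

console.log(hybridSketch(keywordSource, semanticSource, 2, 1)); // "c" is absent
console.log(hybridSketch(keywordSource, semanticSource, 2, 2)); // "c" ranks first
```

With a 1x multiplier, "c" never reaches fusion. With 2x, it is fetched from both sources and its combined score (2/63) beats every single-source result.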

Typesense Query Fields

Where: keywordSearch() in packages/search/src/keyword-search.ts.

Current value: query_by: "title,embedded_text".

Typesense searches both the title and embedded_text fields. The default field ordering gives priority to title matches. Consider:

Adding field weights if title matches should be significantly stronger: query_by: "title,embedded_text" with query_by_weights: "3,1"
Adding or reordering fields if new content types are indexed
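As a rough illustration of the weighted variant, the sketch below builds a Typesense search parameter object. query_by, query_by_weights, filter_by, and per_page are standard Typesense search parameters; buildKeywordParams and the surrounding shape are hypothetical and may not match how keywordSearch() is written. It also assumes organization_id is a string field whose values need no escaping.

```typescript
// Parameter names are Typesense search parameters; the interface and
// function are hypothetical wrappers for illustration.
interface KeywordSearchParams {
  q: string;
  query_by: string;
  query_by_weights: string;
  filter_by: string;
  per_page: number;
}

function buildKeywordParams(
  query: string,
  organizationId: string,
  limit: number,
): KeywordSearchParams {
  return {
    q: query,
    query_by: "title,embedded_text",
    // One weight per field in query_by, in the same order.
    query_by_weights: "3,1",
    // Exact-match tenant filter. Assumes the ID needs no escaping.
    filter_by: `organization_id:=${organizationId}`,
    per_page: limit,
  };
}

console.log(buildKeywordParams("quarterly report", "org_123", 20));
```

The number of weights must match the number of fields in query_by, so update both together when adding or reordering fields.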

Chunk Parameters

Where: The index-content Inngest function (see Indexing).

Parameter	Current Value	Effect on Relevance
`chunkSize`	2048 chars (~512 tokens)	Larger chunks provide more context per embedding but dilute specificity. Smaller chunks are more precise but may lose context.
`chunkOverlap`	200 chars (~50 tokens)	Overlap prevents information loss at chunk boundaries. More overlap means more redundant chunks but better boundary coverage.
Contextual prefix	2-3 sentences generated by Claude Haiku	Anchors the embedding in document context. Quality of the prefix directly affects semantic search relevance.

When to adjust chunks: Only when you see a pattern where relevant content is consistently split across chunk boundaries, or where chunks are too large and dilute the semantic signal. Changing chunk parameters requires re-indexing all content.
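To get a feel for how chunkSize and chunkOverlap interact, here is a naive fixed-size character chunker. It is only a sketch: chunkText is a hypothetical function, and the real index-content job may split on sentence or paragraph boundaries rather than raw character offsets.

```typescript
// Naive fixed-size character chunker, for illustration only.
function chunkText(text: string, chunkSize = 2048, chunkOverlap = 200): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new RangeError("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap; // how far each new chunk advances
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

// A 5000-character document with the current parameters.
const sampleChunks = chunkText("x".repeat(5000));
console.log(sampleChunks.map((c) => c.length)); // [2048, 2048, 1304]
```

Each chunk starts 1848 characters after the previous one, so the last 200 characters of one chunk repeat at the start of the next. Raising the overlap shrinks the step and therefore increases the number of chunks to embed and store.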

Debugging Workflow

Step 1: Identify the Query

Start with a specific query that produces poor results. Note what the user searched for and what they expected to find.

Step 2: Run the Debug Endpoint

Use the Search Debugger at /admin/search or call hybridSearch.debugSearch directly:

Enter the query
Note the query classification (keyword/balanced/semantic)
Compare the three result panels

Step 3: Diagnose the Problem

Observation	Diagnosis	Action
Expected result is in keyword results but not semantic	The embedding does not capture this content's meaning for this query	Check the chunk's embedded text -- is the contextual prefix accurate?
Expected result is in semantic results but not keyword	The query terms do not appear in the indexed text	Consider whether the Typesense schema fields are complete
Expected result is in both but ranked low in the fused list	RRF scoring is correct but both sources rank it low	The content may genuinely be less relevant, or the chunk size may be too large
Expected result does not appear in either	Not an RRF issue -- it is an indexing or filtering problem	Check chunk counts and Inngest job status
Top fused results all come from one source	One engine is dominating because it returns more results for this query type	Consider weighted routing (see Query Classification)
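The first four rows of the table reduce to a simple membership check, sketched below. diagnose and the Diagnosis labels are hypothetical; the inputs are the ranked IDs copied from the three debugger panels, and topN is an assumed threshold for what counts as ranked well.

```typescript
type Diagnosis =
  | "indexing-or-filtering" // in neither source
  | "semantic-miss"         // keyword only
  | "keyword-miss"          // semantic only
  | "ranked-low-in-both"    // in both, but low in the fused list
  | "ok";

// Hypothetical helper mirroring the first four rows of the diagnosis table.
function diagnose(
  expectedId: string,
  keyword: string[],
  semantic: string[],
  fused: string[],
  topN = 5,
): Diagnosis {
  const inKeyword = keyword.indexOf(expectedId) !== -1;
  const inSemantic = semantic.indexOf(expectedId) !== -1;

  if (!inKeyword && !inSemantic) return "indexing-or-filtering";
  if (inKeyword && !inSemantic) return "semantic-miss";
  if (!inKeyword && inSemantic) return "keyword-miss";

  const fusedRank = fused.indexOf(expectedId) + 1; // 0 means absent
  return fusedRank === 0 || fusedRank > topN ? "ranked-low-in-both" : "ok";
}

console.log(diagnose("doc-7", ["doc-1", "doc-7"], ["doc-2"], ["doc-1", "doc-2", "doc-7"]));
// "semantic-miss": check the chunk's embedded text and contextual prefix
```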

Step 4: Test Changes

If you identify a parameter change:

Make the change in the code
Run the same query through the debug endpoint
Compare the before/after result sets
Test 5-10 other queries to confirm the change did not regress other cases
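The before/after comparison can be made mechanical by recording, for each test query, where the expected result ranked in the fused list. The sketch below is one possible shape; QueryCase, rankOf, and findRegressions are hypothetical names, and the queries and IDs are made-up sample data.

```typescript
interface QueryCase {
  query: string;
  expectedId: string; // the result the user expected to find
  before: string[];   // fused result IDs before the change
  after: string[];    // fused result IDs after the change
}

// 1-based rank of id in ids, or Infinity when it is absent.
function rankOf(ids: string[], id: string): number {
  const index = ids.indexOf(id);
  return index === -1 ? Infinity : index + 1;
}

// Returns the queries whose expected result ranks worse after the change.
function findRegressions(cases: QueryCase[]): string[] {
  return cases
    .filter((c) => rankOf(c.after, c.expectedId) > rankOf(c.before, c.expectedId))
    .map((c) => c.query);
}

// Made-up sample data for illustration.
const sampleCases: QueryCase[] = [
  { query: "invoice export", expectedId: "d1", before: ["d3", "d1"], after: ["d1", "d3"] },
  { query: "sso setup", expectedId: "d2", before: ["d2", "d4"], after: ["d4", "d5"] },
];

console.log(findRegressions(sampleCases)); // ["sso setup"]
```

An empty result means no tracked query got worse. It does not show that results improved, so still compare the result sets by eye for the query you set out to fix.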

What Not to Tune

Some things that look like relevance problems have other root causes:

Missing content -- the content was never indexed. Check the Inngest dashboard.
Wrong tenant -- the user is in a different organization than the content. RLS and the Typesense organization_id filter are working as intended when they hide it.
Stale embeddings -- if the embedding model or contextual prefix prompt changed, old chunks may have embeddings generated with different parameters. Re-index affected content.
Query too vague -- a one-word query may match too many documents. This is expected behavior, not a tuning problem.

See Also

Relevance Overview -- the scoring system that tuning adjusts
RRF Algorithm -- details on the k parameter and its mathematical properties
Evaluation -- how to measure whether tuning improved results
Indexing -- chunk parameters and the write path
Search Debugger -- the admin tool for inspecting scores