Tuning Guide
Practical guidance for adjusting search relevance -- RRF parameters, chunk sizing, embedding choices, and debugging workflows.
This guide covers how to diagnose and improve search relevance in Trovella. It assumes you have read the Relevance Overview and understand the three scoring layers (BM25, cosine similarity, RRF).
Before You Tune
Relevance problems usually fall into three categories. Identify which you are dealing with before changing parameters.
1. Indexing Problems
Symptoms: relevant content does not appear in either result set (keyword or semantic). The content exists in the database but is not being found.
This is not a relevance issue -- it is an indexing issue. Check:
- Was the content indexed? Look at the Indexing & Health tab in the Search Debugger for chunk counts.
- Did the Inngest index-content job succeed? Check the Inngest dashboard.
- Is the organization_id filter correct? Typesense filters by organization_id, and pgvector queries run through tenant-scoped RLS.
See Indexing for the write path.
2. Ranking Problems
Symptoms: relevant content appears in one or both result sets but is ranked lower than less relevant content.
This is the true relevance problem. Tuning parameters (below) can help.
3. Coverage Problems
Symptoms: relevant content appears in one result set but not the other, and the fusion score is too low.
This is a signal mismatch. The content matches on one dimension (keywords or meaning) but not the other. This is expected for some queries -- RRF handles it gracefully -- but consistent one-sided results may indicate an embedding quality issue or keyword vocabulary mismatch.
Tunable Parameters
RRF k Constant
Where: reciprocalRankFusion() in packages/search/src/fusion.ts, 4th parameter.
Current value: 60 (Cormack et al. 2009 default).
When to change: The k value rarely needs adjustment. Lower k amplifies rank differences (rank 1 vs rank 5 matters more); higher k flattens them (appearing in both sources matters more). The default of 60 is deliberately conservative.
| Scenario | Direction (range to try) | Rationale |
|---|---|---|
| Top-ranked results are wrong, but the right results are in the top 10 | Lower k (30-50) | Amplifies rank signal to push truly top-ranked results higher |
| Single-source results dominate over dual-source results | Higher k (70-100) | Strengthens the dual-source appearance boost |
| Results from both sources are all roughly equally relevant | Leave at 60 | The default handles this case well |
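To make the effect of k concrete, here is a minimal sketch of reciprocal rank fusion, in which each document scores the sum of 1 / (k + rank) over the lists it appears in. The names rrfSketch and fusedOrder are hypothetical and this is not the code in packages/search/src/fusion.ts; the real reciprocalRankFusion() signature may differ.

```typescript
// Hypothetical sketch of RRF scoring; NOT the actual reciprocalRankFusion().
type RankedList = string[]; // document IDs, best match first

function rrfSketch(lists: RankedList[], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// Returns document IDs ordered by descending fused score.
function fusedOrder(lists: RankedList[], k = 60): string[] {
  return [...rrfSketch(lists, k).entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// "b" is rank 5 in both sources; "a" is rank 1 in keyword results only.
const keywordExample = ["a", "x1", "x2", "x3", "b"];
const semanticExample = ["y1", "y2", "y3", "y4", "b"];
console.log(fusedOrder([keywordExample, semanticExample], 60)[0]);
console.log(fusedOrder([keywordExample, semanticExample], 1)[0]);
```

With k = 60 the dual-source document "b" scores 2/65 and outranks "a" at 1/61. With a deliberately extreme k = 1, chosen only to make the effect visible, "a" scores 1/2 against 2/6 for "b" and takes the top spot. Values inside the 30-100 range shift the balance in the same direction, but more gently.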
How to test: The debugSearch procedure always uses the default k; it does not accept k as an input. To experiment, temporarily modify the call in hybridSearch.search or add a k parameter to the debug endpoint input schema.
Result Set Size (Limit Multiplier)
Where: hybridSearch.search in packages/api/src/routers/hybrid-search.ts.
Current value: input.limit * 2 for both keyword and semantic searches.
The router fetches 2x the requested results from each source before fusion. This ensures RRF has enough candidates to work with, since some results from each source will be unique.
| Multiplier | Tradeoff |
|---|---|
| 1x | Faster, but fusion has fewer candidates. Results that appear only in one source may be cut off before fusion. |
| 2x (current) | Good balance. Fusion typically has 1.5-2x the requested results to work with. |
| 3-4x | Slower (more results to fetch and embed), but better recall for difficult queries where relevant content ranks lower. |
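The tradeoff in the table can be sketched as follows. This is a simplified, synchronous illustration: SearchFn and fetchCandidates are hypothetical names, and the real router in hybrid-search.ts presumably calls its sources asynchronously.

```typescript
// Hypothetical sketch of over-fetching before fusion; not the router's actual code.
type SearchFn = (query: string, limit: number) => string[];

interface Candidates {
  keywordIds: string[];
  semanticIds: string[];
  uniqueCandidates: number; // how many distinct documents fusion can choose from
}

function fetchCandidates(
  query: string,
  limit: number,
  multiplier: number,
  keyword: SearchFn,
  semantic: SearchFn,
): Candidates {
  const perSource = limit * multiplier; // e.g. input.limit * 2
  const keywordIds = keyword(query, perSource);
  const semanticIds = semantic(query, perSource);
  const uniqueCandidates = new Set([...keywordIds, ...semanticIds]).size;
  return { keywordIds, semanticIds, uniqueCandidates };
}
```

Logging uniqueCandidates for a few problem queries at different multipliers is one way to see whether a higher multiplier actually gives fusion more to work with, or whether the sources overlap so much that the extra fetch cost buys little.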
Typesense Query Fields
Where: keywordSearch() in packages/search/src/keyword-search.ts.
Current value: query_by: "title,embedded_text".
Typesense searches both the title and embedded_text fields. The default field ordering gives priority to title matches. Consider:
- Adding field weights if title matches should be significantly stronger: query_by: "title,embedded_text" with query_by_weights: "3,1"
- Adding or reordering fields if new content types are indexed
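As an illustration, the weighted variant could be assembled like this. buildKeywordParams is a hypothetical helper, not the body of keywordSearch(); the filter_by expression and per_page value are assumptions about how the tenant filter and result limit are passed to Typesense.

```typescript
// Hypothetical helper; the real keywordSearch() may build its parameters differently.
interface KeywordSearchParams {
  q: string;
  query_by: string;
  query_by_weights: string;
  filter_by: string;
  per_page: number;
}

function buildKeywordParams(
  query: string,
  organizationId: string,
  perPage: number,
  weights: number[] = [3, 1],
): KeywordSearchParams {
  const fields = ["title", "embedded_text"];
  if (weights.length !== fields.length) {
    throw new Error("query_by_weights needs one weight per query_by field");
  }
  return {
    q: query,
    query_by: fields.join(","),
    query_by_weights: weights.join(","), // order matches query_by
    filter_by: `organization_id:=${organizationId}`, // assumed tenant filter syntax
    per_page: perPage,
  };
}
```

Keeping fields and weights in one place makes it harder to add a field to query_by and forget its weight.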
Chunk Parameters
Where: The index-content Inngest function (see Indexing).
| Parameter | Current Value | Effect on Relevance |
|---|---|---|
| chunkSize | 2048 chars (~512 tokens) | Larger chunks provide more context per embedding but dilute specificity. Smaller chunks are more precise but may lose context. |
| chunkOverlap | 200 chars (~50 tokens) | Overlap prevents information loss at chunk boundaries. More overlap means more redundant chunks but better boundary coverage. |
| Contextual prefix | Claude Haiku 2-3 sentences | Anchors the embedding in document context. Quality of the prefix directly affects semantic search relevance. |
When to adjust chunks: Only when you see a pattern where relevant content is consistently split across chunk boundaries, or where chunks are too large and dilute the semantic signal. Changing chunk parameters requires re-indexing all content.
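To show how chunkSize and chunkOverlap interact, here is a minimal character-window sketch. chunkText is a hypothetical function; the actual index-content job may split on sentence or paragraph boundaries and adds the contextual prefix, neither of which is modeled here.

```typescript
// Hypothetical fixed-window chunker; the real indexing function may differ.
function chunkText(text: string, chunkSize = 2048, chunkOverlap = 200): string[] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= chunkOverlap < chunkSize");
  }
  const step = chunkSize - chunkOverlap; // each window starts this far after the previous one
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end of the text
  }
  return chunks;
}
```

With the current values, each new chunk starts 1848 characters after the previous one, so raising chunkOverlap increases the chunk count (and embedding cost) for the same document.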
Debugging Workflow
Step 1: Identify the Query
Start with a specific query that produces poor results. Note what the user searched for and what they expected to find.
Step 2: Run the Debug Endpoint
Use the Search Debugger at /admin/search or call hybridSearch.debugSearch directly:
- Enter the query
- Note the query classification (keyword/balanced/semantic)
- Compare the three result panels
Step 3: Diagnose the Problem
| Observation | Diagnosis | Action |
|---|---|---|
| Expected result is in keyword results but not semantic | The embedding does not capture this content's meaning for this query | Check the chunk's embedded text -- is the contextual prefix accurate? |
| Expected result is in semantic results but not keyword | The query terms do not appear in the indexed text | Consider whether the Typesense schema fields are complete |
| Expected result is in both but ranked low in the fused list | RRF scoring is correct but both sources rank it low | The content may genuinely be less relevant, or the chunk size may be too large |
| Expected result does not appear in either | Not an RRF issue -- it is an indexing or filtering problem | Check chunk counts and Inngest job status |
| Top fused results are all from one source only | One engine is dominating because it returns more results for this query type | Consider weighted routing (see Query Classification) |
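The first four rows of the table can be expressed as a small triage function. This is a sketch under assumptions: diagnose and its labels are hypothetical, the debugSearch response shape is not documented here so the function takes plain ID lists, and topN is an arbitrary threshold for "ranked low".

```typescript
// Hypothetical triage helper; the labels mirror the diagnosis table, not any real API.
type Diagnosis =
  | "indexing-or-filtering" // in neither result set
  | "semantic-miss"         // keyword only
  | "keyword-miss"          // semantic only
  | "ranked-low-in-both"    // in both, but outside the fused top N
  | "ok";

function diagnose(
  expectedId: string,
  keywordIds: string[],
  semanticIds: string[],
  fusedIds: string[],
  topN = 5,
): Diagnosis {
  const inKeyword = keywordIds.includes(expectedId);
  const inSemantic = semanticIds.includes(expectedId);
  if (!inKeyword && !inSemantic) return "indexing-or-filtering";
  if (inKeyword && !inSemantic) return "semantic-miss";
  if (!inKeyword && inSemantic) return "keyword-miss";
  const fusedRank = fusedIds.indexOf(expectedId) + 1; // 0 means absent from the fused list
  return fusedRank >= 1 && fusedRank <= topN ? "ok" : "ranked-low-in-both";
}
```

Running this over a list of known-bad queries gives a quick count of how many are really indexing problems before any parameter is touched.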
Step 4: Test Changes
If you identify a parameter change:
- Make the change in the code
- Run the same query through the debug endpoint
- Compare the before/after result sets
- Test at least 5-10 different queries to ensure you did not regress other cases
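The before/after comparison can be scripted so regressions are easy to spot. The sketch below is illustrative only: QueryCase, Ranker, and compareRuns are hypothetical names, and each Ranker stands in for whatever call returns the fused result IDs for a query under one configuration.

```typescript
// Hypothetical regression check across a small query set.
interface QueryCase {
  query: string;
  expectedId: string; // the result the user expected to find
}

type Ranker = (query: string) => string[]; // fused result IDs, best first

interface RankChange {
  query: string;
  beforeRank: number | null; // null when the expected result is absent
  afterRank: number | null;
  regressed: boolean;
}

function rankOf(ids: string[], id: string): number | null {
  const index = ids.indexOf(id);
  return index === -1 ? null : index + 1;
}

function compareRuns(cases: QueryCase[], before: Ranker, after: Ranker): RankChange[] {
  return cases.map(({ query, expectedId }) => {
    const beforeRank = rankOf(before(query), expectedId);
    const afterRank = rankOf(after(query), expectedId);
    // A missing result counts as the worst possible rank.
    const regressed = (afterRank ?? Infinity) > (beforeRank ?? Infinity);
    return { query, beforeRank, afterRank, regressed };
  });
}
```

Any row with regressed set to true is a query that the parameter change made worse and is worth inspecting in the Search Debugger before the change is kept.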
What Not to Tune
Some things that look like relevance problems have other root causes:
- Missing content -- the content was never indexed. Check the Inngest dashboard.
- Wrong tenant -- the user is in a different organization than the content. RLS and Typesense filters enforce this correctly.
- Stale embeddings -- if the embedding model or contextual prefix prompt changed, old chunks may have embeddings generated with different parameters. Re-index affected content.
- Query too vague -- a one-word query may match too many documents. This is expected behavior, not a tuning problem.
Related Pages
- Relevance Overview -- the scoring system that tuning adjusts
- RRF Algorithm -- details on the k parameter and its mathematical properties
- Evaluation -- how to measure whether tuning improved results
- Indexing -- chunk parameters and the write path
- Search Debugger -- the admin tool for inspecting scores