Overview

th0th implements hybrid semantic search. It runs vector similarity search and keyword matching side by side, then merges the two ranked lists with Reciprocal Rank Fusion (RRF). The fused ranking is more accurate than either method alone.
Returning only the most relevant code chunks instead of entire files achieves a 98% token reduction, dramatically shrinking the context sent to AI assistants.

How It Works

Hybrid Retrieval Pipeline

1. Cache Lookup: Check the L1 (memory) and L2 (SQLite) caches first. Typical workloads see a 50%+ cache hit rate.
2. Parallel Retrieval: On a cache miss, run vector and keyword searches in parallel for speed.
3. RRF Fusion: Combine the results using Reciprocal Rank Fusion, boosting keyword matches for code-like queries.
4. Filtering: Apply a minimum score threshold and file pattern filters (include/exclude).
5. Caching: Store the final results in both the L1 and L2 caches with a 1-hour TTL.
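The five steps above can be sketched as a single async function. This is a minimal illustration under stated assumptions, not th0th's implementation: the `HybridSearchDeps` interface, the injected `vectorSearch`/`keywordSearch`/`fuse` callbacks, and the `minScore` and `pathFilter` parameters are hypothetical, a single `Map` keyed on the normalized query stands in for both cache levels, and `fuse` would be RRF in practice.

```typescript
// Hypothetical types for illustration only; not th0th's actual API.
interface SearchResult {
  id: string;
  filePath: string;
  score: number;
}

interface HybridSearchDeps {
  cache: Map<string, { results: SearchResult[]; expiresAt: number }>;
  vectorSearch: (query: string) => Promise<SearchResult[]>;
  keywordSearch: (query: string) => Promise<SearchResult[]>;
  fuse: (lists: SearchResult[][]) => SearchResult[]; // e.g. RRF
}

const TTL_MS = 60 * 60 * 1000; // 1-hour TTL

async function hybridSearch(
  query: string,
  deps: HybridSearchDeps,
  minScore = 0,
  pathFilter: (path: string) => boolean = () => true,
): Promise<SearchResult[]> {
  const key = query.toLowerCase().trim();

  // 1. Cache lookup (one Map stands in for the L1/L2 caches)
  const cached = deps.cache.get(key);
  if (cached && cached.expiresAt > Date.now()) return cached.results;

  // 2. Parallel retrieval: both searches run concurrently
  const [vectorHits, keywordHits] = await Promise.all([
    deps.vectorSearch(query),
    deps.keywordSearch(query),
  ]);

  // 3. Fusion of the two ranked lists
  const fused = deps.fuse([vectorHits, keywordHits]);

  // 4. Filtering by minimum score and file path
  const results = fused.filter(r => r.score >= minScore && pathFilter(r.filePath));

  // 5. Caching with a TTL
  deps.cache.set(key, { results, expiresAt: Date.now() + TTL_MS });
  return results;
}
```

Injecting the searches and the fusion step keeps the control flow (cache, parallel fetch, fuse, filter, store) visible without tying the sketch to a particular vector store or SQLite binding.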

Embedding Generation

Each code chunk is converted to a high-dimensional vector (embedding) that captures semantic meaning:
```typescript
// Example: Embedding a code chunk
const chunk = `
  async function authenticateUser(credentials) {
    const user = await db.findUser(credentials.email);
    return await bcrypt.compare(credentials.password, user.hash);
  }
`;

// Generate 768-dimensional embedding (Ollama nomic-embed-text)
const embedding = await embeddingService.embed(chunk);
// => [0.023, -0.145, 0.891, ...] (768 numbers)
```
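
| Provider | Model | Dimensions | Quality | Speed |
|---|---|---|---|---|
| Ollama | nomic-embed-text | 768 | Good | Very Fast |
| Ollama | bge-m3 | 1024 | Great | Fast |
| Mistral | mistral-embed | 1024 | Great | Medium |
| Mistral | codestral-embed | 1024 | Excellent | Medium |
| OpenAI | text-embedding-3-small | 1536 | Excellent | Medium |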
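Each model produces vectors of a fixed size, and vectors from different models live in unrelated spaces, so an index built with one model cannot be queried with another. The sketch below guards against that mismatch using the dimensions from the table; the `MODEL_DIMENSIONS` map and the `assertEmbeddingDimension` helper are hypothetical and not part of th0th's documented API.

```typescript
// Dimensions from the table above, keyed by model name (hypothetical helper data).
const MODEL_DIMENSIONS: Record<string, number> = {
  "nomic-embed-text": 768,
  "bge-m3": 1024,
  "mistral-embed": 1024,
  "codestral-embed": 1024,
  "text-embedding-3-small": 1536,
};

// Hypothetical guard: reject vectors whose length does not match the model
// the index was built with (e.g. after switching providers without reindexing).
function assertEmbeddingDimension(model: string, embedding: number[]): void {
  const expected = MODEL_DIMENSIONS[model];
  if (expected === undefined) {
    throw new Error(`Unknown embedding model: ${model}`);
  }
  if (embedding.length !== expected) {
    throw new Error(
      `Expected ${expected} dimensions for ${model}, got ${embedding.length}`,
    );
  }
}
```

A dimension check catches only some mismatches: bge-m3, mistral-embed and codestral-embed all emit 1024-dimensional vectors, so storing the model name alongside the index is the more reliable guard.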

Similarity Calculation

Vector search finds chunks with embeddings geometrically close to the query embedding:
```typescript
// Cosine similarity: dot product of normalized vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```
Vector search excels at finding semantically similar code even when exact keywords don’t match. For example, searching “hash password” will find bcrypt.compare() implementations.
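Given a similarity function like `cosineSimilarity`, a brute-force vector search scores every stored chunk against the query embedding and keeps the top k. The sketch below is self-contained (it repeats the similarity function, adding a guard for zero vectors) and uses a hypothetical `EmbeddedChunk` shape; larger indexes often replace the linear scan with an approximate nearest-neighbor structure.

```typescript
// Hypothetical chunk shape for illustration.
interface EmbeddedChunk {
  id: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // avoid NaN for all-zero vectors
}

// Linear scan: score every chunk, sort descending, keep the best k.
function vectorSearch(
  query: number[],
  chunks: EmbeddedChunk[],
  k: number,
): { id: string; score: number }[] {
  return chunks
    .map(c => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```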

BM25 Scoring (FTS5)

th0th uses SQLite’s FTS5 (Full-Text Search 5) with BM25 ranking:
```sql
-- Create FTS5 index with porter stemming
CREATE VIRTUAL TABLE keyword_search USING fts5(
  id UNINDEXED,
  content,
  metadata UNINDEXED,
  tokenize = 'porter unicode61'
);

-- Search with BM25 ranking. The rank column defaults to bm25(), which is
-- more negative for better matches, so ascending order puts them first.
SELECT *, rank FROM keyword_search
WHERE content MATCH 'authenticateUser'
ORDER BY rank
LIMIT 10;
```
BM25 (Best Matching 25) is a probabilistic ranking function that considers:
  • Term frequency: How often the term appears in the document, with diminishing returns for repeats
  • Document length: A match in a shorter document counts for more than the same match in a longer one
  • Inverse document frequency: Rare terms are more valuable
Keyword search is essential for finding exact matches like function names, class names, or specific identifiers that embeddings might miss.
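The three factors above combine in the Okapi BM25 formula. The sketch below scores one document for a multi-term query using k1 = 1.2 and b = 0.75, the constants SQLite's built-in bm25() uses. Its IDF is the smoothed form ln(1 + (N - n + 0.5) / (n + 0.5)), a common variant that stays positive; FTS5's exact IDF term and its sign convention (negated scores, smaller is better) differ, so this illustrates the ranking behavior rather than reimplementing FTS5. Tokens are assumed to be pre-split and unstemmed.

```typescript
// Okapi BM25 over pre-tokenized documents (no stemming, for brevity).
const K1 = 1.2;  // term-frequency saturation
const B = 0.75;  // strength of document-length normalization

function bm25(query: string[], doc: string[], corpus: string[][]): number {
  const N = corpus.length;
  const avgLen = corpus.reduce((sum, d) => sum + d.length, 0) / N;
  let score = 0;
  for (const term of Array.from(new Set(query))) {
    const n = corpus.filter(d => d.includes(term)).length; // docs containing the term
    const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));    // rare terms weigh more
    const tf = doc.filter(t => t === term).length;          // term frequency in this doc
    // Saturating term frequency, normalized by length relative to the average
    score += (idf * tf * (K1 + 1)) / (tf + K1 * (1 - B + (B * doc.length) / avgLen));
  }
  return score;
}
```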

Reciprocal Rank Fusion (RRF)

The Algorithm

RRF combines rankings from multiple sources without needing to normalize scores:
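```typescript
interface SearchResult {
  id: string;
  score: number;
  [key: string]: unknown; // other fields (file path, content, ...)
}

function reciprocalRankFusion(
  resultSets: SearchResult[][],
  k: number = 60  // RRF constant (value from the original RRF paper)
): SearchResult[] {
  const scores = new Map<string, number>();
  const items = new Map<string, SearchResult>(); // first-seen copy of each result

  // For each result set (vector, keyword)
  for (const results of resultSets) {
    results.forEach((result, rank) => {
      // rank is 0-based, so rank + 1 is the 1-based position
      const rrfScore = 1 / (k + rank + 1);
      scores.set(result.id, (scores.get(result.id) || 0) + rrfScore);
      if (!items.has(result.id)) items.set(result.id, result);
    });
  }

  // Sort by combined RRF score
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ ...items.get(id)!, score }));
}
```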

Why RRF Works

Score-Independent

Works with incompatible scoring systems (cosine similarity vs BM25)

Rank-Based

Focuses on relative ranking, not absolute scores

Empirically Proven

k=60, the value from the original RRF experiments on TREC data, works well across diverse datasets

No Tuning Needed

Parameter-free for end users
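Written as a formula, the score RRF assigns to a result d is the following, where R is the set of ranked lists, rank_r(d) is the 1-based position of d in list r, and lists that do not contain d contribute nothing. The worked numbers show why no score normalization is needed: with k = 60, a chunk ranked 1st by keyword search and 3rd by vector search outranks one ranked 1st by vector search only, and the raw cosine and BM25 values never enter the calculation.

```latex
\mathrm{RRF}(d) = \sum_{r \in R} \frac{1}{k + \mathrm{rank}_r(d)}

\text{1st (keyword) and 3rd (vector):}\quad \frac{1}{60+1} + \frac{1}{60+3} \approx 0.01639 + 0.01587 \approx 0.03227

\text{1st (vector only):}\quad \frac{1}{60+1} \approx 0.01639
```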

Intelligent Boosting

th0th applies context-aware boosting for code-specific queries:
```typescript
// Detect code patterns in the query
const codePatterns = [
  /\w+\(\)/,          // function calls: useState(), render()
  /\bfunction\b/i,    // "function" keyword
  /\bclass\b/i,       // "class" keyword
  /\bimport\b/i,      // "import" keyword
];

function isCodeQuery(query: string): boolean {
  return codePatterns.some(p => p.test(query));
}

// Boost keyword results for code queries (2.5x weight); helper name is illustrative
function keywordRrfScore(query: string, rank: number, k: number = 60): number {
  const KEYWORD_BOOST = isCodeQuery(query) ? 2.5 : 1.0;
  return (1 / (k + rank + 1)) * KEYWORD_BOOST;
}
```
Example query: cn() utility function

Without boosting:
  1. Vector: Documentation about utility functions (0.85)
  2. Vector: Similar helper code (0.82)
  3. Keyword: Exact cn() definition (0.75)

With boosting (2.5x for keyword):
  1. Keyword: Exact cn() definition (1.88)
  2. Vector: Documentation about utility functions (0.85)
  3. Vector: Similar helper code (0.82)

The exact function definition now ranks first!
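Combining the code-pattern detection with the boost gives a weighted RRF in which each source list carries its own multiplier. This is a sketch under assumptions: the 2.5x keyword boost comes from the text above, but exactly where th0th applies it is not shown, so here it scales every keyword-list contribution. The `weightedRRF` function, the `RankedList` type, and the result ids are hypothetical.

```typescript
// Hypothetical types and helpers for illustration.
interface RankedList {
  ids: string[];   // best match first
  weight: number;  // multiplier applied to this list's RRF contributions
}

const CODE_PATTERNS = [/\w+\(\)/, /\bfunction\b/i, /\bclass\b/i, /\bimport\b/i];

function isCodeQuery(query: string): boolean {
  return CODE_PATTERNS.some(p => p.test(query));
}

// Weighted RRF: each list adds weight / (k + 1-based rank) to an item's score.
function weightedRRF(lists: RankedList[], k = 60): [string, number][] {
  const scores = new Map<string, number>();
  for (const { ids, weight } of lists) {
    ids.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank + 1));
    });
  }
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Usage: keyword results get 2.5x weight when the query looks like code
const query = "cn() utility function";
const fused = weightedRRF([
  { ids: ["utility-docs", "helper-code"], weight: 1.0 },              // vector
  { ids: ["cn-definition"], weight: isCodeQuery(query) ? 2.5 : 1.0 }, // keyword
]);
```

Without the boost, "cn-definition" and "utility-docs" would tie at 1/61 because each tops its own list; the 2.5x weight raises the keyword hit to 2.5/61 and breaks the tie in favor of the exact definition.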

Smart Chunking

Language-Aware Splitting

th0th uses different chunking strategies based on file type. For Markdown, it splits by headings and keeps the heading hierarchy as context:
```markdown
# Installation
## Prerequisites
You need Node.js 18+

## Quick Start
Run npm install
```
Chunks:
  • Installation > Prerequisites (with heading context)
  • Installation > Quick Start (with heading context)
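A heading-based splitter like the one described can be sketched by tracking the current heading path and starting a new chunk at every heading. This minimal `chunkMarkdown` function is a hypothetical illustration, not th0th's chunker: it does not recognize fenced code blocks (where a `#` line is not a heading), skips headings with no body text, and does not merge small chunks.

```typescript
// Hypothetical chunk shape: heading breadcrumb plus section text.
interface MarkdownChunk {
  context: string; // e.g. "Installation > Prerequisites"
  body: string;
}

function chunkMarkdown(source: string): MarkdownChunk[] {
  const chunks: MarkdownChunk[] = [];
  const path: string[] = []; // path[i] holds the current heading at level i + 1
  let body: string[] = [];

  // Emit the text collected under the current heading path, if any
  const flush = () => {
    const text = body.join("\n").trim();
    if (text) chunks.push({ context: path.filter(Boolean).join(" > "), body: text });
    body = [];
  };

  for (const line of source.split("\n")) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      flush();                       // close the section under the previous heading
      const level = match[1].length;
      path.length = level - 1;       // drop headings at this level or deeper
      path[level - 1] = match[2].trim();
    } else {
      body.push(line);
    }
  }
  flush();
  return chunks;
}
```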

Chunk Configuration

```typescript
interface ChunkerConfig {
  maxChunkLines: number;     // Max lines per chunk (default: 200)
  minChunkLines: number;     // Min lines; smaller chunks are merged (default: 5)
  codeChunkTarget: number;   // Target size for code blocks (default: 80)
  fixedChunkSize: number;    // Fallback for unknown file types (default: 50)
  addFileContext: boolean;   // Prepend file path to each chunk (default: true)
}
```
Why chunk? A single embedding of a large file (1000+ lines) blends many unrelated functions and concepts, so it matches no specific query well. Chunking creates focused, searchable units while preserving context.
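For file types without a dedicated strategy, the configuration implies a fixed-size fallback. The sketch below shows one way `fixedChunkSize`, `minChunkLines` and `addFileContext` could interact: split into fixed windows, fold a too-small tail into the previous chunk, and prepend the file path. The `chunkFixed` function, the tail-only merge rule and the `// File:` header format are assumptions, not th0th's documented behavior.

```typescript
// Hypothetical subset of the chunker config used by the fallback path.
interface FallbackConfig {
  fixedChunkSize: number;
  minChunkLines: number;
  addFileContext: boolean;
}

function chunkFixed(filePath: string, content: string, cfg: FallbackConfig): string[] {
  const lines = content.split("\n");
  const windows: string[][] = [];
  for (let i = 0; i < lines.length; i += cfg.fixedChunkSize) {
    windows.push(lines.slice(i, i + cfg.fixedChunkSize));
  }
  // Merge a trailing window shorter than minChunkLines into its predecessor
  const last = windows[windows.length - 1];
  if (windows.length > 1 && last.length < cfg.minChunkLines) {
    windows.pop();
    windows[windows.length - 1].push(...last);
  }
  // Optionally prepend the file path (header format is an assumption)
  return windows.map(w =>
    (cfg.addFileContext ? `// File: ${filePath}\n` : "") + w.join("\n"),
  );
}
```

With 103 lines and the default sizes (50 and 5), this yields two chunks of 50 and 53 lines rather than leaving a 3-line fragment that would embed poorly on its own.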

Multi-Level Caching

Two-Level Architecture

```typescript
class SearchCache {
  private l1Cache: Map<string, CacheEntry>;  // In-memory
  private l2Db: Database;                    // SQLite
  
  private readonly L1_MAX_SIZE = 100;        // entries
  private readonly L2_MAX_SIZE = 10000;      // entries
  private readonly DEFAULT_TTL = 3600;       // seconds (1 hour)
}
```

L1 Cache (Memory)

  • < 5ms lookup time
  • Holds the 100 most recent queries
  • LRU eviction

L2 Cache (SQLite)

  • < 20ms lookup time
  • Up to 10,000 queries
  • LRU eviction with indexes
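The L1 behavior (bounded size, LRU eviction, TTL expiry) can be sketched with a JavaScript `Map`, which iterates keys in insertion order: re-inserting a key on every hit moves it to the back, so the first key is always the least recently used. The `LruTtlCache` class is hypothetical and th0th's `SearchCache` may differ; the clock is injectable only so that expiry can be tested.

```typescript
// Hypothetical L1-style cache: LRU eviction plus per-entry TTL.
class LruTtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxSize: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxSize = 100, ttlMs = 60 * 60 * 1000, now: () => number = Date.now) {
    this.maxSize = maxSize; // mirrors L1_MAX_SIZE
    this.ttlMs = ttlMs;     // 1 hour by default
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    this.entries.delete(key);   // re-insert to mark as most recently used
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxSize) {
      // Evict the least recently used entry (first in iteration order)
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```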

Cache Key Generation

Cache keys are content-addressed using SHA256:
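```typescript
import { createHash } from 'crypto';

type SearchOptions = { maxResults?: number; include?: string[]; exclude?: string[] };

// Illustrative normalization (th0th's normalizeOptions may differ):
// default maxResults and sort filter lists so equivalent options match
const normalizeOptions = (o: SearchOptions) => ({
  maxResults: o.maxResults ?? 10,
  include: [...(o.include ?? [])].sort(),
  exclude: [...(o.exclude ?? [])].sort(),
});

function generateCacheKey(
  query: string,
  projectId: string,
  options: SearchOptions
): string {
  const payload = JSON.stringify({
    query: query.toLowerCase().trim(),
    projectId,
    options: normalizeOptions(options)
  });
  return createHash('sha256').update(payload).digest('hex');
}
```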
Only search-affecting parameters are included in the cache key. Presentation options like explainScores are ignored to maximize cache reuse.

Cache Invalidation

Full invalidation runs after a complete reindex:
```typescript
await searchCache.invalidateProject(projectId);
// Removes all cached queries for this project
```
File-based invalidation is more granular: it clears only the queries affected by changed files:
```typescript
const result = await searchCache.invalidateByFiles(
  projectId,
  ['src/auth.ts', 'src/utils.ts']
);

// Only queries with results from auth.ts or utils.ts are cleared
// Other queries remain cached!
```
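One way to support file-based invalidation is a reverse index from each file path to the cache keys whose results included a chunk from that file. The sketch below is an assumption about the mechanism, not th0th's code: `FileAwareCache`, `record`, and this synchronous `invalidateByFiles` are hypothetical (the documented method is awaited and also takes a `projectId`).

```typescript
// Hypothetical sketch; not th0th's SearchCache.
class FileAwareCache<V> {
  private readonly results = new Map<string, V>();
  private readonly keysByFile = new Map<string, Set<string>>();

  // Store a result and remember which files contributed to it
  record(cacheKey: string, value: V, filePaths: string[]): void {
    this.results.set(cacheKey, value);
    for (const file of filePaths) {
      let keys = this.keysByFile.get(file);
      if (!keys) {
        keys = new Set<string>();
        this.keysByFile.set(file, keys);
      }
      keys.add(cacheKey);
    }
  }

  get(cacheKey: string): V | undefined {
    return this.results.get(cacheKey);
  }

  // Drop only the cached queries whose results touched a changed file.
  // Reverse entries left under other files are harmless: deleting an
  // already-removed key is a no-op.
  invalidateByFiles(changedFiles: string[]): number {
    let removed = 0;
    for (const file of changedFiles) {
      const keys = this.keysByFile.get(file);
      if (!keys) continue;
      keys.forEach(key => {
        if (this.results.delete(key)) removed++;
      });
      this.keysByFile.delete(file);
    }
    return removed;
  }
}
```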

Performance Metrics

Search Latency

L1 cache hit: 3-5 ms (memory lookup + JSON deserialization), the fastest path.

Cache Hit Rate

Typical workloads achieve 50-70% cache hit rate due to repeated queries during development sessions.
Monitor cache performance:
```typescript
const stats = searchCache.getStats();
console.log(stats);
// {
//   l1Hits: 245,
//   l2Hits: 89,
//   totalHits: 334,
//   totalMisses: 166,
//   hitRate: 0.668  // 66.8%
// }
```

Advanced Features

File Pattern Filters

Include/exclude results by glob patterns:
```typescript
const results = await search('authentication', projectId, {
  includeFilters: ['src/**/*.ts'],     // Only TypeScript files
  excludeFilters: ['**/*.test.ts'],    // Exclude tests
  maxResults: 10
});
```
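Include/exclude filtering reduces to matching each result's file path against glob patterns. The converter below handles only `**`, `*` and `?`, which covers the patterns shown; a production matcher would typically support more syntax (braces, character classes). `globToRegExp` and `passesFilters` are hypothetical helpers, not th0th's API.

```typescript
// Convert a simple glob to an anchored RegExp (hypothetical helper):
//   "**/" matches zero or more directories, "**" any characters,
//   "*" any run of non-slash characters, "?" one non-slash character.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*" && glob[i + 1] === "*") {
      if (glob[i + 2] === "/") {
        re += "(?:.*/)?";
        i += 2;
      } else {
        re += ".*";
        i += 1;
      }
    } else if (ch === "*") {
      re += "[^/]*";
    } else if (ch === "?") {
      re += "[^/]";
    } else {
      re += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

// A path passes if it matches some include pattern (or none are given)
// and matches no exclude pattern.
function passesFilters(path: string, include: string[], exclude: string[]): boolean {
  const included = include.length === 0 || include.some(g => globToRegExp(g).test(path));
  return included && !exclude.some(g => globToRegExp(g).test(path));
}
```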

Score Explanations

Debug ranking with detailed score breakdowns:
```typescript
const results = await search('cn utility', projectId, {
  explainScores: true
});

console.log(results[0].explanation);
// {
//   vectorScore: 0.82,
//   keywordScore: 0.95,
//   vectorRank: 3,
//   keywordRank: 1,
//   rrfScore: 0.0289,
//   finalScore: 0.92,
//   breakdown: "Vector: 82.0% (rank #3) + Keyword: 95.0% (rank #1) → RRF: 0.0289 → Final: 92.0%"
// }
```

Warmup Queries

Pre-populate cache after indexing:
```typescript
await contextualSearch.warmupCache(projectId, projectPath, [
  'authentication',
  'api endpoints',
  'database models',
  'components',
  'error handling'
]);
// Runs background searches to warm L1/L2 caches
```

Best Practices

Query Writing

  • Be specific: Use function names, class names, or technical terms
  • Natural language works: “how to hash passwords” finds relevant code
  • Avoid overly broad queries: “utils” returns too many results

Index Maintenance

  • Regular reindexing: Run after major code changes
  • Incremental updates: th0th auto-detects stale indexes
  • Cache cleanup: Runs automatically (1-hour TTL)

Performance Tuning

  • Adjust maxResults: Lower values are faster (default: 10)
  • Use file filters: Narrow the search scope for speed
  • Monitor cache stats: Aim for a 60%+ hit rate
