Learn how to use the TF-IDF variant for keyword-based search without external dependencies.
In this guide you’ll learn when to use TF-IDF instead of semantic search, how to configure and query it, and what its limitations are.
When to Use TF-IDF
| Scenario | Recommendation |
|---|---|
| Small corpus (< 10K docs) | TF-IDF works well |
| No network access for model download | Use TF-IDF |
| Keyword matching is sufficient | Use TF-IDF |
| Semantic understanding required | Use VectoriaDB |
| Large corpus (> 10K docs) | Use VectoriaDB + HNSW |
Basic Usage
src/tfidf-basic.ts
Key Differences from VectoriaDB
| Feature | TFIDFVectoria | VectoriaDB |
|---|---|---|
| Dependencies | Zero | transformers.js (~22MB model) |
| Initialization | Synchronous | Async (model download) |
| Semantic understanding | None (keyword-based) | Full semantic |
| Best for | Small corpora (under 10K docs) | Any size |
| Reindex required | Yes, after changes | No |
Important: Reindexing
TF-IDF requires reindexing after document changes, because each term’s IDF (Inverse Document Frequency) value depends on the total number of documents and on how many of them contain the term:
src/tfidf-reindex.ts
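The following sketch shows why this matters. It is written from scratch and does not use the TFIDFVectoria API. It builds an IDF table with one common variant, idf(t) = ln(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. The function name `computeIdf` is hypothetical, and the library's exact formula may differ.

```typescript
// Sketch: why IDF values go stale when documents change.
// Assumption: idf(t) = ln(N / df(t)); TFIDFVectoria's exact formula may differ.

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Hypothetical helper: compute IDF for every term in a corpus.
function computeIdf(docs: string[]): Map<string, number> {
  const df = new Map<string, number>();
  for (const doc of docs) {
    // Count each term at most once per document.
    for (const term of new Set(tokenize(doc))) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }
  const idf = new Map<string, number>();
  for (const [term, count] of df) {
    idf.set(term, Math.log(docs.length / count));
  }
  return idf;
}

const docs = ["create user account", "delete user account"];
const staleIdf = computeIdf(docs);

// Adding a document changes N, and possibly df(t), for many terms...
docs.push("send email to user");

// ...so the IDF table must be rebuilt (this sketch's equivalent of reindexing).
const freshIdf = computeIdf(docs);

console.log(staleIdf.get("account")); // 0, because ln(2 / 2) = 0
console.log(freshIdf.get("account")); // about 0.405, because ln(3 / 2)
console.log(staleIdf.get("email"));   // undefined: the stale table has never seen this term
```

Adding one document changes N, so the IDF of an existing term such as “account” shifts. A brand-new term such as “email” is missing from the old table entirely.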
Configuration Options
src/tfidf-config.ts
Search Options
src/tfidf-search.ts
TF-IDF Algorithm
TF-IDF (Term Frequency-Inverse Document Frequency) works as follows:
- Term Frequency (TF): How often a term appears in a document
- Inverse Document Frequency (IDF): How rare a term is across all documents
- TF-IDF Score: TF × IDF. Terms that are frequent in a document but rare overall get high scores
- Common words like “the”, “is”, “a” get low scores (low IDF)
- Unique terms specific to a document get high scores
- The query is matched against TF-IDF vectors using cosine similarity
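The steps above can be sketched as a tiny in-memory index. This illustrates the technique under stated assumptions: the tf and idf formulas in the header comment and a simple alphanumeric tokenizer. It is not the TFIDFVectoria implementation. `TinyTfIdfIndex` and its methods are hypothetical names.

```typescript
// Minimal TF-IDF search sketch. Illustrative only: NOT the TFIDFVectoria API.
// Assumed formulas (one common variant):
//   tf(t, d) = occurrences of t in d / number of tokens in d
//   idf(t)   = ln(N / df(t)), N = number of docs, df = docs containing t
//   score    = cosine(queryVector, docVector)

type SparseVector = Map<string, number>;

// Lowercase and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function cosine(a: SparseVector, b: SparseVector): number {
  let dot = 0;
  for (const [term, w] of a) dot += w * (b.get(term) ?? 0);
  const norm = (v: SparseVector): number =>
    Math.sqrt(Array.from(v.values()).reduce((sum, w) => sum + w * w, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

class TinyTfIdfIndex {
  private docs = new Map<string, string[]>(); // id -> tokens
  private idf = new Map<string, number>();
  private vectors = new Map<string, SparseVector>();

  addDocument(id: string, text: string): void {
    this.docs.set(id, tokenize(text));
  }

  // Rebuild IDF and every document vector; required after any change.
  reindex(): void {
    const df = new Map<string, number>();
    for (const tokens of this.docs.values()) {
      for (const t of new Set(tokens)) df.set(t, (df.get(t) ?? 0) + 1);
    }
    this.idf = new Map();
    for (const [t, count] of df) this.idf.set(t, Math.log(this.docs.size / count));
    this.vectors = new Map();
    for (const [id, tokens] of this.docs) this.vectors.set(id, this.vectorize(tokens));
  }

  // Weight each term by TF x IDF; terms unseen at index time are ignored.
  private vectorize(tokens: string[]): SparseVector {
    const vec: SparseVector = new Map();
    for (const t of tokens) {
      const idf = this.idf.get(t);
      if (idf !== undefined) vec.set(t, (vec.get(t) ?? 0) + idf / tokens.length);
    }
    return vec;
  }

  search(query: string, topK = 5): { id: string; score: number }[] {
    const q = this.vectorize(tokenize(query));
    const hits: { id: string; score: number }[] = [];
    for (const [id, vec] of this.vectors) {
      const score = cosine(q, vec);
      if (score > 0) hits.push({ id, score });
    }
    return hits.sort((a, b) => b.score - a.score).slice(0, topK);
  }
}

// Usage: tool-style descriptions ranked for a keyword query.
const index = new TinyTfIdfIndex();
index.addDocument("create_user", "Create a new user account");
index.addDocument("delete_user", "Delete an existing user account");
index.addDocument("send_email", "Send an email notification");
index.reindex();
console.log(index.search("create account")); // create_user first; send_email shares no terms and is omitted
```

“create” occurs in only one document, so it carries more weight than “account”, which occurs in two. In this variant, a term that appears in every document gets an IDF of ln(1) = 0 and contributes nothing to the ranking.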
Example: Tool Discovery
src/tfidf-tool-discovery.ts
Limitations
- No semantic understanding: “car” won’t match “automobile”
- Reindex requirement: you must call `reindex()` after changes
- Limited to keywords: misspellings and synonyms aren’t handled
- Memory for large vocabularies: IDF tables grow with vocabulary size
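The following sketch illustrates the first limitation. For brevity it uses raw term counts instead of full TF-IDF weights, which does not change the result. Cosine similarity only accumulates weight from terms that the query and document share, so a match on synonyms alone scores exactly 0 under any weighting scheme. The names `termCounts` and `cosine` are illustrative.

```typescript
// Sketch: purely lexical vectors cannot connect synonyms.
// Raw counts are used instead of TF-IDF weights; the zero result holds either way,
// because only terms present in both vectors contribute to the dot product.

function termCounts(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const t of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    counts.set(t, (counts.get(t) ?? 0) + 1);
  }
  return counts;
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  for (const [t, w] of a) dot += w * (b.get(t) ?? 0);
  const norm = (v: Map<string, number>): number =>
    Math.sqrt(Array.from(v.values()).reduce((s, w) => s + w * w, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

const doc = termCounts("Rent an automobile for the weekend");
console.log(cosine(termCounts("car rental"), doc));        // 0: no shared terms
console.log(cosine(termCounts("automobile rental"), doc)); // about 0.289: only "automobile" matches
```

The same sketch also shows the gap in word forms. “rental” does not match “rent” because this tokenizer applies no stemming.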
Hybrid Approach
For the best of both worlds, you can use TF-IDF as a pre-filter before semantic search:
src/tfidf-hybrid.ts
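Here is a sketch of that two-stage pipeline. Stage 1 ranks the whole corpus by TF-IDF cosine similarity and keeps the top `prefilterK` candidates. Stage 2 embeds only those candidates and reranks them. `embed`, `tfidfPrefilter`, `hybridSearch`, and the one-hot `toyEmbed` are hypothetical and are not part of the VectoriaDB API. In practice, stage 2 would call a real embedding model.

```typescript
// Hybrid search sketch: TF-IDF pre-filter, then semantic rerank.
// Assumptions: `embed` is a hypothetical stand-in for a real embedding model;
// TF-IDF uses tf = count / length and idf = ln(N / df).

type Doc = { id: string; text: string };
type Embed = (text: string) => number[];
type Hit = { id: string; score: number };

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Cosine similarity of two equal-length dense vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Stage 1: rank all docs by TF-IDF cosine and keep the top k with score > 0.
function tfidfPrefilter(query: string, docs: Doc[], k: number): Doc[] {
  const docTokens = docs.map((d) => tokenize(d.text));
  const df = new Map<string, number>();
  for (const tokens of docTokens) {
    for (const t of new Set(tokens)) df.set(t, (df.get(t) ?? 0) + 1);
  }
  const vocab = Array.from(df.keys());
  // Dense TF-IDF vector over the corpus vocabulary; unknown query terms are dropped.
  const vectorize = (tokens: string[]): number[] =>
    vocab.map((t) => {
      if (tokens.length === 0) return 0;
      const tf = tokens.filter((x) => x === t).length / tokens.length;
      return tf * Math.log(docs.length / (df.get(t) ?? 1));
    });
  const q = vectorize(tokenize(query));
  return docs
    .map((doc, i) => ({ doc, score: cosine(q, vectorize(docTokens[i])) }))
    .filter((h) => h.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((h) => h.doc);
}

// Stage 2: embed only the candidates and rerank them semantically.
function hybridSearch(query: string, docs: Doc[], embed: Embed, prefilterK = 50, topK = 5): Hit[] {
  const candidates = tfidfPrefilter(query, docs, prefilterK);
  const q = embed(query);
  return candidates
    .map((d) => ({ id: d.id, score: cosine(q, embed(d.text)) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Usage with a toy embedding (one dimension per keyword); swap in a real model.
const toyEmbed: Embed = (text) => {
  const tokens = tokenize(text);
  return ["create", "delete", "email"].map((w) => (tokens.includes(w) ? 1 : 0));
};
const docs: Doc[] = [
  { id: "create_user", text: "Create user account" },
  { id: "delete_user", text: "Delete user account" },
  { id: "send_email", text: "Send email notification" },
];
console.log(hybridSearch("create account", docs, toyEmbed, 2)); // create_user (score 1), then delete_user (score 0)
```

In this sketch, the pre-filter keeps embedding work proportional to `prefilterK` rather than to corpus size. The trade-off is that any document sharing no keywords with the query is discarded before the semantic stage can rescue it.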