
Retrieval Systems

The retrieval component is critical to RAG performance. This guide covers different retrieval approaches and how to choose between them.

Retrieval Methods

1. Dense Retrieval (Semantic Search)

Uses embedding vectors to find semantically similar documents.

How it works:

  • Query and documents are embedded into the same vector space
  • Distance/similarity computed between vectors
  • Top-k most similar documents returned
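
A minimal sketch of this pipeline, assuming the open-source sentence-transformers package and the all-MiniLM-L6-v2 model listed under Embedding Models below (the corpus and query here are illustrative):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "RAG combines retrieval with text generation.",
    "BM25 ranks documents by keyword overlap.",
    "Embeddings map text into a shared vector space.",
]

# Embed documents and query into the same vector space (unit-normalized)
doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode(["how do embeddings represent meaning?"], normalize_embeddings=True)

# With normalized vectors, the dot product equals cosine similarity
scores = (doc_vecs @ query_vec.T).ravel()

# Return the top-k most similar documents
top_k = np.argsort(-scores)[:2]
print([docs[i] for i in top_k])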

Advantages:

  • Captures semantic meaning
  • Works across languages and paraphrasing
  • Fast with proper indexing

Disadvantages:

  • Requires good embedding model
  • May miss exact keyword matches
  • Embedding costs

Best for:

  • Semantic understanding
  • Cross-lingual queries
  • Conceptual searches

2. Sparse Retrieval (BM25)

Traditional keyword-based full-text search.

How it works:

  • Inverted index of terms
  • BM25 scoring (a refinement of TF-IDF that also normalizes for document length)
  • Exact keyword matching
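
As a sketch, the rank_bm25 package implements this scoring over a tokenized corpus (the documents below are illustrative):

from rank_bm25 import BM25Okapi

corpus = [
    "the quick brown fox jumps over the lazy dog",
    "BM25 scores documents by term frequency and rarity",
    "inverted indexes map each term to the documents containing it",
]

# Build the index from whitespace-tokenized documents
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# Score every document against the tokenized query
scores = bm25.get_scores("bm25 term scoring".lower().split())
print(corpus[scores.argmax()])   # highest-scoring document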

Advantages:

  • Fast and lightweight
  • Handles exact matches well
  • Transparent and interpretable

Disadvantages:

  • No semantic understanding
  • Vocabulary mismatch issues
  • Less robust to paraphrasing

Best for:

  • Exact keyword searches
  • Structured documents
  • Low-resource environments

3. Hybrid Retrieval

Combines dense and sparse retrieval.

How it works:

  • Run both dense and sparse search
  • Combine results (e.g., reciprocal rank fusion)
  • Return merged top-k
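
Reciprocal rank fusion needs only the ranked ID lists, not the raw scores, which makes it easy to combine retrievers with incompatible score scales. A minimal sketch (k=60 is a common default and is tunable; the doc IDs are illustrative):

from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    # rankings: list of ranked lists of document IDs, best first
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)   # later ranks contribute less
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["d3", "d1", "d7"]    # from vector search
sparse_hits = ["d1", "d9", "d3"]   # from BM25
print(reciprocal_rank_fusion([dense_hits, sparse_hits]))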

Advantages:

  • Best of both worlds
  • Handles various query types
  • More robust

Disadvantages:

  • Slower (runs two searches)
  • More complex to tune

Best for:

  • Production systems
  • Mixed workloads
  • High accuracy requirements

4. Graph-Based Retrieval

Uses knowledge graphs or document relationships.

How it works:

  • Build graph of entities and relationships
  • Traverse graph based on query
  • Return relevant subgraph
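
One lightweight way to prototype this is with networkx: store entities and relations as a graph, then return the neighborhood around an entity mentioned in the query (the facts below are illustrative):

import networkx as nx

G = nx.Graph()
G.add_edge("Marie Curie", "radium", relation="discovered")
G.add_edge("Marie Curie", "Nobel Prize", relation="won")
G.add_edge("radium", "radioactivity", relation="exhibits")

def retrieve_subgraph(entity, hops=2):
    # ego_graph keeps every node within `hops` edges of the entity
    return nx.ego_graph(G, entity, radius=hops)

subgraph = retrieve_subgraph("Marie Curie")
print(list(subgraph.edges(data=True)))   # facts to include in the prompt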

Advantages:

  • Captures relationships
  • Structured knowledge
  • Reasoning over facts

Disadvantages:

  • Complex to build
  • Requires structured data
  • Higher setup cost

Best for:

  • Knowledge bases
  • Entity-centric queries
  • Complex reasoning

Chunking Strategies

How you split documents significantly impacts retrieval quality.

Fixed-Size Chunks

Split documents into fixed-length pieces (e.g., 512 tokens).

# Naive fixed-size split; slices characters here as a simple stand-in for tokens
chunks = [doc[i:i+512] for i in range(0, len(doc), 512)]

Pros: Simple, fast
Cons: Splits context awkwardly

Semantic Chunking

Split at semantic boundaries (paragraphs, sentences).

# Split at sentence boundaries
chunks = doc.split('. ')

Pros: Preserves context, natural boundaries
Cons: Variable chunk sizes

Sliding Window

Overlap chunks for context continuity.

# 512-character chunks with a 50-character overlap (stride = 512 - 50 = 462)
chunks = [doc[i:i+512] for i in range(0, len(doc), 462)]

Pros: Maintains context between chunks
Cons: Redundancy, increased storage

Recursive Chunking

Split hierarchically (paragraphs → sentences → tokens).

Pros: Optimal chunk sizes, preserves structure
Cons: More complex
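
A from-scratch sketch of the idea (similar in spirit to the recursive splitters in libraries like LangChain; the separators and 512-character limit are illustrative):

def recursive_split(text, max_len=512, separators=("\n\n", ". ", " ")):
    # Already small enough: keep as a single chunk
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) == 1:
            continue   # separator not present; try a finer-grained one
        chunks, current = [], ""
        for part in parts:
            candidate = f"{current}{sep}{part}" if current else part
            if len(candidate) <= max_len:
                current = candidate   # greedily pack pieces up to the limit
            else:
                if current:
                    chunks.append(current)
                current = part
        if current:
            chunks.append(current)
        # Recurse on any piece that is still too large
        return [c for chunk in chunks for c in recursive_split(chunk, max_len, separators)]
    # No separator found at any level: fall back to a hard fixed-size split
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]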

Embedding Models

The embedding model determines retrieval quality.

| Model                  | Dimensions | Speed     | Quality   |
| ---------------------- | ---------- | --------- | --------- |
| text-embedding-3-small | 1536       | Very Fast | Excellent |
| text-embedding-3-large | 3072       | Fast      | Excellent |
| all-MiniLM-L6-v2       | 384        | Very Fast | Good      |
| bge-base-en-v1.5       | 768        | Fast      | Very Good |
| e5-large               | 1024       | Medium    | Excellent |

Start with all-MiniLM-L6-v2 for prototyping (fast, local), then upgrade to text-embedding-3-small for production (better quality).

Retrieval Metrics

Standard Metrics

  • Precision@k: Fraction of the top-k results that are relevant
  • Recall@k: Fraction of all relevant documents that appear in the top-k
  • MRR (Mean Reciprocal Rank): Average of 1/rank of the first relevant result, taken across queries
  • NDCG (Normalized Discounted Cumulative Gain): Rewards placing highly relevant results near the top
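
For a single query these reduce to a few lines of arithmetic (the doc IDs and relevance labels below are illustrative):

relevant = {"d1", "d4", "d7"}               # ground-truth relevant documents
retrieved = ["d3", "d1", "d4", "d9", "d2"]  # top-5 results, best first

k = 5
hits = sum(1 for d in retrieved[:k] if d in relevant)
precision_at_k = hits / k               # 2/5 = 0.40
recall_at_k = hits / len(relevant)      # 2/3 ≈ 0.67

# Reciprocal rank of the first relevant result; MRR averages this over queries
rr = next((1 / (i + 1) for i, d in enumerate(retrieved) if d in relevant), 0.0)  # 1/2 = 0.50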

RAG-Specific Metrics

  • Context Relevance: Is retrieved context related to query?
  • Faithfulness: Is response supported by retrieved context?
  • Answer Relevance: Does response answer the actual question?

Improving Retrieval

Start Simple

Begin with dense retrieval (semantic search) and measure performance.

Add Metadata Filtering

Filter results by date, source, category before ranking.
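
A sketch of pre-filtering an in-memory corpus before ranking (the field names are illustrative; most vector databases expose this as a native filter parameter):

docs = [
    {"id": "d1", "source": "handbook", "date": "2024-03-01", "text": "..."},
    {"id": "d2", "source": "blog", "date": "2022-07-15", "text": "..."},
]

# Keep only candidates that satisfy the metadata constraints, then rank those
# (string comparison works here because the dates are ISO-formatted)
candidates = [d for d in docs if d["source"] == "handbook" and d["date"] >= "2024-01-01"]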

Use Hybrid Search

Combine dense and sparse retrieval and merge the results with reciprocal rank fusion.

Implement Reranking

Use cross-encoders to rerank top results.
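
A sketch using a cross-encoder from sentence-transformers (the checkpoint below is a commonly used public model; the query and passages are illustrative):

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "how do I rotate an API key?"
candidates = [
    "API keys can be rotated from the security settings page.",
    "Keyboard shortcuts speed up editing.",
    "Rotate credentials regularly to limit exposure.",
]

# Cross-encoders score each (query, passage) pair jointly; higher = more relevant
scores = reranker.predict([(query, c) for c in candidates])
reranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]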

Expand Queries

Generate multiple query variants, retrieve from all.
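
A toy sketch of the flow, reusing the reciprocal_rank_fusion helper sketched under Hybrid Retrieval; `retrieve` is a hypothetical stand-in for whichever retriever you use:

variants = [
    "how do I reset my password",      # original query
    "password reset steps",            # paraphrase (in practice often LLM-generated)
    "recover a forgotten password",    # paraphrase
]

# Retrieve a ranked list of doc IDs for each variant, then merge the rankings
rankings = [retrieve(v, top_k=10) for v in variants]   # `retrieve` = your retriever
merged = reciprocal_rank_fusion(rankings)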

Fine-tune Embeddings

Fine-tune embedding model on your domain.
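
A compact sketch with the sentence-transformers training API and MultipleNegativesRankingLoss, which treats each (query, relevant passage) pair as a positive and uses the other in-batch passages as negatives (the training pairs are illustrative):

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

# (query, relevant passage) pairs from your domain
train_examples = [
    InputExample(texts=["how to reset a password", "Reset it from the login page."]),
    InputExample(texts=["download an invoice", "Billing history lists all invoices."]),
]

loader = DataLoader(train_examples, shuffle=True, batch_size=2)
loss = losses.MultipleNegativesRankingLoss(model)

# One epoch is illustrative; tune epochs/warmup against a held-out set
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)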

Evaluate Thoroughly

Track metrics, analyze failures, iterate.

Common Pitfalls

  • Too Large Chunks: Dilutes signal with noise
  • Too Small Chunks: Breaks up important context
  • Poor Embeddings: Garbage in, garbage out
  • No Evaluation: Flying blind
  • Ignoring Metadata: Missing filtering opportunities