Retrieval Systems
The retrieval component is critical to RAG performance. This guide covers different retrieval approaches and how to choose between them.
Retrieval Methods
1. Dense Retrieval (Semantic Search)
Uses embedding vectors to find semantically similar documents.
How it works:
- Query and documents are embedded into the same vector space
- Distance/similarity computed between vectors
- Top-k most similar documents returned
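To make these steps concrete, here is a minimal sketch, assuming the sentence-transformers library and the all-MiniLM-L6-v2 model (which appears in the embedding table below); the toy corpus and query are illustrative only.

```python
# Minimal dense retrieval: embed, score by cosine similarity, take top-k.
# Assumes sentence-transformers and numpy are installed.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "The Eiffel Tower is located in Paris.",
    "Photosynthesis converts sunlight into chemical energy.",
    "RAG combines retrieval with text generation.",
]

# Embed documents and the query into the same vector space (unit-normalized)
doc_vecs = model.encode(documents, normalize_embeddings=True)
query_vec = model.encode(["How do plants make energy?"], normalize_embeddings=True)[0]

# On normalized vectors, cosine similarity reduces to a dot product
scores = doc_vecs @ query_vec
top_k = np.argsort(scores)[::-1][:2]
for i in top_k:
    print(f"{scores[i]:.3f}  {documents[i]}")
```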
Advantages:
- Captures semantic meaning
- Works across languages and paraphrasing
- Fast with proper indexing
Disadvantages:
- Requires good embedding model
- May miss exact keyword matches
- Embedding costs
Best for:
- Semantic understanding
- Cross-lingual queries
- Conceptual searches
2. Sparse Retrieval (BM25)
Traditional keyword-based full-text search.
How it works:
- Inverted index of terms
- BM25 scoring (a refinement of TF-IDF with term-frequency saturation and document-length normalization)
- Keyword matching
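A minimal sketch, assuming the rank_bm25 package (any BM25 implementation works) and naive whitespace tokenization:

```python
# Minimal BM25 retrieval over a toy corpus. Assumes rank_bm25 is installed.
from rank_bm25 import BM25Okapi

documents = [
    "error handling in python",
    "python retry logic with exponential backoff",
    "handling HTTP errors and retries",
]

# BM25 operates on tokens; a simple lowercase whitespace split is used here
tokenized = [doc.lower().split() for doc in documents]
bm25 = BM25Okapi(tokenized)

query = "python error handling".lower().split()
print(bm25.get_scores(query))                 # one score per document
print(bm25.get_top_n(query, documents, n=2))  # best-matching documents
```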
Advantages:
- Fast and lightweight
- Handles exact matches well
- Transparent and interpretable
Disadvantages:
- No semantic understanding
- Vocabulary mismatch issues
- Less robust to paraphrasing
Best for:
- Exact keyword searches
- Structured documents
- Low-resource environments
3. Hybrid Retrieval
Combines dense and sparse retrieval.
How it works:
- Run both dense and sparse search
- Combine results (e.g., reciprocal rank fusion)
- Return merged top-k
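A minimal sketch of reciprocal rank fusion; k=60 is the commonly used constant, and the document IDs are placeholders:

```python
# Reciprocal rank fusion: merge ranked ID lists from dense and sparse search.
from collections import defaultdict

def rrf(result_lists, k=60):
    """result_lists: ranked lists of document IDs, best first."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc3", "doc1", "doc7"]    # from semantic search
sparse_hits = ["doc1", "doc9", "doc3"]   # from BM25
print(rrf([dense_hits, sparse_hits]))    # documents found by both rise to the top
```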
Advantages:
- Best of both worlds
- Handles various query types
- More robust
Disadvantages:
- Slower (runs two searches)
- More complex to tune
Best for:
- Production systems
- Mixed workloads
- High accuracy requirements
4. Graph-Based Retrieval
Uses knowledge graphs or document relationships.
How it works:
- Build graph of entities and relationships
- Traverse graph based on query
- Return relevant subgraph
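A toy sketch using a plain adjacency dictionary as the graph; real systems typically use a graph database, and the entities here are invented for illustration:

```python
# Toy graph retrieval: expand outward from entities mentioned in the query.
from collections import deque

# Adjacency list standing in for a knowledge graph
graph = {
    "aspirin": ["acetylsalicylic acid", "pain relief", "blood thinning"],
    "blood thinning": ["warfarin", "bleeding risk"],
    "pain relief": ["ibuprofen"],
}

def retrieve_subgraph(seed_entities, max_hops=2):
    """Breadth-first traversal up to max_hops from the seed entities."""
    seen = set(seed_entities)
    queue = deque((entity, 0) for entity in seed_entities)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen

# Entities extracted from the query seed the traversal
print(retrieve_subgraph(["aspirin"]))
```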
Advantages:
- Captures relationships
- Structured knowledge
- Reasoning over facts
Disadvantages:
- Complex to build
- Requires structured data
- Higher setup cost
Best for:
- Knowledge bases
- Entity-centric queries
- Complex reasoning
Chunking Strategies
How you split documents significantly impacts retrieval quality.
Fixed-Size Chunks
Split documents into fixed-length pieces (e.g., 512 tokens).
```python
chunks = [doc[i:i+512] for i in range(0, len(doc), 512)]
```
Pros: Simple, fast
Cons: Splits context awkwardly
Semantic Chunking
Split at semantic boundaries (paragraphs, sentences).
```python
# Split at sentence boundaries
chunks = doc.split('. ')
```
Pros: Preserves context, natural boundaries
Cons: Variable chunk sizes
Sliding Window
Overlap chunks for context continuity.
```python
# 512-token chunks with a 50-token overlap (step = 512 - 50 = 462)
chunks = [doc[i:i+512] for i in range(0, len(doc), 462)]
```
Pros: Maintains context between chunks
Cons: Redundancy, increased storage
Recursive Chunking
Split hierarchically (paragraphs → sentences → tokens).
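A minimal sketch using character lengths and a fixed separator hierarchy; production splitters (e.g. LangChain's RecursiveCharacterTextSplitter) additionally merge adjacent small pieces back up to the size limit.

```python
# Recursive chunking: split on the coarsest separator first,
# then recurse into any piece that is still too long.
def recursive_chunk(text, max_len=512, separators=("\n\n", "\n", ". ", " ")):
    if len(text) <= max_len:
        return [text]
    if not separators:
        # No separators left: fall back to fixed-size slices
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, rest = separators[0], separators[1:]
    chunks = []
    for part in text.split(sep):
        if len(part) <= max_len:
            chunks.append(part)
        else:
            chunks.extend(recursive_chunk(part, max_len, rest))
    return [c for c in chunks if c.strip()]
```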
Pros: Optimal chunk sizes, preserves structure
Cons: More complex
Embedding Models
The embedding model determines retrieval quality.
| Model | Dimensions | Speed | Quality |
|---|---|---|---|
| text-embedding-3-small | 1536 | Very Fast | Excellent |
| text-embedding-3-large | 3072 | Fast | Excellent |
| all-MiniLM-L6-v2 | 384 | Very Fast | Good |
| bge-base-en-v1.5 | 768 | Fast | Very Good |
| e5-large | 1024 | Medium | Excellent |
Start with all-MiniLM-L6-v2 for prototyping (fast, local), then upgrade to text-embedding-3-small for production (better quality).
Retrieval Metrics
Standard Metrics
- Precision@k: Fraction of the top-k results that are relevant
- Recall@k: Fraction of all relevant documents that appear in the top-k
- MRR (Mean Reciprocal Rank): Average of the reciprocal rank of the first relevant result
- NDCG (Normalized Discounted Cumulative Gain): Rewards placing the most relevant documents near the top, using graded relevance
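A sketch of the first three metrics, assuming binary relevance judgments keyed by document ID:

```python
# Retrieval metrics over ranked results (binary relevance, ID-based).
def precision_at_k(retrieved, relevant, k):
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k

def recall_at_k(retrieved, relevant, k):
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant) if relevant else 0.0

def mrr(all_retrieved, all_relevant):
    """Mean of 1/rank of the first relevant result, averaged over queries."""
    total = 0.0
    for retrieved, relevant in zip(all_retrieved, all_relevant):
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(all_retrieved)

retrieved = ["d4", "d2", "d9"]
relevant = {"d2", "d7"}
print(precision_at_k(retrieved, relevant, 3))  # 0.33 (1 of 3 results relevant)
print(recall_at_k(retrieved, relevant, 3))     # 0.50 (1 of 2 relevant docs found)
print(mrr([retrieved], [relevant]))            # 0.50 (first hit at rank 2)
```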
RAG-Specific Metrics
- Context Relevance: Is retrieved context related to query?
- Faithfulness: Is response supported by retrieved context?
- Answer Relevance: Does response answer the actual question?
Improving Retrieval
Start Simple
Begin with dense retrieval (semantic search) and measure performance.
Add Metadata Filtering
Filter results by date, source, category before ranking.
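A toy sketch of pre-filtering candidate chunks on metadata before scoring; the field names are illustrative, and most vector stores expose an equivalent filter argument on their query API.

```python
# Filter candidates on metadata first, then embed/score only the survivors.
chunks = [
    {"text": "Q3 revenue grew 12%.", "source": "10-Q", "year": 2024},
    {"text": "The company was founded in 1998.", "source": "wiki", "year": 2020},
]

def filter_chunks(chunks, source=None, min_year=None):
    keep = chunks
    if source is not None:
        keep = [c for c in keep if c["source"] == source]
    if min_year is not None:
        keep = [c for c in keep if c["year"] >= min_year]
    return keep

candidates = filter_chunks(chunks, source="10-Q", min_year=2023)
# ...run dense/sparse retrieval over `candidates` instead of the whole corpus
```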
Try Hybrid Search
Combine dense + sparse retrieval, use reciprocal rank fusion.
Implement Reranking
Use cross-encoders to rerank top results.
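A minimal sketch, assuming sentence-transformers and one common public cross-encoder checkpoint:

```python
# Rerank first-stage candidates with a cross-encoder that scores
# each (query, passage) pair jointly. Assumes sentence-transformers is installed.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "how do I reset my password"
candidates = [
    "Resetting your password from the login screen.",
    "Changing your display name.",
    "Password requirements and expiration policy.",
]

scores = reranker.predict([(query, passage) for passage in candidates])
reranked = [p for _, p in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])  # most relevant passage after reranking
```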
Expand Queries
Generate multiple query variants, retrieve from all.
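A sketch of the retrieval side only; generate_variants() and search() are hypothetical stand-ins (the former might be an LLM call that paraphrases the query):

```python
# Retrieve with several phrasings of the query and merge the results.
def expand_and_retrieve(query, search, generate_variants, k=5):
    # generate_variants() is a hypothetical helper, e.g. an LLM paraphraser;
    # search() is any single-query retriever returning ranked document IDs.
    variants = [query] + generate_variants(query)
    seen, merged = set(), []
    for variant in variants:
        for doc_id in search(variant, top_k=k):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged[:k]
```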
Fine-tune Embeddings
Fine-tune embedding model on your domain.
Evaluate Thoroughly
Track metrics, analyze failures, iterate.
Common Pitfalls
- Too Large Chunks: Dilutes signal with noise
- Too Small Chunks: Breaks up important context
- Poor Embeddings: Garbage in, garbage out
- No Evaluation: Flying blind
- Ignoring Metadata: Missing filtering opportunities