Hybrid search: combining embeddings with BM25 for better RAG

Victor Huang, Mar 15, 2026

Pure vector search tends to miss keyword-heavy queries. I've implemented hybrid search (vector + BM25), and it gave a clear recall gain on our benchmark.

Architecture

1. Store documents in both a vector DB (Pinecone) and a text search engine (Elasticsearch)
2. Query both in parallel
3. Use Reciprocal Rank Fusion to combine results
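Step 2 can be sketched with a small thread pool, since both lookups are network-bound. This is only an illustration: `query_in_parallel` is a hypothetical helper, and `vector_search` / `bm25_search` are assumed to be callables of the form `(query, k)` that wrap the Pinecone and Elasticsearch calls.

```python
from concurrent.futures import ThreadPoolExecutor


def query_in_parallel(vector_search, bm25_search, query, k=10):
    """Run both retrievers concurrently and return (vector_results, bm25_results).

    vector_search and bm25_search are assumed callables taking (query, k);
    they are placeholders for whatever wraps the two backends.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        vector_future = pool.submit(vector_search, query, k)
        bm25_future = pool.submit(bm25_search, query, k)
        # .result() blocks until each call finishes and re-raises its exception, if any
        return vector_future.result(), bm25_future.result()
```

With this shape, total latency is roughly that of the slower backend rather than the sum of the two.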

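def hybrid_search(query, k=10, alpha=0.7):
    # Assumes get_embedding, pinecone_index, es and rrf_combine are defined elsewhere
    # Vector search: embed the query and fetch the k nearest neighbours
    embedding = get_embedding(query)
    vector_results = pinecone_index.query(vector=embedding, top_k=k)

    # BM25 search: ask for k hits explicitly (Elasticsearch returns 10 by default)
    bm25_results = es.search(index="docs", body={"size": k, "query": {"match": {"text": query}}})

    # Reciprocal Rank Fusion
    return rrf_combine(vector_results, bm25_results, alpha=alpha)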
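The post doesn't show `rrf_combine`, so here is a minimal sketch of what a weighted Reciprocal Rank Fusion step could look like. Everything in it is an assumption: the name `weighted_rrf`, the choice to take two ranked lists of document IDs rather than raw client responses, and the reading of `alpha` as the weight on the vector ranking (plain RRF has no such weight). The constant 60 is the value commonly used for RRF.

```python
from collections import defaultdict


def weighted_rrf(vector_ids, bm25_ids, alpha=0.7, rank_constant=60):
    """Fuse two ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    Each document scores weight / (rank_constant + rank) per list it appears in,
    with ranks starting at 1. alpha weights the vector list and (1 - alpha) the
    BM25 list; this weighting is an assumption, not the author's implementation.
    """
    scores = defaultdict(float)
    for ranked_ids, weight in ((vector_ids, alpha), (bm25_ids, 1.0 - alpha)):
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weight / (rank_constant + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

To feed it, the IDs would have to be pulled out of each response first, for example `[m.id for m in vector_results.matches]` for Pinecone and `[h["_id"] for h in bm25_results["hits"]["hits"]]` for Elasticsearch, assuming both stores use the same document IDs. Because RRF only looks at ranks, the cosine scores and BM25 scores never need to be normalized against each other.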

On our benchmark, hybrid search improved recall@10 from 0.78 to 0.91 compared to vector-only search.
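For anyone wanting to run the same comparison, recall@k can be computed roughly like this. The post doesn't say how the benchmark was scored, so this sketch assumes the common definition (the fraction of a query's relevant documents that appear in the top k, averaged over queries); `recall_at_k` and `mean_recall_at_k` are hypothetical helper names.

```python
def recall_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of the relevant documents that appear in the top-k retrieved IDs."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    hits = relevant.intersection(retrieved_ids[:k])
    return len(hits) / len(relevant)


def mean_recall_at_k(runs, k=10):
    """Average recall@k over (retrieved_ids, relevant_ids) pairs, one pair per query."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    return sum(scores) / len(scores)
```

Running `mean_recall_at_k` once on the vector-only results and once on the fused results, over the same labeled queries, gives two numbers that can be compared directly.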
