
Denser Retriever: Combining Keyword, Vector, and ML Re-Ranking for Superior RAG Performance

Agent Issue
7 min read · Nov 11, 2024


Denser Retriever uses gradient boosting to combine three complementary retrieval paradigms:

  • Keyword-Based Search: the classic lexical approach, using ranking functions such as BM25 to fetch documents that precisely match the query terms. It excels at exact matches but can miss semantically relevant content that uses different wording.
  • Vector Search with Embeddings: documents and queries are encoded into high-dimensional vectors with models like BERT or Sentence Transformers, capturing semantic relationships. This surfaces relevant results even when the exact keywords are absent.
  • Machine Learning Re-Rankers: after retrieving candidates with the methods above, a re-ranker — often a transformer-based model — refines the ordering based on deeper contextual understanding.
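To make the keyword side of this contrast concrete, here is a toy, self-contained sketch of BM25 scoring over made-up documents. It is not the denser-retriever implementation (which uses a production keyword engine); the corpus, query, and parameter values are illustrative.

```python
import math

# Toy corpus; in practice these would be real indexed documents.
docs = [
    "the cat sat on the mat",
    "dogs are loyal pets",
    "felines enjoy sitting on rugs",
]

def bm25_score(query, doc, corpus, k1=1.5, b=0.75):
    """Simplified BM25: rewards exact term overlap (keyword search)."""
    terms, words = query.split(), doc.split()
    avgdl = sum(len(d.split()) for d in corpus) / len(corpus)
    score = 0.0
    for t in terms:
        df = sum(1 for d in corpus if t in d.split())
        if df == 0:
            continue  # term appears nowhere: contributes nothing
        idf = math.log((len(corpus) - df + 0.5) / (df + 0.5) + 1)
        tf = words.count(t)
        score += idf * tf * (k1 + 1) / (
            tf + k1 * (1 - b + b * len(words) / avgdl)
        )
    return score

query = "cat on a mat"
ranked = sorted(docs, key=lambda d: bm25_score(query, d, docs), reverse=True)
print(ranked[0])  # -> "the cat sat on the mat"
```

Note the failure mode the bullets describe: BM25 ranks the exact-match document first but scores the paraphrase "felines enjoy sitting on rugs" barely above zero. Recovering that paraphrase is exactly what the embedding and re-ranking stages are for.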

Now, here’s where Denser Retriever shines.

It uses XGBoost, a powerful gradient boosting algorithm, to combine the scores from these different methods.

By training on features like keyword relevance scores, vector similarities, and re-ranker outputs, it learns the optimal way to weight each component to maximize retrieval performance.
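Denser Retriever's actual fusion model is XGBoost; the sketch below shows the underlying gradient-boosting idea in plain Python — fitting depth-1 trees to residuals over per-candidate feature rows (keyword score, vector similarity, re-ranker score). The feature values and labels are made up for illustration, and this is not the library's API.

```python
# Each row: [bm25_score, vector_sim, reranker_score]; label 1 = relevant.
# Made-up training data standing in for judged retrieval candidates.
X = [
    [2.3, 0.91, 0.88], [0.0, 0.85, 0.80], [1.9, 0.40, 0.35],
    [0.2, 0.30, 0.20], [2.8, 0.95, 0.97], [0.1, 0.25, 0.10],
]
y = [1, 1, 0, 0, 1, 0]

def fit_stump(X, residuals):
    """Best depth-1 regression tree: (feature, threshold, left, right)."""
    best = None
    for f in range(len(X[0])):
        for row in X:
            t = row[f]
            left = [r for xi, r in zip(X, residuals) if xi[f] <= t]
            right = [r for xi, r in zip(X, residuals) if xi[f] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left)
            err += sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, f, t, lv, rv)
    return best[1:]

def boost(X, y, rounds=20, lr=0.3):
    """Repeatedly fit stumps to residuals: the core of gradient boosting."""
    pred = [0.0] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        f, t, lv, rv = fit_stump(X, residuals)
        stumps.append((f, t, lv, rv))
        pred = [p + lr * (lv if xi[f] <= t else rv) for xi, p in zip(X, pred)]
    return stumps

def score(stumps, x, lr=0.3):
    """Fused relevance score for one candidate's feature row."""
    return sum(lr * (lv if x[f] <= t else rv) for f, t, lv, rv in stumps)

model = boost(X, y)
# The fused score ranks a strong candidate above a weak one.
print(score(model, [2.5, 0.9, 0.9]) > score(model, [0.1, 0.2, 0.1]))  # True
```

In production, XGBoost does the same residual-fitting with deeper trees, regularization, and a learning-to-rank objective, so the ensemble learns which of the three signals to trust for a given query rather than using fixed hand-tuned weights.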

In experiments on the Massive Text Embedding Benchmark (MTEB) datasets, this ensemble approach (denoted ES+VS+RR_n: keyword search, vector search, and re-ranker) significantly outperformed the baseline vector search (VS).
