Stop Saying RAG Is Dead
I’m tired of hearing “RAG is dead.” That’s why Ben Clavié and I put together this open five-part series on why RAG is not dead and what its future looks like.
What’s Actually Dead
Ben Clavié’s opener nailed it: what’s dead is the 2023 marketing version of RAG. Chuck documents into a vector database, do cosine similarity, call it a day. This approach fails because compressing entire documents into single vectors loses critical information.
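The whole pattern fits in a few lines, which is part of why it spread so fast. A minimal sketch, assuming the `sentence-transformers` library (the model name is only an example):

```python
# The 2023-style pipeline: one vector per document, cosine similarity, top-k.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model

docs = [
    "Our privacy policy covers data retention, encryption, and deletion.",
    "Quarterly revenue grew 12%, driven by the enterprise segment.",
]
# Each document is compressed into a single vector; whatever detail
# doesn't survive that compression is unrecoverable at query time.
doc_vecs = model.encode(docs, normalize_embeddings=True)

query_vec = model.encode(
    ["how long is customer data retained?"], normalize_embeddings=True
)[0]

scores = doc_vecs @ query_vec  # cosine similarity (vectors are normalized)
print(docs[int(np.argmax(scores))])
```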
But retrieval is more important than ever. LLMs are frozen at training time, so anything newer or more private than their training data has to come in through retrieval. And million-token context windows don’t make it economical or efficient to stuff everything into every query.
Takeaways From the Series
We’ve been measuring wrong. Nandan Thakur showed that traditional IR metrics optimize for finding the #1 result. RAG needs different goals: coverage (getting all the facts), diversity (corroborating facts), and relevance. Models that ace BEIR benchmarks often fail at real RAG tasks.
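A toy illustration of the difference (my sketch, not the FreshStack implementation): score a retrieved set by how many gold facts it covers, rather than by whether the single top hit is relevant.

```python
# Toy coverage metric: reward retrieving *all* the facts, not just the best
# single document. Keyphrase matching stands in for a real entailment check.
def coverage(retrieved_docs: list[str], gold_facts: dict[str, list[str]]) -> float:
    """Fraction of gold facts supported by at least one retrieved document.
    `gold_facts` maps a fact id to phrases that signal support for it."""
    covered = sum(
        1
        for phrases in gold_facts.values()
        if any(p.lower() in d.lower() for d in retrieved_docs for p in phrases)
    )
    return covered / len(gold_facts)

# A retriever can put one perfect document at rank 1 (great nDCG@1) and
# still leave half the facts uncovered -- exactly the failure mode above.
facts = {
    "retention": ["retained for 90 days"],
    "deletion": ["deleted on request"],
}
print(coverage(["Customer data is retained for 90 days."], facts))  # 0.5
```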
Retrieval can reason. Orion Weller’s models understand instructions like “find documents about data privacy using metaphors.” His Rank1 system generates explicit reasoning traces about relevance. These models find documents that traditional systems never surface.
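A rough sketch of the pointwise, reasoning-first reranking idea (a hypothetical helper in the spirit of Rank1, not Orion’s actual code); `llm` is any callable that maps a prompt string to a completion:

```python
# Pointwise reranking with an explicit reasoning trace. The model writes out
# *why* a document does or doesn't satisfy the instruction before judging.
def rerank_with_reasoning(llm, instruction: str, query: str, docs: list[str]):
    kept = []
    for doc in docs:
        prompt = (
            f"Instruction: {instruction}\n"
            f"Query: {query}\n"
            f"Document: {doc}\n\n"
            "Reason step by step about whether this document satisfies the "
            "instruction, then answer on the final line with exactly "
            "RELEVANT or NOT RELEVANT."
        )
        trace = llm(prompt)  # the reasoning trace is inspectable, not hidden
        verdict = trace.strip().splitlines()[-1]
        if "NOT" not in verdict.upper():
            kept.append((doc, trace))
    return kept

# Usage: rerank_with_reasoning(my_model_call,
#     "find documents about data privacy using metaphors",
#     "data privacy", candidate_docs)
```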
Single vectors lose information. Antoine Chaffin demonstrated how late-interaction models like ColBERT preserve token-level information. No more forcing everything into one conflicted representation. The result: 150M-parameter models outperforming 7B-parameter alternatives on reasoning tasks.
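The scoring trick is easy to state. A sketch of ColBERT-style MaxSim in plain NumPy (random vectors stand in for real token embeddings):

```python
# Late interaction in a nutshell: keep one vector per *token* and score
# with MaxSim instead of collapsing each text into a single vector first.
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """For each query token, take its best-matching document token, then
    sum. Shapes: (q, dim) and (d, dim), with rows L2-normalized."""
    sim = query_tokens @ doc_tokens.T    # (q, d) token-to-token similarities
    return float(sim.max(axis=1).sum())  # best document token per query token

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 128));  q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(40, 128)); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim(q, d))
```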
One map isn’t enough. Bryan Bischof and Ayush Chaurasia’s finale showed why we need multiple representations. Their art search demo finds the same painting through literal descriptions, poetic interpretations, or similar images, each backed by a different index. Stop searching for the perfect embedding. Build specialized representations and route intelligently.
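Mechanically, routing can be as simple as deciding which index (or indices) a query should hit. A toy sketch with hypothetical index names, not the demo’s code; in practice the router might be a classifier or an LLM:

```python
# Toy keyword router over specialized indices (hypothetical names).
def route(query: str) -> list[str]:
    q = query.lower()
    indices = []
    if any(w in q for w in ("similar", "looks like", "resembl")):
        indices.append("image_embedding_index")
    if any(w in q for w in ("feel", "evoke", "mood")):
        indices.append("poetic_caption_index")
    return indices or ["literal_caption_index"]  # sensible default

print(route("paintings that evoke a stormy mood"))  # ['poetic_caption_index']
print(route("works similar to Starry Night"))       # ['image_embedding_index']
```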
What’s Next
I think a path forward is to combine these ideas:
- Evaluation systems that measure what matters for your use case
- Retrievers that understand instructions and reason about relevance
- Representations that preserve information instead of compressing it away
- Multiple specialized indices with intelligent routing
Annotated Notes From the Series
Each post includes timestamped annotations of slides, saving you hours of video watching. We’ve highlighted the most important bits and provided context for quickly grokking the material.
| Title | Description |
|---|---|
| Part 1: I don’t use RAG, I just retrieve documents | Ben Clavié explains why naive single-vector search is dead, not RAG itself |
| Part 2: Modern IR Evals For RAG | Nandan Thakur shows why traditional IR metrics fail for RAG and introduces FreshStack |
| Part 3: Optimizing Retrieval with Reasoning Models | Orion Weller demonstrates retrievers that understand instructions and reason about relevance |
| Part 4: Late Interaction Models For RAG | Antoine Chaffin reveals how ColBERT-style models preserve information that single vectors lose |
| Part 5: RAG with Multiple Representations | Bryan Bischof and Ayush Chaurasia show why multiple specialized indices beat one perfect embedding |