These days RAG systems have gotten popular for boosting LLMsโbut they're brittle๐. Minor shifts in phrasing (โ๏ธ style, politeness, typos) can wreck the pipeline. Even advanced components donโt fix the issue.
Check out this extensive eval by @neelbhandari.bsky.social and @tianyucao.bsky.social!
Akhila Yerukola
1/๐จ ๐ก๐ฒ๐ ๐ฝ๐ฎ๐ฝ๐ฒ๐ฟ ๐ฎ๐น๐ฒ๐ฟ๐ ๐จ
RAG systems excel on academic benchmarks - but are they robust to variations in linguistic style?
We find RAG systems are brittle. Small shifts in phrasing trigger cascading errors, driven by the complexity of the RAG pipeline ๐งต