Inlay

Real user queries often look different from the clean, concise ones in academic benchmarks - ambiguity, full of typos, and much less readable. We show that even strong RAG systems quickly break under these conditions. Awesome project led by @neelbhandari.bsky.social and @tianyucao.bsky.social!!

1/🚨 𝗡𝗲𝘄 𝗽𝗮𝗽𝗲𝗿 𝗮𝗹𝗲𝗿𝘁 🚨 RAG systems excel on academic benchmarks - but are they robust to variations in linguistic style? We find RAG systems are brittle. Small shifts in phrasing trigger cascading errors, driven by the complexity of the RAG pipeline 🧵