It's like Gmail for your papers: a modern reference manager. We love papers and post about publishing, academic productivity, and everything related.

Paperpile

With Paperpile's AI assistant integration, you can send any paper in your library to Claude, NotebookLM, or ChatGPT in one click. Every AI answer links back to the exact quote in the original PDF. No hallucinated references. No "close enough" paraphrasing.

First, the tool: the Paperpile Citation Checker. Upload your BibTeX, and it verifies every reference in real time. When it's done, you can download a cleaned-up file or share the results URL with coauthors.

A NeurIPS submission got caught with hallucinated citations. The authors admitted they had asked ChatGPT to fill in BibTeX from paraphrased in-text citations. We ran a simulation to see how often this fails, and built a free tool to catch it. #AcademicSky

How does this happen? A case study on OpenReview spells it out: authors gave ChatGPT paraphrased citations (author-year in-text citations, or titles) and asked it to generate properly formatted BibTeX. Instead of looking up the real metadata, the model hallucinated it.

We also found a strong inverse relationship between citation count and hallucination risk. Highly cited papers appear more often in training data, so models get them right. Newer and less-cited references are more prone to AI-induced hallucinations.

Our recommendation: using AI to generate a bibliography is not worth the risk. Full blog post: paperpile.com/blog/citation-checker-hallucinations/

Why now? This month alone, large-scale analyses in The Lancet and on arXiv highlighted how widespread hallucinated citations have become. And arXiv announced a severe penalty for submitting manuscripts with hallucinations.

We spot-checked today's top AI assistants on the same task. ChatGPT and Claude were relatively clean. Gemini had more hallucinations. The difference? ChatGPT and Claude ran dozens of web searches while generating the output. Web search is what grounds these tools.

We ran a simulation to understand this. We took 76 references, paraphrased them, and asked different models to convert them to BibTeX.
- The “thinking” or “pro” model in an AI assistant outperforms the “instant” or “lite” model.
- Including web search results can reduce hallucinations.