Graham Neubig · Jun 9, 2025
Where does one language model outperform the other? We examine this from first principles, performing unsupervised discovery of "abilities" that one model has and the other does not. Results show interesting differences between model classes, sizes and pre-/post-training.
Quoting Lindia Tjuatja · Jun 9, 2025:
When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs: 🧵1/9

Ramon Astudillo · Feb 8, 2025
Nice contribution to the understanding of Long CoT induction arxiv.org/abs/2502.03373 by Edward Yeo and colleagues (advised by @gneubig.bsky.social and @xiangyue96.bsky.social). It's hard not to see this as mostly a negative result on induction at the 8B scale. 👇
Link: Demystifying Long Chain-of-Thought Reasoning in LLMs (arxiv.org)
Scaling inference compute enhances reasoning in large language models (LLMs), with long chains-of-thought (CoTs) enabling strategies like backtracking and error correction. Reinforcement learning (RL)...

Xuhui Zhou · Feb 19, 2025
LLM agents can code, but can they ask clarifying questions? 🤖💬 Tired of coding agents wasting time and API credits, only to output broken code? What if they asked first instead of guessing? 🚀 (New work led by Sanidhya Vijay: www.linkedin.com/in/sanidhya-...)

Graham Neubig · Nov 27, 2024
We are now done with all classes for CMU CS11-711 Advanced NLP! Slides: phontron.com/class/anlp-f... Videos: youtube.com/playlist?lis... Hope this is useful to people 😀
Link: Schedule (phontron.com): The weekly event schedule.

Akari Asai · Nov 19, 2024
1/ Introducing ᴏᴘᴇɴꜱᴄʜᴏʟᴀʀ: a retrieval-augmented LM to help scientists synthesize knowledge 📚 @uwnlp.bsky.social & Ai2 With open models & 45M-paper datastores, it outperforms proprietary systems & matches human experts. Try out our demo! openscholar.allen.ai

Lindia Tjuatja · Nov 20, 2024
💬 Have you or a loved one compared LM probabilities to human linguistic acceptability judgments? You may be overcompensating for the effect of frequency and length! 🌟 In our new paper, we rethink how we should be controlling for these factors 🧵: