I felt honoured and was very happy to get an Outstanding Contribution in NLG award from @siggen.bsky.social !

Took lots of nice pictures at my Retroeval retirement symposium. One of my favourites was me with my first and last PhD students, Sandra Williams and Yujun Wang!

Someone asked me what the highlights of my career were, and I responded with a list of papers which I was proud of. I did not mention grants, awards, jobs, etc. I know some people are proudest of their grants (etc), but for me it was always scientific outputs.

I'm looking forward to my retirement workshop, which starts on Monday 1 June!! Will be great to catch up with former students and colleagues, and also discuss NLG evaluation. retroeval.github.io

I remember seeing very dubious advice from OpenAI a few years ago on evaluation. So I was happy to see quite sensible recent advice from Anthropic on evaluation www.anthropic.com/engineering/...


Really interesting scoping review that points out numerous flaws in LLM-as-Judge evaluation in healthcare, including minimal human oversight, absent bias testing, model monoculture, ignored implicit eval components, no check for consistency over time (etc). arxiv.org/abs/2604.25933

I wrote a paper on "NLG Evaluation: Past, Present, Future" for Retroeval. Eval has changed enormously over my career! In future, I expect more on stuff relevant to real-world usage, including impact, qualitative studies, and safety in worst/adversarial cases. arxiv.org/abs/2605.23715

New blog: I am worried by NLP research culture. NLG and NLP are mostly much better in 2026 than when I got my PhD in 1990. Unfortunately research culture has gotten *worse* in this period, which really worries me as I retire. ehudreiter.com/2026/06/08/n...

On Tuesday at the #RetroEval symposium the SIGGEN board awarded Ehud the Inaugural Prize for Outstanding Contributions to Natural Language Generation, future editions of which will bear his name as the Ehud Reiter Prize for Outstanding Contributions to Natural Language Generation. Thank you, Ehud!

We had a fantastic time last week discussing the current challenges in NLG evaluation and celebrating the career of @ehudreiter.bsky.social. Pictures and a few video clips are now available: retroeval.github.io/_pages/media I would like to thank @uniofaberdeen.bsky.social for their support.



SIGGEN & INLG

Saad Mahamood