Key to this is decoupling environment interaction from language generation while preserving the reasoning capabilities of pre-trained models.
Project page: expa-rl.github.io
Pre-print: arxiv.org/abs/2510.07581
PS. Nick is on the job market!
Expanding the Action Space of LLMs to Reason Beyond Language