LLMs are autoregressive and slow? No! Parallel Token Prediction decodes multiple consistent tokens in one model call. PTP allows arbitrary dependencies in one call, unlike discrete diffusion. Practical: 2.4x speedup
github.com/mandt-lab/ptp
ICLR: Apr 23, morning poster P3-#608
Video
Felix Draxler