Inlay

LLMs are autoregressive and slow? No! Parallel Token Prediction decodes multiple consistent tokens in one model call. PTP allows arbitrary dependencies in one call, unlike discrete diffusion. Practical: 2.4x speedup github.com/mandt-lab/ptp ICLR: Apr 23, morning poster P3-#608