🍏 New preprint alert! 🍏

PoM: Efficient Image and Video Generation with the Polynomial Mixer
arxiv.org/abs/2411.12663

This is my latest "summer project", and it was so big I had to call in reinforcements (thanks, @nicolasdufour.bsky.social).

TL;DR: Transformers are for boomers, welcome to the future 🧵👇

Diffusion models based on Multi-Head Attention (MHA) have become ubiquitous for generating high-quality images and videos. However, encoding an image or a video as a sequence of patches results in costly...