New preprint alert!
PoM: Efficient Image and Video Generation with the Polynomial Mixer
arxiv.org/abs/2411.12663
This is my latest "summer project" and it was so big I had to call in reinforcements (Thanks @nicolasdufour.bsky.social)
TL;DR Transformers are for boomers, welcome to the future
🧵
Diffusion models based on Multi-Head Attention (MHA) have become ubiquitous for generating high-quality images and videos. However, encoding an image or a video as a sequence of patches results in costly...