Check out this recent work by my PhD student Moayed, who has been doing amazing work on generative AI for images, video, and audio. We introduce AV-Link ♾️, a unified approach for audio-video generation. Our generated audio is the best in terms of synchronization with video actions. Check the thread below.

Can pretrained diffusion models be connected for cross-modal generation?

📢 Introducing AV-Link ♾️: bridging unimodal diffusion models in one self-contained framework to enable:
📽️ ➡️ 🔊 Video-to-Audio generation
🔊 ➡️ 📽️ Audio-to-Video generation

🌐: snap-research.github.io/AVLink/

⤵️ Results