Check out this recent work by my PhD student Moayed. He has been doing amazing work on Generative AI for images, video and audio. We introduce AV-Link, a unified approach for audio-video generation. Our generated audio is the best in terms of synchronization with video actions. Check the thread below.
Vicente Ordonez
Can pretrained diffusion models be connected for cross-modal generation?
Introducing AV-Link
Bridging unimodal diffusion models in one self-contained framework to enable:
Video-to-Audio generation.
Audio-to-Video generation.
Link: snap-research.github.io/AVLink/
Results below