Can pretrained diffusion models be connected for cross-modal generation?
📢 Introducing AV-Link ♾️
Bridging unimodal diffusion models in one self-contained framework to enable:
📽️ ➡️ 🔊 Video-to-Audio generation.
🔊 ➡️ 📽️ Audio-to-Video generation.
Link: snap-research.github.io/AVLink/
⤵️ Results
Moayed Haji Ali