Check out this recent work by my PhD student Moayed. He has been doing amazing work on generative AI for images, video, and audio. We introduce AV-Link ♾️, a unified approach for audio-video generation. Our generated audio is the best in terms of synchronization with video actions. Check the thread below.
Jan 14, 2025
Besides Video to Audio (📽️ ➡️🔊), we also support Audio to Video (🔊➡️📽️) generation under the same unified framework.
Compared to Meta Movie Gen Video to Audio, we achieve significantly better temporal synchronization with a model that is 90% smaller.
Vicente Ordonez
Can pretrained diffusion models be connected for cross-modal generation? 📢 Introducing AV-Link ♾️ Bridging unimodal diffusion models in one self-contained framework to enable: 📽️ ➡️ 🔊 Video-to-Audio generation. 🔊 ➡️ 📽️ Audio-to-Video generation. 🌐: snap-research.github.io/AVLink/ ⤵️ Results
Jan 14, 2025
ICLR rejections go brrrr
After X (aka good old Twitter) kept shadow-banning me for no apparent reason, I decided to give Bluesky a try. Posting this to test my reach.
Jan 14, 2025
Jan 14, 2025
Jan 22, 2025
While current approaches use external pretrained features (e.g. Meta CLIP, BEATs), we found that diffusion activations hold rich, semantically and temporally aware features, making them perfect for cross-modal generation in a self-contained framework. 🔊➡️📽️ Example:
Jan 10, 2025
Jan 14, 2025
A great collaboration with W. Menapace, A. Siarohin, I. Skorokhodov, A. Canberk, K.S. Lee, V. Ordonez, and S. Tulyakov. Please repost to support our work and check out our arXiv preprint: arxiv.org/abs/2412.15191 Webpage: snap-research.github.io/AVLink/
Moayed Haji ALi
Jan 14, 2025
Precise temporal synchronization remains a significant challenge for current video-to-audio models. AV-Link addresses this by leveraging diffusion features to accurately capture both local and global temporal events, such as hand slides on a guitar and fretboard pitch changes.
Jan 14, 2025
Jan 14, 2025
Moayed Haji ALi