Research Scientist @ IBM Research. Postdoc @ Berkeley AI. PhD @ Tel Aviv University. Working on Compositionality, Multimodal Foundation Models, and Structured Physical Intelligence. 🔗 https://roeiherz.github.io/ 📍Bay Area 🇺🇲
Roei Herzig
The CVPR panel at the "What is Next in Multimodal Foundation Models?" workshop kicks off soon! 11:30AM, R207 A–D (Level 2). Don't miss an amazing discussion with Ludwig Schmidt, @andrewowens.bsky.social, Arsha Nagrani, and Ani Kembhavi 🔥 @cvprconference.bsky.social sites.google.com/view/mmfm3rd...
Jun 12, 2025
The best friend of Auto-regressive Robotic Models is 4D representations...🤖😻❤️
For example, VLAs use language decoders, which are pretrained on tasks like visual question answering and image captioning. This presents a discrepancy between the models’ high-level pre-training objective and the need for robotic models to predict low-level actions.
Our workshop "What is Next in Multimodal Foundation Models?" has been accepted to #CVPR for the 3rd time! We are cooking up amazing talks and an excellent panel for you, so stay tuned! @cvprconference.bsky.social
Oh no, I have a NeurIPS @neuripsconf.bsky.social FOMO🙃😃🤗 Or is it actually more of Taylor Swift?🫠
What happens when vision🤝 robotics meet? 🚨 Happy to share our new work on Pretraining Robotic Foundational Models!🔥 ARM4R is an Autoregressive Robotic Model that leverages low-level 4D Representations learned from human video data to yield a better robotic model. BerkeleyAI 😊
Pretraining has significantly contributed to recent Foundational Model success. However, in robotics, progress has been limited due to a lack of robotic annotations and insufficient representations that accurately model the physical world.
Feb 20, 2025
Our paper: arxiv.org/pdf/2502.13142. Our project page and code will be released soon! Team: w/ Dantong Niu, Yuvan Sharma, Haoru Xue, Giscard Biamby, Junyi Zhang, Ziteng Ji, and Trevor Darrell.
Feb 24, 2025
For all our @neuripsconf.bsky.social friends🤖🦋, our work is being presented NOW at POSTER #3701. Come hear us talk about our work on many-shot in-context learning and test-time scaling by leveraging the activations! You won't be disappointed😎 #Multimodal-InContextLearning #NeurIPS
We found that 4D representations maintain a shared geometric structure between the points and robot state representations up to a linear transformation, thus enabling efficient transfer learning from human video data to low-level robotic control.
Dec 21, 2024
Dec 10, 2024
Feb 24, 2025
Feb 24, 2025
Feb 24, 2025
Dec 12, 2024
Feb 24, 2025
Hilde Kuehne