Zilin Wang
3mo

1/ 🚗 🌏 What if an autonomous vehicle could move to a new city without collecting a single human demonstration in that city? I am so excited to introduce our new work: Learning to Drive in New Cities Without Human Demonstrations.

2/ 🤔 Autonomous vehicles now outperform humans within specific operating regions, but their deployment to new cities remains costly and slow. Key bottleneck: Collecting human demonstrations from new cities.

3/ 💡 However, the lane-level map and traffic meta-information are prevalent and inexpensive. Our solution: Adapting driving policies to new cities by map-based self-play multi-agent reinforcement learning.
❌ Target-city human trajectories.
❌ Expensive data collection loop.

4/ 🚄 We introduce NO data Map-based self-play for Autonomous Driving (NOMAD), which adapts a driving policy to a new city using only the city's lane-level map + meta info. The policy is trained via KL-regularized self-play in a target-city simulator.

5/ 🚀 The result (closed-loop Boston → Singapore):
- Success: 42% → >90%
- Realism (WOSAC): 0.679 → ~0.765
The resulting policies form an upper envelope that Pareto-dominates baselines. We use a simple reward function and the same hyperparameters for all city pairs.

6/ 🔬 We also analyze:
• role of behavioral priors
• necessity of target-city maps
• comparison to methods that do use target-city demos
• generalization across cities
• sensitivity to KL strength
• evaluation under non-self-play agents
• effect of map mirroring

7/ 🔮 NOMAD substantially narrows cross-city generalization gaps, supporting scalable deployment of autonomous driving systems across diverse environments and highlighting the promise of self-play MARL for improving safety and robustness. Paper: arxiv.org/abs/2602.15891

8/ ❤️ Thanks to my amazing collaborators @saeedrmd.bsky.social, @daphne-cornelisse.bsky.social, @bidiptas13.bsky.social, @alexdgoldie.bsky.social, @jfoerst.bsky.social, @shimon8282.bsky.social. Thanks to all colleagues for the helpful discussions. If you’re into AVs / RL, we’d love your thoughts!
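
For readers unfamiliar with the "KL-regularized self-play" mentioned in post 4/, here is a minimal sketch of one common way such a penalty is applied. It is not taken from the paper: the function names, the discrete action distributions, and the per-step reward-shaping form `r - beta * KL(policy || prior)` are all assumptions for illustration. The idea is that the policy is rewarded for the driving task but penalized for drifting away from a behavioral prior, with `beta` controlling the strength of that pull.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities.

    eps guards against log(0); terms with p_i == 0 contribute nothing.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def kl_regularized_reward(task_reward, policy_probs, prior_probs, beta):
    """Hypothetical shaped reward: task reward minus a KL penalty to the prior.

    beta is the (assumed) KL strength; beta = 0 recovers the plain task reward.
    """
    return task_reward - beta * kl_divergence(policy_probs, prior_probs)


# Illustrative distributions over two actions (made-up values).
prior = [0.5, 0.5]
policy = [0.9, 0.1]
shaped = kl_regularized_reward(1.0, policy, prior, beta=0.1)
```

A larger `beta` keeps the adapted policy closer to the prior, while a smaller one lets the task reward dominate, which is presumably what the "sensitivity to KL strength" analysis in post 6/ examines.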
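
Post 5/ says the resulting policies Pareto-dominate the baselines on success and realism. As a rough illustration of what that claim means, the sketch below checks dominance between (success, realism) points where higher is better on both axes. The helper names are hypothetical, and the two points simply reuse the approximate figures quoted in the post (">90%" is represented as 0.90 and "~0.765" as 0.765), so they are illustrative rather than exact results.

```python
def dominates(a, b):
    """True if point a Pareto-dominates b (all metrics are higher-is-better).

    a must be at least as good as b on every metric and strictly better on one.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q != p)
    ]


# (success rate, WOSAC realism), approximate values quoted in the thread.
baseline = (0.42, 0.679)
adapted = (0.90, 0.765)

adapted_wins = dominates(adapted, baseline)
front = pareto_front([baseline, adapted])
```

With several checkpoints or KL strengths, `pareto_front` would return the "upper envelope" of points that trade success against realism without being beaten on both.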