1/ 🚗 🌏 What if an autonomous vehicle could move to a new city without collecting a single human demonstration in that city?
I am so excited to introduce our new work: Learning to Drive in New Cities Without Human Demonstrations.
Zilin Wang
2/ 🤔 Autonomous vehicles now outperform humans within specific operating regions, but deploying them to new cities remains costly and slow.
Key bottleneck: collecting human demonstrations in each new city.
8/ ❤️ Thanks to my amazing collaborators @saeedrmd.bsky.social, @daphne-cornelisse.bsky.social, @bidiptas13.bsky.social, @alexdgoldie.bsky.social, @jfoerst.bsky.social and @shimon8282.bsky.social, and to all our colleagues for the helpful discussions.
If you’re into AVs / RL, we’d love your thoughts!
4/ 🚄 We introduce NO data Map-based self-play for Autonomous Driving (NOMAD), which adapts a driving policy to a new city using only that city's lane-level map and traffic meta-information. The policy is trained with KL-regularized self-play in a target-city simulator.
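For readers less familiar with KL-regularized RL, here is a minimal sketch of one common way to set it up. It assumes the regularizer is a per-step KL penalty between the self-play policy and a fixed behavioral prior (for example, a policy trained on source-city data) over a discrete action set. The function names and the strength `beta` are hypothetical; this is not the NOMAD implementation.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same action set.

    Assumes q_i > 0 wherever p_i > 0; otherwise the divergence is infinite.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:  # terms with p_i = 0 contribute nothing
            total += p_i * math.log(p_i / q_i)
    return total


def kl_regularized_reward(task_reward, policy_probs, prior_probs, beta):
    """Hypothetical shaped reward: task reward minus beta * KL(policy || prior).

    Larger beta keeps the self-play policy closer to the behavioral prior
    (e.g. more human-like driving); smaller beta lets it optimize the task
    reward more freely.
    """
    return task_reward - beta * kl_divergence(policy_probs, prior_probs)


# Usage with made-up action distributions (illustration only):
policy = [0.7, 0.2, 0.1]
prior = [0.5, 0.3, 0.2]
shaped = kl_regularized_reward(1.0, policy, prior, beta=0.1)
```

The KL strength `beta` is the knob that trades task success against staying close to the prior, which is why sensitivity to KL strength is worth analyzing.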
3/ 💡 However, lane-level maps and traffic meta-information are widely available and inexpensive.
Our solution: adapt driving policies to new cities with map-based self-play multi-agent reinforcement learning (MARL).
❌ Target-city human trajectories. ❌ Expensive data-collection loop.
7/ 🔮 NOMAD substantially narrows cross-city generalization gaps, supporting scalable deployment of autonomous driving systems across diverse environments and highlighting the promise of self-play MARL for improving safety and robustness.
Paper: arxiv.org/abs/2602.15891
6/ 🔬 We also analyze:
• role of behavioral priors
• necessity of target-city maps
• comparison to methods that do use target-city demos
• generalization across cities
• sensitivity to KL strength
• evaluation under non-self-play agents
• effect of map mirroring
5/ 🚀 The result (closed-loop Boston → Singapore):
- Success: 42% → >90%
- Realism (WOSAC): 0.679 → ~0.765
The resulting policies form an upper envelope that Pareto-dominates the baselines on success and realism.
We use a simple reward function and the same hyperparameters for all city pairs.
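To make "Pareto-dominates" concrete: with both metrics higher-is-better, one policy dominates another if it is at least as good on success rate and realism and strictly better on at least one; the upper envelope is the set of policies that no other policy dominates. A minimal sketch with hypothetical function names; the usage line plugs in the approximate headline numbers above (">90%" read as 0.90).

```python
def dominates(a, b):
    """True if point a Pareto-dominates point b.

    Points are tuples of metrics where higher is better,
    e.g. (success_rate, realism_score).
    """
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points not dominated by any other point (the upper envelope).

    A point never dominates itself, so no self-comparison guard is needed.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Approximate headline numbers: baseline vs. NOMAD, as (success, WOSAC realism).
baseline = (0.42, 0.679)
nomad = (0.90, 0.765)
nomad_dominates = dominates(nomad, baseline)
```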