1/ 🚗 🌏 What if an autonomous vehicle could move to a new city without collecting a single human demonstration in that city? I am so excited to introduce our new work: Learning to Drive in New Cities Without Human Demonstrations.

3mo

Zilin Wang

2/ 🤔 Autonomous vehicles now outperform humans within specific operating regions, but their deployment to new cities remains costly and slow. Key bottleneck: Collecting human demonstrations from new cities.

8/ ❤️ Thanks to my amazing collaborators @saeedrmd.bsky.social, @daphne-cornelisse.bsky.social, @bidiptas13.bsky.social, @alexdgoldie.bsky.social, @jfoerst.bsky.social, and @shimon8282.bsky.social. Thanks to all colleagues for the helpful discussions. If you’re into AVs / RL, we’d love your thoughts!

4/ 🚄 We introduce NO data Map-based self-play for Autonomous Driving (NOMAD), which adapts a driving policy to a new city using only the city's lane-level map and traffic meta-information. The policy is trained via KL-regularized self-play in a target-city simulator.
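
To make "KL-regularized" concrete, here is a minimal sketch of one common form of such an objective: expected return minus a KL penalty that keeps the adapted policy close to a reference policy. Everything in it is an assumption for illustration, not NOMAD's actual loss: the discrete action space, the KL direction (policy relative to prior), and the names `kl_regularized_objective`, `prior_logits`, `action_values`, and `beta` are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def kl_regularized_objective(policy_logits, prior_logits, action_values, beta):
    """Expected value under the policy minus beta * KL(policy || prior).

    Hypothetical illustration: `prior_logits` stands in for a reference
    policy, `action_values` for per-action return estimates, and `beta`
    for the KL strength.
    """
    pi = softmax(policy_logits)
    prior = softmax(prior_logits)
    expected_value = sum(p * v for p, v in zip(pi, action_values))
    return expected_value - beta * kl_divergence(pi, prior)


if __name__ == "__main__":
    policy = [2.0, 0.0, -1.0]   # made-up logits for three actions
    prior = [0.5, 0.5, 0.0]     # made-up reference-policy logits
    values = [1.0, 0.2, -0.5]   # made-up per-action values
    for beta in (0.0, 0.1, 1.0):
        print(beta, kl_regularized_objective(policy, prior, values, beta))
```

With `beta = 0` the penalty vanishes and only the expected value remains; raising `beta` pulls the optimum back toward the reference policy, which is the trade-off a KL-strength sweep would probe.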

3/ 💡 However, lane-level maps and traffic meta-information are widely available and inexpensive. Our solution: adapt driving policies to new cities with map-based self-play multi-agent reinforcement learning. ❌ Target-city human trajectories. ❌ Expensive data collection loop.

7/ 🔮 NOMAD substantially narrows cross-city generalization gaps, supporting scalable deployment of autonomous driving systems across diverse environments and highlighting the promise of self-play MARL for improving safety and robustness. Paper: arxiv.org/abs/2602.15891

6/ 🔬 We also analyze: • role of behavioral priors • necessity of target-city maps • comparison to methods that do use target-city demos • generalization across cities • sensitivity to KL strength • evaluation under non-self-play agents • effect of map mirroring

5/ 🚀 The result (closed-loop Boston → Singapore): • Success: 42% → >90% • Realism (WOSAC): 0.679 → ~0.765. The resulting policies form an upper envelope that Pareto-dominates baselines. We use a simple reward function and the same hyperparameters for all city pairs.
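
For readers unfamiliar with the term, "Pareto-dominates" can be sketched in a few lines: one policy dominates another if it is at least as good on every metric and strictly better on at least one. The helper names `dominates` and `pareto_front` are hypothetical, and the two points below simply round the success and realism figures quoted above (treating ">90%" as 0.90 and "~0.765" as 0.765); this is not the paper's evaluation code.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` on every metric and
    strictly better on at least one (higher is better for all metrics)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


if __name__ == "__main__":
    # (success rate, realism) pairs, rounded from the figures in the post.
    before_adaptation = (0.42, 0.679)
    after_adaptation = (0.90, 0.765)

    print(dominates(after_adaptation, before_adaptation))       # True
    print(pareto_front([before_adaptation, after_adaptation]))  # [(0.9, 0.765)]
```

A set of policies forms an "upper envelope" in this sense when every baseline point is dominated by at least one of them.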
