5/ 🚀 The result (closed-loop Boston → Singapore):
- Success: 42% → >90%
- Realism (WOSAC): 0.679 → ~0.765
The resulting policies form an upper envelope that Pareto-dominates the baselines.
We use a simple reward function and the same hyperparameters for all city pairs.
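
For readers unfamiliar with the term, here is a minimal sketch of what "Pareto-dominates" means when both metrics are higher-is-better. The function names are hypothetical. The two points simply reuse the headline numbers above, taking ">90%" as 0.90 and "~0.765" as 0.765, so treat them as illustrative rather than exact results.

```python
def dominates(a, b):
    """Return True if point a Pareto-dominates point b (higher is better).

    a dominates b when a is at least as good on every metric
    and strictly better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# (success rate, WOSAC realism): illustrative values taken from the post above.
before = (0.42, 0.679)
after = (0.90, 0.765)  # ">90%" and "~0.765" rounded to concrete numbers

print(dominates(after, before))
print(pareto_front([before, after]))
```

A set of policies forms the upper envelope when every baseline point is dominated by at least one of them, so no baseline survives in `pareto_front`.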