We evaluate FMSPs in Car Tag, an asymmetric continuous-control game (see GIFs above). FMSP variants write code-based policies (e.g., go left, Q-learning). Below are PCA plots of policy embeddings, showing that QDSP achieves the highest QD-Score compared with the other FMSPs and a non-LLM baseline.
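
For readers unfamiliar with QD-Score, here is a minimal sketch. It assumes the standard definition from the quality-diversity literature: the sum of the best fitness stored in each occupied archive cell. The 2D cell indices (for example, bins over PCA-projected policy embeddings), the fitness values, and the helper names `insert` and `qd_score` are all hypothetical and are not taken from our codebase.

```python
from typing import Dict, Tuple

# Hypothetical cell index, e.g. a bin over a 2D PCA projection of policy embeddings.
Cell = Tuple[int, int]


def insert(archive: Dict[Cell, float], cell: Cell, fitness: float) -> bool:
    """Keep only the best fitness seen per cell; return True if the archive changed."""
    if cell not in archive or fitness > archive[cell]:
        archive[cell] = fitness
        return True
    return False


def qd_score(archive: Dict[Cell, float]) -> float:
    """Sum of the elite fitness over all occupied cells (assumed standard definition)."""
    return sum(archive.values())


# Illustrative values only, not experimental results.
archive: Dict[Cell, float] = {}
insert(archive, (0, 0), 0.25)
insert(archive, (0, 0), 0.75)  # better policy replaces the elite in the same cell
insert(archive, (2, 1), 0.5)
insert(archive, (2, 1), 0.125)  # worse policy in an occupied cell is rejected
score = qd_score(archive)
```

Under this definition, a method raises its QD-Score either by filling new cells (more diversity) or by improving the elite in an existing cell (more quality), which is why a wider and better-performing spread of policies scores higher.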