Research shows LLMs exhibit "performative misalignment," aligning with perceived researcher expectations instead of true intent. This challenges classic views on AI behavior, highlighting risks in safety evaluations based on compliance, not genuine understanding. https://arxiv.org/abs/2606.08629
arxiv.org
ArXiv link for Sycophancy Towards Researchers Drives Performative Misalignment