Demographic cues (e.g., names, dialect) are widely used to study how LLM behavior may change depending on user demographics. Such cues are often assumed to be interchangeable. 🚨 We show they are not: different cues yield different model behavior for the same group and lead to different conclusions about LLM bias. 🧵👇