Demographic cues (e.g., names, dialect) are widely used to study how LLM behavior may change depending on user demographics. Such cues are often assumed to be interchangeable.
🚨 We show they are not: different cues yield different model behavior for the same group and different conclusions on LLM bias. 🧵👇