🚨New Paper: LLM developers aim to align models with values like helpfulness or harmlessness. But when these values conflict, which ones do models choose to support? We introduce ConflictScope, a fully automated evaluation pipeline that reveals how models rank values under conflict. (📷 xkcd)
8mo
🚀 Apply to CMU LTI’s Summer 2026 “Language Technology for All” internship! 🎓 Open to pre‑doctoral students new to language tech (non‑CS backgrounds welcome). 🔬 12–14 weeks in‑person in Pittsburgh — travel + stipend paid. 💸 Deadline: Feb 20, 11:59pm ET. Apply → forms.gle/cUu8g6wb27Hs...
Thanks to my collaborators @kghate.bsky.social @monadiab77.bsky.social @daniel-fried.bsky.social @atoosakz.bsky.social @maxkw.bsky.social for their support in making this work possible!
4mo
Andy Liu
Maarten Sap
ConflictScope can also be used to evaluate different approaches to steering models. We find that including detailed target rankings in system prompts consistently improves how well models follow the target ranking under conflict, though there is still plenty of room for improvement.
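One way to score "alignment with the target ranking" is the fraction of conflicts a model resolves in favor of whichever value the target ranking places higher. The sketch below assumes a hypothetical input format of `(value_a, value_b, winner)` tuples; it is an illustration only, not necessarily the metric used in the paper.

```python
def agreement_with_target(judgments, target_ranking):
    """Fraction of conflicts resolved in favor of the higher-ranked value.

    Hypothetical inputs: `judgments` is a list of (value_a, value_b, winner)
    tuples, one per scenario; `target_ranking` lists values from most to
    least important (e.g. as stated in a system prompt).
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    position = {value: i for i, value in enumerate(target_ranking)}
    agree = 0
    for value_a, value_b, winner in judgments:
        # The target prefers whichever value appears earlier in the ranking.
        preferred = value_a if position[value_a] < position[value_b] else value_b
        agree += winner == preferred
    return agree / len(judgments)


target = ["honesty", "harmlessness", "helpfulness"]
judgments = [
    ("helpfulness", "honesty", "honesty"),             # agrees with target
    ("harmlessness", "honesty", "harmlessness"),       # disagrees
    ("helpfulness", "harmlessness", "harmlessness"),   # agrees
    ("helpfulness", "harmlessness", "helpfulness"),    # disagrees
]
score = agreement_with_target(judgments, target)  # 0.5
```

Comparing this score with and without the ranking in the system prompt gives a simple before/after measure of steering.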
To address issues with multiple-choice evaluation, we focus on open-ended evaluation with a simulated user. Annotation studies show strong correlation between LLM and human judgments of which action a model took in a given scenario, allowing us to automate open-ended evaluations.
Given a set of values, ConflictScope generates scenarios in which an LLM-based assistant faces a conflict between a pair of values in the set. It then evaluates which value a target LLM supports more in each scenario before combining scenario-level judgments into a value ranking.
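The last step, combining scenario-level judgments into a value ranking, can be illustrated with a simple win-rate aggregation: each value is scored by the fraction of its conflicts the target model resolved in its favor. This is a minimal sketch assuming a hypothetical `(value_a, value_b, winner)` input format; the paper may use a different aggregation (for example, a Bradley-Terry-style model).

```python
from collections import defaultdict


def rank_values(judgments):
    """Rank values by win rate across pairwise conflict scenarios.

    Hypothetical input: a list of (value_a, value_b, winner) tuples, where
    `winner` is the value the target model supported more in that scenario.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for value_a, value_b, winner in judgments:
        if winner not in (value_a, value_b):
            raise ValueError(f"winner {winner!r} not in pair ({value_a!r}, {value_b!r})")
        appearances[value_a] += 1
        appearances[value_b] += 1
        wins[winner] += 1
    # Win rate = share of a value's conflicts that went its way.
    win_rate = {v: wins[v] / appearances[v] for v in appearances}
    # Highest win rate first; ties broken alphabetically for determinism.
    return sorted(win_rate, key=lambda v: (-win_rate[v], v))


judgments = [
    ("helpfulness", "harmlessness", "harmlessness"),
    ("helpfulness", "honesty", "honesty"),
    ("harmlessness", "honesty", "harmlessness"),
    ("helpfulness", "harmlessness", "harmlessness"),
]
ranking = rank_values(judgments)  # ['harmlessness', 'honesty', 'helpfulness']
```

Win rate is easy to read but ignores which opponents a value faced; a paired-comparison model such as Bradley-Terry accounts for that at the cost of an iterative fit.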
We introduce new metrics to measure how morally challenging a dataset is for models. We find that ConflictScope produces datasets that elicit more disagreement and stronger preferences than moral dilemma datasets, while alignment data frequently elicits indifference from models.
Please reach out if you'd like to chat about this work! We hope ConflictScope helps researchers study how models handle value conflicts that matter to their communities. Code and data: github.com/andyjliu/con... Arxiv: www.arxiv.org/abs/2509.25369
🚨New paper: Reward Models (RMs) are used to align LLMs, but can they be steered toward user-specific value/style preferences? With EVALUESTEER, we find that even the best RMs we tested exhibit their own value/style biases and fail to align with a user's preferences more than 25% of the time. 🧵
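A "fails to align more than 25% of the time" figure corresponds to a pairwise accuracy below 75%: for each pair of responses, check whether the RM scores the user-preferred one higher. The sketch below assumes a hypothetical `reward(context, response)` scoring function and example format; it does not reproduce EVALUESTEER's actual protocol.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical example format: (user_context, response_a, response_b, preferred),
# where `preferred` is "a" or "b" according to the user's stated preferences.
Example = Tuple[str, str, str, str]


def steering_accuracy(reward: Callable[[str, str], float],
                      examples: Sequence[Example]) -> float:
    """Fraction of pairs where the RM ranks the user-preferred response higher."""
    if not examples:
        raise ValueError("need at least one example")
    correct = 0
    for context, resp_a, resp_b, preferred in examples:
        score_a = reward(context, resp_a)
        score_b = reward(context, resp_b)
        predicted: Optional[str]
        if score_a > score_b:
            predicted = "a"
        elif score_b > score_a:
            predicted = "b"
        else:
            predicted = None  # ties count as misses
        correct += predicted == preferred
    return correct / len(examples)


# Toy stand-in RM with a built-in bias: it always rewards longer responses.
def length_reward(context: str, response: str) -> float:
    return float(len(response))


examples = [
    ("user prefers concise answers", "short", "a much longer reply", "a"),
    ("user prefers detailed answers", "ok", "a detailed answer", "b"),
]
accuracy = steering_accuracy(length_reward, examples)  # 0.5
```

The toy RM shows how a model's own style bias (here, toward length) caps its accuracy no matter what the user context says.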