NLP & Ling; PhD student @UTAustin @UT_Linguistics. Website: https://kaijiemo-kj.github.io/
📄 Paper: arxiv.org/abs/2606.05616 💻 GitHub: github.com/KaijieMo-kj/... w/ @kaijie-mo.bsky.social, Thomas Yang, @chantalsh.bsky.social, @qyao.bsky.social, William Rudman, @ramezkouzy.bsky.social, @kanishka.bsky.social, @byron.bsky.social, @jessyjli.bsky.social (5/5)
(4/5) Where does the shortcut live? Activation patching localizes it to early-mid layers (~2–10). For affix-class drugs, affixes alone reproduce most of the effect. A single low-rank direction can flip fake-drug acceptance. Affix signals emerge early in training; holistic knowledge comes later.
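(4/5) reports that a single low-rank direction can flip fake-drug acceptance. The thread does not say exactly how that intervention is done, so the toy sketch below only illustrates the general idea with one common technique, directional ablation: remove a hidden state's component along a unit direction (the rank-1 case) and re-read a linear probe. The vectors, the probe, and the `ablate_direction` helper are hypothetical values chosen so the flip is visible; they are not taken from any model or from the paper.

```python
import numpy as np


def ablate_direction(h: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Remove the component of hidden state h along direction d (rank-1 directional ablation)."""
    d_unit = d / np.linalg.norm(d)
    return h - np.dot(h, d_unit) * d_unit


# Hypothetical toy values: a 3-d "hidden state", a rank-1 "affix" direction,
# and a linear probe whose positive score means "treat the name as a real drug".
affix_dir = np.array([1.0, 0.0, 0.0])
probe_w = np.array([2.0, 0.5, 0.0])
h_fake = np.array([1.0, -1.0, 0.3])  # stands in for a hidden state of a name like "dimicillin"

before = float(probe_w @ h_fake)                              # 1.5: probe says "real"
after = float(probe_w @ ablate_direction(h_fake, affix_dir))  # -0.5: probe says "not real"
print(before, after)
```

For a rank-k subspace, the same idea applies with an orthonormal basis of k directions, subtracting the projection onto each.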
(3/5) How much of a drug’s meaning is just its affix? We decompose recognition into Affix, Stem, and Holistic signals. Many drugs are affix-driven, and models sometimes confuse drugs sharing the same affix. Reliance varies by task and training exposure, with stronger shortcuts for rarer drugs.
(2/5) Using matched triplets (ampicillin → dimicillin → dimiglimto), we isolate the effect of the affix. Models treat fake drugs as real far more often than nonce names, especially larger and medical-tuned models. Even in names like “tablecillin,” the affix still drives the prediction.
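The matched-triplet design in (2/5) can be sketched as a string construction: keep the affix and swap in a nonce stem to get the fake name, then also swap the affix to get the nonce name. This is a minimal sketch assuming each drug splits cleanly into stem + affix; `make_triplet`, `acceptance_gap`, and the split points are hypothetical, not the paper's code. The gap metric is one simple way to compare the fake and nonce conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    real: str   # attested drug, e.g. "ampicillin"
    fake: str   # nonce stem + real affix, e.g. "dimicillin"
    nonce: str  # nonce stem + nonce ending, e.g. "dimiglimto"


def make_triplet(real: str, affix: str, nonce_stem: str, nonce_ending: str) -> Triplet:
    """Hypothetical helper: build a matched triplet by swapping the stem, then the affix."""
    if not real.endswith(affix):
        raise ValueError(f"{real!r} does not end with affix {affix!r}")
    return Triplet(real=real, fake=nonce_stem + affix, nonce=nonce_stem + nonce_ending)


def acceptance_gap(fake_judged_real: list[bool], nonce_judged_real: list[bool]) -> float:
    """Hypothetical metric: share of fake names judged real minus share of nonce names judged real.

    Under this design the fake and nonce names share a stem and differ only in the
    affix, so a positive gap points to the affix as the driver.
    """
    fake_rate = sum(fake_judged_real) / len(fake_judged_real)
    nonce_rate = sum(nonce_judged_real) / len(nonce_judged_real)
    return fake_rate - nonce_rate


t = make_triplet("ampicillin", "cillin", nonce_stem="dimi", nonce_ending="glimto")
print(t.fake, t.nonce)  # dimicillin dimiglimto
```

Using an ordinary English word as the stem, `make_triplet("ampicillin", "cillin", "table", "glimto").fake` yields "tablecillin", the kind of name mentioned in (2/5).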
“Dimicillin” isn’t real. We made it up. Yet many LLMs still call it an antibiotic. Across 9 models and 653 drugs, we find that drug-name affixes alone can drive pharmacological reasoning. Models often rely on morphology over facts. We trace this shortcut from behavior to mechanism. 🧵
Hello world 👋 My first paper at UT Austin! We ask: what happens when medical “evidence” fed into an LLM is wrong? Should your AI stay faithful, or should it play it safe when the evidence is harmful? We show that frontier LLMs accept counterfactual medical evidence at face value. 🧵
Setup (2/4) We introduce MedCounterFact, a counterfactual medical QA dataset built on RCT-based evidence synthesis.
– Replace real interventions in the evidence with nonce, mismatched medical, non-medical, or toxic terms
– Evaluate 9 frontier LLMs under evidence-grounded prompts
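The substitution step in Setup (2/4) can be pictured as a whole-word find-and-replace over the evidence text. A minimal sketch follows; the substitute terms, the `make_counterfactual` helper, and the example sentence are all hypothetical placeholders, not drawn from MedCounterFact.

```python
import re

# Hypothetical substitute terms, one per condition named in the post.
# Illustrative placeholders only; not the terms used in MedCounterFact.
SUBSTITUTES = {
    "nonce": "florbanex",
    "mismatched_medical": "insulin",
    "non_medical": "chess lessons",
    "toxic": "arsenic",
}


def make_counterfactual(evidence: str, intervention: str, condition: str) -> str:
    """Replace every whole-word, case-insensitive mention of `intervention` with the condition's substitute."""
    substitute = SUBSTITUTES[condition]
    pattern = re.compile(rf"\b{re.escape(intervention)}\b", flags=re.IGNORECASE)
    # A function replacement avoids backslash-escape processing in the substitute string.
    return pattern.sub(lambda _match: substitute, evidence)


evidence = "Participants given Drug X recovered faster than controls. Drug X was well tolerated."
for condition in SUBSTITUTES:
    print(condition, "->", make_counterfactual(evidence, "Drug X", condition))
```

The `\b` anchors assume the intervention name starts and ends with a word character; names with leading or trailing punctuation would need a different boundary rule.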
Results (3/4)
– With evidence, models strongly adhere to it with high confidence, even for toxic or nonsensical interventions
– Implausibility awareness is transient; once evidence appears, models rarely flag problems
– Scaling, medical fine-tuning, and skeptical prompting offer little protection
📎 Paper: arxiv.org/abs/2601.11886 🧑‍💻 Code/data: github.com/KaijieMo-kj/... w/ @kaijie-mo.bsky.social @sidvenkatayogi.bsky.social @chantalsh.bsky.social @ramezkouzy.bsky.social @cocoweixu.bsky.social @byron.bsky.social @jessyjli.bsky.social
What's in a Name? Morphological Shortcuts by LLMs in Pharmacology (arxiv.org)
The morphological form of a word can often give cues to its meaning, but purely relying on these mappings can lead to overgeneralization in high-stakes domains. In the medical domain, for instance, LL...
Faithfulness vs. Safety: Evaluating LLM Behavior Under Counterfactual Medical Evidence (arxiv.org)
In high-stakes domains like medicine, it may be generally desirable for models to faithfully adhere to the context provided. But what happens if the context does not align with model priors or safety ...