Setup (2/4)
We introduce MedCounterFact, a counterfactual medical QA dataset built on RCT-based evidence synthesis.
- We replace the real interventions in the evidence with nonce words, mismatched medical terms, non-medical terms, or toxic terms
- We evaluate 9 frontier LLMs with evidence-grounded prompts
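To make the substitution step concrete, here is a minimal sketch of how a counterfactual evidence item could be produced. It assumes the intervention name is known and appears verbatim as a whole word in the evidence text. The replacement pools, the `make_counterfactual` helper, and the example evidence sentence are all hypothetical illustrations, not the actual MedCounterFact term lists or construction pipeline.

```python
import random
import re

# Hypothetical replacement pools, one per substitution type named above.
# The real MedCounterFact term lists are not given here.
REPLACEMENT_POOLS = {
    "nonce": ["blicket", "zorvane"],
    "mismatched_medical": ["insulin", "amoxicillin"],
    "non_medical": ["stapler", "bicycle"],
    "toxic": ["arsenic", "cyanide"],
}


def make_counterfactual(
    evidence: str, intervention: str, category: str, rng: random.Random
) -> tuple[str, str]:
    """Replace every whole-word, case-insensitive mention of `intervention`
    in `evidence` with one term drawn from the `category` pool.

    Returns (counterfactual_evidence, substitute_term).
    """
    substitute = rng.choice(REPLACEMENT_POOLS[category])
    # re.escape guards against regex metacharacters in drug names;
    # \b restricts matches to whole words.
    pattern = re.compile(rf"\b{re.escape(intervention)}\b", flags=re.IGNORECASE)
    # A function replacement avoids backslash interpretation in the substitute.
    return pattern.sub(lambda _m: substitute, evidence), substitute


# Usage on a made-up evidence snippet.
rng = random.Random(0)
evidence = (
    "Metformin reduced HbA1c versus placebo. "
    "Adverse events were similar with metformin."
)
cf_evidence, term = make_counterfactual(evidence, "metformin", "non_medical", rng)
print(term, "->", cf_evidence)
```

Using the same substitute for every mention keeps the counterfactual evidence internally consistent, so a model that answers from the evidence should name the swapped-in term rather than the original intervention.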