@mrparryparry.bsky.social presenting our work on reproducing TREC DL 2019 judgements and the implications for evaluating modern ranking models on modern collections. Paper: arxiv.org/abs/2502.20937

The fundamental property of Cranfield-style evaluations, that system rankings are stable even when assessors disagree on individual relevance decisions, was validated on traditional test collections. ...

Variations in Relevance Judgments and the Shelf Life of Test Collections

Ferdinand Schlatt