Have you heard of soft metrics such as soft micro F1? In this article, the authors argue that standard metrics may not be sufficient for evaluating model predictions in the presence of human label variation. Read more at: doi.org/10.1162/COLI... @kmkurn.sigmoid.social.ap.brid.gy #NLProc #NLP