Inlay

📢 New paper: Can unsupervised metrics extracted from MT models reliably detect their translation errors? Do annotators even *agree* on what constitutes an error? 🧐 We compare uncertainty- and interpretability-based word-level QE (WQE) metrics across 12 translation directions, with some surprising findings! 🧵 1/