it’s OCR week! learn how we use verifiable rewards against unit tests to improve olmOCR’s PDF understanding
state-of-the-art OCR, fully open model:
Luca Soldaini 🎀
We’re updating olmOCR, our model for turning PDFs & scans into clean text with support for tables, equations, handwriting, & more. olmOCR 2 uses synthetic data + unit tests as verifiable rewards to reach state-of-the-art performance on challenging documents. 🧵
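For a rough sense of what "unit tests as verifiable rewards" can mean in practice, here is a minimal sketch. It assumes each test is a pass/fail check on the OCR'd page text and that the reward is the fraction of checks passed. The helper names (`text_present`, `text_absent`, `appears_before`, `unit_test_reward`) are hypothetical and chosen for illustration; this is not olmOCR's actual code or API.

```python
from typing import Callable, List

# Assumption: a "unit test" is any function that inspects the OCR output
# for one page and returns True (pass) or False (fail).
UnitTest = Callable[[str], bool]


def text_present(snippet: str) -> UnitTest:
    """Pass if the snippet appears somewhere in the page text."""
    return lambda page: snippet in page


def text_absent(snippet: str) -> UnitTest:
    """Pass if the snippet (e.g. a running footer) was left out."""
    return lambda page: snippet not in page


def appears_before(first: str, second: str) -> UnitTest:
    """Pass if both snippets exist and `first` precedes `second`."""
    def check(page: str) -> bool:
        i, j = page.find(first), page.find(second)
        return i != -1 and j != -1 and i < j
    return check


def unit_test_reward(page_text: str, tests: List[UnitTest]) -> float:
    """Hypothetical reward: fraction of unit tests the OCR output passes."""
    if not tests:
        return 0.0
    passed = sum(1 for test in tests if test(page_text))
    return passed / len(tests)


# Illustrative checks for a single page.
tests = [
    text_present("Table 1"),
    text_absent("Page 3 of 10"),
    appears_before("Abstract", "Introduction"),
]
clean = "Abstract\n...\nIntroduction\nTable 1"
noisy = "Abstract Introduction Page 3 of 10"
print(unit_test_reward(clean, tests))
print(unit_test_reward(noisy, tests))
```

Because every check is deterministic, the score can be verified without a human grader or a judge model, which is the property that makes it usable as a reward signal.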