Baidu has recently released and open-sourced Unlimited OCR, a 3B-parameter end-to-end OCR model designed for parsing long documents such as books and papers. After its release, the project quickly topped four trending lists on GitHub and HuggingFace, and it passed 10,000 GitHub stars within five days of being open-sourced.
Technically, Unlimited OCR activates approximately 570M of its parameters during inference and introduces the Reference Sliding Window Attention (R-SWA) mechanism for the first time. This mechanism removes the traditional "page-by-page parsing + stitching" limitation, allowing dozens of pages to be parsed continuously in a single pass. At the same time, it keeps the KV cache at a constant size during the decoding phase, so memory usage and computational cost no longer grow as the output gets longer.
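To make the constant-size KV cache idea concrete, here is a minimal toy sketch in Python. It is not Baidu's implementation, and the details of R-SWA are not described here. The sketch assumes, purely for illustration, that a fixed block of "reference" entries is always kept while decoded entries live in a sliding window that evicts the oldest ones. The names `BoundedKVCache` and `simulate`, and the sizes used, are hypothetical.

```python
from collections import deque


class BoundedKVCache:
    """Toy cache: a fixed reference block plus a sliding window of recent entries.

    Assumption for illustration only: reference entries are never evicted,
    and decoded entries beyond the window are dropped.
    """

    def __init__(self, reference, window):
        self.reference = list(reference)    # always retained
        self.recent = deque(maxlen=window)  # oldest entry is evicted when full

    def append(self, entry):
        # Adding to a full deque silently discards the oldest entry.
        self.recent.append(entry)

    def visible(self):
        # Entries a new decoding step could attend to in this toy model.
        return self.reference + list(self.recent)

    def __len__(self):
        return len(self.reference) + len(self.recent)


def simulate(steps, n_reference=4, window=8):
    """Decode `steps` tokens and record the cache size after each step."""
    cache = BoundedKVCache([f"ref{i}" for i in range(n_reference)], window)
    sizes = []
    for t in range(steps):
        cache.append(f"tok{t}")
        sizes.append(len(cache))
    return cache, sizes


if __name__ == "__main__":
    cache, sizes = simulate(steps=100)
    print("largest cache size:", max(sizes))
    print("visible entries:", cache.visible())
```

In this sketch the cache size is capped at `n_reference + window` no matter how many tokens are decoded, which is the property the paragraph above attributes to R-SWA. A cache that keeps every decoded entry would instead grow linearly with output length.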
On the OmniDocBench v1.6 benchmark, the model set a new record with a score of 93.92%. In real-world scenarios, its inference is about 12.7% faster than DeepSeek OCR, and the speed advantage rises to 35% at an output length of 6,000 tokens. This offers a new approach to large-scale document digitization and to long-term memory management for large models.