On July 7, the Baidu AI team announced the official release of PaddleOCR 3.1, which brings three major upgrades: multilingual recognition, complex document translation, and integration with large models. The new version supports text recognition in 37 languages, with an average accuracy improvement of over 30%. It also introduces a document translation pipeline and an MCP server to help developers build AI applications efficiently.
To meet multilingual needs in global use cases, PaddleOCR 3.1 adds the PP-OCRv5 multilingual model, which covers 37 languages, including French, Spanish, and Russian. By drawing on the visual and text understanding capabilities of the ERNIE 4.5 multimodal large model, text detection and data annotation can be completed automatically with high confidence, which addresses the scarcity of multilingual training data. Test data shows that the new model improves recognition accuracy by more than 30% in Latin-script and East Slavic language scenarios. The reported figures also include a drop in the Korean recognition error rate from 8.7% to 2.1% and a twofold increase in parsing speed for Russian documents with complex layouts.
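The announcement does not say how these error rates were measured. A common OCR metric is the character error rate (CER): the edit distance between the recognized text and the reference text, divided by the reference length. The sketch below assumes CER is the metric in question; the function names are illustrative and not part of PaddleOCR. It also computes the relative error reduction implied by the Korean figures quoted above.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate shrank, relative to its old value."""
    return (before - after) / before


# One wrong character out of five gives a CER of 0.2.
print(char_error_rate("hello", "hallo"))

# Going from 8.7% to 2.1% removes roughly three quarters of the errors.
print(round(relative_reduction(8.7, 2.1) * 100, 1))  # 75.9
```

Under this assumption, the drop from 8.7% to 2.1% corresponds to a relative error reduction of about 75.9%, which is a different quantity from the "more than 30%" accuracy improvement quoted for other languages.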
Building on the PP-StructureV3 document parsing engine and the ERNIE large model, PaddleOCR 3.1 introduces the PP-DocTranslation translation pipeline. The tool recognizes complex elements such as tables, formulas, and handwritten text in PDFs and images, and converts them into Markdown for multilingual translation. For specialized fields such as law and medicine, users can upload terminology glossaries so that key terms are translated precisely. For example, a multinational pharmaceutical company reportedly improved the efficiency of translating drug instructions by 40% after adopting this feature, achieving 99.2% consistency in professional terminology.
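How PP-DocTranslation applies an uploaded terminology table is not described here. As a rough illustration of what terminology consistency can mean, the sketch below checks whether translated segments use the target terms a glossary requires. The glossary entries, the sample segments, and the function names `check_terminology` and `consistency` are all hypothetical and are not part of PaddleOCR.

```python
def check_terminology(source: str, translation: str, glossary: dict) -> list:
    """Return source terms whose required target term is missing."""
    missing = []
    for src_term, tgt_term in glossary.items():
        in_source = src_term.lower() in source.lower()
        in_target = tgt_term.lower() in translation.lower()
        if in_source and not in_target:
            missing.append(src_term)
    return missing


def consistency(segments: list, glossary: dict) -> float:
    """Share of glossary-term occurrences translated with the required term."""
    total = 0
    hits = 0
    for source, translation in segments:
        for src_term in glossary:
            if src_term.lower() in source.lower():
                total += 1
                if src_term not in check_terminology(source, translation, glossary):
                    hits += 1
    return hits / total if total else 1.0


# Hypothetical English-to-German glossary and translated segments.
glossary = {"dosage": "Dosierung", "side effect": "Nebenwirkung"}
segments = [
    ("Check the dosage before use.", "Dosierung vor Gebrauch beachten."),
    ("Report any side effect.", "Melden Sie jede Begleiterscheinung."),
]

for source, translation in segments:
    print(source, "->", check_terminology(source, translation, glossary))
print(consistency(segments, glossary))
```

A plain substring match like this ignores inflection and word boundaries, so it is only a starting point for auditing translated output, not a description of how the pipeline enforces terminology.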
To lower the barrier to AI application development, PaddleOCR 3.1 introduces an MCP (Model Context Protocol) server, which lets OCR capabilities be integrated into downstream applications through a standardized protocol. Developers can set up an MCP service in just a few steps and access core functions such as image text recognition and document layout analysis through a local Python library, the PaddlePaddle AI Studio community (Xinghe, literally "Starry Sky") cloud service, or a self-hosted service.
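MCP messages are JSON-RPC 2.0 requests: a client discovers a server's tools with `tools/list` and invokes one with `tools/call`. The sketch below builds both messages to show what "a standardized protocol" looks like on the wire. The tool name `ocr` and the argument key `input_data` are assumptions made for illustration; the actual names should be taken from the tool list returned by the running server.

```python
import itertools
import json

# JSON-RPC requests carry an id so responses can be matched to them.
_request_ids = itertools.count(1)


def build_list_tools() -> str:
    """Serialize an MCP request asking the server which tools it offers."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/list",
    }
    return json.dumps(request)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP request that invokes one tool with named arguments."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request, ensure_ascii=False)


print(build_list_tools())
# Hypothetical tool name and argument key; check the server's tool list.
print(build_tool_call("ocr", {"input_data": "invoice.png"}))
```

In practice an MCP client library or an MCP-capable host application handles this framing and the transport, so application code rarely constructs these messages by hand.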
Open-source repository: https://github.com/PaddlePaddle/PaddleOCR