Baidu Launches PP-ChatOCR, a General Image Key Information Extraction Tool Based on the Wenxin Large Model

Tencent releases HunyuanOCR, a 1B-parameter open-source model built on the Hunyuan multimodal architecture, achieving state-of-the-art (SOTA) results in OCR. It features an end-to-end design with three core components: a native-resolution vision encoder, an adaptive vision adapter, and a lightweight language model.
Google introduces image recognition features for NotebookLM, allowing users to upload images of whiteboard notes, textbooks, or tables; the system automatically recognizes the text and performs semantic analysis, so users can search image content directly in natural language. The feature is free across all platforms, and a local processing option to protect privacy is planned. The system uses multimodal technology to distinguish handwritten from printed text, analyze table structures, and intelligently link results with existing notes.
On October 16, Baidu PaddlePaddle released the vision-language model PaddleOCR-VL, which scored 92.56 on the authoritative OmniDocBench V1.5 benchmark with only 0.9B parameters, surpassing mainstream models such as DeepSeek-OCR and topping the global OCR rankings. As of October 21, the top three spots on the Hugging Face trending list were all occupied by OCR models, with Baidu PaddlePaddle's model ranking first.
A comparison of Vision-RAG and Text-RAG for enterprise information retrieval suggests that Vision-RAG's direct visual processing may outperform Text-RAG, whose OCR conversion step is error-prone, offering insights for optimizing search strategies.
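The architectural difference between the two pipelines can be sketched as follows. This is a minimal illustrative toy, not any real RAG framework: all functions and names (`ocr_extract`, `PageImage`, the simulated OCR error) are hypothetical stubs, and a real Vision-RAG system would embed page images with a vision-language model rather than index raw pixels.

```python
from dataclasses import dataclass

@dataclass
class PageImage:
    doc_id: str
    pixels: str  # stand-in for raw image content

def ocr_extract(page: PageImage) -> str:
    # Stub OCR step: simulate a lossy recognition error ("revenue" -> "revenu3")
    return page.pixels.replace("revenue", "revenu3")

def text_rag_index(pages):
    # Text-RAG: run OCR on every page, then index the (possibly corrupted) text
    return {p.doc_id: ocr_extract(p) for p in pages}

def vision_rag_index(pages):
    # Vision-RAG: index the page content directly; a vision-language model
    # would embed the pixels, so no OCR corruption is introduced
    return {p.doc_id: p.pixels for p in pages}

def search(index, term):
    # Naive substring retrieval, standing in for embedding similarity search
    return [doc_id for doc_id, content in index.items() if term in content]

pages = [PageImage("q3-report", "quarterly revenue table for Q3")]
print(search(text_rag_index(pages), "revenue"))    # the simulated OCR error loses the match
print(search(vision_rag_index(pages), "revenue"))  # the direct pipeline still finds the page
```

The toy makes the article's point concrete: any error introduced at the OCR stage is baked into the Text-RAG index and silently breaks downstream retrieval, while a pipeline that operates on the page image itself has no such conversion step to get wrong.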
Baidu open-sources the visual understanding model Qianfan-VL in three versions, 3B, 8B, and 70B parameters, suited to different application scenarios. The model was trained on Baidu's self-developed Kunlun P800 chip, demonstrating the strength of domestic chips in AI training. As a multimodal large model, Qianfan-VL understands both images and text, enabling cross-modal intelligent processing.