Baidu has released ERNIE-4.5-VL-28B-A3B-Thinking, its latest multimodal AI model, which integrates images directly into the reasoning process. Baidu claims the model performs well across multiple multimodal benchmarks, occasionally surpassing top commercial models such as Google's Gemini 2.5 Pro and OpenAI's GPT-5 High.

Combining a Lightweight Footprint with High Performance
Although the model has 28 billion parameters in total, its mixture-of-experts routing architecture activates only about 3 billion of them during inference. Thanks to this efficient design, ERNIE-4.5-VL-28B-A3B-Thinking can run on a single 80 GB GPU such as the Nvidia A100. Baidu has released the model under the Apache 2.0 license, so it can be used in commercial projects free of charge. The performance figures Baidu cites, however, have not yet been independently verified.
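To see why a routed model touches only a fraction of its weights per token, here is a minimal top-k expert-routing sketch in PyTorch. The layer sizes, expert count, and k value are toy numbers chosen for illustration, not ERNIE's actual configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer: many experts, few run per token."""

    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores every expert
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):             # only k experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE()
tokens = torch.randn(16, 64)
print(moe(tokens).shape)  # torch.Size([16, 64]); only 2 of 8 experts ran per token
```

Each token is routed to only k of the expert blocks, so the parameters that actually execute per token stay small even as the total parameter count grows; this is the same principle behind 28 billion total versus 3 billion active parameters.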

Core Capabilities: "Image Thinking" and Precise Localization
The standout feature of this model is its **"Image Thinking"** function, which lets it manipulate images mid-reasoning to surface key details. For example, the model can automatically zoom in on a blue sign in a photo and accurately read the text on it, effectively wielding an internal image-editing tool.
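As a rough sketch of what such a zoom step might look like on the host side, the snippet below crops and enlarges a requested region before handing it back to the model. The tool-call format, the 4x zoom factor, and the stand-in image are assumptions for illustration, not Baidu's documented interface:

```python
from PIL import Image

def zoom_tool(img: Image.Image, box: tuple) -> Image.Image:
    """Crop the requested region and upscale it so the model can
    re-read fine detail (e.g. text on a distant sign)."""
    region = img.crop(box)  # box = (left, top, right, bottom)
    w, h = region.size
    return region.resize((w * 4, h * 4), Image.LANCZOS)

# Stand-in image; in practice this would be the user's photo.
scene = Image.new("RGB", (640, 480), "gray")

# Hypothetical reasoning step: the model emits a tool call like the dict
# below, the host executes it, and the zoomed crop is fed back to the
# model as an additional image turn before it answers.
tool_call = {"tool": "zoom", "box": (120, 40, 360, 180)}
if tool_call["tool"] == "zoom":
    crop = zoom_tool(scene, tool_call["box"])
    print(crop.size)  # (960, 560): a 4x enlargement of the sign region
```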
Other tests demonstrated its broader multimodal capabilities:
- It can precisely locate people in images and return their coordinates (see the sketch after this list).
- It can solve complex mathematical problems by analyzing circuit diagrams.
- It can recommend the best time for sightseeing based on chart data.
- For video input, it can extract subtitles and match scenes to specific timestamps.
- It can call external tools, such as web-based image search, to identify unfamiliar organisms.
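As an illustration of how the coordinate output might be consumed, the snippet below parses a hypothetical grounding reply and converts it to pixel boxes. The JSON schema and the normalized-coordinate convention are assumptions; the article does not specify the model's exact output format:

```python
import json

# Hypothetical grounding reply; the schema is an assumption for illustration.
model_reply = """
[{"label": "person", "box": [0.12, 0.30, 0.28, 0.95]},
 {"label": "person", "box": [0.55, 0.25, 0.74, 0.98]}]
"""

def to_pixels(box, width, height):
    """Convert normalized [x1, y1, x2, y2] coordinates to pixel values."""
    x1, y1, x2, y2 = box
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))

for det in json.loads(model_reply):
    print(det["label"], to_pixels(det["box"], width=1280, height=720))
# person (154, 216, 358, 684)
# person (704, 180, 947, 706)
```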
