Recently, the vivo AI Lab released its latest edge-side multimodal model, BlueLM-2.5-3B. The model is compact and efficient, and it can also understand graphical user interfaces (GUIs), marking a notable step forward for on-device artificial intelligence in processing text and images together.
What sets BlueLM-2.5-3B apart is its ability to switch flexibly between short and long thinking modes, paired with a thinking-budget control mechanism that lets the model trade depth of reasoning against efficiency. As a result, it performs well across a wide range of text and multimodal benchmarks, particularly in understanding and reasoning, where it can rival or exceed many models of a similar size.
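To make the idea of a thinking budget concrete, here is a minimal sketch of how such a control loop could work. vivo has not published BlueLM-2.5-3B's decoding interface, so every name below (the model_step callback, the THINK_END marker) is a hypothetical stand-in, not the model's real API.

```python
# Illustrative sketch of a "thinking budget" decode loop.
# All names here are hypothetical assumptions, not vivo's actual API.

THINK_END = "</think>"  # hypothetical marker closing the thinking phase

def generate_with_budget(model_step, prompt, thinking_budget, max_answer_tokens=64):
    """Decode with a cap of `thinking_budget` tokens on the thinking phase.

    model_step(tokens) -> next token string (stub for a real decoder step).
    A budget of 0 skips thinking entirely (short-thinking mode); a large
    budget lets the model close the phase on its own (long-thinking mode).
    """
    tokens = list(prompt)
    # Phase 1: thinking. Stop early if the model emits THINK_END on its
    # own, or force-close the phase once the budget is exhausted.
    for _ in range(thinking_budget):
        nxt = model_step(tokens)
        tokens.append(nxt)
        if nxt == THINK_END:
            break
    else:
        tokens.append(THINK_END)  # budget exhausted: cut thinking off here
    # Phase 2: answer generation, not counted against the budget.
    for _ in range(max_answer_tokens):
        nxt = model_step(tokens)
        tokens.append(nxt)
        if nxt == "<eos>":
            break
    return tokens

# Toy decoder that "thinks" for a few steps, then answers; for demo only.
def toy_step(tokens):
    if tokens.count(THINK_END) == 0:
        return "step" if len(tokens) < 6 else THINK_END
    return "answer" if tokens[-1] != "answer" else "<eos>"

print(generate_with_budget(toy_step, ["Q:"], thinking_budget=4))   # budget forces cutoff
print(generate_with_budget(toy_step, ["Q:"], thinking_budget=20))  # model stops on its own
```

The design point this sketch captures is that the budget caps only the thinking phase; once the end-of-thinking marker is emitted (or forced), answer tokens flow freely.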
Across more than 20 benchmarks, BlueLM-2.5-3B demonstrated strong text-processing capability, largely avoiding the text-capability "forgetting" problem common in multimodal models. In long thinking mode, its performance on reasoning tasks such as mathematical and logical reasoning was significantly better than that of other models of similar scale. Its multimodal understanding was also impressive, rivaling much larger models.
In addition, BlueLM-2.5-3B performed remarkably well on GUI understanding, thanks to training on a large corpus of Chinese app screenshots. Its scores in this area exceeded those of many competitors, underscoring vivo's strength in the field of artificial intelligence.
To support this performance, BlueLM-2.5-3B adopts a compact architecture of only 2.9B parameters, keeping both training and inference costs relatively low. Optimized data strategies and an efficient training pipeline give the model high data efficiency, laying a solid foundation for the wider adoption and application of AI.
The release of BlueLM-2.5-3B not only gives users a more intelligent application experience but also adds fresh momentum to the advancement of artificial intelligence technology.