Recently, the vivo AI Lab released its latest edge-side multimodal model, BlueLM-2.5-3B. The model is compact and efficient, and it can also understand graphical user interfaces (GUIs), marking a notable step forward for on-device artificial intelligence in processing text and images together.
What sets BlueLM-2.5-3B apart is its ability to switch flexibly between short and long thinking modes, paired with a thinking-budget control mechanism that lets the model trade depth of reasoning against efficiency. As a result, it performs well across a wide range of text and multimodal benchmarks, particularly in understanding and reasoning, where it can rival or exceed many models of a similar size.
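To make the idea of a thinking budget concrete, here is a minimal sketch of how such a control loop could work. vivo has not published BlueLM-2.5-3B's decoding interface, so every name below (the model_step callback, the THINK_END marker) is a hypothetical stand-in, not the model's real API.

```python
# Illustrative sketch of a "thinking budget" decode loop.
# All names here are hypothetical assumptions, not vivo's actual API.

THINK_END = "</think>"  # hypothetical marker closing the thinking phase

def generate_with_budget(model_step, prompt, thinking_budget, max_answer_tokens=64):
    """Decode with a cap of `thinking_budget` tokens on the thinking phase.

    model_step(tokens) -> next token string (stub for a real decoder step).
    A budget of 0 skips thinking entirely (short-thinking mode); a large
    budget lets the model close the phase on its own (long-thinking mode).
    """
    tokens = list(prompt)
    # Phase 1: thinking. Stop early if the model emits THINK_END on its
    # own, or force-close the phase once the budget is exhausted.
    for _ in range(thinking_budget):
        nxt = model_step(tokens)
        tokens.append(nxt)
        if nxt == THINK_END:
            break
    else:
        tokens.append(THINK_END)  # budget exhausted: cut thinking off here
    # Phase 2: answer generation, not counted against the budget.
    for _ in range(max_answer_tokens):
        nxt = model_step(tokens)
        tokens.append(nxt)
        if nxt == "<eos>":
            break
    return tokens

# Toy decoder that "thinks" for a few steps, then answers; for demo only.
def toy_step(tokens):
    if tokens.count(THINK_END) == 0:
        return "step" if len(tokens) < 6 else THINK_END
    return "answer" if tokens[-1] != "answer" else "<eos>"

print(generate_with_budget(toy_step, ["Q:"], thinking_budget=4))   # budget forces cutoff
print(generate_with_budget(toy_step, ["Q:"], thinking_budget=20))  # model stops on its own
```

The design point this sketch captures is that the budget caps only the thinking phase; once the end-of-thinking marker is emitted (or forced), answer tokens flow freely.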
Across more than 20 benchmarks, BlueLM-2.5-3B demonstrated strong text-processing capability, largely avoiding the text-capability "forgetting" problem common in multimodal models. In long thinking mode, its performance on reasoning tasks such as mathematical and logical reasoning was significantly better than that of other models of similar scale. Its multimodal understanding was also impressive, rivaling much larger models.
In addition, BlueLM-2.5-3B performed remarkably well on GUI understanding, thanks to training on a large corpus of Chinese app screenshots. Its scores in this area exceeded those of many competitors, underscoring vivo's strength in the field of artificial intelligence.
To support this performance, BlueLM-2.5-3B adopts a compact architecture of only 2.9B parameters, keeping both training and inference costs relatively low. Optimized data strategies and an efficient training pipeline give the model high data efficiency, laying a solid foundation for the wider adoption and application of AI.
The release of BlueLM-2.5-3B not only gives users a more intelligent application experience but also adds fresh momentum to the advancement of artificial intelligence technology.