Microsoft recently released Phi-4-Reasoning-Vision-15B, a new open-source AI model, through its developer community. The model combines high-resolution visual perception with deep reasoning, marking an important milestone in the Phi-4 series. As the first small language model (SLM) that can both "see clearly" and "think deeply," Phi-4-Reasoning-Vision-15B opens up new application scenarios for developers.

Unlike traditional vision models, which only passively identify what an image contains, Phi-4-Reasoning-Vision-15B performs structured, multi-step reasoning. It interprets the visual structure of an image, combines it with the surrounding text context, and draws actionable conclusions. This lets developers build applications ranging from data chart analysis to user interface automation.

A central design feature is the model's flexible reasoning mode. For tasks that require in-depth analysis, such as math problems or logical reasoning, the model switches to "reasoning mode" and works through a multi-step reasoning chain. For tasks that need fast responses, such as optical character recognition (OCR) or locating UI elements, it outputs results directly to reduce latency. This flexibility makes the model both more practical and more efficient, because it spends extra computation only where a task calls for it.
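The article does not describe how the switch between modes is triggered. If an application wanted to pick the mode itself based on the kind of task, a minimal sketch of that routing might look like the following. The task labels, the `Mode` enum, and `choose_mode` are hypothetical illustrations, not part of any Microsoft API.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical labels for the two behaviours described in the article.
    REASONING = "reasoning"
    NON_REASONING = "non_reasoning"


# Hypothetical task categories, taken from the examples in the text.
REASONING_TASKS = {"math", "logic"}
FAST_TASKS = {"ocr", "element_grounding"}


def choose_mode(task: str) -> Mode:
    """Pick an inference mode for a task label (illustrative heuristic only)."""
    task = task.lower()
    if task in REASONING_TASKS:
        return Mode.REASONING
    if task in FAST_TASKS:
        return Mode.NON_REASONING
    # Unknown tasks default to the low-latency path; callers can override.
    return Mode.NON_REASONING


print(choose_mode("math"))  # Mode.REASONING
print(choose_mode("OCR"))   # Mode.NON_REASONING
```

Defaulting unknown tasks to the fast path is one design choice among several; an application that values accuracy over latency could default to reasoning mode instead.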

Figure: Non-reasoning mode

Phi-4-Reasoning-Vision-15B also shows strong potential for computer-use agents. Given a screenshot and a natural-language instruction, the model outputs standardized bounding box coordinates for the requested UI elements. Other agent models can then use these coordinates to perform interactions such as clicking or scrolling. This division of labor makes it easier to build agents that operate graphical interfaces on the user's behalf.
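To act on the model's output, a downstream agent must map each bounding box onto the screenshot. The sketch below assumes the coordinates arrive as `(x1, y1, x2, y2)` normalized to the range [0, 1]. That format, the `BBox` type, and `to_click_point` are hypothetical illustrations, so the actual output format should be checked against the model's documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Bounding box with corners normalized to [0, 1] (assumed format)."""
    x1: float
    y1: float
    x2: float
    y2: float


def to_click_point(box: BBox, width: int, height: int) -> tuple[int, int]:
    """Convert a normalized box to the pixel at its centre on a width x height screenshot."""
    for v in (box.x1, box.y1, box.x2, box.y2):
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"coordinate {v} is outside [0, 1]")
    if box.x2 < box.x1 or box.y2 < box.y1:
        raise ValueError("box corners are out of order")
    # Centre of the box, scaled from normalized units to pixels.
    cx = (box.x1 + box.x2) / 2 * width
    cy = (box.y1 + box.y2) / 2 * height
    # Truncate to integer pixel indices and clamp so a box touching the
    # right or bottom edge still maps to a valid pixel.
    return min(int(cx), width - 1), min(int(cy), height - 1)


print(to_click_point(BBox(0.25, 0.5, 0.75, 1.0), 200, 100))  # (100, 75)
```

Clicking at the box centre is a simple, common choice; a scrolling action could instead use the box edges to decide how far to scroll.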

Figure: Reasoning mode

In summary, Phi-4-Reasoning-Vision-15B is both a technical advance and a practical foundation for building intelligent applications. We look forward to seeing what developers create with its capabilities.