DeepSeek has officially opened its large-scale image recognition mode to internal testing, marking the domestic large model's full entry into the era of multimodal text-and-image interaction. After a small gray release in late April, DeepSeek significantly expanded access to the "Image Recognition Mode" on May 9, and most test accounts can now reach the feature through a dedicated entry point in the chat interface. Although the system is still labeled "in internal testing," its placement alongside "Quick Mode" and "Expert Mode" above the input box signals that multimodal understanding has become a core part of the product matrix.


Unlike traditional OCR, which merely extracts text, the core of DeepSeek's latest upgrade is deep image recognition and semantic understanding. In hands-on tests, the mode can logically decompose and interpret visual information, letting users carry out complex cross-media interactions simply by uploading images. The move fills the gap in DeepSeek's multimodal understanding capabilities and marks a substantial step forward in its pursuit of top international models such as GPT-4o.