Alibaba's Qwen vision models secured the top two positions on the third-party spatial reasoning benchmark SpatialBench: Qwen3-VL scored 13.5 points and Qwen2.5-VL 12.9, significantly outperforming Gemini 3.0 Pro Preview (9.6 points) and GPT-5.1 (7.5 points), though every model remains far below the human baseline of 80 points.

Features of the Benchmark  

SpatialBench focuses on 2D/3D spatial, structural, and path reasoning, including complex tasks such as circuit analysis, CAD engineering, and molecular biology, and is regarded as the "litmus test for embodied intelligence."

Model Highlights  

- 3D Detection Upgrade: Qwen3-VL adds rotated bounding box outputs and depth estimation heads, raising average precision (AP) by 18% in occluded scenes and letting the model judge object orientation and perspective changes (see the parsing sketch after this list)

- Visual Programming: Feed in a sketch or a 10-second video and the model generates executable Python + OpenCV code, delivering "what you see is what you get" (an illustrative snippet also follows this list)

- Diverse Scale: Ships dense models at 2B/4B/8B/32B parameters as well as MoE variants such as 30B-A3B and 235B-A22B; the reasoning version surpasses Gemini 2.5-Pro by an average of 6.4 points across 32 core capabilities.
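
To make the 3D detection bullet concrete, the sketch below converts a rotated bounding box into its four corner points. The (cx, cy, w, h, theta_deg) plus per-box depth format is an assumption for illustration only; it is not Qwen3-VL's documented output schema.

```python
import numpy as np

# Assumed (not official) format for one detection: a rotated box
# (cx, cy, w, h, theta_deg) in pixels/degrees plus an estimated depth in meters.
def rotated_box_corners(cx, cy, w, h, theta_deg):
    """Return the 4 corner points (pixels) of a rotated bounding box."""
    t = np.deg2rad(theta_deg)
    rot = np.array([[np.cos(t), -np.sin(t)],
                    [np.sin(t),  np.cos(t)]])
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return half @ rot.T + np.array([cx, cy])

det = {"cx": 320, "cy": 240, "w": 80, "h": 40, "theta_deg": 30, "depth_m": 1.7}
corners = rotated_box_corners(det["cx"], det["cy"], det["w"], det["h"], det["theta_deg"])
print(corners.round(1))  # 4x2 array of corner coordinates
```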
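
For the visual programming bullet, the following hand-written snippet illustrates the kind of short, executable Python + OpenCV program the article describes; it is not actual model output.

```python
import cv2

# Illustrative generated-style code: outline every shape found in a sketch.
img = cv2.imread("sketch.png")          # input sketch (path is an example)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)        # edge map of the drawing
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    x, y, w, h = cv2.boundingRect(c)    # axis-aligned box per shape
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("annotated.png", img)
```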

Open Source Schedule  

Qwen2.5-VL has been fully open-sourced; Qwen3-VL is expected to release its weights and toolchain in Q2 2025, alongside the launch of the Qwen App for free hands-on access.
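
Because Qwen2.5-VL is already open-sourced, it can be run locally today. Below is a minimal inference sketch with Hugging Face transformers, assuming a recent transformers release with Qwen2.5-VL support and the qwen-vl-utils helper package; the model id, image path, and prompt are placeholders.

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# A simple spatial question about a local image (path is a placeholder).
messages = [{"role": "user", "content": [
    {"type": "image", "image": "warehouse.jpg"},
    {"type": "text", "text": "Which pallet is closest to the camera?"},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos = process_vision_info(messages)
inputs = processor(text=[text], images=images, videos=videos,
                   padding=True, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
new_tokens = out[:, inputs["input_ids"].shape[1]:]  # strip the prompt tokens
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```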

Implementation Progress  

Alibaba Cloud revealed that Qwen3-VL has already entered proof-of-concept trials in logistics robots, AR-guided assembly, smart ports, and other scenarios, with spatial positioning error under 2 cm. A "vision-action" end-to-end model is slated for 2026 to give robots real-time visual servoing capabilities.
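
To give a rough sense of what real-time visual servoing involves, here is a toy proportional control step that steers a tracked image feature toward the camera center; the function and gain are invented for illustration and have no connection to Alibaba's actual robotics stack.

```python
import numpy as np

def servo_step(feature_px, image_size, gain=0.5):
    """One proportional visual-servoing step: return a normalized 2D velocity
    command that drives the tracked feature toward the image center."""
    center = np.asarray(image_size, dtype=float) / 2.0
    error = (np.asarray(feature_px, dtype=float) - center) / center  # in [-1, 1]
    return -gain * error  # move opposite to the pixel error

# Feature detected at (412, 188) in a 640x480 frame.
print(servo_step((412, 188), (640, 480)))  # -> approx [-0.14  0.11]
```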