
In core performance tests, the 10B-parameter model challenges and even surpasses models at the 200B scale. This open-source release includes both Base and Thinking versions. Thanks to the innovative parallel coordination reasoning mechanism (PaCoRe), the model performs with particular stability on tasks such as high-precision OCR, complex counting, and spatial-topology understanding. Complex multimodal reasoning that previously required cloud compute can therefore be deployed far more cost-effectively on edge devices such as phones and computers, greatly improving the interaction efficiency of edge-side agents.
Project Homepage: https://stepfun-ai.github.io/Step3-VL-10B/
Paper Link: https://arxiv.org/abs/2601.09668
HuggingFace: https://huggingface.co/collections/stepfun-ai/step3-vl-10b
ModelScope: https://modelscope.cn/collections/stepfun-ai/Step3-VL-10B
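For quick experimentation, here is a minimal sketch of loading the model with Hugging Face transformers. The repo id `stepfun-ai/Step3-VL-10B`, the multimodal message schema, and the example image URL are assumptions inferred from the collection link above, not confirmed by the release; the model card's own loading and prompt-format instructions should be treated as authoritative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "stepfun-ai/Step3-VL-10B"  # hypothetical repo id inferred from the collection link

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,   # bf16 keeps the 10B weights around 20 GB
    device_map="auto",
    trust_remote_code=True,
)

# Multimodal chat-template input; the exact message schema depends on the
# model's processor, so treat this as a placeholder for the model-card example.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/receipt.jpg"},
            {"type": "text", "text": "Read every line item and report the total."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

answer = processor.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(answer)
```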
Key Points:
🚀 Small Parameters Outperforming Larger Models: With only 10B parameters, Step3-VL-10B challenges and surpasses 200B-scale models, achieving an optimal balance of performance and scale.
🧠 Deep Logic and Perception: By introducing the PaCoRe mechanism and large-scale reinforcement learning, it reaches world-class levels in competition-level mathematics, complex GUI perception, and 3D spatial reasoning.
📱 Edge Intelligence Deployment: Delivers high-performance multimodal capabilities on low-compute devices, providing a strong foundation for "active understanding and interaction" on smartphones and industrial embedded devices (a rough sizing sketch follows below).
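The edge-deployment claim largely comes down to memory footprint. As a rough illustration only (not the project's own deployment path), a 4-bit quantized load via bitsandbytes brings a 10B model to roughly 5-6 GB of weights; actual on-device deployment would more likely go through an exported runtime or NPU toolchain documented by the project. The repo id below is the same hypothetical one used above.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 weights with bf16 compute: ~0.5 byte per parameter,
# i.e. roughly 5-6 GB for a 10B-parameter model.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "stepfun-ai/Step3-VL-10B",      # hypothetical repo id, see above
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)
print(f"approx. weight memory: {model.get_memory_footprint() / 1e9:.1f} GB")
```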
