At a critical stage where AI agents are evolving toward complex, multi-step tasks, the open-source community welcomes a new rising star. The Jan team has officially released Jan-v2-VL-Max—a 30-billion parameter multimodal large model specifically designed for long-term, high-stability automated execution scenarios. It has surpassed Google's Gemini 2.5 Pro and DeepSeek R1 in key metrics, injecting strong momentum into the open-source agent ecosystem.

image.png

Focusing on the "error accumulation" problem, it effectively tackles "off-track" issues in multi-step execution

Currently, multimodal agents often face the "error accumulation" issue when executing long sequences of operations (such as automated UI operations or cross-application task flows), where small deviations in intermediate steps lead to significant task deviations later on. Jan-v2-VL-Max introduces a LoRA-based RLVR (Reinforced Long-horizon Vision-Language Reasoning) technology, significantly improving the consistency and interference resistance of the reasoning chain while maintaining the capabilities of the Qwen3-VL-30B base model, ensuring accurate execution even after dozens of steps.

Top in the "Hallucination-Decay Return" test, defining a new benchmark for agents

The model performs exceptionally well in the new evaluation benchmark "Hallucination-Decay Return (HDR)," which specifically measures how quickly the return rate decreases due to hallucinations or logical breakdowns as the task length increases. Jan-v2-VL-Max maintains high return stability in long-sequence tasks, surpassing Gemini 2.5 Pro and DeepSeek R1, verifying its reliability in real-world automation scenarios.

image.png

Ready to use, supporting efficient local deployment

To lower the entry barrier, the Jan team provides:

- A web-based interactive interface, allowing users to upload images, input instructions, and test multi-step automation processes;

- A vLLM-optimized local deployment solution that supports efficient operation on consumer-grade GPUs, making it easy for developers to integrate into their own agent systems.

A breakthrough in "long thinking" for the open-source community

Although Jan-v2-VL-Max achieves only a "minor improvement" in long-sequence execution compared to the base model, in the agent field, every 1% increase in stability represents a qualitative change in usability. This achievement marks that the open-source community is moving from "single-step response" to "long-term planning," providing a practical open-source foundation for high-value scenarios such as UI automation, robot control, and multi-tool collaboration.

AIbase believes that when the competition among large models shifts from "who is smarter" to "who is more reliable," the Jan team's focus on execution stability is timely. As agents are about to become the main interaction paradigm of AI, Jan-v2-VL-Max may become a key piece for developers to build "never-failing" intelligent agents.