On January 30, 2026, SenseTime open-sourced SenseNova-MARS, its first agentic vision-language model (VLM) to deeply integrate dynamic visual reasoning with text-image search. The model ships in two sizes, 8B and 32B. By emulating the logic of a detective's investigation, it marks a step for AI from merely "understanding" to autonomously "executing."
Performance Leap: Exceeding GPT-5.2 on Multiple Benchmarks
In recent industry benchmark tests, SenseNova-MARS demonstrated impressive performance:
Search Reasoning Leader: It topped MMSearch, the core benchmark for text-image search, with 74.27 points, well ahead of GPT-5.2's 66.08.
Fine-grained Search: It scored 54.43 points on HR-MMSearch, a high-resolution detail-search benchmark, opening a clear gap over mainstream closed-source models.
Broad Capability Validation: On multiple widely used visual understanding benchmarks such as FVQA and InfoSeek, it achieved state-of-the-art (SOTA) results among open-source models.
Core Capability: Collaborating with Tools Like a Human
What sets SenseNova-MARS apart is its autonomous planning: it can decompose and solve complex long-chain tasks that combine detail identification, information retrieval, and logical reasoning:
Image Detail Cropping: It can focus on small details occupying less than 5% of the image (such as a logo on a race suit) and automatically zoom in for analysis.
Dynamic Text-Image Search: Once it identifies an object or person, it automatically retrieves relevant information from the web, such as equipment models or industry data.
Multi-hop Deep Reasoning: It handles tasks that require first zooming in, then identifying, and finally verifying background information, showing a strong "intuition" for when to invoke tools.
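The crop-then-search-then-reason pattern above can be sketched as a simple tool loop. The tool names, the action format, and the scripted "plan" below are all assumptions for illustration, not SenseNova-MARS's actual interface:

```python
# Minimal sketch of an agentic crop -> search -> reason loop.
# All tool names and the action schema are hypothetical.

def crop_image(image, box):
    """Return a sub-region of a 2D pixel grid given (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

def image_search(query):
    """Stand-in for a text-image search tool (returns canned results)."""
    knowledge = {"team logo": "Team Alpha, founded 1998"}
    return knowledge.get(query, "no results")

def run_agent(image, plan):
    """Execute a scripted multi-hop plan; each step names a tool and args."""
    context = []
    for step in plan:
        if step["tool"] == "crop":
            region = crop_image(image, step["box"])
            context.append(f"cropped {len(region)}x{len(region[0])} region")
        elif step["tool"] == "search":
            context.append(image_search(step["query"]))
    return context

image = [[0] * 100 for _ in range(100)]          # 100x100 dummy image
plan = [
    {"tool": "crop", "box": (90, 90, 100, 100)}, # zoom into a small detail
    {"tool": "search", "query": "team logo"},    # then look it up
]
print(run_agent(image, plan))  # evidence gathered across two hops
```

In the real model the plan is not scripted: the VLM itself decides at each step whether to crop, search, or answer, conditioning on the evidence gathered so far.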
Training Recipe: Two-Phase "Tailored Teaching"
The SenseTime research team strengthened the model's reasoning chain through two-phase training:
First Phase (Building the Foundation): An automated data-synthesis engine builds a "high-difficulty case library," teaching the model basic multi-hop search logic on realistic, complex scenarios from the start.
Second Phase (Practicing in the Field): The BN-GSPO reinforcement-learning algorithm then trains the model like a detective on real cases, with a reward mechanism that smooths fluctuations so the model improves stably across diverse problems.
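The core idea behind GSPO-family RL algorithms is to score each sampled answer relative to its group rather than on an absolute scale. BN-GSPO itself is not publicly documented, so the sketch below shows only the generic group-normalization step; the exact smoothing used by SenseTime is an assumption beyond this:

```python
# Illustrative group-relative advantage normalization, the common core of
# GSPO-style RL. The specific "BN" smoothing in BN-GSPO is not public;
# this is a generic sketch, not SenseTime's implementation.
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its group's mean and std,
    so the policy update depends on relative, not absolute, quality."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to the same question, scored by a reward model:
rewards = [1.0, 0.0, 0.5, 0.5]
advs = group_advantages(rewards)
print([round(a, 2) for a in advs])  # best answer pushed up, worst pushed down
```

Normalizing within the group keeps gradient magnitudes comparable across easy and hard questions, which is one way a reward mechanism can "smooth fluctuations" during training.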
Embracing Open Source: Supporting Global Developers
SenseTime has fully open-sourced the SenseNova-MARS model weights, code, and datasets. Developers can download them from Hugging Face and explore the possibilities of embodied intelligence and autonomous agents.
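For developers trying the release, a request to a multimodal chat model typically pairs an image with a text question. The repo id and message schema below are placeholders, not the official SenseNova-MARS interface; consult the Hugging Face model card for the real ones:

```python
# Hypothetical request payload for a multimodal chat model.
# MODEL_ID and the message schema are placeholders only.

MODEL_ID = "SenseTime/SenseNova-MARS-8B"  # placeholder repo id

def build_request(image_path, question):
    """Assemble an OpenAI-style multimodal message list."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "user",
             "content": [
                 {"type": "image", "path": image_path},
                 {"type": "text", "text": question},
             ]},
        ],
    }

req = build_request("race.jpg", "Which team does this race suit belong to?")
print(req["model"])
```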
