On January 30, 2026, SenseTime open-sourced SenseNova-MARS, its first agentic vision-language model (VLM) to deeply integrate dynamic visual reasoning with text-image search. The model ships in two sizes, 8B and 32B. By emulating the logic of a detective's investigation, it marks a step for AI from merely "understanding" to autonomously "executing."
Performance Leap: Exceeding GPT-5.2 on Multiple Benchmarks
In recent industry benchmark tests, SenseNova-MARS demonstrated impressive performance:
Search Reasoning Leader: It topped MMSearch, the core benchmark for text-image search, with 74.27 points, well ahead of GPT-5.2's 66.08.
Fine-grained Search: It scored 54.43 points on HR-MMSearch, a high-resolution detail-search benchmark, opening a clear gap over mainstream closed-source models.
Broad Capability Validation: On multiple widely used visual understanding benchmarks such as FVQA and InfoSeek, it achieved state-of-the-art (SOTA) results among open-source models.
Core Capability: Collaborating with Tools Like a Human
What sets SenseNova-MARS apart is its autonomous planning: it can decompose and solve complex long-chain tasks that combine detail identification, information retrieval, and logical reasoning:
Image Detail Cropping: It can focus on small details occupying less than 5% of the image (such as a logo on a race suit) and automatically zoom in for analysis.
Dynamic Text-Image Search: Once it identifies an object or person, it automatically retrieves relevant information from the web, such as equipment models or industry data.
Multi-hop Deep Reasoning: It handles tasks that require first zooming in, then identifying, and finally verifying background information, showing a strong "intuition" for when to invoke tools.
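The crop-then-search-then-reason pattern above can be sketched as a simple tool loop. The tool names, the action format, and the scripted "plan" below are all assumptions for illustration, not SenseNova-MARS's actual interface:

```python
# Minimal sketch of an agentic crop -> search -> reason loop.
# All tool names and the action schema are hypothetical.

def crop_image(image, box):
    """Return a sub-region of a 2D pixel grid given (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

def image_search(query):
    """Stand-in for a text-image search tool (returns canned results)."""
    knowledge = {"team logo": "Team Alpha, founded 1998"}
    return knowledge.get(query, "no results")

def run_agent(image, plan):
    """Execute a scripted multi-hop plan; each step names a tool and args."""
    context = []
    for step in plan:
        if step["tool"] == "crop":
            region = crop_image(image, step["box"])
            context.append(f"cropped {len(region)}x{len(region[0])} region")
        elif step["tool"] == "search":
            context.append(image_search(step["query"]))
    return context

image = [[0] * 100 for _ in range(100)]          # 100x100 dummy image
plan = [
    {"tool": "crop", "box": (90, 90, 100, 100)}, # zoom into a small detail
    {"tool": "search", "query": "team logo"},    # then look it up
]
print(run_agent(image, plan))  # evidence gathered across two hops
```

In the real model the plan is not scripted: the VLM itself decides at each step whether to crop, search, or answer, conditioning on the evidence gathered so far.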
Training Recipe: Two-Phase "Tailored Teaching"
The SenseTime research team strengthened the model's reasoning chain through two-phase training:
First Phase (Building the Foundation): An automated data-synthesis engine builds a "high-difficulty case library," teaching the model basic multi-hop search logic on realistic, complex scenarios from the start.
Second Phase (Practicing in the Field): The BN-GSPO reinforcement-learning algorithm then trains the model like a detective on real cases, with a reward mechanism that smooths fluctuations so the model improves stably across diverse problems.
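The core idea behind GSPO-family RL algorithms is to score each sampled answer relative to its group rather than on an absolute scale. BN-GSPO itself is not publicly documented, so the sketch below shows only the generic group-normalization step; the exact smoothing used by SenseTime is an assumption beyond this:

```python
# Illustrative group-relative advantage normalization, the common core of
# GSPO-style RL. The specific "BN" smoothing in BN-GSPO is not public;
# this is a generic sketch, not SenseTime's implementation.
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its group's mean and std,
    so the policy update depends on relative, not absolute, quality."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to the same question, scored by a reward model:
rewards = [1.0, 0.0, 0.5, 0.5]
advs = group_advantages(rewards)
print([round(a, 2) for a in advs])  # best answer pushed up, worst pushed down
```

Normalizing within the group keeps gradient magnitudes comparable across easy and hard questions, which is one way a reward mechanism can "smooth fluctuations" during training.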
Embracing Open Source: Supporting Global Developers
SenseTime has fully open-sourced the SenseNova-MARS model weights, code, and datasets. Developers can download them from Hugging Face and explore the possibilities of embodied intelligence and autonomous agents.
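For developers trying the release, a request to a multimodal chat model typically pairs an image with a text question. The repo id and message schema below are placeholders, not the official SenseNova-MARS interface; consult the Hugging Face model card for the real ones:

```python
# Hypothetical request payload for a multimodal chat model.
# MODEL_ID and the message schema are placeholders only.

MODEL_ID = "SenseTime/SenseNova-MARS-8B"  # placeholder repo id

def build_request(image_path, question):
    """Assemble an OpenAI-style multimodal message list."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "user",
             "content": [
                 {"type": "image", "path": image_path},
                 {"type": "text", "text": question},
             ]},
        ],
    }

req = build_request("race.jpg", "Which team does this race suit belong to?")
print(req["model"])
```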
