China's artificial intelligence field has achieved another technical breakthrough. On the evening of April 28, SenseTime officially unveiled and open-sourced its latest research achievement: the "SenseNova U1" series. The series is positioned as a "native unified understanding-and-generation model," and its core contribution is a break from the traditional practice of "assembling" multimodal models from separate modules.

Discarding "assembly" logic for a unified architecture

For a long time, multimodal large models have mostly been built by combining separate visual and language modules. This "assembled" design often loses information as signals pass between the modules' distinct representation spaces. Built on the NEO-unify architecture that SenseTime independently developed in March of this year, the SenseNova U1 series achieves deep unification of multimodal understanding, reasoning, and generation within a single model framework.

At the heart of this shift is a unified representation space, which lets the model coordinate language and visual signals more efficiently as it processes information. In practice, the architecture not only deepens the model's perception of complex information but also markedly improves the naturalness and accuracy of what it generates.
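To make the contrast concrete, below is a minimal, purely illustrative PyTorch sketch of the two design philosophies. Every name, dimension, and wiring choice in it is an assumption for illustration, not SenseTime's actual NEO-unify implementation: the "assembled" model bridges a separate vision encoder into a language model through a learned projection, while the unified model runs interleaved text and image tokens through one shared embedding table and backbone.

```python
# Illustrative sketch only. All classes, sizes, and wiring are hypothetical;
# they contrast the two design philosophies, not any real SenseTime code.
import torch
import torch.nn as nn

D = 256  # shared hidden size (hypothetical)

class AssembledModel(nn.Module):
    """'Assembled' approach: a separate vision encoder whose output is
    projected into the language model's space via a learned bridge.
    Information can be lost at this cross-space handoff."""
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Sequential(nn.Linear(768, 512), nn.GELU())
        self.bridge = nn.Linear(512, D)  # the cross-space projection step
        self.language_model = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D, nhead=4, batch_first=True),
            num_layers=2)

    def forward(self, image_feats, text_embeds):
        vis = self.bridge(self.vision_encoder(image_feats))
        return self.language_model(torch.cat([vis, text_embeds], dim=1))

class UnifiedModel(nn.Module):
    """Unified approach: text tokens and discrete image tokens share one
    embedding space and one transformer, so understanding and generation
    operate on the same representation end to end."""
    def __init__(self, vocab=1000, image_codes=512):
        super().__init__()
        self.token_embed = nn.Embedding(vocab + image_codes, D)  # one table
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D, nhead=4, batch_first=True),
            num_layers=2)

    def forward(self, token_ids):
        # token_ids may freely interleave text and image tokens
        return self.backbone(self.token_embed(token_ids))

if __name__ == "__main__":
    img = torch.randn(1, 16, 768)                # 16 image patch features
    txt = torch.randn(1, 8, D)                   # 8 text embeddings
    mixed_ids = torch.randint(0, 1500, (1, 24))  # interleaved token ids
    print(AssembledModel()(img, txt).shape)      # torch.Size([1, 24, 256])
    print(UnifiedModel()(mixed_ids).shape)       # torch.Size([1, 24, 256])
```

The design point the sketch highlights: in the unified model there is no bridge layer for information to leak through, because both modalities live in the same token space from input to output.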

The lightweight version is open-sourced first, with more to come

To support the shared development of the open-source community, SenseTime has released the lightweight edition first: SenseNova U1 Lite. It comes in two model sizes, intended to balance performance and efficiency across different application scenarios. The model's code and files are now live on the corresponding open-source platforms.
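If the release follows the common pattern for open-sourced multimodal models (weights hosted on a model hub and loadable via Hugging Face Transformers), usage would look roughly like the sketch below. The article does not name the hosting platform, so the repository ID here is an invented placeholder, not a confirmed identifier.

```python
# Hypothetical usage sketch. "SenseNova/SenseNova-U1-Lite" is a placeholder
# repo ID invented for illustration; check SenseTime's official release
# pages for the real hosting location and loading instructions.
from transformers import AutoModel, AutoTokenizer

repo_id = "SenseNova/SenseNova-U1-Lite"  # placeholder, not a confirmed repo

# Unified multimodal models typically ship custom modeling code, which
# Transformers loads when trust_remote_code=True is passed.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
```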