SenseTime Enters the Intelligent Agent Arena: New All-Modal Base Model is Ready to Launch

As large-model AI technology continues to evolve rapidly, industry competition has quietly shifted from plain "language processing" toward the more practically valuable field of "agents." At a recent SenseTime shareholder meeting, the company announced that it is developing what it describes as the industry's first native multimodal agent foundation, built around a unified core of "understanding, generation, and action." The product is positioned to compete directly with OpenAI's GPT-Image 2.

The breakthrough in agent technology lies in moving AI from passively answering questions to actively executing tasks. SenseTime's foundation aims to tightly integrate multimodal processing with the logic needed to execute complex tasks. In practice, the foundation is meant not only to understand user intent in depth, but also to complete more complex interaction tasks in the digital world on its own through a closed loop of generation and action, making it more practical in real-world application scenarios.

According to the disclosures, research and development is progressing smoothly, and SenseTime plans to officially release the foundation in the second half of 2026.

Industry analysts believe that SenseTime's increased investment in the multimodal agent foundation is a key step in its large-model strategy. In this critical window, as the AI industry moves from "foundation models" to "agent ecosystems," vendors that can break down the barriers between understanding, generation, and action will have a better chance of occupying a central position in future intelligent production and service systems. As the foundation is developed and deployed, SenseTime is expected to further consolidate its early advantages in underlying algorithm architecture and intelligent applications.

SenseTime Secretly Developing Multimodal Model U1Pro: Led by Lin Dahua, Expected to Launch Internal Testing in July, Targeting OpenAI

SenseTime is secretly developing U1Pro, a multimodal large model aimed at design scenarios, in an effort led by Chief Scientist Lin Dahua. The model belongs to the "Ri Ri Xin" (SenseNova) family and is intended to compete with OpenAI's GPT-Image 2. It emphasizes long-range logic and reasoning capabilities and is expected to enter internal testing and commercial use in July.

SenseTime Open Sources SenseNova U1, Achieving a Multimodal Native Unified Architecture

SenseTime has released and open-sourced the SenseNova U1 series of models. Built on its self-developed NEO-unify architecture, the series deeply unifies multimodal understanding, reasoning, and generation, marking a shift from an approach that integrates separate components to a natively unified one. The architecture drops the modular design, eliminating visual encoders and variational autoencoders, thereby improving model efficiency and performance.

SenseTime Launches the Industry's First Multi-Series Generative AI Agent Seko2.0, Domestic AI Chip Successfully Integrates the Full Multimodal AIGC Pipeline

SenseTime has launched Seko2.0, billed as the world's first AI agent for multi-scene video generation, enabling continuous narratives from single clips. It maintains high consistency in characters, scenes, and style, improving plot coherence and visual uniformity, and can scale to short videos, ads, and education. It is powered by SenseTime's proprietary multimodal model...

New GoT-R1 Multimodal Model Released: Making AI Drawing Smarter, the New Era of Image Generation!

Recently, a research team from the University of Hong Kong, The Chinese University of Hong Kong, and SenseTime released a new framework, GoT-R1. By introducing reinforcement learning (RL), this multimodal large model significantly enhances semantic and spatial reasoning in visual generation tasks, generating high-fidelity, semantically consistent images from complex text prompts. The advance marks another step forward in image generation technology. Currently, although existing multimodal large models have made significant progress in generating images based on text prompts...
