SALMONN Framework: Expanding General Auditory Capabilities of Large Language Models


An academic team has released OpenSeeker-v2, breaking industry dominance in deep search. It achieves top-tier agent capabilities with high-quality data, bypassing the resource-heavy pipeline (pretraining, CPT, SFT, RL) and offering a new paradigm for LLMs.
At the Beijing Auto Show, Ruan Chong, a former core researcher of DeepSeek's multimodal technology, appeared as the chief scientist of Yuanrong Qixing, marking the company's shift in autonomous driving technology. CEO Zhou Guang stated that multimodal large models achieved breakthroughs in early 2026 and that the large-model-based approach to autonomous driving offers significant advantages over previous technologies.
Xiaohongshu has open-sourced RelaX, a reinforcement learning training engine designed specifically for multimodal and agent scenarios. It supports unified processing of text, images, audio, and video, in line with current AI industry trends.
ByteDance's Volcano Engine opened public API applications for the Seedance2.0 multimodal video generation model on April 2, transitioning from limited testing to broader availability. The model supports text, image, audio, and video inputs, enabling character consistency, director-level shot control, and physical simulation.
ZhiXiang Future has launched HiDreamClaw, a multimodal native app integrated into its creative platform and now available overseas. It features strong compatibility and combines proprietary and advanced models, advancing the company's AI creative ecosystem.