SALMONN Framework: Expanding General Auditory Capabilities of Large Language Models


On the eve of Chinese New Year in 2026, Alibaba open-sourced its new-generation large model Qwen3.5-Plus, whose performance rivals that of Gemini 3 Pro, making it the world's strongest open-source large model. The model adopts a revolutionary underlying architecture: with 397 billion parameters but only 17 billion activated, it surpasses the trillion-parameter Qwen3-Max at a smaller scale. Deployment memory usage is reduced by 60%, and long-context reasoning throughput is increased 19-fold. The API cost is as low as 0.8 yuan per million tokens, just 1/18th that of Gemini 3 Pro.
Kuaishou's Keling AI 3.0 revolutionizes video creation with multimodal input and output, enabling deep narrative generation and pioneering multi-image/video subject reference for precise control, empowering everyone to become a director.
Andrej Karpathy used AI to automatically score 930 Hacker News discussions from 2015, demonstrating AI's ability to analyze historical public discourse and prompting reflection on the future quality of online discussion.
Starcloud successfully trained the nanoGPT model and completed Gemma model inference on satellites equipped with NVIDIA H100 GPUs in space, marking an important advance in the development of space data centers.
From December 6 to 7, the 10th Advanced Forum on Language Services was held at Guangzhou University. During the event, the Cantonese Corpus Construction and Large Model Evaluation Lab launched the AI-DimSum Multimodal Cantonese Corpus Platform, aiming to overcome the digitization challenges of Cantonese as a low-resource language. Centered on the needs of digital Chinese construction and the digitalization of Greater Bay Area culture, the platform builds a multimodal corpus to promote the preservation and development of Cantonese in the era of artificial intelligence.