AI Daily: Zhipu Open-Sources the GLM-4.5V Visual Reasoning Model; DAMO Academy Open-Sources Three Core Embodied Intelligence Technologies; 360 ZhiNao Launches the Light-IF Series Models

Welcome to AI Daily, your daily guide to the world of artificial intelligence. Each day we round up the latest developments in the AI field with a focus on developers, helping you keep track of technical trends and innovative AI product applications.

1. Zhipu Open-Sources GLM-4.5V, Billed as the Best Open-Source Visual Reasoning Model at the 100B Scale

Zhipu announced the release and open-sourcing of GLM-4.5V, which it describes as the best-performing open-source visual reasoning model at the 100B scale worldwide. The company presents the release as another step in its exploration toward Artificial General Intelligence (AGI).

[AiBase Summary:]
🤖 GLM-4.5V has a total of 106B parameters and achieves SOTA performance on 41 visual multimodal benchmarks
🎯 It possesses full-scenario visual reasoning capabilities, including image reasoning, video understanding, GUI tasks, etc.
💡 A new "thinking mode" switch lets users trade off efficiency against effectiveness
💰 API pricing starts as low as 2 yuan per million input tokens and 6 yuan per million output tokens
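
To make the quoted pricing concrete, here is a minimal sketch of a cost estimate at the low-end rates above. The function name `estimate_cost_yuan` and the sample workload are illustrative assumptions, not part of any Zhipu SDK, and actual billing may use different tiers.

```python
from decimal import Decimal

# Low-end prices quoted in the summary above, in yuan per million tokens.
INPUT_YUAN_PER_M = Decimal("2")
OUTPUT_YUAN_PER_M = Decimal("6")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_yuan(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost in yuan of a workload at the quoted low-end rates.

    Hypothetical helper for illustration only; it is not a Zhipu API.
    """
    input_cost = Decimal(input_tokens) / TOKENS_PER_M * INPUT_YUAN_PER_M
    output_cost = Decimal(output_tokens) / TOKENS_PER_M * OUTPUT_YUAN_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Assumed workload: 500,000 input tokens and 100,000 output tokens.
    print(estimate_cost_yuan(500_000, 100_000))
```

Decimal arithmetic is used here so that small per-token amounts do not pick up floating-point rounding noise.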

2. Alibaba DAMO Academy Open-Sources Three Core Technologies for Embodied Intelligence

At the World Robot Conference, Alibaba DAMO Academy announced the open-sourcing of three self-developed core technologies: the vision-language-action (VLA) model RynnVLA-001-7B, the world understanding model RynnEC, and the robot context protocol RynnRCP. The initiative aims to improve compatibility among data, models, and robot hardware, enabling a complete workflow for embodied intelligence development.

[AiBase Summary:]
🚀 Open-sourcing three core technologies: VLA model, world understanding model, robot context protocol
🔗 RynnRCP enables a complete workflow from sensor data to robot action execution
👁️ RynnVLA-001 learns human operational skills from first-person perspective videos
🌍 RynnEC analyzes the objects in a scene along 11 dimensions without relying on 3D models
Details: https://github.com/alibaba-damo-academy/RynnRCP

3. Apple to Upgrade Apple Intelligence's ChatGPT Integration to GPT-5, Enhancing Siri and Writing Tools

Apple recently announced that it plans to upgrade the ChatGPT model used within Apple Intelligence to the latest GPT-5 version in the upcoming iOS 26, iPadOS 26, and macOS Tahoe 26 system updates.

[AiBase Summary:]
🚀 Apple will upgrade the ChatGPT model to GPT-5 in the iOS 26 system update, enhancing the performance of Siri, writing tools, and visual intelligence.
🚀 The new version will introduce multilingual real-time translation and on-screen content analysis, improving cross-language communication and information processing on the device.
🚀 Apple will also open on-device APIs for the first time, supporting third-party app integration and offering low-latency, high-privacy AI experiences.

4. Gaode Maps Fully Integrates the Tongyi Large Model and Launches the First AI-Native Map Agent

Alibaba Group's Gaode Maps, working with Tongyi, launched what it calls the world's first AI-native map application. It introduces the "Xiao Gao Teacher" agent, which offers end-to-end voice interaction and navigation backed by complex task reasoning.

[AiBase Summary:]
🎙️ The built-in "Xiao Gao Teacher" agent supports multimodal interaction such as audio and text, including full-duplex voice that can be interrupted at any time.
🧠 Built on a Qwen large model pre-trained on 36 trillion tokens, it achieves deep spatial semantic understanding and efficiently coordinates nearly 100 internal tools.
🗂️ A jointly launched complex POI reasoning agent decomposes multiple constraints and integrates real-time information to provide accurate recommendations and navigation.
🔍 Relying on the self-developed DeepResearch framework, it has complete agent capabilities such as planning, reflection, and tool invocation.

5. Unitree Robotics (Yu Shu Technology) to Compete in the First World Humanoid Robot Games, with Its Hardware Used by Multiple Teams

Unitree Robotics will take part in the first World Humanoid Robot Games from August 14 to 17. The company revealed that, in addition to its own team, multiple competing teams will use Unitree robot hardware paired with their own self-developed algorithms.

[AiBase Summary:]
🤖 In addition to Unitree's own team, multiple teams will compete on Unitree robot hardware paired with their own algorithms.
🏟️ The event brings together leading Chinese humanoid robot companies such as Tiantong, Accelerate Evolution, Songyan Power, Fuliye, and Xinghai Map, along with 280 teams from 16 countries including the United States, Germany, Australia, Brazil, and Japan.
🔧 Unitree's participation demonstrates its strength in humanoid robot hardware and reflects how widely used and competitive its equipment is in an open ecosystem.

6. Claude Launches Memory of Past Conversations, with Switching Between Multiple Contexts

Anthropic launched a memory feature for Claude that automatically remembers and reuses background information from a user's past conversations, so that new sessions can pick up where earlier ones left off. It supports multiple isolated contexts that users can switch between, and it is currently available only to paying users.

[AiBase Summary:]
🔄 Set up independent contexts for different projects, switch between work and personal scenarios with one click, and maintain contextual continuity.
💰 Initially available to paying Claude Max, Team, and Enterprise users; the Pro plan will follow, and free users do not yet have access.
⚙️ Users can manually enable the feature or view memory content under "Settings - Search and Reference Chat."
🤖 Unlike ChatGPT's manual presetting, Claude uses an automatic extraction mechanism, offering a more "seamless" experience at the cost of slightly less control.

7. 360 ZhiNao Launches the Light-IF Series Models, Significantly Improving Complex Instruction Following

360 ZhiNao released the Light-IF series of models, which use a "preview and self-check reasoning plus information entropy control" framework to tackle "lazy reasoning." The models lead on four benchmarks, with small-parameter models matching the performance of far larger ones, and all models are open-sourced.

[AiBase Summary:]
🎯 Innovative Light-IF framework: difficulty-aware instruction generation → Zero-RL reinforcement → reasoning-pattern filtering → entropy-preserving cold start → entropy-adaptive regularization, significantly suppressing lazy reasoning that "only repeats without checking."
📈 Evaluation rankings: Light-IF-32B scored 0.575 on SuperCLUE, 13.9 percentage points ahead of second place; the small Light-IF-1.7B model outperformed Qwen3-235B-A22B and other very large models.
🔓 Fully open source: model weights will be uploaded to Hugging Face in stages, the cold-start datasets and training code will be published on GitHub, and the Chinese evaluation benchmark SuperCLUE-CPIFOpen was launched jointly with SuperCLUE.
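
For readers unfamiliar with entropy regularization in general, the sketch below shows one common textbook form: computing the Shannon entropy of a next-token distribution and subtracting a weighted entropy bonus from a loss. This is an assumption-laden illustration, not the Light-IF method; the function names and the fixed coefficient `beta` are hypothetical, and the summary does not say how Light-IF adapts its coefficient.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_regularized_loss(task_loss, logits, beta):
    """Generic entropy bonus: subtract beta * H(p) from the task loss.

    Hypothetical helper. A larger beta discourages the distribution
    from collapsing onto a single token.
    """
    return task_loss - beta * shannon_entropy(softmax(logits))


if __name__ == "__main__":
    uniform = softmax([0.0, 0.0, 0.0, 0.0])
    peaked = softmax([10.0, 0.0, 0.0, 0.0])
    # A uniform distribution has higher entropy than a peaked one.
    print(shannon_entropy(uniform) > shannon_entropy(peaked))
```

An adaptive scheme would vary `beta` during training, for example by raising it when measured entropy falls below a target, but that schedule is not described in the summary above.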

8. ByteDance Launches Seamless Video Subtitle Removal Solution Based on a DiT Large Model

ByteDance launched what it calls the world's first "seamless video subtitle removal" solution based on a DiT large model. It offers pixel-level repair, multilingual support, and a one-click "removal-translation-lip sync" workflow, helping short dramas and cross-border e-commerce reach overseas markets.

[AiBase Summary:]
🎞️ Two core components: a DiT large model for video subtitle removal plus a font-level segmentation model, giving precise pixel-level repair without mosaic, blur, or flicker artifacts.
🌐 Multilingual support: goes beyond Chinese and English to cover less widely used languages, forming a one-stop "removal-translation-lip sync" pipeline.
⚙️ Engineering implementation: verified on millions of data samples with a reported 100% success rate; distributed per-shot computation improves efficiency several times over.
Details: https://console.volcengine.com/vod/

9. Kunlun Wanwei Releases Open-Source World Model Matrix-Game2.0: Real-Time Generation of Minutes-Long, High-Continuity Video

Kunlun Wanwei released Matrix-Game2.0, which it describes as the world's first open-source interactive world model. It generates minutes-long, high-continuity video at 25fps in real time and supports purely vision-driven interaction without language prompts. It has already been applied to scenes such as GTA and Minecraft.

[AiBase Summary:]
🚀 Open-source launch: described as the industry's first open-source world model for general scenarios with real-time long-sequence generation, to be continuously iterated and fully open.
📹 Minutes-long generation: continuous 25fps video with significantly improved understanding of physical laws and scene semantics, directly applicable to games, film, and VR.
🎮 Vision-driven interaction: instead of language prompts, a 3D causal VAE plus a multimodal diffusion Transformer responds frame by frame to user actions and adapts across domains to scenes of various styles.

10. Kunlun Wanwei Open-Sources the Matrix-3D Large Model: High-Quality Panoramic Videos from a Single Image

Kunlun Wanwei open-sourced Matrix-3D, which generates 360° navigable 3D panoramic videos from a single image with consistent trajectories and accurate geometry. The code and dataset are fully open.

[AiBase Summary:]
🌐 3D worlds from a single image: removes the dependence on multiple viewpoints, directly producing high-quality panoramic videos and explorable 3D scenes from one image.
🎥 Trajectory-guided consistency: mesh-rendered images drive the diffusion model, ensuring spatiotemporal consistency along camera trajectories and reducing artifacts and occlusions.
⚙️ Dual-path reconstruction: super-resolution plus structure optimization for high-fidelity results, and a Transformer feed-forward network for fast inference, balancing quality and efficiency.
Details: https://github.com/SkyworkAI/Matrix-3D
