AI Daily: Kuaishou's Keling AI Fully Integrates with DeepSeek-R1; Baidu Releases Wenxin (Ernie) 4.5 and X1 Large Models; Xiaomi's Large Model Team Tops Audio Reasoning MMAU Leaderboard

Welcome to the 【AI Daily】 column! This is your daily guide to the world of artificial intelligence. Every day we bring you the hottest AI news, with a focus on developers, to help you follow technology trends and innovative AI product applications.

Learn more about new AI products: https://top.aibase.com/

1. Kuaishou's Keling AI Fully Integrates with DeepSeek-R1; DeepSeek Inspiration Version Launched

Kuaishou's Keling AI recently fully integrated with DeepSeek-R1, bringing significant convenience to users in video and image generation. DeepSeek-R1 allows users to easily translate inspiration into professional prompts, lowering the creative barrier and improving efficiency. Furthermore, the DeepSeek Inspiration version works with Keling AI's inspiration library function, helping users better control video details, enabling even ordinary users to create high-quality content. These innovations maintain Keling AI's leading position in the industry.

【AiBase Summary:】
🌟 Kuaishou's Keling AI fully integrates with DeepSeek-R1, helping users translate inspiration into professional prompts.
🔥 Keling AI continues to iterate and upgrade, further lowering the creative barrier after integrating DeepSeek-R1.
🎬 DeepSeek Inspiration version and the "Inspiration Library" work together to enhance users' control over video details.

2. Baidu Releases Wenxin 4.5 and X1 Large Models; Significantly Reduced Prices Attract Attention

Baidu's newly launched Wenxin 4.5 and X1 large models mark significant advances in multi-modal understanding and logical reasoning. Wenxin 4.5 is reported to surpass GPT-4.5 in performance at a far lower price, which has attracted the attention of many developers. X1 focuses on Chinese knowledge question answering and literary creation, with strong reasoning capabilities and multi-modal functions.

【AiBase Summary:】
💡 Wenxin 4.5 is Baidu's first native multi-modal large model, surpassing GPT-4.5 in performance, with API call prices only 1% of the latter's.
🧠 Wenxin large model X1 focuses on Chinese knowledge question answering and logical reasoning, possessing long reasoning chains and multi-modal capabilities, able to understand and generate images.
💰 The input and output prices of Wenxin 4.5 and X1 are highly competitive, underscoring Baidu's strong positioning in the large model field.

3. Xiaomi's Large Model Team Tops the Audio Reasoning MMAU Leaderboard, Inspired by DeepSeek-R1

Xiaomi's large model team has made significant progress in the field of audio reasoning, using reinforcement learning algorithms to successfully increase the model's accuracy to 64.5%, ranking among the top in the internationally authoritative MMAU evaluation leaderboard. The team's research shows that the real-time feedback mechanism of reinforcement learning is more effective in model training, and they have open-sourced the relevant technology, promoting further research in academia and industry.

【AiBase Summary:】
🔍 Xiaomi's large model team achieved a breakthrough in audio reasoning using reinforcement learning algorithms, achieving 64.5% accuracy.
📈 The MMAU evaluation set is an important standard for audio reasoning capabilities; the current human expert accuracy is 82.23%.
💡 Research results show that the real-time feedback mechanism of reinforcement learning is more effective for model training; further in-depth research is still needed.
Details link: https://github.com/xiaomi-research/r1-aqa

4. DingTalk Launches AI Customer Service Assistant, Automatically Integrates with Company Websites and Official Accounts

On March 17, 2025, DingTalk launched its AI customer service assistant, aimed at improving the efficiency of enterprise customer service. The assistant can automatically integrate with company websites and official accounts, supports multi-turn conversations, accurately understands user needs, and provides professional responses. Since its launch, over 700 companies have integrated it, providing 24/7 online service with fast response times and multi-platform deployment, greatly facilitating communication between companies and users.

【AiBase Summary:】
💡 The AI customer service assistant automatically integrates with websites and official accounts, enhancing enterprise service capabilities.
🛠️ With only three steps of configuration, companies can quickly launch the AI assistant, simplifying knowledge system construction.
🌐 Supports multi-platform deployment, allowing companies to provide services to users through multiple channels.

5. Image Effect Conversion Technology LBM: Remove Unwanted People with One Click, and Adjust Lighting

LBM (Latent Bridge Matching) is an image processing tool developed by the gojasper team that efficiently converts images from one effect to another. It offers powerful object removal, letting users easily remove unwanted elements from photos, and can flexibly adjust lighting to create the desired atmosphere. LBM's key idea is to operate in latent space, which makes image editing simpler and more efficient for both photography enthusiasts and professionals.

【AiBase Summary:】
🖌️ LBM has powerful object removal capabilities; users can remove interfering elements from photos with a single click, simplifying the image editing process.
☀️ The tool supports lighting adjustment; users can create sunny effects in photos taken on cloudy days, enhancing the visual appeal of the photos.
🔧 LBM excels in various image conversion tasks such as normal and depth estimation, demonstrating its broad application potential and scalability.
Details link: https://top.aibase.com/tool/lbm

6. Anthropic to Release Harmony Feature: Seamlessly Integrate AI Assistant with Local Files

Anthropic is developing a new feature called Harmony, aimed at integrating local file directories into Claude's work environment. This innovation will allow users to interact more smoothly with files, with the AI assistant directly reading, indexing, and analyzing the contents of the directory. Harmony not only supports file analysis and modification but also provides keyword-based search functionality, showcasing the powerful potential of AI coding assistants.

【AiBase Summary:】
✅ The Harmony feature will allow users to seamlessly access local files, enhancing AI interaction capabilities.
🔍 Claude successfully identified multiple code security vulnerabilities in testing, demonstrating its powerful analytical capabilities.
🧭 Anthropic is also developing the Compass feature, which may support in-depth research and information integration.
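
To make the idea of indexing a local directory and searching it by keyword more concrete, here is a minimal sketch in Python. It is not Anthropic's implementation, and nothing in it comes from Harmony itself: the function names `build_index` and `search` are hypothetical, and the sketch simply assumes that "indexing" means mapping each word to the files that contain it.

```python
import os
import re
from collections import defaultdict


def build_index(root):
    """Map each lowercase word to the set of file paths that contain it.

    Hypothetical helper for illustration only; not part of any Anthropic API.
    """
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    text = handle.read()
            except (OSError, UnicodeDecodeError):
                continue  # skip unreadable or non-text files
            for word in re.findall(r"\w+", text.lower()):
                index[word].add(path)
    return index


def search(index, query):
    """Return the sorted paths of files that contain every keyword in the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)
```

Calling `search(build_index("some/project"), "password token")` would return only the files that mention both words. A real assistant would likely add ranking and incremental updates on top of a structure like this.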

7. Open-Source Image Upscaling Model Thera: Enhances Image Clarity, Making Blur a Thing of the Past

Thera is an open-source super-resolution model developed by ETH Zurich and the University of Zurich, capable of enhancing image clarity at any magnification. It not only restores life to blurry photos but also, through a built-in physical observation model, reduces image distortion and presents more natural details.

【AiBase Summary:】
✨ Thera supports super-resolution scaling at any scale; users can customize the magnification factor to flexibly meet various needs.
🔍 Built-in physical observation model simulates the real image formation process, reducing distortion and presenting more realistic details.
🌍 As an open-source project, Thera is provided under the Apache-2.0 license to promote technology sharing and development, providing pre-trained models for easy user access.
Details link: https://top.aibase.com/tool/thera

8. Google Gemini 2.0 Flash's Image Watermark Removal Feature Raises Copyright Concerns

Google's newly launched Gemini 2.0 Flash model has sparked controversy over its ability to remove image watermarks, especially from content belonging to well-known image libraries such as Getty Images. While the model excels at image generation and editing, its lack of usage restrictions raises copyright concerns. However powerful the feature may be, removing watermarks without consent may still be considered illegal under US copyright law.

【AiBase Summary:】
🚫 Gemini 2.0 Flash can remove image watermarks, a powerful feature that has sparked copyright controversy.
💬 Other AI models like Claude 3.7 Sonnet and GPT-4o refuse to remove watermarks, considering it unethical and potentially illegal.
⚖️ Under US copyright law, removing watermarks without the original owner's consent is usually considered illegal; Google has not yet responded to questions on the matter.

9. Cohere Releases AI Model Command A: Efficient Operation with Two GPUs, Reducing Enterprise Deployment Costs by 50%

Cohere's Command A model breaks down the traditional barriers of high-performance AI with its low hardware requirements of only two GPUs and cost savings of up to 50%. Its 111 billion parameter design combined with an optimized Transformer architecture allows enterprises to enjoy extra-long context windows and multilingual support when handling complex tasks.

【AiBase Summary:】
💻 The Command A model only requires two GPUs for efficient operation, significantly reducing the hardware requirements for enterprises.
🌍 Supports up to 23 languages and regional dialects, helping enterprises expand into global markets.
💰 Private deployment costs are reduced by up to 50%, bringing significant financial advantages to enterprises.
Details link: https://huggingface.co/CohereForAI/c4ai-command-a-03-2025
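
As a rough back-of-the-envelope check on the two-GPU claim, the sketch below estimates how much memory the weights of a 111 billion parameter model occupy at different numeric precisions. Only the parameter count and the GPU count come from the article. The 80 GB per-GPU figure is an assumption (the article does not name the hardware), the helper names are hypothetical, and the estimate covers weights only, ignoring activations and the KV cache.

```python
PARAMS = 111e9          # parameter count stated in the article
GPU_COUNT = 2           # GPU count stated in the article
GPU_MEMORY_GB = 80      # ASSUMPTION: per-GPU memory; not given in the article

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params, bytes_per_param):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def fits(params, bytes_per_param, gpu_count, gpu_memory_gb):
    """True if the weights alone fit in the combined GPU memory."""
    return weight_memory_gb(params, bytes_per_param) <= gpu_count * gpu_memory_gb


if __name__ == "__main__":
    for precision, size in BYTES_PER_PARAM.items():
        needed = weight_memory_gb(PARAMS, size)
        ok = fits(PARAMS, size, GPU_COUNT, GPU_MEMORY_GB)
        print(f"{precision}: {needed:.1f} GB of weights, fits on 2 GPUs: {ok}")
```

Under these assumptions, the answer depends heavily on precision and on the memory of the GPUs involved, so the numbers should be read as an illustration of the arithmetic, not as a statement about how Cohere deploys the model.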

10. The First Domestic Agent Development Framework! Cangjie Community Releases Cangjie Magic, Native Support for All Platforms Including HarmonyOS!

Cangjie Magic is an innovative agent development framework based on Huawei's self-developed Cangjie programming language, aimed at reshaping the way agents are developed. The framework provides comprehensive agent lifecycle management through a unique Agent DSL architecture, native support for the MCP communication protocol, and an intelligent scheduling engine.

【AiBase Summary:】
🛠️ The unique Agent DSL architecture implements declarative programming for agent modeling, improving development efficiency.
🌐 Native support for the MCP communication protocol ensures efficient communication and collaboration between agents.
📱 Plans to implement agent calling capabilities for Android and iOS in the third quarter, expanding mobile application scenarios.
Details link: https://gitcode.com/Cangjie-TPC/CangjieMagic

11. OpenAI Executive Predicts: AI Will Surpass Human Programmers by the End of 2025

In a recent podcast, OpenAI's Chief Product Officer, Kevin Weil, stated that artificial intelligence is expected to surpass human programmers by the end of 2025, especially on coding benchmarks. He highlighted the rapid progress of AI coding models and mentioned that advanced models from Anthropic and OpenAI are driving the automation of coding. As reasoning capabilities improve, AI's performance in programming keeps rising, and in the future almost all code may be generated by AI.

【AiBase Summary:】
🌟 AI is expected to surpass human programmers by the end of 2025, becoming a better coder.
💻 Advanced models from Anthropic and OpenAI are driving coding automation; in the future, almost all code may be generated by AI.
🚀 OpenAI's upcoming new models are steadily rising in competitive coding rankings, marking AI's continued progress in the programming field.
