AI Daily: New version of GPT-4o launched; Wall-facing AI open-source mobile version 'GPT-4V'; Huawei unveils new 3D digital human framework EmoTalk3D; Alibaba launches Olympic Moments poster workflow

Welcome to the AI Daily section! Here is your daily guide to exploring the world of artificial intelligence. Every day, we bring you the hottest topics in the AI field, focusing on developers, helping you understand technological trends and discover innovative AI product applications.

Discover fresh AI products click here: https://top.aibase.com/

1. Developers rejoice! GPT-4o new version launched, API faster and cheaper

OpenAI recently introduced a new structured output feature designed to ensure the model's output strictly adheres to the JSON schema provided by developers, enhancing the reliability and matching accuracy of the output. This feature provides a crucial foundation for developers to build reliable applications, streamlining the development process and helping developers more easily create outstanding applications.

AiBase Highlights:

🌟 The structured output feature makes model outputs more reliable, following the JSON schema provided by developers.

🔍 The new model gpt-4o-2024-08-06 achieved a perfect 100% score in evaluations with complex JSON schemas.

🔧 Python and Node SDKs have been updated to support structured outputs, simplifying the developer workflow.

Details link: https://openai.com/index/introducing-structured-outputs-in-the-api/

2. Mianbi Intelligence Open-sources MiniCPM-V2.6, a "GPT-4V" that can run on mobile phones

MiniCPM-V2.6 is an edge-side multimodal AI model with only 8B parameters but has achieved SOTA results in single-image, multi-image, and video understanding under 20B, fully benchmarking against GPT-4V. The model has comprehensively surpassed core capabilities such as single-image, multi-image, and video understanding at the edge, with extremely high pixel density and operational efficiency, supporting multiple languages and reasoning frameworks.

AiBase Highlights:

🚀 MiniCPM-V2.6 has achieved SOTA results in single-image, multi-image, and video understanding under 20B, fully benchmarking against GPT-4V.

💡 The model has extremely high pixel density and operational efficiency, achieving high efficiency on edge devices.

🌐 MiniCPM-V2.6 supports multiple languages and reasoning frameworks, enabling smooth expansion from single-image to multi-image and video through OCR capabilities.

Details link: https://github.com/OpenBMB/MiniCPM-V HuggingFace: https://huggingface.co/openbmb/MiniCPM-V-2_6llama.cpp, ollama, vllm

3. Huawei and Fudan University jointly create a new 3D digital human framework, EmoTalk3D: lifelike expressions of joy, anger, sorrow, and happiness

The research team from Nanjing University, Fudan University, and Huawei Noah's Ark Lab has created the EmoTalk3D framework, addressing the challenges of multi-view consistency and emotional expressiveness. They proposed a new method for synthesizing controllable emotional digital humans, constructing a mapping framework from speech to geometry and appearance, and establishing the EmoTalk3D dataset.

AiBase Highlights:

💥 Proposed a new method for synthesizing controllable emotional digital humans.

🎯 Constructed a "from speech to geometry and appearance" mapping framework.

👀 Established the EmoTalk3D dataset and is preparing to open it.

Details link: https://nju-3dv.github.io/projects/EmoTalk3D/

4. Alibaba Cloud PAI Artlab adds Olympic Highlights Poster Workflow

Alibaba Cloud PAI Artlab's ComfyUI has added an Olympic Highlights Poster workflow, allowing users to generate personalized Olympic-themed posters in just three steps. Users need to register on the Alibaba Cloud official website and complete real-name authentication, then visit the PAI ArtLab platform, claim free resources, and unlock more poster designs through ComfyUI's Olympic workflow.

AiBase Highlights:

🌟 Users can generate personalized Olympic-themed posters in just three steps.

🚀 Need to upload image data, load and fine-tune the AI model, adjust the generation content's Prompt, save the workflow, and generate the JSON file.

💡 Other users can quickly generate posters through the generated JSON file, enabling sharing and exchange.

Product entry: https://x.sm.cn/5hd9PfM

Details here: https://www.aibase.com/zh/news/10857

5. Tencent Yuanbao AI Assistant launches long-form reading, supports up to nearly 500,000 characters of input

Tencent Yuanbao AI Assistant has introduced a long-form reading feature, allowing users to enter deep reading mode after uploading professional content, providing an overview of core content, modular analysis, and summary charts, helping users quickly understand key information. Utilizing Tencent's混元大模型 processing capabilities, it supports up to nearly 500,000 characters of input, generating content rich in images and text. Users can evaluate paper quality, view professional charts, and review the reading content offline. Tencent's混元大模型 has been fully open-sourced, demonstrating outstanding multimodal understanding capabilities.

AiBase Highlights:

📚 The long-form reading feature provides a deep reading mode, an overview of core content, modular analysis, and summary charts.

🔍 Utilizing Tencent's混元大模型 processing capabilities, it supports up to nearly 500,000 characters of input, generating content rich in images and text.

💡 Users can evaluate paper quality, view professional charts, and review the reading content offline.

6. The Dark Side of the Moon Kimi Open Platform: Context Cache Storage Fees Reduced by 50%

The Kimi Open Platform has announced a 50% reduction in context cache storage fees, providing users with more economical services. Context caching is an efficient data management technology that can improve system efficiency and save time resources.

AiBase Highlights:

🔑 Context cache storage fees reduced by 50%, from 10 yuan/1M tokens/min to 5 yuan/1M tokens/min.

⏳ Context caching is an efficient data management technology that can pre-store large amounts of data that may be frequently requested, improving system efficiency.

💡 Context caching is especially suitable for scenarios with frequent requests and repeated references to large initial contexts, reducing long-text model costs and improving efficiency.

7. Figure Inc. releases ultra-powerful physical ChatGPT robot Figure02

Figure Inc.'s latest Figure02 robot marks a significant breakthrough in AI technology, signaling a new era in human-computer interaction. The robot has undergone comprehensive innovation in hardware and software, featuring flexible hand operations, powerful dialogue visual capabilities, and 3 times the computational reasoning power.

AiBase Highlights:

🤖 The Figure02 robot is a significant breakthrough in AI technology, signaling a new era in human-computer interaction.

🔊 Voice dialogue function, advanced vision system, revolutionary hand design are its core features.

💡 Figure02 integrates OpenAI's large model, combining voice commands and visual information for deep reasoning.

8. AI-designed wearable from Yiwu manufacturing: AI-designed wearable takes Paris Olympics by storm

This article introduces a story of a wearable designed by AI and produced in Yiwu that sparked discussions on the streets of Paris, showcasing the new vitality injected into Yiwu's manufacturing industry. The wearable products designed through AI technology have caused a sensation in Paris, proving Yiwu's innovative strength and market acumen.

AiBase Highlights:

🔥 AI-designed wearable takes Paris by storm, becoming a new favorite in the fashion world, injecting vitality into Yiwu manufacturing.

💡 LumiNail is a foolproof AI wearable design product, simple yet powerful, improving design efficiency and injecting creative vitality.

🚀 Yiwu merchants have begun to try AI-assisted production, with over 10,000 merchants using AI technology to optimize operations, opening up new development directions.

9. Shanghai Artificial Intelligence Laboratory releases new version of the Sheng·Pu language series model InternLM2.5

The Shanghai Artificial Intelligence Laboratory released a new version of the Sheng·Pu language series model, InternLM2.5, at the WAIC Scientific Frontier Main Forum on July 4, 2024. This version has seen comprehensive enhancements in reasoning capabilities in complex scenarios, supports ultra-long contexts, and can autonomously conduct internet searches to integrate information. The model parameter versions include 1.8B, 7B, and 20B, catering to different application scenarios and developer needs.

AiBase Highlights:

⚙️ InternLM2.5 released three parameter versions of the model, including 1.8B, 7B, and 20B, meeting different application scenario needs.

🔍 InternLM2.5 has iterated on multiple data synthesis technologies, significantly enhancing the model's reasoning capabilities, especially achieving an accuracy rate of 64.7% on the MATH evaluation set.

🛠️ InternLM2.5 has achieved seamless integration with downstream reasoning and fine-tuning frameworks, including the XTuner fine-tuning framework, LMDeploy reasoning framework, and other community frameworks.

Details link: https://internlm.intern-ai.org.cn

10. Israeli company launches open-source speech recognition model Whisper Medusa with 50% faster speed

aiOla's Whisper Medusa open-source speech recognition model has made a significant breakthrough in processing speed, 50% faster than OpenAI's Whisper model, attracting widespread attention in the industry. This innovation will bring profound impacts to the development of speech recognition technology, opening up new possibilities for the application of artificial intelligence in the field of speech recognition.

AiBase Highlights:

⚙️ The core innovation of Whisper Medusa lies in the introduction of a multi-head attention mechanism, allowing the model to predict ten tokens at a time, significantly improving speech prediction speed and generation runtime.

🔍 Whisper Medusa has not sacrificed performance while increasing speed, with the main system built on Whisper, ensuring the model's accuracy and stability.

🎓 aiOla uses weakly supervised machine learning methods to train Whisper Medusa, further improving the model's learning efficiency and accuracy.

Details link: https://github.com/aiola-lab/whisper-medusa

11. New traffic code? AI video crash goes viral by accident: a bizarre scene attracts 20 million views

AI-generated content has permeated our lives, but recently, a video of an AI crash has become a hot topic on the internet, attracting nearly 20 million views, revealing people's complex attitudes towards AI technology. This video showcases the失控的一面 of AI image generation technology, eliciting strong reactions from netizens. The public's attitude towards AI technology is undergoing subtle changes, requiring a sense of humor and an open mindset.

AiBase Highlights:

AI Daily: New version of GPT-4o launched; Wall-facing AI open-source mobile version 'GPT-4V'; Huawei unveils new 3D digital human framework EmoTalk3D; Alibaba launches Olympic Moments poster workflow

Related Recommendations

New Method for Panoramic Image Generation PanoFree: Generate Multi-View Images Without Fine-Tuning

AI Assessments Can Save Teachers Time While Improving Student Performance

OpenAI ChatGPT Application Revenue Hits New High with Net Income of 28 Million USD in July

Reddit User Experiment: GTP-4o Defeats Gemini 1.5 Pro in Chess

OpenAI to Participate in Hardware Developer Opal's New $60 Million Financing Round