Traditional AI voice dubbing often hits a bottleneck in demanding scenarios such as film and animation, where it struggles to match intense emotional delivery and precise lip movements. To address this pain point, Tongyi Lab has officially released and open-sourced Fun-CineForge, the first film-grade, multi-scenario multimodal large model for voice dubbing.

Breaking "Audio-Visual Disconnection": Four Strict Dimensions of Collaboration

Unlike traditional models that rely solely on text-to-speech, Fun-CineForge aims to overcome four core challenges in film production:

  • Lip Sync: Achieve a high level of consistency between synthesized speech and the mouth movements in the video.

  • Emotional Expression: Combine facial cues and instruction text to give the voice human-like emotional depth.

  • Voice Consistency: Maintain a stable voice for specific characters in complex multi-character dialogues.

  • Time Alignment: Insert speech at millisecond-precise time points even when the speaker is obscured or off-frame (a small worked example follows this list).
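
To make "millisecond-precise" insertion concrete, here is a generic sketch (not Fun-CineForge code) of what placing a synthesized line at an exact timestamp involves: converting the timestamp to a sample offset in the target track. The 24 kHz sample rate is an assumption for illustration only.

```python
# Generic audio math, not Fun-CineForge code: placing a synthesized line at an
# exact timestamp means converting that timestamp to a sample offset.

import numpy as np

SAMPLE_RATE = 24_000  # assumed output rate, for illustration only

def insert_at(track: np.ndarray, segment: np.ndarray, start_sec: float) -> np.ndarray:
    """Mix `segment` into `track` starting at `start_sec` (e.g. 3.250 s)."""
    start = round(start_sec * SAMPLE_RATE)      # 3.250 s -> sample 78,000
    out = track.copy()
    out[start:start + len(segment)] += segment  # assumes the segment fits in the track
    return out

track = np.zeros(SAMPLE_RATE * 10, dtype=np.float32)                # 10 s of silence
line = (np.random.randn(SAMPLE_RATE * 2) * 0.1).astype(np.float32)  # a 2 s synthesized clip
mixed = insert_at(track, line, start_sec=3.250)
```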

Core Technology: Introducing a "Time Modality" and a High-Quality Dataset

Fun-CineForge's technical breakthrough lies in its integrated "data + model" design:

  1. CineDub High-Quality Dataset: Tongyi Lab has also open-sourced the CineDub automated dataset construction pipeline. The pipeline uses a chain-of-thought error-correction mechanism, reducing the transcription error rate for Chinese and English text to around 1%-2% and lowering the speaker separation error rate to 1.2%.

  2. Four-Modality Fusion Architecture: The model introduces a "time modality" for the first time, jointly modeling it with visual (lip shape and expression), text (dialogue and emotion), and audio (voice reference) inputs. This lets the model stay precisely synchronized even in complex scenes where no face is visible; a toy sketch of this kind of fusion follows below.
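
The following is a minimal, hypothetical sketch of how such a four-stream fusion could look. The dimensions, module names, and layer counts are assumptions for illustration and are not taken from the released architecture.

```python
# A toy four-modality fusion block (assumed design, not the released model):
# time, visual, text, and audio features are projected into a shared space,
# tagged by modality, concatenated, and modeled jointly by a transformer.

import torch
import torch.nn as nn

class FourModalityFusion(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, n_layers: int = 4):
        super().__init__()
        self.proj_time = nn.Linear(2, d_model)        # (start, end) per dialogue segment
        self.proj_visual = nn.Linear(768, d_model)    # lip/expression features per frame
        self.proj_text = nn.Linear(1024, d_model)     # dialogue/emotion text embeddings
        self.proj_audio = nn.Linear(256, d_model)     # reference-voice embeddings
        self.modality_tag = nn.Embedding(4, d_model)  # lets the encoder tell streams apart
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, time_feats, visual_feats, text_feats, audio_feats):
        streams = [
            self.proj_time(time_feats) + self.modality_tag.weight[0],
            self.proj_visual(visual_feats) + self.modality_tag.weight[1],
            self.proj_text(text_feats) + self.modality_tag.weight[2],
            self.proj_audio(audio_feats) + self.modality_tag.weight[3],
        ]
        # Even if the visual stream is empty (face off-screen), the time tokens
        # still anchor where each line of speech must start and stop.
        return self.encoder(torch.cat(streams, dim=1))

model = FourModalityFusion()
out = model(
    torch.rand(1, 3, 2),      # 3 dialogue segments with (start, end) times
    torch.rand(1, 75, 768),   # ~3 s of lip/expression frames at 25 fps
    torch.rand(1, 20, 1024),  # tokenized dialogue with emotion cues
    torch.rand(1, 10, 256),   # reference-voice embedding frames
)
print(out.shape)  # torch.Size([1, 108, 512])
```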

Outstanding Performance: Filling the Gap in Multi-Person Dialogue Dubbing

Experimental results show that Fun-CineForge significantly outperforms baseline models such as DeepDubber-V1 on word and character error rate (WER/CER), lip sync (LSE-C/D), and voice similarity. Notably, it is the first to offer precise support for two-person and multi-person dialogue scenes, and it remains robust on video clips of up to 30 seconds.
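
For reference, WER measures the fraction of word substitutions, insertions, and deletions needed to turn the synthesized transcript into the reference (CER is the same computation over characters). The snippet below is a generic implementation of the metric, not the evaluation script used in the paper.

```python
# Word error rate: minimum edits (substitution/insertion/deletion) to turn the
# hypothesis into the reference, divided by reference length.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance via dynamic programming.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / max(len(ref), 1)

print(wer("we open at dawn", "we opened at dawn"))  # 0.25
```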

  • GitHub: https://github.com/FunAudioLLM/FunCineForge

  • HuggingFace: https://huggingface.co/FunAudioLLM/Fun-CineForge

  • ModelScope: https://www.modelscope.cn/models/FunAudioLLM/Fun-CineForge/
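
A quick way to fetch the released weights is from the Hugging Face repo listed above; the inference entry points are documented in the GitHub repository rather than in this post, so only the download step is shown here.

```python
# Download the Fun-CineForge checkpoint from the Hugging Face repo listed above.
# See the GitHub repository's README for how to run dubbing with it.

from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="FunAudioLLM/Fun-CineForge")
print(f"Checkpoint downloaded to: {local_dir}")
```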