Researchers recently unveiled LPM1.0, a research project that generates videos of a person speaking, listening, or singing in real time from a single reference image. The core breakthrough of LPM1.0 is its multimodal processing: it synchronizes and integrates text, audio, and image inputs to produce dynamic scenes with precise lip synchronization, nuanced facial expressions, and natural emotional transitions. The model can be connected directly to mainstream voice AI platforms such as ChatGPT and Doubao, turning traditional voice conversations into real-time interactive experiences with visual feedback.
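The announcement does not describe the integration interface, but the kind of real-time loop it implies can be sketched roughly: streamed audio from a voice assistant is paired with one fixed reference image and turned into short video segments as it arrives. Everything below (function names, data shapes, the placeholder generator) is illustrative and not the project's actual API.

```python
# Hypothetical sketch of a real-time driving loop: audio chunks streamed from a
# voice assistant are paired with a single reference image and rendered into
# short video segments. All names here are illustrative placeholders.

import time
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class VideoSegment:
    frames: List[bytes]   # encoded frames produced for one audio chunk
    timestamp: float      # start time, used to keep audio and lips aligned


def streamed_assistant_audio() -> Iterator[bytes]:
    """Stand-in for streamed TTS audio from a platform such as ChatGPT or Doubao."""
    for i in range(3):
        yield f"audio-chunk-{i}".encode()


def generate_segment(reference_image: bytes, audio_chunk: bytes) -> VideoSegment:
    """Placeholder for the portrait model: one audio chunk in, a few frames out."""
    frames = [audio_chunk + f"-frame-{k}".encode() for k in range(4)]
    return VideoSegment(frames=frames, timestamp=time.time())


def realtime_loop(reference_image: bytes) -> None:
    # Each incoming chunk is rendered immediately so the avatar keeps pace
    # with the ongoing conversation instead of waiting for the full reply.
    for chunk in streamed_assistant_audio():
        segment = generate_segment(reference_image, chunk)
        print(f"segment at {segment.timestamp:.2f} with {len(segment.frames)} frames")


if __name__ == "__main__":
    realtime_loop(reference_image=b"portrait.png")
```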
On the technical level, LPM1.0 introduces "multi-granularity identity conditioning," which extracts detail from reference material covering multiple angles and expressions, so the model does not have to invent complex features such as teeth, wrinkles, or side profiles on its own. This significantly improves cross-style capability: photorealistic faces, animated characters, and 3D game characters can all be driven immediately, without additional training. The model also supports streaming generation, remaining stable for videos up to 45 minutes long.
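The announcement does not detail how the multi-granularity conditioning works internally. A minimal sketch of the general idea, assuming separate identity encoders per reference crop whose outputs are stacked into one conditioning tensor, might look like the following; all function names, crop categories, and shapes are assumptions, not the released implementation.

```python
# Illustrative sketch of "multi-granularity identity conditioning": identity
# features are read off several reference crops (frontal, profile, mouth
# region) and handed to the generator as conditioning, so details like teeth
# or side contours don't have to be hallucinated. Names and shapes are assumed.

from typing import Dict

import numpy as np


def encode_crop(crop: np.ndarray, dim: int = 64) -> np.ndarray:
    """Placeholder identity encoder: any image crop -> fixed-size feature vector."""
    rng = np.random.default_rng(abs(hash(crop.tobytes())) % (2**32))
    return rng.standard_normal(dim).astype(np.float32)


def build_identity_condition(reference_crops: Dict[str, np.ndarray]) -> np.ndarray:
    """Stack per-crop features, coarse to fine, into one conditioning tensor."""
    order = ["frontal", "profile", "mouth"]  # coarse -> fine granularity
    feats = [encode_crop(reference_crops[name]) for name in order if name in reference_crops]
    return np.stack(feats, axis=0)           # shape: (num_granularities, dim)


if __name__ == "__main__":
    crops = {
        "frontal": np.zeros((256, 256, 3), dtype=np.uint8),
        "profile": np.ones((256, 256, 3), dtype=np.uint8),
        "mouth":   np.full((64, 128, 3), 128, dtype=np.uint8),
    }
    cond = build_identity_condition(crops)
    print(cond.shape)  # (3, 64): passed to the generator alongside audio features
```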
In terms of interaction logic, LPM1.0 accurately distinguishes three dialogue states: when listening, it generates reactive expressions such as nodding or shifting gaze; when speaking, it drives lip and body movements from the audio; when idle, it produces natural, relaxed behaviors according to text instructions. Project manager Zeng Ailing pointed out that LPM1.0 is suited not only to real-time conversation but also to offline audio-driven video generation, offering an additional production path for podcasts and film and television work.
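The three-state logic described above can be illustrated with a toy dispatcher; the state names follow the article, but the mapping from each state to concrete motion is an assumption rather than the model's documented behavior.

```python
# Toy sketch of the three dialogue states: listening, speaking, idle.
# How each state maps to motion below is an assumption for illustration only.

from enum import Enum, auto
from typing import Optional


class DialogueState(Enum):
    LISTENING = auto()
    SPEAKING = auto()
    IDLE = auto()


def plan_motion(state: DialogueState,
                audio_chunk: Optional[bytes] = None,
                instruction: Optional[str] = None) -> str:
    """Return a (toy) motion plan for the current dialogue state."""
    if state is DialogueState.LISTENING:
        return "reactive cues: nodding, gaze shifting toward the speaker"
    if state is DialogueState.SPEAKING:
        assert audio_chunk is not None, "speaking is driven by the incoming audio"
        return f"lip and body motion driven by {len(audio_chunk)} bytes of audio"
    return f"idle behavior from text instruction: {instruction or 'neutral posture'}"


if __name__ == "__main__":
    print(plan_motion(DialogueState.LISTENING))
    print(plan_motion(DialogueState.SPEAKING, audio_chunk=b"\x00" * 3200))
    print(plan_motion(DialogueState.IDLE, instruction="glance around, sip coffee"))
```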
Despite its strong application potential, the development team emphasized that LPM1.0 remains a research project, with no plans to release code or weights at this stage. The researchers acknowledged that a qualitative gap still exists between the generated videos and real footage, and that the deepfake risks inherent in the technology cannot be ignored. The significance of the work lies in clarifying where AI systems are headed: from single-mode, purely logical interaction toward a multidimensional form of interaction featuring emotional response, eye contact, and visual embodiment.
