To address the long-standing problems of "character distortion" and "environmental flickering" in AI video generation, a research team from ByteDance and Nanyang Technological University recently introduced a system called StoryMem. By adding a mechanism loosely modeled on human memory, the system achieves high consistency in long, cross-scene video creation, tackling the visual inconsistency that models such as Sora and Kling often exhibit in multi-shot storytelling.

The core of StoryMem is its "hybrid memory bank" design. The researchers note that generating an entire story in a single pass drives computational cost up sharply, while generating scenes independently loses context. StoryMem therefore selectively stores key frames from previous scenes as references. The selection uses two filters: semantic analysis first picks the visually central frames, then a quality check discards blurry ones. When a new scene is generated, these key frames are fed into the model as conditioning, and Rotary Position Embedding (RoPE) assigns them negative time indices. This cues the model to treat them as "past events," keeping character appearance and background details stable as the story progresses.
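For illustration only, the minimal Python sketch below shows the two ideas described above: dual-filter key-frame selection and negative time indices for memory frames. The function names, the sharpness threshold, and the assumption that semantic relevance scores are already computed are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' implementation) of two StoryMem ideas:
# (1) dual-filter key-frame selection and (2) negative RoPE time indices
# for memory frames. All names and thresholds here are illustrative.
import numpy as np

def select_memory_frames(frames, semantic_scores, top_k=4, sharpness_thresh=100.0):
    """Keep the most semantically relevant frames, then drop blurry ones."""
    # Filter 1: rank frames by a semantic relevance score (assumed to come
    # from some vision-language scoring step, provided here as input).
    order = np.argsort(semantic_scores)[::-1][:top_k]
    selected = []
    for idx in order:
        # Filter 2: quality check -- variance of a simple Laplacian as a
        # sharpness proxy; low variance suggests a blurry frame.
        gray = frames[idx].mean(axis=-1)
        lap = (np.roll(gray, 1, 0) + np.roll(gray, -1, 0)
               + np.roll(gray, 1, 1) + np.roll(gray, -1, 1) - 4 * gray)
        if lap.var() >= sharpness_thresh:
            selected.append(idx)
    return selected

def rope_time_indices(num_memory_frames, num_new_frames):
    """Memory frames get negative time indices so the model treats them
    as 'past events'; the new scene starts at index 0."""
    memory_ids = np.arange(-num_memory_frames, 0)   # e.g. [-4, -3, -2, -1]
    current_ids = np.arange(num_new_frames)         # e.g. [0, 1, ..., T-1]
    return np.concatenate([memory_ids, current_ids])

# Example: 4 stored key frames conditioning a 16-frame new scene.
print(rope_time_indices(4, 16))
```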

Notably, StoryMem's implementation is efficient. It is built as a LoRA adaptation of Alibaba's open-source Wan2.2-I2V model, adding only about 700 million parameters to the roughly 14-billion-parameter base model and significantly lowering the training barrier. On the ST-Bench benchmark, which contains 300 scene descriptions, StoryMem improved cross-scene consistency by 28.7% over the base model and outperformed cutting-edge systems such as HoloCine in aesthetic scores and user preference.
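The LoRA technique itself is standard; the sketch below shows the general idea of adding trainable low-rank matrices to a frozen weight, which is why the extra parameter count stays small relative to the base model. The rank, dimensions, and class name are illustrative and not the values used by StoryMem.

```python
# Generic LoRA sketch: the frozen base weight is left untouched and only two
# small low-rank matrices A and B are trained, so the added parameter count
# is rank * (in + out) instead of in * out. Dimensions are illustrative.
import numpy as np

class LoRALinear:
    def __init__(self, weight, rank=16, alpha=32.0, rng=None):
        rng = rng or np.random.default_rng(0)
        self.weight = weight                  # frozen base weight, shape (out, in)
        out_dim, in_dim = weight.shape
        self.A = rng.normal(0, 0.02, (rank, in_dim))  # trainable
        self.B = np.zeros((out_dim, rank))            # zero init: no change at start
        self.scale = alpha / rank

    def __call__(self, x):
        # y = x W^T + scale * (x A^T) B^T
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

# Example: adapt one 4096x4096 projection with a rank-16 update.
layer = LoRALinear(np.zeros((4096, 4096)), rank=16)
y = layer(np.ones((1, 4096)))
```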

In addition, the system is practical in use: users can upload their own photos as "memory starting points" to generate coherent stories around them, and scene transitions become smoother. Although it still has limitations when handling multiple characters at once or large action transitions, the team has released the model weights on Hugging Face and set up a project page for developers to explore.
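As a hypothetical usage sketch only (the released repository's actual API may differ), the "memory starting point" idea can be pictured as placing the uploaded photo into the memory bank before the first scene is generated, so it receives the earliest negative time index:

```python
# Hypothetical sketch, not the released API: a user photo becomes the oldest
# memory frame, so every generated scene treats it as an event from before
# the story begins (see rope_time_indices in the earlier sketch).
import numpy as np

user_photo = np.zeros((480, 832, 3), dtype=np.uint8)  # stand-in for an uploaded image
memory_frames = [user_photo]                           # later extended with scene key frames

# With 1 memory frame and a 16-frame scene, the photo gets index -1 and the
# new scene spans indices 0..15.
```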

Project page: https://kevin-thu.github.io/StoryMem/

Model weights: https://huggingface.co/Kevin-thu/StoryMem