Recently, the LongCat team at Meituan announced the open-sourcing of its latest video generation model, LongCat-Video-Avatar, marking another important breakthrough in virtual human technology. The model performs strongly on long video generation, and its set of core features has drawn widespread attention from developers.

LongCat-Video-Avatar builds on the earlier LongCat-Video model and continues its "one model, multiple tasks" design philosophy, natively supporting audio-text-to-video (AT2V), audio-text-image-to-video (ATI2V), and video continuation. Compared with its predecessor, InfiniteTalk, the model achieves significant improvements in motion realism, video stability, and identity consistency, aiming to give developers a more efficient and practical creative solution.
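
The project's actual interface lives in the GitHub repository linked below; purely as an illustration of how one checkpoint can route between these tasks based on which conditions are supplied, here is a minimal, hypothetical dispatch sketch in Python (every name in it is ours, not the project's):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GenerationRequest:
    text: str                          # prompt, present in every task
    audio: Optional[bytes] = None      # driving speech for AT2V / ATI2V
    image: Optional[bytes] = None      # identity reference for ATI2V
    video: Optional[bytes] = None      # prefix clip for video continuation

def route_task(req: GenerationRequest) -> str:
    """Pick the task mode from which conditioning inputs are present."""
    if req.video is not None:
        return "video-continuation"
    if req.audio is not None and req.image is not None:
        return "ATI2V"   # audio-text-image-to-video
    if req.audio is not None:
        return "AT2V"    # audio-text-to-video
    return "T2V"         # plain text-to-video fallback

# Example: an ATI2V request carries text, audio, and a reference image.
req = GenerationRequest(text="a news anchor speaking", audio=b"...", image=b"...")
print(route_task(req))   # -> "ATI2V"
```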

One of the model's core innovations is a training strategy called Cross-Chunk Latent Stitching, which addresses the visual quality degradation that plagues long video generation. By stitching features together directly in the latent space, LongCat-Video-Avatar eliminates the image-quality loss caused by repeated decoding and significantly improves generation efficiency.
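
The announcement does not publish implementation details, so the following is only a rough sketch of what cross-chunk latent stitching generally looks like: the video is sampled chunk by chunk in latent space, each chunk conditioned on the latent tail of the previous one, and pixels are decoded once at the end. All function names and shapes here are illustrative assumptions.

```python
import torch

def generate_long_video_latents(denoise_chunk, num_chunks: int,
                                chunk_len: int = 16, overlap: int = 4,
                                latent_shape=(4, 32, 32)):
    """Sample a long video chunk by chunk, stitching in latent space.

    Each new chunk is denoised conditioned on the last `overlap` latent
    frames of the previous chunk, and chunks are concatenated as latents.
    """
    chunks = []
    context = None  # latent tail of the previous chunk
    for _ in range(num_chunks):
        new_chunk = denoise_chunk(context, chunk_len, latent_shape)
        if context is None:
            chunks.append(new_chunk)
        else:
            # Keep only the newly generated frames; the overlap frames
            # are carried over verbatim from the previous chunk.
            chunks.append(new_chunk[overlap:])
        context = new_chunk[-overlap:]
    return torch.cat(chunks, dim=0)  # (frames, C, H, W) latent video

def fake_denoiser(context, chunk_len, latent_shape):
    # Stand-in for a diffusion sampler; a real model would denoise
    # conditioned on the stitched latent context rather than copy it.
    frames = torch.randn(chunk_len, *latent_shape)
    if context is not None:
        frames[: context.shape[0]] = context
    return frames

latents = generate_long_video_latents(fake_denoiser, num_chunks=3)
print(latents.shape)  # torch.Size([40, 4, 32, 32]) = 16 + 2 * 12 frames
```

The key point is that overlap frames are carried over in latent form rather than round-tripped through the VAE, which is where repeated-decoding quality loss would otherwise accumulate.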

In addition, to maintain character consistency across long videos, LongCat-Video-Avatar introduces a position-encoded reference-frame injection scheme together with a Reference Skip Attention mechanism. These keep identity semantics stable throughout generation while avoiding common failure modes such as repetitive, stiff motion.
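
Again, the exact mechanism is not spelled out in the announcement; the sketch below is a hypothetical reading of what the two names suggest: reference-frame tokens carry their own position encoding, and only some layers attend to them (the rest "skip" the reference), so identity is reinforced without freezing motion to the reference pose.

```python
import torch
import torch.nn as nn

class RefSkipAttentionBlock(nn.Module):
    """Hypothetical sketch: video tokens attend to reference-frame tokens
    only in a subset of layers; the other layers skip the reference."""
    def __init__(self, dim: int, heads: int, use_reference: bool):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.use_reference = use_reference

    def forward(self, video_tokens, ref_tokens, ref_pos_emb):
        if self.use_reference:
            # Inject the reference frame with its own position encoding
            # so the model can tell "identity reference" apart from the
            # video timeline.
            context = torch.cat([video_tokens, ref_tokens + ref_pos_emb], dim=1)
        else:
            context = video_tokens  # this layer skips the reference
        out, _ = self.attn(video_tokens, context, context)
        return video_tokens + out

# Example: attend to the reference in every other layer.
dim, heads, n_layers = 64, 4, 6
layers = [RefSkipAttentionBlock(dim, heads, use_reference=(i % 2 == 0))
          for i in range(n_layers)]
x = torch.randn(1, 128, dim)      # video latent tokens
ref = torch.randn(1, 16, dim)     # reference-frame tokens
ref_pos = torch.randn(1, 16, dim) # position embedding (random stand-in)
for layer in layers:
    x = layer(x, ref, ref_pos)
print(x.shape)  # torch.Size([1, 128, 64])
```

In a real model the skip pattern, token counts, and position embeddings would come from the trained architecture; here they are arbitrary stand-ins.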

Evaluations on public benchmarks including HDTF, CelebV-HQ, EMTD, and EvalTalker show that LongCat-Video-Avatar reaches SOTA levels on multiple key metrics, excelling in particular at lip-synchronization accuracy and consistency. In large-scale human evaluations, the model also received positive feedback on naturalness and realism, demonstrating strong application potential.

The LongCat team stated that LongCat-Video-Avatar is the next iteration of its digital human generation technology, built to solve practical problems developers face in long video generation. The team remains committed to open source and hopes to keep optimizing and iterating on the technology through community participation and feedback.

The release of LongCat-Video-Avatar not only broadens the possibilities for virtual human applications but also opens new paths for digital content creators. Developers can obtain the model from GitHub and Hugging Face and begin exploring this "many faces" digital world.

Project Address:

GitHub: https://github.com/meituan-longcat/LongCat-Video
Hugging Face: https://huggingface.co/meituan-longcat/LongCat-Video-Avatar
Project page: https://meigen-ai.github.io/LongCat-Video-Avatar/