The TSAIL Lab at Tsinghua University, in collaboration with Shengshu Technology, has released TurboDiffusion, a new open-source video generation acceleration framework. The framework speeds up end-to-end diffusion-based video generation by roughly 100 to 200 times while maintaining video generation quality.


AIbase learned that, to maximize generation efficiency, the framework integrates SageAttention and SLA (Sparse Linear Attention). These techniques significantly reduce the computational cost of processing high-resolution video content. In addition, the development team adopted rCM, a timestep-distillation technique that sharply reduces the number of sampling steps in the diffusion process, letting video generation reach very low latency while preserving visual consistency.
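To see why a sparse linear attention mechanism matters for video, consider the token counts involved: a high-resolution clip easily produces thousands of spatiotemporal tokens, and softmax attention scales quadratically in that length. The sketch below is a minimal, generic linear-attention example (following the well-known kernel-feature-map formulation, not TurboDiffusion's actual SLA kernel), showing how keys and values can be aggregated once so the quadratic attention matrix is never materialized:

```python
import torch

def linear_attention(q, k, v, eps=1e-6):
    """Minimal linear-attention sketch (illustrative only; not the SLA
    implementation used by TurboDiffusion). Cost is O(n * d^2) rather
    than the O(n^2 * d) of softmax attention, which is what makes long
    video token sequences tractable.

    q, k, v: (batch, heads, seq_len, head_dim)
    """
    # A common positive feature map for linear attention.
    phi = lambda x: torch.nn.functional.elu(x) + 1
    q, k = phi(q), phi(k)

    # Aggregate key/value outer products once: (batch, heads, d, d).
    kv = torch.einsum("bhnd,bhne->bhde", k, v)
    # Normalizer: sum of transformed keys per head.
    z = k.sum(dim=2)  # (batch, heads, head_dim)

    out = torch.einsum("bhnd,bhde->bhne", q, kv)
    denom = torch.einsum("bhnd,bhd->bhn", q, z).unsqueeze(-1) + eps
    return out / denom

# Example: 8192 video tokens, yet no 8192 x 8192 attention matrix is built.
q = torch.randn(1, 8, 8192, 64)
print(linear_attention(q, q, q).shape)  # torch.Size([1, 8, 8192, 64])
```

Timestep distillation attacks the other cost axis: instead of making each denoising step cheaper, it trains a student model that needs far fewer steps (for example, a handful instead of dozens), which compounds with the attention savings above.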

According to benchmark data published on GitHub, TurboDiffusion's acceleration is striking. On a single RTX 5090 GPU, generating a 5-second video previously took 184 seconds; with the framework, it completes in just 1.9 seconds. For larger models the gain is even bigger: a 720P video generation task that originally took about 1.2 hours now finishes in 38 seconds, far ahead of existing acceleration solutions on the market.
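A quick back-of-the-envelope check, using only the figures quoted above, shows these timings are on the order of the framework's stated 100-200x speedup:

```python
# Speedups implied by the published timings.
small = 184 / 1.9          # 5-second clip on one RTX 5090
large = (1.2 * 3600) / 38  # 720P task: 1.2 hours -> 38 seconds
print(f"{small:.0f}x, {large:.0f}x")  # ~97x and ~114x
```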


Currently, TurboDiffusion is open source, with multiple model weights available for download. The team provides both quantized and non-quantized optimization schemes for consumer-grade GPUs (such as the RTX 4090/5090) and data-center GPUs (such as the H100). This means both individual creators and enterprise users can substantially improve AI video production efficiency with the tool.
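Quantization is what makes the consumer-GPU path plausible: halving the bytes per weight roughly halves the VRAM the weights occupy. The rough illustration below uses a hypothetical 14B parameter count (an assumption for the example, not a TurboDiffusion specification):

```python
def weight_vram_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate VRAM for model weights alone (excludes activations,
    caches, and framework overhead)."""
    return num_params * bytes_per_param / 1e9

params = 14e9  # hypothetical model size, for illustration only
print(f"BF16: {weight_vram_gb(params, 2):.0f} GB")  # ~28 GB: tight on consumer cards
print(f"INT8: {weight_vram_gb(params, 1):.0f} GB")  # ~14 GB: headroom on a 24-32 GB RTX 4090/5090
```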

GitHub: https://github.com/thu-ml/TurboDiffusion

Key Points:

  • 🚀 Performance Leap: The open-source framework from Tsinghua University accelerates AI video generation by up to 200 times, producing a 5-second video in just 1.9 seconds on an RTX 5090 GPU.

  • 🛠️ Core Technology: Using SageAttention, sparse linear attention (SLA), and rCM timestep distillation, it sharply reduces compute requirements without compromising image quality.

  • 🌐 Comprehensive Compatibility: The framework ships model weights and quantization schemes for GPUs with different memory capacities, greatly lowering the barrier to high-performance AI video generation.