Zhipu AI Launches Next-Generation Video Generation Model CogVideoX, Available to Try for Free via 'Qingying'

ZhipuAI has launched the next-generation video generation model CogVideoX, marking another significant advancement in the company's development of multimodal technology.


The core technical features of CogVideoX include:

3D Variational Autoencoder (3D VAE): ZhipuAI's proprietary structure compresses raw video data to 2% of its original size, reducing training cost and difficulty. Combined with the 3D RoPE positional encoding module, it helps the model capture inter-frame relationships along the temporal dimension and establish long-range dependencies in videos.

End-to-end video understanding model: Enhances the model's understanding of text and adherence to instructions, ensuring that the generated videos more closely meet user needs and can handle extremely long and complex prompt instructions.

3D transformer architecture integrating text, time, and space: A newly designed Expert Block aligns the text and video modalities, and a full attention mechanism optimizes the interaction between them.
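The 2% compression figure quoted for the 3D VAE can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the 4x temporal and 8x8 spatial downsampling factors and the 16 latent channels are assumptions that do not appear in this article, and the function name `latent_fraction` is hypothetical. It counts tensor elements and ignores numeric precision and any special handling of the first frame.

```python
def latent_fraction(frames, height, width,
                    rgb_channels=3, latent_channels=16,
                    temporal_factor=4, spatial_factor=8):
    """Fraction of the raw element count that remains after encoding.

    All default factors are assumptions for illustration, not values
    confirmed by the article.
    """
    raw = rgb_channels * frames * height * width
    latent = (latent_channels
              * (frames // temporal_factor)
              * (height // spatial_factor)
              * (width // spatial_factor))
    return latent / raw


# Example clip: 48 frames at 480x720 (chosen so every dimension divides evenly).
fraction = latent_fraction(frames=48, height=480, width=720)
print(f"{fraction:.2%}")  # prints 2.08%
```

Under these assumed factors the latent holds 1/48 of the original elements, which is consistent with the roughly 2% figure above.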

The CogVideoX model is now available on ZhipuAI's PC, mobile app, and mini-program platforms. Users can experience AI-driven text-to-video and image-to-video services for free via the "Qingying" (Ying) feature. Qingying's main features include rapid generation, efficient instruction following, content coherence, and flexible scene scheduling.

Additionally, "Qingying" has been deployed on the Zhipu Big Model Open Platform, bigmodel.cn, allowing businesses and developers to use its features through API calls. ZhipuAI says it has validated the effectiveness of the Scaling Law in video generation and will continue to scale up data and model size while researching new model architectures that compress video information more efficiently and integrate text and video content more thoroughly.

Experience URL: https://top.aibase.com/tool/qingying-ai-shipinshengchengfuwu

Zhipu AI Open-Sources Its Latest Video Model CogVideoX v1.5; "New Qingying" Offers 10-Second 4K Video

Today, the Zhipu team released its latest video generation model, CogVideoX v1.5, and made it open source. This version is another significant step forward for the CogVideoX series, which the team first launched in August. The update greatly enhances video generation capabilities, adding support for 5-second and 10-second videos, 768P resolution, and 16 frames per second. In addition, the I2V (Image-to-Video) model supports arbitrary aspect ratios, and the model's understanding of complex semantics has been further improved.

CogVideoX v1.5 Open Source AI Video Generation Model Supports 5/10 Second Video Generation

Beijing Zhipu Huazhang Technology Co., Ltd. announced the latest version of its CogVideoX series, CogVideoX v1.5, which is now open source. Since its release in early August, the series has become a leader in the video generation field thanks to its technology and developer-friendly features. CogVideoX v1.5 is a significant upgrade that enhances video generation capabilities and now supports 5- and 10-second, 768P, 16 fps video generation.

Zhipu Releases Next-Generation Foundation Model GLM-4-Plus and Upgrades Video Call Feature of Qingyan APP

Beijing Zhipu Huazhang Technology Co., Ltd. announced a series of significant technological updates on August 29, 2024, including a new generation of foundation models and new application services. At the KDD 2024 conference, Zhipu introduced new foundation models including the language model GLM-4-Plus, the text-to-image model CogView-3-Plus, the image/video understanding model GLM-4V-Plus, and the video generation model CogVideoX. According to the company, these models have reached internationally leading levels in their respective fields.

Higher Quality, Better Visual Effects! Zhipu Open Source CogVideoX-5B Video Generation Model

The Chinese open-source video generation model CogVideoX-5B has been officially released in the ModelScope community, significantly improving the quality and visual effects of generated video. Built on a large-scale DiT model, it uses a 3D causal variational autoencoder and an expert Transformer, and achieves joint spatio-temporal modeling through 3D-RoPE positional encoding and a 3D full attention mechanism. Progressive training allows the model to generate long, coherent, high-quality videos with distinct motion.
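To make the 3D-RoPE idea concrete, here is a minimal pure-Python sketch of one common way to extend rotary position embeddings to three axes: the channels of a token vector are split into three groups, and ordinary 1D RoPE is applied to each group using the token's time, height, and width coordinates. This is not taken from the CogVideoX code; the function names, the 2:3:3 channel split, and the base of 10000 are illustrative assumptions.

```python
import math


def rope_1d(vec, pos, base=10000.0):
    """Rotate consecutive channel pairs of `vec` by angles proportional to `pos`."""
    assert len(vec) % 2 == 0, "RoPE needs an even number of channels"
    half = len(vec) // 2
    out = []
    for i in range(half):
        theta = pos * base ** (-i / half)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def rope_3d(vec, t, h, w, split=(2, 3, 3)):
    """Apply 1D RoPE per axis to three channel groups (time, height, width).

    The `split` ratio is an assumption for illustration.
    """
    assert len(vec) % (2 * sum(split)) == 0, "channels must divide evenly"
    unit = len(vec) // sum(split)
    out, start = [], 0
    for share, pos in zip(split, (t, h, w)):
        end = start + share * unit
        out += rope_1d(vec[start:end], pos)
        start = end
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 0.1, 0.9] * 2  # 16-channel query
k = [1.0, 0.3, -0.6, 0.8, -1.2, 0.4, 0.7, -0.2] * 2   # 16-channel key

# Two query/key pairs with the same (t, h, w) offset of (2, 3, 3).
score_a = dot(rope_3d(q, 3, 5, 7), rope_3d(k, 1, 2, 4))
score_b = dot(rope_3d(q, 12, 13, 23), rope_3d(k, 10, 10, 20))
print(abs(score_a - score_b) < 1e-9)
```

The two attention scores agree because the rotation makes the query-key dot product depend only on the relative offset along each axis, which is the property that lets attention reason about how far apart two patches are in time and space.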
