ByteDance Open-Sources Bernini Framework, Unifying Video Generation and Precise Editing

ByteDance's commercialization technology team has open-sourced a new video generation and editing framework called Bernini. The framework centers on an "understand first, then generate" mechanism, which aims to address common industry problems such as unstable images and frame flicker that arise when traditional models fail to interpret complex instructions accurately.

Bernini currently ranks near the top in ByteDance's internal testing. Its inference code and the second-stage model, Bernini-R, have been released, and the full-featured version is expected to be opened in the near future.

Separating Semantics and Rendering

Bernini splits the workflow into two independent stages: "semantic planning" and "visual rendering." The system first uses a multimodal large model as a planner to analyze the input materials and produce a "semantic sketch," which the renderer then converts into stable, continuous video frames.
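The plan-then-render split described above can be illustrated with a minimal sketch. Every name here (`SemanticSketch`, `plan`, `render`, `generate`) is hypothetical and is not taken from the Bernini code; the planner and renderer are stand-in stubs that only show how an intermediate "semantic sketch" might decouple the two stages.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SemanticSketch:
    """Hypothetical intermediate plan: one short description per output frame."""
    frame_descriptions: List[str]


def plan(instruction: str, num_frames: int) -> SemanticSketch:
    """Stand-in for the multimodal planner (assumption, not Bernini's API).

    A real planner would analyze text, images, and video; this stub only
    expands the instruction into one description per frame.
    """
    return SemanticSketch(
        [f"{instruction} (frame {i})" for i in range(num_frames)]
    )


def render(sketch: SemanticSketch) -> List[str]:
    """Stand-in for the renderer: maps each planned description to a 'frame'.

    Frames are plain strings here so the example stays runnable.
    """
    return [f"<frame: {d}>" for d in sketch.frame_descriptions]


def generate(instruction: str, num_frames: int = 4) -> List[str]:
    """Run the two stages in order: semantic planning, then visual rendering."""
    sketch = plan(instruction, num_frames)
    return render(sketch)
```

Because the renderer only sees the sketch, either stage could in principle be swapped or inspected on its own, which is the property the article attributes to the division of labor.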

This clear division of labor makes the framework practical for controllable editing. With simple commands, users can naturally change the weather, season, and visual style of a scene, and can also precisely control camera perspective, focus, and subject actions.

Rich Visual Reference Dimensions

Beyond traditional text control, Bernini also accepts images and videos as visual references, which improves the consistency of the results. In video editing scenarios, it can embed specific materials or posters into target areas without broken boundaries or distorted perspective.

In new video generation scenarios, the model supports generation from a single image or from multi-angle references, and can extend keyframes into continuous shots. Because models are easily confused when multiple visual segments are concatenated, the team introduced a dedicated position encoding mechanism that clearly distinguishes reference materials from output targets.
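The article does not say how this position encoding works. As one illustrative possibility only, the sketch below tags every frame in a concatenated sequence with a role, a segment id, and a frame index, so that reference frames and target frames never share a position. The function name `assign_positions` and the scheme itself are assumptions, not Bernini's actual mechanism.

```python
from typing import List, Tuple

Position = Tuple[str, int, int]  # (role, segment_id, frame_index)


def assign_positions(reference_lengths: List[int],
                     target_length: int) -> List[Position]:
    """Label each frame of the concatenated sequence (hypothetical scheme).

    Each reference clip gets its own segment id (1, 2, ...), the output
    target gets segment 0, and the frame index restarts at 0 in every
    segment, so no two frames receive the same label.
    """
    positions: List[Position] = []
    for seg_id, length in enumerate(reference_lengths, start=1):
        positions.extend(("reference", seg_id, t) for t in range(length))
    positions.extend(("target", 0, t) for t in range(target_length))
    return positions
```

For example, `assign_positions([1, 2], 3)` labels one single-image reference, one two-frame reference, and a three-frame target; a model could then embed the segment id separately from the frame index to keep the sources apart.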

Project: https://bernini-ai.github.io/

