ByteDance's commercialization technology team has open-sourced Bernini, a new video generation and editing framework. At its core is a "first understand, then generate" collaborative mechanism, intended to address industry pain points such as unstable images and frame flickering, which arise when traditional models fail to understand complex instructions accurately.

Bernini currently ranks among the top performers in ByteDance's internal testing. Its inference code and the second-stage model, Bernini-R, have been released, and the full-featured version is expected to be opened up in the near future.


Separating Semantics and Rendering

Bernini's workflow splits the entire process into two independent stages: "semantic planning" and "visual rendering." The system first uses a multimodal large model as a planner to analyze the input materials and draft a "semantic sketch," which the renderer then converts into stable, continuous video frames.

Thanks to this clear division of labor, the framework offers high practical value for controllable editing. With simple commands, users can make natural changes to the weather, season, and visual style of a scene, and can also precisely control camera perspective, focus, and subject actions.
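
The split between semantic planning and visual rendering can be illustrated with a toy sketch. This is not Bernini's code or API: the names `SemanticSketch`, `plan`, `render`, and `generate` are hypothetical, and strings stand in for real model outputs. It only shows the control flow, in which the planner finishes its "semantic sketch" before the renderer produces any frames.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticSketch:
    """Hypothetical intermediate plan: one short description per output frame."""
    frame_descriptions: list[str] = field(default_factory=list)


def plan(instruction: str, num_frames: int) -> SemanticSketch:
    """Stand-in for the multimodal planner: decides WHAT each frame should show."""
    return SemanticSketch(
        [f"frame {i}: {instruction}" for i in range(num_frames)]
    )


def render(sketch: SemanticSketch) -> list[str]:
    """Stand-in for the renderer: turns each planned description into a frame."""
    return [f"<pixels for '{d}'>" for d in sketch.frame_descriptions]


def generate(instruction: str, num_frames: int = 3) -> list[str]:
    # Stage 1 (understand) runs to completion before stage 2 (generate) starts.
    sketch = plan(instruction, num_frames)
    return render(sketch)


if __name__ == "__main__":
    for frame in generate("change the weather to snow", num_frames=2):
        print(frame)
```

Because the renderer in this sketch only ever sees the plan, an edit such as a change of weather or camera angle would be expressed once, in the planning stage, rather than re-interpreted frame by frame.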

Rich Visual Reference Dimensions

In addition to traditional text control, Bernini supports images and videos as visual references, which greatly improves the consistency of the generated content. In video editing scenarios, it can accurately embed specific materials or posters into target areas without broken boundaries or distorted perspective.

In new video generation scenarios, the model supports generation from a single reference image or from multi-angle references, and can extend keyframes into continuous shots. Because models are easily confused when multiple visual segments are joined together, the team introduced a dedicated position encoding mechanism that keeps reference materials clearly distinct from output targets.
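
The article does not describe how the position encoding mechanism works. As an illustration only, one common way to keep reference segments distinct from the output target is to give each segment its own id and a position range that cannot overlap with any other. The function `assign_positions` and its `gap` parameter below are assumptions made for this sketch, not part of Bernini.

```python
def assign_positions(reference_lengths, target_length, gap=100):
    """Return a (segment_id, position) pair for every token.

    Assumption for illustration: each reference segment and the target get
    a distinct segment id, and their position ranges are separated by `gap`
    so that no two segments share a position index.
    """
    tagged = []
    offset = 0
    for seg_id, length in enumerate(reference_lengths):
        tagged.extend((seg_id, offset + i) for i in range(length))
        # Skip past this segment plus a gap before the next one starts.
        offset += length + gap
    # The output target always receives the last segment id.
    target_id = len(reference_lengths)
    tagged.extend((target_id, offset + i) for i in range(target_length))
    return tagged


if __name__ == "__main__":
    # Two reference clips (2 and 3 tokens) followed by a 2-token target.
    for seg_id, pos in assign_positions([2, 3], target_length=2):
        print(seg_id, pos)
```

In this sketch a model could tell from either the segment id or the position range whether a token belongs to a reference or to the output being generated.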

Project: https://bernini-ai.github.io/