Xiaohongshu (Little Red Book) and Fudan University have jointly released InstanceAssemble, their latest research in layout-to-image generation. The technology targets the long-standing "layout difficulty" problem in AI painting, using a new attention mechanism to generate images precisely across scenes ranging from simple to complex. The related paper has reportedly been accepted by the top AI conference NeurIPS 2025.


In the current AI painting field, although "text-to-image" generation has matured, models often struggle to place objects accurately according to user-defined spatial constraints (such as bounding boxes or segmentation masks), frequently producing objects that are misaligned or that do not match their descriptions. The emergence of InstanceAssemble marks a new stage in AI painting - "precise composition." The technology is built on the mainstream diffusion transformer architecture, and its core is the "instance assembly attention" mechanism.

When using this tool, users only need to provide the specific position (bounding box) and content description of each object, and the AI can generate image content that meets the requirements in the specified area. Whether it's a simple scene with just a few objects or a complex scene with dense instances, InstanceAssemble can maintain high layout accuracy and semantic consistency.
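As a rough illustration of the input described above, a layout can be thought of as a list of (bounding box, description) pairs. The sketch below is not the project's actual API: the `Instance` class, the `validate_layout` and `to_pixels` helpers, and the choice of normalized `(x0, y0, x1, y1)` coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Instance:
    """One object in the layout (hypothetical structure, not the project's API)."""
    # ASSUMPTION: boxes are (x0, y0, x1, y1), normalized to the [0, 1] range.
    bbox: Tuple[float, float, float, float]
    description: str


def validate_layout(instances: List[Instance]) -> None:
    """Raise ValueError if a box is malformed or a description is empty."""
    for i, inst in enumerate(instances):
        x0, y0, x1, y1 = inst.bbox
        if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
            raise ValueError(f"instance {i}: bbox {inst.bbox} is not a valid box in the unit square")
        if not inst.description.strip():
            raise ValueError(f"instance {i}: empty description")


def to_pixels(inst: Instance, width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a normalized box to pixel coordinates for a given canvas size."""
    x0, y0, x1, y1 = inst.bbox
    return (round(x0 * width), round(y0 * height), round(x1 * width), round(y1 * height))


# A two-object layout: each entry says where an object goes and what it is.
layout = [
    Instance((0.05, 0.55, 0.45, 0.95), "a brown dog sitting on grass"),
    Instance((0.55, 0.10, 0.95, 0.50), "a red kite in the sky"),
]
validate_layout(layout)
```

Validating boxes before generation catches swapped or out-of-range coordinates early; the input format the released code actually expects should be taken from the GitHub repository.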

Notably, InstanceAssemble adopts a lightweight adaptation scheme. It does not require retraining the entire large model; it adds only a small number of extra parameters to an existing model. For example, adapting Stable Diffusion 3 Medium requires about 3.46% additional parameters, while adapting the Flux.1 model requires as little as 0.84%.
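To give a feel for what these percentages mean in absolute terms, the sketch below multiplies them by nominal backbone sizes. The base parameter counts (roughly 2 billion for Stable Diffusion 3 Medium and 12 billion for Flux.1) are assumptions taken from the commonly quoted sizes of those models, not figures from this article, and the paper may compute its percentages against a different denominator, so the results are order-of-magnitude estimates only.

```python
def extra_params(base_params: float, fraction: float) -> float:
    """Estimate the number of added parameters from a base size and a fraction."""
    return base_params * fraction


# ASSUMPTION: nominal backbone sizes, used only for illustration.
NOMINAL = {
    "SD3-Medium": (2e9, 0.0346),   # ~2B parameters, 3.46% overhead
    "Flux.1": (12e9, 0.0084),      # ~12B parameters, 0.84% overhead
}

for name, (base, fraction) in NOMINAL.items():
    added = extra_params(base, fraction)
    print(f"{name}: roughly {added / 1e6:.0f}M additional parameters")
```

Under these assumptions, both adapters land on the order of a hundred million parameters or fewer, which is small next to the multi-billion-parameter backbones they attach to.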

To better evaluate the technical performance, the research team also launched a benchmark dataset called "Denselayout" containing 90,000 instances and a new evaluation metric. Currently, InstanceAssemble has been open-sourced on GitHub, and both the code and pre-trained models are available for developers to download. It is expected to play an important role in industries such as design, advertising, and content creation.

GitHub: https://github.com/FireRedTeam/InstanceAssemble

Key Points:

  • 🎯 Precise Layout Control: Through the "instance assembly attention" mechanism, the model generates objects at user-defined positions and supports layouts ranging from sparse to dense.

  • Low Adaptation Cost: With a lightweight design, no full model retraining is required, and only 0.84% to 3.46% additional parameters are needed to adapt to mainstream models like Flux.1 or SD3.

  • 🔓 Comprehensive Open Source Sharing: The project is open-sourced on GitHub, providing pre-trained models, and a new benchmark dataset Denselayout has also been released to promote industry standardization in evaluation.