As AIGC moves from "free creation" toward "precise control," the Xiaohongshu AIGC team has open-sourced InstanceAssemble, a new layout-controllable image generation framework designed for Layout-to-Image tasks with high object density, many instances, and complex spatial relationships. The framework significantly improves spatial alignment accuracy and semantic consistency while adding minimal parameters (as low as 0.84%), offering an industrial-grade solution for demanding scenarios such as e-commerce, design, and gaming.

Cascade Modeling + Assemble-Attention, Solving the Dense Multi-Object Challenge
Traditional Layout-to-Image models often suffer from object misplacement, unwanted overlap, or semantic mismatch when handling complex layouts such as "10 product icons + text labels + background layers." InstanceAssemble instead adopts a cascade two-stage architecture:
1. Semantic Understanding Stage: Analyzing the semantic relationship between text descriptions and layout instructions;
2. Spatial Assembly Stage: Dynamically modeling the relative positions, occlusion relationships, and layer hierarchy between instances through the self-developed Assemble-Attention mechanism, ensuring that each element lands where it should.
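One common way to bind each instance to its layout region, which Assemble-Attention plausibly builds on, is a region-restricted attention mask: each instance's tokens may only attend to the latent-image tokens whose centers fall inside that instance's bounding box. The sketch below is illustrative only; the function names, the grid size, and the masking scheme are assumptions, not the paper's exact formulation.

```python
import numpy as np

def box_to_token_mask(box, grid=8):
    """Mark which tokens of a grid x grid latent fall inside a
    normalized (x0, y0, x1, y1) bounding box (hypothetical helper)."""
    ys, xs = np.mgrid[0:grid, 0:grid]
    cx = (xs + 0.5) / grid  # token-center x coordinates in [0, 1]
    cy = (ys + 0.5) / grid  # token-center y coordinates in [0, 1]
    x0, y0, x1, y1 = box
    return ((cx >= x0) & (cx < x1) & (cy >= y0) & (cy < y1)).reshape(-1)

def assemble_attention_mask(boxes, grid=8):
    """Per-instance attention masks: instance i's tokens may only
    attend to image tokens inside box i. Illustrative sketch, not
    the paper's actual Assemble-Attention implementation."""
    return np.stack([box_to_token_mask(b, grid) for b in boxes])

# Two non-overlapping instances on an 8 x 8 latent grid:
boxes = [(0.0, 0.0, 0.5, 0.5),   # top-left instance
         (0.5, 0.5, 1.0, 1.0)]   # bottom-right instance
mask = assemble_attention_mask(boxes, grid=8)  # shape (2, 64)
```

Because the masks of disjoint boxes never overlap, each instance's conditioning cannot leak into another instance's region, which is one intuition behind the improved positioning in dense scenes.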
Experiments show that in scenarios such as dense product displays, multi-character illustrations, and UI generation, InstanceAssemble significantly outperforms existing methods in object positioning accuracy and edge clarity.

Ultra-lightweight Adaptation, Compatible with Mainstream Base Models
To reduce deployment barriers, the framework uses an ultra-lightweight LoRA adapter:
- Adapting Stable Diffusion 3 Medium requires only 3.46% additional parameters;
- Adapting Flux.1 requires as little as 0.84%.
This means users can retain the base model's powerful generation capabilities without retraining it, while flexibly injecting layout control that supports multimodal instructions such as text, reference images, and bounding boxes.
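The low overhead comes from standard LoRA arithmetic: a frozen weight W of shape (d_out, d_in) gains only a rank-r pair B (d_out, r) and A (r, d_in), i.e. r * (d_in + d_out) extra parameters. The sketch below shows that arithmetic for a single layer; the layer width and rank are illustrative guesses, and the reported 3.46% and 0.84% figures depend on which layers the paper actually adapts and at what rank.

```python
def lora_params(d_in, d_out, rank):
    """Extra parameters from a LoRA adapter W + B @ A,
    where A is (rank, d_in) and B is (d_out, rank)."""
    return rank * (d_in + d_out)

def overhead(d_in, d_out, rank):
    """Adapter parameters as a fraction of the frozen weight W."""
    return lora_params(d_in, d_out, rank) / (d_in * d_out)

# A 3072-wide square projection with rank 16 (illustrative numbers,
# not the paper's configuration):
ratio = overhead(3072, 3072, 16)
print(f"{ratio:.2%}")  # prints 1.04% for this single layer
```

Because the ratio scales as roughly 2r / d for square layers, sub-1% overhead is easy to reach on wide models such as Flux.1.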

Self-built DenseLayout Benchmark, Promoting Evaluation Standardization
To accurately measure layout alignment quality, Xiaohongshu also released the DenseLayout evaluation dataset and LGS (Layout Grounding Score), an explainable metric. LGS quantifies generation results along three dimensions, position accuracy, scale matching, and semantic consistency, addressing the unreliability of traditional metrics such as IoU in dense scenarios.
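A metric built from those three dimensions might look like the sketch below: each sub-score is normalized to [0, 1] and the three are combined into one number. Everything here is an assumption for illustration; the paper defines LGS's actual sub-scores and aggregation, and the weights, the center-distance position score, and the function names are hypothetical.

```python
def lgs(position, scale, semantic, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical aggregation of the three LGS dimensions, each
    already normalized to [0, 1]; the paper's exact formula may differ."""
    wp, ws, wc = weights
    return wp * position + ws * scale + wc * semantic

def position_score(pred_center, gt_center):
    """Example position sub-score: 1 minus the Euclidean distance
    between normalized box centers, clamped at 0 (illustrative)."""
    dx = pred_center[0] - gt_center[0]
    dy = pred_center[1] - gt_center[1]
    return max(0.0, 1.0 - (dx * dx + dy * dy) ** 0.5)

# A slightly offset prediction with good scale and semantics:
score = lgs(position_score((0.5, 0.5), (0.52, 0.48)),
            scale=0.9, semantic=0.85)
```

Unlike a single IoU number, a decomposed score like this stays interpretable in dense scenes: a low total can be traced to the specific dimension (position, scale, or semantics) that failed.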
AIbase believes the release of InstanceAssemble signals AIGC's shift from "looking good" to "placing accurately." When AI can not only generate beautiful images but also place each element according to a designer's precise layout instructions, AIGC truly gains the capability to integrate into professional production pipelines. Xiaohongshu's open-source initiative not only empowers community creators but also pushes the industry toward controllable, reliable, and commercially viable generative AI.
Paper Link: https://arxiv.org/abs/2509.16691
Project Page: https://github.com/FireRedTeam/InstanceAssemble
