As AIGC moves from "free creation" toward "precise control," the Xiaohongshu AIGC team has open-sourced InstanceAssemble, a new layout-controllable image generation framework designed for Layout-to-Image tasks with high object density, many instances, and complex spatial relationships. The framework significantly improves spatial alignment accuracy and semantic consistency while adding minimal parameters (as low as 0.84%), offering an industrial-grade solution for demanding scenarios such as e-commerce, design, and gaming.

Cascade Modeling + Assemble-Attention, Solving the Dense Multi-Object Challenge
Traditional Layout-to-Image models often suffer from object misplacement, unwanted overlap, or semantic mismatch when handling complex layouts such as "10 product icons + text labels + background layers." InstanceAssemble instead adopts a cascade two-stage architecture:
1. Semantic Understanding Stage: Analyzing the semantic relationship between text descriptions and layout instructions;
2. Spatial Assembly Stage: Dynamically modeling the relative positions, occlusion relationships, and layer hierarchy between instances through the self-developed Assemble-Attention mechanism, ensuring that each element lands where it should.
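One common way to bind each instance to its layout region, which Assemble-Attention plausibly builds on, is a region-restricted attention mask: each instance's tokens may only attend to the latent-image tokens whose centers fall inside that instance's bounding box. The sketch below is illustrative only; the function names, the grid size, and the masking scheme are assumptions, not the paper's exact formulation.

```python
import numpy as np

def box_to_token_mask(box, grid=8):
    """Mark which tokens of a grid x grid latent fall inside a
    normalized (x0, y0, x1, y1) bounding box (hypothetical helper)."""
    ys, xs = np.mgrid[0:grid, 0:grid]
    cx = (xs + 0.5) / grid  # token-center x coordinates in [0, 1]
    cy = (ys + 0.5) / grid  # token-center y coordinates in [0, 1]
    x0, y0, x1, y1 = box
    return ((cx >= x0) & (cx < x1) & (cy >= y0) & (cy < y1)).reshape(-1)

def assemble_attention_mask(boxes, grid=8):
    """Per-instance attention masks: instance i's tokens may only
    attend to image tokens inside box i. Illustrative sketch, not
    the paper's actual Assemble-Attention implementation."""
    return np.stack([box_to_token_mask(b, grid) for b in boxes])

# Two non-overlapping instances on an 8 x 8 latent grid:
boxes = [(0.0, 0.0, 0.5, 0.5),   # top-left instance
         (0.5, 0.5, 1.0, 1.0)]   # bottom-right instance
mask = assemble_attention_mask(boxes, grid=8)  # shape (2, 64)
```

Because the masks of disjoint boxes never overlap, each instance's conditioning cannot leak into another instance's region, which is one intuition behind the improved positioning in dense scenes.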
Experiments show that in scenarios such as dense product displays, multi-character illustrations, and UI generation, InstanceAssemble significantly outperforms existing methods in object positioning accuracy and edge clarity.

Ultra-lightweight Adaptation, Compatible with Mainstream Base Models
To reduce deployment barriers, the framework uses an ultra-lightweight LoRA adapter:
- Adapting Stable Diffusion 3 Medium requires only 3.46% additional parameters;
- Adapting Flux.1 requires as little as 0.84%.
This means users can retain the base model's powerful generation capabilities without retraining it, while flexibly injecting layout control that supports multimodal instructions such as text, reference images, and bounding boxes.
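The low overhead comes from standard LoRA arithmetic: a frozen weight W of shape (d_out, d_in) gains only a rank-r pair B (d_out, r) and A (r, d_in), i.e. r * (d_in + d_out) extra parameters. The sketch below shows that arithmetic for a single layer; the layer width and rank are illustrative guesses, and the reported 3.46% and 0.84% figures depend on which layers the paper actually adapts and at what rank.

```python
def lora_params(d_in, d_out, rank):
    """Extra parameters from a LoRA adapter W + B @ A,
    where A is (rank, d_in) and B is (d_out, rank)."""
    return rank * (d_in + d_out)

def overhead(d_in, d_out, rank):
    """Adapter parameters as a fraction of the frozen weight W."""
    return lora_params(d_in, d_out, rank) / (d_in * d_out)

# A 3072-wide square projection with rank 16 (illustrative numbers,
# not the paper's configuration):
ratio = overhead(3072, 3072, 16)
print(f"{ratio:.2%}")  # prints 1.04% for this single layer
```

Because the ratio scales as roughly 2r / d for square layers, sub-1% overhead is easy to reach on wide models such as Flux.1.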

Self-built DenseLayout Benchmark, Promoting Evaluation Standardization
To accurately measure layout alignment quality, Xiaohongshu also released the DenseLayout evaluation dataset and LGS (Layout Grounding Score), an explainable metric. LGS quantifies generation results along three dimensions, position accuracy, scale matching, and semantic consistency, addressing the unreliability of traditional metrics such as IoU in dense scenarios.
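A metric built from those three dimensions might look like the sketch below: each sub-score is normalized to [0, 1] and the three are combined into one number. Everything here is an assumption for illustration; the paper defines LGS's actual sub-scores and aggregation, and the weights, the center-distance position score, and the function names are hypothetical.

```python
def lgs(position, scale, semantic, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical aggregation of the three LGS dimensions, each
    already normalized to [0, 1]; the paper's exact formula may differ."""
    wp, ws, wc = weights
    return wp * position + ws * scale + wc * semantic

def position_score(pred_center, gt_center):
    """Example position sub-score: 1 minus the Euclidean distance
    between normalized box centers, clamped at 0 (illustrative)."""
    dx = pred_center[0] - gt_center[0]
    dy = pred_center[1] - gt_center[1]
    return max(0.0, 1.0 - (dx * dx + dy * dy) ** 0.5)

# A slightly offset prediction with good scale and semantics:
score = lgs(position_score((0.5, 0.5), (0.52, 0.48)),
            scale=0.9, semantic=0.85)
```

Unlike a single IoU number, a decomposed score like this stays interpretable in dense scenes: a low total can be traced to the specific dimension (position, scale, or semantics) that failed.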
AIbase believes the release of InstanceAssemble signals AIGC's shift from "looking good" to "placing accurately." When AI can not only generate beautiful images but also place each element according to a designer's precise layout instructions, AIGC truly gains the capability to integrate into professional production pipelines. Xiaohongshu's open-source initiative not only empowers community creators but also pushes the industry toward controllable, reliable, and commercially viable generative AI.
Paper Link: https://arxiv.org/abs/2509.16691
Project Page: https://github.com/FireRedTeam/InstanceAssemble
