Apple's research team recently released UniGen1.5, its latest multimodal AI model and a significant step forward in image processing technology. The model can not only understand images but also generate and edit them, with all three capabilities integrated into a single system.

Unlike traditional approaches, UniGen1.5 adopts a unified framework that handles image understanding, generation, and editing together. The researchers note that this integrated design lets the model draw on its strong image-understanding capabilities when generating images, yielding higher-quality visual output.


For image editing, UniGen1.5 introduces a novel "editing instruction alignment" technique. Rather than modifying the image directly, the model first generates a detailed text description from the original image and the instruction in order to capture the user's editing intent. This "think before drawing" approach improves both the model's understanding of complex modification requests and the accuracy with which it executes them.
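The two-stage flow described above can be sketched as a small pipeline. This is a minimal illustrative sketch, not the paper's implementation: the function names (`describe_edit`, `apply_edit`, `edit_image`) are hypothetical, and both stages are stubbed where a real system would call the model's understanding and generation branches.

```python
def describe_edit(image_caption: str, instruction: str) -> str:
    """Stage 1 ("think"): expand the terse user instruction into a
    detailed edit description grounded in what the image contains.
    A real system would use the model's understanding branch here;
    this stub simply combines the two inputs."""
    return f"In an image showing {image_caption}, {instruction}."

def apply_edit(image: dict, edit_description: str) -> dict:
    """Stage 2 ("draw"): condition the generation branch on the
    detailed description instead of the raw instruction (stubbed)."""
    edited = dict(image)
    edited["applied_edit"] = edit_description
    return edited

def edit_image(image: dict, image_caption: str, instruction: str) -> dict:
    """Two-stage 'think before drawing' edit: describe, then modify."""
    description = describe_edit(image_caption, instruction)
    return apply_edit(image, description)

# Toy example with a terse instruction that needs grounding.
result = edit_image(
    image={"pixels": "..."},
    image_caption="a brown dog on a lawn",
    instruction="make the dog's collar red",
)
print(result["applied_edit"])
```

The key design point is that stage 2 never sees the raw instruction alone: the edit is always conditioned on a description that ties the intent to the actual image content.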

UniGen1.5 also makes notable progress in reinforcement learning. The research team designed a unified reward system that applies to both image generation and editing training. This mechanism overcomes the inconsistent quality standards found in editing tasks, helping the model maintain high performance across varied visual tasks.
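A unified reward of this kind can be illustrated with a toy sketch. This is an assumption-laden stand-in, not the paper's reward model: the word-overlap scorer below substitutes for a learned prompt-image alignment model, and the generation/editing split is inferred from the article's description.

```python
from typing import Optional

def alignment_score(prompt: str, output_desc: str) -> float:
    """Toy stand-in for a learned prompt-image alignment scorer:
    fraction of prompt words that appear in the output description."""
    prompt_words = set(prompt.lower().split())
    out_words = set(output_desc.lower().split())
    return len(prompt_words & out_words) / max(len(prompt_words), 1)

def unified_reward(prompt: str, output_desc: str,
                   source_desc: Optional[str] = None) -> float:
    """One reward used for both tasks: generation rollouts are scored
    on prompt alignment alone; editing rollouts (source_desc given)
    additionally require preserving content of the source image."""
    reward = alignment_score(prompt, output_desc)
    if source_desc is not None:
        # Editing: penalize drift away from the original image.
        reward = 0.5 * reward + 0.5 * alignment_score(source_desc, output_desc)
    return reward

# Generation rollout: scored on prompt alignment only.
print(unified_reward("a red car", "a red car on a road"))
# Editing rollout: same scorer, plus a source-preservation term.
print(unified_reward("a red car", "a red car", source_desc="a blue car"))
```

The point of the sketch is that a single scoring function serves both task types, which is what lets one RL loop train generation and editing jointly.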

Across multiple industry-standard benchmarks, UniGen1.5 demonstrated strong competitiveness. On GenEval and DPG-Bench, the model scored 0.89 and 86.83 respectively, far exceeding other popular models such as BAGEL and BLIP3o. On the dedicated image-editing benchmark ImgEdit, UniGen1.5 scored 4.31, surpassing the open-source model OmniGen2 and matching some proprietary closed-source models such as GPT-Image-1.

Although UniGen1.5 performs well, the researchers acknowledge that room for improvement remains in certain areas. For example, the model tends to make errors when rendering text within images. In some editing scenarios, the subject's features may also drift, such as changes to an animal's fur texture and color. The Apple team plans to keep optimizing these issues going forward.

Paper: https://arxiv.org/abs/2511.14760

Key Points:  

🌟 UniGen1.5 is the latest multimodal AI model from Apple, integrating image understanding, generation, and editing functions.  

🛠️ The model improves the accuracy of image editing through the "editing instruction alignment" technology, effectively capturing user intent.  

📊 In industry benchmarks, UniGen1.5 shows clear advantages over other popular models, demonstrating strong competitiveness.