Apple recently published an important paper showcasing its latest advances in artificial intelligence. Instead of the diffusion or autoregressive models widely adopted across the industry, Apple chose a path that has been largely overlooked: normalizing flows. A normalizing flow learns an invertible mathematical transformation that maps real-world data (such as images) to structured noise. Generation runs that transformation in reverse, turning noise back into clear image samples.
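To make the data-to-noise idea concrete, here is a minimal Python sketch of the simplest possible flow: an elementwise affine map with fixed, hypothetical parameters (`SHIFT`, `LOG_SCALE`). Real models such as Apple's learn far richer transformations with neural networks. The sketch only shows the key property: the mapping is exactly invertible, so sampling amounts to drawing Gaussian noise and running the inverse.

```python
import math
import random

# Hypothetical toy parameters: one shift and one log-scale per dimension.
# A real normalizing flow would compute these with a trained neural network.
SHIFT = [0.5, -1.0, 2.0]
LOG_SCALE = [0.3, -0.2, 0.1]


def to_noise(x):
    """Forward direction: map a data vector x to a noise vector z."""
    return [(xi - m) * math.exp(-s) for xi, m, s in zip(x, SHIFT, LOG_SCALE)]


def to_data(z):
    """Inverse direction: map a noise vector z back to data, exactly undoing to_noise."""
    return [zi * math.exp(s) + m for zi, m, s in zip(z, SHIFT, LOG_SCALE)]


# Sampling: draw standard Gaussian noise, then run the inverse transformation.
random.seed(0)
z = [random.gauss(0.0, 1.0) for _ in range(3)]
x = to_data(z)

# Mapping the sample back recovers the original noise (up to floating-point error).
z_back = to_noise(x)
print(x)
print(z_back)
```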


The greatest advantage of normalizing flows is that they can compute the exact probability (likelihood) of an image, something many diffusion models cannot do. This makes normalizing flows especially valuable in tasks where accurate probabilities matter. However, the technology has been costly to develop, and early models often produced blurry images lacking fine detail.
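The exact likelihood comes from the change-of-variables formula: if f maps an image x to noise z = f(x), then log p(x) = log p_Z(f(x)) + log|det J_f(x)|, where J_f is the Jacobian of f. Below is a minimal sketch for the same kind of toy elementwise affine flow (an assumption for illustration, not Apple's architecture). Because an affine transformation of a standard Gaussian is itself Gaussian, the result can be cross-checked against the closed-form Gaussian density.

```python
import math


def std_normal_logpdf(z):
    """Log density of a standard normal distribution at z."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * z * z


def flow_log_likelihood(x, shift, log_scale):
    """Exact log p(x) for the toy flow z_i = (x_i - shift_i) * exp(-log_scale_i)."""
    total = 0.0
    for xi, m, s in zip(x, shift, log_scale):
        z = (xi - m) * math.exp(-s)
        # For this 1-D affine map, log|dz/dx| = -s, so the Jacobian term is -s.
        total += std_normal_logpdf(z) - s
    return total


def gaussian_log_likelihood(x, mean, log_std):
    """Closed-form log density of independent Gaussians N(mean_i, exp(log_std_i)^2)."""
    return sum(
        -0.5 * math.log(2 * math.pi) - s - 0.5 * ((xi - m) / math.exp(s)) ** 2
        for xi, m, s in zip(x, mean, log_std)
    )


x = [1.0, -0.5, 2.5]
shift = [0.5, -1.0, 2.0]
log_scale = [0.3, -0.2, 0.1]
print(flow_log_likelihood(x, shift, log_scale))
print(gaussian_log_likelihood(x, shift, log_scale))
```

Both calls print the same value, illustrating that the flow's likelihood is exact rather than a bound or estimate.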

In this study, Apple introduced a new normalizing flow model called TarFlow (Transformer AutoRegressive Flow). TarFlow divides the image into small blocks (patches) and generates their pixel values block by block, with each block conditioned on the blocks already generated. Because it predicts continuous pixel values directly instead of compressing the image into a fixed vocabulary of discrete tokens, it avoids the quality loss that such compression causes.
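The block-by-block process can be sketched as an autoregressive affine flow over patches. In this minimal Python sketch, `conditioner` is a hypothetical stand-in for TarFlow's causal Transformer: it sees only earlier patches and returns a shift and log-scale for the next one. Mapping an image to noise can read all patches at once, while generation (noise to image) must proceed one patch at a time because each patch depends on the ones already produced. A real model would use a learned network here and, presumably, several such layers; this toy uses a single hand-written layer.

```python
import math
import random


def split_into_patches(image, p):
    """Split an H x W grid (list of rows) into flattened p x p patches in raster order."""
    h, w = len(image), len(image[0])
    return [
        [image[r][c] for r in range(top, top + p) for c in range(left, left + p)]
        for top in range(0, h, p)
        for left in range(0, w, p)
    ]


def conditioner(prev_patches):
    """Hypothetical causal conditioner: (shift, log_scale) from earlier patches only."""
    if not prev_patches:
        return 0.0, 0.0
    flat = [v for patch in prev_patches for v in patch]
    mean = sum(flat) / len(flat)
    return 0.5 * mean, 0.1 * math.tanh(mean)


def image_to_noise(patches):
    """Data -> noise: every patch is already known, so each step only reads inputs."""
    noise = []
    for i, patch in enumerate(patches):
        shift, log_scale = conditioner(patches[:i])
        noise.append([(v - shift) * math.exp(-log_scale) for v in patch])
    return noise


def noise_to_image(noise):
    """Noise -> data, one patch at a time, each conditioned on patches generated so far."""
    patches = []
    for z_patch in noise:
        shift, log_scale = conditioner(patches)
        patches.append([z * math.exp(log_scale) + shift for z in z_patch])
    return patches


# A 4x4 grid splits into four 2x2 patches.
grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(split_into_patches(grid, 2))  # [[1, 2, 5, 6], [3, 4, 7, 8], [9, 10, 13, 14], [11, 12, 15, 16]]

# Generate four 2x2 patches from Gaussian noise, then invert back to the same noise.
random.seed(1)
noise = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(4)]
img_patches = noise_to_image(noise)
recovered = image_to_noise(img_patches)
```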

However, TarFlow still struggles to generate high-resolution images, so Apple proposed an enhanced version called STARFlow (Scalable Transformer AutoRegressive Flow). STARFlow works in a "latent space": it first generates a compressed representation of the image, then uses a decoder to upscale it to the full-resolution picture. Because the model no longer has to predict every pixel value directly, generation becomes more efficient, and the model can concentrate on the image's overall structure first.
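A quick calculation shows how much work the latent space can save. The numbers below are assumptions for illustration only: a 256x256 RGB image and an autoencoder that shrinks each spatial side by 8x and outputs 4 latent channels (a common setup in latent image models, not a figure reported for STARFlow).

```python
def num_values(height, width, channels):
    """Number of scalar values a model must generate for a tensor of this shape."""
    return height * width * channels


def latent_shape(height, width, downsample, latent_channels):
    """Latent shape for a hypothetical autoencoder that shrinks each spatial
    side by `downsample` and outputs `latent_channels` channels."""
    return height // downsample, width // downsample, latent_channels


# Assumed setting: 256x256 RGB image, 8x-downsampling 4-channel autoencoder.
pixel_count = num_values(256, 256, 3)
latent = latent_shape(256, 256, downsample=8, latent_channels=4)
latent_count = num_values(*latent)
print(pixel_count, latent, latent_count, pixel_count // latent_count)
# 196608 (32, 32, 4) 4096 48
```

Under these assumptions the flow models 48 times fewer values, leaving the decoder to restore the fine pixel detail.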

STARFlow also significantly improves how text prompts are handled. Instead of relying on a dedicated built-in text encoder, it can use an existing language model, such as Google's small language model Gemma, to interpret the user's instructions more flexibly. This lets STARFlow focus on generating and refining image details, further improving the quality of its output.

Apple's exploration of AI image generation reflects its continued investment in technical innovation and offers new ideas and directions for future image generation research.

Key Points:   

🌟 Apple uses normalizing flows, rather than the more common diffusion models, to build a new AI image generation model.

🖼️ The TarFlow model generates images block by block, avoiding the quality loss caused by compressing them into discrete tokens.

🚀 STARFlow works in latent space and can use existing language models to process text prompts.