Apple has announced a notable AI research result. The study shows that fine-tuning a model on direct feedback from professional designers can substantially improve generative AI performance on user interface (UI) design tasks. Surprisingly, Qwen3-Coder, a comparatively small model, surpassed the currently leading GPT-5 in both the logic and the aesthetics of its UI designs after being optimized with this method.

AI-generated interfaces have long been stuck in an awkward position: functional but not beautiful. Apple's research team found that traditional score-based feedback was too crude to convey complex design logic. To address this, they invited 21 senior designers and collected 1,460 improvement logs containing annotations, hand-drawn sketches, and direct modification suggestions. Converting this high-quality visual feedback into a reward model lets the AI learn real-world aesthetic standards and layout logic.

The experimental data show that this fine-tuning method is highly sample-efficient: Qwen3-Coder achieved significant performance improvements with just 181 sketch feedback samples. The study also revealed an interesting fact. Because design is highly subjective, text-only scoring produced very poor evaluation consistency. Feedback given as sketches or direct modifications, by contrast, significantly reduced subjective bias, raising evaluation consistency from 49% to as high as 76%.
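To make the consistency figures concrete, here is a minimal Python sketch of one common way to measure agreement among evaluators: the share of evaluator pairs that gave the same judgment on the same item. This is an assumption for illustration only. The article does not say how the study computed evaluation consistency, and the function name `pairwise_agreement` and the toy data are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(judgments_per_item):
    """Return the fraction of evaluator pairs that agree, pooled over all items.

    judgments_per_item: a list with one entry per evaluated design; each entry
    is a list of labels (for example "A" or "B") given by different evaluators.
    NOTE: this metric is an illustrative assumption, not the study's definition.
    """
    agreeing_pairs = 0
    total_pairs = 0
    for labels in judgments_per_item:
        # Compare every unordered pair of evaluators on this item.
        for first, second in combinations(labels, 2):
            total_pairs += 1
            if first == second:
                agreeing_pairs += 1
    if total_pairs == 0:
        raise ValueError("need at least one item with two or more judgments")
    return agreeing_pairs / total_pairs


# Hypothetical toy data: three designs, each judged by three evaluators.
toy_judgments = [
    ["A", "A", "B"],
    ["B", "B", "B"],
    ["A", "B", "B"],
]
rate = pairwise_agreement(toy_judgments)
print(f"pairwise agreement: {rate:.0%}")
```

Under this definition, a rise from 49% to 76% would mean that about three in four evaluator pairs reach the same verdict, compared with roughly one in two before.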