A research team from the University of Science and Technology of China and several other Chinese universities recently released a new technical framework called "UniCorn." The framework's core goal is to give automated image processing systems a specific ability: identifying and fixing their own defects during content generation.
The researchers found that current image recognition and generation systems can understand complex visual information, yet often show a mismatch between what they understand and what they produce when turning that understanding into an image. For example, a system can accurately determine that "the left side is a beach and the right side is a wave," but when it generates a new image on its own, it often makes basic errors such as reversing the spatial order.

The researchers liken this phenomenon of "understanding but not expressing correctly" to "conduction aphasia" in medicine, a neurological disorder in which patients can understand language but cannot correctly repeat it. To bridge this gap between understanding and generation, the UniCorn framework introduces a collaborative mechanism.
The core idea of UniCorn is that because systems are usually better at evaluating image quality than at creating images from scratch, that evaluation ability should be used to guide the generation process. To achieve this, the researchers divided the system into three complementary roles within the same operational space, so that it acts as observer, executor, and quality inspector at once.
Through this "role division," the system checks its output against its own understanding while it generates images. Once it detects a deviation between the generated image and the intended logic, the internal error correction mechanism intervenes and makes adjustments. According to the researchers, preliminary tests show that the framework significantly improves the accuracy of automated systems in handling complex spatial logic and detailed textures.
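The observe, generate, and inspect loop described above can be illustrated with a toy sketch. This is not the UniCorn implementation. The function names (`observe`, `execute`, `check`, `generate_with_self_check`) are hypothetical, a small dictionary of positions stands in for an image, and the executor's left-right reversal is hard-coded to imitate the error mentioned in the article.

```python
from typing import Dict, List

# Toy stand-in for an image: maps a position ("left", "right") to an object.
Layout = Dict[str, str]


def observe(prompt: str) -> Layout:
    """Observer: parse a prompt like 'left: beach, right: wave' into a target layout."""
    layout: Layout = {}
    for part in prompt.split(","):
        position, obj = part.split(":")
        layout[position.strip()] = obj.strip()
    return layout


def execute(target: Layout) -> Layout:
    """Executor: produce a draft. This toy version reverses the spatial order on purpose."""
    positions = list(target.keys())
    objects = list(target.values())[::-1]
    return dict(zip(positions, objects))


def check(target: Layout, draft: Layout) -> List[str]:
    """Quality inspector: list the positions where the draft deviates from the target."""
    return [pos for pos, obj in target.items() if draft.get(pos) != obj]


def generate_with_self_check(prompt: str, max_rounds: int = 3) -> Layout:
    """Run the three roles in sequence and correct the draft until it passes inspection."""
    target = observe(prompt)
    draft = execute(target)
    for _ in range(max_rounds):
        errors = check(target, draft)
        if not errors:
            break
        # Correction step (assumed): overwrite each deviating position with the target.
        for pos in errors:
            draft[pos] = target[pos]
    return draft


if __name__ == "__main__":
    print(generate_with_self_check("left: beach, right: wave"))
```

In this sketch the inspector's verdict drives the correction directly. In a real system, both the evaluation and the correction would be learned behaviors of the model, not dictionary comparisons.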
