Researchers at Samsung SAIL Montreal have introduced a new AI architecture called the "Tiny Recursive Model" (TRM). The model has only 7 million parameters, far fewer than the billions found in even the smallest large language models (LLMs), yet it performs strongly on complex structured reasoning tasks such as Sudoku and the ARC-AGI benchmarks, outscoring several LLMs, including Gemini 2.5 Pro and Claude 3.7.

Recursive Reasoning Core Mechanism: A Tight Correction Loop
According to the paper "Less is More: Recursive Reasoning with Tiny Networks," TRM reached 45% accuracy on ARC-AGI-1 and 8% on ARC-AGI-2. On ARC-AGI-2 this puts it ahead of much larger models, including o3-mini-high (3.0%), Gemini 2.5 Pro (4.9%), DeepSeek R1 (1.3%), and Claude 3.7 (0.7%). The authors note that TRM does this with less than 0.01% of the parameters used by most large models. Some systems still score higher on ARC-AGI-2, such as Grok-4-thinking (16.0%) and Grok-4-Heavy (29.4%).
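To put the "less than 0.01%" figure in perspective, the arithmetic can be checked directly. The sketch below uses only the 7 million parameter count and the 0.01% bound quoted in the article; the two "large model" sizes in the loop are hypothetical round numbers chosen for illustration, not the sizes of any model named above.

```python
TRM_PARAMS = 7_000_000        # parameter count quoted in the article
PARTS_PER_BOUND = 10_000      # 0.01% is 1 part in 10,000


def min_large_model_size(small_params: int) -> int:
    """Smallest model size at which small_params is at most 0.01% of it."""
    return small_params * PARTS_PER_BOUND


def share_percent(small_params: int, large_params: int) -> float:
    """small_params as a percentage of large_params."""
    return 100.0 * small_params / large_params


if __name__ == "__main__":
    print(min_large_model_size(TRM_PARAMS))  # 70000000000, i.e. 70 billion
    # Hypothetical large-model sizes, for illustration only.
    for size in (70_000_000_000, 700_000_000_000):
        print(f"{size:,} parameters: {share_percent(TRM_PARAMS, size):.4f}%")
```

In other words, the claim holds for any comparison model with roughly 70 billion parameters or more.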
TRM also performed well on other benchmarks, raising accuracy on Sudoku-Extreme from 55.0% to 87.4% and on Maze-Hard from 74.5% to 85.3% relative to the previous baseline.
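The article does not spell out how the correction loop works, so the following is only a toy illustration of the general idea: keep a current answer and a latent "scratchpad" state, update the latent several times, then use it to correct the answer, and repeat. Everything here is an assumption made for illustration: the function names, the step counts, and the hand-written update rules, which stand in for what would be a small learned network. The "task" is deliberately trivial (make the answer match the input) so that only the control flow is on display.

```python
from typing import List

Vector = List[float]


def update_latent(x: Vector, y: Vector, z: Vector) -> Vector:
    """Hypothetical stand-in for a learned network: blend the latent
    state with the current error between the question x and answer y."""
    return [0.5 * zi + 0.5 * (xi - yi) for xi, yi, zi in zip(x, y, z)]


def update_answer(y: Vector, z: Vector) -> Vector:
    """Hypothetical stand-in for a learned network: correct the answer
    using the latent state."""
    return [yi + zi for yi, zi in zip(y, z)]


def refine(x: Vector, outer_steps: int = 16, inner_steps: int = 6) -> Vector:
    """Repeatedly refine an answer for question x (step counts are illustrative)."""
    y = [0.0] * len(x)  # initial answer guess
    z = [0.0] * len(x)  # initial latent state
    for _ in range(outer_steps):
        for _ in range(inner_steps):
            z = update_latent(x, y, z)  # think: revise the latent state
        y = update_answer(y, z)         # act: revise the answer
    return y


if __name__ == "__main__":
    print(refine([3.0, -1.0, 4.0], outer_steps=1))  # rough first correction
    print(refine([3.0, -1.0, 4.0]))                 # after repeated correction
```

The point of the sketch is that the same small update is reused many times, so the amount of computation grows with the number of refinement steps rather than with the number of parameters.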
Research Significance and Limitations
TRM's results show that small, targeted models have considerable potential on narrow, structured reasoning tasks, where they can work efficiently through step-by-step refinement and data augmentation. The study also suggests that matching the architecture to the dataset, such as using a simple MLP instead of attention on fixed-size grids, is key to success.
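The article does not say which augmentations were used. As an assumption for illustration only, one simple augmentation that preserves the validity of a Sudoku-style grid is to relabel the digits consistently: swapping every 3 for a 7 (and so on) yields a different but equally valid puzzle. The function names and the use of 0 for an empty cell below are hypothetical choices, not details from the study.

```python
import random
from typing import List

Grid = List[List[int]]


def relabel_digits(grid: Grid, rng: random.Random) -> Grid:
    """Return a copy of grid with digits 1-9 consistently permuted.
    0 is assumed to mark an empty cell and is left unchanged."""
    digits = list(range(1, 10))
    shuffled = digits[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(digits, shuffled))
    mapping[0] = 0
    return [[mapping[v] for v in row] for row in grid]


def augment(grid: Grid, copies: int, seed: int = 0) -> List[Grid]:
    """Produce several relabeled variants of one grid."""
    rng = random.Random(seed)
    return [relabel_digits(grid, rng) for _ in range(copies)]


if __name__ == "__main__":
    rows = [
        [5, 3, 0, 0, 7, 0, 0, 0, 0],  # partially filled row
        [1, 2, 3, 4, 5, 6, 7, 8, 9],  # fully filled row
    ]
    for variant in augment(rows, copies=2, seed=1):
        print(variant)
```

Because the relabeling is a one-to-one mapping, any row, column, or box that contained each digit once still does afterwards, which is why a single puzzle can be turned into many training examples.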
However, TRM is not a substitute for general-purpose LLMs. It operates on clearly defined grid problems and is not a generative system, so it is not suited to open-ended, text-based, or multimodal tasks.
Instead, TRM is a promising building block for reasoning tasks. It points to a new way of balancing computational efficiency against complex reasoning capability, and its range of applications may widen in the future. Independent replication and testing are still ongoing.
TRM's emergence highlights that, in AI, architectural innovation and algorithmic optimization may matter more than simply scaling up model size. In which vertical fields do you think this kind of "small but precise" AI model is most likely to see large-scale adoption first?
