Recently, Yuchen Jin, co-founder and CTO of Hyperbolic, shared a striking story on X: researcher Keller Jordan was hired by OpenAI on the strength of a single blog post, and is very likely using Muon, the neural-network optimizer introduced in that post, to train the latest GPT-5.
Keller Jordan's blog post, titled "Muon: An optimizer for the hidden layers of neural networks," was published in December 2024 and quickly drew attention across the industry. In it, he laid out Muon's design and its empirical track record, emphasizing its potential to speed up training. Jordan showed experimentally that Muon cut the training time of the CIFAR-10 speedrun to 79% of the previous record and delivered significant speedups in the NanoGPT speedrun as well.
The core of Muon lies in its design: for the weight matrices of a network's hidden layers, it orthogonalizes each update via Newton-Schulz iteration, an approach that has performed remarkably well in practice. Jordan also noted that Muon remains efficient in large-scale training on modern GPUs, with a compute overhead below 1% of total training cost. In addition, he analyzed the optimizer's hyperparameter settings and their effects in depth, offering a number of practical insights.
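To make this concrete, here is a minimal PyTorch sketch of the orthogonalization step at the heart of Muon, following the quintic Newton-Schulz iteration and coefficients from the publicly released Muon implementation; the function name `newton_schulz` and the toy usage at the end are illustrative choices, not quoted from the post.

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    # Approximately orthogonalize the 2D update matrix G, i.e. push all of
    # its singular values toward 1, via a quintic Newton-Schulz iteration.
    # Coefficients (a, b, c) follow the publicly released Muon code.
    assert G.ndim == 2
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G.bfloat16()                 # the iteration tolerates low precision on GPU
    X = X / (X.norm() + eps)         # normalize so the spectral norm is <= 1
    transposed = G.size(0) > G.size(1)
    if transposed:                   # iterate on the wide orientation for efficiency
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * A @ A
        X = a * X + B @ X            # X <- a*X + b*(X X^T)X + c*(X X^T)^2 X
    if transposed:
        X = X.T
    return X

# Toy usage: the result has singular values close to 1.
update = newton_schulz(torch.randn(256, 512))
```

In Muon, this orthogonalized matrix replaces the raw momentum update for each 2D hidden-layer weight; embeddings, output layers, and scalar parameters are handled by a conventional optimizer such as AdamW.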
In the post, Jordan also criticized problems in current optimizer research, arguing that many newly proposed optimizers fail to convincingly beat well-tuned baselines such as AdamW in real workloads. He called on the research community to take baseline tuning seriously and to judge optimization algorithms by their practical results.
This optimizer not only opened OpenAI's door for Keller Jordan but may also play a part in training GPT-5. As AI technology continues to advance, Muon marks a meaningful step forward in the efficiency of neural-network training, and it may well spark further innovations.