StepFun announced that its open-source native speech reasoning model, Step-Audio-R1.1, has taken first place on the Speech Reasoning leaderboard published by Artificial Analysis, a globally recognized AI model evaluation organization. The leaderboard evaluates how well speech models handle audio processing and logical reasoning, across dimensions such as accuracy and response time.

Step-Audio-R1.1 reached an accuracy of 96.4%, surpassing leading closed-source models such as Grok, Gemini, and GPT-Realtime and setting a new record on the leaderboard. It also performed strongly in the combined evaluation of performance and speed, drawing considerable attention across the industry.
The model combines deep speech reasoning with real-time response: it understands speech content end-to-end without additional latency, which the company describes as "thinking like a human while listening to a conversation." The latest version improves real-time dialogue and strengthens complex speech reasoning. The complete real-time speech API is planned for launch in February next year. In the meantime, users can try the core functions of R1.1 through the open chat mode, which supports streaming inference that lets the model think while it speaks.
At the launch event, StepFun demonstrated the model in practical scenarios, such as analyzing the sounds of a cat fight and interpreting Korean song lyrics. These examples illustrate Step-Audio-R1.1's analytical ability and level of speech comprehension in complex audio environments.
The weights of Step-Audio-R1.1 have been released on HuggingFace, where developers and researchers can download and use them freely. Users can also try the model at StepFun's Open Platform Experience Center. For anyone interested in AI and speech models, the release offers a hands-on way to explore the technology.
HuggingFace: https://huggingface.co/stepfun-ai/Step-Audio-R1.1
Key Points:
🌟 Step-Audio-R1.1 ranks first globally with 96.4% accuracy on the Artificial Analysis Speech Reasoning leaderboard!
📈 The model has deep speech reasoning and real-time response capabilities, supporting streaming inference.
💻 Users can freely download the model from HuggingFace and try it out on the open platform.
