Recently, StepFun officially released StepAudio 2.5 Realtime, its next-generation real-time large voice model. The model is now fully available, and developers can access it through the StepFun open platform. StepAudio 2.5 Realtime aims to give users a more lifelike conversational experience, with comprehensive upgrades in paralanguage perception, character customization, and conversational ability.


The core innovation of StepAudio 2.5 Realtime is its handling of paralanguage. Paralanguage covers tone, speech rate, pauses, and non-verbal sounds such as sighs or laughter, all of which are crucial for conveying emotion. By analyzing these cues, the model can infer the user's mood and underlying intent, for example recognizing fatigue from a low tone or frustration from rapid speech, and adjust the tone and strategy of its response accordingly to make the interaction feel more natural.

For character customization, StepAudio 2.5 Realtime gives developers considerable flexibility: through the API, they can adjust an AI character's personality traits, background, and speech habits. The model builds on more than 10,000 high-quality native character profiles, from which an algorithm generates a character feature matrix on the scale of millions, and it has been trained on a large volume of real conversation data. The development team also applied reinforcement learning optimization so that the model keeps its character consistent even in extreme scenarios. In addition, the model ships with five preset characters that users can try out directly.

In overall conversational ability, StepAudio 2.5 Realtime emphasizes improvements in both intelligence (IQ) and emotional intelligence (EQ). Beyond understanding complex semantics and handling the varied situations that arise in conversation, the model can draw on knowledge from multiple domains to hold deeper conversations. It can act as an empathetic conversation partner or play a professional HR interviewer in a formal interview.

According to the latest official evaluation data, the model performs strongly across five test dimensions. On the user experience score in particular, StepAudio 2.5 Realtime scored 80.41, clearly ahead of comparable products such as GPT-Realtime-1.5 and Gemini Live, underscoring its performance and application potential.

Key Points:   

🌟 StepAudio 2.5 Realtime features advanced paralanguage processing, which lets it accurately perceive user emotions.

🎭 Developers can customize an AI character's personality, background, and speech habits via the API, making interactions more personalized.

📊 Official evaluations show strong results across five test dimensions, including a user experience score of 80.41 that is well ahead of comparable products.