xAI has officially launched the Grok Voice Agent API, giving developers real-time voice interaction capabilities. The API is built on the Grok voice technology stack, which already serves millions of users in mobile applications and Tesla vehicles, and it is now open to developers worldwide.

Exceptional Cost-Effectiveness: Only $0.05 per minute  

The Grok Voice Agent API uses a simple billing model: a flat $0.05 per minute of connection time. This pricing is significantly lower than that of mainstream competitors, reducing the cost of building high-performance voice applications.
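
To make the per-minute pricing concrete, the arithmetic can be sketched as below. The $0.05 rate comes from the announcement; the function name `estimate_cost` and the assumption that connection time is prorated linearly by the second are illustrative only, since the billing granularity is not stated here.

```python
from decimal import Decimal

# Rate stated in the announcement: USD per minute of connection time.
PRICE_PER_MINUTE = Decimal("0.05")


def estimate_cost(connection_seconds: float) -> Decimal:
    """Estimate the cost in USD for a given connection time.

    Assumption (not confirmed by the announcement): usage is prorated
    linearly by the second rather than rounded up to whole minutes.
    """
    minutes = Decimal(str(connection_seconds)) / Decimal(60)
    # Round to a hundredth of a cent for display purposes.
    return (minutes * PRICE_PER_MINUTE).quantize(Decimal("0.0001"))


if __name__ == "__main__":
    for seconds in (60, 90, 600):
        print(f"{seconds:>4} s -> ${estimate_cost(seconds)}")
```

Under that assumption, a ten-minute call costs $0.50, and 10,000 minutes of monthly connection time costs $500.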

Number One on an Audio Reasoning Benchmark  

The Grok Voice Agent API ranks first on Big Bench Audio, a widely cited audio reasoning benchmark. Its average time to first audio is under one second, nearly five times faster than the closest competitor, demonstrating strong real-time responsiveness and reasoning capabilities.

Overview of Core Capabilities  

- Real-time Two-way Voice Communication: supports streaming audio input and output, enabling low-latency, natural conversation experiences.  

- Multilingual Support: officially covers more than 100 languages, including Chinese, with native-level pronunciation and recognition of accents and dialects.  

- Automatic Language Detection and Switching: detects the user's language without any configuration and switches seamlessly; developers can also specify the response language through the system prompt.  

- External Tool Integration: easily integrate custom tools or access xAI's real-time search capabilities, covering web and X platform data.  

- Real-time Internet Search and Reasoning: instantly query information and perform complex reasoning during conversations.  

- Prompt-Based Emotional Control of Voice: adjust the emotional expression of the voice through prompts to make interactions sound more natural.  

- Multiple Voice Options: offers diverse voice choices, including classic characters such as Sal, Rex, Eve, and Leo, as well as companion-type personalities like Mika and Valentin.  

- Compatibility with the OpenAI Realtime API Specification: existing applications can migrate with minimal changes, and the xAI LiveKit plugin is supported for quick integration.
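
Because the API is described as compatible with the OpenAI Realtime API specification, several of the capabilities above (voice selection, response language via the system prompt, and custom tools) would plausibly be configured through a `session.update` event. The following is a minimal sketch under that assumption: the event shape follows the OpenAI Realtime convention, the voice name `Eve` is taken from the list above, and the helper `build_session_update` and the tool `get_order_status` are hypothetical names used only for illustration. Whether xAI accepts exactly these fields is not confirmed here.

```python
import json
from typing import Optional


def build_session_update(
    voice: str,
    instructions: str,
    tools: Optional[list] = None,
) -> str:
    """Build a JSON-encoded `session.update` event.

    The field names mirror the OpenAI Realtime API convention; it is an
    assumption that the Grok Voice Agent API accepts the same shape.
    """
    event = {
        "type": "session.update",
        "session": {
            "voice": voice,
            # The system prompt can pin the response language and tone.
            "instructions": instructions,
            "tools": tools or [],
        },
    }
    return json.dumps(event)


# Hypothetical custom tool, declared in the function-tool style used by
# the OpenAI Realtime API.
order_status_tool = {
    "type": "function",
    "name": "get_order_status",
    "description": "Look up the status of a customer order.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

payload = build_session_update(
    voice="Eve",
    instructions="Always respond in Chinese, in a warm and cheerful tone.",
    tools=[order_status_tool],
)
print(payload)
```

The resulting string would be sent as a text frame over the real-time connection once it is established; the connection URL and authentication details are omitted because they are not given in this announcement.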

Future Outlook  

xAI said it will continue to iterate on the API. In the coming weeks, it plans to launch standalone text-to-speech (TTS) and speech-to-text (STT) endpoints, along with further optimized audio models that improve pronunciation accuracy and latency.