OpenAI has recently launched two key API updates for developers worldwide, aimed at significantly improving how AI agents perform in voice interactions and complex task flows.

In terms of models, the new real-time model gpt-realtime-1.5 and its accompanying audio model have been officially released. Their core goal is to make voice commands more reliable. According to OpenAI's internal test data, the new model improves transcription accuracy for numbers and letters by about 10%, accuracy on audio reasoning tasks by 5%, and instruction-following accuracy by 7%, addressing long-standing issues where the AI mishears key phrases or deviates when executing complex voice commands.


In terms of architecture, the Responses API now supports the WebSocket protocol, marking a major shift in how clients communicate with the API. Unlike the previous request/response mode, in which the entire context had to be retransmitted with every call, WebSocket lets developers establish a persistent connection over which the system sends only incremental data as new information is generated.
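The bandwidth difference between the two modes is easy to illustrate with a short sketch. This is a conceptual simulation, not OpenAI's actual API: the function names and payload sizes are hypothetical, and it only models how much data a client would send in each mode.

```python
# Hypothetical illustration: bytes sent by a client under two modes.
# Stateless mode: every request must carry the full conversation context.
# Persistent (WebSocket-style) mode: only the new increment is sent.

def full_retransmit_cost(turn_sizes):
    """Stateless mode: each request resends the entire context so far."""
    total, context = 0, 0
    for size in turn_sizes:
        context += size          # context grows by the new turn
        total += context         # the whole context goes over the wire again
    return total

def incremental_cost(turn_sizes):
    """Persistent-connection mode: only the new turn is transmitted."""
    return sum(turn_sizes)

# Hypothetical payload sizes (bytes) for five conversation turns.
turns = [200, 150, 300, 250, 100]
print(full_retransmit_cost(turns))  # 3100 bytes
print(incremental_cost(turns))      # 1000 bytes
```

As the conversation grows, the stateless total grows roughly quadratically with the number of turns, while the incremental total grows only linearly, which is where the efficiency gain for long-running agents comes from.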

OpenAI noted that this improvement is particularly important for complex AI agents that make frequent calls to large numbers of tools, as it can directly increase their operating speed by 20% to 40%.