OpenAI Releases GPT-Realtime Model and General Availability of Realtime API, Advancing Voice AI

Edited by: Veronika Radoslavskaya

OpenAI has announced the general availability of its Realtime API, alongside the introduction of its most advanced speech-to-speech model, GPT-Realtime. The release is aimed at developers building production voice agents and is intended to make spoken interaction with applications more natural and responsive.

The Realtime API, which had been in beta since October 2024, supports low-latency, multimodal conversations. It handles both text and audio as input and output, and it supports function calling, which lets the model request actions from the host application. The GPT-Realtime model processes and generates audio directly instead of chaining separate speech-to-text and text-to-speech steps. This results in faster and more natural conversational exchanges, with the model capable of interpreting non-verbal cues, switching languages mid-sentence, and adjusting tone and accent.
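To illustrate what function calling involves on the application side, here is a minimal sketch of a dispatcher that receives a tool name and JSON-encoded arguments from a model and returns a JSON result. The tool `lookup_order_status`, the `TOOLS` registry, and `handle_function_call` are hypothetical names chosen for this example; they are not part of the Realtime API, and the real event format should be taken from OpenAI's documentation.

```python
import json


def lookup_order_status(order_id: str) -> dict:
    """Hypothetical local tool; a real one would query a database or service."""
    return {"order_id": order_id, "status": "shipped"}


# Hypothetical registry mapping tool names the model may request to local callables.
TOOLS = {"lookup_order_status": lookup_order_status}


def handle_function_call(name: str, arguments_json: str) -> str:
    """Run the requested tool and return its result as a JSON string.

    Errors are returned as JSON too, so the model can be told what went wrong
    instead of the conversation stalling.
    """
    handler = TOOLS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json)
        result = handler(**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the tool's signature.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)


if __name__ == "__main__":
    print(handle_function_call("lookup_order_status", '{"order_id": "A1"}'))
```

In a voice agent, the returned string would be sent back to the model so that it can speak the answer to the caller.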

Enhancements to the Realtime API since its beta launch include WebRTC support for simpler integration in browsers and client apps, image input for visual analysis during conversations, and Session Initiation Protocol (SIP) support for connecting to phone calls. Reusable prompts are also available, so developers can save a prompt configuration and apply it across sessions. Together, these additions are meant to help developers build voice-enabled applications across a range of sectors.

According to OpenAI, GPT-Realtime shows significant improvements in instruction following and function calling accuracy, and its benchmarks also indicate stronger reasoning. OpenAI has priced GPT-Realtime 20% below the preceding gpt-4o-realtime-preview model. The new rates are $32 per million audio input tokens and $64 per million audio output tokens, with cached audio input tokens priced at $0.40 per million. The lower pricing is intended to increase accessibility and encourage wider adoption.
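As a rough illustration of how these rates add up, the sketch below estimates the audio cost of a session from token counts. The function name `estimate_audio_cost` is hypothetical, and the code assumes that cached tokens are a subset of the input tokens and are billed at the cached rate instead of the full input rate. It covers audio tokens only, since text token pricing is not given here.

```python
# Rates quoted in the article, in US dollars per one million audio tokens.
AUDIO_INPUT_PER_MILLION = 32.00
CACHED_AUDIO_INPUT_PER_MILLION = 0.40
AUDIO_OUTPUT_PER_MILLION = 64.00


def estimate_audio_cost(input_tokens: int, output_tokens: int,
                        cached_input_tokens: int = 0) -> float:
    """Estimate the audio cost of a session in US dollars.

    Assumption: cached_input_tokens counts the part of input_tokens that was
    served from cache, so it can never exceed input_tokens.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")

    uncached_input_tokens = input_tokens - cached_input_tokens
    total = (
        uncached_input_tokens * AUDIO_INPUT_PER_MILLION
        + cached_input_tokens * CACHED_AUDIO_INPUT_PER_MILLION
        + output_tokens * AUDIO_OUTPUT_PER_MILLION
    ) / 1_000_000
    return round(total, 6)


if __name__ == "__main__":
    # 100,000 input tokens (20,000 of them cached) and 50,000 output tokens.
    print(estimate_audio_cost(100_000, 50_000, cached_input_tokens=20_000))
```

For that example the three components are $2.56 for uncached input, $0.008 for cached input, and $3.20 for output, or $5.768 in total.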

Direct audio processing with lower latency, combined with better language understanding and generation, makes voice AI more practical for customer service, telecommunications, and other fields. These capabilities are expected to enable more intuitive, personalized, and efficient interactions, changing how businesses engage with customers and how individuals interact with technology.

Sources

  • WebProNews

  • Introducing gpt-realtime and Realtime API updates for production voice agents

  • o1 and new tools for developers

  • Realtime API | OpenAI Help Center

  • OpenAI updates the Realtime API with gpt-realtime, its most advanced voice AI model yet

  • OpenAI Introduces GPT-Realtime Speech Generation Model, Makes Realtime API Generally Available
