AI Tools #OpenAI#voice AI#real-time#translation#speech-to-text

OpenAI Launches GPT-Realtime-2 Voice Models With GPT-5-Class Reasoning

OpenAI introduces three new audio models in the API: GPT-Realtime-2 for conversational reasoning, GPT-Realtime-Translate for live translation, and GPT-Realtime-Whisper for streaming transcription.

Monday May 11, 2026
OpenAI Launches GPT-Realtime-2 Voice Models With GPT-5-Class Reasoning

OpenAI launched three new real-time voice models in the API: GPT-Realtime-2 (first voice model with GPT-5-class reasoning), GPT-Realtime-Translate (live translation across 70+ languages), and GPT-Realtime-Whisper (streaming speech-to-text). These models separate reasoning, translation, and transcription into specialized components, enabling developers to build more natural voice agents.

What Makes GPT-Realtime-2 Different From Previous Voice Models?

GPT-Realtime-2 is OpenAI’s first voice model with GPT-5-class reasoning capabilities. It can handle multi-step instructions, call tools, handle interruptions, and maintain natural conversational flow without the cognitive lag typical of voice-to-text-to-voice pipelines. A 128K-token context window supports longer, more coherent sessions.

How Does the New Voice Architecture Work?

Instead of routing every interaction through one massive model, developers can now route specific tasks to specialized models:

What Are the Pricing and Availability Details?

GPT-Realtime-2 is priced at $32 per 1M audio input tokens and $64 per 1M audio output tokens. GPT-Realtime-Translate costs $0.034 per minute. GPT-Realtime-Whisper costs $0.017 per minute. All models are available in the Realtime API.

Key Takeaways

Frequently Asked Questions

Can GPT-Realtime-2 handle interruptions? Yes. The model is designed to handle corrections, interruptions, and context switches naturally, keeping the conversation moving while reasoning through requests.

Which languages does GPT-Realtime-Translate support? It supports more than 70 input languages and 13 output languages, useful for customer support, cross-border sales, education, events, and global platforms.

How does pricing compare to previous models? The new pricing reflects the specialized nature of each model. GPT-Realtime-2 is priced for premium conversational reasoning, while Whisper offers low-cost streaming transcription.

Back to all news