OpenAI Launches GPT-Realtime-2 Voice Models With GPT-5-Class Reasoning

OpenAI launched three new real-time voice models in the API: GPT-Realtime-2 (its first voice model with GPT-5-class reasoning), GPT-Realtime-Translate (live translation from more than 70 input languages into 13 output languages), and GPT-Realtime-Whisper (streaming speech-to-text). Splitting reasoning, translation, and transcription across specialized models lets developers build more natural voice agents.

What Makes GPT-Realtime-2 Different From Previous Voice Models?

GPT-Realtime-2 is OpenAI’s first voice model with GPT-5-class reasoning capabilities. It can follow multi-step instructions, call tools, recover from interruptions, and keep a natural conversational flow without the lag typical of voice-to-text-to-voice pipelines, which chain separate transcription, text reasoning, and speech synthesis steps. A 128K-token context window supports longer, more coherent sessions.

How Does the New Voice Architecture Work?

Instead of routing every interaction through one massive model, developers can now route specific tasks to specialized models:

GPT-Realtime-2 for conversational reasoning and complex task orchestration
GPT-Realtime-Translate for live multilingual translation (70+ input languages, 13 output languages)
GPT-Realtime-Whisper for low-latency streaming transcription
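
The routing idea above can be sketched as a simple lookup from task type to model. This is a minimal illustration, not OpenAI's API: the `Task` enum, the `pick_model` helper, and the lowercase model identifier strings are all assumptions made for the example, and the real identifiers should be taken from the Realtime API documentation.

```python
from enum import Enum


class Task(Enum):
    """The three kinds of work the article assigns to separate models."""
    CONVERSATION = "conversation"
    TRANSLATION = "translation"
    TRANSCRIPTION = "transcription"


# Hypothetical model identifiers, derived from the product names in the
# article. The actual strings accepted by the API may differ.
MODEL_FOR_TASK = {
    Task.CONVERSATION: "gpt-realtime-2",
    Task.TRANSLATION: "gpt-realtime-translate",
    Task.TRANSCRIPTION: "gpt-realtime-whisper",
}


def pick_model(task: Task) -> str:
    """Return the model name that should handle the given task."""
    return MODEL_FOR_TASK[task]


if __name__ == "__main__":
    for task in Task:
        print(f"{task.value}: {pick_model(task)}")
```

Keeping the mapping in one table means an application can change which model serves a task without touching the code that opens sessions.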

What Are the Pricing and Availability Details?

GPT-Realtime-2 is priced at $32 per 1M audio input tokens and $64 per 1M audio output tokens. GPT-Realtime-Translate costs $0.034 per minute. GPT-Realtime-Whisper costs $0.017 per minute. All models are available in the Realtime API.
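
To see how the two billing styles add up, here is a rough cost estimate using only the rates quoted above. The function names and the sample token and minute counts are hypothetical; the article does not say how many audio tokens a minute of speech consumes, so real usage has to come from measured token counts.

```python
from decimal import Decimal

# Rates quoted in the article (US dollars).
AUDIO_INPUT_PER_1M_TOKENS = Decimal("32")
AUDIO_OUTPUT_PER_1M_TOKENS = Decimal("64")
TRANSLATE_PER_MINUTE = Decimal("0.034")
WHISPER_PER_MINUTE = Decimal("0.017")

ONE_MILLION = Decimal(1_000_000)


def realtime2_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Token-based cost for GPT-Realtime-2 audio input and output."""
    total = (Decimal(input_tokens) * AUDIO_INPUT_PER_1M_TOKENS
             + Decimal(output_tokens) * AUDIO_OUTPUT_PER_1M_TOKENS)
    return total / ONE_MILLION


def per_minute_cost(minutes: int, rate: Decimal) -> Decimal:
    """Minute-based cost for the translation and transcription models."""
    return Decimal(minutes) * rate


if __name__ == "__main__":
    # Hypothetical session: 50,000 audio input tokens, 20,000 output tokens.
    print("GPT-Realtime-2:", realtime2_cost(50_000, 20_000))
    # Hypothetical one-hour streams on the per-minute models.
    print("Translate, 60 min:", per_minute_cost(60, TRANSLATE_PER_MINUTE))
    print("Whisper, 60 min:", per_minute_cost(60, WHISPER_PER_MINUTE))
```

With these sample numbers the conversational session comes to $2.88, an hour of translation to $2.04, and an hour of transcription to $1.02. `Decimal` is used instead of floats so that the small per-minute rates do not pick up rounding noise.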

Key Takeaways

GPT-Realtime-2: OpenAI’s first voice model with GPT-5-class reasoning
GPT-Realtime-Translate: live translation across 70+ input and 13 output languages
GPT-Realtime-Whisper: streaming transcription at $0.017/minute
128K token context window for longer sessions
Modular architecture lets developers route tasks to specialized models
Available now in the Realtime API

Frequently Asked Questions

Can GPT-Realtime-2 handle interruptions? Yes. The model is designed to handle corrections, interruptions, and context switches naturally, keeping the conversation moving while reasoning through requests.

Which languages does GPT-Realtime-Translate support? It supports more than 70 input languages and 13 output languages, useful for customer support, cross-border sales, education, events, and global platforms.

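How does pricing compare to previous models? The new pricing reflects the specialized role of each model. GPT-Realtime-2 is billed per audio token ($32 per 1M input tokens and $64 per 1M output tokens) for premium conversational reasoning, while GPT-Realtime-Translate ($0.034 per minute) and GPT-Realtime-Whisper ($0.017 per minute) are billed per minute, with GPT-Realtime-Whisper the lowest-cost option for streaming transcription.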