New Realtime API for speech-to-speech applications
We're launching the Realtime API in beta. It enables low-latency speech-to-speech experiences with GPT-4o.
Features
- Natural speech input and output (no separate STT/TTS step)
- Support for function calling during voice conversations
- 6 preset voices
- WebSocket-based streaming
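To make the WebSocket and function-calling features above a bit more concrete, here is a minimal sketch of what a client might prepare before opening a connection. It is not official sample code: the endpoint URL, model name, `OpenAI-Beta` header, `session.update` event shape, and the `alloy` voice are assumptions to verify against the guide, and `get_weather` is a hypothetical tool invented for illustration. The sketch only builds the URL, headers, and first JSON frame; it does not open a socket.

```python
import json

# Assumed values, not stated in this announcement; confirm them in the guide.
REALTIME_URL = "wss://api.openai.com/v1/realtime"
MODEL = "gpt-4o-realtime-preview"


def connection_params(api_key: str, model: str = MODEL) -> tuple[str, dict[str, str]]:
    """Return the WebSocket URL and request headers for a new session."""
    url = f"{REALTIME_URL}?model={model}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "OpenAI-Beta": "realtime=v1",  # assumed beta opt-in header
    }
    return url, headers


def session_update(voice: str, tools: list[dict]) -> str:
    """Serialize a session configuration event as a JSON text frame."""
    event = {
        "type": "session.update",  # assumed event name
        "session": {"voice": voice, "tools": tools},
    }
    return json.dumps(event)


# Hypothetical function the model could call during a voice conversation.
get_weather = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

url, headers = connection_params("sk-example")
frame = session_update("alloy", [get_weather])
```

A WebSocket client library of your choice would then connect to `url` with `headers` and send `frame` as its first text message.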
The API is currently in beta. Audio pricing is $5.00 per 1M input tokens and $20.00 per 1M output tokens.
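For a rough sense of what those rates mean per session, here is a small worked estimate. It uses only the two per-million-token rates quoted above; the token counts are made-up illustrative numbers, and how many tokens a given amount of audio consumes is not covered here.

```python
# Rates quoted in this post, in USD per 1M tokens.
INPUT_USD_PER_M = 5.00
OUTPUT_USD_PER_M = 20.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD for the given token counts."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Illustrative session: 200k input tokens and 50k output tokens.
cost = estimate_cost(200_000, 50_000)  # 2.0
```

In this example the input and output sides each contribute $1.00, which shows how the 4x higher output rate can dominate even when output volume is much smaller.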
Check the Realtime API guide in our docs for WebSocket connection examples and best practices.