New Realtime API for speech-to-speech applications

Maria Santos (OpenAI Staff), Oct 1, 2024

We're launching the Realtime API in beta. It enables low-latency speech-to-speech experiences with GPT-4o.

Features

  • Natural speech input and output (no separate STT/TTS step)
  • Support for function calling during voice conversations
  • 6 preset voices
  • WebSocket-based streaming

Pricing during the beta: $5.00 per 1M input tokens and $20.00 per 1M output tokens for audio.
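For a rough sense of what a session might cost, here is a minimal sketch that applies the two rates quoted above. The token counts in the usage lines are made up for illustration, and the function name `estimate_cost` is hypothetical, not part of any SDK.

```python
# Rates quoted in this post (USD per 1M tokens).
INPUT_RATE_PER_MILLION = 5.00
OUTPUT_RATE_PER_MILLION = 20.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    input_cost = input_tokens * INPUT_RATE_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_RATE_PER_MILLION / 1_000_000
    return input_cost + output_cost


# Hypothetical session: 200k input tokens, 50k output tokens.
cost = estimate_cost(200_000, 50_000)
print(f"Estimated cost: ${cost:.2f}")  # prints "Estimated cost: $2.00"
```

Actual billing depends on how tokens are counted for your traffic, so treat this as back-of-the-envelope arithmetic only.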

Check the Realtime API guide in our docs for WebSocket connection examples and best practices.
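Until you have read the guide, the sketch below may help picture what a client prepares before opening the socket: a connection URL, handshake headers, and a first JSON message that picks a voice and registers a function for function calling. None of these specifics appear in this post. The endpoint URL, model name, `OpenAI-Beta` header, `session.update` event shape, the voice name `alloy`, and the `get_weather` tool are all assumptions for illustration, so confirm each against the guide.

```python
import json

# Assumed values, not stated in this post; verify against the Realtime API guide.
REALTIME_URL = "wss://api.openai.com/v1/realtime"
ASSUMED_MODEL = "gpt-4o-realtime-preview"


def build_connection(api_key: str, model: str = ASSUMED_MODEL) -> tuple[str, dict[str, str]]:
    """Return the WebSocket URL and handshake headers for one session."""
    url = f"{REALTIME_URL}?model={model}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "OpenAI-Beta": "realtime=v1",  # assumed beta opt-in header
    }
    return url, headers


def session_update_event(voice: str, tools: list[dict]) -> str:
    """Serialize a client event that sets the voice and registers tools."""
    event = {
        "type": "session.update",  # assumed event name
        "session": {"voice": voice, "tools": tools},
    }
    return json.dumps(event)


# Hypothetical tool definition, described with a JSON Schema for its arguments.
get_weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

url, headers = build_connection("sk-example")
message = session_update_event("alloy", [get_weather_tool])
```

With a WebSocket client of your choice, you would open `url` with `headers` attached to the handshake and send `message` as the first text frame. The sketch stops short of connecting so that it runs without a network or an API key.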
