OpenAI Developer Community

I've benchmarked the Whisper API against running whisper-large-v3 locally for our podcast transcription service. Here are the results.

Test setup

- 100 podcast episodes (30-90 min each)
- English language
- Various audio qualities (studio to phone recordings)

Results

| Metric | API (whisper-1) | Local (large-v3) |
|--------|-----------------|------------------|
| WER, studio audio | 4.2% | 3.8% |
| WER, noisy audio | 8.7% | 7.1% |
| Latency (1 hr of audio) | 45 s | 12 min (RTX 4090) |
| Cost (1 hr of audio) | $0.36 | ~$0.02 (electricity) |
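The post doesn't say how transcripts were normalized before scoring, and WER is sensitive to that (casing, punctuation, number formatting). For anyone reproducing the comparison, here is a minimal Python sketch of word error rate as word-level edit distance divided by the number of reference words, assuming only lowercasing and whitespace tokenization. The function name is illustrative, not from the post.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count.

    Assumes simple lowercasing and whitespace tokenization; a fuller text
    normalizer applied before scoring would change the resulting numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the previous reference
    # prefix and hyp[:j]; we keep only one row at a time.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on a mat")` is 1/6: one substitution over six reference words.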

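The API is about 16x faster (45 s vs. 12 min per hour of audio), but the local model is more accurate, especially on noisy audio (7.1% vs. 8.7% WER, compared with 3.8% vs. 4.2% on studio recordings). At scale (1,000+ hours/month), local is much cheaper: roughly $20 in electricity versus $360 in API fees per 1,000 hours.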
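The figures above are straightforward arithmetic on the table. A small Python sketch that redoes it, assuming cost and processing time scale linearly with audio duration and that the local cost covers electricity only, as the table states. Constant and function names are illustrative.

```python
# Per-hour-of-audio figures copied from the results table.
API_COST_PER_HOUR = 0.36          # USD, whisper-1
LOCAL_COST_PER_HOUR = 0.02        # USD, approximate electricity only
API_SECONDS_PER_HOUR = 45         # wall-clock seconds per hour of audio
LOCAL_SECONDS_PER_HOUR = 12 * 60  # large-v3 on an RTX 4090


def monthly_costs(audio_hours: float) -> tuple[float, float]:
    """Return (api_usd, local_usd) for the given hours of audio per month."""
    return audio_hours * API_COST_PER_HOUR, audio_hours * LOCAL_COST_PER_HOUR


def local_gpu_hours(audio_hours: float) -> float:
    """GPU time needed locally, assuming throughput scales linearly."""
    return audio_hours * LOCAL_SECONDS_PER_HOUR / 3600


def relative_wer_reduction(api_wer: float, local_wer: float) -> float:
    """Fraction of the API's word errors that the local model avoids."""
    return (api_wer - local_wer) / api_wer


api, local = monthly_costs(1000)
print(f"1000 h/month: API ${api:.0f}, local ~${local:.0f}")  # API $360, local ~$20
print(f"GPU time: {local_gpu_hours(1000):.0f} h")            # 200 h
print(f"API speedup: {LOCAL_SECONDS_PER_HOUR / API_SECONDS_PER_HOUR:.0f}x")  # 16x
print(f"WER reduction: studio {relative_wer_reduction(4.2, 3.8):.0%}, "
      f"noisy {relative_wer_reduction(8.7, 7.1):.0%}")       # studio 10%, noisy 18%
```

The relative reduction (about 10% of word errors on studio audio vs. 18% on noisy audio) is what backs the "especially on noisy audio" observation, and 1,000 hours a month works out to about 200 hours of RTX 4090 time.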

We ended up using the API for real-time transcription and the local model for batch processing.
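A minimal Python sketch of that split, assuming a hypothetical job record with a `realtime` flag (the names `TranscriptionJob`, `route`, and `estimated_processing_s` are illustrative, not from the post). It routes on the flag alone, as described, and uses the table's timings as real-time factors for a rough processing estimate.

```python
from dataclasses import dataclass

# Processing seconds per second of audio, derived from the table.
API_RTF = 45 / 3600            # 45 s per hour of audio
LOCAL_RTF = (12 * 60) / 3600   # 12 min per hour of audio on an RTX 4090


@dataclass
class TranscriptionJob:
    """Hypothetical job record; field names are illustrative."""
    audio_path: str
    duration_s: float
    realtime: bool  # True when someone is waiting on the transcript


def route(job: TranscriptionJob) -> str:
    """Real-time jobs go to the hosted API; everything else to the local batch queue."""
    return "api" if job.realtime else "local"


def estimated_processing_s(job: TranscriptionJob) -> float:
    """Rough processing time on the chosen backend, assuming linear scaling."""
    rtf = API_RTF if route(job) == "api" else LOCAL_RTF
    return job.duration_s * rtf
```

For a one-hour episode (`duration_s=3600`), a real-time job routes to `"api"` with an estimate of about 45 s, while a batch job routes to `"local"` with about 720 s (12 min).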

Whisper API vs local Whisper model: latency and accuracy comparison
