Whisper API vs local Whisper model: latency and accuracy comparison
Yuki Tanaka, Sep 5, 2024
I've benchmarked the Whisper API against running whisper-large-v3 locally for our podcast transcription service. Here are the results.
Test setup
Results
| Metric | API (whisper-1) | Local (large-v3) |
|--------|-----------------|------------------|
| WER (studio) | 4.2% | 3.8% |
| WER (noisy) | 8.7% | 7.1% |
| Latency (1 hr audio) | 45 s | 12 min (RTX 4090) |
| Cost (1 hr audio) | $0.36 | ~$0.02 (electricity) |
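For anyone unfamiliar with the metric, here is a minimal sketch of how WER (word error rate) is typically computed: word-level edit distance divided by the number of reference words. It assumes plain lowercase and whitespace normalization, which may differ from whatever normalization produced the numbers above. The function name `word_error_rate` is illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumption: lowercase + whitespace split is the only normalization applied.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (rolling-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sat on a mat"
    print(f"WER: {word_error_rate(ref, hyp):.1%}")
```

In practice, punctuation and number formatting are usually normalized before scoring, and that choice can shift WER by a noticeable amount.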
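The API is about 16x faster (45 s vs. 12 min for an hour of audio), but the local model is more accurate, especially on noisy audio (7.1% vs. 8.7% WER). At scale (1000+ hours/month), local is much cheaper: roughly $20 in electricity versus $360 in API fees per 1,000 hours.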
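The arithmetic behind that comparison can be sketched as below. The per-hour costs and latencies are taken straight from the table. The helper names are illustrative, and the local figure covers electricity only, so GPU purchase and maintenance would need to be added for a full picture.

```python
# Figures from the results table (per 1 hour of audio).
API_COST_PER_HOUR = 0.36      # USD, whisper-1
LOCAL_COST_PER_HOUR = 0.02    # USD, approximate, electricity only
API_LATENCY_S = 45            # seconds to transcribe 1 hr of audio
LOCAL_LATENCY_S = 12 * 60     # seconds on an RTX 4090

AUDIO_SECONDS = 60 * 60       # one hour of audio


def monthly_cost(hours: float, cost_per_hour: float) -> float:
    """Total cost in USD for transcribing `hours` of audio."""
    return hours * cost_per_hour


def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are processed per second of wall-clock time."""
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    hours = 1000
    api = monthly_cost(hours, API_COST_PER_HOUR)
    local = monthly_cost(hours, LOCAL_COST_PER_HOUR)
    print(f"{hours} hr/month: API ${api:.2f}, local ${local:.2f}")
    print(f"API speedup over local: {LOCAL_LATENCY_S / API_LATENCY_S:.0f}x")
    print(f"API real-time factor: {realtime_factor(AUDIO_SECONDS, API_LATENCY_S):.0f}x")
    print(f"Local real-time factor: {realtime_factor(AUDIO_SECONDS, LOCAL_LATENCY_S):.0f}x")
```

Both options run faster than real time on these numbers, so the trade-off is mostly between turnaround time and cost per hour.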
We ended up using the API for real-time transcription and local for batch processing.