Model Overview
Whisper is a general-purpose speech recognition model. As a multitask model, it also handles multilingual speech recognition, speech translation, and language identification.
Key Features
- Performance: average (2 of 4)
- Speed: medium (3 of 5)
- General-purpose speech recognition model
- Accepts audio input and produces text output
- Supports transcription and translation
- Multilingual capabilities
Technical Specifications
- Pricing: $0.006 per minute of audio (transcription)
- Supports: audio input only; text output only
- Features: Transcription via v1/audio/transcriptions endpoint, translation via v1/audio/translations endpoint
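As a rough illustration of how the two endpoints listed above might be called, the sketch below builds (but does not send) a multipart POST request using only the Python standard library. The base URL `https://api.openai.com`, the model name `whisper-1`, the `model` and `file` form fields, and the helper name `build_audio_request` are assumptions not stated on this page; check the official API reference before relying on them.

```python
import uuid
import urllib.request

# Assumption: the API host is not given on this page.
API_BASE = "https://api.openai.com"

# Endpoint paths as listed under Technical Specifications.
ENDPOINTS = {
    "transcribe": "/v1/audio/transcriptions",
    "translate": "/v1/audio/translations",
}


def build_audio_request(audio_bytes, filename, api_key,
                        task="transcribe", model="whisper-1"):
    """Build a multipart/form-data POST request for an audio file.

    Hypothetical helper: the form field names ("model", "file") and the
    default model name are assumptions, not taken from this page.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Text field carrying the model name.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # File field header, followed by the raw audio bytes.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\n'
         f"Content-Type: application/octet-stream\r\n\r\n").encode(),
        audio_bytes,
        # Closing boundary.
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        API_BASE + ENDPOINTS[task],
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and parse the response body as JSON. Choosing `task="translate"` targets the translations endpoint instead of the transcriptions endpoint.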
Positioning and Use Cases
Whisper is a general-purpose speech recognition model, trained on a large dataset of diverse audio. It can be used for multilingual speech recognition, speech translation, and language identification.
Rate Limits
- Free tier: 3 RPM, 200 RPD
- Tier 1: 500 RPM
- Tier 2: 2,500 RPM
- Tier 3: 5,000 RPM
- Tier 4: 7,500 RPM
- Tier 5: 10,000 RPM
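One way a client might stay under the requests-per-minute (RPM) limits above is to space its calls evenly. The following is a minimal sketch, not an official client feature: the tier keys, `min_interval_seconds`, and `Throttle` are hypothetical names, and the free tier's 200 requests-per-day (RPD) cap is not enforced here.

```python
import time

# RPM values copied from the Rate Limits list; the dictionary keys are
# hypothetical labels chosen for this sketch.
RPM_LIMITS = {
    "free": 3,
    "tier1": 500,
    "tier2": 2500,
    "tier3": 5000,
    "tier4": 7500,
    "tier5": 10000,
}


def min_interval_seconds(tier):
    """Smallest even spacing between requests that respects the tier's RPM."""
    return 60.0 / RPM_LIMITS[tier]


class Throttle:
    """Evenly spaces calls so that at most RPM requests start per minute."""

    def __init__(self, tier, clock=time.monotonic, sleep=time.sleep):
        self._interval = min_interval_seconds(tier)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next request may start; return seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following request one interval after this one starts.
        self._next_allowed = now + delay + self._interval
        return delay
```

Call `Throttle("tier1").wait()` before each request. Even spacing is stricter than the limit requires, since short bursts within a minute would also be allowed, but it keeps the logic simple.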
Documentation
Official Documentation