Model Overview
GPT-4o mini Transcribe is a speech-to-text model powered by GPT-4o mini. Use it to convert audio to text with the Transcription endpoint in the Audio API.
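As a rough illustration of what a call to the Transcription endpoint might look like, the sketch below builds (but does not send) a multipart POST request using only the Python standard library. The base URL `https://api.openai.com`, the model identifier `gpt-4o-mini-transcribe`, the form field names `model` and `file`, and the helper name `build_transcription_request` are assumptions not stated on this page; check them against the API reference before relying on them.

```python
import uuid
import urllib.request

# Assumption: the host is not given on this page; only the endpoint path is.
API_URL = "https://api.openai.com/v1/audio/transcriptions"
# Assumption: API identifier for GPT-4o mini Transcribe.
MODEL_ID = "gpt-4o-mini-transcribe"


def build_transcription_request(audio_bytes, filename, api_key, model=MODEL_ID):
    """Build a multipart/form-data POST request carrying the audio and model name.

    Hypothetical helper for illustration; it only constructs the request.
    """
    boundary = uuid.uuid4().hex
    # Text part ("model") followed by the header of the binary part ("file").
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + audio_bytes + tail

    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. An official SDK, where available, is usually the simpler choice; the raw request is shown here only to make the moving parts visible.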
Key Features
- Performance: high (rated 3 out of 4)
- Speed: fast (rated 4 out of 5)
- Speech-to-text model powered by GPT-4o mini
- Accepts audio and text input and produces text output
- Context window: 16,000 tokens
- Maximum output: 2,000 tokens
- Knowledge cutoff: June 1, 2024
Technical Specifications
- Pricing (per 1M tokens): text input $1.25, text output $5.00, audio input $3.00
- Modalities: audio and text input; text output only
- Features: Transcription supported via v1/audio/transcriptions endpoint
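The per-token prices above can be turned into a quick cost estimate. The sketch below is a minimal illustration: the function name `estimate_cost` and the sample token counts are hypothetical, and the page does not say how many audio tokens a given clip produces, so token counts are taken as inputs rather than derived from audio length.

```python
# Prices in USD per 1M tokens, copied from the pricing line above.
PRICE_PER_MILLION = {
    "text_input": 1.25,
    "text_output": 5.00,
    "audio_input": 3.00,
}


def estimate_cost(audio_input_tokens=0, text_input_tokens=0, text_output_tokens=0):
    """Return the estimated cost in USD for the given token counts."""
    usage = {
        "audio_input": audio_input_tokens,
        "text_input": text_input_tokens,
        "text_output": text_output_tokens,
    }
    return sum(PRICE_PER_MILLION[kind] * count / 1_000_000
               for kind, count in usage.items())


if __name__ == "__main__":
    # Hypothetical usage figures, for illustration only.
    cost = estimate_cost(audio_input_tokens=200_000,
                         text_input_tokens=1_000,
                         text_output_tokens=2_000)
    print(f"Estimated cost: ${cost:.5f}")
```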
Positioning and Use Cases
GPT-4o mini Transcribe uses GPT-4o mini to transcribe audio. Compared with the original Whisper models, it achieves a lower word error rate and better language recognition and accuracy. Choose it when you need more accurate transcripts than Whisper provides.
Rate Limits
- Free tier: Not supported
- Tier 1: 500 RPM, 50,000 TPM
- Tier 2: 2,000 RPM, 150,000 TPM
- Tier 3: 5,000 RPM, 600,000 TPM
- Tier 4: 10,000 RPM, 2,000,000 TPM
- Tier 5: 10,000 RPM, 8,000,000 TPM
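Reading RPM as requests per minute and TPM as tokens per minute, the tiers above can be encoded as a small lookup table for client-side pacing. This is a minimal sketch; the names `RATE_LIMITS`, `min_request_interval`, and `within_limits` are hypothetical, and the even-spacing approach is one simple pacing choice, not something the page prescribes.

```python
# (requests per minute, tokens per minute) per usage tier, from the list above.
# The free tier is omitted because the model is not supported there.
RATE_LIMITS = {
    1: (500, 50_000),
    2: (2_000, 150_000),
    3: (5_000, 600_000),
    4: (10_000, 2_000_000),
    5: (10_000, 8_000_000),
}


def min_request_interval(tier):
    """Seconds to wait between requests to stay at or under the tier's RPM."""
    rpm, _ = RATE_LIMITS[tier]
    return 60.0 / rpm


def within_limits(tier, requests_per_minute, tokens_per_minute):
    """Return True if the planned load fits both the RPM and TPM limits."""
    rpm, tpm = RATE_LIMITS[tier]
    return requests_per_minute <= rpm and tokens_per_minute <= tpm
```

Spacing requests evenly by `min_request_interval` only addresses the request limit; a workload with large inputs can still hit the token limit first, which is why `within_limits` checks both.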