Model Overview
GPT-4o Transcribe is a speech-to-text model powered by GPT-4o. Use it to convert audio to text with the Transcription endpoint in the Audio API.
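A minimal sketch of a call to the Transcription endpoint using only the Python standard library. It builds, but does not send, a multipart/form-data POST to the /v1/audio/transcriptions path listed under Technical Specifications. Several details are assumptions drawn from general OpenAI API conventions and are not stated on this page: the api.openai.com host, the Bearer authorization header, and the form field names model, file, and prompt. build_transcription_request is a hypothetical helper name.

```python
import uuid
import urllib.request
from typing import Optional

# Assumed base URL; the path comes from the Technical Specifications list.
API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(audio: bytes, filename: str, api_key: str,
                                prompt: Optional[str] = None) -> urllib.request.Request:
    """Build, but do not send, a multipart/form-data POST for transcription."""
    boundary = uuid.uuid4().hex
    fields = {"model": "gpt-4o-transcribe"}
    if prompt is not None:
        # Optional text input (assumed field name "prompt") to guide the transcript.
        fields["prompt"] = prompt

    parts = []
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    # The audio itself goes in the "file" part as raw bytes.
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8") + audio + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


req = build_transcription_request(b"...wav bytes...", "clip.wav", "YOUR_API_KEY")
print(req.get_method(), req.full_url)  # POST https://api.openai.com/v1/audio/transcriptions
```

Sending the request with urllib.request.urlopen(req) needs a valid API key and network access. The official openai Python SDK wraps the same call as client.audio.transcriptions.create(model="gpt-4o-transcribe", file=...).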
Key Features
- Performance: higher (rated 4 out of 4)
- Speed: medium (rated 3 out of 5)
- Speech-to-text model powered by GPT-4o
- Accepts audio and text input and produces text output
- 16,000-token context window
- 2,000 max output tokens
- Jun 01, 2024 knowledge cutoff
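The context window and output cap above can be checked on the client before a request is sent. A minimal sketch in Python; fits_limits is a hypothetical helper, and it assumes (not confirmed on this page) that input tokens plus the output budget must fit inside the 16,000-token window together, as with other GPT-4o models.

```python
# Published limits for gpt-4o-transcribe, taken from the Key Features list.
CONTEXT_WINDOW = 16_000
MAX_OUTPUT_TOKENS = 2_000


def fits_limits(input_tokens: int, max_output_tokens: int) -> bool:
    """Return True if a request stays within the published token limits.

    Assumption: input tokens (audio plus any text prompt) and the output
    budget share the context window. This is not confirmed for transcription.
    """
    if input_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if max_output_tokens > MAX_OUTPUT_TOKENS:
        return False  # exceeds the 2,000-token output cap
    return input_tokens + max_output_tokens <= CONTEXT_WINDOW


print(fits_limits(13_000, 2_000))  # True: 15,000 <= 16,000
print(fits_limits(15_000, 2_000))  # False: 17,000 > 16,000
print(fits_limits(1_000, 3_000))   # False: output budget above 2,000
```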
Technical Specifications
- Pricing: Text tokens: $2.50 per 1M input tokens, $10.00 per 1M output tokens; Audio tokens: $6.00 per 1M input tokens
- Supports: Input: audio, text; Output: text only
- Features: Transcription via the /v1/audio/transcriptions endpoint
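The pricing above reduces to a per-token sum: cost = (audio input tokens × $6.00 + text input tokens × $2.50 + text output tokens × $10.00) / 1,000,000. A minimal sketch in Python using Decimal to avoid floating-point rounding; estimate_cost is a hypothetical helper, and it assumes you already have the token counts for a request, since this page does not say how audio duration maps to audio tokens.

```python
from decimal import Decimal

# USD per 1M tokens, from the Pricing line above.
PRICE_PER_MILLION = {
    "text_input": Decimal("2.50"),
    "text_output": Decimal("10.00"),
    "audio_input": Decimal("6.00"),
}


def estimate_cost(audio_input_tokens: int, text_input_tokens: int = 0,
                  text_output_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of one transcription request from token counts."""
    per_million_total = (
        Decimal(audio_input_tokens) * PRICE_PER_MILLION["audio_input"]
        + Decimal(text_input_tokens) * PRICE_PER_MILLION["text_input"]
        + Decimal(text_output_tokens) * PRICE_PER_MILLION["text_output"]
    )
    return per_million_total / Decimal(1_000_000)


# 100,000 audio tokens in, 2,000 text tokens out:
# 0.1 * $6.00 + 0.002 * $10.00 = $0.60 + $0.02
print(estimate_cost(100_000, text_output_tokens=2_000))  # 0.62
```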
Positioning and Use Cases
GPT-4o Transcribe is a speech-to-text model that uses GPT-4o to transcribe audio. Compared with the original Whisper models, it has a lower word error rate and better language recognition and accuracy. Use it when you need more accurate transcripts.
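Word error rate (WER), the metric cited above, is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words: WER = (S + D + I) / N, for substitutions, deletions, and insertions. The sketch below computes it with a standard Levenshtein dynamic program. word_error_rate is a hypothetical helper; it tokenizes on whitespace with no case or punctuation normalization, which published benchmarks usually apply, and this page does not say how its comparison with Whisper was measured.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # At the start of iteration i, prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # match (0) or substitution (1)
            )
        prev = cur
    return prev[len(hyp)] / len(ref)


print(word_error_rate("the cat sat", "the cat sat"))  # 0.0
# One substitution (sat -> sit) and one deletion ("the"): 2 / 6 words
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```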
Rate Limits
Limits are expressed in requests per minute (RPM) and tokens per minute (TPM).
- Free tier: Not supported
- Tier 1: 500 RPM, 10,000 TPM
- Tier 2: 2,000 RPM, 100,000 TPM
- Tier 3: 5,000 RPM, 400,000 TPM
- Tier 4: 10,000 RPM, 2,000,000 TPM
- Tier 5: 10,000 RPM, 6,000,000 TPM
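A client can stay under both limits by tracking requests and tokens over a rolling 60-second window. A minimal sketch, assuming the server counts usage the same way (actual enforcement may differ, for example a continuously refilling bucket); MinuteWindowLimiter and TIER_LIMITS are hypothetical names, and the caller supplies the token estimate and the clock.

```python
from collections import deque

# (requests per minute, tokens per minute) by usage tier, from the list above.
# The free tier is not supported, so it has no entry.
TIER_LIMITS = {
    1: (500, 10_000),
    2: (2_000, 100_000),
    3: (5_000, 400_000),
    4: (10_000, 2_000_000),
    5: (10_000, 6_000_000),
}


class MinuteWindowLimiter:
    """Client-side sliding 60-second window enforcing both RPM and TPM."""

    def __init__(self, tier: int, window_seconds: float = 60.0):
        if tier not in TIER_LIMITS:
            raise ValueError(f"tier {tier} has no published limits")
        self.rpm, self.tpm = TIER_LIMITS[tier]
        self.window = window_seconds
        self.events = deque()  # (timestamp, tokens) for accepted requests
        self.tokens_in_window = 0

    def _evict(self, now: float) -> None:
        # Drop accepted requests that are now outside the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens: int, now: float) -> bool:
        """Record the request and return True only if both limits allow it."""
        self._evict(now)
        if len(self.events) + 1 > self.rpm:
            return False
        if self.tokens_in_window + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True


limiter = MinuteWindowLimiter(tier=1)
print(limiter.try_acquire(4_000, now=0.0))   # True: 4,000 tokens in window
print(limiter.try_acquire(4_000, now=1.0))   # True: 8,000 tokens in window
print(limiter.try_acquire(4_000, now=2.0))   # False: 12,000 would exceed 10,000 TPM
print(limiter.try_acquire(4_000, now=60.0))  # True: the t=0 request has aged out
```

Passing now explicitly keeps the limiter deterministic and easy to test; in practice you would pass time.monotonic().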