GPT-4o Transcribe Speech-to-text

Model Overview

GPT-4o Transcribe is a speech-to-text model powered by GPT-4o. Use it to convert audio to text with the Transcription endpoint in the Audio API.
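
As a rough illustration of calling the Transcription endpoint, the sketch below builds (but does not send) a multipart POST request using only the Python standard library. The helper name build_transcription_request, the generic application/octet-stream content type, and the shape of the JSON response mentioned in the trailing comment are assumptions for illustration, not something this page specifies; check the official API reference before relying on them.

```python
import io
import urllib.request
import uuid

# Endpoint path comes from the page (v1/audio/transcriptions); host is the public OpenAI API host.
API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, filename, api_key, model="gpt-4o-transcribe"):
    """Build (but do not send) a multipart/form-data POST for the Transcription endpoint.

    Hypothetical helper for illustration only.
    """
    boundary = uuid.uuid4().hex
    body = io.BytesIO()

    # Plain form field: the model name.
    body.write(f"--{boundary}\r\n".encode())
    body.write(b'Content-Disposition: form-data; name="model"\r\n\r\n')
    body.write(model.encode() + b"\r\n")

    # File field: the raw audio bytes.
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    body.write(audio_bytes + b"\r\n")

    # Closing boundary ends the multipart body.
    body.write(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        API_URL,
        data=body.getvalue(),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Sending is left to the caller, for example (assumes the JSON response has a "text" field):
#   with urllib.request.urlopen(request) as response:
#       transcript = json.load(response)["text"]
```

Building the request separately from sending it keeps the example inspectable without an API key or network access.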

Key Features

Performance: higher (rated 4 of 4)
Speed: medium (rated 3 of 5)
Speech-to-text model powered by GPT-4o
Accepts audio and text input and produces text output
16,000-token context window
2,000 max output tokens
Jun 01, 2024 knowledge cutoff

Technical Specifications

Pricing (text tokens): $2.50 per 1M input tokens, $10.00 per 1M output tokens
Pricing (audio tokens): $6.00 per 1M input tokens
Supported input: audio, text
Supported output: text only
Features: Transcription supported via the v1/audio/transcriptions endpoint
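
The per-token prices listed above can be turned into a cost estimate with simple arithmetic. The sketch below is a minimal example; the function name estimate_cost and the token counts in the usage note are hypothetical, and how audio is converted to audio tokens is not stated on this page.

```python
# Official prices in USD per 1M tokens, copied from the pricing listed above.
PRICES_PER_MILLION = {
    "text_input": 2.50,
    "text_output": 10.00,
    "audio_input": 6.00,
}


def estimate_cost(audio_input_tokens, text_input_tokens=0, text_output_tokens=0):
    """Return an estimated cost in USD for one request (hypothetical helper)."""
    usage = {
        "audio_input": audio_input_tokens,
        "text_input": text_input_tokens,
        "text_output": text_output_tokens,
    }
    # Each token category is billed at its own per-million rate.
    total = sum(usage[kind] * PRICES_PER_MILLION[kind] for kind in usage)
    return total / 1_000_000
```

For example, a request that consumes 1,000,000 audio input tokens and produces 100,000 text output tokens would come to $6.00 + $1.00 = $7.00 under these prices.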

Snapshots

gpt-4o-transcribe

Positioning and Use Cases

GPT-4o Transcribe uses GPT-4o to transcribe audio. Compared to the original Whisper models, it offers a lower word error rate and better language recognition and accuracy. Use it when transcript accuracy is the priority.

Rate Limits

Free tier: Not supported
Tier 1: 500 RPM, 10,000 TPM
Tier 2: 2,000 RPM, 100,000 TPM
Tier 3: 5,000 RPM, 400,000 TPM
Tier 4: 10,000 RPM, 2,000,000 TPM
Tier 5: 10,000 RPM, 6,000,000 TPM
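
The tier limits above (RPM is requests per minute, TPM is tokens per minute) jointly cap throughput: whichever limit is reached first applies. The sketch below estimates the sustainable request rate for a given tier, assuming every request uses roughly the same number of tokens. The function name and that uniform-size assumption are illustrative; this page does not say exactly which tokens count toward TPM.

```python
# (RPM, TPM) per usage tier, copied from the list above. The free tier is not supported.
RATE_LIMITS = {
    1: (500, 10_000),
    2: (2_000, 100_000),
    3: (5_000, 400_000),
    4: (10_000, 2_000_000),
    5: (10_000, 6_000_000),
}


def max_requests_per_minute(tier, tokens_per_request):
    """Return how many equally sized requests fit in one minute (hypothetical helper)."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    rpm, tpm = RATE_LIMITS[tier]
    # The binding constraint is the smaller of the request cap and the token cap.
    return min(rpm, tpm // tokens_per_request)
```

With 1,000 tokens per request, Tier 1 is bound by TPM (10 requests per minute), while Tier 5 at 100 tokens per request is bound by RPM (10,000 requests per minute).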

Provider: OpenAI, an AI pioneer known worldwide for its GPT series of models

Parameters: Unknown

Official price: $2.50 input, $10.00 output. Our price: $2.00 input, $8.00 output (save 20%).

Frequently Asked Questions

What is the uptime guarantee?

We guarantee 99.9% uptime with our enterprise-grade infrastructure and redundant systems.
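
To put the uptime figure in concrete terms, the sketch below converts an uptime percentage into the downtime it still permits. The function name and the 30-day default period are assumptions for illustration; the page does not say over what period uptime is measured.

```python
from fractions import Fraction


def allowed_downtime_minutes(uptime_percent, period_days=30):
    """Return the downtime (in minutes) permitted by an uptime guarantee.

    uptime_percent is passed as a string such as "99.9" so the arithmetic
    stays exact instead of picking up floating-point rounding.
    """
    downtime_fraction = 1 - Fraction(uptime_percent) / 100
    return float(downtime_fraction * period_days * 24 * 60)
```

Under these assumptions, 99.9% uptime allows about 43.2 minutes of downtime in a 30-day period, or about 525.6 minutes (roughly 8.8 hours) over 365 days.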

How is pricing calculated?

Pricing is based on the number of tokens processed. Both input and output tokens are counted in the final cost.

What is the difference between GPT-4 and GPT-4 Turbo?

GPT-4 Turbo is a later version of GPT-4 with improved performance, a longer context window, and a more recent knowledge cutoff date.