GPT-4o Transcribe Speech-to-text

Model Overview

GPT-4o Transcribe is a speech-to-text model powered by GPT-4o. Use it to convert audio to text with the Transcription endpoint in the Audio API.
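
As a rough illustration of the call described above, the following sketch posts an audio file to the Transcription endpoint using only the Python standard library. It assumes the public base URL `https://api.openai.com`, a bearer-token API key, and a JSON response carrying a `text` field. The helper names `build_multipart` and `transcribe` are hypothetical, and an official SDK would normally be the more convenient route.

```python
import json
import os
import urllib.request
import uuid

# Assumed full URL for the v1/audio/transcriptions endpoint.
API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Encode plain form fields plus one file as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, api_key, model="gpt-4o-transcribe"):
    """Upload the audio file at `path` and return the transcript text."""
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart(
        {"model": model}, "file", os.path.basename(path), audio
    )
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

With a key stored in an environment variable such as `OPENAI_API_KEY` and a local file such as `meeting.mp3` (both placeholders), `transcribe("meeting.mp3", os.environ["OPENAI_API_KEY"])` would return the transcript as a string.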

Key Features

  • Performance: Higher (rated 4 of 4)
  • Speed: Medium (rated 3 of 5)
  • Speech-to-text model powered by GPT-4o
  • Accepts audio and text input and produces text output
  • 16,000-token context window
  • 2,000 max output tokens
  • Jun 01, 2024 knowledge cutoff

Technical Specifications

  • Pricing: Text tokens: $2.50 per 1M input tokens, $10.00 per 1M output tokens; Audio tokens: $6.00 per 1M input tokens
  • Supports: Input: audio, text; Output: text only
  • Features: Transcription supported via v1/audio/transcriptions endpoint
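
The pricing line above can be turned into a small cost estimate. The sketch below multiplies token counts by the listed per-million rates. The function name `estimate_cost` and the token counts in the usage note are illustrative assumptions, and the page does not state how audio duration maps to audio tokens, so the counts would have to come from the usage data returned for a request.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the pricing line above.
PRICE_PER_MILLION = {
    "text_input": Decimal("2.50"),
    "text_output": Decimal("10.00"),
    "audio_input": Decimal("6.00"),
}


def estimate_cost(audio_input_tokens=0, text_input_tokens=0, text_output_tokens=0):
    """Return the estimated cost in USD for the given token counts."""
    total = (
        PRICE_PER_MILLION["audio_input"] * audio_input_tokens
        + PRICE_PER_MILLION["text_input"] * text_input_tokens
        + PRICE_PER_MILLION["text_output"] * text_output_tokens
    )
    return total / Decimal(1_000_000)
```

For example, a hypothetical request with 10,000 audio input tokens, 200 text input tokens, and 2,000 output tokens comes to (60,000 + 500 + 20,000) / 1,000,000 = $0.0805.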

Snapshots

  • gpt-4o-transcribe

Positioning and Use Cases

GPT-4o Transcribe is a speech-to-text model that uses GPT-4o to transcribe audio. It offers a lower word error rate and better language recognition and accuracy than the original Whisper models. Use it when transcript accuracy matters most.

Rate Limits

  • Free tier: Not supported
  • Tier 1: 500 RPM, 10,000 TPM
  • Tier 2: 2,000 RPM, 100,000 TPM
  • Tier 3: 5,000 RPM, 400,000 TPM
  • Tier 4: 10,000 RPM, 2,000,000 TPM
  • Tier 5: 10,000 RPM, 6,000,000 TPM
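
A client can stay under these limits by throttling itself before sending requests. The sketch below keeps a 60-second sliding window of requests and tokens per tier. The class name `MinuteWindowLimiter` is hypothetical, and the sliding-window accounting is an assumption: the service may count requests and tokens differently, so this is a conservative client-side guard rather than a mirror of the server's behavior.

```python
import time
from collections import deque

# (requests per minute, tokens per minute) for each tier listed above.
TIER_LIMITS = {
    1: (500, 10_000),
    2: (2_000, 100_000),
    3: (5_000, 400_000),
    4: (10_000, 2_000_000),
    5: (10_000, 6_000_000),
}


class MinuteWindowLimiter:
    """Track requests and tokens sent during the last 60 seconds."""

    def __init__(self, tier, clock=time.monotonic):
        self.rpm, self.tpm = TIER_LIMITS[tier]
        self.clock = clock  # injectable so the limiter can be tested
        self.events = deque()  # (timestamp, tokens) per accepted request
        self.tokens_in_window = 0

    def _evict(self, now):
        # Drop requests that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens):
        """Record the request and return True if it fits both limits."""
        now = self.clock()
        self._evict(now)
        over_rpm = len(self.events) >= self.rpm
        over_tpm = self.tokens_in_window + tokens > self.tpm
        if over_rpm or over_tpm:
            return False
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True
```

A caller would check `limiter.try_acquire(estimated_tokens)` before each request and wait or queue the request when it returns `False`.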

OpenAI

Pioneer in AI, globally renowned for GPT series models

Parameters: Unknown

Pricing (text tokens, per 1M input • output): Official: $2.50 • $10.00; Our Price: $2.00 • $8.00 (Save 20%)

Frequently Asked Questions

What is the uptime guarantee?
We guarantee 99.9% uptime with our enterprise-grade infrastructure and redundant systems.
How is pricing calculated?
Pricing is based on the number of tokens processed. Both input and output tokens are counted in the final cost.
What is the difference between GPT-4 and GPT-4 Turbo?
GPT-4 Turbo is a later version of GPT-4 with improved performance, a longer context window, and a more recent knowledge cutoff date.