Whisper

Model Overview

Whisper is a general-purpose speech recognition model. It is also a multitask model that can perform multilingual speech recognition, speech translation, and language identification.

Key Features

  • Average performance (2/4 dots rating)
  • Medium speed (3/5 lightning bolts rating)
  • General-purpose speech recognition model
  • Accepts audio input and produces text output
  • Supports transcription and translation
  • Multilingual capabilities

Technical Specifications

  • Pricing: $0.006 per minute of audio (transcription)
  • Supports: Input: audio only; Output: text only
  • Features: Transcription via v1/audio/transcriptions endpoint, translation via v1/audio/translations endpoint
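
As a rough illustration of the two endpoints listed above, the sketch below builds (but does not send) a multipart POST request using only the Python standard library. The base URL `https://api.openai.com`, the helper name `build_audio_request`, and the placeholder API key are assumptions made for this example; a proxy or reseller would substitute its own host and credentials.

```python
import uuid
import urllib.request

# Assumed base URL: OpenAI's public API host. A proxy would use its own.
BASE_URL = "https://api.openai.com"


def build_audio_request(audio: bytes, filename: str, api_key: str,
                        task: str = "transcriptions",
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a multipart POST for v1/audio/<task>.

    Hypothetical helper for illustration; `task` selects the
    transcriptions or translations endpoint.
    """
    if task not in ("transcriptions", "translations"):
        raise ValueError("task must be 'transcriptions' or 'translations'")

    boundary = uuid.uuid4().hex
    # Two form fields: the model snapshot, then the audio file.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        "whisper-1\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")

    return urllib.request.Request(
        url=f"{base_url}/v1/audio/{task}",
        data=head + audio + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


if __name__ == "__main__":
    req = build_audio_request(b"...audio bytes...", "clip.wav", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
```

Passing the returned request to `urllib.request.urlopen` would send it. With the default response format, the reply is expected to be a JSON object whose `text` field holds the transcript.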

Snapshots

  • whisper-1

Positioning and Use Cases

Whisper is a general-purpose speech recognition model, trained on a large dataset of diverse audio. It can be used for multilingual speech recognition, speech translation, and language identification.

Rate Limits

  • Free tier: 3 RPM (requests per minute), 200 RPD (requests per day)
  • Tier 1: 500 RPM
  • Tier 2: 2,500 RPM
  • Tier 3: 5,000 RPM
  • Tier 4: 7,500 RPM
  • Tier 5: 10,000 RPM
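
One way to stay under the per-minute limits listed above is to space requests evenly on the client side. The following is a minimal sketch, not an official mechanism: the `Throttle` class, the tier keys, and the even-spacing strategy are assumptions for illustration, and it does not track the free tier's daily cap.

```python
import time

# Requests-per-minute limits copied from the list above.
RPM_LIMITS = {
    "free": 3,
    "tier1": 500,
    "tier2": 2500,
    "tier3": 5000,
    "tier4": 7500,
    "tier5": 10000,
}


def min_interval_seconds(tier: str) -> float:
    """Smallest gap between requests that keeps a client at or under the RPM limit."""
    return 60.0 / RPM_LIMITS[tier]


class Throttle:
    """Evenly spaces calls; clock and sleep are injectable for testing."""

    def __init__(self, tier: str, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval_seconds(tier)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request may be sent; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following slot one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay
```

Calling `Throttle("free").wait()` before each request would space free-tier calls 20 seconds apart. A production client would typically also back off when the server returns a rate-limit error.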

Documentation

Official Documentation

OpenAI

Pioneer in AI, globally renowned for GPT series models

Whisper

Parameters: Unknown

Official price: $0.006. Our price: $0.0048 (save 20%).
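
The 20% figure can be checked directly from the two listed rates. The sketch below uses `Decimal` to avoid floating-point rounding. The function names are hypothetical, and the billing unit is deliberately left abstract: `units` stands for whatever unit the rates are quoted in.

```python
from decimal import Decimal

# Rates as listed above, per billing unit.
OFFICIAL_RATE = Decimal("0.006")
DISCOUNTED_RATE = Decimal("0.0048")


def discount_fraction(official: Decimal, discounted: Decimal) -> Decimal:
    """Fraction saved relative to the official rate."""
    return (official - discounted) / official


def cost(units: int, rate: Decimal) -> Decimal:
    """Total cost for a number of billing units at the given rate."""
    return Decimal(units) * rate


if __name__ == "__main__":
    print(discount_fraction(OFFICIAL_RATE, DISCOUNTED_RATE))
    print(cost(100, OFFICIAL_RATE), cost(100, DISCOUNTED_RATE))
```

For 100 billing units, this works out to $0.60 at the official rate and $0.48 at the discounted rate.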

Frequently Asked Questions

What is the uptime guarantee?
We guarantee 99.9% uptime with our enterprise-grade infrastructure and redundant systems.
How is pricing calculated?
Pricing is based on the number of tokens processed. Both input and output tokens are counted in the final cost.