Whisper

Model Overview

Whisper is a general-purpose speech recognition model. It is also a multitask model: in addition to multilingual speech recognition, it performs speech translation and language identification.

Key Features

Average performance (2/4 dots rating)
Medium speed (3/5 lightning bolts rating)
General-purpose speech recognition model
Accepts audio input and produces text output
Supports transcription and translation
Multilingual capabilities

Technical Specifications

Pricing: $0.006 per minute of audio (transcription)
Modalities: audio input only; text output only
Endpoints: transcription via v1/audio/transcriptions; translation (into English) via v1/audio/translations
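
As a minimal sketch of how a call to either endpoint could be assembled, the Python below builds (but does not send) a multipart/form-data POST using only the standard library. The base URL, the build_audio_request helper, the placeholder API key, and the form field names "model" and "file" are assumptions made for illustration; check them against your provider's documentation before relying on them.

```python
import uuid
import urllib.request

# Assumption: replace with the base URL of the provider you actually use.
BASE_URL = "https://api.openai.com"


def build_audio_request(endpoint, audio_bytes, filename, api_key, model="whisper-1"):
    """Build (but do not send) a multipart/form-data POST for an audio endpoint.

    Hypothetical helper: `endpoint` is a path such as
    "v1/audio/transcriptions" or "v1/audio/translations".
    """
    boundary = uuid.uuid4().hex
    # Text part ("model") followed by the header of the binary part ("file").
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{endpoint}",
        data=head + audio_bytes + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Example: prepare a transcription request for a few placeholder bytes.
request = build_audio_request(
    "v1/audio/transcriptions", b"RIFF....", "sample.wav", "sk-placeholder"
)
# To send it: urllib.request.urlopen(request), then parse the JSON response.
```

Switching the first argument to "v1/audio/translations" prepares a translation request with the same structure.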

Snapshots

whisper-1

Positioning and Use Cases

Whisper is a general-purpose speech recognition model, trained on a large dataset of diverse audio. It can be used for multilingual speech recognition, speech translation, and language identification.

Rate Limits

Free tier: 3 RPM (requests per minute), 200 RPD (requests per day)
Tier 1: 500 RPM
Tier 2: 2,500 RPM
Tier 3: 5,000 RPM
Tier 4: 7,500 RPM
Tier 5: 10,000 RPM
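
One way to stay under a per-minute limit is to throttle on the client side. The sketch below is a simple sliding-window limiter, not anything the provider ships: the RpmLimiter class and the tier keys in TIER_RPM are hypothetical names, and the free tier's daily cap is not handled here.

```python
import time
from collections import deque

# Per-minute limits copied from the list above; the dictionary keys are illustrative.
TIER_RPM = {"free": 3, "tier1": 500, "tier2": 2500, "tier3": 5000, "tier4": 7500, "tier5": 10000}


class RpmLimiter:
    """Sliding-window limiter: at most `rpm` requests in any 60-second window."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.rpm = rpm
        self.clock = clock    # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.sent = deque()   # timestamps of requests inside the current window

    def wait_time(self):
        """Drop timestamps older than 60 s and return the seconds to wait (0.0 if none)."""
        now = self.clock()
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            return 0.0
        return 60.0 - (now - self.sent[0])

    def acquire(self):
        """Block until a request slot is free, then record the request."""
        delay = self.wait_time()
        if delay > 0:
            self.sleep(delay)
            self.wait_time()  # prune again now that time has passed
        self.sent.append(self.clock())


# Usage: call limiter.acquire() immediately before each API request.
limiter = RpmLimiter(TIER_RPM["free"])
```

A server may still return a rate-limit error (for example under concurrent clients sharing one key), so retrying with backoff remains useful alongside this.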

Documentation

Official Documentation

OpenAI

Pioneer in AI, globally renowned for GPT series models

Whisper

Parameters: Unknown

Official price: $0.006; our price: $0.0048 (20% savings)
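
The relationship between the two listed prices can be checked with a short calculation. This is only an arithmetic sketch: the function names are hypothetical, and the prices are treated as "per billing unit" without assuming what that unit is.

```python
from decimal import Decimal

OFFICIAL_PRICE = Decimal("0.006")     # listed official price per billing unit
DISCOUNTED_PRICE = Decimal("0.0048")  # listed discounted price per billing unit


def discount_percent(official, discounted):
    """Return the percentage saved relative to the official price."""
    return (official - discounted) / official * 100


def cost(units, unit_price):
    """Total cost for a number of billing units at a given unit price."""
    return Decimal(units) * unit_price


saving = discount_percent(OFFICIAL_PRICE, DISCOUNTED_PRICE)
print(f"Saving: {saving}%")
print(f"1000 units: official ${cost(1000, OFFICIAL_PRICE)}, discounted ${cost(1000, DISCOUNTED_PRICE)}")
```

Decimal is used instead of float so that small prices such as 0.0048 are compared exactly.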

Frequently Asked Questions

What is the uptime guarantee?

We guarantee 99.9% uptime with our enterprise-grade infrastructure and redundant systems.

How is pricing calculated?

Pricing is based on the number of tokens processed. Both input and output tokens are counted in the final cost.