GPT-3.5 Turbo

Model Overview

GPT-3.5 Turbo is a legacy GPT model for cheaper chat and non-chat tasks.

Key Features

  • Low intelligence relative to current GPT models
  • Moderate speed
  • 16,385-token context window
  • 4,096 max output tokens
  • Knowledge cutoff: Sep 01, 2021
  • Input: text
  • Output: text
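
The context-window and output-cap figures above can be combined into a simple token-budget check. This is a hedged sketch: the `CONTEXT_WINDOW` and `MAX_OUTPUT_TOKENS` constants come from the figures listed here, while `estimate_tokens` uses a crude ~4-characters-per-token heuristic (a real tokenizer such as tiktoken should be used for accurate counts).

```python
# Sketch of budgeting a request against the documented limits:
# 16,385-token context window and a 4,096-token output cap.
# The chars/4 estimate is a rough heuristic, not a real tokenizer.

CONTEXT_WINDOW = 16_385
MAX_OUTPUT_TOKENS = 4_096

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English."""
    return max(1, len(text) // 4)

def max_completion_budget(prompt: str) -> int:
    """Largest completion size that still fits in the context window,
    capped by the model's 4,096-token output limit."""
    remaining = CONTEXT_WINDOW - estimate_tokens(prompt)
    return max(0, min(remaining, MAX_OUTPUT_TOKENS))
```

Because the prompt and completion share one context window, a very long prompt shrinks the usable completion budget toward zero.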

Technical Specifications

  • Pricing: $0.50 per 1M tokens (input), $1.50 per 1M tokens (output)
  • Modalities: text input and text output only
  • Features: fine-tuning
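
The listed per-million-token prices translate directly into a per-request cost estimate. A minimal sketch, using only the $0.50 input and $1.50 output rates stated above:

```python
# Estimate request cost from the listed prices:
# $0.50 per 1M input tokens, $1.50 per 1M output tokens.

INPUT_PRICE_PER_M = 0.50   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 1.50  # USD per 1M output tokens

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the listed per-token rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
```

For example, a request with 1M input tokens and 1M output tokens costs $0.50 + $1.50 = $2.00 at these rates.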

Snapshots

  • gpt-3.5-turbo (alias for gpt-3.5-turbo-0125)
  • gpt-3.5-turbo-0125
  • gpt-3.5-turbo-1106
  • gpt-3.5-turbo-instruct

Positioning and Use Cases

GPT-3.5 Turbo models can understand and generate natural language or code. They are optimized for chat via the Chat Completions API but also work well for non-chat tasks. As of July 2024, gpt-4o-mini is recommended in place of GPT-3.5 Turbo: it is cheaper, more capable, multimodal, and just as fast. GPT-3.5 Turbo remains available in the API.
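
A minimal Chat Completions request against gpt-3.5-turbo can be sketched with only the standard library. The endpoint and payload shape follow the public Chat Completions API; the system-message text is illustrative, and `OPENAI_API_KEY` is assumed to be set in the environment before anything is actually sent.

```python
# Sketch of a Chat Completions request for gpt-3.5-turbo using only
# the standard library. The request is only sent when OPENAI_API_KEY
# is present in the environment.
import json
import os
import urllib.request

def build_payload(user_message: str) -> dict:
    """Assemble a Chat Completions request body for gpt-3.5-turbo."""
    return {
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }

def send(payload: dict) -> dict:
    """POST the payload to the Chat Completions endpoint."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_payload("Say hello in one word.")
if os.environ.get("OPENAI_API_KEY"):  # only send when a key is configured
    reply = send(payload)
```

In practice the official SDK is the usual choice; the raw-HTTP form above just makes the request shape explicit.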

Rate Limits

  • Free tier: Not supported
  • Tier 1: 3,500 RPM, 10,000 RPD, 200,000 TPM, 2,000,000 batch queue limit
  • Tier 2: 3,500 RPM, 2,000,000 TPM, 5,000,000 batch queue limit
  • Tier 3: 3,500 RPM, 800,000 TPM, 50,000,000 batch queue limit
  • Tier 4: 10,000 RPM, 10,000,000 TPM, 1,000,000,000 batch queue limit
  • Tier 5: 10,000 RPM, 50,000,000 TPM, 10,000,000,000 batch queue limit
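
Exceeding the per-minute limits above produces rate-limit (HTTP 429) errors, and the usual client-side response is exponential backoff with jitter. The sketch below is a generic pattern, not from this document; `RateLimitError` is a stand-in for whatever exception the HTTP client or SDK actually raises.

```python
# Generic exponential-backoff retry sketch for rate-limit (429) errors
# against the tiered limits above. RateLimitError is a stand-in
# exception, not a specific SDK class.
import random
import time

class RateLimitError(Exception):
    """Stand-in for the rate-limit error a real client would raise."""

def with_backoff(call, max_retries: int = 5, base_delay: float = 0.5):
    """Retry `call` on RateLimitError, doubling the wait each attempt
    with a little jitter, and re-raise once retries are exhausted."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Doubling the delay on each attempt spreads retries out so a burst of clients does not hammer the API in lockstep; the jitter term breaks up synchronized retry waves.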


Frequently Asked Questions

What is the uptime guarantee?
We guarantee 99.9% uptime with our enterprise-grade infrastructure and redundant systems.
How is pricing calculated?
Pricing is based on the number of tokens processed. Both input and output tokens are counted in the final cost.
What is the difference between GPT-4 and GPT-4 Turbo?
GPT-4 Turbo is the latest version with improved performance, longer context window, and more recent knowledge cutoff date.