Gemini 2.0 Flash-Lite

Model Overview

Gemini 2.0 Flash-Lite is a lightweight version of Gemini 2.0 Flash optimized for cost efficiency and low latency.

Key Features

  • Medium intelligence (2/4 dots rating)
  • Very fast speed (5/5 lightning bolts rating)
  • 1,048,576-token context window
  • 8,192 max output tokens
  • August 2024 knowledge cutoff
  • Audio, images, video, and text input support
  • Text output support
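
The context window and output cap listed above can be turned into a simple pre-flight check before a request is sent. The following is only a sketch: `fits_limits` is a hypothetical helper, the token counts are assumed to come from whatever tokenizer or token-counting call you already use, and it assumes the context window is applied as a cap on input tokens (this page does not say whether output tokens also count against it).

```python
# Limits copied from the Key Features list on this page.
CONTEXT_WINDOW_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 8_192


def fits_limits(input_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within both published limits.

    Assumption: the context window is treated as a cap on input tokens only.
    """
    if input_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        input_tokens <= CONTEXT_WINDOW_TOKENS
        and requested_output_tokens <= MAX_OUTPUT_TOKENS
    )
```

A request that fails this check would need a shorter prompt or a smaller output budget before it could be submitted.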

Technical Specifications

  • Model Code: gemini-2.0-flash-lite
  • Supported modalities: audio, images, video, and text input; text-only output
  • Features: Structured outputs, caching, function calling
  • Pricing:
    • Input: $0.075 per 1M tokens
    • Output: $0.30 per 1M tokens
  • Free Tier: Available
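
As a rough illustration of how the per-token prices above add up, the sketch below estimates the cost of a workload from its input and output token counts. `estimate_cost_usd` is a hypothetical helper, not part of any SDK, and it assumes every token is billed at the listed rate (no free-tier allowance and no separate rate for cached tokens).

```python
from decimal import Decimal

# Official prices from the Technical Specifications list, in USD per 1M tokens.
INPUT_USD_PER_MILLION = Decimal("0.075")
OUTPUT_USD_PER_MILLION = Decimal("0.30")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate cost as input and output tokens billed at their own rates."""
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_USD_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# Example: 2M input tokens and 500k output tokens cost $0.15 + $0.15 = $0.30.
example_cost = estimate_cost_usd(2_000_000, 500_000)
```

`Decimal` is used instead of `float` so that small per-request amounts sum without binary rounding drift.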

Snapshots

  • gemini-2.0-flash-lite (latest)
  • gemini-2.0-flash-lite-001 (stable)

Positioning and Use Cases

Because it is optimized for cost efficiency and low latency, Gemini 2.0 Flash-Lite suits high-volume applications where speed and cost matter more than maximum intelligence.

Rate Limits

  • Standard rate limits apply

Documentation

Official Documentation

Google

Price Comparison (input • output, per 1M tokens)

  • Official: $0.075 • $0.30
  • Our Price: $0.06 • $0.24 (save 20%)
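
The 20% figure can be checked directly from the two price pairs, and the same numbers give a quick savings projection. This is a sketch under stated assumptions: `discount_percent` and `savings_usd` are hypothetical helpers, and the workload sizes in the example are made up for illustration.

```python
from decimal import Decimal

# Prices in USD per 1M tokens, as listed on this page.
OFFICIAL = {"input": Decimal("0.075"), "output": Decimal("0.30")}
DISCOUNTED = {"input": Decimal("0.06"), "output": Decimal("0.24")}


def discount_percent(official: Decimal, discounted: Decimal) -> Decimal:
    """Percentage reduction of the discounted price relative to the official one."""
    return (official - discounted) / official * 100


def savings_usd(input_millions: Decimal, output_millions: Decimal) -> Decimal:
    """Savings for a workload given in millions of input and output tokens."""
    return (
        (OFFICIAL["input"] - DISCOUNTED["input"]) * input_millions
        + (OFFICIAL["output"] - DISCOUNTED["output"]) * output_millions
    )


# Illustrative workload: 100M input tokens and 20M output tokens.
example_savings = savings_usd(Decimal(100), Decimal(20))  # 1.50 + 1.20 = 2.70 USD
```

Both the input and the output price work out to the same 20% reduction, so the saving scales linearly with total spend regardless of the input/output mix.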

Frequently Asked Questions

What is the uptime guarantee?
We guarantee 99.9% uptime with our enterprise-grade infrastructure and redundant systems.
How is pricing calculated?
Pricing is based on the number of tokens processed. Both input and output tokens are counted in the final cost.
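
To put the 99.9% uptime figure from the FAQ in concrete terms, the sketch below converts an uptime percentage into an allowed-downtime budget. `allowed_downtime_minutes` is a hypothetical helper, and the 30-day window is an assumption; this page does not state the period over which uptime is measured.

```python
def allowed_downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted by an uptime percentage over `days` days."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


# Over an assumed 30-day window, 99.9% uptime allows roughly 43 minutes of downtime.
budget = allowed_downtime_minutes(99.9)
```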