Gemini 2.5 Flash Native Audio

Model Overview

Gemini 2.5 Flash Native Audio delivers interactive, unstructured conversational experiences with high-quality, natural-sounding audio output. It is available with or without thinking capabilities.

Key Features

High intelligence (rated 3 out of 4)
Fast speed (rated 4 out of 5)
128,000-token context window
8,000 max output tokens
January 2025 knowledge cutoff
Audio, video, and text input support
Audio and text output support (interleaved)
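Before sending a long conversation, a client can check a planned request against the limits listed above. The sketch below is illustrative only. It assumes the 128,000 figure is the input (context) budget and 8,000 is the cap on generated tokens, and the helper name `check_request` is hypothetical.

```python
# Limits as listed on this page. Treating 128,000 as the input (context)
# budget and 8,000 as the generation cap is an assumption.
CONTEXT_WINDOW_TOKENS = 128_000
MAX_OUTPUT_TOKENS = 8_000


def check_request(input_tokens: int, requested_output_tokens: int) -> list[str]:
    """Return a list of problems with a planned request (empty if it fits)."""
    problems = []
    if input_tokens < 0 or requested_output_tokens < 0:
        problems.append("token counts must be non-negative")
    if input_tokens > CONTEXT_WINDOW_TOKENS:
        problems.append(
            f"input of {input_tokens} tokens exceeds the "
            f"{CONTEXT_WINDOW_TOKENS}-token context window"
        )
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        problems.append(
            f"requested {requested_output_tokens} output tokens; "
            f"the cap is {MAX_OUTPUT_TOKENS}"
        )
    return problems


if __name__ == "__main__":
    print(check_request(100_000, 4_000))  # fits: empty list
    print(check_request(130_000, 9_000))  # over both limits
```

Returning a list of messages rather than raising lets a caller report every violated limit at once.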

Technical Specifications

Model Codes: gemini-2.5-flash-preview-native-audio-dialog and gemini-2.5-flash-exp-native-audio-thinking-dialog
Supports: Input: audio, video, text; Output: audio and text
Features: Audio generation, function calling, search grounding, thinking, style and control prompting
Pricing:
- Input: $0.50 per 1M tokens (text), $3.00 per 1M tokens (audio/video)
- Output: $2.00 per 1M tokens (text), $12.00 per 1M tokens (audio)
Free Tier: Not available
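Because input and output are billed at different rates per modality, a session's cost is the sum of each token count times its rate. The following is a minimal estimator using the official per-1M-token rates listed above; the function name and the `(direction, modality)` keys are illustrative choices, not part of any SDK.

```python
# Official per-1M-token rates from the pricing list above (USD).
RATES_PER_MILLION = {
    ("input", "text"): 0.50,
    ("input", "audio"): 3.00,  # audio and video input share this rate
    ("input", "video"): 3.00,
    ("output", "text"): 2.00,
    ("output", "audio"): 12.00,
}


def estimate_cost(usage: dict[tuple[str, str], int]) -> float:
    """Estimate USD cost from token counts keyed by (direction, modality)."""
    total = 0.0
    for key, tokens in usage.items():
        if key not in RATES_PER_MILLION:
            raise ValueError(f"no published rate for {key}")
        total += tokens / 1_000_000 * RATES_PER_MILLION[key]
    return total


if __name__ == "__main__":
    # 200k audio tokens in, 50k audio tokens out: 0.2 * 3.00 + 0.05 * 12.00
    cost = estimate_cost({("input", "audio"): 200_000, ("output", "audio"): 50_000})
    print(f"${cost:.2f}")  # $1.20
```

Unknown combinations (for example video output, which the model does not produce) raise an error instead of silently costing zero.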

Snapshots

gemini-2.5-flash-preview-native-audio-dialog (preview)
gemini-2.5-flash-exp-native-audio-thinking-dialog (experimental)
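The two snapshots differ in whether thinking is enabled and in release stage. A small lookup like the one below keeps that choice in one place. It assumes, from the model codes themselves, that the `thinking` snapshot is the thinking-enabled variant; `pick_snapshot` is a hypothetical helper.

```python
# Snapshot IDs and stages as listed on this page.
SNAPSHOTS = {
    "gemini-2.5-flash-preview-native-audio-dialog": "preview",
    "gemini-2.5-flash-exp-native-audio-thinking-dialog": "experimental",
}


def pick_snapshot(thinking: bool) -> str:
    """Return the model code for the thinking or non-thinking variant."""
    # Assumption: the "thinking" snapshot is the one with thinking enabled.
    if thinking:
        return "gemini-2.5-flash-exp-native-audio-thinking-dialog"
    return "gemini-2.5-flash-preview-native-audio-dialog"


if __name__ == "__main__":
    model = pick_snapshot(thinking=True)
    print(model, SNAPSHOTS[model])
```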

Positioning and Use Cases

Available through the Live API for low-latency, bidirectional voice interactions. Well suited to conversational AI applications, voice assistants, and interactive audio experiences that need natural speech generation.

Rate Limits

Rate limits are more restrictive because the model is in experimental/preview status.
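With tighter rate limits, clients commonly retry throttled calls with exponential backoff. The sketch below shows that generic technique with "full jitter" (a random wait between zero and the current delay). `RateLimitError` is a hypothetical placeholder for whatever exception your client raises on a rate-limit response, and the default base and cap values are illustrative, not published limits.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the error your client raises when throttled."""


def backoff_delays(retries: int, base: float = 1.0, cap: float = 32.0) -> list[float]:
    """Exponential backoff ceilings: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_retries(
    fn: Callable[[], T],
    retries: int = 5,
    base: float = 1.0,
    cap: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with full-jitter exponential backoff."""
    for delay in backoff_delays(retries, base, cap):
        try:
            return fn()
        except RateLimitError:
            # Full jitter: wait a random time up to the current ceiling.
            sleep(random.uniform(0, delay))
    return fn()  # final attempt; any error propagates to the caller
```

Passing `sleep` as a parameter keeps the function testable without real waiting; in production the default `time.sleep` is used.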

Documentation

Official Documentation

Provider: Google


Official price: $0.50 input / $2.00 output per 1M text tokens. Our price: $0.40 / $1.60 (save 20%).


Frequently Asked Questions

What is the uptime guarantee?

We guarantee 99.9% uptime with our enterprise-grade infrastructure and redundant systems.
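A 99.9% uptime figure translates directly into a downtime budget: the permitted downtime is (1 - uptime) times the length of the period. The sketch below works that out, assuming uptime is measured over a 30-day month or a 365-day year; the page does not state the measurement window.

```python
def allowed_downtime_minutes(uptime: float, period_days: float) -> float:
    """Maximum downtime (minutes) permitted by an uptime fraction over a period."""
    return (1.0 - uptime) * period_days * 24 * 60


if __name__ == "__main__":
    for label, days in [("30-day month", 30), ("365-day year", 365)]:
        print(f"{label}: {allowed_downtime_minutes(0.999, days):.1f} min")
    # 30-day month: 43.2 min
    # 365-day year: 525.6 min (about 8.8 hours)
```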

How is pricing calculated?

Pricing is based on the number of tokens processed. Both input and output tokens are billed, each at its own per-1M-token rate.

What is the difference between GPT-4 and GPT-4 Turbo?

GPT-4 Turbo is the latest version with improved performance, longer context window, and more recent knowledge cutoff date.