Gemini 3.1 Flash Live: Next-Gen Audio AI for Voice-First Apps

Google Launches Real-Time Audio AI Model

Google has unveiled Gemini 3.1 Flash Live, a new audio and voice model for real-time conversational AI. The model offers faster responses and more natural dialogue patterns, and is designed to power the next generation of voice-first applications. It is rolling out across developer tools, enterprise solutions, and consumer-facing products.

Strong Performance on Audio Benchmarks

The model posts strong results on benchmarks that measure real-world audio processing. On ComplexFuncBench Audio, which evaluates multi-step function calling under various constraints, Gemini 3.1 Flash Live scores 90.8%. On Scale AI's Audio MultiChallenge, a benchmark designed to test complex instruction following amid the interruptions and hesitations typical of natural conversation, the model scores 36.1% with thinking mode enabled.

Enhanced Conversation Quality for Applications

For developers and enterprises, Gemini 3.1 Flash Live introduces enhanced tonal understanding that enables more nuanced dialogue. The model recognizes acoustic characteristics such as pitch and pace better than previous iterations. Notably, it adjusts its responses more dynamically when users express frustration or confusion, which makes it better suited to customer-facing applications in real-world environments.

The developer preview is accessible through the Gemini Live API in Google AI Studio, while enterprise implementations are available via Gemini Enterprise for Customer Experience. Early adopters including Verizon, LiveKit, and The Home Depot have given positive feedback on the model's natural conversation capabilities after integrating it into their workflows.

Extended Memory and Global Availability

Consumer applications benefit from faster response times and extended conversation memory: the model can now maintain dialogue context for twice as long as its predecessor. This supports longer brainstorming sessions while preserving conversational continuity. Gemini 3.1 Flash Live also powers the expansion of Search Live to more than 200 countries and territories through built-in multilingual support, allowing users worldwide to hold real-time, multimodal conversations in their preferred languages.