Google Gemini Omni Flash Launches: The 'Nano Banana for Video' Creates and Edits Video From Any Input

At I/O 2026, Google launched Gemini Omni Flash, a multimodal model that combines reasoning with video creation. Omni accepts any combination of text, images, audio, and video as input and generates high-quality video output grounded in Gemini’s real-world knowledge. Google describes it as “Nano Banana for video,” enabling conversational multi-turn video editing.

What Is Gemini Omni?

Gemini Omni is Google’s new family of models that fuse the company’s Gemini reasoning capabilities with creative generation. The first model, Omni Flash, focuses on video: it accepts mixed inputs (images, audio, existing video clips, text) and produces coherent video output. Unlike Veo, which is primarily a text-to-video generator, Omni is designed as a creation-plus-editing system with persistent conversational context.

How Does Omni Video Editing Work?

Omni lets users edit videos conversationally across multiple turns. Each instruction builds on the last — a user can start by changing the environment, then shift the camera angle, then add objects, all within the same chat session without regenerating from scratch. This “vibe coding for video” approach is Omni’s key differentiator from tools like Runway or Kling that require dedicated editing interfaces.
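To make the "each instruction builds on the last" idea concrete, here is a minimal Python sketch of a multi-turn edit session. It is purely illustrative: the developer API has not been released, so `EditSession`, `edit`, and `describe` are hypothetical names and do not correspond to any real Google interface. The sketch only models how a session could keep the full instruction history rather than starting over on each turn.

```python
from dataclasses import dataclass, field


@dataclass
class EditSession:
    """Toy model of a multi-turn video editing chat (hypothetical, not a Google API)."""

    source_clip: str
    instructions: list[str] = field(default_factory=list)

    def edit(self, instruction: str) -> str:
        # Append the new turn to the history instead of replacing it,
        # so later edits are interpreted on top of earlier ones.
        self.instructions.append(instruction)
        return self.describe()

    def describe(self) -> str:
        # Summarize the cumulative state a model would render from.
        steps = " -> ".join(self.instructions) or "(no edits yet)"
        return f"{self.source_clip}: {steps}"


session = EditSession("violinist.mp4")
session.edit("move the scene to a forest")
session.edit("switch to a low camera angle")
print(session.edit("add falling leaves"))
```

The point of the design is that the third instruction is applied to a clip that already carries the first two changes, which is what distinguishes conversational editing from issuing three independent generation prompts.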

What Can Omni Actually Generate?

In demos, Omni showed the ability to transform a simple video of someone drawing a circle into complex animations, change a violinist’s environment while keeping the musician intact, and generate specific objects for each letter of the alphabet using Gemini’s world knowledge. Clips are capped at 10 seconds in the initial rollout, with longer durations planned.

Who Gets Access?

Omni Flash is rolling out to Google AI Plus, Pro, and Ultra subscribers globally through the Gemini app and Google Flow. It’s also available at no cost on YouTube Shorts and the YouTube Create App. Developer API access is expected in the coming weeks. All Omni-generated videos include SynthID digital watermarks.

Key Takeaways

Omni accepts text, images, audio, and video as input and outputs video
Conversational multi-turn editing: change scenes, camera angles, and objects via chat
Built on Gemini’s world knowledge for grounded, realistic generation
First model: Omni Flash, with clips capped at 10 seconds in the initial rollout
Available on the Gemini app, Google Flow, YouTube Shorts, and the YouTube Create App
Digital avatars with verification onboarding coming soon

Frequently Asked Questions

Is Omni replacing Veo? No. Google positions Omni alongside Veo — Veo focuses on cinematic text-to-video, while Omni handles multimodal creation and conversational editing.

Does Omni generate audio? The initial release focuses on video output. Other output modalities, including image and audio, are planned for future Omni family models.

How much does it cost? Omni Flash is included in the existing Google AI subscription tiers (Plus, Pro, and Ultra, with Ultra at $100/month). YouTube Shorts users get access at no cost.