Converting YouTube Videos into Visual Explainers with Nano Banana Pro
By Google for Developers
Summary
Key takeaways
- **YouTube videos to five-step explainers**: Using the model's reasoning capabilities, it takes any YouTube video input and turns it into a wikiHow-style explainer in five steps. For example, a release notes video on Antigravity becomes a beautifully illustrated how-to tutorial. [00:03], [00:14]
- **Text rendering benchmarks image quality**: Text rendering is where fine details visibly fail in images, much like people's faces in crowds, and improving it correlates with better small details overall. The model still struggles with very small text or full pages but shows remarkable progress. [03:24], [04:06]
- **Non-English text rendering excels**: Progress on languages other than English is remarkable. The top languages were evaluated, and even Czech, spoken by about 10 million people, renders well even in underspecified prompts like restaurant menus. [04:13], [04:43]
- **New model adds 4K and search**: Feedback drove the addition of 4K resolution, with transparent backgrounds on the to-do list for future releases. The model can now render real-time information via search access. [01:42], [02:07]
- **Visualize ideas beyond creativity**: These models enable visualizing any information to communicate ideas and spruce up slide decks, moving past purely creative tasks. [02:21], [02:44]
Topics Covered
- Visualize Real-Time Info via Search
- Text Rendering Benchmarks Fine Details
- Non-English Text Rendering Excels
- AI Turns Videos into Illustrated Explainers
Full Transcript
So, this one's really fun. Again, using the model's reasoning capabilities, it can take any sort of input, a YouTube video, an image, or text, and turn it into almost a wikiHow-style explainer in five steps. So, a video I was watching last night before bed: one of my favorite things to do is watch release notes before bed. And now I can take that video, and this is the video about Antigravity, the new agentic development platform, and I can actually just paste it in here and get an explainer generated for me, again using Nano Banana and Gemini 3 Pro.
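If you want to try this flow yourself, here is a minimal sketch using the google-genai Python SDK. The model IDs, the placeholder video URL, and the two-step prompt structure are illustrative assumptions, so check the current Gemini API docs before relying on them:

```python
# Hedged sketch: YouTube video -> five-step illustrated explainer.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Step 1: have Gemini summarize the video as a five-step outline.
# YouTube URLs can be passed to the API as file_data; this URL is a placeholder.
video_url = "https://www.youtube.com/watch?v=VIDEO_ID"
outline = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed reasoning-model ID
    contents=[
        types.Part(file_data=types.FileData(file_uri=video_url)),
        types.Part(text=(
            "Summarize this video as a five-step how-to explainer. "
            "Give each step a short title and a one-sentence caption."
        )),
    ],
)

# Step 2: hand the outline to the image model to illustrate.
illustrated = client.models.generate_content(
    model="gemini-3-pro-image-preview",  # assumed Nano Banana Pro ID
    contents="Illustrate this five-step tutorial as a wikiHow-style page:\n"
    + outline.text,
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)
```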
So we'll have to wait for this one to render. But yeah, I'm excited to see this. While it's rendering, Nicole, do you want to come out? Welcome, welcome to the live stream.
>> Welcome. Grab a seat.
You were one of the lead PMs for all of our gen media stuff, now more than just the Nano Banana models. So thank you for being here, thanks for hanging out. Anything top of mind, as we launch this model, that you want to talk about?
>> You've probably already talked about a bunch of things, and sorry I'm late.
>> No, it's all good. We've got to keep watching the model. We were saying before, off camera, that this is now the single point of failure for gen media PMs inside of DeepMind. So hopefully, if anything goes down, all the people who need to fix it are sitting literally right here.
>> Besides the hundreds of engineers, we're all good.
>> I'm really excited to get this model out. One of the things I'm super excited about is that we got a lot of feedback from everybody on Nano Banana, including "where's the 4K resolution?", and, you know, it's here. So it's been super awesome to see everybody's feedback. Keep giving it to us, because chances are that in the next model release, we'll incorporate it.
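Requesting the new 4K output looks roughly like this in the google-genai SDK. The image_config field names and accepted values are assumptions based on the public docs, so verify them against your SDK version:

```python
# Hedged sketch: requesting 4K output from the image model.
from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",  # assumed Nano Banana Pro ID
    contents="A product shot of a ceramic mug on a walnut desk, studio lighting",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",
            image_size="4K",  # assumed values: "1K", "2K", "4K"
        ),
    ),
)

# Save the first generated image found in the response parts.
for part in response.candidates[0].content.parts:
    if part.inline_data:
        with open("mug_4k.png", "wb") as f:
            f.write(part.inline_data.data)
        break
```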
incorporate it. Um, >> I want transparent backgrounds. That's
my feature request. I saw a bunch. I've
already gotten a bunch of comments about this.
>> Yeah. Yeah.
>> We really want to make transparent backgrounds work. It's on our to-do list. It's not in this release, but, you know, stay tuned. And obviously, I think you guys have probably already talked about the world knowledge and the fact that you can now render real-time information, because the model has access to search.
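Pairing image generation with Google Search grounding looks something like the sketch below. The Tool and GoogleSearch types are standard in the google-genai SDK; treating the image model as accepting them is an assumption to verify:

```python
# Hedged sketch: grounding an image request in real-time search results.
from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",  # assumed Nano Banana Pro ID
    contents="Make an infographic of today's weather forecast for Zurich",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
```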
That's really exciting. We've been talking about the idea of using these models for visualizing information, doing more than just the creative tasks. The creative tasks are awesome and super important, but I feel like we're finally getting to a point where we can start to do those things, and at some point you will be able to visualize anything, communicate ideas, and spruce up your slide decks. I'm just super excited about what this opens up in terms of new capabilities.
>> Yeah. I have a quick question, actually, for all three of you on the model team. Last time we talked, Nicole, we were sitting down with Robert, Kosik, and Mo, talking about the original Nano Banana model and how the team uses text rendering capability as one of the main image generation model quality benchmarks. I'm sure folks have forgotten or not seen that conversation, so do you want to give some context? Why is text rendering such an important benchmark to hill climb for overall image generation quality?
>> In general, text rendering is really where you see fine details failing. It's an example of a fine detail that fails in an image, and you can see failures in fine details that are not text rendering. I think people's faces are another one of those. If you have an image of a crowd with a lot of people, we call this the small faces problem, and you tend to notice the model's failure modes really quickly. Text is just another one of those examples of the model's capacity to learn a lot of things and then render very small, accurate detail. And there tends to be a correlation: if you improve that, you're also improving other small details in images, and you just notice fewer flaws.
Obviously, you'll notice that this model will still make mistakes with very small text, and if you try to render a full page of text, it still fails. But the progress that we've made, and especially the progress we've made on languages other than English, is actually remarkable. I don't know if you guys have already talked about this.
>> We haven't, actually. We looked at the Spanish example...
>> For our Spanish toothpaste brand.
>> ...which we're now a big fan of. But what's fun is, we've evaluated the top languages that we see people using in the apps and in other products.
My native language is Czech, which is nowhere near the top used languages in the world; about 10 million people speak it. But the model actually does an awesome job rendering Czech text, and that's been super exciting, even when you ask for something like "make me a picture of a menu" and don't actually specify what's on the menu. So I encourage everyone to try it in languages that are maybe not on the list we published, and again, send us feedback, because we want to make it better.
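That underspecified menu prompt translates to a one-liner against the API. Everything here beyond the standard SDK calls, namely the model ID and the prompt wording, is an illustrative assumption:

```python
# Hedged sketch: underspecified Czech prompt ("make me a picture of a menu").
from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",  # assumed Nano Banana Pro ID
    contents="Udělej mi obrázek jídelního lístku",
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)
```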
>> I love that. Yeah, ping all of you on X. Don't ping me if you have complaints.
>> We're happy to respond.
>> Well, we'll tag you when there are complaints about AI Studio.
>> That's perfect. That's what we can fix. Omar, you had kicked off the example before. Do you want to show it? I forgot you had put in a YouTube video.
>> Yeah, I put in the release notes video about Antigravity, and so now we have this walkthrough tutorial. Essentially, it's taken all the content from the video, understood it, and turned it into a how-to. So, I think you might have used an Airbnb for dogs example.
>> Verun did, right? And so, his side hustle, if folks don't know, besides building...
>> Everybody's getting in on investing in an Airbnb for dogs. So ask him about it if you see him in the office.
>> Well, this is crazy, right? So, it illustrated that, and it basically drew it in a prompt box. It shows you that you've got to start a new conversation and summon your digital intern, essentially the agent. And it's actually showing you "review the artifacts."
>> Yeah. Which is a feature of Antigravity.
>> "Be a backseat driver." You can just go ahead and fill this out. And it's explaining everything in that video to you in a very simple format, beautifully illustrated as well.
Awesome.
>> With all the details. Yeah, it just makes it so much easier to grok complex concepts with stuff like this.
>> We should get YouTube to build this as a feature. You go in there and it just automatically turns the video into an explanation, an image.
>> I also love the rebrand of a dog daycare to an Airbnb for dogs. It's a great little positioning.
>> It's more trendy that way.
>> Very tech forward, right? This is how you raise the fund.
>> They even came up with dog ps.
>> Yeah. And then, oh, this is perfect, because you know how people check in with their dogs and check the little video camera while their dog is at daycare? Instead of checking yourself, you send your AI agent.
>> Just saying. Innovation.
>> Innovation. And we've got to have something to raise money for the people.