
GTC March 2025 Keynote with NVIDIA CEO Jensen Huang

By NVIDIA

Summary

Key takeaways

  • AI factories are the future of data centers: Data centers are undergoing a platform shift from general-purpose computing to accelerated computing powered by GPUs, transforming into "AI factories" solely focused on generating tokens for various applications. [03:49], [22:25]
  • Agentic AI requires 100x more computation: The rise of agentic AI, capable of reasoning, planning, and action, has dramatically increased computational needs, requiring up to 100 times more tokens and processing power than previously estimated. [08:18], [12:56]
  • Blackwell GPU offers a 25x power-efficiency leap: The new Blackwell GPU architecture, using disaggregated NVLink and liquid cooling, achieves a 25x improvement in performance per watt compared to Hopper, enabling more efficient AI factories. [46:43], [01:16:35]
  • Robots are becoming generalists with AI: NVIDIA's Newton physics engine and GR00T N1 foundation model are enabling robots to learn and perform complex, multi-step tasks, addressing the global labor shortage and ushering in an era of embodied AI. [01:56:35], [01:59:26]
  • NVIDIA's roadmap accelerates AI infrastructure: NVIDIA is systematically advancing AI infrastructure with annual architectural updates such as Blackwell Ultra and Vera Rubin, alongside networking innovations such as Spectrum-X and silicon photonics, to meet the escalating demands of AI. [01:24:25], [01:32:24]

Topics Covered

  • AI's computational demand has exploded 100x due to reasoning.
  • Generative AI shifts computing from retrieval to generation.
  • Agentic AI requires reasoning, planning, and action capabilities.
  • The data center industry is undergoing a platform shift to accelerators.
  • Blackwell's architecture enables a 25x performance increase through power efficiency.

Full Transcript

This is how intelligence is made: a new kind of factory, a generator of tokens, the building blocks of AI. Tokens have opened a new frontier, the first step into an extraordinary world where endless possibilities are born. Tokens transform images into scientific data, charting alien atmospheres and guiding the explorers of tomorrow. They turn raw data into foresight, so next time, we'll be ready. Tokens decode the laws of physics to get us there faster and take us further. Tokens see disease before it takes hold; they help us unravel the language of life and learn what makes us tick. Tokens connect the dots so we can protect our most noble creatures. They turn potential into plenty and help us harvest our bounty. Tokens don't just teach robots how to move, but to bring joy, to lend us a hand, and put life within reach. Together, we take the next great leap to bravely go where no one has gone before. And here is where it all begins.

Welcome to the stage NVIDIA founder and CEO Jensen Huang.

Welcome to GTC. What an amazing year. We wanted to do this at NVIDIA, so through the magic of artificial intelligence, we're going to bring you to NVIDIA's headquarters. I think I'm bringing you to NVIDIA headquarters. What do you think? This is where we work; this is where we work. What an amazing year it was, and we have a lot of incredible things to talk about. I just want you to know that I'm up here without a net: there are no scripts, there's no teleprompter, and I've got a lot of things to cover, so let's get started.

First of all, I want to thank all of the sponsors, all the amazing people who are part of this conference. Just about every single industry is represented: healthcare is here, transportation, retail, gosh, the computer industry; everybody in the computer industry is here. It's really, really terrific to see all of you, and thank you for sponsoring it.

GTC started with GeForce. It all started with GeForce, and today I have here a GeForce 5090. And the 5090, unbelievably, 25 years later, 25 years after we started working on GeForce, GeForce is sold out all over the world. This is the 5090, the Blackwell generation, and comparing it to the 4090, look: it's 30% smaller in volume, it's 30% better at dissipating energy, and the performance is incredible, hard to even compare. And the reason for that is because of artificial intelligence.

GeForce brought CUDA to the world, CUDA enabled AI, and AI has now come back to revolutionize computer graphics. What you're looking at is real-time computer graphics, 100% path traced. For every pixel that's rendered, artificial intelligence predicts the other 15. Think about this for a second: for every pixel that we mathematically rendered, artificial intelligence inferred the other 15. And it has to do so with so much precision that the image looks right, and it's temporally accurate, meaning that from frame to frame to frame, going forwards or backwards, because it's computer graphics, it has to stay temporally stable. Incredible. Artificial intelligence has made extraordinary progress; it has only been 10 years now.

We've been talking about AI for a little longer than that, but AI really came into the world's consciousness about a decade ago. It started with perception AI: computer vision, speech recognition. Then generative AI: for the last five years we've largely focused on generative AI, teaching an AI how to translate from one modality to another modality. Text to image, image to text, text to video, amino acids to proteins, properties to chemicals; all kinds of different ways that we can use AI to generate content. Generative AI fundamentally changed how computing is done: from a retrieval computing model, we now have a generative computing model. Whereas almost everything that we did in the past was about creating content in advance, storing multiple versions of it, and fetching whatever version we think is appropriate at the moment of use, now AI understands the context, understands what we're asking, understands the meaning of our request, and generates what it knows. If it needs to, it'll retrieve information, augment its understanding, and generate an answer for us. Rather than retrieving data, it now generates answers. It fundamentally changed how computing is done; every single layer of computing has been transformed.

In the last several years, the last couple, two, three years, a major breakthrough happened, a fundamental advance in artificial intelligence. We call it agentic AI. Agentic AI basically means that you have an AI that has agency. It can perceive and understand the context of the circumstance. It can reason, very importantly; it can reason about how to answer or how to solve a problem. It can plan and take action. It can use tools, because it now understands multimodal information: it can go to a website and look at the format of the website, words and videos, maybe even play a video; it learns from that website, understands it, and comes back and uses that information, that new knowledge, to do its job. Agentic AI: at the foundation of agentic AI, of course, is something that's very new, reasoning.

And then, of course, the next wave is already happening; we're going to talk a lot about that today. Robotics, which has been enabled by physical AI, AI that understands the physical world. It understands things like friction and inertia, cause and effect, object permanence: when something goes around the corner, it doesn't mean it has disappeared from this universe; it's still there, just not seeable. That ability to understand the physical world, the three-dimensional world, is what's going to enable a new era of AI we call physical AI, and it's going to enable robotics. Each one of these phases, each one of these waves, opens up new market opportunities for all of us. It brings more and new partners to GTC, and as a result, GTC is now jam-packed.

The only way to hold more people at GTC is we're going to have to grow San Jose, and we're working on it; we've got a lot of land to work with. We've got to grow San Jose so that we can make GTC even bigger. You know, as I'm standing here, I wish all of you could see what I see; we're in the middle of a stadium. Last year was the first year back that we did this live, and it was like a rock concert; GTC was described as the Woodstock of AI. This year it's described as the Super Bowl of AI. The only difference is, everybody wins at this Super Bowl; everybody's a winner. And so every single year more people come, because AI is able to solve more interesting problems for more industries and more companies. And this year we're going to talk a lot about agentic AI and physical AI.

At its core, what enables each wave and each phase of AI? Three fundamental matters are involved. The first is: how do you solve the data problem? The reason that's important is that AI is a data-driven computer science approach. It needs data to learn from; it needs digital experience to learn from, to learn knowledge and to gain digital experience. How do you solve the data problem? The second is: how do you solve the training problem without a human in the loop? The reason human-in-the-loop is fundamentally challenging is that we only have so much time, and we would like an AI to be able to learn at superhuman rates, at super real-time rates, and to be able to learn at a scale that no human can keep up with. So the second question is: how do you train the model? And the third is: how do you scale? How do you create, how do you find, an algorithm whereby the more resource you provide, whatever the resource is, the smarter the AI becomes? The scaling law.

Well, this last year, this is where almost the entire world got it wrong. The computation requirement, the scaling law of AI, is more resilient, and in fact hyper-accelerated. The amount of computation we need at this point, as a result of agentic AI, as a result of reasoning, is easily a hundred times more than we thought we needed this time last year.

And let's reason about why that's true. The first part: let's just go from what the AI can do, and let me work backwards. Agentic AI, as I mentioned, at its foundation is reasoning. We now have AIs that can reason, which is fundamentally about breaking a problem down step by step. Maybe it approaches a problem in a few different ways and selects the best answer. Maybe it solves the same problem in a variety of ways and makes sure it has the same answer: consistency checking. Or maybe, after it's done deriving the answer, it plugs it back into the equation, maybe a quadratic equation, to confirm that in fact that's the right answer, instead of just one-shot blurting it out. Remember two years ago, when we started working with ChatGPT: a miracle as it was, many complicated questions and many simple questions it simply couldn't get right, and understandably so. It took a one-shot: whatever it learned by studying pre-trained data, whatever it saw from other experiences, pre-trained data, it does a one-shot and blurts it out. Now we have AIs that can reason step by step by step, using a technology called chain of thought, best-of-n, consistency checking, a variety of different path planning, a variety of different techniques. We now have AIs that can reason, break a problem down, and reason step by step by step.

Well, you can imagine, as a result, the number of tokens we generate. The fundamental technology of AI is still the same: generate the next token, predict the next token. It's just that the next token now makes up step one; then the next token after that, after it generates step one, that step one goes into the input of the AI again as it generates step two and step three and step four. So instead of just generating one token or one word after the next, it generates a sequence of words that represents a step of reasoning. The amount of tokens generated as a result is substantially higher, and I'll show you in a second: easily 100 times more.

Now, 100 times more, what does that mean? Well, it could generate 100 times more tokens, and you can see that happening as I explained previously. Or the model is more complex and generates 10 times more tokens, and in order for us to keep the model responsive and interactive, so that we don't lose our patience waiting for it to think, we now have to compute 10 times faster. And so 10 times the tokens, 10 times faster: the amount of computation we have to do is easily 100 times more. You're going to see this in the rest of the presentation: the amount of computation we have to do for inference is dramatically higher than it used to be.
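To make that token arithmetic concrete, here is a minimal runnable sketch; the model call is a stand-in (every name here is hypothetical), but the control flow, where each finished reasoning step feeds back into the context before the next token is predicted, is the mechanism just described:

```python
def predict_next_token(context: list[str]) -> str:
    # Stand-in for a real LLM forward pass; it just emits a placeholder
    # token so the control flow is runnable.
    return f"tok{len(context)}"

def generate(prompt: list[str], n_steps: int, tokens_per_step: int) -> list[str]:
    out: list[str] = []
    for _ in range(n_steps):
        for _ in range(tokens_per_step):
            # Everything generated so far feeds back in as input.
            out.append(predict_next_token(prompt + out))
    return out

one_shot  = generate(["prompt"], n_steps=1,  tokens_per_step=50)
reasoning = generate(["prompt"], n_steps=20, tokens_per_step=250)
print(len(one_shot), len(reasoning))  # 50 vs 5000: roughly 100x more tokens
```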

Well, the question then becomes: how do we teach an AI to do what I just described, how to execute this chain of thought? One method is, you have to teach the AI how to reason, and as I mentioned earlier, in training there are two fundamental problems we have to solve. Where does the data come from? And how do we not have it be limited by human-in-the-loop? There's only so much data and so much human demonstration we can perform. So this is the big breakthrough in the last couple of years: reinforcement learning with verifiable results. Basically, reinforcement learning of an AI as it attacks, as it tries to engage in solving, a problem step by step. Well, we have many problems that have been solved in the history of humanity where we know the answer. We know how to solve a quadratic equation. We know how to solve the Pythagorean theorem, the rules of a right triangle. We know many, many rules of math and geometry and logic and science. We have puzzle games that we could give it, constrained types of problems like Sudoku, those kinds of problems, on and on and on. We have hundreds of these problem spaces where we can generate millions of different examples and give the AI hundreds of chances to solve it step by step by step, as we use reinforcement learning to reward it as it does a better and better job. So as a result, you take hundreds of different topics, millions of different examples, hundreds of different tries, each one of the tries generating tens of thousands of tokens. You put that all together, and we're talking about trillions and trillions of tokens in order to train that model. And now, with reinforcement learning, we have the ability to generate an enormous amount of tokens: synthetic data generation, basically using a robotic approach to teach an AI. The combination of these two things has put an enormous, enormous challenge of computing in front of the industry, and you can see that the industry is responding.
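As a minimal sketch of that verifiable-reward loop, assuming a quadratic-equation task where a candidate answer can be checked exactly; the random "policy" below is a placeholder for the model being trained:

```python
import random

def propose_root(a: float, b: float, c: float) -> float:
    # Placeholder policy: a real system would sample from the model
    # being trained, then update it using the rewards below.
    return random.uniform(-10.0, 10.0)

def verify(a: float, b: float, c: float, x: float) -> float:
    # Verifiable reward: plug the candidate back into a*x^2 + b*x + c,
    # exactly the consistency check described in the talk.
    # Loose tolerance so this toy registers some hits.
    return 1.0 if abs(a * x * x + b * x + c) < 0.1 else 0.0

# Many tries per problem; the rewards would drive a policy update.
tries = [propose_root(1, -3, 2) for _ in range(1000)]  # true roots: 1 and 2
rewards = [verify(1, -3, 2, x) for x in tries]
print(sum(rewards), "of", len(tries), "tries verified correct")
```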

What I'm about to show you is Hopper shipments of the top four CSPs, the ones with the public clouds: Amazon, Azure, GCP, and OCI. The top four CSPs; not the AI companies, that's not included; not all the startups, not included; not enterprise, not included; a whole bunch of things not included, just those four. Just to give you a sense of comparing the peak year of Hopper and the first year of Blackwell. Okay: the peak year of Hopper and the first year of Blackwell. So you can kind of see that, in fact, AI is going through an inflection point. It has become more useful because it's smarter, it can reason, and it is more used. You can tell it's more used because whenever you go to ChatGPT these days, it seems like you have to wait longer and longer, which is a good thing; it says a lot of people are using it to great effect. And the amount of computation necessary to train those models and to inference those models has grown tremendously. So in just one year, and Blackwell has just started shipping, in just one year you can see the incredible growth in AI infrastructure.

Well, that's been reflected in computing across the board. We're now seeing, and the purple is the analysts' forecast, the increase of capital expense of the world's data centers, including CSPs and enterprise and so on, through the end of the decade, so 2030. I've said before that I expect data center buildout to reach a trillion dollars, and I am fairly certain we're going to reach that very soon. Two dynamics are happening at the same time. The first dynamic is that the vast majority of that growth is likely to be accelerated, meaning we've known for some time that general-purpose computing has run its course and that we need a new computing approach. The world is going through a platform shift from hand-coded software running on general-purpose computers to machine-learning software running on accelerators and GPUs. This way of doing computation is at this point past the tipping point, and we are now seeing the inflection happening in the world's data center buildouts. So the first thing is a transition in the way we do computing.

Second is an increase in recognition that the future of software requires capital investment. Now this is a very big idea. Whereas in the past we wrote the software and we ran it on computers, in the future the computer is going to generate the tokens for the software. The computer has become a generator of tokens, not a retriever of files: from retrieval-based computing to generative-based computing, from the old way of doing data centers to a new way of building these infrastructures. I call them AI factories. They're AI factories because they have one job and one job only: generating these incredible tokens that we then reconstitute into music, into words, into videos, into research, into chemicals or proteins. We reconstitute them into all kinds of information of different types. So the world is going through a transition in not just the amount of data centers that will be built, but also how they're built.

Well, everything in the data center will be accelerated; not all of it is AI. And I want to say a few words about this. This slide is genuinely my favorite, and the reason is that for all of you who've been coming to GTC all of these years, you've been listening to me talk about these libraries this whole time. This is in fact what GTC is all about: this one slide. In fact, a long time ago, 20 years ago, this was the only slide we had: one library after another library after another library. You can't just accelerate software. Just as we needed an AI framework in order to create AIs, and we accelerate the AI frameworks, you need frameworks for physics and biology and multiphysics and all kinds of different quantum physics. You need all kinds of libraries and frameworks; we call them CUDA-X libraries, acceleration frameworks for each one of these fields of science.

And so this first one is incredible: cuPyNumeric. NumPy is the number one most downloaded, most used Python library in the world, downloaded 400 million times this last year, and cuPyNumeric is a zero-change, drop-in acceleration for NumPy. So if any of you are using NumPy out there, give cuPyNumeric a try; you're going to love it.
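As a sketch of what zero-change drop-in means in practice; the `cupynumeric` package and import path follow NVIDIA's published docs, so treat the details as assumptions:

```python
# import numpy as np          # the original CPU code
import cupynumeric as np      # the drop-in: same API, GPU-accelerated

a = np.random.rand(1024, 1024)
b = np.random.rand(1024, 1024)
c = a @ b                     # unchanged NumPy-style code
print(c.sum())
```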

cuLitho is our computational lithography library. Over the course of four years, we've now taken the entire process of computational lithography, which is the second factory in a fab: there's the factory that manufactures the wafers, and then there's the factory that manufactures the information to manufacture the wafers. Every industry, every company that has factories, will have two factories in the future: the factory for what they build and the factory for the mathematics, the factory for the AI. A factory for cars, and a factory for the AIs for the cars; a factory for smart speakers, and a factory for the AI for the smart speakers. So cuLitho is our computational lithography. TSMC, Samsung, ASML, our partners Synopsys, Mentor: incredible support all over. I think this is now at its tipping point; in another five years' time, every mask, every single lithography, will be processed on NVIDIA CUDA.

Aerial is our library for 5G, turning a GPU into a 5G radio. Why not? Signal processing is something we do incredibly well. Once we do that, we can layer AI on top of it: AI for RAN, or what we call AI-RAN. The next generation of radio networks will have AI deeply inserted into it. Why is it that we're limited by the limits of information theory? Because there's only so much information spectrum we can get. Not if we add AI to it.

cuOpt is numerical, mathematical optimization. Almost every single industry uses this: when you plan seats and flights, inventory and customers, workers and plants, drivers and riders, and so on, where you have multiple constraints and a whole bunch of variables, and you're optimizing for time, profit, quality of service, usage of resource, whatever it happens to be. NVIDIA uses it for our supply chain management. cuOpt is an incredible library: it takes what would take hours and hours and turns it into seconds. The reason that's a big deal is that we can now explore a much larger space. We announced that we are going to open source cuOpt. Almost everybody is using either Gurobi or IBM CPLEX or FICO; we're working with all three of them. The industry is so excited; we're about to accelerate the living daylights out of the industry.

Parabricks is for gene sequencing and gene analysis. MONAI is the world's leading medical imaging library. Earth-2 is multiphysics for predicting, at very high resolution, local weather. cuQuantum and CUDA-Q: we're going to have our first Quantum Day here at GTC. We're working with just about everybody in the ecosystem, either helping them research quantum architectures and quantum algorithms, or building a classical accelerated quantum heterogeneous architecture, so really exciting work there. cuEquivariance and cuTensor, for tensor contraction and quantum chemistry.

Of course, this stack is world famous. People think there's one piece of software called CUDA, but in fact on top of CUDA is a whole bunch of libraries that are integrated into all different parts of the ecosystem and software and infrastructure, in order to make AI possible. I've got a new one here to announce today: cuDSS, our sparse solvers, really important for CAE. This is one of the biggest things that has happened in the last year. Working with Cadence and Synopsys and Ansys and, well, all of the systems companies, we've now made it possible for just about every important EDA and CAE library to be accelerated. What's amazing is that until recently, NVIDIA had been using general-purpose computers, running software super slowly, to design accelerated computers for everybody else. And the reason is that we never had that software, that body of software, optimized for CUDA until recently. So now our entire industry is going to get supercharged as we move to accelerated computing.

cuDF is a data frame for structured data: we now have a drop-in acceleration for Spark and a drop-in acceleration for pandas. Incredible.
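A similar sketch for the pandas drop-in, assuming the RAPIDS `cudf.pandas` accelerator and its documented activation call (both details are assumptions, not something stated on stage):

```python
import cudf.pandas
cudf.pandas.install()   # from here on, pandas runs GPU-first with CPU fallback

import pandas as pd

df = pd.DataFrame({"x": range(1_000_000), "y": range(1_000_000)})
print(df.groupby(df["x"] % 10)["y"].sum())   # unchanged pandas code
```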

And then we have Warp, a Python library for physics that runs on CUDA; we have a big announcement there, and I'll save it for just a second. This is just a sampling of the libraries that make accelerated computing possible. It's not just CUDA; we're so proud of CUDA, but if not for CUDA, and the fact that we have such a large install base, none of these libraries would be useful for any of the developers who use them. For all the developers that use them: you use them because, one, they're going to give you incredible speedup and incredible scale-up; and two, because the install base of CUDA is now everywhere. It's in every cloud, it's in every data center, it's available from every computer company in the world. It's literally everywhere, and therefore, by using one of these libraries, your software, your amazing software, can reach everyone. And so we've now reached the tipping point of accelerated computing. CUDA has made it possible, and all of you, this is what GTC is about, the ecosystem: all of you made this possible. So we made a little short video for you. Thank you.

To the creators, the pioneers, the builders of the future: CUDA was made for you. Since 2006, 6 million developers in over 200 countries have used CUDA and transformed computing. With over 900 CUDA-X libraries and AI models, you're accelerating science, reshaping industries, and giving machines the power to see, learn, and reason. Now NVIDIA Blackwell is 50,000 times faster than the first CUDA GPU. These orders-of-magnitude gains in speed and scale are closing the gap between simulation and real-time digital twins. And for you, this is still just the beginning. We can't wait to see what you do next.

I love what we do. I love even more what you do with it. And one of the things that most touched me in my 33 years of doing this: one scientist said to me, "Jensen, because of your work, I can do my life's work in my lifetime." And boy, if that doesn't touch you, you've got to be a corpse. So this is all about you guys. Thank you. All right, so we're going to talk about AI.

But you know, AI started in the cloud. It started in the cloud for a good reason, because it turns out that AI needs infrastructure. It's machine learning; if the science says machine learning, then you need a machine to do the science. So machine learning requires infrastructure, and the cloud data centers had infrastructure. They also have extraordinary computer science, extraordinary research: the perfect circumstance for AI to take off, in the cloud and with the CSPs. But that's not where AI is limited to; AI will go everywhere, and we're going to talk about AI in a lot of different ways. The cloud service providers, of course, like our leading-edge technology. They like the fact that we have full stack, because accelerated computing, as I was explaining earlier, is not about the chip. It's not even just the chip and the library; it's the chip, the programming model, and a whole bunch of software that goes on top of it. That entire stack is incredibly complex. Each one of those layers, each one of those libraries, is essentially like SQL. SQL, as you know, is called in-storage computing; it was the big revolution of computation by IBM. SQL is one library. Just imagine: I just showed you a whole bunch of them, and in the case of AI there are a whole bunch more, so the stack is complicated. CSPs also love that NVIDIA CUDA developers are CSP customers, because in the final analysis they're building infrastructure for the world to use. So the rich developer ecosystem is really valued and really, really deeply appreciated.

Well, now that we're going to take AI out to the rest of the world, the rest of the world has different system configurations, operating environment differences, domain-specific library differences, usage differences. So AI, as it translates to enterprise IT, as it translates to manufacturing, as it translates to robotics or self-driving cars, or even companies that are starting GPU clouds: there's a whole bunch of companies, maybe 20 of them, who started during this NVIDIA era, and what they do is just one thing: they host GPUs. They call themselves GPU clouds. One of our great partners, CoreWeave, is in the process of going public, and we're super proud of them. And so GPU clouds have their own requirements.

But one of the areas that I'm super excited about is Edge, and today we announced that Cisco, NVIDIA, T-Mobile (the largest telecommunications company in the world), and Cerberus ODC are going to build a full stack for radio networks here in the United States. That's going to be the second stack, so this current stack we're announcing today will put AI into the edge. Remember, a hundred billion dollars of the world's capital investment each year is in the radio networks and all of the data centers provisioning for communications. In the future, there is no question in my mind that's going to be accelerated computing, infused with AI. AI will do a far, far better job adapting the radio signals, the massive MIMO, to the changing environments and the traffic conditions. Of course it will; of course we would use reinforcement learning to do that. Of course MIMO is essentially one giant radio robot; of course it is, and so we will of course provide those capabilities. Of course AI could revolutionize communications. You know, when I call home, I don't have to say but a few words, because my wife knows where I work and what the conditions are like; the conversation carries on from yesterday. She kind of remembers what I like and don't like, and oftentimes just a few words communicate a whole bunch. The reason is context and human priors, prior knowledge. Combining those capabilities could revolutionize communications. Look what it's doing for video processing; look what I just described earlier in 3D graphics. So of course we're going to do the same for Edge. I'm super excited about the announcement we made today: T-Mobile, Cisco, NVIDIA, and Cerberus ODC are going to build a full stack. Well, AI is going to go into every industry; that's just one.

One of the earliest industries that AI went into was autonomous vehicles. The moment I saw AlexNet, and we'd been working on computer vision for a long time, the moment I saw AlexNet was such an inspiring moment, such an exciting moment, it caused us to decide to go all in on building self-driving cars. So we've been working on self-driving cars now for over a decade. We build technology that almost every single self-driving car company uses. It could be in the data center; for example, Tesla uses lots of NVIDIA GPUs in the data center. It could be in the data center or the car; Waymo and Wayve use NVIDIA computers in data centers as well as the car. It could be just in the car; it's very rare, but sometimes it's just in the car. Or they use all of our software in addition. We work with the car industry however the car industry would like us to work with them. We build all three computers: the training computer, the simulation computer, and the robotics computer, the self-driving car computer, plus all the software stack that sits on top of it, models and algorithms, just as we do with all of the other industries that I've demonstrated.

And so today I'm super excited to announce that GM has selected NVIDIA to partner with them to build their future self-driving car fleet. The time for autonomous vehicles has arrived, and we're looking forward to building with GM AI in all three areas: AI for manufacturing, so they can revolutionize the way they manufacture; AI for enterprise, so they can revolutionize the way they work, design cars, and simulate cars; and then also AI in the car. So, AI infrastructure for GM, partnering with GM and building with GM their AI. I'm super excited about that.

One of the areas that I'm deeply proud of, and it rarely gets any attention, is safety: automotive safety. It's called Halos in our company, NVIDIA Halos. Safety requires technology from silicon to systems to system software, the algorithms, the methodologies: everything from diversity to ensuring diversity, monitoring, transparency, explainability. All of these different philosophies have to be deeply ingrained into every single part of how you develop the system and the software. We're the first company in the world, I believe, to have every line of code safety assessed: 7 million lines of code safety assessed. Our chip, our system, our system software, and our algorithms are safety assessed by third parties that crawl through every line of code to ensure that it is designed to ensure diversity, transparency, and explainability. We have also filed over a thousand patents. And during this GTC, and I really encourage you to do so, go spend time in the Halos workshop so that you can see all of the different things that come together to ensure that the cars of the future are going to be safe as well as autonomous. So this is something I'm very proud of; it rarely gets any attention, and so I thought I would spend the extra time this time to talk about it. Okay: NVIDIA Halos.

All of you have seen cars drive by themselves. The Waymo robotaxis are incredible. But we made a video to share with you some of the technology we use to solve the problems of data and training and diversity, so that we can use the magic of AI to go create AI. Let's take a look.

NVIDIA is accelerating AI development for AVs with Omniverse and Cosmos. Cosmos's prediction and reasoning capabilities support AI-first AV systems that are end-to-end trainable with new methods of development: model distillation, closed-loop training, and synthetic data generation.

First, model distillation. Adapted as a policy model, Cosmos's driving knowledge transfers from a slower, intelligent teacher to a smaller, faster student inferenced in the car. The teacher's policy model demonstrates the optimal trajectory, followed by the student model learning through iterations until it performs at nearly the same level as the teacher. The distillation process bootstraps a policy model, but complex scenarios require further tuning.

Closed-loop training enables fine-tuning of policy models. Log data is turned into 3D scenes for driving closed loop in physics-based simulation using Omniverse neural reconstruction. Variations of these scenes are created to test the model's trajectory-generation capabilities. Cosmos's behavior evaluator can then score the generated driving behavior to measure model performance. Newly generated scenarios and their evaluations create a large data set for closed-loop training, helping AVs navigate complex scenarios more robustly.

Last, 3D synthetic data generation enhances AVs' adaptability to diverse environments. From log data, Omniverse builds detailed 4D driving environments by fusing maps and images, and generates a digital twin of the real world, including segmentation, to guide Cosmos by classifying each pixel. Cosmos then scales the training data by generating accurate and diverse scenarios, closing the sim-to-real gap. Omniverse and Cosmos enable AVs to learn, adapt, and drive intelligently, advancing safer mobility.

NVIDIA is the perfect company to do that. Gosh, that's our destiny: use AI to recreate AI. The technology that we showed you there is very similar to the technology you're enjoying to take you to a digital twin we call NVIDIA. All right, let's talk about data centers. That's not bad, huh? Gaussian splats, just in case: Gaussian splats.

Well, let's talk about data centers. Blackwell is in full production, and this is what it looks like. It's incredible. For people, for us, this is a sight of beauty. Would you agree? How is this not beautiful? How is this not beautiful? Well, this is a big deal.

We made a fundamental transition in computer architecture. I just want you to know that, in fact, I showed you a version of this about three years ago. It was called Grace Hopper, and the system was called Ranger. The Ranger system is maybe about half of the width of the screen, and it was the world's first NVLink 32. Three years ago we showed Ranger working, and it was way too large, but it was exactly the right idea. We were trying to solve scale-up.

Distributed computing is about using a whole lot of different computers working together to solve a very large problem, but there's no replacement for scaling up before you scale out. Both are important, but you want to scale up first before you scale out. Scaling up, however, is incredibly hard; there is no simple answer for it. You're not going to scale it up, you're not going to scale it out, like Hadoop: take a whole bunch of commodity computers, hook them up into a large network, and do in-storage computing. Hadoop was a revolutionary idea, as we know; it enabled hyperscale data centers to solve problems of gigantic sizes using off-the-shelf computers. However, the problem we're trying to solve is so complex that scaling in that way would have simply cost way too much power, way too much energy. Deep learning would have never happened. And so the thing that we had to do was scale up first.

Well, this is the way we scaled up. I'm not going to lift this; this is 70 pounds. This is the last-generation system architecture; it's called HGX. This revolutionized computing as we know it; this revolutionized artificial intelligence. This is eight GPUs, each one of them kind of like this. Okay, this is two GPUs, two Blackwell GPUs in one Blackwell package, and there are eight of these underneath this. And this connects into what we call NVLink 8. This then connects to a CPU shelf, like that, so there are dual CPUs that sit on top, and we connect it over PCI Express; and then many of these get connected with InfiniBand, which turns into what is an AI supercomputer. This is the way it was in the past; this is how we started.

Well, this is as far as we scaled up before we scaled out, but we wanted to scale up even further. I told you that Ranger took this system and scaled it up by another factor of four, and so we had NVLink 32, but the system was way too large. And so we had to do something quite remarkable: re-engineer how NVLink worked and how scale-up worked. The first thing that we did was, we said: listen, the NVLink switches are in this system, embedded on the motherboard. We need to disaggregate the NVLink system and take it out. So this is the NVLink system. This is an NVLink switch. This is the highest-performance switch the world's ever made, and it makes it possible for every GPU to talk to every GPU at exactly the same time at full bandwidth. We disaggregated the NVLink switch, we took it out, and we put it in the center of the chassis.

So there are 18 of these switches in nine different switch trays, as we call them. The switches are disaggregated, and the compute is now sitting in here; this is equivalent, in compute, to these two things. What's amazing is that this is completely liquid cooled, and by liquid cooling it, we can compress all of these compute nodes into one rack. This is the big change of the entire industry. All of you in the audience, I know how many of you are here: I want to thank you for making this fundamental shift from integrated NVLink to disaggregated NVLink, from air cooled to liquid cooled, from 60,000 components per computer or so to 600,000 components per rack, 120 kilowatts, fully liquid cooled. And as a result, we have a one-exaflops computer in one rack. Isn't it incredible?

So this is the compute node, and that now fits in one of these. Now we have 3,000 pounds, 5,000 cables, about two miles' worth: just incredible electronics, 600,000 parts. I think that's like 20 cars, 20 cars' worth of parts, and it integrates into one supercomputer.

Well, our goal is to do this, our goal is to do scale-up, and this is what it now looks like. We essentially wanted to build this chip; it's just that no reticle limit can do this, no process technology can do this. It's 30 trillion transistors, 20 trillion of which are used for computing, so you can't reasonably build this anytime soon. The way to solve this problem is to disaggregate it, as I described, into the Grace Blackwell NVLink 72 rack. But as a result, we have done the ultimate scale-up. This is the most extreme scale-up the world has ever done. The amount of computation that's possible here, the memory bandwidth, 570 terabytes per second: everything in this machine is now in T's, everything's a trillion. And you have an exaflops, which is a million trillion floating-point operations per second.

Well, the reason why we wanted to do this is to solve an extreme problem, and that extreme problem a lot of people misunderstood to be easy. In fact, it is the ultimate extreme computing problem, and it's called inference. The reason for that is very simple: inference is token generation by a factory, and a factory is revenue- and profit-generating, or the lack of it. And so this factory has to be built with extreme efficiency, with extreme performance, because everything about this factory directly affects your quality of service, your revenues, and your profitability.

Let me show you how to read this chart, because I want to come back to it a few more times. Basically, you have two axes. On the x-axis is tokens per second: whenever you chat, when you put a prompt into ChatGPT, what comes out is tokens. Those tokens are reformulated into words; it's more than a token per word. They'll tokenize things like "th," which could be used for "the," could be used for "them," could be used for "theory," could be used for "theatrics," could be used for all kinds of things. So "th" is an example of a token, and they reformulate these tokens to turn them into words.

We've already established that if you want your AI to be smarter, you want to generate a whole bunch of tokens: reasoning tokens, consistency-checking tokens, tokens for coming up with a whole bunch of ideas so it can select the best of those ideas. It might be second-guessing itself; it might ask, is this the best work you could do? It talks to itself, just like we talk to ourselves. The more tokens you generate, the smarter your AI. But if you take too long to answer a question, the customer is not going to come back. This is no different from web search: there is a real limit to how long it can take before it comes back with a smart answer. So you have these two dimensions that you're fighting against: you're trying to generate a whole bunch of tokens, but you're trying to do it as quickly as possible. Therefore, your token rate matters. You want your tokens per second for that one user to be as fast as possible.

However, in computer science and in factories, there's a fundamental tension between latency (response time) and throughput. The reason is very simple: if you're in the large high-volume business, you batch up; it's called batching. You batch up a lot of customer demand, and you manufacture a certain version of it for everybody to consume later. However, from the moment they batched up and manufactured whatever they did to the time that you consumed it, it could take a long time. It's no different for computer science, no different for AI factories that are generating tokens. And so you have these two fundamental tensions: on the one hand, you would like the customer quality of service to be as good as possible, smart AIs that are super fast; on the other hand, you're trying to get your data center to produce tokens for as many people as possible, so you can maximize your revenues.

The perfect answer is the upper right. Ideally, the shape of that curve is a square, so that you could generate very fast tokens per person up until the limits of the factory, but no factory can do that. So it's probably some curve, and your goal is to maximize the area under the curve, the product of x and y; the further you push out, the better the factory you're building. Well, it turns out that between tokens per second for the whole factory and tokens per second of response time, one of them requires an enormous amount of computation, flops, and the other dimension requires an enormous amount of bandwidth and flops. So this is a very difficult problem to solve. The good answer is that you should have lots of flops and lots of bandwidth and lots of memory and lots of everything. That's the best answer to start, which is the reason why this is such a great computer: you start with the most flops you can, the most memory you can, the most bandwidth you can, of course the best architecture you can, the most energy efficiency you can, and you have to have a programming model that allows you to run software across all of this, which is insanely hard, so that you can do this.
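To give that latency-versus-throughput tension a shape, here is a toy model; the service curve below is invented purely to illustrate how batching trades per-user speed for factory throughput:

```python
def per_user_rate(batch_size: int) -> float:
    # Hypothetical service model: more users per batch, slower per user.
    return 1000.0 / (1 + 0.5 * batch_size)

for batch in (1, 8, 64, 256):
    user_tps = per_user_rate(batch)
    factory_tps = batch * user_tps       # total tokens/sec across the batch
    print(f"batch={batch:4d}  user={user_tps:6.1f} tok/s  "
          f"factory={factory_tps:8.1f} tok/s")
```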

Now let's just take a look at this one demo to give you a tactile feeling of what I'm talking about. Please play it.

Traditional LLMs capture foundational knowledge, while reasoning models help solve complex problems with thinking tokens. Here, a prompt asks to seat people around a wedding table while adhering to constraints like traditions, photogenic angles, and feuding family members. A traditional LLM answers quickly, with under 500 tokens; it makes mistakes in seating the guests. The reasoning model thinks with over 8,000 tokens to come up with the correct answer; it takes a pastor to keep the peace.

Okay. As all of you know, if you have a wedding party of 300 and you're trying to find the perfect, well, the optimal, seating for everyone, that's a problem that only AI can solve, or a mother-in-law can solve. And so that's one of those problems that cuOpt cannot solve. So what you see here is that we gave it a problem that requires reasoning, and you saw R1 go off and reason about it; it tries all these different scenarios, it comes back and tests its own answer, and it asks itself whether it did it right. Meanwhile, the last-generation language model does a one-shot. The one-shot is 439 tokens: it was fast, it was effective, but it was wrong, so it was 439 wasted tokens. On the other hand, in order to reason about this problem, and that was actually a very simple problem (give it a few more difficult variables, and it becomes very difficult to reason through), it took 8,000, almost 9,000, tokens, and it took a lot more computation because the model's more complex. Okay, so that's one dimension.

Before I show you some results, let me explain something else. If you look at the Blackwell system, now scaled up into NVLink 72, the first thing we have to do is take this model, and this model is not small. In the case of R1, people think R1 is small, but it's 680 billion parameters; next-generation models could be trillions of parameters. And the way that you solve that problem is you take these trillions and trillions of parameters, this model, and you distribute the workload across the whole system of GPUs. You can use tensor parallel: take one layer of the model and run it across multiple GPUs. You could take a slice of the pipeline, call that pipeline parallel, and put that on multiple GPUs. You could take different experts and put them across different GPUs; we call it expert parallel. The combination of pipeline parallelism and tensor parallelism and expert parallelism: the number of combinations is insane. And depending on the model, depending on the workload, depending on the circumstance, how you configure that computer has to change so that you can get the maximum throughput out of it. Sometimes you optimize for very low latency; sometimes you're trying to optimize for throughput, and so you have to do some in-flight batching, a lot of different techniques for batching and aggregating work. And so the software, the operating system, for these AI factories is insanely complicated.
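As a conceptual sketch of those three partitioning schemes, here is a version using NumPy arrays on one machine in place of real GPUs; it is purely illustrative, since real systems shard across devices and overlap communication:

```python
import numpy as np

x = np.random.rand(4, 512)                 # a batch of activations
W = np.random.rand(512, 512)               # one layer's weights

# Tensor parallel: split one layer's weight matrix across "GPUs";
# each computes a slice of the output, and the slices are concatenated.
shards = np.split(W, 4, axis=1)
y_tensor = np.concatenate([x @ s for s in shards], axis=1)

# Pipeline parallel: split the stack of layers across "GPUs";
# activations hand off from one stage to the next.
layers = [np.random.rand(512, 512) for _ in range(8)]
stage_1, stage_2 = layers[:4], layers[4:]  # each stage on its own "GPU"
h = x
for W_l in stage_1:
    h = np.tanh(h @ W_l)
for W_l in stage_2:
    h = np.tanh(h @ W_l)

# Expert parallel: route each input row to one of several expert layers.
experts = [np.random.rand(512, 512) for _ in range(4)]
routes = np.random.randint(0, 4, size=x.shape[0])
y_expert = np.stack([x[i] @ experts[r] for i, r in enumerate(routes)])

print(y_tensor.shape, h.shape, y_expert.shape)   # all (4, 512)
```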

Well, one of the observations, and this is a really terrific thing about having a homogeneous architecture like NVLink 72, is that every single GPU can do all the things I just described. And we observe that these reasoning models are doing a couple of phases of computing. One of the phases of computing is thinking. When you're thinking, you're not producing a lot of tokens; you're producing tokens that you're maybe consuming yourself. You're thinking; maybe you're reading, you're digesting information. That information could be a PDF, it could be a website; you could literally be watching a video, ingesting all of that at super-linear rates. You take all of that information and then formulate the answer, formulate a planned answer. So that digestion of information, context processing, is very flops intensive.

On the other hand, the next phase is called decode. The first part we call prefill; the next phase, decode, requires floating-point operations, but it requires an enormous amount of bandwidth. And it's fairly easy to calculate: if you have a model with a few trillion parameters, it takes a few terabytes per second (notice I was mentioning 576 terabytes per second) just to pull the model in from HBM memory and to generate literally one token. The reason it generates one token is that, remember, these large language models are predicting the next token; that's why they say "the next token." It's not predicting every single token; it's predicting the next token. Now we have all kinds of new techniques, speculative decoding and all kinds of new techniques, for doing that faster, but in the final analysis, you're predicting the next token. And so you ingest, pull in, the entire model and the context (we call it a KV cache), and then we produce one token. Then we take that one token, we put it back into our brain, and we produce the next token. Every single time we do that, we take trillions of parameters in and we produce one token; trillions of parameters in, produce another token; trillions of parameters in, produce another token. And notice that in that demo we produced 8,600 tokens. So trillions of bytes of information have been taken into our GPUs to produce one token at a time, which is fundamentally the reason why you want NVLink. NVLink gives us the ability to take all of those GPUs and turn them into one massive GPU: the ultimate scale-up.
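Here is the back-of-envelope version of that bandwidth argument, under stated assumptions (an R1-scale parameter count as cited in the talk, FP8 weights, and the rack-level bandwidth figure mentioned): every sequential decode step streams the whole model out of memory, so bandwidth divided by model size caps the single-stream token rate.

```python
params = 680e9               # an R1-scale model, as cited in the talk
bytes_per_param = 1          # assume FP8 weights
model_bytes = params * bytes_per_param

hbm_bandwidth = 570e12       # ~570 TB/s, the NVLink 72 rack figure cited

print(f"~{hbm_bandwidth / model_bytes:,.0f} tokens/s ceiling, per sweep")
# Batching amortizes each sweep of the weights across many users'
# next-token predictions, which is why throughput depends on batch size.
```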

And the second thing is that now that everything is on NVLink, I can disaggregate the prefill from the decode, and I can decide I want to use more GPUs for prefill and fewer for decode, because I'm thinking a lot; it's agentic, I'm reading a lot of information, I'm doing deep research. Notice, during deep research: earlier I was listening to Michael, and Michael was talking about him doing research, and I do the same thing. We go off and we write these really long research projects for our AI, and I love doing that, because, you know, I already paid for it, and I just love making our GPUs work; nothing gives me more joy. So I write it up, and then it goes off and does all this research; it went off to like 94 different websites, it reads all this information, and it formulates an answer and writes the report. It's incredible. During that entire time, prefill is super busy, and it's not really generating that many tokens. On the other hand, when you're chatting with the chatbot, and millions of us are doing the same thing, it is very token-generation heavy; it's very decode heavy. So depending on the workload, we might decide to put more GPUs into decode or, depending on the workload, put more GPUs into prefill.

Well, this dynamic operation is really complicated. I've just now described pipeline parallel, tensor parallel, expert parallel, in-flight batching, disaggregated inferencing, workload management; and then I've got to take this thing called the KV cache, route it to the right GPU, and manage it through all the memory hierarchies. That piece of software is insanely complicated. And so today we're announcing NVIDIA Dynamo. NVIDIA Dynamo does all that. It is essentially the operating system of an AI factory. Whereas in the past, in the way that we ran data centers, our operating system would be something like VMware, and we would orchestrate (and we still do; we're a big user) a whole bunch of different enterprise applications running on top of our enterprise IT, in the future the application is not enterprise IT, it's agents, and the operating system is not something like VMware, it's something like Dynamo. And this operating system is running on top of, not a data center, but an AI factory.

Now, we call it Dynamo for a good reason. As you know, the dynamo was the first instrument that started the last industrial revolution, the industrial revolution of energy: water comes in, electricity comes out. It's pretty fantastic. Water comes in, you light it on fire, turn it to steam, and what comes out is this invisible thing that's incredibly valuable. It took another 80 years to get to alternating current, but the dynamo is where it all started. So we decided to call this operating system, this piece of insanely complicated software, NVIDIA Dynamo. It's open source, and we're so happy that so many of our partners are working with us on it. One of my favorite partners, and I just love them so much, because of the revolutionary work they do and also because Aravind is such a great guy: Perplexity is a great partner of ours in working through this. Okay, so anyhow, really, really great.

Okay, so now we're going to have to wait until we scale up all this infrastructure, but in the meantime we've done a whole bunch of very in-depth simulation. We have supercomputers doing simulations of our supercomputers, which makes sense. And I'm now going to show you the benefit of everything that I've just said. Remember the factory diagram: on the x-axis, excuse me, on the y-axis, is tokens per second of throughput of the factory, and on the x-axis, tokens per second of the user experience. You want super smart AIs, and you want to produce a whole bunch of them.

This is Hopper. It can produce, for each user, about 100 tokens per second. This is eight GPUs connected with InfiniBand, and I'm normalizing it to tokens per second per megawatt. So it's a one-megawatt data center, which is not a very large AI factory, but anyhow, one megawatt. It can produce, for each user, 100 tokens per second, and it can produce, at this level, whatever that happens to be, 100,000 tokens per second for that one-megawatt data center. Or it can produce about 2.5 million tokens per second for that AI factory if it was super batched up and the customer is willing to wait a very long time. Does that make sense? All right, nod. All right, because this is where, you know, every GTC there's the price for entry; you guys know, it's like you get tortured with math. Only at NVIDIA do you get tortured with math.

All right, so with Hopper you get two and a half million. Now, what's that 2.5 million; how do you translate that? Remember, ChatGPT is like $10 per million tokens. $10 per million tokens: let's pretend for a second that the $10 per million tokens is probably down here; okay, I'd probably say it's down here, but let me pretend it's up there. Because 2.5 million times 10, so $25 million per second; does that make sense? That's how you think through it. Or, on the other hand, if it's way down here, then it's 100,000; just divide that by 10, okay, $250,000 per factory per second. And then there are 31 million, 30 million, seconds in a year, and that translates into revenues for that one-megawatt data center. So that's your goal: on the one hand, you would like your token rate to be as fast as possible, so that you can make really smart AIs, and if you have smart AIs, people pay you more money for them. On the other hand, the smarter the AI, the less you can make in volume. A very sensible tradeoff.

tradeoff and this is the curve we're
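The back-of-envelope revenue math goes by quickly on stage, so here it is written out — a minimal sketch using the throughput and price figures quoted in the talk as illustrative inputs, not official numbers.

```python
# Back-of-envelope AI-factory revenue math, as sketched on stage.
# All numbers are illustrative assumptions, not NVIDIA figures.

SECONDS_PER_YEAR = 31_536_000          # ~31 million, as quoted

def yearly_revenue(tokens_per_second: float, usd_per_million_tokens: float) -> float:
    """Revenue/year = throughput * price per token * seconds in a year."""
    usd_per_second = tokens_per_second * usd_per_million_tokens / 1_000_000
    return usd_per_second * SECONDS_PER_YEAR

# One hypothetical 1 MW Hopper factory, at the two operating points described:
# interactive (100k tok/s total) vs. fully batched (2.5M tok/s total).
for label, throughput in [("interactive", 100_000), ("max batch", 2_500_000)]:
    rev = yearly_revenue(throughput, usd_per_million_tokens=10.0)
    print(f"{label:>11}: {throughput:>9,} tok/s -> ${rev:,.0f}/year")
```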

Now, what I'm showing you right now is the fastest computer in the world — Hopper, the computer that revolutionized everything. So how do we make it better? The first thing we do is come up with Blackwell with NVLink 8 — same Blackwell, the same compute node, with NVLink 8 using FP8. So Blackwell is just faster: bigger, more transistors, more everything. But we like to do more than that, so we introduce a new precision. It's not quite as simple as 4-bit floating point, but using 4-bit floating point we can quantize the model and use less energy to do the same work. And when you use less energy to do the same work, you can do more — because remember, one big idea is that every single data center in the future will be power limited. Your revenues are power limited; you can figure out what your revenues are going to be based on the power you have to work with. This is no different from many other industries, and so we are now a power-limited industry; our revenues will be associated with that. Based on that, you want to make sure you have the most energy-efficient compute architecture you can possibly get.

Next, we scale up with NVLink 72. Does that make sense? Look at the difference with NVLink 72 and FP4. And because our architecture is so tightly integrated, we now add Dynamo to it, and Dynamo can extend that even further. Are you following me? Dynamo also helps Hopper, but Dynamo helps Blackwell incredibly. Yep — only at GTC do you get applause for that.

So now, notice those two shiny parts I marked — that's kind of where your max-Q is. That's likely where you'll run your factory operations: you're trying to find the balance between maximum throughput and maximum quality of AI — the smartest AI, and the most of it. That XY intercept is really what you're optimizing for. And if you look underneath those two squares, Blackwell is way, way better than Hopper. And remember, this is not ISO chips; this is ISO power. This is the ultimate Moore's law — this is what Moore's law was always about — and here we are: 25x in one generation, at ISO power. Not ISO chips, not ISO transistors, not ISO anything: ISO power, the ultimate limiter, because there's only so much energy we can get into a data center. And so, within ISO power, Blackwell is 25 times.

Now, here's the fun part — that rainbow. That's incredible. Look at all the different configurations: underneath the Pareto frontier — we call it the Pareto frontier — are millions of points we could have configured the data center to hit. We could have parallelized, split, and sharded the work in a whole lot of different ways, and we found the most optimal answer, which is the Pareto frontier. And each point, by its color, shows you a different configuration — which is why this image says very clearly that you want a programmable architecture that is as homogeneously fungible as possible, because the workload changes so dramatically across the entire frontier. Look: at the top, expert parallel 8, batch of 3,000, disaggregation off, Dynamo off. In the middle, 26% is used for context — so Dynamo is turned on: 26% context, the other 74% not — with a batch of 64, expert parallel 64 on one side and expert parallel 4 on the other. And down here, all the way at the bottom, tensor parallel 16 with expert parallel 4, batch of 2, 1% context. The configuration of the computer is changing across that entire spectrum.
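For intuition, here's a minimal sketch of how a Pareto frontier over serving configurations could be computed. The configuration names and numbers below are invented stand-ins; the real search space of parallelism, batching, and Dynamo settings is vastly larger.

```python
# A minimal sketch of computing a Pareto frontier over serving configurations.
from dataclasses import dataclass

@dataclass
class Config:
    name: str                 # e.g. "EP64 / batch 64 / 26% context"
    tok_s_per_user: float     # interactivity (x-axis)
    tok_s_per_mw: float       # factory throughput (y-axis)

def pareto_frontier(configs: list[Config]) -> list[Config]:
    """Keep configs not dominated on both axes by any other config."""
    frontier = []
    for c in configs:
        dominated = any(
            o is not c
            and o.tok_s_per_user >= c.tok_s_per_user
            and o.tok_s_per_mw >= c.tok_s_per_mw
            for o in configs
        )
        if not dominated:
            frontier.append(c)
    return sorted(frontier, key=lambda c: c.tok_s_per_user)

candidates = [
    Config("EP8 / batch 3000 / Dynamo off", 20, 2_500_000),
    Config("EP64 / batch 64 / 26% context", 150, 1_200_000),
    Config("TP16+EP4 / batch 2 / 1% context", 400, 90_000),
    Config("a dominated point", 100, 80_000),
]
for c in pareto_frontier(candidates):
    print(c.name, c.tok_s_per_user, c.tok_s_per_mw)
```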

And then this is what happens with a given input sequence length — this is kind of a commodity test case, one you can benchmark relatively easily: the input is 1,000 tokens and the output is 2,000. Notice that earlier we showed you a demo where the output was 8,000 or 9,000, so obviously this case is not representative of that one chat; this one is more representative. The goal is to build these next-generation computers for next-generation workloads. So here's an example on a reasoning model — and on a reasoning model, Blackwell is 40 times — 40 times — the performance of Hopper. Straight up. Pretty amazing.

Somebody actually asked why I would say this, but I've said before that when Blackwell starts shipping in volume, you couldn't give Hoppers away — and this is what I mean, and it makes sense. If you're still looking to buy a Hopper, don't be afraid; it's okay. But I'm the chief revenue destroyer — my sales guys are going, "oh no, don't say that." There are circumstances where Hopper is fine. That's the best thing I can say about Hopper: there are circumstances where you're fine. Not many, if I have to take a swing. And that's kind of my point: when the technology is moving this fast, and the workload is so intense, and you're building these things as factories, we'd really like you to invest in the right versions.

Just to put it in perspective: this is what a 100-megawatt factory looks like. Based on Hoppers, a 100-megawatt factory has 45,000 dies and 1,400 racks, and it produces 300 million tokens per second. And then this is what it looks like with Blackwell — you have 8... yeah, I know.

[Applause]

That doesn't make any sense. Okay, we're not trying to sell you fewer — our sales guys are going, "Jensen, you're selling them less" — this is better. And so, anyway: the more you buy, the more you save. And it's even better than that — now, the more you buy, the more you make.

So anyhow — remember, everything is now in the context of AI factories. And although we talk about the chips, you always start from scale-up — the full scale-up, what you can scale up to at the maximum. I want to show you now what an AI factory looks like, but AI factories are so complicated. I just gave you the example of one rack: it has 600,000 parts and weighs 3,000 pounds, and now you've got to take that and connect it with a whole bunch of others. So we are starting to build what we call the digital twin of every data center — before you build a data center, you have to build a digital twin. Let's take a look at this; it's just incredibly beautiful.

The world is racing to build state-of-the-art, large-scale AI factories. Bringing up an AI gigafactory is an extraordinary feat of engineering, requiring tens of thousands of workers — suppliers, architects, contractors, and engineers — to build, ship, and assemble nearly five billion components and over 200,000 miles of fiber, nearly the distance from the Earth to the moon. The NVIDIA Omniverse blueprint for AI factory digital twins enables us to design and optimize these AI factories long before physical construction starts. Here, NVIDIA engineers use the blueprint to plan a one-gigawatt AI factory, integrating 3D and layout data of the latest NVIDIA DGX SuperPODs, advanced power and cooling systems from Vertiv and Schneider Electric, and optimized topology from NVIDIA Air, a framework for simulating network logic, layout, and protocols. This work is traditionally done in silos; the Omniverse blueprint lets our engineering teams work in parallel and collaboratively, letting us explore various configurations to maximize TCO and power usage effectiveness. NVIDIA uses Cadence Reality Digital Twin, accelerated by CUDA and Omniverse libraries, to simulate air and liquid cooling systems, and Schneider Electric uses ETAP, an application that simulates power-block efficiency and reliability. Real-time simulation lets us iterate and run large-scale what-if scenarios in seconds rather than hours. We use the digital twin to communicate instructions to the large body of teams and suppliers, reducing execution errors and accelerating time to bring-up. And when planning for retrofits or upgrades, we can easily test and simulate cost and downtime, ensuring a future-proof AI factory.

This is the first time anybody who builds data— oh, that's so beautiful.

All right, I've got to race here, because it turns out I have a lot to tell you — and if I go a little too fast, it's not because I don't care about you; I just have a lot of information to get through. So first, our roadmap. We're now in full production of Blackwell. Computer companies all over the world are ramping these incredible machines at scale, and I'm just so pleased, and so grateful, that all of you worked hard on transitioning into this new architecture. Now, in the second half of this year, we'll easily transition into the upgrade: Blackwell Ultra NVLink 72. It's one and a half times more flops, it has a new instruction for attention, and it's one and a half times more memory — all that memory is useful for things like the KV cache — plus two times more networking bandwidth. And since we have the same architecture, we'll just gracefully glide into it. That's called Blackwell Ultra, and it's coming in the second half of this year.

Now, there's a reason why — this is the only product announcement at any company where everybody goes, "yeah... next." And in fact, that's exactly the response I was hoping for, and here's why. Look, we're building AI factories and AI infrastructure; it's going to take years of planning. This isn't like buying a laptop. This isn't discretionary spend; it's spend we have to plan on. We have to plan on having the land and the power, we have to get our capex ready, we have to get engineering teams, and we have to lay it out a couple, two, three years in advance — which is the reason I show you our roadmap a couple, two, three years in advance, so that we don't surprise you in May: "hi, in another month we're going to go to this incredible new system." I'll show you an example in a second. So we planned this out over multiple years. The next click, one year out, is named after an astronomer — her grandkids are here. Her name is Vera Rubin; she discovered dark matter. Okay.

Yep. Vera Rubin is incredible, because the CPU is new: it's twice the performance of Grace, with more memory and more bandwidth, and yet it's just a tiny little 50-watt CPU. It's really quite incredible. And Rubin is a brand-new GPU; CX9 is a brand-new networking SmartNIC; NVLink 6 is a brand-new NVLink; HBM4 is brand-new memory. Basically everything is brand new except the chassis, and this way we can take a whole lot of risk in one direction without risking a whole bunch of other things related to the infrastructure. So Vera Rubin NVLink 144 comes in the second half of next year.

things that I made a mistake on and so I

just need you to make this pivot we're

going to do this one

time Blackwell is really two gpus in one

Blackwell chip we call that one chip a

GPU and that was wrong and the reason

for that is it it screws up all the MV

link nomenclature and things like that

so going forward without going back to

Blackwell to fix it going forward when I

say mvlink 144 it just means that it's

connected to 144 gpus and each one of

those gpus is a GPU die and it could be

assembled in some package how it's

assembled could change from time to time

okay and so each GPU dies the GPU each

MV link is connected to the to uh to the

GPU and so very Ruben link 144 and then

this now sets the

stage for the second half of the year

the following year we call Reuben

Ultra okay so ver Ruben

Ultra I

know this one that's where you should

you

All right, so this is Vera Rubin — Rubin Ultra, second half of '27. It's NVLink 576: extreme scale-up. Each rack is 600 kilowatts, with 2.5 million parts — and obviously a whole lot of GPUs. Everything is X-factored up: 14 times more flops — 15 exaflops, instead of the one exaflop I mentioned earlier; now it's 15 exaflops, scaled-up exaflops — and 4.6 petabytes per second, so 4,600 terabytes per second, of scale-up bandwidth. I don't mean aggregate; I mean scale-up bandwidth. And of course there's a brand-new NVLink switch and CX9. Notice: 16 sites, four GPUs in one package, extremely large NVLink. Let me put that in perspective — this is what it looks like.

Okay, now this is going to be fun. You're literally just ramping up Grace Blackwell at the moment, and I don't mean to make it look like a laptop, but here we go: this is what Grace Blackwell looks like, and this is what Rubin looks like — ISO dimension. And this is another way of saying that before you scale out, you have to scale up. First you scale up; then you scale out, with amazing technology I'll show you in just a second.

I'll show you in just a second all right

so first you scale up and then now that

gives you a sense of the pace at which

we're moving this is the amount of scale

up flops this is scale up

flops Hopper is 1x blackw is 68 x Reuben

is 900x scale up flops and then if I

turn it into essentially your TCO which

is power on top power per and the

underneath is the is the area underneath

the curve that I was talking to you

about the square underneath the curve

which is basically flops times bandwidth

okay so the the way you think about a

very easy gut feel gut check on whether

your AI factories are making progress

is Watts divided by those numbers and

you can see that Reuben it's going to

drop the cost down tremendously okay so

that's very quickly nvidia's road

map once a year once a year like like

clock ticks once a year okay how do we
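Here's that gut check written out as a toy calculation. The 1x/68x/900x scale-up flops ratios are from the talk; the bandwidth and power ratios are illustrative placeholders, since the slide's exact figures aren't in the transcript.

```python
# A rough gut-check metric in the spirit of the slide: progress per generation
# measured as watts per unit of (scale-up flops x scale-up bandwidth).
# Flops ratios are from the talk; bandwidth and power ratios are assumptions.

generations = {
    #             flops   bandwidth  power   (all relative to Hopper = 1.0)
    "Hopper":    (1.0,    1.0,       1.0),
    "Blackwell": (68.0,   9.0,       1.6),
    "Rubin":     (900.0,  25.0,      2.2),
}

for name, (flops, bw, watts) in generations.items():
    # Lower is better: how many watts you spend per unit of useful work.
    cost_index = watts / (flops * bw)
    print(f"{name:>9}: cost index = {cost_index:.6f}")
```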

Okay — how do we scale up? Well, we've been preparing to scale out: the scale-up is NVLink, and our scale-out network is InfiniBand and Spectrum-X. Most people were quite surprised that we came into the Ethernet world, and the reason we decided to do Ethernet is that if we could help Ethernet become like InfiniBand — have the qualities of InfiniBand — the network itself would be a lot easier for everybody to use and manage. So we decided to invest in Spectrum; we call it Spectrum-X. We brought to it the properties of congestion control and very low latency, plus the amount of software that's part of our computing fabric, and as a result we made Spectrum-X incredibly high performance. We scaled up the largest single GPU cluster ever as one giant cluster with Spectrum-X — that was Colossus — and there are many other examples. Spectrum-X is unquestionably a huge home run for us. One of the areas I'm very excited about is that Spectrum-X is not just for AI clouds; it also makes it possible for us to help every enterprise become an AI company. And so — was it last week, or the week before? — Chuck Robbins of Cisco and NVIDIA announced a partnership for Cisco, the world's largest enterprise networking company, to take Spectrum-X and integrate it into their product line, so that they can help the world's enterprises become AI companies.

We're at 100,000 with CX7 and CX8; CX8 is coming, and CX9 is coming. During Rubin's time frame, we'd like to scale out the number of GPUs to many hundreds of thousands. Now, the challenge with scaling out to many hundreds of thousands of GPUs is the connection of the scale-out. The connection on scale-up is copper, and we should use copper as far as we can — call it a meter or two. That's incredibly good connectivity: very high reliability, very good energy efficiency, very low cost. So we use copper as much as we can on scale-up. But on scale-out, where data centers are now the size of a stadium, we're going to need something with much longer reach, and this is where silicon photonics comes in. The challenge of silicon photonics has been that the transceivers consume a lot of energy: to go from electrical to photonic, the signal has to go through SerDes, through a transceiver — several SerDes.

So, first of all, we're announcing NVIDIA's first co-packaged optics silicon photonic system. It is the world's first 1.6 terabit-per-second CPO, based on a technology called the micro ring resonator modulator, and it's built entirely with this incredible process technology at TSMC that we've been working with for some time. We partnered with a giant ecosystem of technology providers to invent what I'm about to show you. This is really crazy technology — crazy, crazy technology.

Now, the reason we decided to invest in MRM is so that we could prepare ourselves with MRM's incredible density and power — better density and power than the Mach-Zehnder modulators used for telecommunications. When you drive traffic from one data center to another in telecommunications — and even in the transceivers we use today — we use Mach-Zehnder, because until now the density requirement has not been very high.

So if you look at these transceivers — this is an example of a transceiver. They did a very good job tangling this up for me... oh, wow. Thank you... oh, Mother of God. Okay — this is where you've got to turn reasoning on. It's not as easy as you think; these are squirrelly little things. All right, so this one right here: this is 30 watts. Just remember that — 30 watts — and if you buy it in high volume, it's $1,000.

This is a plug: on this side it's electrical, and on this side it's optical. The optics come in through the yellow fiber; you plug this into a switch, and it's electrical on this side. Inside there are transceivers, lasers, and that technology called Mach-Zehnder — incredible. We use this to go from the GPU to the switch, then to the next switch, then the next switch down, and then down to the GPU. So if we had 100,000 GPUs, we would have 100,000 of these on one side, and then another 100,000 that connect switch to switch, and then on the other side another one attributed to the other NIC. If we had 250,000 GPUs, we'd add another layer of switches, and then every GPU — at 250,000 — would have six transceivers, six of these plugs, and those six plugs would add 180 watts per GPU — 180 watts per GPU, and $6,000 per GPU.

Okay. So the question is: how do we scale up now to millions of GPUs? Because if we had a million GPUs, multiply by six — that's 6 million transceivers, times 30 watts: 180 megawatts of transceivers. And they don't do any math; they just move signals around. So the question is how we could afford this. As I mentioned earlier, energy is our most important commodity; everything is ultimately related to energy, so this would limit our revenues — our customers' revenues — by subtracting out 180 megawatts of power.
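The transceiver arithmetic is worth writing down, since it drives the whole co-packaged-optics argument. The 30 watts, $1,000, and six-transceivers-per-GPU figures are the ones quoted on stage.

```python
# The transceiver power arithmetic from the talk, written out.

WATTS_PER_TRANSCEIVER = 30
USD_PER_TRANSCEIVER = 1_000
TRANSCEIVERS_PER_GPU = 6

def optics_overhead(num_gpus: int) -> tuple[float, float]:
    """Returns (megawatts, USD) spent just on pluggable optics."""
    n = num_gpus * TRANSCEIVERS_PER_GPU
    return n * WATTS_PER_TRANSCEIVER / 1e6, n * USD_PER_TRANSCEIVER

mw, usd = optics_overhead(1_000_000)
print(f"1M GPUs -> {mw:.0f} MW and ${usd:,.0f} in transceivers alone")
# -> 180 MW and $6,000,000,000 of power and cost that produce no tokens.
```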

And so this is the amazing thing we did: we invented the world's first MRM, the micro ring resonator modulator. Here's what it looks like: there's a little waveguide, and that waveguide goes to a ring. The ring resonates, and it controls the reflectivity of the waveguide as the light goes around, limiting and modulating the amount of light that passes through — it shuts the light off by absorbing it, or passes it on. It turns a direct, continuous laser beam into ones and zeros, and that's the miracle. That photonic IC is then stacked with the electronic IC, which is stacked with a whole bunch of micro lenses, which is stacked with this thing called a fiber array. These things are all manufactured at TSMC using a technology they call COUPE, and packaged using 3D CoWoS technology, working with this whole ecosystem of technology providers — the names I showed you earlier — and it all turns into this incredible machine. Let's take a look at the video of it.

[Applause]

[Music]

Just a technology marvel. And they turn into these switches — our InfiniBand switch, where the silicon is working fantastically. We will ship the silicon photonics switch in the second half of this year, and in the second half of next year we'll ship the Spectrum-X version. Because of the MRM choice, and because of the incredible technology risks we took over the last five years — we filed hundreds of patents, and we've licensed the technology to our partners so that we can all build them — we're now in a position to put silicon photonics with co-packaged optics, no transceivers, fiber directly into our switches, with a radix of 512. That's 512 ports; this simply wouldn't be possible any other way. So this now sets us up to scale up to these multi-hundred-thousand-GPU and multi-million-GPU systems. And the benefit — just imagine it — is incredible: in a data center we could save tens of megawatts. Let's say 60 megawatts: 6 megawatts is 10 Rubin Ultra racks, so 60 megawatts — that's a lot — is 100 Rubin Ultra racks' worth of power that we can now deploy into Rubins.

All right, so this is our roadmap: once a year — a new architecture every two years, a new product line every single year, and X-factors up every year. And we try to take silicon risk, networking risk, or system chassis risk in pieces, so that we can move the industry forward as we pursue these incredible technologies. Vera Rubin — I really appreciate her grandkids being here; this is our opportunity to recognize her and honor her for the incredible work she did. Our next generation will be named after Feynman.

Okay — that's NVIDIA's roadmap. Now let me talk to you about enterprise computing; this is really important. In order for us to bring AI to the world's enterprises, first we have to go to a different part of NVIDIA — the beauty of Gaussian splats. In order for us to take AI to enterprise, take a step back for a second and remind yourself of this: AI and machine learning have reinvented the entire computing stack. The processor is different, the operating system is different, the applications on top are different, the way the applications are orchestrated is different, and the way you run them is different.

Let me give you one example: the way you access data will be fundamentally different from the past. Instead of retrieving precisely the data you want and reading it to try to understand it, in the future we'll do what we do with Perplexity. Instead of doing retrieval that way, I just ask Perplexity for what I want — ask it a question — and it tells me the answer. That's the way enterprise IT will work in the future as well. We'll have AI agents, and they're part of our digital workforce. There are a billion knowledge workers in the world; there are probably going to be 10 billion digital workers working with us side by side. 100% of the software engineers of the future — there are 30 million of them around the world — are going to be AI-assisted; I'm certain of that. 100% of NVIDIA's software engineers will be AI-assisted by the end of this year. So AI agents will be everywhere. What enterprises run, and how they run it, will be fundamentally different — and so we need a new line of computers.

computers and this is what started it

all this is the Nvidia

djx1 20 CPU cores

128 gigb of GPU

memory one pedop flops of

computation

$150,000

3,500

Let me now introduce you to the new DGX. This is NVIDIA's new DGX, and we call it DGX Spark. Now, you'll be surprised: 20 CPU cores. We partnered with MediaTek to build this for us, and they did a fantastic job — it's been a great joy working with Rick and the MediaTek team, and I really appreciate their partnership. They built us a chip-to-chip NVLink from CPU to GPU, and the GPU has 128 GB. And this is the fun part: one petaflop. So this is like the original DGX-1 — with Pym Particles. You would have thought that's a joke that would land at GTC. Okay — well, there are 30 million software engineers in the world, and another 10 or 20 million data scientists, and this is now clearly the gear of choice.

Thank you, Janine. Look at this — in every bag, this is what you should find. This is the development platform of every software engineer in the world. If you have a family member — a spouse, somebody you care about — who's a software engineer, or an AI researcher, or a data scientist, and you'd like to give them the perfect Christmas present: tell me this isn't what they want. And so, ladies and gentlemen, today we will reserve the first DGX Sparks for the attendees of GTC — go reserve yours. You already have one of these; now you just have to get one of these.

All right — the next one (thank you, Janine) is also a brand-new computer, one the world has never had before. We're announcing a whole new line of computers: a new personal computer, a new personal workstation. I know, it's crazy — check this out: Grace Blackwell, liquid-cooled.

[Music]

This is what a PC should look like: 20 petaflops — unbelievable — 72 CPU cores, a chip-to-chip interface, HBM memory, and, just in case, some PCI Express slots for your GeForce.

So this is called DGX Station. DGX Spark and DGX Station are going to be available from all of the OEMs — HP, Dell, Lenovo, ASUS — manufactured for data scientists and researchers all over the world. This is the computer of the age of AI: this is what computers should look like, and this is what computers will run in the future. And we have a whole lineup for enterprise now, from the little tiny one to the workstation ones to the server ones to the supercomputer ones, and these will be available from all of our partners.

We will also revolutionize the rest of the computing stack. Remember, computing has three pillars. There's compute — you're looking at it. There's networking — as I mentioned earlier, Spectrum-X is going to the world's enterprises as an AI network. And the third is storage. Storage has to be completely reinvented: rather than a retrieval-based storage system, it's going to be a semantics-based storage system. The storage system has to be continuously embedding information in the background — taking raw data and embedding it into knowledge — and then later, when you access it, you don't retrieve it; you just talk to it. You ask it questions; you give it problems.

One of the examples — I wish we had a video of it — is that Aaron at Box worked with us and even put one up in the cloud. It's basically a super-smart storage system, and in the future you're going to have something like that in every single enterprise; that is the enterprise storage of the future. And we're working with the entire storage industry — really fantastic partners: DDN, Dell, HPE, Hitachi, IBM, NetApp, Nutanix, Pure Storage, VAST, and WEKA. Basically the entire world's storage industry will be offering this stack, and for the very first time your storage system will be GPU-accelerated.
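Here's a minimal sketch of the semantics-based storage idea: continuously embed raw data in the background, then answer by similarity rather than by file path. The embedding function below is a random stand-in so the sketch runs self-contained; a real system would use an actual embedding model and feed the neighbors to an LLM that answers directly.

```python
# A toy "semantic store": ingest embeds documents; ask finds nearest neighbors.
import numpy as np

EMBED_DIM = 384  # illustrative dimension

def embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model (e.g. a NIM or sentence encoder)."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)  # deterministic toy
    v = rng.standard_normal(EMBED_DIM)
    return v / np.linalg.norm(v)

class SemanticStore:
    def __init__(self):
        self.vectors: list[np.ndarray] = []
        self.documents: list[str] = []

    def ingest(self, doc: str) -> None:
        # In a real system this runs continuously in the background.
        self.vectors.append(embed(doc))
        self.documents.append(doc)

    def ask(self, question: str, k: int = 3) -> list[str]:
        # "You don't retrieve it; you just talk to it": the nearest neighbors
        # would be handed to an LLM, which answers the question directly.
        q = embed(question)
        scores = np.array([q @ v for v in self.vectors])
        return [self.documents[i] for i in np.argsort(-scores)[:k]]

store = SemanticStore()
for doc in ["Q3 revenue report", "GPU cluster runbook", "HR onboarding guide"]:
    store.ingest(doc)
print(store.ask("how do I restart the cluster?", k=1))
```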

And then somebody thought I didn't have enough slides. Michael thought I didn't have enough slides, so he said, "Jensen, just in case you don't have enough slides, can I put this in there?" So this is Michael's slide — he sent it to me, saying "just in case you don't have any slides." I've got too many slides. But this is such a great slide, and let me tell you why: in one single slide, he's explaining that Dell is going to be offering a whole line of NVIDIA enterprise AI infrastructure systems, and all the software that runs on top of them. So you can see that we're in the process of revolutionizing the world's enterprise.

We're also announcing today this incredible model that everybody can run. Earlier I showed you R1, a reasoning model, versus Llama 3, a non-reasoning model — and obviously R1 is much smarter. But we can do even better than that: we can make it enterprise-ready for any company. It's now completely open source, and it's part of our system we call NIMs. You can download it and run it anywhere — on DGX Spark, on DGX Station, on any of the servers the OEMs make, in the cloud — and you can integrate it into any of your agentic AI frameworks. We're working with companies all over the world, and I'm going to flip through these, so watch very carefully.

audience want to recognize Accenture

Julie SED and her team are building

their AI Factory and their AI framework

uh AMD dos the world's largest

telecommunication software company uh

AT&T John Stanky and his team uh

building an AT&T AI system agentic

system Larry think and uh Black Rock

team building theirs uh Annie Roode uh

in the future not only will we hire ASC

designers we're going to hire a whole

bunch of digital ASC designers from

anude Cadence that will help us design

our chips and so Cadence is building

their uh AI framework and as you can see

in every single one of them there Nvidia

models Nvidia Nims and viia libraries

integrated throughout so that you can

run it on Prem in the cloud any Cloud uh

Capital One one of the most advanced

financial services companies and using

technology has Nvidia all over it uh

deoe Jason and his team Ian y Janet and

his team NASDAQ and Adena and her team

uh integrating Nvidia technology into

their AI Frameworks and then Chris Jen

and his team at sap uh bill mcder and

his team at service

Now — that was pretty good, huh? This is one of those keynotes where the first slide took 30 minutes, and then every other slide took 30 minutes. All right, so next let's go somewhere else — let's talk about robotics, shall we?

[Music]

Let's talk about robots. Well — the time has come. The time has come for robots. Robots have the benefit of being able to interact with the physical world and do things that digital information otherwise cannot. We know very clearly that the world has a severe shortage of human workers: by the end of this decade, the world is going to be at least 50 million workers short. We'd be more than delighted to pay each of them $50,000 to come to work; we're probably going to have to pay robots $50,000 a year to come to work. And so this is going to be a very, very large industry. There are all kinds of robotic systems. Your infrastructure will be robotic: billions of cameras in warehouses and factories — there are 10 to 20 million factories around the world. Every car is already a robot, as I mentioned earlier. And now we're building general robots. Let me show you how we're doing that.

[Music]

Everything that moves will be autonomous. Physical AI will embody robots of every kind in every industry. Three computers built by NVIDIA enable a continuous loop of robot AI: simulation, training, testing, and real-world experience. Training robots requires huge volumes of data. Internet-scale data provides common sense and reasoning, but robots need action and control data, which is expensive to capture. With blueprints built on NVIDIA Omniverse and Cosmos, developers can generate massive amounts of diverse synthetic data for training robot policies. First, in Omniverse, developers aggregate real-world sensor or demonstration data according to their different domains, robots, and tasks. Then they use Omniverse to condition Cosmos, multiplying the original captures into large volumes of photoreal, diverse data. Developers use Isaac Lab to post-train the robot policies with the augmented data set, and let the robots learn new skills by cloning behaviors through imitation learning, or through trial and error with reinforcement learning from AI feedback.

Practicing in a lab is different from the real world. New policies need to be field-tested. Developers use Omniverse for software- and hardware-in-the-loop testing, simulating the policies in a digital twin with real-world environmental dynamics, domain randomization, physics feedback, and high-fidelity sensor simulation. Real-world operations require multiple robots to work together. Mega, an Omniverse blueprint, lets developers test fleets of post-trained policies at scale. Here, Foxconn tests heterogeneous robots in a virtual NVIDIA Blackwell production facility. As the robot brains execute their missions, they perceive the results of their actions through sensor simulation, then plan their next action. Mega lets developers test many robot policies, enabling the robots to work as a system — whether for spatial reasoning, navigation, mobility, or dexterity.

Amazing things are born in simulation. Today we're introducing NVIDIA Isaac GR00T N1. GR00T N1 is a generalist foundation model for humanoid robots, built on the foundations of synthetic data generation and learning in simulation. GR00T N1 features a dual-system architecture for thinking fast and slow, inspired by principles of human cognitive processing. The slow-thinking system lets the robot perceive and reason about its environment and instructions, and plan the right actions to take. The fast-thinking system translates the plan into precise and continuous robot actions. GR00T N1's generalization lets robots manipulate common objects with ease and execute multi-step sequences collaboratively. And with this entire pipeline of synthetic data generation and robot learning, humanoid robot developers can post-train GR00T N1 across multiple embodiments and tasks, across many environments.

Around the world, in every industry, developers are using NVIDIA's three computers to build the next generation of embodied AI.

[Music]
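The dual-system architecture described for GR00T N1 maps naturally onto a two-rate control loop. Below is a minimal sketch under assumed interfaces; the class names, the 7-DoF action, and the 50-tick inner loop are illustrative, not the model's published API.

```python
# A minimal sketch of a "fast and slow" dual-system robot control loop.

class SlowSystem:
    """System 2: perceives, reasons about the instruction, emits a plan."""
    def plan(self, observation, instruction):
        # In reality: a vision-language model running at a low rate.
        return ["reach(cup)", "grasp(cup)", "place(cup, shelf)"]

class FastSystem:
    """System 1: turns the current plan step into continuous actions."""
    def act(self, observation, plan_step):
        # In reality: a learned action head running at a high rate.
        return [0.0] * 7  # e.g. a 7-DoF joint command (assumed)

class DummyRobot:
    """Stand-in robot so the sketch is runnable end to end."""
    def observe(self): return {}
    def apply(self, action): pass

def control_loop(robot, slow, fast, instruction):
    plan = slow.plan(robot.observe(), instruction)
    for step in plan:
        for _ in range(50):  # fast inner loop per plan step (assumed 50 ticks)
            action = fast.act(robot.observe(), step)
            robot.apply(action)

control_loop(DummyRobot(), SlowSystem(), FastSystem(), "put the cup on the shelf")
```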

Physical AI and robotics are moving so fast — everybody, pay attention to this space. This could very well be the largest industry of all. At its core, we have the same challenges I mentioned before. There are three that we focus on, and they are rather systematic. One: how do you solve the data problem — where do you create the data necessary to train the AI? Two: what's the model architecture? And three: what's the scaling law — how can we scale the data, the compute, or both, so that we can make AIs smarter and smarter? How do we scale? Those fundamental problems exist in robotics as well.

exist in robotics as well in

robotics we created a system called

Omniverse it's our operating system for

physical a you've heard me talk about

Omniverse for a long time we added two

technologies to it today I'm going to

show you two things one of them is so

that we could scale AI with generative

capabilities and generative model that

understand the physical world we call it

Cosmos using

Omniverse to condition Cosmos and using

Cosmos to generate an infinite number of

environments

allows us to create data that is

grounded grounded controlled by

us and yet be systematically infinite at

the same time okay so you see Omniverse

we use candy colors to give you an

example of us controlling the robot in

the scenario perfectly and yet o Cosmos

can create all these virtual
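Here's a toy sketch of that Omniverse-conditions-Cosmos idea: a small number of grounded, fully controlled scenes seeds a generative model that multiplies them into many varied training clips. The function names and fields are hypothetical stand-ins, not either product's API.

```python
# Toy data-multiplication pipeline: few grounded scenes -> many varied clips.
import random

def omniverse_ground_truth_scenes(n: int) -> list[dict]:
    """Controlled scenes: exact robot pose and task are known (the 'candy colors')."""
    return [{"scene_id": i, "robot_pose": (0.0, 0.0, 0.0), "task": "pick_place"}
            for i in range(n)]

def cosmos_generate(scene: dict, variations: int) -> list[dict]:
    """Stand-in for a world model: same grounded layout, varied appearance."""
    return [{**scene,
             "lighting": random.choice(["dawn", "noon", "night"]),
             "texture_seed": v}
            for v in range(variations)]

grounded = omniverse_ground_truth_scenes(10)
dataset = [clip for s in grounded for clip in cosmos_generate(s, variations=100)]
print(len(dataset), "training clips from 10 grounded scenes")  # -> 1000
```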

The second thing: just as we were talking about earlier, one of the incredible scaling capabilities of language models today is reinforcement learning with verifiable rewards. The question is: what is the verifiable reward in robotics? And as we know very well, it's the laws of physics — a verifiable physics reward. So we need an incredible physics engine. Well, most physics engines have been designed for a variety of reasons — they might be designed for large machinery, or maybe for virtual worlds and video games — but we need a physics engine designed for very fine-grained rigid and soft bodies, designed for training tactile feedback, fine motor skills, and actuator control. We need it to be GPU-accelerated, so that these virtual worlds can live in super-linear time — super real time — and train these AI models incredibly fast. And we need it to be integrated harmoniously into a framework used by roboticists all over the world: MuJoCo.
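Here's a minimal sketch of reinforcement learning against a "verifiable physics reward": the simulator itself checks whether the goal was physically achieved. The environment and the random-search "learner" below are deliberately toy-sized stand-ins, not the actual engine's interface.

```python
# RL with a verifiable physics reward: the sim checks if the cube was lifted.
import random

class ToyPhysicsEnv:
    def reset(self):
        self.cube_height = 0.0
        return self.cube_height

    def step(self, action: float):
        # Trivial "physics": lifting force minus gravity, clamped at the table.
        self.cube_height = max(0.0, self.cube_height + action - 0.05)
        reward = 1.0 if self.cube_height > 0.5 else 0.0  # verifiable: did it lift?
        return self.cube_height, reward

env = ToyPhysicsEnv()
best_action, best_return = None, -1.0
for _ in range(200):  # random search as a stand-in for a real RL algorithm
    action = random.uniform(0.0, 0.2)
    env.reset()
    ret = sum(env.step(action)[1] for _ in range(20))
    if ret > best_return:
        best_action, best_return = action, ret
print(f"best action {best_action:.3f} -> return {best_return}")
```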

And so today we're announcing something really, really special: a partnership of three companies — DeepMind, Disney Research, and NVIDIA — and we call it Newton. Let's take a look at Newton.

[Music]

Tell me that wasn't amazing. Hey, Blue! How are you doing? How do you like your new physics engine? You like it, huh? Yeah, I bet. I know — tactile feedback, rigid-body and soft-body simulation, super real time. Can you imagine? What you were just looking at was complete real-time simulation. This is how we're going to train robots in the future. And just so you know, Blue has two computers — two NVIDIA computers — inside. Look how smart you are. Yes, you're smart. Okay, all right. Hey, Blue, listen — how about we take them home? Let's finish this keynote; it's lunchtime. Are you ready? Let's finish it up. We have another announcement. You're good, you're good. Just stand right here. Stand right here. All right, good. Right there. That's good. Stay.

Stan okay we have another amazing

news I told you the progress of our

robotics has been making

progress and today we're announcing that

Groot

N1 is open

I want to thank all of you for coming. Let's wrap up. I want to thank all of you for coming to GTC. We talked about several things. One: Blackwell is in full production, and the ramp is incredible. Customer demand is incredible, and for good reason — there's an inflection point in AI, and the amount of computation we have to do is so much greater as a result of reasoning AI and the training of reasoning AI systems and agentic systems. Two: Blackwell NVLink 72 with Dynamo is 40 times the AI-factory performance of Hopper, and inference is going to be one of the most important workloads in the next decade as we scale out AI. Three: we have an annual rhythm of roadmaps that has been laid out for you, so that you can plan your AI infrastructure. And we are building three AI infrastructures: AI infrastructure for the cloud, AI infrastructure for enterprise, and AI infrastructure for robots.

[Music]

We have one more treat for you. Play it.

[Music]

Thank you, everybody. Thank you to all the partners that made this video possible — thank you, everybody who made this video possible. Have a great GTC. Thank you. Hey, Blue — let's go home. Good job, good little man. Thank you — I love you too. Thank you.
