# GTC March 2025 Keynote with NVIDIA CEO Jensen Huang
By NVIDIA
## Summary
### Key takeaways

- **AI factories are the future of data centers**: Data centers are undergoing a platform shift from general-purpose computing to accelerated computing powered by GPUs, transforming into "AI factories" solely focused on generating tokens for various applications. [03:49], [22:25]
- **Agentic AI requires 100x more computation**: The rise of agentic AI, capable of reasoning, planning, and action, has dramatically increased computational needs, requiring up to 100 times more tokens and processing power than previously estimated. [08:18], [12:56]
- **Blackwell GPU offers a 25x power-efficiency leap**: The new Blackwell GPU architecture, utilizing disaggregated NVLink and liquid cooling, achieves a 25x improvement in performance per watt compared to Hopper, enabling more efficient AI factories. [46:43], [01:16:35]
- **Robots are becoming generalists with AI**: NVIDIA's Newton physics engine and GR00T N1 foundation model are enabling robots to learn and perform complex, multi-step tasks, addressing the global labor shortage and ushering in an era of embodied AI. [01:56:35], [01:59:26]
- **NVIDIA's roadmap accelerates AI infrastructure**: NVIDIA is systematically advancing AI infrastructure with annual architectural updates like Blackwell Ultra and Vera Rubin, alongside networking innovations like Spectrum-X and silicon photonics, to meet the escalating demands of AI. [01:24:25], [01:32:24]
## Topics Covered
- AI's computational demand has exploded 100x due to reasoning.
- Generative AI shifts computing from retrieval to generation.
- Agentic AI requires reasoning, planning, and action capabilities.
- The data center industry is undergoing a platform shift to accelerators.
- Blackwell's architecture enables 25x performance increase through power efficiency.
## Full Transcript
This is how intelligence is made: a new kind of factory, a generator of tokens, the building blocks of AI. Tokens have opened a new frontier, the first step into an extraordinary world where endless possibilities are born. Tokens transform images into scientific data, charting alien atmospheres and guiding the explorers of tomorrow. They turn raw data into foresight, so next time, we'll be ready. Tokens decode the laws of physics, to get us there faster and take us further. Tokens see disease before it takes hold; they help us unravel the language of life and learn what makes us tick. Tokens connect the dots, so we can protect our most noble creatures. They turn potential into plenty, and help us harvest our bounty. Tokens don't just teach robots how to move, but to bring joy, to lend us a hand, and put life within reach. Together, we take the next great leap, to bravely go where no one has gone before. And here is where it all begins.

Welcome to the stage NVIDIA founder and CEO Jensen Huang.
welcome to
GTC what an amazing
year we wanted to do this at
Nvidia so through the magic of
artificial
intelligence we're going to bring you to
nvidia's
headquarters I think I'm bringing you to
Nvidia headquarters
what do you
think this
is this is where we work this is where
we work what an amazing year it was and
we have a lot of incredible things to
talk about and I just want you to know
that I'm up here without a net there are
no scripts there's no teleprompter and
I've got a lot of things to cover so
let's get started first of all I want to
thank all of the sponsors all the
amazing people who are part of this
conference just about every single
industry is
represented: health care is here, transportation, retail, gosh, the computer industry;
everybody in the computer industry is
here and so it's really really terrific
to see all of you and thank you for
sponsoring it
GTC started with GeForce; it all started with GeForce. And today I have here a GeForce 5090. And the 5090, unbelievably, 25 years later, 25 years after we started working on GeForce, GeForce is sold out all over the world. This is the 5090, the Blackwell generation, and comparing it to the 4090,
look how it's 30% smaller in volume it's
30% better at dissipating energy and
incredible performance hard to even
compare and the reason for that is
because of artificial
intelligence. GeForce brought CUDA to the world; CUDA enabled AI, and AI has now come back to revolutionize computer
Graphics what you're looking at is
realtime computer Graphics 100% path
traced for every pixel that's rendered
artificial intelligence predicts the
other 15
think about this for a second for every
pixel that we mathematically rendered
artificial intelligence inferred the
other 15 and it has to do so with so
much Precision that the image looks
right and it's temporally accurate
meaning that from frame to frame to
frame going forward or backwards because
it's computer Graphics it has to stay
temporally stable. Incredible. Artificial
intelligence has made extraordinary
progress it has only been 10 years now
we've been talking about AI for a little
longer than that but AI really came into
the world's Consciousness about a decade
ago started with perception AI computer
vision speech
recognition then generative AI the last
5 years we've largely focused on
generative AI teaching an AI how to
translate from one modality to another
modality: text to image, image to text, text to video, amino acids to proteins, properties to chemicals, all kinds of different ways that we can use AI to generate content.
generative AI fundamentally changed how
Computing is done from a retrieval
Computing model we now have a generative
Computing model whereas almost
everything that we did in the past was
about creating content in advance
storing multiple versions of it and
fetching whatever version we think is
appropriate at the moment of use now ai
understands the context understands what
we're asking understands the meaning of
our request, and generates what it knows. If it needs to, it'll retrieve information, augment its understanding, and generate the answer for us. Rather than retrieving data, it now generates answers:
fundamentally changed how Computing is
done every single layer of computing has
been
transformed. Over the last several years, the last couple, two, three years, a major breakthrough happened, a fundamental advance in artificial intelligence: we call it agentic AI. Agentic AI
basically means that you have an AI that
has agency it can perceive and
understand the context of the
circumstance. It can reason; very importantly, it can reason about how to answer or how to solve a problem. And it can plan and take action. It can use tools, because it now
understands multimodality information it
can go to a website and look at the
format of the website words and videos
maybe even play a
video; it learns from that website, understands it, and comes back and uses that information, that new knowledge, to do its job. Agentic AI:
at the foundation of agentic AI of
course something that's very new
reasoning and then of course the next
wave is already happening we're going to
talk a lot about that today: robotics, which has been enabled by physical AI, AI that understands the physical world. It understands things like friction and inertia, cause and effect, object permanence: when something goes around the corner, it doesn't mean it has disappeared from this universe; it's still there, just not visible. And so that ability to understand the physical world, the three-dimensional world, is what's going to enable a new era of AI we call physical AI, and it's going to
enable robotics each one of these phases
each one of these waves opens up New
Market opportunities for all of us it
brings more and new partners to GTC as a
result GTC is now
jam-packed the only way to hold more
people at GTC is we're going to have to
grow San Jose and and we're working on
it we got a lot of land to work with we
got to grow San
Jose so that we can make GTC bigger. I just, you know, as I'm standing here, I wish all of you could see what I see; we're in the middle of a stadium. And last year was the first year back that we did this live, and it was like a rock concert. GTC was described as the Woodstock of AI, and this year it's described as the Super Bowl of
AI the only difference is everybody wins
at this Super Bowl everybody's a winner
and so every single year more people
come because AI is able to solve more
interesting problems for more Industries
and more companies and this year we're
going to talk a lot about agentic Ai and
physical AI at its
core what enables each wave and each
phase of AI three
fundamental matters are involved. The first is: how do you solve the data problem? And the reason why that's important is because AI is a data-driven computer science approach. It needs data to learn from; it needs digital experience to learn from, to learn knowledge and to gain digital experience. How do you solve the data
problem the second is how do you solve
the training problem without human in
the loop the reason why human in the
loop is fundamentally challenging is
because we only have so much time and we
would like an AI to be able to learn at superhuman rates, at super-real-time
rates and to be able to learn at a scale
that no humans can keep up with and so
the second question is how do you train
the model and the
third is how do you
scale? How do you create, how do you find, an algorithm whereby the more resources you provide, whatever the resource is, the smarter the AI
becomes the scaling law well this last
year this is where almost the entire
world got it
wrong the computation requirement the
scaling law of AI is more
resilient and in fact hyper
accelerated the amount of computation we
need at this point as a result of
agentic AI as a result of reasoning is
easily a hundred times more than we
thought we needed this time last year
and let's reason about why that's true
the first part is let's just go from
what the AI can do let me work
backwards agentic AI as I mentioned at
this Foundation is reasoning we now have
AIS that can reason which is
fundamentally about breaking a problem
down step by step
maybe it approaches a problem in a few
different ways and selects the best
answer. Maybe it solves the same problem in a variety of ways and ensures it has the same answer: consistency checking. Or
maybe after it's done deriving the
answer it plugs it back into the
equation maybe a quadratic equation to
confirm that in fact that's the right
answer, instead of just one-shot blurting it out. Remember, two
years ago when we started working with
ChatGPT, a miracle as it was: many complicated questions and many simple questions it simply couldn't get right, and understandably so. It took one shot at whatever it learned by studying pre-trained data, whatever it saw from other experiences, pre-trained data, and it does a one shot, blurts it out like a savant. Now we have AIs that can
reason step by step by step using a
technology called Chain of Thought best
of n consistency checking a variety of
different path planning, a variety of
different techniques we now have AIS
that can reason break a problem down
reason step by step by step well you
could imagine as a result the number of
tokens we generate and the fundamental
technology of AI is still the same
generate the next token, predict the next token. It's just that the next token now makes up step one; then the next token after that, after
it generates step one that step one has
gone into the input of the AI again as
it generates step two and step three and
step four so instead of just generating
one token or one word after the next, it generates a sequence of words that
represents a step of reasoning the
amount of tokens that's generated as a
result is substantially higher and I'll
show you in a second easily 100 times
more now 100 times more what does that
mean well it could generate a 100 times
more tokens and you can see that
happening as I explained previously
or the model is more complex it
generates 10 times more tokens and in
order for us to keep the model
responsive
interactive, so that we don't lose our patience waiting for it to think,
we now have to compute 10 times faster
and so 10 times the tokens at 10 times the speed: the amount of computation we have to do is easily 100 times more. And so you're going to see this in the rest of the presentation: the amount of computation we have to do for inference is dramatically higher than it used to be.
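To make that token arithmetic concrete, here is a minimal sketch, not NVIDIA's implementation, of how a chain-of-thought loop multiplies token counts; `model`, the step delimiters, and all names here are hypothetical stand-ins for any autoregressive LLM.

```python
# Minimal sketch of why reasoning multiplies token counts. `model` is a
# hypothetical stand-in for any autoregressive LLM: given a token
# sequence, it predicts the next token.

def generate(model, prompt_tokens, max_tokens, stop_token):
    """One-shot decoding: predict the next token, append it, repeat."""
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        next_tok = model(tokens)   # every step re-reads the whole context
        tokens.append(next_tok)    # the output is fed back as input
        if next_tok == stop_token:
            break
    return tokens[len(prompt_tokens):]

def generate_with_reasoning(model, prompt_tokens, num_steps,
                            max_tokens, end_of_step, stop_token):
    """Chain of thought: each reasoning step is itself a generated
    sequence, appended to the context before the next step begins."""
    context = list(prompt_tokens)
    for _ in range(num_steps):              # step 1, step 2, step 3, ...
        step = generate(model, context, max_tokens, end_of_step)
        context += step                     # step N becomes input to step N+1
    return generate(model, context, max_tokens, stop_token)
```

A one-shot answer costs roughly the answer's length in tokens; the reasoning variant also pays for every intermediate step, and each step re-reads a longer context, which is where the 10x-more-tokens times 10x-faster, roughly 100x-compute estimate comes from.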
Well, the question then becomes: how do we teach an AI to do what I just described, how to execute this chain of thought? Well, one method is you have to
teach the AI how to reason and as I
mentioned earlier in training there are
two fundamental problems we have to solve: where does the data come
from where does the data come from and
how do we not have it be limited by
human in the loop there's only so much
data and so much human demonstration we
can perform and so this is the big
breakthrough in the last couple years
reinforcement learning with verifiable rewards: basically, reinforcement learning of an AI as it attacks, or tries to engage in, solving a problem step by step.
well we have many problems that have
been solved in the history of humanity
where we know the
answer. We know the equation of a quadratic and how to solve it; we know how to apply the Pythagorean theorem, the rules of a right triangle. We know many, many rules of math and geometry and logic and science. We have puzzle games that we could give it, constraint-type problems like Sudoku, those kinds of problems, on and on and on.
we have hundreds of these problem spaces
where we can generate millions of
different
examples, and give the AI hundreds of chances to solve them step by step, as we use
reinforcement learning to reward it as
it does a better and better job so as a
result you take hundreds of different
topics millions of different examples
hundreds of different tries each one of
the tries generating tens of thousands
of tokens you put that all together
we're talking about trillions and
trillions of tokens in order to train
that model. And now, with reinforcement learning, we have the ability to generate an enormous amount of tokens: synthetic data generation, basically using a robotic approach to teach an AI.
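As an illustration of the idea just described, here is a toy sketch of reinforcement learning with verifiable rewards on a problem space where answers can be checked mechanically; the `policy` object and its methods are hypothetical stand-ins, not any real training API.

```python
import random

# Toy sketch of "reinforcement learning with verifiable rewards":
# generate synthetic problems whose answers we can verify mechanically,
# let the model try many times, and reward the attempts that check out.
# `policy.sample_solution` / `policy.update` are hypothetical stand-ins.

def make_problem():
    """Synthetic data generation: a quadratic with known roots."""
    r1, r2 = random.randint(-9, 9), random.randint(-9, 9)
    # x^2 - (r1 + r2) x + r1 r2 = 0 has roots r1 and r2
    return {"b": -(r1 + r2), "c": r1 * r2}

def verify(problem, x):
    """Verifiable reward: plug the answer back into the equation."""
    return x * x + problem["b"] * x + problem["c"] == 0

def train(policy, num_problems=1_000_000, tries_per_problem=100):
    for _ in range(num_problems):
        problem = make_problem()
        for _ in range(tries_per_problem):
            steps, answer = policy.sample_solution(problem)  # step-by-step attempt
            reward = 1.0 if verify(problem, answer) else 0.0
            policy.update(problem, steps, reward)  # reinforce good derivations
```

Millions of problems, times hundreds of tries, times tens of thousands of tokens per try, is how the trillions of training tokens mentioned in the talk add up.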
The combination of these two things has put an enormous challenge of computing in front of the industry, and you can see that the industry is responding. What I'm about to show you is Hopper shipments from the top four CSPs, the ones with the public clouds: Amazon, Azure, GCP, and OCI. Just the top four CSPs: not the AI companies, that's not included; not all the startups, not included; not enterprise, not included. A whole bunch of things not included, just those four, just to give you a sense of comparing the peak year of Hopper and the first year of Blackwell. Okay, the peak year of Hopper and the first year of Blackwell, so you can kind of
see that in fact AI is going through an
inflection
point it has become more useful because
it's smarter it can reason it is more
used you can tell it's more used because
whenever you go to ChatGPT these days, it seems like you have to wait
longer and longer and longer which is a
good thing it says a lot of people are
using it with great effect and the
amount of computation necessary to train
those models and to inference those models has grown tremendously. So in just one year, and Blackwell has just started shipping,
in just one year you could see the
incredible growth in AI
infrastructure well that's been
reflected in Computing across the board
we're now seeing, and this, the purple, is the forecast of analysts about the increase of capital expense of the world's data centers, including CSPs and enterprise and so on, through the end of the decade, so 2030. I've said before that I expect
data center buildout to reach a trillion
dollars and I am fairly certain we're
going to reach that very soon. Two dynamics are happening at the same time. The first dynamic is that the vast majority of that growth is likely to be accelerated, meaning we've known for some time that general-purpose computing has run its course and that we need a new computing
approach and the world is going through
a platform
shift from hand-coded software running
on general purpose computers to machine
learning software running on
accelerators and gpus this way of doing
computation is at this Point past this
Tipping Point and we are now seeing the
inflection point happening the
inflection happening in the world's data
center build outs so the first thing is
a transition in the way we do
Computing second is an increase in
recognition that the future of software
requires capital investment now this is
a very big
idea whereas in the past we wrote the
software and we ran it on computers
in the future the computer's going to
generate the tokens for the software and
so the computer has become a generator of tokens, not a retriever of files: from retrieval-based computing to generative-based computing, from the old way of doing data centers to a new way of building this infrastructure. And I call them AI factories. They're AI factories because they have one job and one job only:
generating these incredible tokens that
we then reconstitute into music into
words into videos
into Research into chemicals or proteins
we reconstitute it into all kinds of
information of different types so the
world is going through a transition in
not just the amount of data centers that
will be built but also how it's
built well everything in the data center
will be accelerated not all of its Ai
and I want to say a few words about this
you know this slide this slide this
slide is is uh genuinely my favorite and
the reason for that is because for all
of you who' been coming to GTC uh all of
these years you've been listening to me
talk about these libraries uh this whole
time this this is in fact what GTC is
all about this one slide and in fact a
long time ago 20 years ago this is the
only only slide we had one library after
another library after another the
library you can't just accelerate
software just as we needed an AI
framework in order to create AIS and we
accelerate the AI Frameworks you need
Frameworks for physics and biology and
multiphysics and you know all kinds of
different quantum physics you need all
kinds of libraries and Frameworks we
call them Cuda X libraries acceleration
Frameworks for each one of these fields
of science. And so this first one is incredible: this is cuPyNumeric. NumPy is the number one most downloaded, most used Python library in the world, downloaded 400 million times this last year, and cuPyNumeric is a zero-change, drop-in acceleration for NumPy. So if any of you are using NumPy out there, give cuPyNumeric a try; you're going to love it.
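The drop-in pattern he describes looks roughly like this; the module name follows cuPyNumeric's documented usage, but treat the exact import spelling as an assumption for whatever version you have installed.

```python
# Drop-in acceleration pattern described in the talk: keep the NumPy
# code, swap the import. Module name per cuPyNumeric's docs; treat the
# exact spelling as an assumption if your installed version differs.

# import numpy as np            # original, CPU
import cupynumeric as np        # accelerated, zero code changes below

a = np.random.rand(4096, 4096)
b = np.random.rand(4096, 4096)
c = a @ b                       # dispatched to the GPU
print(c.sum())
```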
cuLitho is a computational lithography library: over the
course of four years we've now taken the
entire process of processing lithography
computational lithography which is the
second Factory in a Fab there's the
factory that manufactures the Wafers and
then there's the factory that
manufactures the information to
manufacture the
Wafers every industry every company that
has factories will have two factories in
the future the factory for what they
build and the factory for the
mathematics the factory for the
AI: a factory for cars and a factory for AIs for the cars, a factory for smart speakers and a factory for AI for the smart speakers. And so cuLitho is our computational lithography. TSMC, Samsung, and ASML are our partners, along with Synopsys and Mentor; incredible support all over. I think this is now at its tipping point: in another five years' time, every mask, every single lithography, will be processed on NVIDIA CUDA. Aerial is our
library for 5G turning a GPU into a 5G
radio why not signal processing is
something we do incredibly well once we
do that, we can layer AI on top of it: AI for RAN, or what we call AI-RAN. The next generation of radio networks will have AI deeply inserted into it. Why is it that we're limited by the limits of information theory? Because there's only so much information spectrum we can get. Not if we add AI to it. cuOpt is numerical or mathematical optimization:
almost every single industry uses this
when you plan seats and flights, inventory and customers, workers and plants, drivers and riders, and so on and so forth, where we have multiple constraints, a whole bunch of variables, and you're optimizing for time, profit, quality of service, usage of resources, whatever it happens to be. NVIDIA uses it for our supply chain management. cuOpt is an incredible library: it takes what would take hours and hours and turns it into seconds. The reason why that's a big deal is that we can now explore a much larger space.
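For a feel of the problem class cuOpt accelerates, here is a tiny constrained-optimization example; it uses SciPy's CPU solver purely to show the shape of the problem (decision variables, constraints, objective), and is not cuOpt's API. The numbers are made up.

```python
from scipy.optimize import linprog

# Illustration of the problem class cuOpt accelerates (SciPy's CPU
# solver is used here only to show the problem's shape; not cuOpt's API).
# Plan production across plants: maximize profit under capacity limits.

profit = [-30, -40, -25]          # linprog minimizes, so negate profit/unit
capacity = [[1, 1, 0],            # plant A builds products 1 and 2
            [0, 1, 1]]            # plant B builds products 2 and 3
hours = [8, 8]                    # hours available per plant

res = linprog(c=profit, A_ub=capacity, b_ub=hours, bounds=(0, None))
print(res.x, -res.fun)            # optimal production plan, total profit
```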
We announced that we are going to open-source cuOpt. Almost everybody is using either Gurobi or IBM CPLEX or FICO; we're working with all three of them. The industry is so excited; we're about to accelerate the living daylights out of the industry. Parabricks, for gene sequencing and gene analysis. MONAI, the world's leading medical imaging library. Earth-2, multiphysics for predicting, in very high resolution, local weather. cuQuantum and CUDA-Q:
we're going to have our
first Quantum day here at GTC we're
working with just about everybody in the
ecosystem either helping them research
on Quantum architectures Quantum
algorithms, or building a classically accelerated quantum heterogeneous architecture; really exciting work there. cuEquivariance and cuTensor, for tensor contraction, quantum chemistry. Of course this stack is world famous. People think that there's one piece of software called CUDA, but in fact on top of CUDA is a whole bunch of libraries
that's integrated into all different
parts of the ecosystem and software and
infrastructure in order to make AI
possible. I've got a new one here to announce today: cuDSS, our sparse solvers, really important for CAE. This is one of the biggest things that has happened in the last year: working with Cadence and Synopsys and Ansys and, well, all of the systems companies, we've now made it possible for just about every important EDA and CAE library to be accelerated. What's amazing is, until
recently Nvidia has been using general
purpose
computers running software super slowly
to design accelerated computers for
everybody else and the reason for that
is because we never had that software
that body of software, optimized for CUDA, until recently. And so now our
entire industry is going to get
supercharged as we move to accelerated
computing. cuDF, a data frame for structured data: we now have a drop-in acceleration for Spark and a drop-in acceleration for pandas. Incredible.
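The pandas drop-in he mentions looks roughly like this with RAPIDS cuDF; the direct `cudf` API mirrors pandas call shapes, and the accelerator mode can run unmodified pandas scripts (details depend on your RAPIDS version).

```python
# Drop-in pandas acceleration as described: cuDF mirrors the pandas API.
# (With RAPIDS installed you can also run `python -m cudf.pandas app.py`
# to accelerate unmodified pandas code; shown here is the direct API.)

import cudf   # RAPIDS GPU DataFrame library

df = cudf.DataFrame({"store": ["a", "a", "b"], "sales": [10, 20, 30]})
print(df.groupby("store")["sales"].sum())   # same call shape as pandas
```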
And then we have Warp, a Python library for physics that runs on CUDA. We have a big announcement there; I'll save it for just a second. This is just a
sampling of the libraries that make
possible accelerated Computing it's not
just Cuda we're so proud of Cuda but if
not for Cuda and the fact that we have
such a large install base none of these
libraries would be useful for any of the
developers who use them for all the
developers that use
them you use it because one it's going
to give you incredible speed up it's
going to give you incredible scale
up and two because the install base of
Cuda is now everywhere it's in every
cloud it's in every data center it's
available from every computer company in
the world; it's literally
everywhere and therefore by using one of
these
libraries your software your amazing
software can reach everyone and so we've
now reached the Tipping Point of
accelerated Computing Cuda has made it
possible and all of you this is what GTC
is about the ecosystem all of you made
this possible and so we made a little
short video for you thank you
to the
creators the Pioneers the Builders of
the
future Cuda was made for
you since
2006 6 million developers in over 200
countries have used Cuda and transformed
Computing with over 900 Cuda X libraries
and AI models you're accelerating
science
reshaping
Industries and giving machines the power
to see learn and
reason now Nvidia Blackwell is 50,000
times faster than the first Cuda
GPU these orders of magnitude gains in
speed and scale are closing the gap
between
simulation and realtime digital twins
[Music]
and for you this is still just the
beginning we can't wait to see what you
do next
[Music]
I love what we do I love even more what
you do with it. And one of the things that most touched me in my 33 years doing this: one scientist said to me, "Jensen, because of your work, I can do my life's work in my lifetime." And boy, if that doesn't touch you, well, you've got to be a
corpse so this is all about you guys
thank you all right so we're going to
talk about
AI. But you know, AI started in the cloud; it started in the cloud for a good reason, because it turns out that AI needs infrastructure. It's machine learning; if the science is machine learning, then you need a machine to do the science. And
so machine learning requires
infrastructure and the cloud data
centers had infrastructure they also
have extraordinary computer science
extraordinary research the perfect
circumstance for AI to take off in the
cloud and the csps but that's not where
AI is limited to AI will go everywhere
and we're going to talk about AI in a
lot of different ways. And the cloud service providers, of course, they like our leading-edge technology; they like
the fact that we have full stack because
accelerated Computing as you know as I
was explaining earlier is not about the
chip it's not even just the chip in the
library the programming model is the
chip the programming model and a whole
bunch of software that goes on top of it
that entire stack is incredibly complex
each one of those layers each one of
those libraries is essentially like SQL
SQL, as you know, is called in-storage computing; it was the big revolution in computation by IBM. SQL is one library; just imagine, I just showed you a whole bunch of them, and in the case of AI there's a whole bunch more. So the stack is complicated. CSPs also love the fact that NVIDIA CUDA developers are CSP customers, because in
the final analysis they're building
infrastructure for the world to use and
so the rich developer ecosystem is
really valued and really really uh
deeply appreciated well now that we're
going to take AI out to the rest of the
world the rest of the world has
different system
configurations operating environment
differences domain specific Library
differences usage differences and so AI
as it translates to enterprise IT, as it
translates to manufacturing as it
translates to robotics or self-driving
cars or
even companies that are starting GPU
clouds there's a whole bunch of
companies maybe 20 of them who started
during the Nvidia time and what they do
is just one thing they host gpus they
call themselves GPU clouds. And one of our great partners, CoreWeave,
is in the process of going public and
we're super proud of them
and so GPU clouds they have their own
requirements but one of the areas that
I'm super excited about is
Edge. And today we announced that Cisco, NVIDIA, T-Mobile (the largest telecommunications company in the world), and Cerberus ODC are going to build a full stack for radio networks here in the United States. And that's going to be
the second stack. So this current stack we're announcing today will put AI into the edge. Remember, a hundred billion dollars of the world's capital investments each year is in the radio networks and all of the data centers provisioning for communications. In the future, there is no question in my mind that's going to be accelerated computing infused with AI. AI will do a far, far better job adapting the radio signals, the massive MIMO, to the changing environments and the traffic conditions. Of course it would; of course we would use reinforcement learning to do that; of course MIMO is essentially one giant radio robot; of course it is. And so we will of course provide for those capabilities; of course AI could revolutionize
communications. You know, when I call home, I don't have to say but a few words, because my wife knows where I work and what the conditions are like; the conversation carries on from yesterday; she kind of remembers what I like and don't like; and oftentimes with just a few words you communicate a whole bunch. The reason for that is because of context and human priors, prior knowledge. Well, combining those capabilities could revolutionize communications. Look what it's doing for video processing; look at
what I just described earlier in 3D
graphics and so of course we're going to
do the same for Edge so I'm super
excited about the announcement that we
made today: T-Mobile, Cisco, NVIDIA, and Cerberus ODC
are going to build a full
stack well AI is going to go into every
industry that's just
one. One of the earliest industries that AI went into was autonomous vehicles. The moment I saw AlexNet, and we've been working on computer vision for a long time, the moment I saw AlexNet was such an inspiring moment, such an exciting
moment it caused us to decide to go all
in on building self-driving cars so
we've been working on self-driving cars
now for over a
decade we build technology that almost
every single self-driving car company
uses it could be either in the data
center for example Tesla uses Nvidia
lots of Nvidia gpus in the data center
it could be in the data center or the
car: Waymo and Wayve use NVIDIA computers
in data centers as well as the car it
could be just in the car it's very rare
but sometimes it's just in the car or
they use all of our software in addition
we work with the car industry however
the car industry would like us to work
with them we build all three computers
the training computer the simulation
computer and the robotics computer the
self-driving car computer all the
software stack that sits on top of it
models and
algorithms just as we do with all of the
other industries that I've demonstrated
and so
today I'm super excited to
announce that GM has selected Nvidia to
partner with them to build their future
self-driving car Fleet
the time for autonomous vehicles has
arrived, and we're looking forward
to building with GM AI in all three
areas AI for manufacturing so they could
revolutionize the way they manufacture
AI for Enterprise so they could
revolutionize the way they work, design cars, and simulate cars; and then also AI in the car. So: AI infrastructure for GM, partnering with GM and building with GM their AI. So I'm super
excited about that one of the areas that
I'm deeply proud of and it rarely gets
any
attention, is safety: automotive safety. It's called Halos in our company. Safety requires technology from silicon to systems to system software, the algorithms, the methodologies, everything from diversity to ensuring diversity, monitoring, transparency,
explainability all of these different
philosophies have to be deeply ingrained
into every single part of how you
develop the system and the software
we're the first company in the world I
believe to have every line of code
safety assessed 7 million lines of code
safety assessed our chip our system our
system software and our algorithms are
safety assessed by Third parties that
crawl through every line of code to
ensure that it is designed to ensure
diversity transparency and
explainability. We've also filed over a thousand patents. And during this GTC, and I really encourage you to do so, go spend time in the Halos workshop, so that you can see all of the different things that come together
to ensure that cars of the future are
going to be safe as well as autonomous
and so this is something I'm very proud
of it barely it rarely gets any
attention and so I I thought I would
spend the extra time this time to talk
about that okay Nvidia
halos all of you have seen cars drive by
themselves; the Waymo robotaxis are
incredible but we made a video to share
with you some of the technology we
use to solve the problems of data and
training and diversity so that we could
use the magic of AI to go create AI
let's take a
look. NVIDIA is accelerating AI development for AVs with Omniverse and
[Music]
Cosmos. Cosmos's prediction and reasoning
capabilities support AI first AV systems
that are endtoend trainable with new
methods of development model
distillation closed loop training and
synthetic data generation first model
distillation. Adapted as a policy model, Cosmos's driving knowledge transfers from a slower, intelligent teacher to a smaller, faster student inferenced
in the
car the teacher's policy model
demonstrates the optimal trajectory
followed by the student model learning
through
iterations until it performs at nearly
the same level as the
teacher the distillation process
bootstraps a policy model but complex
scenarios require further
tuning closed loop training enables
fine-tuning of policy
models log data is turned into 3D scenes
for driving closed loop in physics based
simulation using Omniverse neural
reconstruction variations of these
scenes are created to test a model's
trajectory generation
[Music]
capabilities Cosmos Behavior evaluator
can then score the generated driving
behavior to measure model
performance. Newly generated scenarios and their evaluations create a large dataset for closed-loop training, helping AVs navigate complex scenarios more robustly. Last, 3D synthetic data generation enhances AVs' adaptability to
diverse
environments from log data Omniverse
builds detailed 4D driving environments
by fusing maps and images and generates
a digital twin of the real world
including segmentation to guide Cosmos
by classifying each pixel Cosmos then
scales the training data by generating
accurate and diverse scenarios closing
the Sim to real
Gap Omniverse and Cosmos enable AVS to
learn adapt and drive intelligently
advancing safer Mobility
[Music]
Nvidia is the perfect company to do
that
Gosh, that's our destiny: use AI to recreate AI. The technology that we showed you there is very similar to the technology that you're enjoying to take you to a digital twin we call NVIDIA. All right,
let's talk about data centers
That's not bad, huh? Gaussian splats, just in case: Gaussian splats. Well, let's talk about data
centers uh Blackwell is in full
production and this is what it looks
like. It's incredible; you know, for us this is a sight of beauty. Would you agree? How is this not beautiful? How is
this not beautiful well this is a big
deal
because we made a fundamental transition
in computer architecture I just want you
to know that in fact I've shown you a
version of this uh about 3 years ago it
was called uh Grace Hopper and the
system was called
ranger the ranger system uh is about uh
maybe about half of the width of the
screen and it was the world's first MV
link
32 3 years ago we showed Ranger working
and it
was way too large
but it was exactly the right idea we
were trying to solve scale up
distributed computing is about using a
whole lot of different computers working
together to solve a very large problem
but there's no replacement for scaling
up before you scale out both are
important but you want to scale up first
before you scale out. But scaling up is incredibly hard; there is no simple answer for it. You're not going to scale it up the way you scale it out, like Hadoop: take a whole bunch of commodity computers, hook them up into a large network, and do in-storage computing using Hadoop. Hadoop was a revolutionary idea, as we know; it enabled hyperscale data centers to solve problems of gigantic sizes using off-the-shelf computers. However, the problem we're trying to solve is so complex that scaling in that way would have simply cost way too much power, way too much energy. Deep learning would have never happened. And so the thing
that we had to do was scale up first
Well, this is the way we scaled up. I'm not going to lift this; this is 70 pounds. This, the last-generation system architecture, is called HGX. This revolutionized computing as we know it; this revolutionized artificial intelligence. This is eight GPUs, and each one of them is kind of like this. Okay, this is two GPUs, two Blackwell GPUs, in one Blackwell package, and there are eight of these underneath this. Okay, and this connects into what we call NVLink 8. This then connects to a CPU
shelf like that so there's dual CPUs and
that sits on top and we connect it over
PCI Express and then many of these get
connected with
InfiniBand, which turns into what is an AI supercomputer. This is the way it was in the past; this is how we started. Well, this is as far as we scaled up before we scaled out, but we wanted to scale up even further. And I told you that Ranger took this system and scaled it up by another factor of four, so we had NVLink 32; but the system was way too large, and so we had to do something quite remarkable: re-engineer how NVLink worked and how scale-up worked. And so
the first thing that we did was we said
listen, the NVLink switches are in this system, embedded on the motherboard. We need to disaggregate the NVLink system and take it out. So this is the NVLink system; okay, this is an NVLink switch. This is the highest-performance switch the world's ever made, and it makes it possible for every GPU to talk to every GPU at exactly the same time at full bandwidth. Okay, so this is the NVLink switch. We disaggregated it, we took it out, and we put it in the center of the chassis. So there are 18 of these switches in nine different switch trays, we call them, and then the
switches are disaggregated the compute
is now sitting in here this is
equivalent to these two things in
compute what's amazing is this is
completely liquid cooled and by liquid
cooling it we can
compress all of these compute nodes into
one rack this is the big change of the
entire industry all of you in the
audience I know how many of you are here
I want to thank thank you for making
this fundamental shift: from integrated NVLink to disaggregated NVLink, from air-cooled to liquid-cooled, from 60,000 components per computer or so to 600,000 components per rack, 120 kilowatts, fully liquid cooled. And as a result, we have a one-exaflops computer in one rack. Isn't it
incredible so this is the compute node
this is the compute node. Okay, and that now fits in one of these. Now we have 3,000 pounds, 5,000 cables, about two miles' worth, just incredible electronics: 600,000 parts. I think that's like 20 cars, 20 cars' worth of parts, and it integrates into one supercomputer.
well our goal is to do this our goal is
to do scale up and this is what it now
looks like we essentially wanted to
build this chip; it's just that no reticle limit can do this, no process technology can do this. It's 30 trillion transistors, 20 trillion of which are used for computing, so you can't reasonably build this anytime soon. And so the way to solve this problem is to disaggregate it, as I described, into the Grace Blackwell NVLink 72 rack. But as a result, we have done the ultimate scale-up. This is the most extreme scale-up the world has ever done: the amount of computation that's possible here, the memory bandwidth, 570 terabytes per second; everything in this machine is now in T's, everything's a trillion. And you have an exaflops, which is a million trillion floating-point operations per second. Well, the reason why we wanted to
do
this is to solve an extreme
problem. And that extreme problem, a lot of people misunderstood it to be easy; in fact, it is the ultimate extreme computing problem, and it's called inference. And the reason for that is very simple: inference is token generation by a factory, and a factory is revenue- and profit-generating, or the lack
with
extreme efficiency with Extreme
Performance because everything about
this Factory directly affects
your quality of service your revenues
and your profitability let me show you
how to read this chart because I want to
come back to this a few more times
Basically, you have two axes. On the x-axis is the tokens per second: whenever you chat, when you put a prompt into ChatGPT, what comes out is tokens. Those tokens are reformulated into words; you know, it's more than a token per word. Okay, and they'll tokenize things: "th" could be used for "the", it could be used for "them", it could be used for "theory", it could be used for "theatrics", it could be used for all kinds of things. Okay, and so "th" is an example of a token. They reformulate these tokens to
turn into words well we've already
established that if you want your AI to
be smarter you want to generate a whole
bunch of tokens those tokens are
reasoning tokens consistency checking
tokens coming up with a whole bunch of
ideas so they can select the best of
those ideas tokens and so those tokens
might they it might be second guessing
itself it might be is this the best work
you could do and so it ask it talks to
itself just like we talk to ourselves
and so the more tokens you generate the
smarter your AI
but if you take too long to answer a
question the customer is not going to
come back. This is no different than web search: there is a real limit to how long it can take before it comes back with a smart answer. And so you have these two
Dimensions that you're fighting against
you're trying to generate a whole bunch
of tokens but you're trying to do it as
quickly as possible Therefore your token
rate
matters so you want your tokens per
second for that one user to be as fast
as
possible
However, in computer science and in factories, there's a fundamental tension between latency (response time) and throughput, and the reason is very simple. If you're in a large, high-volume business, you batch up, it's called batching: you batch up a lot of customer demand and you manufacture a certain version of it for everybody to consume later. However, from the moment that they batched it up and manufactured it to the time that you consumed it, it could take a long time. It's no different for computer science, no different for AI factories that are
generating tokens and so you have these
two fundamental tensions on the one hand
you would like the customer quality of
service to be as good as possible smart
AIS that are super fast on the other
hand you're trying to get your data
center to produce tokens for as many
people as possible so you can maximize
your
revenues. The perfect answer is to the upper right. Ideally the shape of that curve is a square, so you could generate very fast tokens per person up until the limits of the factory; but no factory can do that. So it's probably some curve, and your goal is to maximize the area under the curve, okay, the product of x and y; the further you push it out, the better the factory you're building.
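Here is a sketch of the selection he is describing, with made-up candidate points: keep the configurations on the Pareto frontier, then pick the one that maximizes the product of per-user speed and factory throughput.

```python
# Sketch of picking the operating point described in the talk: among
# candidate configurations, keep only the Pareto frontier (no other
# config is both faster per user and higher throughput), then maximize
# the product x * y. The candidate numbers are invented for illustration.

candidates = [                     # (tokens/s per user, factory tokens/s)
    (300, 40_000), (150, 120_000), (60, 400_000), (20, 900_000),
]

def pareto(points):
    return [p for p in points
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p
                       for q in points)]

frontier = pareto(candidates)
best = max(frontier, key=lambda p: p[0] * p[1])
print(frontier, "best operating point:", best)
```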
Well, it turns out that, between tokens per second for the whole factory and tokens per second of response time, one of them requires an enormous amount of computation, flops, and the other dimension requires an enormous amount of bandwidth and flops. And so this is a very difficult problem to solve. The good answer is that you should have lots of flops and lots of bandwidth and lots of memory and lots of everything.
that's the best answer to start which is
the reason why this is such a great
computer you start with the most flops
you can the most memory you can the most
bandwidth you can of course the best
architecture you can the most Energy
Efficiency you can and you have to have
a programming model that allows you to
run software across all of this insanely
hard so that you can do this now let's
just take a look at this one demo to
give you a tactical feeling of what I'm
talking about please play
it traditional llms capture foundational
knowledge while reasoning models help
solve complex problems with thinking
tokens here a prompt asks to seat people
around a wedding table while adhering to
constraints like traditions photogenic
angles and feuding family
members traditional llm answers quickly
with under 500 tokens it makes mistakes
in seating the guests while the
reasoning model thinks with over 8,000
tokens to come up with the correct
answer it takes a pastor to keep the
peace. Okay. As all of you know, if you have a wedding party of 300 and you're trying to find the perfect, well, the optimal seating for everyone, that's a problem that only AI can solve, or a mother-in-law can solve. And so that's one of those problems that cuOpt cannot solve. Okay, so what you see here is that we gave it a problem that requires reasoning, and you saw R1 go off and reason about it: it tries all these different scenarios, and it comes back and tests its own answer; it asks itself whether it did it right. Meanwhile, the last-generation language model does a one-shot. So the one-shot is 439 tokens: it was fast, it was effective, but it was wrong, so it was 439 wasted tokens. On the other hand, in order for
you to reason about this problem, and that was actually a very simple problem, you know, we just give it a few more difficult variables and it becomes very difficult to reason through, it took 8,000, almost 9,000, tokens. And it took a lot more computation, because the model's more complex. Okay, so that's one
dimension before I show you some results
let me just explain something else. So if you look at the Blackwell system, and it's now this scaled-up NVLink 72: the first thing that we have to do is take this model, and this model is not small. You know, in the case of R1, people think R1 is small, but it's 680 billion parameters; next-generation models could be trillions of
parameters. And the way that you solve that problem is you take these trillions and trillions of parameters, this model, and you distribute the workload across the whole system of GPUs. You can use tensor parallelism: take one layer of the model and run it across multiple GPUs. You could take a slice of the pipeline, call that pipeline parallelism, and put that on multiple GPUs. You could take different experts and put them across different GPUs; we call it expert parallelism. The combination of pipeline parallelism and tensor parallelism and expert parallelism, the number of combinations is insane.
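A purely combinatorial sketch of why the configuration space is so large: count the ways a 72-GPU NVLink domain can be factored into tensor-, pipeline-, and expert-parallel groups, crossed with a few batch sizes. Numbers are illustrative, not any scheduler's real search space.

```python
from itertools import product

# Toy illustration of the configuration explosion: ways to factor an
# NVLink-72 rack into tensor- (TP), pipeline- (PP), and expert-parallel
# (EP) groups, crossed with a few batch sizes.

NUM_GPUS = 72
configs = [
    (tp, pp, ep, batch)
    for tp, pp, ep in product(range(1, NUM_GPUS + 1), repeat=3)
    if tp * pp * ep == NUM_GPUS          # every GPU assigned exactly once
    for batch in (1, 8, 64, 512, 3072)
]
print(len(configs), "candidate configurations for one workload")
```

And this is for a single workload with fixed batch choices; add in-flight batching, prefill/decode splits, and per-model differences, and the search space the talk alludes to grows far larger still.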
And depending on the model, depending on the workload, depending on the circumstance, how you configure that computer has to change so that you can get the maximum throughput out of it. You also sometimes optimize for very low latency, sometimes you're trying to optimize for throughput, and so you have to do some in-flight batching: a lot of different techniques for batching and aggregating work. And so the software, the operating system, for these AI factories is insanely complicated. Well,
one of the
observations, and this is a really terrific thing about having a homogeneous architecture like NVLink 72, is that every single GPU could do all the things that I just described. And we observe
that these reasoning models are doing a
couple phases of computing one of the
phases of computing is thinking when
you're thinking you're not producing a
lot of tokens you're producing tokens
that you're maybe consuming yourself
you're thinking maybe you're reading
you're digesting information that
information could be a PDF the
information could be a website you could
literally be watching a video ingesting
all of that at superlinear rates. And you take all of that information and you then formulate the answer, formulate a planned answer. And so that digestion of information, context processing, is very flops-intensive. On the other hand, the
next phase is called decode. So the first part we call prefill; the next phase, decode, requires floating-point operations, but it requires an enormous amount of bandwidth. And it's fairly easy to calculate: you know, if you have a model and it's a few trillion parameters, well, it takes a few terabytes per second (notice I was mentioning 576 terabytes per second), it takes terabytes per second just to pull the model in from HBM memory and to generate literally one
token and the reason it generates one
token is because remember that these
large language models are predicting the
next token that's why they say the next
token it's not predicting every single
token it's predicting the next token now
we have all kinds of new techniques
speculative decoding and all kinds of
new techniques for doing that faster but
in the final analysis you're predicting
the next token. Okay, and so you ingest, you pull in, the entire model and the context, we call it the KV cache, and then
we produce one token and then we take
that one token we put it back into our
brain we produce the next token
every single one every single time we do
that we take trillions of parameters in
we produce one token trillions of
parameters in produce another token
trillions of parameters in produce
another token. And notice in that demo, we produced 8,600 tokens; so trillions of bytes of information have been taken into our GPUs to produce one token at a time. Which is fundamentally the reason why you want NVLink: NVLink gives us the ability to take all of those GPUs and turn them into one massive GPU, the ultimate scale-up.
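The decode arithmetic he sketches can be checked with round numbers; the figures below are illustrative, not official specs.

```python
# Back-of-envelope for why decode is bandwidth-bound, using round numbers
# in the spirit of the talk (assumptions, not official specs). Every
# generated token re-reads all the weights from HBM.

params = 2e12                 # a "few trillion" parameter model
bytes_per_param = 1           # assume 8-bit weights for simplicity
hbm_bandwidth = 576e12        # bytes/s, the NVLink-72 rack figure cited

bytes_per_token = params * bytes_per_param
tokens_per_second = hbm_bandwidth / bytes_per_token
print(f"{tokens_per_second:.0f} tokens/s if decode were purely bandwidth-bound")
# ~288 tokens/s total: hence batching many users per weight-read, and
# hence the value of pooling bandwidth across one giant NVLink domain.
```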
And the second thing is that, now that everything is on NVLink, I can disaggregate the prefill from the decode, and I could decide I want to use more GPUs for prefill, less for decode, because I'm thinking a lot, it's agentic, I'm reading a lot of information, I'm doing deep research. Notice, during deep research: earlier I was
listening to Michael, and Michael was talking about him doing research, and I do the same thing. We go off and write these really long research prompts for our AI, and I love doing that, because, you know, I already paid for it, and I just love making our GPUs work; nothing gives me more joy. So I write it up, and then it goes off and does all this research; it went off to like 94 different websites, I read all this information, and it formulates an answer and writes the report. It's incredible.
okay during that entire time prefill is
super busy and it's not really
generating that many tokens on the other
hand when you're chatting with the
chatbot and millions of us are doing the
same thing, it is very token-generation heavy, it's very decode heavy. Okay, and so depending on the workload, we might decide to put more GPUs into decode; depending on the workload, put more GPUs into prefill. Well, this dynamic operation is really complicated.
So I've just now described pipeline parallelism, tensor parallelism, expert parallelism, prefill, in-flight batching, disaggregated inferencing, workload
management and then I've got to take
this thing called the KV cache I got to
Route it to the right GPU I've got to
manage it through all the memory
hierarchies that piece of software is
insanely complicated and so today we're
announcing the Nvidia
Dynamo Nvidia Dynamo does all that it is
essentially the operating system of an
AI
Factory whereas in the past in the way
that we ran data centers our operating
system would be something like VMware
and we would orchestrate and we still do
um you know we're big user orchestrate a
whole bunch of different Enterprise
applications running on top of our
Enterprise
it but in the future the application is
not Enterprise it it's agents and the
operating system is not something like
VMware it's something like Dynamo and
this operating system is running on top
of not a data center but on top of an AI
Factory now we call it Dynamo for a good
reason as you know the Dynamo was the
first instrument that started the last
Industrial Revolution the industrial
revolution of energy: water comes in, electricity comes out. It's pretty fantastic; you know, water comes in, you light it on fire, it turns to steam, and what comes out is this invisible thing that's incredibly valuable. It took another 80 years to get to alternating current, but the dynamo is where it all started. Okay, so we decided to
call this operating system this piece of
software insanely complicated software
the NVIDIA Dynamo. It's open source, it's open source, and we're so happy that so many of our partners are working with us on it. And one of my favorite partners, I just love them so much because of the revolutionary work they do, and also because Aravind is such a great guy: Perplexity is a great partner of ours in working through this. Okay, so anyhow, really great. Okay, so now we're going to have to
wait until we scale up all these
infrastructure but in the meantime we've
done a whole bunch of very indepth
simulation we have supercomputers doing
simulation of our supercomputers which
makes sense and and I'm now going to
show you the
benefit of everything that I've just
said. And remember the factory diagram: on the y-axis (excuse me) is tokens per second, the throughput of the factory, and on the x-axis, tokens per second of the user experience. And you want super-smart AIs, and you want to produce
a whole bunch of it this is
Hopper. Okay, so this is Hopper, and it can produce, for each user, about 100 tokens per second. This is eight GPUs, and it's connected with InfiniBand, and I'm normalizing it to tokens per second per megawatt. So it's a one-megawatt data center, which is not a very large AI factory, but anyhow, one megawatt. Okay, and so
it can produce for each user 100 tokens
per second and it can produce at this at
this level whatever that happens to be
100,000 tokens per second for that one
megawatt data center or it can produce
about 2 and half million tokens per
second 2 and a half million tokens per
second for that AI Factory if it was
super batched up and the customer is
willing to wait a very long time okay
does that make sense all right so
nod. All right, because this is where, you know, every GTC there's the price for entry; you guys know, it's like you get tortured with math. Okay, only at NVIDIA do you get tortured
with math. All right, so with Hopper you get two and a half. Now what's that two and a half million, how do you translate that 2.5 million? Remember, ChatGPT is like $10 per million tokens, right, $10 per million tokens. Let's pretend for a second (I think the $10 per million tokens is probably down here, I'd probably say it's down here, but let me pretend it's up there): 2.5 million, 10, so 25 million dollars per second, does that make sense? That's how you think through it. Or, on the other hand, if it's way down here, then it's 100,000; just divide that by 10, okay, $250,000 per factory per second. And then there are 31 million, 30 million seconds in a year, and that translates into revenues for that one-megawatt data center.
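The on-stage numbers go by quickly; here is one careful version of the back-of-envelope, using only the figures stated (ChatGPT-style pricing of $10 per million tokens, about 31 million seconds in a year, and the two ends of the Hopper curve).

```python
# Careful version of the back-of-envelope revenue math for a 1 MW
# Hopper factory, using only the figures stated on stage.

price_per_token = 10 / 1e6          # $10 per million tokens
seconds_per_year = 31e6             # "about 31 million seconds in a year"

for tokens_per_sec in (2.5e6, 100e3):   # max-batch vs. fast-interactive ends
    revenue = tokens_per_sec * price_per_token * seconds_per_year
    print(f"{tokens_per_sec:>10,.0f} tok/s -> ${revenue:,.0f}/year per megawatt")
# ~$775M/yr at 2.5M tok/s; ~$31M/yr at 100k tok/s: the curve is the tradeoff
```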
And so that's your goal. On the one hand, you would like your token rate to be as fast as possible, so that you can make really smart AIs, and if you have smart AIs, people pay you more money for them. On the other hand, the smarter the AI, the less you can make in volume. Very sensible tradeoff. And this is the curve we're
trying to bend now what I'm just showing
you right now is the fastest computer in
the world Hopper it's the computer that
revolutionized everything and so how do
we make that better so the first thing
that we do is we come up with Blackwell with NVLink 8: the same Blackwell, that same compute node with NVLink 8, using FP8. And so Blackwell is just faster: faster, bigger, more transistors, more everything. But we like to do more than that, and so we introduce a new precision. It's not quite as simple as 4-bit floating point, but using 4-bit floating point we can quantize the model and use less energy to do the same.
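A toy sketch of the quantization idea: real FP4 uses a floating-point code book with hardware block scales, but even this integer-grid version shows why 4 bits per weight cuts the bytes, and therefore the energy, needed to move the model.

```python
import numpy as np

# Toy sketch of block-scaled 4-bit quantization. Real FP4 uses a
# floating-point code book and hardware scales; this integer-grid
# version just shows the store-small / dequantize-on-use pattern.

def quantize_block(w, levels=15):              # 4 bits -> 16 levels
    scale = np.abs(w).max() / (levels / 2) or 1.0
    q = np.clip(np.round(w / scale), -8, 7)    # stored as 4-bit codes
    return q.astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(16).astype(np.float32)     # one block of weights
q, s = quantize_block(w)
print("max error:", np.abs(w - dequantize(q, s)).max())
# 4x fewer bits than FP16 to move from memory: less energy per token
```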
and as a result when you use less energy
to do the same you could do more because
remember one big idea is that every
single data center in the future will be
power limited your revenues are power
limited you could figure out what your
revenues are going to be based on the
power you have to work
with this is no different than you know
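Here is a toy sketch of the quantization idea behind FP4: fewer bits per value means less data moved and stored per operation. This uses a plain symmetric 4-bit integer grid for illustration only; it is not NVIDIA's actual FP4 format, and the tensor is random.

```python
import numpy as np

# Toy symmetric 4-bit quantization: map floats onto a signed integer grid
# (-7..7) plus one shared scale, then reconstruct. Illustrative only.

def quantize_4bit(w: np.ndarray):
    scale = np.abs(w).max() / 7.0                      # signed 4-bit levels: -7..7
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(8).astype(np.float32)
q, s = quantize_4bit(w)
print("original: ", np.round(w, 3))
print("restored: ", np.round(dequantize(q, s), 3))
print("max error:", float(np.abs(w - dequantize(q, s)).max()))
```

Each stored value shrinks from 32 bits to 4, which is the "less energy to do the same" lever: in a power-limited data center, the saved bits translate directly into more tokens per watt.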
This is no different than many other
industries, and so we are now a
power-limited industry; our revenues
will be associated with that. Based on
that, you want to make sure you have the
most energy-efficient compute
architecture you can possibly get. The next step:
then we scale up with NVLink 72. Does
that make sense? Look at the difference
between that and NVLink 72 with FP4. And
then, because our architecture is so
tightly integrated, we add Dynamo to it,
and Dynamo can extend that even further.
Are you following me? Dynamo also helps
Hopper, but it helps Blackwell
incredibly. Now, yep,
only at GTC do you get applause for
that. And so now, notice those two shiny
parts I marked: that's kind of where
your max-Q is. That's likely where
you'll run your factory operations;
you're trying to find that balance
between maximum throughput and maximum
quality of AI, the smartest AI and the
most of it. That XY intercept is really
what you're optimizing for, and that's
what it looks like. If you look
underneath those two squares, Blackwell
is way, way better than Hopper. And
remember, this is not ISO chips, this is
ISO power. This is the ultimate Moore's
law; this is what Moore's law was always
about in the past, and now here we are:
25x in one generation, at ISO power.
It's not ISO chips, it's not ISO
transistors, it's not ISO anything: ISO
power, the ultimate limiter. There's
only so much energy we can get into a
data center, and so within ISO power,
Blackwell is 25 times better. Now here's
that
rainbow. That's incredible, that's the
fun part. Look at all the different
configurations: everything underneath
the Pareto frontier, we call it the
Pareto frontier. Under the Pareto
frontier are millions of points where we
could have configured the data center.
We could have parallelized, split, and
sharded the work in a whole lot of
different ways, and we found the most
optimal answer, which is the Pareto
frontier. And each one of the points,
because of the color, shows you a
different configuration, which is the
reason why this image says very, very
clearly: you want a programmable
architecture that is as homogeneously
fungible as possible, because the
workload changes so dramatically across
the entire frontier.
And look: we've got, at the top, expert
parallel 8, batch of 3,000,
disaggregation off, Dynamo off. In the
middle, expert parallel 64, with 26%
used for context, so Dynamo is turned
on: 26% context, the other 74% is not,
batch of 64, expert parallel 64 on one
and expert parallel 4 on the other. And
down here, all the way at the bottom,
you've got tensor parallel 16 with
expert parallel 4, batch of 2, 1%
context. The configuration of the
computer is changing across that entire
spectrum.
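One way to picture how such a frontier is found: sweep candidate configurations, score each on the chart's two axes, and keep only the non-dominated points. The sketch below does exactly that; the scoring model and the parameter grid are made-up stand-ins, not NVIDIA's actual scheduler.

```python
from itertools import product

# Hedged sketch of finding a Pareto frontier over serving configurations:
# (tensor parallel, expert parallel, batch size, Dynamo on/off), scored on
# (tokens/s per user, tokens/s per megawatt). Scoring is a toy model.

def score(tp: int, ep: int, batch: int, dynamo: bool):
    per_user = 1000.0 / batch**0.5 * (1.2 if dynamo else 1.0)  # tok/s per user
    factory = per_user * batch * (tp * ep) ** 0.25             # tok/s per MW
    return per_user, factory

def dominates(a, b):
    # a dominates b if it is at least as good on both axes and not identical.
    return a[0] >= b[0] and a[1] >= b[1] and a != b

points = {cfg: score(*cfg) for cfg in product((1, 4, 16),      # tensor parallel
                                              (1, 8, 64),      # expert parallel
                                              (1, 64, 3000),   # batch size
                                              (False, True))}  # Dynamo

frontier = {cfg: s for cfg, s in points.items()
            if not any(dominates(s2, s) for s2 in points.values())}
for cfg, (user_tps, factory_tps) in sorted(frontier.items()):
    print(f"tp={cfg[0]:2d} ep={cfg[1]:2d} batch={cfg[2]:4d} dynamo={cfg[3]!s:5}: "
          f"{user_tps:7.1f} tok/s/user, {factory_tps:9.1f} tok/s/MW")
```

The surviving points are the "rainbow": each color a different configuration, and the point that a fungible, programmable architecture can jump between as the workload moves along the curve.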
And then this is what happens with input
sequence length. This is kind of a
commodity test case, a test case you can
benchmark relatively easily: the input
is 1,000 tokens and the output is 2,000.
Notice that earlier we just showed you a
demo where the output was, very simply,
9,000, right, 8,000? So obviously this
is not representative of just that one
chat; this one is more representative.
And this is, you know, the goal: to
build these next-generation computers
for next-generation workloads. And so
here's an example of a reasoning model,
and on a reasoning model, Blackwell is
40 times, 40 times, the performance of
Hopper. Straight up. Pretty
amazing. You know, I've said before, and
somebody actually asked why I would say
that, but I've said before that when
Blackwell starts shipping in volume, you
couldn't give Hoppers away. And this is
what I mean, and this makes sense. If
you're still looking to buy a Hopper,
don't be afraid, it's okay. But I'm the
Chief Revenue Destroyer; my sales guys
are going, oh no, don't say that. There
are circumstances where Hopper is fine,
and that's the best thing I can say
about Hopper: there are circumstances
where you're fine. Not many, if I have
to take a swing. And so that's kind of
my point: when the technology is moving
this fast, and because the workload is
so intense and you're building these
things, they are factories, we'd really
like you to invest in the right
versions. Okay,
just to put it in perspective, this is
what a 100 megawatt factory looks like.
In a 100 megawatt factory based on
Hoppers, you have 45,000 dies, 1,400
racks, and it produces 300 million
tokens per second. Okay, and then this
is what it looks like with Blackwell:
you have 8... yeah, I know.
[Applause]
That doesn't make any sense. Okay, so
we're not trying to sell you less, okay?
Our sales guys are going, Jensen, you're
selling them less. This is better. And
so, anyways, the more you buy, the more
you save. It's even better than that:
now, the more you buy, the more you
make, you know. And
so anyhow, remember, everything is in
the context of AI factories now. And
although we talk about the chips, you
always start from scale up, the full
scale up: what can you scale up to, to
the maximum? I want to show you now what
an AI factory looks like, but AI
factories are so complicated. I just
gave you an example of one rack: it has
600,000 parts, and it's 3,000 pounds.
Now you've got to take that and connect
it with a whole bunch of others, and so
we are starting to build what we call
the digital twin of every data center.
Before you build a data center, you have
to build a digital twin. Let's take a
look at this; it is just incredibly
beautiful.
The world is racing to build
state-of-the-art, large-scale AI
factories. Bringing up an AI gigafactory
is an extraordinary feat of engineering,
requiring tens of thousands of workers
from suppliers, architects, contractors,
and engineers to build, ship, and
assemble nearly 5 billion components and
over 200,000 miles of fiber, nearly the
distance from the Earth to the moon. The
Nvidia Omniverse blueprint for AI
factory digital twins enables us to
design and optimize these AI factories
long before physical construction
starts. Here, Nvidia engineers use the
blueprint to plan a 1 gigawatt AI
factory, integrating 3D and layout data
of the latest Nvidia DGX SuperPODs,
advanced power and cooling systems from
Vertiv and Schneider Electric, and
optimized topology from Nvidia Air, a
framework for simulating network logic,
layout, and protocols. This work is
traditionally done in silos, but the
Omniverse blueprint lets our engineering
teams work in parallel and
collaboratively, letting us explore
various configurations to maximize TCO
and power usage effectiveness. Nvidia
uses Cadence Reality digital twin,
accelerated by CUDA and Omniverse
libraries, to simulate air and liquid
cooling systems, and Schneider Electric
uses ETAP, an application to simulate
power block efficiency and reliability.
Real-time simulation lets us iterate and
run large-scale what-if scenarios in
seconds versus hours. We use the digital
twin to communicate instructions to the
large body of teams and suppliers,
reducing execution errors and
accelerating time to bring-up, and when
planning for retrofits or upgrades, we
can easily test and simulate cost and
downtime, ensuring a future-proof AI
factory.
This is the first time anybody who
builds data... oh, that's so beautiful.
All right, I've got to race here,
because it turns out I've got a lot to
tell you, and so if I go a little too
fast, it's not because I don't care
about you; it's just that I've got a lot
of information to go through. All right, so
first, our roadmap. We're now in full
production of Blackwell; computer
companies all over the world are ramping
these incredible machines at scale, and
I'm just so pleased and so grateful that
all of you worked hard on transitioning
into this new architecture. And now, in
the second half of this year, we will
easily transition into the upgrade:
Blackwell Ultra NVLink 72. It's one and
a half times more flops, it's got a new
instruction for attention, it's one and
a half times more memory (all that
memory is useful for things like KV
cache), and it's two times more
networking bandwidth. And so, now that
we have the same architecture, we'll
just kind of gracefully glide into that,
and that's called Blackwell Ultra. Okay,
so that's coming in the second half of
this year. Now,
there's a reason why... this is the only
product announcement at any company
where everybody goes, yeah, next. And in
fact, that's exactly the response I was
hoping to get, and here's why. Look,
we're building AI factories and AI
infrastructure; it's going to take years
of planning. This isn't like buying a
laptop. This isn't discretionary spend;
this is spend that we have to plan on.
And so we have to plan on having, of
course, the land and the power, and we
have to get our capex ready, and we get
engineering teams, and we have to lay it
out a couple two, three years in
advance, which is the reason why I show
you our roadmap a couple two, three
years in advance, so that we don't
surprise you in a month: you know, hi,
in another month we're going to go to
this incredible new system. I'll show
you an example in a second. And so we
planned this out over multiple years.
The next click, one year
out, is named after an astronomer, and
her grandkids are here. Her name is Vera
Rubin; she discovered evidence for dark
matter. Vera Rubin is incredible,
because the CPU is new: it's twice the
performance of Grace, with more memory
and more bandwidth, and yet it's just a
little tiny 50-watt CPU. It's really
quite incredible. And Rubin, a brand-new
GPU; CX9, a brand-new networking
SmartNIC; NVLink 6, a brand-new NVLink;
brand-new memory, HBM4. Basically
everything is brand new except for the
chassis, and this way we can take a
whole lot of risk in one direction
without risking a whole bunch of other
things related to the infrastructure.
And so Vera Rubin NVLink 144 is the
second half of next year. Now, one of the
things that I made a mistake on, and so
I need you to make this pivot with me,
we're going to do this one time:
Blackwell is really two GPUs in one
Blackwell chip. We called that one chip
a GPU, and that was wrong, and the
reason is that it screws up all the
NVLink nomenclature and things like
that. So going forward, without going
back to Blackwell to fix it, when I say
NVLink 144, it just means it's connected
to 144 GPUs, and each one of those GPUs
is a GPU die. It could be assembled in
some package, and how it's assembled
could change from time to time. And so
each GPU die is a GPU, each NVLink is
connected to the GPU, and so it's Vera
Rubin NVLink 144. And then this now sets
the
stage for the following year, which we
call Rubin Ultra. Okay, so Vera Rubin
Ultra, I know... this one, that's where
you should go. All right, so this is
Vera Rubin, Rubin Ultra, second half of
'27. It's NVLink 576: extreme scale up.
Each rack is 600 kW, 25 million parts,
and obviously a whole lot of GPUs, and
everything is X-factored up: 14 times
more flops, 15 exaflops, instead of the
one exaflop I mentioned earlier, now 15
exaflops of scaled-up compute. And it's,
what, 4.6 petabytes, so 4,600 terabytes
per second, of scale-up bandwidth; I
don't mean aggregate, I mean scale-up
bandwidth. And of course a brand-new
NVLink switch and CX9. So notice: 16
sites, four GPUs in one package, an
extremely large NVLink. I'll just
put that in perspective: this is what it
looks like. Okay, now this is going to
be fun. You are literally ramping up
Grace Blackwell at the moment, and I
don't mean to make it look like a
laptop, but here we go: this is what
Grace Blackwell looks like, and this is
what Rubin looks like. ISO dimension.
And so this is another way of saying
that before you scale out, you have to
scale up. Does that make sense? Before
you scale out, you scale up, and then
after that you scale out, with amazing
technology that I'll show you in just a
second. All right, so first you scale
up. And now that gives you a sense of
the pace at which we're moving. This is
the amount of scale-up flops, this is
scale-up
flops: Hopper is 1x, Blackwell is 68x,
and Rubin is 900x scale-up flops. And
then if I turn it into essentially your
TCO, which is power on top and,
underneath, the area under the curve I
was talking to you about, the square
under the curve, which is basically
flops times bandwidth: a very easy
gut-check on whether your AI factories
are making progress is watts divided by
those numbers. And you can see that
Rubin is going to drop the cost down
tremendously. Okay, so that's, very
quickly, Nvidia's roadmap: once a year,
once a year, like clock ticks, once a
year.
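Here is a gut-check sketch of that TCO proxy: watts divided by flops times bandwidth. The 1x / 68x / 900x flops ratios are the ones quoted on stage; the bandwidth and power ratios below are illustrative placeholders, not NVIDIA figures.

```python
# TCO proxy from the talk: relative cost per unit of work ~ watts / (flops x bw).
# Flops ratios are the on-stage 1x / 68x / 900x; bw and power are placeholders.

generations = {
    # name:      (flops, bandwidth, power), all relative to Hopper = 1.0
    "Hopper":    (1.0,   1.0,  1.0),
    "Blackwell": (68.0,  4.0,  2.0),   # placeholder bandwidth/power ratios
    "Rubin":     (900.0, 12.0, 3.0),   # placeholder bandwidth/power ratios
}

for name, (flops, bw, watts) in generations.items():
    print(f"{name:10s} relative cost per unit of work: {watts / (flops * bw):.6f}")
```

Even with made-up bandwidth and power numbers, the shape of the result is the point: the denominator grows much faster than the watts, which is why each generation drops the cost per token so sharply.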
Okay, how do we scale out? Well, we were
preparing to scale out: scale up is
NVLink, and our scale-out network is
InfiniBand and Spectrum-X.
Most were quite surprised that we came
into the Ethernet world, and the reason
we decided to do Ethernet is that if we
could help Ethernet become like
InfiniBand, have the qualities of
InfiniBand, then the network itself
would be a lot easier for everybody to
use and manage. So we decided to invest
in Spectrum; we call it Spectrum-X. We
brought to it the properties of
congestion control and very low latency,
and the amount of software that's part
of our computing fabric, and as a result
we made Spectrum-X incredibly high
performance. We scaled up the largest
single GPU cluster ever as one giant
cluster with Spectrum-X, and that was
Colossus. And there are many other
examples of it; Spectrum-X is
unquestionably a huge home run for us.
One of the areas I'm very excited about
is that Spectrum-X is not just for AI
clouds: Spectrum-X also makes it
possible for us to help every enterprise
become an AI company. And so, was it
last week or the week before, Chuck
Robbins and Cisco and Nvidia announced a
partnership for Cisco, the world's
largest enterprise networking company,
to take Spectrum-X and integrate it into
their product line, so that they could
help the world's enterprises become AI
companies. We're at 100,000
with CX7; now CX8 is coming and CX9 is
coming, and during Rubin's timeframe we
would like to scale out the number of
GPUs to many hundreds of thousands. Now,
the challenge of scaling out GPUs to
many hundreds of thousands is the
connection of the scale out. On scale
up, the connection is
copper. We should use copper as far as
we can, and that's, call it, a meter or
two. That's incredibly good
connectivity, very high reliability,
very good energy efficiency, and very
low cost, so we use copper as much as we
can on scale up. But on scale out, where
the data centers are now the size of a
stadium, we're going to need something
much longer-distance, and this is where
silicon photonics comes in. The
challenge of silicon photonics has been
that the transceivers consume a lot of
energy: to go from electrical to
photonic, you have to go through a
SerDes, through a transceiver, through
several SerDes. Okay, so first of all,
we're
announcing Nvidia's first co-packaged
optics silicon photonic system. It is
the world's first 1.6 terabit per second
CPO. It is based on a technology called
the micro ring resonator modulator, and
it is completely built with this
incredible process technology at TSMC
that we've been working with for some
time. We partnered with a giant
ecosystem of technology providers to
invent what I'm about to show you. This
is really crazy technology, crazy crazy
technology. Now,
the reason we decided to invest in MRM
is so that we could prepare ourselves
with MRM's incredible density and power,
better density and power compared to
the Mach-Zehnder, which is used for
telecommunications when you drive from
one data center to another. Even in the
transceivers that we use, we use
Mach-Zehnder, because the density
requirement hasn't been very high, until
now. And so if you look at these
transceivers, this is an example of a
transceiver. They did a very good job
tangling this up for me.
oh
wow thank
[Music]
you oh Mother of
God. Okay, this is where you've got to
turn reasoning on; it's not as easy as
you think, these are squirrelly little
things. All right, so this one right
here: this is 30 watts. Just so you
remember: 30 watts, and if you buy it in
high volume, it's $1,000. This is a
plug; on this side it's electrical, on
this side it's optical. The optics come
in through the yellow, you plug this
into a switch, and it's electrical on
this side; there are transceivers,
lasers, and a technology called
Mach-Zehnder in there. Incredible. And
so we use this to go from the GPU to the
switch, to the next switch, and then the
next switch down, and then the next
switch down to the GPU, for
example. And so, for each one of these:
if we had 100,000 GPUs, we would have
100,000 of this side, and then another
100,000 that connect switch to switch,
and then on the other side, attribute
that to the other NIC. If we had
250,000, we'd add another layer of
switches, and so at 250,000, every GPU
would have six transceivers, six of
these plugs, and those six plugs would
add 180 watts per GPU, 180 watts per
GPU, and $6,000 per GPU. So the question
is, how do we scale up now to millions
of GPUs? Because if we had a million
GPUs, multiply by six: that's 6 million
transceivers times 30 watts, 180
megawatts of transceivers. And they
don't do any math; they just move
signals around. So the question is, how
could we afford it? As I mentioned
earlier, energy is our most important
commodity; everything is ultimately
related to energy, so this would limit
our revenues, our customers' revenues,
by subtracting out 180 megawatts of
power.
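As a back-of-envelope check of that arithmetic, here is the transceiver overhead at each scale. The 30 watts and $1,000 per pluggable, and six pluggables per GPU in the three-layer switch topology, are the figures quoted on stage.

```python
# Transceiver overhead, using the on-stage figures: 30 W and $1,000 per
# pluggable, six pluggables per GPU in a three-layer switch topology.

WATTS_PER_TRANSCEIVER = 30
DOLLARS_PER_TRANSCEIVER = 1_000
TRANSCEIVERS_PER_GPU = 6

def optics_overhead(num_gpus: int):
    n = num_gpus * TRANSCEIVERS_PER_GPU
    return n, n * WATTS_PER_TRANSCEIVER / 1e6, n * DOLLARS_PER_TRANSCEIVER

for gpus in (100_000, 250_000, 1_000_000):
    n, megawatts, dollars = optics_overhead(gpus)
    print(f"{gpus:>9,} GPUs: {n:>9,} transceivers, {megawatts:6.1f} MW, ${dollars:,}")
```

At one million GPUs, the last line reproduces the 180 megawatts he quotes: power spent entirely on moving signals, not on computing tokens.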
And so this is the amazing thing that we
did: we invented the world's first MRM,
micro ring modulator, and this is what
it looks like. There's a little
waveguide, and that waveguide goes to a
ring that resonates. The ring controls
the reflectivity of the waveguide as the
light goes around, limiting and
modulating the amount of light that goes
through; it shuts the light off by
absorbing it, or passes it on, turning a
direct, continuous laser beam into ones
and zeros. That's the miracle. And
that technology is then stacked: the
photonic IC is stacked with the
electronic IC, which is then stacked
with a whole bunch of micro lenses,
which is stacked with this thing called
a fiber array. These things are all
manufactured using a technology at TSMC
they call COUPE, and packaged using 3D
CoWoS technology, working with all of
these technology providers, a whole
bunch of them, the names I just showed
you earlier, and it turns into this
incredible machine. So let's take a look
at the video of it.
[Applause]
[Music]
[Music]
[Music]
[Music]
Just a technology marvel. And they turn
into these switches, our InfiniBand
switches; the silicon is working
fantastically. In the second half of
this year we will ship the silicon
photonics switch, and in the second half
of next year we'll ship the Spectrum-X
one. Because of the MRM choice, and
because of the incredible technology
risks we took over the last five years,
we filed hundreds of patents, and we've
licensed them to our partners so that we
can all build them. Now we're in a
position to put silicon photonics with
co-packaged optics, no transceivers,
fiber directly into our switches, with a
radix of 512. This is the 512 ports;
this would simply not be possible any
other way. And so this now sets us up to
scale up to these
multi-hundred-thousand-GPU and
multi-million-GPU systems. And the
benefit, just so you can imagine it, is
incredible: in a data center we could
save tens of megawatts. Let's say 10
megawatts, well, let's say 60 megawatts:
6 megawatts is 10 Rubin Ultra racks, and
60 megawatts, that's a lot, is 100 Rubin
Ultra racks of power that we can now
deploy into Rubins. All right, so this
is our roadmap: once a year. A new
architecture every two years, a new
product line every single year,
X-factors up, and we try to take silicon
risk, or networking risk, or system
chassis risk, in pieces, so that we can
move the industry forward as we pursue
these incredible technologies. Vera
Rubin: I really appreciate the grandkids
being here; this is our opportunity to
recognize her and to honor her for the
incredible work that she did. Our next
generation will be named after Feynman.
Okay, that's Nvidia's roadmap. Let me
talk to you about enterprise computing;
this is really important. In order for
us to bring AI to the world's
enterprises, first we have to go to a
different part of Nvidia. The beauty of
Gaussian
splats. Okay, in order for us to take AI
to the enterprise, take a step back for
a second and remind yourself of this:
remember, AI and machine learning have
reinvented the entire computing stack.
The processor is different, the
operating system is different, the
applications on top are different, the
way the applications run is different,
the way you orchestrate them is
different. Let me give you one example:
the way you access data will
be fundamentally different than in the
past. Instead of retrieving precisely
the data you want and reading it to try
to understand it, in the future we will
do what we do with Perplexity: instead
of doing retrieval that way, I'll just
ask Perplexity what I want, ask it a
question, and it will tell you the
answer. This is the way enterprise IT
will work in the future as well. We'll
have AI agents that are part of our
digital workforce. There are a billion
knowledge workers in the world, and
there are probably going to be 10
billion digital workers working with us
side by side. 100% of software engineers
in the future, and there are 30 million
of them around the world, 100% of them
are going to be AI-assisted; I'm certain
of that. 100% of Nvidia's software
engineers will be AI-assisted by the end
of this year. And so AI agents will be
everywhere. How they run, what
enterprises run, and how we run it will
be fundamentally different, and so we
need a new line of computers. And this
is what started it
all: this is the Nvidia DGX-1. 20 CPU
cores, 128 GB of GPU memory, one
petaflop of computation, $150,000, 3,500
watts. Let me now introduce you to the
new DGX. This is Nvidia's new DGX, and
we call it DGX Spark. DGX Spark. Now,
you'll be
surprised: 20 CPU cores. We partnered
with MediaTek to build this for us, and
they did a fantastic job; it's been a
great joy working with Rick and the
MediaTek team, and I really appreciate
their partnership. They built us a
chip-to-chip NVLink, CPU to GPU, and now
the GPU has 128 GB. And this is fun: one
petaflop. So this is like the original
DGX-1, with Pym particles. You would
have thought that's a joke that would
land at GTC. Okay, well, here's 30
million: there
are 30 million software engineers in the
world, and, you know, 10, 20 million
data scientists, and this is now clearly
the gear of choice. Thank you, Janine.
Look at this: in every bag, this is what
you should find, right? This is the
development platform of every software
engineer in the world. If you have a
family member, a spouse, somebody you
care about who's a software engineer or
an AI researcher or, you know, just a
data scientist, and you would like to
give them the perfect Christmas present:
tell me this isn't what they want, huh?
And so, ladies and gentlemen, today we
will reserve the first DGX Sparks for
the attendees of GTC, so go reserve
yours. You already have one of these, so
now you just have to get one of these.
All right, the next. So, that's... thank
you, Janine. The next one is also a
brand-
new computer, one that the world has
never had before, so we're announcing a
whole new line of computers. This is a
new personal computer, a new personal
workstation. I know, it's crazy, check
this out: Grace Blackwell, liquid
[Music]
cooled. This is what a PC should look
like. 20 petaflops, unbelievable. 72 CPU
cores, a chip-to-chip interface, HBM
memory, and, just in case, some PCI
Express slots for your GeForce. Okay, so
this is called DGX Station. DGX Spark
and DGX Station are going to be
available from all of the OEMs: HP,
Dell, Lenovo, ASUS. They're going to be
manufactured for data scientists and
researchers all over the world. This is
the computer of the age of AI; this is
what computers should look like, and
this is what computers will run in the
future. And we have a whole lineup for
enterprise now, from the little tiny one
to workstation ones, to server ones, to
supercomputer ones, and these will be
available from all of our partners. We
will
also revolutionize the rest of the
computing stack. Remember, computing has
three pillars. There's computing, and
you're looking at it. There's
networking, as I mentioned earlier:
Spectrum-X, going to the world's
enterprises, an AI network. And the
third is storage. Storage has to be
completely reinvented:
rather than a retrieval-based storage
system, it's going to be a
semantics-based retrieval system, a
semantics-based storage system. The
storage system has to be continuously
embedding information in the background,
taking raw data and embedding it into
knowledge, and then later, when you
access it, you don't retrieve it: you
just talk to it. You ask it questions,
you give it problems.
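A minimal sketch of that "semantics-based" storage idea: documents are embedded as they land, and a query is answered by vector similarity rather than by path-based retrieval. The embedder below is a deterministic toy so the example runs without a model; only identical text matches, whereas a real system would use a learned embedding model so paraphrases match too.

```python
import numpy as np

# Toy semantic store: embed on ingest, answer queries by similarity search.
# The hash-seeded "embedder" is a stand-in, not a real embedding model.

def embed(text: str, dim: int = 64) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

store: dict[str, np.ndarray] = {}   # built continuously in the background

def ingest(doc_id: str, text: str) -> None:
    store[doc_id] = embed(text)

def ask(question: str, top_k: int = 1) -> list[str]:
    q = embed(question)
    ranked = sorted(store, key=lambda doc_id: -float(store[doc_id] @ q))
    return ranked[:top_k]

ingest("q3_report", "Q3 revenue grew on data center demand")
ingest("hr_policy", "vacation accrual policy for employees")
print(ask("Q3 revenue grew on data center demand"))   # -> ['q3_report']
```

The plumbing is the point: nothing in the query path names a file or a path; the store ranks its own contents against the question, which is why such a system also needs to be GPU accelerated.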
One example, and I wish we had a video
of it: Aaron at Box worked with us and
even put one up in the cloud. It's
basically, you know, a super smart
storage system, and in the future you're
going to have something like that in
every single enterprise. That is the
enterprise storage of the future, and
we're working with the entire storage
industry, really fantastic partners:
DDN, Dell, HPE, Hitachi, IBM, NetApp,
Nutanix, Pure Storage, VAST, and WEKA.
Basically the entire world's storage
industry will be offering this stack,
and for the very first time your storage
system will be GPU accelerated. And so
somebody thought I didn't have enough
slides. Michael thought I didn't have
enough slides, so he said, Jensen, just
in case you don't have enough slides,
can I just put this in there? So this is
Michael's slide; he sent it to me
saying, just in case you don't have any
slides. I've got too many slides, but
this is such a great slide, and let me
tell you why:
in one single slide, he's explaining
that Dell is going to be offering a
whole line of Nvidia enterprise IT AI
infrastructure systems, and all the
software that runs on top of them. So
you can see that we're in the process of
revolutionizing the world's enterprise.
We're also announcing today this
incredible model that everybody can run.
I showed you earlier R1, a reasoning
model, versus Llama 3, a non-reasoning
model, and obviously R1 is much smarter.
But we can do even better than that, and
we can make it enterprise-ready for any
company. It's now completely open source
and part of our system we call NIMs: you
can download it and run it anywhere. You
can run it on DGX Spark, you can run it
on DGX Station, you can run it on any of
the servers that the OEMs make, you can
run it in the cloud, and you can
integrate it into any of your agentic AI
frameworks.
And we're working with companies all
over the world; I'm going to flip
through these, so watch very carefully.
I've got some great partners in the
audience I want to recognize. Accenture:
Julie Sweet and her team are building
their AI factory and their AI framework.
Amdocs, the world's largest
telecommunications software company.
AT&T: John Stankey and his team are
building an AT&T agentic AI system.
Larry Fink and the BlackRock team are
building theirs. Anirudh: in the future,
not only will we hire ASIC designers,
we're going to hire a whole bunch of
digital ASIC designers from Anirudh's
Cadence to help us design our chips, and
so Cadence is building their AI
framework. As you can see, in every
single one of them there are Nvidia
models, Nvidia NIMs, and Nvidia
libraries integrated throughout, so you
can run it on-prem or in any cloud.
Capital One, one of the most advanced
financial services companies in the use
of technology, has Nvidia all over it.
Deloitte, with Jason and his team. EY,
with Janet and her team. NASDAQ, with
Adena and her team, integrating Nvidia
technology into their AI frameworks. And
then Christian and his team at SAP, and
Bill McDermott and his team at
ServiceNow. That was pretty good, huh?
This is one of those keynotes where the
first slide took 30 minutes and then all
the other slides took 30 minutes. All
right, so next, let's go somewhere else.
Let's talk about robotics, shall
[Music]
we? Let's talk about robots. Well, the
time has come, the time has come for
robots. Robots have the benefit of being
able to interact with the physical world
and do things that digital information
otherwise cannot. We know very clearly
that the world has a severe shortage of
human workers: by the end of this
decade, the world is going to be at
least 50 million workers short. We'd be
more than delighted to pay them each
$50,000 to come to work; we're probably
going to have to pay robots $50,000 a
year to come to work. And so this is
going to be a very, very large industry.
There are all kinds of robotic systems.
Your infrastructure will be robotic,
with billions of cameras in warehouses
and factories, and there are 10, 20
million factories around the world.
Every car is already a robot, as I
mentioned earlier, and now we're
building general robots. Let me show you
how we're doing
[Music]
that. Everything that moves will be
autonomous. Physical AI will embody
robots of every kind, in every industry.
Three computers built by Nvidia enable a
continuous loop of robot AI: simulation,
training, testing, and real-world
experience. Training robots requires
huge volumes of data; internet-scale
data provides common sense and
reasoning, but robots need action and
control data, which is expensive to
capture. With blueprints built on Nvidia
Omniverse and Cosmos, developers can
generate massive amounts of diverse
synthetic data for training robot
policies. First, in Omniverse,
developers aggregate real-world sensor
or demonstration data according to their
different domains, robots, and tasks,
then use Omniverse to condition Cosmos,
multiplying the original captures into
large volumes of photoreal, diverse
data. Developers use Isaac Lab to
post-train the robot policies with the
augmented data set, letting the robots
learn new skills by cloning behaviors
through imitation learning, or through
trial and error with
reinforcement-learning AI feedback.
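Here is a toy sketch of the imitation-learning half of that post-training step: behavior cloning fits a policy to (observation, action) demonstration pairs. The linear policy and the synthetic demonstrations below are stand-ins for illustration, not the actual Isaac Lab pipeline.

```python
import numpy as np

# Toy behavior cloning: fit a linear policy to demonstration pairs by
# gradient descent on mean-squared error. Data and policy are synthetic.

rng = np.random.default_rng(0)
obs = rng.standard_normal((1000, 16))                 # synthetic observations
expert = rng.standard_normal((16, 4))                 # hidden "demonstrator"
actions = obs @ expert + 0.01 * rng.standard_normal((1000, 4))  # demo actions

w = np.zeros((16, 4))                                 # policy parameters
for _ in range(500):                                  # gradient descent on MSE
    grad = obs.T @ (obs @ w - actions) / len(obs)
    w -= 0.1 * grad

print("cloning error:", float(np.mean((obs @ w - actions) ** 2)))
```

Reinforcement learning is the complementary path mentioned in the same breath: instead of matching demonstrations, the policy improves against a reward signal from trial and error in simulation.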
Practicing in a lab is different than
the real world; new policies need to be
field-tested. Developers use Omniverse
for software- and hardware-in-the-loop
testing, simulating the policies in a
digital twin with real-world
environmental dynamics, domain
randomization, physics feedback, and
high-fidelity sensor simulation.
Real-world operations require
multiple robots to work together. Mega,
an Omniverse blueprint, lets developers
test fleets of post-trained policies at
scale. Here, Foxconn tests heterogeneous
robots in a virtual Nvidia Blackwell
production facility. As the robot brains
execute their missions, they perceive
the results of their actions through
sensor simulation, then plan their next
action. Mega lets developers test many
robot policies, enabling the robots to
work as a system, whether for spatial
reasoning, navigation, mobility, or
dexterity. Amazing things are born in
simulation. Today we're introducing
Nvidia Isaac Groot N1. Groot N1 is a
generalist foundation model for humanoid
robots, built on the foundations of
synthetic data generation and learning
in
simulation. Groot N1 features a
dual-system architecture for thinking
fast and slow, inspired by principles of
human cognitive processing. The
slow-thinking system lets the robot
perceive and reason about its
environment and instructions, and plan
the right actions to take. The
fast-thinking system translates the plan
into precise and continuous robot
actions. Groot N1's generalization lets
robots manipulate common objects with
ease and execute multi-step sequences
collaboratively.
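An illustrative sketch of that dual-system idea: a slow System-2 planner emits subgoals at low frequency, while a fast System-1 controller emits actions at high frequency toward the current subgoal. The rates, names, and interfaces below are assumptions for illustration, not Groot N1's actual architecture.

```python
# Dual-system "thinking fast and slow" control loop, sketched with stand-ins.

def slow_planner(instruction: str) -> list[str]:
    # Stand-in for the slow reasoning pass that would plan from vision
    # and language; here it just returns a fixed subgoal sequence.
    return ["walk to table", "grasp cup", "hand cup to person"]

def fast_controller(subgoal: str, tick: int) -> str:
    # Stand-in for the fast policy that would emit continuous joint actions.
    return f"tick {tick}: action toward '{subgoal}'"

for subgoal in slow_planner("bring me the cup"):   # slow loop: per subgoal
    for tick in range(3):                           # fast loop: many ticks each
        print(fast_controller(subgoal, tick))
```

The design choice the split captures: the expensive reasoning pass runs rarely, while the cheap reactive controller runs continuously, so the robot stays smooth even when the planner is slow.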
And with this entire pipeline of
synthetic data generation and robot
learning, humanoid robot developers can
post-train Groot N1 across multiple
embodiments and tasks, across many
environments. Around the world, in every
industry, developers are using Nvidia's
three computers to build the next
generation of embodied AI.
[Music]
Physical AI and robotics are moving so
fast; everybody, pay attention to this
space. This could very well be the
largest industry of all. At its core, we
have the same challenges I mentioned
before. There are three that we focus
on, and they are rather systematic. One:
how do you solve the data problem, and
where do you create the data necessary
to train the AI? Two: what's the model
architecture? And three: what's the
scaling law? How can we scale either the
data, the compute, or both, so that we
can make AIs smarter and smarter? How do
we scale? Those fundamental problems
exist in robotics as well. In
robotics, we created a system called
Omniverse; it's our operating system for
physical AI. You've heard me talk about
Omniverse for a long time, and today we
added two technologies to it; I'm going
to show you two things. One of them is
so that we can scale AI with generative
capabilities: a generative model that
understands the physical world, which we
call Cosmos. Using Omniverse to
condition Cosmos, and using Cosmos to
generate an infinite number of
environments, allows us to create data
that is grounded, controlled by us, and
yet systematically infinite at the same
time. Okay, so you see Omniverse: we use
candy colors to give you an example of
us controlling the robot in the scenario
perfectly, and yet Cosmos can create all
these virtual
environments. The second thing, just as
we were talking about earlier: one of
the incredible scaling capabilities of
language models today is reinforcement
learning with verifiable rewards. The
question is, what is the verifiable
reward in robotics? And as we know very
well, it's the laws of physics:
verifiable physics rewards.
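To illustrate what a "verifiable physics reward" means: the simulator itself scores an action against the laws of physics, so no human labeling is needed. The toy task below (pick a launch angle whose ballistic range lands near a target) is an assumption for illustration, not a Newton API.

```python
import numpy as np

# A verifiable physics reward: physics, not a human labeler, scores the action.

G = 9.81  # m/s^2

def landing_distance(speed: float, angle_rad: float) -> float:
    return speed**2 * np.sin(2 * angle_rad) / G     # flat-ground range formula

def physics_reward(speed: float, angle_rad: float, target: float = 10.0) -> float:
    return -abs(landing_distance(speed, angle_rad) - target)  # label-free score

# Crude policy search: evaluate candidate angles, keep the best-rewarded one.
best_reward, best_angle = max((physics_reward(12.0, a), a)
                              for a in np.linspace(0.1, 1.4, 50))
print(f"best angle: {best_angle:.2f} rad, reward: {best_reward:.3f}")
```

Replace the range formula with a full rigid- and soft-body simulator and you have the training signal he's describing, which is exactly why the physics engine itself becomes the bottleneck.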
And so we need an incredible physics
engine. Well, most physics engines have
been designed for a variety of reasons:
they could be designed for large
machinery, or maybe for virtual worlds,
video games, and such. But we need a
physics engine designed for very
fine-grained rigid and soft bodies,
designed for training tactile feedback,
fine motor skills, and actuator
controls. We need it GPU accelerated, so
that these virtual worlds can live in
super-linear time, super real time, and
train these AI models incredibly fast.
And we need it integrated harmoniously
into a framework that roboticists all
over the world use: MuJoCo. And so today
we're announcing
something really, really special. It is
a partnership of three companies,
DeepMind, Disney Research, and Nvidia,
and we call it Newton. Let's take a look
at Newton.
[Music]
Tell me that wasn't amazing.
Hey, Blue, how are you doing? How do you
like your new physics engine? You like
it, huh? Yeah, I bet. I know: tactile
feedback, rigid body and soft body
simulation, super real time. Can you
imagine? What you were just looking at
is complete real-time simulation. This
is how we're going to train robots in
the future. Just so you know, Blue has
two computers, two Nvidia computers,
inside. Look how smart you are. Yes,
you're smart. Okay, all right, hey Blue,
listen: how about we take them home,
let's finish this keynote, it's
lunchtime. Are you ready? Let's finish
it up. We have another announcement.
You're good, you're good. Just stand
right here, stand right here, stand
right here. All right, good. Right
there. That's good. All right, stand.
Okay, we have more amazing
news. I told you our robotics effort has
been making progress, and today we're
announcing that Groot N1 is
open-sourced. Let's wrap up. I want to
thank all
of you for coming to GTC. We talked
about several things. One: Blackwell is
in full production, and the ramp is
incredible. Customer demand is
incredible, and for good reason, because
there's an inflection point in AI: the
amount of computation we have to do is
so much greater as a result of reasoning
AI and the training of reasoning AI and
agentic systems. Second: Blackwell
NVLink 72 with Dynamo is 40 times the AI
factory performance of Hopper, and
inference is going to be one of the most
important workloads in the next decade
as we scale out AI. Third: we have an
annual rhythm of roadmaps that has been
laid out for you, so that you can plan
your AI infrastructure. And then we have
three AI infrastructures: AI
infrastructure for the cloud, AI
infrastructure for enterprise, and AI
infrastructure for
[Music]
robots. We have one more treat for you.
Play it.
[Music]
[Music]
[Music]
[Music]
[Music]
[Music]
[Music]
[Music]
[Music]
Thank you, everybody. Thank you to all
the partners that made this video
possible; thank you, everybody who made
this video possible. Have a great GTC.
Thank you. Hey, Blue, let's go home.
Good job. Good little man. Thank you, I
love you too. Thank you.