NVIDIA’s New AI’s Movements Are So Real It’s Uncanny
By Two Minute Papers
Summary
## Key takeaways
- **Motion Imitation: The Force & Torque Problem**: Copying human movement in computer programs is impossible because motion capture data shows what to do but not how to apply the necessary forces and torques at every joint and time instance. [00:07], [00:38]
- **DeepMimic: Motion Imitation as a Video Game**: DeepMimic turned motion imitation into a game where virtual characters learned to mimic motion capture data by maximizing scores for joint angles and contacts through endless retries. [00:48], [01:20]
- **DeepMimic's Limitation: Manual Score Tuning**: DeepMimic required extensive manual tuning of hundreds of score counters for each joint and movement, making it difficult to adapt to new motions or body types. [02:36], [03:02]
- **ADD: AI Judge for Automatic Motion Imitation**: The ADD system uses an AI judge to automatically determine what a perfect performance looks like, eliminating the need for manual score tuning and improving motion realism. [03:43], [04:08]
- **ADD vs. DeepMimic: Performance Comparison**: While initial tests showed similar results, ADD significantly outperformed DeepMimic and other methods in complex movements like parkour jumps and climbing, producing fluid and believable motions. [04:39], [06:08]
- **ADD's Versatility and Limitations**: ADD works on various body morphologies, controls robots, and can perform diverse behaviors, though it may struggle with extremely flashy tricks, sometimes causing the AI to give up. [06:53], [08:46]
Topics Covered
- Why Copying Human Motion Is "Impossible."
- Hand-Tuning AI Motion: A PhD Nightmare?
- Why AI Judges Outperform Manual Motion Tuning.
- Why Groundbreaking AI Research Goes Unnoticed.
- The "First Law of Papers": Predicting AI's Evolution.
Full Transcript
There is one thing that I really want. What I want is a digital character to move exactly like a
human. Well, just copy it then, right? Do a little motion capture with the sensors on the human body,
record it running, jumping, reading papers, and then, copy it within a computer program. Well,
unfortunately, that is impossible. Okay, why? Well, this motion capture data shows you what
you need to do, but it does not tell you how. You see, these virtual characters
have muscles and joints, and to copy these movements, you would need to come up with
the forces and torques exerted everywhere at every time instance to be able to mimic it.
Oh man, that is hard. Really hard.
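To make the "what versus how" gap concrete, here is a minimal sketch for a single joint. It assumes a toy pendulum model (a point mass on a rigid, massless limb), which is not from the video or the paper; the function name and all parameters are hypothetical. The recording only provides angles, and the torque has to be worked out from the physics:

```python
import math

def required_torques(angles, dt, mass=1.0, length=1.0, gravity=9.81):
    """Inverse dynamics for one pendulum joint (toy model, an assumption).

    angles: recorded joint angles in radians, measured from hanging straight down.
    Returns the torque needed at each interior sample to follow the recording.
    """
    inertia = mass * length ** 2
    torques = []
    for i in range(1, len(angles) - 1):
        # Central finite difference approximates the angular acceleration.
        accel = (angles[i + 1] - 2.0 * angles[i] + angles[i - 1]) / dt ** 2
        # Torque must produce the acceleration and also cancel gravity.
        gravity_torque = mass * gravity * length * math.sin(angles[i])
        torques.append(inertia * accel + gravity_torque)
    return torques

# "Motion capture" of a limb held horizontally: the angle never changes,
# yet the joint still has to push against gravity the whole time.
held_pose = [math.pi / 2] * 5
torques = required_torques(held_pose, dt=0.01)
```

Even this motionless pose needs a constant torque that the recording never mentions. A full character has many coupled joints plus ground contacts, so no simple closed-form formula like this one is available.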
But, this amazing paper from 2018 could already do that. It was called DeepMimic,
and goodness, this…am I seeing correctly? This really is able to match the reference motions
superbly. Is that a word? I don’t know. I don’t care. We talked about this work
approximately 500 videos ago. Yes I heard you. I heard what you just said. And the answer is yes,
we’ve been around that long. Almost 1,000 paper videos now.
Okay, DeepMimic. This worked by turning motion imitation into a video game where every joint,
angle, and contact had its own little score counter. The controller played
this game over and over, tweaking its moves through endless retries,
until it learned how to rack up the maximum score. Which is perfect imitation
of the motion capture performance. And boy, does it look perfect in places.
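The retry loop can be sketched in a few lines. This is a deliberately simplified stand-in: it uses random hill climbing on a single pose, whereas DeepMimic trains a controller with reinforcement learning inside a physics simulation. The score function, the reference pose, and every name below are hypothetical:

```python
import math
import random

def imitation_score(pose, reference):
    """Score in (0, 1]; 1.0 means the pose matches the reference exactly."""
    error = sum((p - r) ** 2 for p, r in zip(pose, reference))
    return math.exp(-error)

def train_by_retrying(reference, attempts=2000, step=0.1, seed=0):
    rng = random.Random(seed)
    best = [0.0] * len(reference)          # start from a neutral pose
    best_score = imitation_score(best, reference)
    history = [best_score]
    for _ in range(attempts):
        # Tweak the current best guess a little and replay the "game".
        candidate = [x + rng.gauss(0.0, step) for x in best]
        score = imitation_score(candidate, reference)
        if score > best_score:             # keep only tweaks that score higher
            best, best_score = candidate, score
        history.append(best_score)
    return best, history

reference_pose = [0.4, -1.2, 0.7]          # hypothetical joint angles in radians
pose, history = train_by_retrying(reference_pose)
```

The best score never goes down from one attempt to the next, which is the "rack up the maximum score" idea in miniature.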
But it got better, it worked on a bunch of different body morphologies as well. And you
could even do the favorite pastime of the computer graphics researcher: throwing boxes at it, for how
long? Until it collapses of course. You could even do some art direction where you would ask it to
dance a bit more vigorously. Yes, more life, more life, more energy! Oh baby. It looks terrible but
I still love it. And now, you’re tired after doing all this kung fu. Yup, that one checks out too.
So, is this work perfect? It sure seems so! Well,
it is not. Here is a dirty little secret, but don’t tell anyone.
So here’s the problem: every single one of those little score counters in DeepMimic
had to be designed and tuned by hand. You had to decide exactly how many points to give for
matching a knee angle, how harshly to punish a wobbly torso, and how much to care about foot
placement versus balance. Change the motion, or even switch to a robot body, and suddenly
all your scores are wrong again. You spend days turning invisible dials just to get something
that doesn’t collapse into a breakdancing mess. Just to get a flavor of the paper,
you have to optimize these: joint rotations, velocities, root velocity, end effectors that
describe where hands and feet should go, and center of mass. Yum yum yum. Tastes great,
but man, that’s a lot to optimize. And you have to do it by hand, manually? Oof. This system is
held together by duct tape and the tears of PhD students. There’s got to be a better way!
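A rough sketch of what those hand-tuned dials might look like, assuming the common pattern of a weighted sum of exponentiated tracking errors. The five term names follow the list above; every weight and scale is an illustrative assumption, not a value taken from the paper:

```python
import math

# The "invisible dials": one weight and one error scale per score counter.
# Every number below is an illustrative assumption, not a value from the paper.
WEIGHTS = {
    "joint_rotation": 0.5,
    "joint_velocity": 0.1,
    "root_velocity": 0.1,
    "end_effector": 0.2,
    "center_of_mass": 0.1,
}
SCALES = {
    "joint_rotation": 2.0,
    "joint_velocity": 0.1,
    "root_velocity": 1.0,
    "end_effector": 40.0,
    "center_of_mass": 10.0,
}

def tracking_reward(errors, weights=WEIGHTS, scales=SCALES):
    """Weighted sum of per-term scores; each score is exp(-scale * squared_error)."""
    return sum(
        weights[name] * math.exp(-scales[name] * errors[name])
        for name in weights
    )

perfect = {name: 0.0 for name in WEIGHTS}
sloppy_feet = dict(perfect, end_effector=0.05)   # hands and feet slightly off

reward_perfect = tracking_reward(perfect)
reward_sloppy = tracking_reward(sloppy_feet)

# Turn one dial: care less about hand and foot placement.
relaxed_scales = dict(SCALES, end_effector=4.0)
reward_sloppy_relaxed = tracking_reward(sloppy_feet, scales=relaxed_scales)
```

The same sloppy motion earns a noticeably different reward after a single scale changes. With ten dials and a new motion or body, finding a good setting by hand becomes the multi-day chore described above.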
And that’s where this new paper, ADD, the Adversarial Differential Discriminator comes
in. Instead of hard-coding hundreds of score counters, it introduces an AI judge that learns
automatically what a perfect performance looks like. The system plays the same imitation game,
but now, instead of manually juggling separate scores for elbows, knees,
and toes, the judge gives a single verdict on how close the motion feels
to the real human one. As training goes on, this judge gets smarter,
focusing attention on the parts that still look off and pushing the character to refine them.
So, the previous DeepMimic: lots of hand-tuning. The new ADD: a single automatic AI judge.
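A toy sketch of the judge idea, under loudly stated assumptions: the judge looks at the difference between the reference and the simulated motion, a difference of zero is its only example of "perfect", and the character's current attempts are its examples of "not there yet". A tiny logistic regression stands in for the learned judge; the class, the two joints, and all numbers are hypothetical and not the paper's implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class ToyJudge:
    """Logistic-regression stand-in for a learned judge (hypothetical)."""

    def __init__(self, num_features):
        self.weights = [0.0] * num_features
        self.bias = 0.0

    def verdict(self, difference):
        """Single score in (0, 1): how much this looks like a perfect match."""
        features = [d * d for d in difference]
        z = self.bias + sum(w * f for w, f in zip(self.weights, features))
        return sigmoid(z)

    def update(self, difference, label, lr=0.5):
        """One gradient step on the cross-entropy loss for a single sample."""
        features = [d * d for d in difference]
        grad = self.verdict(difference) - label
        self.weights = [w - lr * grad * f for w, f in zip(self.weights, features)]
        self.bias -= lr * grad

# Differences between reference and simulated motion for (knee, ankle).
# Made-up numbers: the knee is far off, the ankle is nearly right.
policy_differences = [(0.8, 0.05), (0.6, 0.10), (0.9, 0.02)]
perfect_difference = (0.0, 0.0)

judge = ToyJudge(num_features=2)
for _ in range(200):
    judge.update(perfect_difference, label=1.0)   # "perfect": zero difference
    for diff in policy_differences:
        judge.update(diff, label=0.0)             # current imperfect attempts

reward_perfect = judge.verdict(perfect_difference)
reward_attempt = judge.verdict(policy_differences[0])
```

Nobody told the judge that the knee matters more than the ankle. Its knee weight simply ends up more negative because that is where the attempts still differ most from the reference, a miniature version of focusing attention on the parts that still look off.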
Okay, good. But does that help? Hmm… they say this is better than DeepMimic? Well, I’d like
to see that! And, well well well. I am not seeing that. With pink, we got DeepMimic, the previous
method and with blue, ADD, the new technique. Both are nailing the problem, but it’s not better. So,
have we been deceived by the marketing department again? I swear I’m not buying more car insurance.
Okay, let’s calm down and see if this next test is better. Here is the reference movement. A
little parkour! Loving it. So, the previous AMP method does exactly what we all feel like at the
moment. Disqualified. Get out of here. And now, DeepMimic is about to nail it, as always. Wait a
second…that’s not it, sir. You need a little more carbohydrates before working out! Wow,
it failed. And if you think this is a failure case, now check this out. Now off with you,
eat a nutrition bar while we watch the reference motion do the jump. Okay, got it? Got it. So now,
start running, and…oof! Sir. Sir! Are you okay? Goodness, failed again, even worse.
Now hold on to your papers Fellow Scholars and let’s see if the new
technique with the AI judge is any better, and…oh my goodness. They did
it. What about the climbing? Let’s see…absolutely nailed that too. Wow!
The motions are really fluid and believable,
and physically correct. It controls all the joints correctly. That is really tough.
And yes, I know what you want. You also want to see the earlier low energy AMP do the jump too.
Your wish is now granted. Ain’t no jumping here brother. They don’t pay me enough. And
believe it or not, it’s going to get even better. Here’s how. Dear Fellow Scholars,
this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.
Oh yeah! We still retain DeepMimic’s wonderful
property of working on different body morphologies. I am starting to flip out.
So, the crowd favorite, the walking sausage man, not a problem. ADD does as well as hand-tuned
techniques, but this one works automatically. It can control a robot too, it can fall and get up.
It can do a variety of different behaviors, karate, jumping on one leg, walking an invisible
dog. And at this point, I am so happy, this is basically me. So good! Mother of Papers.
Here we also have an ablation study, which takes out each individual puzzle
piece that they invented, and they show how the system breaks when you remove
these individual puzzle pieces. In short, they show one by one that everything they
invented here is indeed useful. Not just a soup of stuff that works, but they show
that each piece is necessary. I still can’t believe that they’ve actually done it. Wow!
And now, before I tell you about the limitations of this new method,
look at this. Oh my goodness. Is this for real? This research paper is an
absolutely marvelous piece of work, and nearly nobody is talking about it. Wait a
minute. Not nearly nobody. Actually nobody is talking about it. Let me try to help.
So this is why I feel that talking about these research works is so important,
it’s a bit like saving endangered species. If we don’t do it here, nobody does it. So please, save
a paper today, like this video, subscribe, hit the bell icon, and leave a really kind comment.
Okay, so it’s not flawless, of course. Sometimes the AI judge gets confused on
the flashier tricks - instead of pulling off a smooth backflip, the poor thing just lies down
and gives up halfway. It’s a bit like a dance judge who judges the waltz well but freezes
when the performer suddenly tries parkour - you know, still learning what “graceful”
means when gravity is involved. Reminds me of an earlier paper where an AI player
collapsed, get this, in a way that reprogrammed the mind of the other AI to lose. Insanity.
But just think about it. AI systems now don’t just imitate motion,
they actually understand how we move around. And just two more papers down the line,
and I am sure these digital creatures will learn to move with the same grace
and intent as living ones. That is the First Law of Papers. Do not look at where we are,
look at where we will be two more papers down the line. What a time to be alive!
So, digital ninjas, motion shapers,
smoother than your graphics shaders - subscribe to Two Minute Papers!