Ilya vs. Google - The ONE Number That Decides Who's Right

By AI News & Strategy Daily | Nate B Jones

Summary

Topics Covered

  • Benchmarks mask poor generalization
  • LLMs grind contests, humans generalize
  • Emotions provide instant value functions
  • Scaling era ends, research restarts
  • AGI means superintelligent learner

Full Transcript

Ilya Sutskever went on the Dwarkesh podcast. I think everybody should pay attention to the 96-minute episode, but we don't all have 96 minutes, so this, in 10 minutes or so, is what Ilya talked about and why it matters. The first big point to call out: Ilya is naming something many of us have seen, and I'm so glad to hear it from him. These models are smarter on paper than they are in practice. So Ilya starts from that contradiction.

He says we're living in what should be a science-fiction moment: trillions of parameters in our models, labs spending on the order of 1% of GDP, and yet the models still feel unreliable where it matters. Benchmarks might say genius; everyday users might say useful idiot. The example he gives, which I love, is from vibe coding: you tell the model to fix a bug, it fixes the bug and reintroduces another one; you tell it to fix that bug, and it reintroduces the old one, and you go back and forth. Ilya points the finger at training for this.

He says pre-training is a very blunt instrument: you ingest all this text, and what do you do with it? The refinements, the distortions, the skewing happen during reinforcement learning and post-training. Labs design reinforcement learning environments to optimize for public benchmarks, and humans end up being the reward hackers in this situation: instead of the models gaming the reward, the researchers build training setups that just optimize for benchmark scores.

And when you combine that with poor generalization, you get models that look really good on tests but can be really brittle when you step off the evaluation manifold. Now, I want to call out that this is something we see not just in one model but to differing degrees across models. One of the signs of an excellent model is that it generalizes better than other models, and that's one of the ways you can tell you're looking at one of the top two or three models in the world.

ChatGPT 5.1 Thinking, Gemini 3, and Claude Opus 4.5 are all models that generalize relatively well. One of the signs of a model that doesn't generalize well is that when you give it a new task, like that famous Christmas tree test I gave, it just falls apart. Kimi K2 Thinking is a good example here; I would argue Grok 4 also does not generalize as well. But the point is not to point a finger at a particular model. The point is that we're talking about gradations: all models struggle with this to some degree. It's not as if there's a perfect model out there that doesn't.

Ilya's second point is about generalization. The deepest technical claim Ilya makes to Dwarkesh is that models generalize dramatically worse than people: they need a lot more data to reach competence, and when you move them to a new domain, they fail in ways that a reasonably bright teenager wouldn't.

So he offers this idea: imagine a student who grinds for 10,000 hours on contest problems, and another who puts in 100 focused hours, gets good, and moves on. The grinder might win contests, but the second person is the one you'd bet on in life. What he's suggesting is that today's LLMs are like the teenager who grinds for 10,000 hours on contest problems and ends up highly specialized. What Ilya is looking for is a degree of sample efficiency.

He's looking for the equivalent of the 15-year-old kid who has seen orders of magnitude less data than a frontier model, yet is more robust across everyday tasks and can learn something like driving in roughly 10 hours with no explicit reward function. The teenager shows up with an internal sense of "this seems dangerous" or "this seems fine." Now, some teenagers do that better than others, but the point stands: the teenager learns; the model doesn't.

And so Ilya's view is that we need a machine learning principle that works like that, something like human-level generalization, something beyond a bigger transformer and more tokens. This is sharply divergent from the view at Google, and I cannot underline that enough. This is me popping into the summary here: the view at Google, especially post-Gemini 3, is the opposite of what Ilya is saying. It's one of the biggest tensions in computer science and AI right now.

Google has said, in so many words: pre-training is fine, post-training is fine, we see no limits to scale, we just shipped Gemini 3 and it's really good. And you know what? Gemini 3 is really good. So one of the really interesting tensions, or counter-bets, right now is: who's right? Ilya keeps doubling down, saying there are challenges with pre-training and post-training and that something is missing from these models, while other labs keep shipping models built on pre-training and post-training that keep getting better and better.

I'm not smart enough to decide who's right, but you should be aware that there is a big disagreement among basically the leading lights of AI about how this works. Third point from Ilya: value functions and emotions. One of the things Ilya calls out is that you need to think very deeply about how human learning is different in order to understand how to bring it to machines. He cites a case where a patient lost emotional processing but kept IQ and language.

On paper, that person still scores fine, but in everyday life they become almost incapable of making decisions. For Ilya, this is evidence that emotions are not decorative; they're built in. They act as what he calls a value function: a simple, robust signal about how good or bad a situation is. Long before you get an explicit success-or-failure outcome, your gut knows. Ilya takes that seriously, and he maps it back to reinforcement learning.

He says that the reward in reinforcement learning only arrives at the end of an episode, and that's extremely inefficient, because a value function can estimate at each moment how promising the future looks. If you get a pit of fear in your stomach and decide not to walk down the dark alley, that is the opposite of how end-of-episode reward works, and Ilya is taking that seriously. I know this sounds silly, but Ilya doesn't think it's silly.

What he's calling out is that we have a value function in our emotions: that pit of fear, that intuition that this is the right call, projects into the future and helps us make really good decisions, whereas reinforcement learning is fundamentally backwards-looking and only rewards past activities. That gap, Ilya thinks, is at the heart of why human learning scales differently. That is an original thought, and I think it's a really interesting take.
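
To make that contrast concrete, here is a minimal toy sketch. This is my illustration, not anything from the talk: the state names, the rewards, and the use of TD(0) learning are all invented for the example. The only reward shows up at the very end of an episode, yet the learned value function ends up attaching an instant "this seems dangerous / this seems fine" score to each intermediate state.

```python
import random

ALPHA, GAMMA = 0.1, 0.9
V = {"leave_house": 0.0, "safe_street": 0.0, "dark_alley": 0.0, "home": 0.0}

def episode():
    """Two-step episode; the only nonzero reward arrives on the final step."""
    route = random.choice(["safe_street", "dark_alley"])
    final_reward = 1.0 if route == "safe_street" else -1.0
    # Each transition is (state, reward received on leaving it, next state).
    return [("leave_house", 0.0, route), (route, final_reward, "home")]

# TD(0): propagate the end-of-episode outcome back into per-state values.
for _ in range(5000):
    for state, reward, next_state in episode():
        target = reward + GAMMA * V[next_state]  # "home" is terminal and stays 0.0
        V[state] += ALPHA * (target - V[state])

# After training, V["dark_alley"] is clearly negative and V["safe_street"] clearly
# positive: a per-step gut feeling learned from a reward that only ever arrived
# at the end of the episode.
print({s: round(v, 2) for s, v in V.items()})
```

The algorithm itself isn't the point; the point is that a value function gives a running estimate of how things are going before the outcome arrives, which is the property Ilya is mapping onto emotion.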

Number four: Ilya claims the scaling era is over in a way that matters. Again, this is completely opposed to Google's view. Ilya says there have been three periods in AI: an early age of research, when people tried all kinds of models but had very limited compute; the age of scaling, which started with GPT, where the recipe was clear and everyone piled in; and, he claims, a coming age of research again, this time with huge computers.

Scaling laws created a low-risk playbook: if you had capital, you could effectively convert it into better benchmark numbers. That is the era he claims is finished, and he says it's finished because web-scale data is finite. This is not a new claim if you've been following Ilya; he made the same point at NeurIPS a year or two ago. What's interesting is that other model makers claim they can continue to scale pre-training by other means, including synthetic data.

So again, there's a lot of disagreement about whether Ilya is correct that the scaling era is over. And that, if you're wondering, is a really healthy sign for the AI ecosystem. Bubbles become dangerous when no one can disagree. The fact that these incredibly intelligent people building AI systems have important areas where they disagree is a positive for all of us; we get to enjoy the benefits as they work it out. Takeaway number five: the strategy at SSI, the company he founded, is research first.

This explains why he has set it up the way he has: if he believes the research era is just beginning, then he raises on the order of $3 billion and runs essentially no consumer-facing business. He argues that's a benefit because he pays no tax of serving customers, which is a really interesting claim to hear from someone in Silicon Valley: that not having customers is great. That one was a little surprising to me, but that's where he's at. And so he's claiming that this is an age-of-research company.

The bet is not that SSI will out-scale OpenAI, but that it has a different picture of how generalization should work, and with enough compute it can find out whether that picture is correct. Essentially, he has a thesis for how artificial general intelligence might work, and he wants to lay it out. Now, speaking of artificial general intelligence, one of the things Ilya calls out is that we need to redefine what we mean by AGI.

The usual definition, a system that can do every human job, is in Ilya's view very misleading, because by that standard humans themselves are not general intelligences: no one emerges from childhood able to perform every job. Intelligence as we see it is really about learning. It's the general learner that can pick things up quickly that matters, not a static catalog of skills. This is why I believe humans will do well in the age of AI. Ilya's preferred object is the superintelligent learner.

Think of a super-capable 15-year-old mind that can learn any job much faster and more deeply than a human. That's what's in his head; it's not something he has built. He hasn't figured it out yet, and nobody has. That is the challenge he has set himself. His goal is to spin up many copies of this learner, drop them into different roles, and see how they specialize and evolve, which leads to functional superintelligence via parallel continual learning rather than one final, all-knowing training run.

The scenario he's trying to construct is a sort of data center full of superintelligent learning systems that continue to learn and converge over time. He has no idea how long this is going to take; he gave a timeline of 5 to 20 years, which, coming from a researcher, basically means "I don't know." Takeaway number seven: alignment, and why he shifted toward incremental deployment. He makes a really interesting point here.

Ilya suggests that before, when he imagined a system being deployed and rapidly taking over the economy, he was reasoning about systems no one had created. That has been one of my biggest critiques of people who reason about superintelligence: we don't have that system, so it's really hard to make big assumptions about it. Ilya agrees. He says we can't reason about a system we haven't met, and so the safest thing we can do is incrementally deploy systems and learn from them.

Now, ironically, he just got done saying that Safe Superintelligence will not be deploying systems, so I guess he's depending on OpenAI and others to do this. But the idea, I think, is sound: you incrementally deploy a system that is increasingly powerful, gradually learn about it, learn how to manage it and work with it, and end up with a much more grounded sense of the risk than you would if you just started reasoning theoretically about Terminator.

Takeaway number eight: multi-agent setups, and why ecosystems are the real moat. Toward the end of the talk with Dwarkesh, he discussed the idea that frontier models tend to play games with one another and with themselves; they develop a sense of negotiation and strategy defined within an adversarial multi-agent schema. If this sounds complicated, don't worry, it gets simpler.

What Ilya is basically saying is that we have a bit of a problem with our current crop of agents and models: labs are intentionally setting up post-training environments that push models toward a very narrow range of agent strategies, and that leads to less diversity and creativity in our AI agents. He wants to see more diversity, incentives, and competition, so that agents are rewarded for finding genuinely different strategies instead of repeating versions of the prisoner's dilemma or some other known strategy forever.

And so he thinks that hints at another layer of differentiation: not who has the biggest model, but who has the most interesting, richest training ecosystem of tools, agents, and games for getting genuinely interesting results out of machine learning models. I think that's a really interesting point, and a really interesting idea of a moat. Number nine: Ilya thinks research has a sense of taste.

For him, taste is a top-down aesthetic sense of how intelligence ought to work, anchored in the brain but at a level of abstraction that lets you work technically; essentially, an opinion about intelligence grounded in reality. By that definition, I don't know that I have taste, or that you have taste; only a few people do. But the key is that understanding intelligence in a way that is differentiated from your peers allows you to take a genuinely different approach to a tough problem.

Remember, at the beginning of this talk, Ilya was saying that he thinks these models don't generalize or learn well, and I think most people would agree. In that case, you have to branch out and try different research methods to really solve that hard problem. That is what he's calling research taste. That is the whole talk. Before I let you go, I'm going to give you five takeaways almost no one is talking about; this will only take a minute or two.

Number one: generalization sits underneath alignment. If you don't understand how your system generalizes, you cannot expect its values to generalize in a stable way. Most public discourse treats alignment as something you slap on top of a model; Ilya is implicitly arguing that alignment sits underneath, and that generalizing well is what lets a model carry those values to new situations. I think that's really interesting. Takeaway number two: business can boom even if research is stalling.

Ilya's stall-out picture, which we may or may not agree with (Google disagrees), doesn't mean all of this collapses. He's not predicting that the bubble pops. He's predicting hundreds of billions of dollars in revenue, products that feel impressive, and a research frontier that is interesting even if it isn't advancing human-level learning. That scenario is likely, and it creates a lot of pressure to declare the problem solved, even if, in Ilya's view, we haven't really solved learning.

And so one of the things Ilya worries about, ironically, is not the bubble popping. It's business booming so much that the bubble stops mattering, we declare the problem solved, and the really interesting research problems around generalization get ignored. That brings me to the third non-obvious takeaway: the AGI moment is the wrong focal point. Framing everything as a single arrival date, as AI 2027 tempts us to do, obscures what matters.

Asking when we'll get human-level trainees with shared memory that develop quickly is a much more actionable way to think about it than setting a wake-up date. So I think one of the interesting things Ilya calls out is that maybe the functional way to talk about general intelligence is to talk about when agents are able to start learning in useful ways. It's funny to me that we're saying this, because Anthropic just published a paper basically saying agents are amnesiacs with tools.

We are a long way off, even if we can make lots of money and deploy these models in very successful ways. And I think that's one of the larger takeaways here: Ilya is calling out how far we are from the larger vision even as we're profoundly successful with the models we have. The last one I want to call out is that Ilya is suggesting that research taste is a strategic asset that is incredibly rare.

He's saying a handful of people in the world will decide which directions to pursue and which to kill. And this gives color to why folks like Mark Zuckerberg are willing to pay almost any amount of money to buy the right intelligence. A human who can figure out how to think about artificial general intelligence in a useful, novel way and guide a new research direction is priceless, literally priceless: we can't put a price on it, so people just keep throwing bigger numbers at it.

Don't think of this as a status report from OpenAI's former co-founder. Think of it instead as Ilya coming back from his time at Safe Superintelligence, looking at the field as a whole, and giving his sense of where we are in an ongoing journey he helped shape. He thinks the scaling phase of AI is ending. Time will tell. Maybe we'll sit here in a year and say Gemini 3 was the last big pre-training run and Ilya was right.

Maybe we'll sit here and think, well, Ilya must have missed something, because pre-trained models keep scaling. Either way, Ilya has made a really interesting point about the kinds of challenges we need to solve, and indirectly he has cast light on where we need to focus to compensate for the limits of today's AI agents and help them work usefully. Memory is a big one.

The ability to learn, and how you handle tool calls, are others. Those all fall out of some of the brittleness Ilya called out to Dwarkesh. So I hope you enjoyed this summary, and best of luck. I guess we'll see who's right in the race for superintelligence.
