From Vibe Coding to Vibe Researching: OpenAI’s Mark Chen and Jakub Pachocki

By a16z

Summary

## Key takeaways

- **GPT-5 Fuses Fast Replies with Reasoning**: GPT-5 brings reasoning into the mainstream by fusing instant-response models like GPT-4 with the long-thinking o-series models, identifying the right amount of thinking for any prompt to deliver agentic behavior by default. [01:26], [02:08]
- **Benchmarks Saturated; Shift to Economic Discovery**: Evals like math competitions are saturating at 98-99%, so progress now targets actual discovery of new ideas with economic relevance, preparing models as top researchers. [02:44], [04:23]
- **Automated Researcher: Vibe Researching Next**: The big target is an automated researcher that automates the discovery of new ideas, extending from vibe coding (high schoolers' default) to vibe researching over 1-5 hour horizons. [00:00], [07:18]
- **RL Keeps Surprising Despite Skeptics**: Reinforcement learning keeps delivering continuous improvements despite predictions of plateauing or mode collapse, combining with language modeling for robust execution in rich environments. [12:42], [14:20]
- **Coding Models Enable 30-File Refactors**: The latest coding models handle messy real-world coding, executing a 30-file refactor pretty much perfectly in 15 minutes and making 'vibe coding' the default, with models now better than human competitive coders. [19:21], [21:11]
- **Hire Cave-Dwellers Who Solve Hard Problems**: Seek 'cave-dwellers' who are not visible on social media but have track records of solving hard problems in any field, such as physics or finance, prioritizing persistence over visibility. [28:49], [29:21]

Topics Covered

GPT-5 mainstreams reasoning
Evals target economic discovery
Build automated researcher
Vibe coding becomes default
Persist through research failures

Full Transcript

The big thing that we are targeting is producing an automated researcher, so automating the discovery of new ideas.

The next set of evals and milestones that we're looking at will involve actual movement on things that are economically relevant.

And I was talking to some high schoolers and they were saying, "Oh, you know, actually the default way to code is vibe coding." I do think the future hopefully will be vibe researching.

Thanks for coming, Jakub and Mark. Jakub, you're the chief scientist at OpenAI. Mark, you are the chief research officer at OpenAI, and you both have the privilege and the stress of running probably one of the most high-profile research teams in AI. And so we're just really stoked to talk with you about a whole bunch of things we've been curious about, including GPT-5, which was one of the most exciting updates to come out of OpenAI in recent times. And then, stepping back, how you build a research team that can do not just GPT-5 but Codex and ChatGPT and an API business, and can weave all of the many different bets you have across modalities and across product form factors into one coherent research culture and story.

>> And so to kick things off, why don't we start with GPT-5? Just tell us a little bit about the GPT-5 launch from your perspective. How did it go?

>> So I think GPT-5 was really our attempt to bring reasoning into the mainstream. Prior to GPT-5, we had two different series of models: the GPT-2, 3, 4 series, which were these instant-response models, and the o-series, which essentially thought for a very long time and then gave you the best answer it could give. So tactically, we don't want our users to be puzzled by which mode they should use, and it involves a lot of research in identifying what the right amount of thinking for any particular prompt looks like, and taking that pain away from the user. So we think the future is more and more about reasoning, more and more about agents, and we think GPT-5 is a step towards delivering reasoning and more agentic behavior by default.

There are also a number of improvements across the board in this model relative to o3 and our previous models, but our primary thesis for this launch was indeed bringing the reasoning mode to more people.

>> Can you say more about how you think about evals? I noticed even in that launch video there were a number of evals where you were inching up from, you know, 98 to 99%, and that's kind of how you know you've saturated the eval. What approach do you take to measuring progress, and how do you think about it?

>> One thing is that the evals we've been using for the last few years are indeed pretty close to saturated, and so for a lot of them, inching from 96 to 98% is not necessarily the most important thing in the world. I think another thing is maybe even more important, but a little bit subtler. When we were in this GPT-2, GPT-3, GPT-4 era, there was kind of one recipe: you just pre-train a model on a lot of data, and you use these evals as a yardstick of how this generalizes to different tasks. Now we have different ways of training, in particular reinforcement learning on serious reasoning, where we can pick a domain and really train a model to become an expert in this domain, to reason very hard about it. That lets us target particular kinds of tasks, which means we can get extremely good performance on some evals, but it doesn't indicate as great generalization to other things.

So the way we think about it in this world: we definitely think we are in a little bit of a deficit of great evaluations, and I think the big things that we look at are actual marks of the model being able to discover new things. For me the most exciting thread and actual sign of progress this year has been our models' performance in math and programming competitions, although I think they are also becoming saturated in a sense. And the next set of evals and milestones that we're looking at will involve actual discovery and actual movement on things that are economically relevant.

>> Totally. You guys already got number two in the AtCoder competition, so there's really only number one left.

>> Yeah. I mean, I think it is important to note that these evals, like IOI, AtCoder, and IMO, are actually real-world markers for success in future research. A lot of the best researchers in the world have gone through these competitions and have gotten very good results. And I think we are preparing for this frontier where we're trying to get our models to discover new things.

>> Yeah very exciting.

>> Which capability from GPT-5 surprised you the most before the release, when you were working through the eval bench or using it internally? Were there any moments where you felt like this was starting to get good enough to release because it was useful in your daily usage?

>> I think one big thing for me was just how much it moved the frontier in very hard sciences. We would try the models with some of our friends who are professional physicists or professional mathematicians, and you already saw some instances of this on Twitter, where you can take a problem and have it discover maybe not very complicated new mathematics, but some non-trivial new mathematics. We see physicists and mathematicians repeating this experience over and over, where they're trying Pro and saying, "Wow, this is something that the previous version of the models couldn't do." And it is a little bit of a light bulb moment for them. It's able to automate maybe what could take one of their students months of time.

>> Well, GPT-5 is a definite improvement on o3. For me, o3 was definitely that moment where the reasoning models became actually very useful on a daily basis, especially for working through a math formula or a derivation. It actually got to a level where it is fairly trustworthy, and I can actually use it as a tool for my work. It is very exciting to get to that moment. But as we're now seeing these models actually able to automate, like we're saying, solving contest problems over longer time horizons, I expect that was quite small compared to what's coming over the next year.

>> What is coming in the next one to five years? At whatever level you're comfortable sharing, what does the research roadmap look like?

>> So the big thing that we are targeting with our research is producing an automated researcher, so automating the discovery of new ideas. Of course, a particular thing we think about a lot is automating our own work, automating ML research, but that can get a little bit self-referential, so we're also thinking about automating progress in other sciences. I think one good way to measure progress there is looking at the time horizon on which these models can actually reason and make progress. Now, as we get to a level of near mastery of these high school competitions, let's say, I would say we get to maybe on the order of one to five hours of reasoning. And so we are focused on extending that horizon, both in terms of the models' capability to plan over very long horizons and their ability to retain memory.

>> And back to the evals question, that's why I think evals of the form "how long does this model autonomously operate for?" are of particular interest to us.

>> And actually, maybe on that topic, there's been this huge move toward agency in model development. But I think at least in the state that it's in currently, users have observed this trade-off: too many tools or planning hops can result in quality regressions, versus something that has a little bit less agency, where the quality is at least observed today to be a bit higher. How do you think about the trade-off between stability and depth? The more steps the model is undertaking, maybe the less likely the tenth step is to be accurate, versus you ask it to do one thing and it can do it very, very well, and you can have it keep doing that one thing better and better. But with more complex things there's that trade-off, and of course to get to full autonomy you are taking multiple steps, you're using multiple tools.

>> I think the ability to maintain depth is, a lot of it, being consistent over long horizons, so I think they are very related problems. And in fact, with the reasoning models we have seen the models greatly extend the length over which they are able to reason and work reliably without going off track. I think this is going to remain a big area of focus for us.

>> Yeah. And I think reasoning is core to this ability to operate over a long horizon. Imagine yourself solving a math problem: you try an approach, it doesn't work, and you have to think about what the next approach you're going to take is and what the mistakes in the first approach were, and then you try another thing. The world gives you some hard feedback, and then you keep trying different approaches. The ability to do that over a long period of time is reasoning, and it gives agents that robustness.

>> We talked a lot about math and science. I'm curious to get your take: do you think some of the progress that we've made can extend similarly to domains that are less verifiable, where there's less of an explicit right or wrong?

>> Oh yeah, this is a question I really like. I think if you truly want to extend to research, to discovering ideas that meaningfully advance technology on the scale of months and years, these questions stop being so different. It is one thing to solve a very well-posed, constrained problem on the scale of an hour, where there's a finite number of ideas you need to look through, and that might feel extremely different from solving something very open-ended. But even if you want to solve a very well-defined problem on a much longer scale, say, prove this Millennium Prize Problem, that suddenly requires you to think about which fields of mathematics or other sciences might possibly be relevant, whether there is inspiration from physics that you must take, and what the entire program is that you want to develop around this. And now these become very open-ended questions. It's actually hard even for our own research: if all we cared about was reducing the modeling loss on a given data set, then measuring progress on that, asking whether we are actually asking the right questions in research, becomes a fairly open-ended affair.

>> Yeah. And I think it also makes sense to think about what the limits of "open-ended" are. A while back Sam tweeted about some of the improvements that we were making in having our models write more creatively, and we do consider the extremes here as well.

>> Right. Let's talk about RL, because it seems like since o1 came out, RL has been the gift that keeps giving. Every couple of months OpenAI puts out a release and everyone goes, "Oh, that's great, but this RL thing is going to plateau. We're going to saturate the evals. The models won't generalize, or there's going to be mode collapse because of too much synthetic data," or whatever. Everybody's got a laundry list of reasons to believe that the gains in performance from RL are going to tap out, and somehow they just don't. You guys just keep putting out continuous improvements. Why is RL working so well, and what, if anything, has surprised you about how well it works?

>> RL is a very versatile method, and there are a lot of ideas you can explore once you have an RL system working. For a long time at OpenAI, we started from this before language models. We were thinking: RL is this extremely powerful thing, on top of deep learning, which is this incredible general learning method. But the thing we struggled with for a very long time was, what is the environment? How do we actually anchor these models to the real world? Or should we simulate some island where they all learn to collaborate and compete? And then, of course, came the language modeling breakthrough, and we saw that if we scale deep learning on modeling natural language, we can create models with this incredibly nuanced understanding of human language. Since then we've been seeking how to combine these paradigms and how to get RL to work on natural language. Once you do, you have the ability to actually execute on these different ideas and objectives in this extremely robust, rich environment given by pre-training. So I think it's been perhaps the most exciting period in our research over the last few years, where we've found so many new directions and promising ideas that all seem to be working out, and we're trying to understand how to compare them.

>> One of the hardest things about RL for folks who are not practitioners of RL is the idea of crafting the right reward model. And so, especially if you're a business or an enterprise that wants to harness all this amazing progress you're putting out but doesn't even know where to start, what do the next few years look like for a company like that? What is the right mindset for somebody who's trying to make sense of RL to craft the right reward model? Is there anything you've learned about best practices or an approach to using this latest family of reasoning techniques? What is the right way I should think about even approaching reward modeling as a biologist or a physicist?

>> I expect this will evolve quite rapidly. I expect it will become simpler. Maybe two years ago we would have been talking about the right way to craft my fine-tuning data set, and I don't think we are at the end of that evolution yet. I think we will be inching towards more and more humanlike learning, which RL is still not quite. So I think maybe the most important part of the mindset is to not assume that what is now will be forever.

>> So I want to bring the conversation back to coding. We would be remiss not to say congrats on GPT-5-Codex, which just dropped today. Can you say a little bit more about what's different about it, how it's trained differently, and maybe why you're excited about it?

>> Yeah, so I think one of the big focuses of the Codex team is to take the raw intelligence that we have from our reasoning models and make it very useful for real-world coding. A lot of the work they've done is consistent with this. They are working on having the model be able to handle more difficult environments. We know that real-world coding is very messy, so they're trying to handle all the intricacies here. There's a lot of coding that has to do with style, with softer things like how proactive the model is, how lazy it is, and being able to define, in some sense, a spec for how a coding model should behave. They do a lot of very strong work there. And as you see, they're also working on much better presets. Coders have some notion of "this is how long I'm willing to wait for a particular solution." I think we've done a lot of work to dial in on being much lower latency for easy problems, while for harder problems the right thing is actually to be even higher latency and get you the really best solution, and on finding the sweet spot for that preset between easier and harder problems.

>> What we've found is that the previous generation of the Codex models were spending too little time solving the hardest problems and too much time solving the easy problems. And I think that is probably what you might get out of the box from o3.

>> Maybe just on the topic of coding, since you were both competitive coders in prior lives. I know you've been at OpenAI for almost a decade now, but I was struck by the story of Lee Sedol, the Go player who famously quit Go after he lost to AlphaGo multiple times. And I think in a recent interview you were both saying that now the coding models are better than your capabilities, and that gets you excited. Say more about that. And how much would you say you code now, if you're hands-on-keyboard? Or you can talk about OpenAI generally: how much code is written by AI now?

>> In terms of coding models being better, I mean, I think it is extremely exciting to see this progress. Programming competitions are a nice encapsulated test of the ability to come up with some new ideas in this boxed environment and time frame. I do think that if you look at things like, I guess, IMO problem six, or maybe some of the very hardest programming competition problems, there's still a little bit of headway to go for the models, but I wouldn't expect that to last very long.

>> Being humble.

>> Historically, I've actually been extremely reluctant to use any sort of tools. I just used Vim, pretty much.

>> Old school.

>> Yeah. Eventually, especially with these latest coding tools like GPT-5, I've really felt like, okay, this is no longer the way. You can do a 30-file refactor pretty much perfectly in 15 minutes, so you kind of have to use it. And so I've been learning this new way of coding, which definitely feels a little bit different. I think it is a little bit of an uncanny valley still right now, where you kind of have to use it because it is just accelerating so many things, but it's still not quite as good as a coworker. So I think our priority is getting out of that uncanny valley. But yeah, it's definitely an interesting time.

>> Yeah, definitely. To speak to the Lee Sedol moment: I think AlphaGo for both of us was a very formative milestone in AI development, and at least for me, it was the reason I started working on this in the first place. Maybe partly because of our backgrounds in competitive programming, I had this affinity for building models that could do very, very well in these forms of contests. Going from solving eighth-grade math problems to, a year later, hitting our level of performance in these coding contests, it's crazy to see that progression. You imagine, or like to think, that you feel a set of the feelings that Lee Sedol felt too. It's like, wow, this is really crazy, and what are the possibilities? This is something that took me decades to do, and it took a lot of hard work to get to the forefront of. So an implication you really do feel is: these models, what can't they do? And I do feel like it's already transformed the default for coding. This past weekend I was talking to some high schoolers, and they were saying, "Oh, you know, actually the default way to code is vibe coding." They would consider that maybe sometimes, for completeness, you would go and actually do all of the mechanics of coding it from scratch yourself, but that's just a strange concept to them. Why would you do that? You just vibe code by default. And so I do think the future hopefully will be vibe researching.

>> Yeah.

>> I have a question about that, which is: what makes a great researcher? When you say vibe researching, a big part of vibe coding is just having good taste in wanting to build something useful and interesting for the world. And I think what's so awesome about tools like Codex is that if you've got a good intuition for what people want, it helps you articulate that and then basically actualize a prototype very fast. With research, what's the analog? What makes a great researcher?

>> Persistence is a very key trait. I think a special thing about research is that you're trying to create something or learn something that is just not known. It's not known to work; you don't know whether it will work. So you are always trying something that will most likely fail, and I think it matters to get to a place where you are in a mindset of being ready to fail and being ready to learn from these failures. And of course with that comes creating clear hypotheses and being extremely honest with yourself about how you're doing on them. A trap many people fall into is going out of their way to prove that it works, which is quite different. I think believing in your idea and its significance is extremely important, and you want to persist with it, but you have to be honest with yourself about when it's working and when it's not, so that you can learn and adjust.

>> Yeah, I think there are just very few shortcuts for experience. Through experience you learn what the right horizon is for thinking about a problem. You can't pick something that's too hard, and it's not satisfying to do something that's too easy. And I think a lot of research is managing your own emotions over a long period of time, too. There are just going to be a lot of things you try that are not going to work, and sometimes you need to know when to persevere through that, or when to switch to a different problem. And I think interestingness is something you try to find through reading good papers and talking to your colleagues, and you maybe distill their experience into your own process.

>> When I was in grad school (I'm a failed machine learning researcher; I was in grad school for bioinformatics), a big part of my research advisor's thrust was about picking the right problems to work on, such that you could then sustain and persist through the hard times. And you said something interesting, which was that there's a difference between having conviction in an idea and being maximally truth-seeking about when it's not working. Those two things are sometimes in tension, because you kind of go native on a topic or a problem that you have deep conviction in. Are there any heuristics you've found useful at the taste step, at the problem-picking step, that help you arrive at the right set of problems, where conviction and truth-seeking are not as much in zero-sum tension as in other kinds of problems?

>> Yeah, to be clear, I don't think conviction and truth-seeking are really in a zero-sum tension. You can be convinced, you can have a lot of belief in an idea, and you can be very persistent in it while it's not working. I think it's just important that you're honest with yourself about how much progress you're making, and that you're in a mindset where you're able to learn from the failures along the way. I think it's important to look for problems that you really care about and really believe are important. One thing I've observed in many researchers who inspired me has been really going after the hard problems: looking at the questions that are widely known but not really considered tractable, and just asking why they are not tractable, or what about this approach, why does this approach fail? You're always thinking about what is really the barrier for the next step. If you're going after problems that you truly believe are important, that makes it so much easier to find the motivation to persist with them over years.

>> And in the development, during the training phase of GPT-5 for example, were there any moments where there was a hard problem, the initial attempts being made to crack that problem weren't working, and yet you found somebody persisted through that? Is there anything about any of those stories that comes to mind that worked well, that you wish other people and other researchers did more of?

>> I think on the path there, along the sequence of models, both the pre-trained models and the reasoning models, one very common theme is bugs. Both just silly bugs in software, which can stay in your software for months and invalidate all your experiments a little bit in a way that you don't know about, so identifying them can be a very meaningful breakthrough for your research program. But also bugs in the sense that you have a particular way of thinking about something, and that way is a little bit skewed, which causes you to make the wrong assumptions; identifying those wrong assumptions means rethinking things from scratch. Both for getting the first reasoning models working and for getting the larger pre-trained models working, I think we've had multiple issues like that that we've had to work through.

>> As leaders of the research org, how do you think about what it takes to keep the best talent on your team, and, on the flip side, creating a very resilient org that doesn't crumble if a key person leaves?

>> I think the biggest thing that OpenAI has going for it in terms of keeping the best people motivated and excited is, like, we are in the

business of doing fundamental research, right? We aren't the type of company

that looks around and says, "Oh, what model did, you know, company X build, or what model did company Y build?" Um,

you know, we have a fairly clear and crisp definition of what it is we're out to build. Um, we like innovating at the

frontier. Um, we really don't like

copying, and, um, I think people are inspired by that mission, right? You are

really in the business of discovering new things about the deep learning stack, and, um, I think we're kind of building something very exciting together. Um, I think

beyond that a lot of it's creating very good culture. So we want a good pipeline

for training up people to become very good researchers. Um, we, I think,

historically have hired, um, you know, the best talent and the most innovative talent. So I just think, um,

you know, we have a very deep bench as well, and, um, yeah, I think most of our leaders are very inspired by the mission, and that's what's kept all of them there.

Like, when I look at my direct reports, um, they haven't been affected by the talent wars.

>> I was chatting with a researcher

recently and he was talking about wanting to find the cave dwellers. Um,

and these are often the people who are not posting on social media about their work. Um, for whatever reason they may

not even be publishing. They're sort of in the background doing the work. Um, I don't know if you would agree with this concept, but how

do you guys hire for researchers? And

are there any non-obvious ways that you look for talent or you know attributes that you look for that are non-obvious?

>> So I think one thing that, um, we look for is having solved hard problems in any field. A lot of our most successful researchers, um, have started

their journey with deep learning at OpenAI and have worked in other fields like, um, physics or, um, computer

science or finance, uh, in the past. Strong technical fundamentals, coupled with the, um, intent to, like, work

on very ambitious problems and actually stick with them. We don't purely look for, oh, you know, who did the most visible work or is the most visible on social media.

>> Yeah. As you were talking, I was

thinking back to when I was a founder and I was running my own company and we would recruit for great talent, engineers. Many of the attributes you

described were ones that were on my mind then. Um, and Elon recently tweeted that

he thinks this whole researcher versus engineer distinction is silly. Is that

just, uh, being, you know, semantically nitpicky, or do you think these two things are more similar than they actually look?

>> Yeah, I mean, I do think, like, researchers, they don't just fit one shape. Um, you know, we have certain

researchers who are very productive at OpenAI who are just so good at idea generation, and, um, you know, they don't necessarily need to show great impact

through implementing all of their ideas, right? I think there's so much alpha

they generate in just kind of coming up with, oh, let's try this, or let's try this, or maybe we're thinking about that. And there's other researchers who, you know,

they are just very, very efficient at, um, taking one idea, um, rigorously exploring, you know, the space of experiments around that idea. So I think, you know,

researchers come in very different forms. I think, um, maybe that first type wouldn't necessarily map into the same bucket as a great engineer, but, um, you

know, we do kind of try to have a fairly diverse, um, set of research tastes and styles.

>> Yeah. Mhm. And say a little bit about what it takes to create a frontier, sort of winning culture that can attract all kinds of shapes

of researchers and then actually grow them, help them thrive, make them win together at scale. What is it? What

do you think are the most critical ingredients of a winning culture?

>> So I think actually the most important thing is just to make sure you protect fundamental research, right? Um, I think you can get into this world with so many different companies these days where

you're just thinking about, oh, how do I compete on, you know, a chat product or some other kind of product surface and um, you need to make sure that you leave space and recognize the research for

what it is and also give them the space to do that, right? Like you can't have them being pulled in all of these different product directions. Um, so I think that's one thing that we pay

attention to within our culture.

>> Especially now that there's so much spotlight on OpenAI, so much spotlight on AI in general and the competition

between different labs. Uh, it would be easy to fall into a mindset of, like, oh, we're racing to beat this latest release or something. And, um, you

know, there's definitely, like, um, a risk that people kind of start looking over their shoulder and start thinking about, oh, you know, what are these other things. And, uh, I see it as a large

part of, um, our job to make sure that people have this comfort and space to think about, you know, what are things actually going to look like in a

year or two? Um, like, what are the actually big research questions that we want to answer, and how do we actually get to models that, like, vastly outperform what we see currently rather than just, like, iteratively improving in

the current paradigm.

>> Just to pull on that thread more on protecting fundamental research, um, you guys are obviously one of the best research organizations in the world, but you're also one of the best product companies in the world. How do you

balance and especially with um you've brought on some of the best product execs in the world as well. um how do you balance that focus between the two and while protecting fundamental research also continue to move forward

the great products that you have out?

>> Yeah, I mean I think it's about kind of delineating a set of researchers who really do care about product and who really want to be accountable to the success of the product and and they

should of course very closely coordinate with the research org at large. Um,

but I think just kind of people understanding their mandates and what they are rewarded for, um, that's a very important thing. One thing

that I think is also helpful is that, um, our product team and broader company leadership is bought into this

vision, right, where we are going with research. And so, uh, you know, nobody

is assuming that, like, oh, the product we have now is a product we'll have forever and we'll just kind of wait for, like, you know, new versions from research. Like,

we are able to think jointly about what the future looks like.

>> One of

the things that you guys have done is let such a diversity of different ideas and bets flourish inside of OpenAI that you then have to figure out some

way as research leaders to make it all make coherent sense as one part of a roadmap. And you got, you know, people

over here investigating the future of diffusion models and visual media. And

over here you've got folks, you know, investigating the future of reasoning when it comes to code.

How do you paint a coherent picture of all that? How does that all come

together when there might be, at least naively, some tension between giving researchers the independence to go do fundamental research and then

somehow making that all fit into one coherent research program?

>> Our stated goal, um, for our research program has been getting to an automated researcher for, um, a couple years now. Uh, and so we've been, um, building most

our projects with this goal in mind. Um

and so this still leaves a lot of room for, um, kind of bottom-up idea generation for fundamental research on various

domains. But we are, you know, always

thinking about how do these ideas come together eventually. Um, we are, you know,

we believe, for example, that reasoning models go much further, and we have a lot of explorations on things that are not directly reasoning models, but we are thinking a lot about how they eventually

combine and, you know, what will this, uh, kind of innovation look like once you have something that is out there and thinking for months

about a very hard problem. Um, and so I think this clarity of, like, our long-term objectives is important. Um,

but yeah, it doesn't mean that we are, you know, prescriptive about, like, oh, here are all the little pieces, right? Like, we definitely view this as a question of exploration and learning about these technologies.

>> Yeah, I think you want to be opinionated and prescriptive at the very, kind of, coarse level, but, you know, a lot of ideas can bubble up at a finer level.

>> And have there been any moments where those things have been

in tension at all recently? Well, one

provocative example could be, recently, you know, this new image model came out, which is Nano Banana, right, from Google. It's extraordinary value shown

in that, like, lots of everyday people, um, can unlock a lot of creativity when these models are good at understanding editing prompts. Um, and I could see

how that would create some tension for a research program that may not be prioritizing that as directly. Um, if one of your, you know, somebody

talented on your team came and said, guys, like, this thing is so clearly valuable in the world out there, we should be spending, you know, more effort, more energy on this, how do you reason about that question?

>> I think that's definitely a question

that we've been kind of thinking about for quite a while at OpenAI. I mean, if you look at GPT-3, right, like, once we kind of saw, like, oh, this is kind of where language models are going,

we definitely, like, had a lot of discussions about, well, clearly there are going to be so many magical things you can do with AI, right? And you will be able to

go to these, like, extremely smart models that are, you know, out there pushing the frontiers of science, but you will also have this, like, incredible

media generation and these incredibly, uh, you know, transformative, um, entertainment applications. Uh, and so

like, how do we prioritize among all these directions, uh, has definitely been something we've been thinking about for quite a while.

>> Yeah, absolutely. And the real answer is, like, we don't discourage someone from being really excited by that, and it's just, if we're consistent in the prioritization, um, and our product

strategy, then it just will naturally fall in. And so it's just, for us, like, we do encourage a lot of people to be excited about, you know, building this, um, you know, building kind of like

aic products, you know, whatever kind of products that they're excited by.

But I think it's, uh, important for us to also have a separate group of people who you protect, whose goal is to create the algorithmic advances.

>> How

does that translate, and just to build on Andre's question, into a concrete framework around resourcing? Like, do you think about, okay, x% of compute resources

will go to longer term you know very important but maybe a bit more pie in the sky exploration versus there's also you know obviously current product inference but sort of this thing in the middle where uh it's achievable in the

short to medium term.

>> Yeah. Um so I think that's a big part of both of our jobs. You know just uh this portfolio management question of how much compute do you give to which project and um I think historically

we've put a little bit more on just the core algorithmic advances, uh, versus kind of the product research. Um, but it's something that you have to feel out over time, right? It's dynamic. I think

month to month there could be different needs. And so I think it's important to

stay fairly flexible on that.

>> And if you had 10% more resources, would you put it toward compute, or is it data curation,

people? Where would you stick that from

like a marginal, uh...

>> Good question. Um, honestly, yeah, I think, um, compute

>> today reasonable answer. Yeah. Yeah. I

mean, honestly, I do think, kind of to your question of prioritization, right? It's like in a vacuum any of

these things you would love to, like, go and excel and win at. Um, I think the danger is you end up, like, second place at everything and, you know, not, like, you

know, clearly leading at anything. So

I think prioritization is important, right, and you need to make sure there's some things you're clear-eyed on: this is the thing that we need to win.

>> Yeah.

>> Yeah. But I think it makes sense to talk about it for just a little bit more, which is, compute sets so much of... compute is destiny in a way, right, at a

research organization like OpenAI. And so, a couple of years ago, I think it became very fashionable to say, oh, okay, we're not going to be compute constrained anytime soon because there's a bunch of CMS that are, you know, people

are discovering and we're going to get more efficient and all the algorithms are going to get better, and then eventually, like, really, we'll just be in a data-constrained regime. And it seems like, you know, a couple years have come and gone and we're still, like, this is

sort of a very compute-constrained environment.

>> Does that change anytime soon, you think? Or...

>> I mean, I think, like, we've seen

for long enough, like, how much we can do with compute. Um, yeah, I

haven't really bought that much into the, like, "we'll be data constrained" claim, and, um, yeah, I don't expect that to change.

>> Yeah, anyone who says that should just step into my job for a week, and there's no one who's like, you know, "I have all the compute that I need." Yeah.

>> You know, the job of advancing fundamental research has historically been largely a mandate that universities have had, partly for the compute reasons you just described. That

hasn't been the case for frontier AI.

You guys have done such an incredible job kind of channeling the arc of frontier AI progress to help the sciences out. Um, and I'm wondering when

those worlds collide, the fundamental world of university research today and the world of frontier AI, what comes out?

>> So, I guess I personally started

as a resident at OpenAI, and it's a program that we had for, uh, people in different fields to come in, you know,

learn quickly about AI and become productive as a researcher. And I think there's a lot of really powerful elements in that, uh, program. And you

know, the idea is just, like, you know, could we accelerate something that looks like a PhD in as little time as possible. And I think a lot of that just

looks like implementing a lot of, you know, very core results. And, you know, through doing that you're going to make mistakes. You're going to be like, oh wow,

like, build intuition for, if I, you know, set this wrong, like, that's going to blow up my network in this way. Um, and so you just need a lot of that hands-on

experience. Um I think um over time, you

know, there have been curriculums developed at, um, probably all these large labs in, like, optimization and architecture

and RL and um yeah, probably no better way than to just kind of try to implement a lot of those things and read about them and think critically about them. Yeah.

>> Yeah. I think maybe, like, one other nice thing that you get to experience in academia is, like, yeah, this, like, persistence, right, of, like, oh, you know, you have a few years and you're kind of trying to solve a problem and it's a

hard problem and you've never dealt with such a hard problem before. Um, and yeah, I do feel like this is a thing that's, like,

um, well, currently the pace of progress is very fast. Uh, maybe also the ideas tend to work out a little bit more often than they did in the past. Um,

because uh yeah, deep learning just wants to learn. Uh

and, um, getting your hands on a more challenging problem for a little bit, maybe, you know, being part of a team attacking, like, an ambitious challenge, and, uh, you know, getting that

feeling of, you know, uh, what it feels like to be stuck and what it feels like to finally be making progress, I think is, uh, also something that's, like, very useful to learn.

>> How does external perception, reception of a particular product launch impact how you prioritize something? Is

that uh is it to the extent where you know perception and usage uh in the case where they're married obviously there's probably a clear directive there but in a case where maybe they're divorced a

bit, does that impact how you think about roadmap or where you emphasize resources?

>> So we generally, like, have

some pretty strong convictions about the future, and so we don't tie them that closely to, like, short-term reception of

our products, right? Like, of course we, you know, learn based on what is going on. We, you know, read other papers and

we look at, like, what other labs are working on, but generally, like, we act from a place of fairly

strong belief in what we're building. Um, and so of course, like,

uh, you know, that is for, like, our long-term research program. Of course, when it comes to, um,

product, right, like, I think that the cycle of iteration is much, much faster.

>> Yeah. I think you know there with every launch you know we are trying to aim it to be something that's wildly successful on the product side and you know I I think from a fundamental research

perspective we're trying to create models with all the kind of core capabilities needed to build a very rich you know set of experiences and products and you know there are going to be

people who have some vision of, like, one particular thing they could build and, you know, we'll launch it, and everything we launch we really hope it goes wildly successful, and, you know, we get that feedback, and if it's not, like,

we'll kind of shape our product strategy a little bit. But yeah, we're definitely also in the business of launching very useful, wildly successful products. Yeah.

>> It feels like, because of the, sort of, completely unbridled pace of progress that we've just spent a lot of

time talking about, a lot is going to change over the next few years, right?

It gets really hard to predict I imagine 10 years out, let alone, you know, 10 months out. And so

my question, I guess, is, through all that change that the frontier of AI is going to bring, what are some priors that you actually think should stay constant? Is

there anything, well, one clearly is that we don't have enough compute.

Is there anything else that you think doesn't change, that you think would be strong, reasonably held priors as constants?

>> I think more broadly than compute, there's physical constraints of, well, energy, but also, like, you know, at

some point not too far, like, robotics will become a major focus. Um, and so I think thinking about, like,

the physical constraints, um, is going to remain important. Um, but yeah, I do think on the intelligence front I would not make too

many assumptions.

>> Very few startups can get to the scale that you have, both from a, you know, employee perspective but also revenue

count, and maintain that breakneck speed that you probably had, I mean, seven, eight years ago when you both joined. Um, what's the secret sauce to doing that,

and how do you continue to maintain this pressure, almost, to ship as quickly as possible even though, you know, you're kind of on, you know, top now?

>> I think one of the clearest markers

that we have really good research culture, at least in my mind, is, you know, I've worked at different companies before, and there is a real thing which is a learning plateau, right? You go to a

company, you learn a lot for the first one or two years, and then you just find, kind of like,

you know, I know how to be fairly efficient in this framework, and my learning kind of stops. And I've really

never felt that at OpenAI. Just, like, that experience you described of all these really cool results bubbling up, um, you're just learning so much week over week, and, um, it is a

full-time job to kind of stay on top of all of it, and, um, that's just been very fulfilling. So, um, yeah, no, I think

that's a very accurate description. Just, um, we just want to generate a lot of really high-quality research, and, um, it's almost a good thing, like, if you're

generating enough that you're barely able to keep on top of it.

>> Yeah, exactly. Yeah,

>> I think definitely, like, the development of technology, I think, is a driving force here, where, you know, maybe, yeah, maybe we would kind of, uh, become comfortable after, like, a few

years working in a given paradigm, but we are always on the cusp of that, you know, new thing and, you know, trying to reconfigure our thinking around the kind of new constraints and new possibilities

that we're going to be faced with.

Um, and so I think that kind of creates this feeling of constant change and that mindset of, like, always kind of learning the new

thing.

>> Well, you know, one thing that

came up in our research, um, about things at OpenAI that have not changed through a lot of the change is the trust that the two of you guys have in each

other, 'cause, uh, I think there was an article or profile of you guys recently in the MIT Tech Review, and that was also one of the highlight themes, that your chemistry, your trust with each other, your rapport is something a lot of the

people, um, at OpenAI have come to, um, treat as a constant. So what's the backstory? How did you guys build trust

there? How did that happen?

>> It's like asking you to... Have you ever seen that, um, When Harry Met Sally?

>> I feel like you're on the couch and now you got to Yeah, exactly.

>> Well, I do think, you know, we started working together a little bit more closely, um, when we kind of had the first seeds of working on reasoning. Um,

I think, you know, at the time, you know, that wasn't a very popular research direction to work on, and I think, uh, both of us kind of saw glimmers of hope there, and, um, you know, we were kind of pushing

in, um, in this direction, kind of figuring out how to make our work, and, um, yeah, I think over time kind of growing a very

small effort into an increasingly larger effort, and, um, I think that's kind of where I, um, yeah, really got to kind of work with

Jakub in depth. I think, um, he's just really a phenomenal researcher. I

think, you know, um, any of these ranked lists, like, he should be number one. Um, like,

just his ability to, you know, take any very difficult technical challenge and almost, like, personally just kind of think about it for two weeks and

just crush it. Um, it's incredible that he has kind of the wide range that he does in terms of understanding, as well as that kind of depth that you

can go and just personally solve a lot of these technical challenges.

>> Now you get to say some nice stuff about him.

>> You don't have to say anything nice about me.

>> Thanks, Mark. Uh, yeah. I think the first, like, big thing that we did together was, like, we started, um, seeing, like, okay, we think this algorithm is going to work,

and so, um, you know, I was thinking, like, okay, how do we, you know, direct people at this, and we're talking with Mark, like, oh, we should establish a team that's actually going to make this work. And then, you know, Mark went and actually did this, right? Like, actually

kind of, like, got a group of, like, people working on very different things, like, got them all together and created a team with, like, incredible chemistry out of, like, this whole disparate group, and that

was like such an impressive thing to me.

Um, and, uh, yeah, I'm really grateful and inspired to, like, kind of get to, you know, work with Mark and kind of experience that. Um, yeah, I

think, uh, this incredible capacity to both, you know, understand and engage and, you know, think about the technical matter of the research itself. Uh, but then coupled with this

like, great ability to, um, lead and inspire teams and create an organizational structure that, you know, in this whole kind of mess of chaotic

directions actually, like, is coherent and able to gel together. Uh, yeah, very, very inspiring.

>> It's awesome.

>> Well, on that note, um...

>> Great note. Yeah, look, some of the greatest discoveries in science, especially in physics, have often come from a pair of collaborators,

often across universities, across fields. And it seems like you guys have

now added to that tradition. And

so, we're just super grateful that you guys made the time to chat today. Thanks

for coming by.

>> Thank you.

>> Thanks for being with us.

[Music]
