
How OpenAI Builds for 800 Million Weekly Users: Model Specialization and Fine-Tuning

By a16z

Summary

## Key takeaways

- **ChatGPT hits 800M weekly users**: ChatGPT has 800 million weekly active users (10% of the globe every week), achieved through its first-party app. [00:02], [10:34]
- **Shift from one AGI to specialized models**: Even within OpenAI, thinking shifted from one model that rules them all to a proliferation of specialized models like Codex, GPT-4o, and o1. [00:10], [18:00]
- **Reinforcement fine-tuning unlocks data**: Reinforcement fine-tuning (RFT) lets companies leverage their giant treasure troves of data far better than supervised fine-tuning, enabling SOTA performance on specific use cases. [21:29], [22:22]
- **Developers build model relationships**: Developers stick with specific models due to both emotional familiarity and technical integration, making swaps surprisingly hard despite early expectations of interchangeability. [15:10], [12:44]
- **Prompt engineering evolved to context design**: Prompt engineering isn't the point anymore; it's now about context engineering: what tools, data, and retrieval the model uses, especially with reasoning models. [24:44], [25:44]
- **Node-based agents for procedural work**: Agent Builder is deterministic and node-based for procedural, SOP-oriented work like customer support, where deviation is undesirable, unlike undirected knowledge work. [45:47], [47:50]


Full Transcript

We want ChatGPT as a first-party app.

A first-party app is a really great way to get 800 million WAUs, or whatever it is now.

>> A tenth of the globe, right? [laughter]

>> Yeah. 10% of the globe uses it every week. Every week.

>> Yeah. Even within OpenAI, the thinking was that there would be one model that rules them all. That's definitely completely changed. It's becoming increasingly clear that there will be room for a bunch of specialized models. There will likely be a proliferation of other types of models.

Companies just have giant treasure troves of data that they are sitting on.

The big unlock that has happened recently is reinforcement fine-tuning. With that setup, we're now letting you actually run RL, which allows you to leverage your data way more.

Sherman, thanks very much for joining.

So, we're being joined by Sherman Wu.

Um, it'd be great actually if you provided the long form of your background as we get into this, just for those that may not know you. I mean, I view Sherman as one of the top AI thought leaders, so I'm really looking forward to this.

>> Yeah, thanks for having me. I'm really excited to be on the podcast. A little bit more on my background: maybe we can start from present day and go backwards. I currently lead the engineering team for OpenAI's developer platform. The biggest product in there, of course, is the API.

>> Is there more to the developer platform than the API? I just kind of assumed they were synonymous.

>> Well, I also think about the other things that we put into our platform. Like, technically our government work is also part of it, deploying in these different areas.

>> Oh, so you have like a local deployment?

>> Yeah, we actually do have a local deployment at Los Alamos National Labs. It's super cool. I went to visit it. It's very different from what I'm used to. But yeah, it's a classified supercomputer with our model running there.

That's cool.

>> So there's that. But mostly it's the API.

>> Did you go to Los Alamos?

>> We did, yeah, I did go to Los Alamos. It was great. They showed us around, showed us some of the historic sites. Real history.

>> I used to work at Livermore, man. So I've got like an...

>> Oh, yeah?

>> Yeah. My first job out of college.

>> Right, right, right. You should do that one next.

>> Yeah, yeah. Well, we hope to, we hope to. But yeah, so I work on the developer platform. I've been working on it for around three years now; I joined in 2022. I was basically hired to work on the API product, which at the time was the only product that OpenAI had.

>> And I've basically just worked on it the entire time. I've always been super interested in the developer side and kind of the startup story of this technology, and so it's been really cool to see this evolve. So that's my time at OpenAI. Before OpenAI, I was at Opendoor for around six years, working on the pricing side. My general background before that...

>> I think it's such a dissonant jump, you know [laughter], pricing at Opendoor to running the API.

>> It's such a different... it's been fascinating actually for me to see the differences between the companies. They're run so differently. They both have "open" in the name, so there's some overlap, but that's pretty much it.

>> But yeah, I was there for around six years, working on the pricing team. So our team basically would run the ML models.

>> This is actually pricing the assets on Opendoor, like the inventory?

>> Exactly. So yeah, Opendoor would buy and sell homes. Its main product was buying homes directly from the people selling them, with all-cash offers, and my team was responsible for how much we would pay for them. And so it was a really fun ML challenge. It had a huge operational element to it as well, because not everything was automated, obviously, but it was a really fascinating technical challenge.

>> Is there any sense of that on the API side, like GPU capacity buying, or is it just totally unrelated?

>> Oh, on the API side, there's a small bit of how we price the models, but I don't think we do anything as sophisticated as Opendoor. Opendoor is just such a hard problem. It's such an expensive asset. The holding costs are very expensive. You're holding on to it for months at a time. There's variability in the holding time.

>> A massive long tail of potential things that could...

>> Long tail, yes. And, you know, you try to think about it from a portfolio perspective, and if you're holding on to just one of them for two years, it blows everything up.

>> Everything goes negative. So it's a very, very different challenge. Yeah, six years there. Lots of ups and downs. Saw a lot of the booms, saw a lot of the struggles, and then we IPO'd before I left.

>> But yeah, just in general it was a great experience. It also had a very business-operations, by-the-book type of culture, whereas OpenAI is very different.

>> What's so interesting, I was just thinking about it now: even for a company like that, which you don't think of as a tech company, there is a deep technology problem, and it's the pricing. It's actually an ML problem. It's not the website, it's not the platform, it's not the API... it's literally that.

>> Yep, yep. And that's what attracted me to it. I think that's what was interesting.

>> It's also a way lower-margin business than OpenAI, because you're making a tiny spread on these homes. They talk about basis points, eating bips for breakfast and all that. So anyways, I was at Opendoor around six years. And then before that, my first job out of college was at Quora, with Adam D'Angelo and the group there. So I was working on the newsfeed.

>> Oh yeah.

>> So I worked on newsfeed ranking for a bit, and worked on the product side. That was actually my first exposure to actual ML in industry, and I learned a lot from the engineers at Quora. We'd basically hired a lot of the early feed engineers.

>> Was Charlie still there when you were there?

>> Charlie was not there when I was; he'd left right before that.

>> That was a really legendary team, you know. It was still known to be kind of this super iconic founding team.

>> Yeah, the early founding team was really solid. Even while I was there, I was still amazed at the quality of the talent we had. I think the company was like 50 to 100 people, but a bunch of the Perplexity team was there. Denis was on the feed team with me, and Johnny Ho, Jerry Ma...

>> That's right. And then Alexandr Wang, of Scale now, was there between high school and college.

>> It was an incredible team. I think I kind of took it for granted while I was there. It was a good group.

>> And how did you get to Quora? What did you study in undergrad?

>> Yeah, so before that I was at MIT for undergrad. I studied computer science; did one of those combined computer science bachelor's and master's degrees, kind of crammed it in.

>> I ended up at Quora because I got what we call an externship there. At MIT you actually get January off: there's the fall semester, then January is off, and then you have the spring semester. It's called the Independent Activities Period. Some people just take classes, some people do nothing, but some people will do month-long internships, and some crazy companies will offer a month-long internship to a college student. It really is just kind of a way to get people in.

>> Did you come out here from Boston?

>> Yeah, it was crazy. So you had to apply. I remember, this was I think January 2013, you had to apply, and I remember the Quora externship was the one that just paid the most. I think it was like $8,000 or $9,000 [laughter], and I was like, wow.

>> For a month, and you're kind of ramping up half the time.

>> I could eat for a year. [laughter]

>> Yeah, yeah. As a college student this was great. And they would fly you out here, so I did the interviews and luckily got an offer, and I came out for a January. That was right when they moved into their new Mountain View office, and honestly I basically just ramped up for two weeks and then had two weeks of good productivity working on the feed team.

>> So was that a user-facing product?

>> Yeah. I distinctly remember my externship project for those two weeks was just to add a couple of features to our feature store, and that would make its way into the model. I remember my mentor there was Tudor, who's now running, I think it's called, Harmonic. Crazy team. Crazy.

>> I mean, by the way, I think one of the untold stories of Silicon Valley is how good that original team ended up being. A lot of them are still there and still good, but the diaspora from Quora is everywhere.

>> Yeah. That's actually how I ended up at OpenAI too, fast-forwarding from there, because OpenAI kept kind of a quiet profile. But I'd always kept tabs on them, because a bunch of the Quora core people I knew ended up there, and when I checked in on it they were like, "Yeah, something crazy is happening here. You should definitely check it out." So yeah, I definitely owe a lot to Quora. Part of the reason I went there over other options as a new grad was that the team was just so incredible, and I felt I could learn a ton from them. I didn't think about anything afterwards. I was just like, man, if I could just absorb some knowledge from this group of people, it'd be great.

>> Awesome.

>> Yeah. So one place I wanted to start is something that I find very unique about OpenAI: it's both a pretty horizontal company, in that it's got an API (we've got this massive portfolio of companies, and I would say a good fraction of them use the API), and it's also a vertical company, in that you've got full-on apps; everybody uses ChatGPT, for example. And you're responsible for the API and kind of the dev-tools side. So maybe just to begin with: is there an internal tension between the two? Is that a discussion, like the API may help a competitor to the vertical version, or are things just growing so fast it's not an issue? I would just love to hear how you think about that. By the way, it's very unusual for companies to have both of these two things this early. Very unusual.

>> Yeah, I completely agree. I think there's some amount of tension. One thing that really helps here is that Sam and Greg, just from a founder perspective, have since day one been very principled in the way we approach this. They've always told us: we want ChatGPT as a first-party app, and we also want the API. And the nice thing is, I think they're able to do this because at the end of the day it comes back to the mission of OpenAI, which is to create AGI and then distribute the benefits as broadly as possible. If you interpret that, you want it on as many surfaces as you can, and the first-party app is a really great way to get, I don't know, it's like 800 million WAUs or whatever now.

>> 800 million WAUs. It's actually mind-boggling to think about. I don't think many people listening to this understand how big that is. That is crazy. That's going to be actually historic, for the time it's taken to get to 800 million.

>> It's historic. It's also just, yeah, the amount of time, and just how much we've had to scale up.

>> A tenth of the globe, right? [laughter]

>> Yeah. 10% of the globe uses it every week. Every week.

>> Yeah. And it's growing. So at some point it'll go even higher than that. So obviously the reach there is unmatched. But there's also being able to have a platform where we can reach even more than just that. One thing we talk about internally sometimes is what our end-user reach from the API is, and it's actually really broad. It's hard to measure because ChatGPT is growing so quickly, but at some points it was definitely larger than ChatGPT. And the fact that we're able to tap into all of this and get the reach that we want, I think, is really good.

>> But yeah, there's definitely some tension sometimes. I think it's come up in a couple of places. One of them is on the product side. As you mentioned, sometimes there are competitors building on our platform who might not be happy if ChatGPT launches something that competes with them.

>> Yeah.

>> I mean, that's a tale as old as the cloud, or operating systems, or whatever. So I think it's more like: does ChatGPT worry about the competitor...

>> Yeah.

>> ...you know, enabling a competitor, type of thing.

>> Yeah. So the interesting thing is, I would say not particularly, mostly just because we've been growing so quickly. It's just such a force right now.

>> Yeah. Growth solves so many different things. And the other way we think about it is, everyone's kind of building towards AGI; of course there's going to be some overlap here. But I would say, at least in my position, I feel more of this tension from the customers, the API customers themselves. Like, "Oh my gosh, are you going to build this thing that I'm working on?"

>> Yeah, that story is as old as computer systems. There's never been a computer platform that didn't have that problem.

>> So, okay. I kind of go back and forth on this one. I want to try one out on you. The problem historically with offering core services and APIs is that you can get disintermediated, right? I can build on top of it, but then the user doesn't know: whatever I build on top of the cloud disintermediates the cloud, and then I can switch to another cloud or whatever.

>> And it occurs to me that that's kind of hard to do with these models, because the models are so hard to abstract away. They're just unruly, right? If you try to have traditional software drive them, they just don't manage very well. So part of me thinks it's almost like this anti-disintermediation technology: you kind of have to expose it to the user directly. Does that make sense? So I'm wondering: ChatGPT is really just trying to expose the model to the user, and the API is kind of just trying to expose the model to the user. So there's almost this argument that if the real value is in the models, it doesn't really matter how you get it to them, because it's going to be very tough for someone to abstract it away in the classic computer-science sense, where the user doesn't know they're using the model. You always know you're using GPT-5.

>> Yeah. And the interesting thing is, I think the entire industry has slowly changed its mind on this too. In the beginning we kind of thought these were all going to be interchangeable.

>> It's just like software.

>> Yeah, exactly, a piece of infra you swap out. But I think we're learning this on the product side with, you know, the GPT-5 launch, and 4o, and how so many people liked o3 and 4o and all of that.

>> I felt that, I felt that when it changed. [laughter] I'm like, you're not as nice to me. I like the validation.

>> It's actually funny, because I really loved GPT-5's personality, but the way I used ChatGPT was very utilitarian. Mostly for work, or just information.

>> Yeah, I've definitely come around, but I actually felt a dissonance when it changed. There's this emotional thing that goes on. But it's almost like an anti-disintermediation technology: you kind of have to show this to the user.

>> Yeah. And then you see a lot of the more successful products, like Cursor, do this directly, especially the coding products, where users want more control. We've even seen some more general consumer products do this. So it's definitely been true on the consumer side. The interesting thing is, I think it's also been true on the API side, and that's also something that I think...

>> No, exactly, that's exactly what I'm saying. The argument could be that I could use the API to disintermediate you, but you don't see that happening, because it's so hard to put a layer of software between a model and a person. You almost have to expose the model.

>> Yes, yes. And I think, if anything, the models are almost diverging in terms of what they're good at and their specific use cases, and I think there's going to be more and more of this. But yeah, basically it's been surprisingly hard, or rather, the retention of people building on our API is surprisingly high, especially when people thought you could just swap things around. You might even have tools that help you swap things around. But the stickiness of the model itself has been surprising.

>> And do you think that's because of a relationship between the user and the model, or is it more of a technical thing, like "my evals work for OpenAI" and the correctness maintains?

>> Yeah, I think it's both. There's definitely an end-user piece here, which is what we've heard from some of our customers: they just get familiar with the model itself. But I also think there's a technical piece, which is that as a developer, especially at startups, you're going really deep with these models, really iterating to get them good within your particular harness. You're iterating on the harness itself. You're giving the model different tools here and there. So you really do end up building a product around the model. And so there's a technical piece where, as you keep building with a particular product like GPT-5, you're actually building more around it, so that your product works uniquely well with that model.
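The "technical piece" of model stickiness can be made concrete with a toy swap gate: a product accumulates an eval suite tuned to the incumbent model, and any candidate replacement has to clear it before a switch. The model functions below are stand-in stubs (not real API calls), and the eval set and threshold logic are illustrative assumptions, not any particular team's process.

```python
# Toy illustration of why model swaps are hard: the candidate must match
# the incumbent on the eval suite the product has been tuned against.
EVAL_SET = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("HTTP status for not found", "404"),
]

def incumbent_model(prompt: str) -> str:
    # Stand-in for the model the product was built around.
    return {"2+2": "4", "capital of France": "Paris",
            "HTTP status for not found": "404"}.get(prompt, "")

def candidate_model(prompt: str) -> str:
    # Stand-in for a "drop-in" replacement that behaves slightly differently.
    return {"2+2": "4", "capital of France": "paris is the capital",
            "HTTP status for not found": "404"}.get(prompt, "")

def score(model) -> float:
    # Fraction of eval prompts answered exactly as the harness expects.
    hits = sum(model(p) == expected for p, expected in EVAL_SET)
    return hits / len(EVAL_SET)

def safe_to_swap(candidate, incumbent, margin: float = 0.0) -> bool:
    """Gate a swap on eval parity: the candidate must match or beat the
    incumbent on the suite the product has been tuned against."""
    return score(candidate) >= score(incumbent) - margin

# Incumbent passes all 3; candidate passes 2 of 3, so the swap is blocked.
print(safe_to_swap(candidate_model, incumbent_model))  # False
```

The point of the sketch is that "correctness maintains" is a property of the whole harness, not of the model in isolation, which is why retention on a given model is higher than early intuitions predicted.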

>> So I use Cursor a lot, for a lot of stuff, like writing blogs (you know, we're investors), and sometimes for coding. And it's remarkable how many models I use in Cursor. Literally my go-to model is GPT-5. I love GPT-5; I think it's phenomenal. I use max mode with GPT-5 for planning. But I also like the tab-complete model that's in Cursor, and the new model they just dropped for some basic stuff; the Composer one's good.

>> And I think that kind of reflects this too, because there's a particular model for each particular use case. I've talked to a bunch of people who've used the new Composer model, and it's just really good for fast, first-pass, keep-you-in-flow kind of work, and then you bubble out to another model if you want deeper things.

I literally sit down with GPT-5 to help me plan something out, and it's really good at that. Then when I'm coding, doing the quick-chat thing, I'll use Composer, and then if there's some crazy bug or something like that...

So, you know, do you remember in the early days of all of this, we were like, there's going to be one model? Even investors were like, "We will never invest in a model company, because there will only be one model and it's going to be AGI." But the reality is, it feels like there's this massive proliferation of models, like you said before, doing many things. So maybe two questions, maybe too blunt or too crass. The first one is: what does that mean for AGI? [laughter] And the second is: what does that mean for OpenAI? Do you end up with a model portfolio? Do you select a subset? Do you think this all gets superseded by some god model in the future? How does that play out? Because it's against what most people thought. Most people thought this was all going towards one large model that does everything.

>> Yeah. I think the crazy thing about all this is just how everyone's thinking has changed over time.

>> Totally.

>> I distinctly remember this, and the crazy thing is it's not that long ago, just two or three years. I remember even within OpenAI, the thinking was that there would be one model that rules them all. This kind of goes to the fine-tuning API product: why would you even have a fine-tuning product? Why would you even want to iterate on it? There's going to be this one model that just subsumes everything. And that was also kind of the most simplistic view of what AGI would look like.

>> And yeah, it's definitely completely changed since then. The other thing to keep in mind is that it might continue to change, even from where we are today. But it's becoming increasingly clear, I think, that there will be room for a bunch of specialized models. There will likely be a proliferation of other types of models. You see us do this with the Codex model, and we have GPT-4, 4o, 5, and all of this. So I definitely think there's room for all of this. I don't think that's bad, for what it's worth. If anything, as we've tried to move towards AGI things have just been very unexpected, and the market has evolved, and the product portfolio evolves because of that. So I don't think it's a bad thing at all.

>> You could easily argue it's very good for OpenAI, and for the model companies, to not have winner-take-all, consolidated dynamics, right? You just have a healthier ecosystem, a lot more solutions you can provide.

>> Yeah, and as the ecosystem grows, it generally is helpful. This is one thing we actually think about a lot: as the general AI ecosystem grows, OpenAI stands to benefit a lot from it. This is also part of why we've started opening some of our products up to other models; our evals product, for instance, now allows you to bring in other models. We think any rising tide generally helps us here. But yeah, as we move into a world where there will be a bunch more models, this is why we've invested in our model customization products, with the fine-tuning API and now reinforcement fine-tuning, opening that up as well. It's also part of why we open-sourced gpt-oss: we want to be able to facilitate that.

>> I want to talk about that in just a bit, because the open source is actually very interesting. I actually thought the open-source model was great. But clearly it's something that companies have to be careful with.

>> But before that, I want to talk a little bit about the fine-tuning API. I've noticed that you're moving towards more sophisticated offerings like fine-tuning, which in a way you could read as a bit of a capitulation: there is product-specific data, and there are product-specific use cases, that a general model won't handle, to your point, as opposed to the proliferation model. It seems like a lot of that data is actually very, very valuable, right? So to what extent is there interest in almost a tit-for-tat, where you expose the ability to get product data into fine-tuning, and you also benefit from that data because the vendors provide it to you? Versus: it's 100% theirs, they keep their own data, and there's no interest in that. Because it feels to me like the next level of scaling; this is kind of where we're at. So I'm just kind of curious.

>> Yeah. So maybe even taking a step back: the main reason we invested in a fine-tuning API in the very beginning is, one, there's been huge demand from people to be able to customize the models a bit more. It kind of goes into prompt engineering, and I think the industry's changed its mind on that as well; it's evolved. But the second thing is exactly what you said, which is that companies just have giant treasure troves of data that they're sitting on, that they would like to utilize in some fashion in this AI wave. The simple thing is to put it in some vector store and do RAG with it, but if they have a more technical team, they do want to see how they can use it to customize the models. And so that's actually the main reason we've invested in this. The interesting thing was, back in 2022-23, our fine-tuning offering was, I'd say, too limited, so it was very difficult for people to tap into and use this data. It was just a supervised fine-tuning (SFT) API, and in practice it was really only useful for instruction-following-plus: you kind of change the tone, and you're really just instructing it. But the big unlock that has happened recently is the reinforcement fine-tuning product, because with that setup we're now letting you actually run RL. It's more finicky, it's harder, and you need to invest more in it, but it allows you to leverage your data way more.

more. This is this is just a naive question for me which is >> it feels from from just my understanding from my own portfolio it feels like there's two modalities of use. One of

them is I've got a treasure trove of data that I've had for a long time and I create my model on that treasure trove of data and all that happens offline and then I deploy that. Y

>> there's another one which is like I actually have the product being used in real time. I've got a bunch of users.

real time. I've got a bunch of users.

>> Yeah.

>> And I can actually get much closer to the user. I can A/B test and decide which data; it's more of a near-real-time thing. Is this focus more on product

stuff, or more on treasure trove?

>> So the dream with the fine-tuning API was that we should be able to handle both. We actually had this dream, and we have this whole LoRA setup with the fine-tuning inference, where we should just be able to scale to millions

and millions of these fine-tuned models, which is usually what would happen if you have this online-learning thing. In practice, it's mostly been the offline data that they've already created, or that they're

creating with experts or something using their product, that they're able to use here. But

the main thing I was trying to say around the reinforcement fine-tuning API is that it changes the paradigm away from just small incremental improvements, like tone improvements, which is

what SFT did, to actually improving the model to potentially SOTA level on a particular use case that you know about. That's where people have really started using the reinforcement fine-tuning API, and that's why it's

gotten more uptake. Because if the discussion is less "hey, I can make this model speak in a certain way, or be less sycophantic, or something"

and more "you can create the world's best model using your data set with RFT," then it becomes a lot more compelling.

>> And will you ever, or maybe do you already, find ways to

get access to that data? Like, listen, if I had the data and I wanted cheap GPUs, I'd trade you for it.

>> Yeah, I mean, we've talked about this, and we've actually been piloting

some pricing here too, because this data is really helpful and it's kind of hard to get. If you build with the reinforcement fine-tuning API, you can actually get discounted inference and

potentially free training too if you're willing to share the data. It's always

up to the customer there, but if they do, it is helpful for us, and there will be benefits for the customer as well.
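The distinction drawn above (SFT needs finished gold completions, while RFT only needs a grader that can score candidate outputs) can be sketched in a few lines. Everything below is invented for illustration; it is not the OpenAI API. A toy grader scores candidates against existing labels, and a best-of-n pick stands in for the actual RL update.

```python
# Toy illustration of the data shapes behind SFT vs. RFT.
# All names and data here are hypothetical.

# SFT: you must hand the model finished (prompt, ideal completion) pairs.
sft_example = {
    "prompt": "Classify the support ticket: 'My card was charged twice.'",
    "completion": "billing_dispute",
}

# RFT: you hand over prompts plus a grader that can score *any* candidate
# answer, so raw domain data (labels, rules, checkers) becomes training
# signal without writing gold completions by hand.
def grader(prompt: str, candidate: str) -> float:
    """Return a reward in [0, 1]. Here: exact match against a known label."""
    gold = {"My card was charged twice.": "billing_dispute"}
    for text, label in gold.items():
        if text in prompt:
            return 1.0 if candidate.strip() == label else 0.0
    return 0.0

# A stand-in for the RL step: score sampled candidates, reinforce the best.
candidates = ["refund_request", "billing_dispute", "fraud_report"]
rewards = [grader(sft_example["prompt"], c) for c in candidates]
best = candidates[rewards.index(max(rewards))]
print(best)  # billing_dispute
```

The point of the sketch: the treasure trove only needs to power the grader, not a hand-written ideal answer per example, which is why RFT unlocks more of it.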

>> That's awesome.

>> Okay, you said that views on prompt engineering have changed.

>> Yeah, I actually wasn't aware of that. All the other things I was

aware of; this one I wasn't. How?

>> I mean, I think the prevailing view, this is back in 2022, I remember I was talking to so many people, and basically, and this is similar to the single-model AGI view as well,

>> it was that prompt engineering is just not going to be a thing, and you're just not

going to have to think about what you're putting in the context window in the future; the model will just be good enough, and it'll just know what you need to do. And that's not a thing. [laughter]

>> Yeah. Yeah. But, I don't know, maybe people forget it, but that was a very common belief back then, cuz of the scaling laws or whatever: you'll just mind-meld with the model, and

>> prompting and instruction

following will be so good that you won't really need to do it. And if anything, that's clearly been wrong. But it is interesting, because I think it's a slightly different world that we're in now, where the models have gotten really,

really good at instruction following relative to, you know, GPT-3.5 or something.

>> But I think the name of the game now is less prompt engineering, as

we had thought about it two years ago.

It's more the context engineering side: what are the tools you give it? What is the data that it pulls in? When does it pull in the right data?

>> Well, this is very interesting. I mean, to reduce it to an almost absurdly simplistic level: the

weird thing about RAG, for example, the classic use of RAG, is that you're using cosine similarity to choose something that you're going to feed into a superintelligence.

>> [laughter]

>> You know, you're like, I'm going to randomly grab

this thing based on, like, [ __ ] embedding space. It doesn't really, you

know... and then you want the superintelligence to decide the thing to do. And so pushing intelligence into that retrieval clearly is something that makes a lot of sense, like pushing the intelligence

out in a way.
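The "classic RAG" step being poked at here can be shown concretely: rank chunks by cosine similarity in embedding space and hand the winner to the model. The vectors and chunk names below are made up for illustration.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: dot product over the product of norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend embeddings for a query and three document chunks.
query = np.array([0.9, 0.1, 0.0])
chunks = {
    "refund policy": np.array([0.8, 0.2, 0.1]),
    "api rate limits": np.array([0.1, 0.9, 0.3]),
    "office snack menu": np.array([0.0, 0.1, 0.9]),
}

# The retrieval decision is made entirely by this geometric score,
# with no model intelligence involved -- which is the criticism above.
scores = {name: cosine(query, vec) for name, vec in chunks.items()}
best = max(scores, key=scores.get)
print(best)  # refund policy
```

The alternative discussed next, letting a reasoning model drive retrieval through tool calls, moves that decision into the model itself.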

>> Exactly. And to be fair, I think RAG was introduced when the models were, it's like, pre-reasoning models. You

only had kind of one shot to do this, and it wasn't that smart. But

now that we do have the reasoning models... I mean, one of my favorite models is actually o3, because it was one of the most diligent models.

>> It would just do all these tool calls, and it's really the

intelligence itself trying to do the tool calls, or RAG, or anything like that,

>> or write the code to execute. And so the paradigm has shifted there. But yeah, because of that, I think context engineering, prompt engineering,

what you put in, what you give the model, is extra important now.

>> Yep.

>> Okay, so you have the API, which is horizontal. You've got ChatGPT and other products, which are vertical.
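The shift described here, from pre-selecting context to giving the model tools and letting it decide what to pull in, can be sketched as a bounded tool-call loop. The `fake_model` below is a hypothetical stand-in for a reasoning model's tool-choice step; real models emit structured tool calls, but the loop shape is the same.

```python
# Minimal sketch of "context engineering": the model, not a similarity
# score, decides what context to retrieve. Everything here is invented.

def search_docs(query: str) -> str:
    """A toy retrieval tool over a two-entry corpus."""
    docs = {"refund": "Refunds are issued within 5 business days."}
    return next((v for k, v in docs.items() if k in query), "no match")

TOOLS = {"search_docs": search_docs}

def fake_model(question: str, observations: list[str]) -> dict:
    # Stand-in for a reasoning model: with no observations yet, it
    # "decides" to search; once it has context, it answers.
    if not observations:
        return {"tool": "search_docs", "args": {"query": "refund policy"}}
    return {"answer": f"Based on the docs: {observations[-1]}"}

def run(question: str) -> str:
    observations = []
    for _ in range(5):  # bounded loop: tool calls until the model answers
        step = fake_model(question, observations)
        if "answer" in step:
            return step["answer"]
        observations.append(TOOLS[step["tool"]](**step["args"]))
    return "gave up"

print(run("When do refunds arrive?"))
```

The "context engineering" work, in this framing, is choosing which tools and data sources appear in `TOOLS`, not hand-crafting the prompt.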

We haven't even talked about pixels.

It's all just language. Are agents

a new modality? Is that something else? Like, you know...

>> Codex, or...

>> What do you mean by modality there?

>> Like, they feel both vertical and horizontal to me in a way. To me, ChatGPT is a product,

>> right? It's a product, and

my mom uses it, right?

>> Yep.

>> And an API is a dev thing; you give it to a developer. And a CLI is somewhere in between to me. Is it a product? Is it

horizontal? Like,

>> you know, how is it handled internally?

Is it a totally separate team that does agents, or...

>> No. So it's interesting, because the way that you framed it just now almost seemed like

agents was this singular concept that might have its own particular...

>> Maybe a better question is: what is an agent to you?

>> Yeah. Yeah. Like, even

[clears throat] getting a shared language [laughter] is important for this conversation.

>> Yeah.

>> So, I actually don't even know if it'd be helpful for me to share, but my general take on agents is: it's an AI that will take actions on your behalf and that can work over long time horizons. And I think that's the pretty general,

>> utilitarian definition. But if you think about it that way, yeah, maybe this is what you mean by modality: it is just a way of using AI. I guess it could be

viewed as a modality, but we don't view it as a separate thing, separate from the API.

>> Let me just try to give you a sense of where this question is coming from.

I know how to build a product, and we know how to do go-to-market for products. We know, you

know, the implications of turning them into platforms. We've been doing this for a very long time, right? We know how to do the same thing for APIs. We know how to do billing. We know the tension

of people building on top of it, and all of that stuff. And what I've been trying to... and this is maybe just a

personal inquiry. It's just not clear

to me, for an agent, if it sits in one of those two camps. Is it

more like the product camp? Is it more like the... [laughter]

>> Or, cuz it's kind of both. Like, I could literally give you code.

>> Yeah. Yeah.

>> And, like, as a user, you just talk to it. Or I could build in a way that kind of embeds it in my app. But then that means something to you as far as how you

price it and what it means for the ecosystem. Like, for example, would you be fine if I started a company and just built it around Codex? Is

that a thing?

>> Starting a company and building it around Codex? I actually think that would be

great. We released

the Codex SDK, and we want people to be able to build on it and hack

on it. Yeah. Actually, I think this

might be what you're getting at,

>> and this is kind of a unique thing about OpenAI, and kind of reflects how it's run, which is: at the end of the day, OpenAI is

an AGI company. It's an intelligence company.

>> And so agents are just one way in which this intelligence can be manifested. And so the way that

we actually think about it internally is that all of our different product lines, Sora,

Codex, the API, ChatGPT, are just different interfaces and different ways of

deploying this. You don't really need...

>> So there's no single team, like, thinking about agents. I would say the way that it

manifests itself more is that each product area thinks about: this intelligence is actually turning into a form where agentic behavior is more

possible. What would that look like in a

first-party product like ChatGPT? What

would that look like... this is actually why Codex ended up becoming its

own product. It's like, what would it

look like in a coding-style product?

>> We explored it, and ChatGPT kind of worked there, but actually the CLI interface makes a lot

more sense. That's another interface to

deploy it. And then if you look at

the API itself, that's another interface to deploy it.

You're thinking about it in a

slightly different way, because it's a developer-first mindset; we're helping

other people build it. The pricing is slightly different. But it's all these different manifestations of this core intelligence that is the agentic behavior. Yeah.

>> It is so remarkable how much of this entire economy is basically just token laundering. [laughter] In a

sense, right? It's literally,

anything I can do to get, like, [gasps] English in, or natural language in, and then the intelligence out. And it's because things are so resistant

to layering. It's so hard to layer a

language out, you know?

>> I could even do it easily with, like, Codex. I could just use it, you

know, as a component of a program and basically launder intelligence through it. Of course, you know, I'd be charged to do

that. So my view of

this, having seen now so many launches of different products, I've

seen agent launches, and the definition that you have, I've definitely seen APIs,

and I've seen products on these, is that

they're actually quite different than

what we're used to. The COGS is different, the defensibility is different. So we're kind of

rewriting it.

>> Um, and so, you know, you came from a kind of pricing background.

I mean, you were working on a demo model for pricing. Now you have the API. So I'd

just love your thoughts on, I mean, how have you evolved your thinking, and how do you price these, you know, access to intelligence, where you don't know how many people are

going to use it? Almost certainly usage-based billing, not something else? Can you talk just a bit about the philosophy around pricing on these things? Is it different for product versus API?

>> Yeah, I think the honest truth here is that it's evolved over time as well. The reason we've done usage-based pricing on the API, honestly, is because it's been

closest to how it's actually being used.

And so that's kind of how we

started. I actually think usage-based

pricing on the API has surprisingly held strong, and I actually think this might be something that we'll keep doing for quite a

long time. Mostly because...

>> I don't know how you don't do usage-based.

>> I just don't know how...

>> Yeah. And then there's

also the strategy of how we price it. Internally, one thing we do is we always make sure that we actually set our usage-based pricing from a cost-plus perspective; we're actually

just trying to make sure that we're being responsible from a margin

perspective.

>> By the way, this is a

huge shift in the industry in general, just because, like, I remember the shift from on-prem to recurring.

>> Yeah, that was a big deal. That created Zuora; it created whole companies. There are whole books, a bunch of consultants, on how you do

this, to make that change.

>> You know, and I think the shift to usage is as big or bigger, and it's also a really hard technical problem. [laughter]

>> I can't even imagine 800 million WAUs, like, how do you build...

>> Yeah, well, 800 million WAUs is a little easier, because it's not usage-based pricing, it's subscription. But I mean, there's still

a lot of users on the API that we need to manage all the billing for.

>> There's some overages or stuff you've got to deal with on that, or...

>> What do you mean by overages?

>> I don't know, I guess...

>> Most people have quotas, and we kind of max that; we don't let people go over. But in practice these quotas are pretty massive.

>> And that would literally be one of the most complex systems somebody's ever built, if you do usage-based at

that scale. I mean, these are very, very

hard systems, and you have to be correct. These are very hard systems to scale.

>> Yep. Yeah. I mean, we have a whole team thinking about this

now internally. Usage

pricing is also interesting. So,

we acquired this company called Rockset a while ago. The founder, his

name is Venkat. He's right here. Awesome.

Venkat's incredible. He's one of

the best.

>> Venkat, if you're listening, we're

huge fans. I'm a huge fan.

>> He's going to love this.

>> No, he's great, man. He's a legend.

>> Anyways, I was talking to him about pricing as well. And his take

is that pricing is kind of a one-way ratchet: basically, once

you get a taste of usage-based pricing, you're never going to go back to the per-deployment type of

pricing. And I think that's definitely

true. And I think it's just cuz it

gets closer and closer to your true utility. You're getting

all this; the main pain point is you have to maintain all the infra

to get it to work well. But

if you do have it, he thinks it's a one-way ratchet, where there's just no going back.

>> And I think the hot new thing now is, oh, with AI you can now measure outcomes. And so that's

another, you know, step

forward. And if that works, maybe

it's a one-way ratchet too.

>> So we thought about that: is there some type of outcome-based pricing? This is more on the first-party side; on an API, it's

kind of hard to measure. Very hard. I

mean, that's hard, because you end up

having to price and value non-computer-science infrastructure, right? Like,

you're literally going into verticalization now. You're like,

>> I mean, listen, if it's porting a codebase, maybe you'd have some expertise, but if it's, whatever, increasing crop yield... [laughter]

>> At some level, you need to...

>> But there could be a world where the AI is good enough that it can actually make judgments of these, and do it in an accurate enough way, that we can tie it to billing.

>> I think this is a problem with AI conversations, because at any point in time you're like, but it

could get good enough. [laughter]

>> It's not a problem anymore.

>> Yeah. At some point it'll be solved.

>> It's so much like the prompt engineering and the single-AGI thing from before.

Yeah. Yeah. When you push it

that far, everything's kind of solved. On outcome-based pricing: it sounds very appealing; if it can work, it

can work. But one thing that we've

started realizing is,

>> it actually ends up correlating quite a bit with usage-based pricing, especially with test-time compute. If the thing is thinking quite a bit, and you

charge just by usage and not by outcome, you're basically approximating outcome-based pricing at

this point.

>> If the thing is thinking for so long, it's highly correlated with what it's doing,

>> just adding more value.
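The correlation claimed here can be seen in a toy cost model: with per-token pricing, a task that demands a long reasoning trace costs far more than a quick answer, so spend tracks effort. The rates below are invented, not any provider's actual prices.

```python
# Toy per-token cost model showing why usage-based pricing approximates
# outcome-based pricing once models "think". Rates are hypothetical.

IN_RATE, OUT_RATE = 1.25, 10.00  # dollars per 1M input / output tokens

def cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * IN_RATE + output_tokens / 1e6 * OUT_RATE

easy = cost(2_000, 500)      # quick answer, little reasoning
hard = cost(2_000, 40_000)   # long chain of thought on a hard task
print(easy, hard)
```

Under this sketch the hard task costs roughly 50x the easy one; if reasoning length roughly tracks task difficulty and value, the per-token bill already behaves like an outcome-linked one.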

>> Yeah. Exactly. And so maybe at the end of the day, usage-based pricing is all you need, and we're just going to live in this world forever. But, I don't know, it's constantly

evolving. I think our

thinking has evolved here as well. I

personally am keeping track of whether the outcome-based pricing setups can actually work here, but at least on the API side, it's such a usage-based setup, and we have the infrastructure around it, so I think we'll probably stay with that for

a while.

>> So how do you think about open source? I

mean, you know, I think you're the only big lab that's releasing open source. Is

that...

>> No, Google has some of theirs. Yeah. Mostly smaller models on their

side.

>> That's right. Yeah. So how do you

think about open source vis-à-vis, you know, competition, cannibalization? What's the strategy?

What's the complexity?

>> Yeah. So, I personally love open source. I think it's great that there's...

>> All of us grew up with it, right?

>> Yeah, grew up with it. The internet wouldn't exist without it; so much of the world is built on top of it.

>> Cloud wouldn't exist without it. Nothing would exist without it, except for maybe Windows.

>> And so it was interesting, cuz

I felt like, over the last while, before we launched the open-source model, and I know Sam feels this way as well, there was this weird mindset where, because OpenAI hadn't launched anything, it just seemed like it

was super anti-open-source.

>> But I'd actually been having conversations with Sam ever since I joined about open sourcing a model. We were trying to think about how we can

sequence it. Compute is always a

hard thing: do we have the

compute to train this

thing? So we've always wanted to

do this. I'm really glad that we were

able to finally do it.

>> I think it was... was it earlier this

year? I've lost all sense of time.

>> AI time is so good.

>> Yeah. Was it last year? No, it was this

year, when gpt-oss came out. And

so I was just really glad that we did that. The way I generally think about it is, one, and this is particularly true for OpenAI, because, as you said, we are a

vertical and a horizontal company: we want to continue investing in the ecosystem, and just from a brand perspective I think it's good. But then also,

>> I think from OpenAI's perspective,

>> if the AI ecosystem grows more and more, it's a rising tide, and that's all really helpful for us. If we can launch an open-source model and it helps unlock a whole bunch of other use cases in other

industries, I think that's, you know,

that's actually good for us.

>> I'll say, what people don't talk about a lot is how well these open-source AI business models actually work, because the cannibalization risk is actually very low.

>> Yeah.

>> And you don't really enable competitors a lot, because when we say open source, we really mean open weights, right? It's not like they could

recreate it. And if I can distill your API as well as I can distill you giving me the weights, it doesn't really change that dynamic a lot.

>> Yeah, I mean, to be clear, we have not seen cannibalization at all from the models. It seems like a very different set of use

cases. The customers tend to be

slightly different. The use cases are

very different.

>> And by the way, it turns out inference is super hard. To actually have scalable, fast, performant inference, that's a hard, hard problem.

>> Yeah. So the way that I personally think about open source in relation to the API business in particular is, well, one, it hasn't shown cannibalization risks. So, you

know, I'm not particularly worried about

that. But also, especially for all

these major labs, there are usually two or three models where that is where you're making all of your impact, all of your revenue, and those are the ones where we're throwing a bunch of resources into improving the

model, and these tend to be the larger ones that are extremely hard to

inference. We have a really cracked

inference team at OpenAI, and my sense is, even if we just, you know, open sourced them, if we literally open sourced GPT-5 or something, it would be really, really hard to

inference it at the level that we were able to. There's also,

by the way, a feedback loop between the inference team and the training

team, too. So we can

optimize.

>> Can you... is it possible to verticalize models for products?

>> Have you, like, trained models specifically for products?

>> Yeah, I mean, we've kind of done this with GPT-5-Codex, right? Or do you mean even more verticalization?

>> Like deep, deep verticalization,

where, like, the released model wouldn't... it's actually part of a product.

>> I think we're basically starting to

move in that direction. I think there's a question of how deeply you verticalize it. Most of what

we've done is mostly at the post-training, tool-use level: Codex, sorry, GPT-5-Codex, is particularly good at using the Codex harness. But

there's even deeper verticalization you can do than that, and that one, I

think, is more of an open question.

>> Yeah. Well, a lot of my mental model here comes

from the pixel space, which is: you can LoRA a bunch of image models, right? And you

can do a bunch of stuff to make them better and more suitable for some products, for example. And these open-source models are really, really good, and

you would believe that you could verticalize a model for editing, or cut-and-paste, or this or that. That's actually part of this, but you actually don't see it. Did that happen?

>> Yeah, it's almost always that you're just exposing a model, not something specific to a product.

>> Yeah, I think so. I think

there's a distinction to be made between the image-model space and the text-model space, because the image models tend to be way smaller, and you can iterate on them a lot faster.

That's why you get that crazy, cool proliferation on the image-model side. Whereas for the text models, there's always going to be this really big, fat pre-training step that you have to invest in, and then even the post-training side is,

>> you know, not the easiest thing. Just from a compute perspective, obviously it's much smaller, but it's still pretty heavy to do a full mid-train or post-training run.

>> And so I actually think that's one of the bigger bottlenecks. Because you are right that on the image side, you can fine-tune an image

diffusion model to be extremely

good at, say, editing faces, something very specific, and build a product around that. You can put all these resources into iterating on that one specific model, whereas it's a much heavier motion, it seems, on the text side.

>> I've got to say, it is a bit of an anti-pattern to do both language-based models and diffusion, like pixel, models in the same company.

>> Most that have tried found it very clunky to do.

>> I mean, you and Google are the two kind of counterexamples for this. And so,

like, is it possible to even converge the infrastructures on these things? I

mean, is it totally different orgs? Is it

shared infrastructure? How do you

operationalize?

>> Yeah, I think you're totally

right. It's an anti-pattern. It's pretty

tough to pull off.

>> Honestly, props to Mark on our research team for structuring things in a way where we're able to do it. From my perspective, I think the biggest thing is that our

world simulation team, the team that builds Sora and all that, under Aditya,

is just extremely solid. It's the highest concentration of talent that I've seen in a

while.

>> But is it the same... are they totally separate infrastructure? Do they use the same

infra?

>> Yeah. So it's

actually pretty separate. And I think that's part of the reason why

we're able to do this. Well,

one is that the team needs to

be extremely strong, which they are. And then two is that they're

run very separately.

They're thinking about their own particular road map. They

think about productization very

separately as well, right? Which is how the Sora app came out

of that as well.

>> And then even the inference stacks are

kind of different. They own a lot

more around their inference stack, and they optimize their inference stack pretty

separately. And so I

think that contributes to helping us run things in parallel, but it's pretty hard to pull off, for

sure.

>> Maybe you can educate me on this: I think about APIs as mostly text-based from OpenAI. Do you do actual pixel-based stuff?

>> Yeah, we do. We have a bunch. So DALL·E, in the API, the OG

model. DALL·E 2 is in the API.

>> That was like the first real text-to-image

model, right?

>> Yeah. Yeah. That was actually the model that got me to go to

OpenAI, because it was that summer when I was thinking

about something new. It's when DALL·E 2 came out, and it just completely blew my

mind. Wow.

>> I distinctly remember asking it to do the simplest thing, like

draw a picture of a duck or something.

[laughter] It's the simplest thing now, and it generated a picture of, you know, a white duck. And that was actually the thing that got

me to OpenAI in the first place. But

yeah, we have a bunch in our

API. The image gen model is in our API as well,

and then Sora 2 is in our

API. We launched it at Dev Day. It's

actually been a huge hit. I've been

very surprised. We need more GPUs for

that. But the amount of use cases...

>> And then from your standpoint, you can converge that, the API infrastructure, probably?

>> Yeah. So on the API side a lot of the infrastructure is shared for those, but once you reach the inference level they're separate, because you have to inference them differently. And it's that team that has been laser focused on making that side particularly efficient and working well, separate from the text models. But yeah, we have image gen, we have video gen, and we'll continue adding more to the API there.

>> It feels like we've been evolving our thinking as an industry on a bunch of stuff, right? One of them for sure is the models, like we've talked about. Another one is context engineering. It seems to me that how you build agents and expose them has evolved too. So maybe you can talk a bit about that.

>> Yeah. I think at Dev Day this year, when we launched our Agent Builder, I got a bunch of questions around this, because Agent Builder is a bunch of different nodes, this deterministic thing, and people asked, is this really the future of agents? We obviously put a lot of thought into this when we were thinking about building that product. But the way I think about it is...

>> Do you think they came from a point of it being constraining? Like, "Oh, this is too constraining."

>> Yeah, I think people are like, "It's too constraining. It's not AGI-forward. At the end of the day, the AGI will do everything." [laughter] So why have nodes in this node-builder thing?

>> Just tell it what to do.

>> Yeah. And so I think there are two things at play here. One is a practicality component. The other is that I think there are actually different types of work out there that could be automated into agents. On the practicality side: maybe in some future world, instruction following will be so good that you just ask the model to do a four-step process and it always does the four-step process.

>> Exactly.

>> We're still not there yet, and in the meantime this entire industry is being born and a lot of people still want to use these models. What can you build for them? So there's a practicality component to it.

>> When did you launch that?

>> At Dev Day. It feels like forever ago, but it was earlier this month. [laughter] October 6th or something. So less than a month ago, actually.

>> Okay.

>> It's been crazy seeing the reception to it. By the way, I think the video where Christina on my team demos Agent Builder is now one of the most viewed videos on our YouTube channel.

>> I will say, just anecdotally from my perspective, people love it, but I also saw the dissonance. When it came out, people were like, "Wait, what is this? Is this no code? Low code?" [laughter]

>> Yeah, exactly. "It's another low-code thing."

>> But I know people love it. Yeah.

>> Yeah. So there's a practicality piece.

There's another piece, which is that when we were talking to our customers, we realized something. At the end of the day, a lot of this agent work is just trying to automate what people do in their day-to-day jobs, and there are actually two different types of work. There's the work we usually think about, which is maybe what software engineers do: it's very undirected. There's a high-level goal, you have your Cursor, you're writing code, exploring things, moving towards an objective. It's knowledge-based work; data analysis and coding are kind of like this. But then there's another type of work, which we realized is maybe even more prevalent in industry than software, we're just not aware of it. That work tends to be very procedural, very SOP-oriented. Customer support is a good example: there's a very clear policy that these agents and people have to follow, and it's actually not great for them to deviate from it and try something else. The people running these teams really want the SOPs to be followed.

>> And this pattern actually generalizes to different work.

>> A standard operating procedure, yeah, sorry. It's the way in which you need to operate the support team. But this extends to marketing, to sales, to a bunch of areas, way more than it has any right to. And what we realized is that the determinism of an agent builder with nodes, which helps enforce this, ends up being very helpful. I think a lot of us, especially in Silicon Valley, don't really appreciate that there's a ton of work that actually falls into this camp.

>> I've got to say, there's a pattern similar to this, and I'm wondering if you've seen it. Some regulated industries actually can't let any generated content go to a user.

>> Yeah. Right.

>> And so what they do, and I think it's so interesting: they'll pass in something like a conversation tree, and the model can choose something from it.

>> Yeah. So there's some human element to it.

>> So as part of the prompt, they say: here are the viable things you can say, choose which one to say. The language reasoning is happening in the model, but nothing generated comes out.

>> Interesting.

>> Does that make sense?

>> Yeah, yeah.
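A minimal version of that choose-from-a-catalog pattern might look like the sketch below. The catalog text and the stubbed-out model call are invented for illustration; in practice the stub would be a real API call whose reply is parsed the same way:

```python
# Choose-from-a-catalog pattern: the prompt lists pre-approved responses and
# the model replies with an index only; the application emits catalog text
# verbatim, so nothing model-generated ever reaches the user.
# The catalog entries and the stub model below are invented for illustration.

CATALOG = [
    "I can help you check your account balance.",
    "For security reasons, I need to verify your identity first.",
    "Let me transfer you to a licensed representative.",
]

def build_prompt(user_message):
    options = "\n".join(f"{i}: {text}" for i, text in enumerate(CATALOG))
    return (
        f"User said: {user_message}\n"
        "Reply with the number of the best response, and nothing else:\n"
        f"{options}"
    )

def stub_model(prompt):
    # Stand-in for a real model call; here it always picks option 1.
    return "1"

def reply(user_message):
    raw = stub_model(build_prompt(user_message))
    try:
        choice = int(raw.strip())
    except ValueError:
        choice = -1
    if 0 <= choice < len(CATALOG):
        return CATALOG[choice]  # only catalog text can be emitted
    return CATALOG[-1]  # safe fallback: escalate to a human
```

The range check matters: even if the model rambles or picks an out-of-range number, the user only ever sees an approved string.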

>> And then another one I've seen is actual pseudo-code. I'll pass in something like a Python function, and use the pseudo-code to determine what actually makes it out. Or...

>> It actually has a response catalog as part of it, and it has the logic to apply, and then...

>> Interesting.

>> So the model takes the language in from the human user, but the logic of how to respond is in Python code, because it turns out a lot of code has already been written for these types of things, and it includes the responses that you would send out. Does that make sense?

Actually, a lot of NPCs are done this way, actual video game NPCs.

>> So with the NPCs, the actual code being generated by the model is not what ends up making it to the end user?

>> The code isn't being generated by the model; the prompt has the code. So let's say I have an NPC, and you're the gamer, and you're coming in and talking to my NPC. My NPC has some logic it needs to follow: if you say a certain thing, I'll give you a key, or maybe I'll barter. Describing the game logic in English just doesn't work if you try it. And scripting the output doesn't work either, because to use it in a game context you'd have to give a specific direction, a specific this or that. So how do you make these things behave in a more constrained way?

>> People will pass in functions. They'll actually describe the logic in Python. So my prompt will be: you're an NPC in a video game, the user just asked you a question, here's the logic you should go through. If the user says this, then do this. It's pseudo-code: if the user has this in their belt, do this, and so on. And then: here are the set of valid responses. So you're almost constraining it.

>> Oh, I see. And then when it actually does respond, you can validate that it's one of those responses.
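That prompt-carries-the-logic pattern might be sketched like this. The game logic, the responses, and the stubbed model call are all invented for illustration; a real implementation would send `NPC_PROMPT` plus the player state to a model and validate whatever comes back the same way:

```python
# NPC pattern: the prompt carries the game logic as pseudo-code plus a fixed
# set of valid responses, and the game validates the model's output against
# that set. The logic, responses, and stub model are invented for illustration.

VALID_RESPONSES = {
    "Here, take this key.",
    "What will you trade for it?",
    "Come back when you have 10 gold.",
}

NPC_PROMPT = """You are an NPC in a video game. The player just spoke to you.
Follow this logic exactly:
    if "amulet" in player.inventory: give them the key
    elif player.gold >= 10: offer to barter
    else: turn them away
Reply with exactly one of these responses:
""" + "\n".join(f"- {r}" for r in sorted(VALID_RESPONSES))

def stub_model(prompt, player):
    # Stand-in for a real model call: here the pseudo-code is applied directly.
    if "amulet" in player["inventory"]:
        return "Here, take this key."
    if player["gold"] >= 10:
        return "What will you trade for it?"
    return "Come back when you have 10 gold."

def npc_reply(player):
    out = stub_model(NPC_PROMPT, player)
    # Validation step: the game only ships outputs from the fixed set.
    return out if out in VALID_RESPONSES else "Come back when you have 10 gold."
```

The model can phrase its reasoning however it likes internally, but the game only accepts one of the enumerated strings, which is what keeps the NPC inside the game's logic.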

>> I see. So it's highly structured. Okay, so the NPC still only exists within the space of the program you gave it? The space it can act in?

>> Yeah, the logic is in there. So it can have a normal conversation, but only insofar as you're guiding the logic, for game design or game logic. You see this with NPCs, but you also see it in regulated industries, where I literally can't have it...

>> Yeah, I was going to say, what you described sounds like giving the SOPs to your set of human operators: stick to this, please.

>> Yeah. "You must say these three things, and here's the..."

>> "You cannot give a refund if it's less than this amount." Yeah.

Yeah. Very interesting. I don't want to equate them to NPCs, but this is [laughter] very similar.

>> I'm just saying, if you want to really guarantee what happens, there's a set of techniques you use. There are situations where you want to constrain what the model does. It could be from a regulatory standpoint. It could be because you want it to run for a long time. And it could also be because I actually have game logic, and my game logic is a traditional program: I have a monetary system, an item system, a battle system. You can't describe that in English. You have to give it to the model so it can behave within it.

>> Yes. And that is exactly the problem we're trying to solve here, right? If you don't give it any of this, it can just go off and do whatever, and there are regulatory concerns around that. That is the exact use case we're trying to target with Agent Builder.

>> That's awesome. Well, listen, we're running out of time and there are a million more things I want to ask you, but I really appreciate you taking the time to come in. It was great surveying what's going on, and particularly teasing apart horizontal versus vertical in the space, which I really wanted to do. So thank you so much.

>> Yeah, thank you.

>> [music]
