
The 100-person lab that became Anthropic and Google's secret weapon | Edwin Chen (Surge AI)

By Lenny's Podcast

Summary

Key takeaways

- **$1B revenue with <100 people**: Surge hit over a billion in revenue last year with under 100 people, completely bootstrapped without raising VC money, by building a super small, super elite team instead of playing the Silicon Valley game. [05:34], [06:04]
- **Quality beats throwing bodies**: People think you can throw bodies at a problem to get good data, but that's completely wrong; true quality means Nobel Prize-winning poetry with subtle imagery that surprises and tugs at the heart, not just checking boxes like eight lines with the word moon. [09:47], [10:07]
- **Benchmarks mislead AI progress**: Benchmarks are often wrong, with flawed answers, and easy to game, pushing models to hill climb on objective tasks like IMO gold medals while failing real-world messiness like parsing PDFs; they optimize for PR over true advancement. [18:00], [18:50]
- **RL environments are the next frontier**: RL environments simulate real worlds, like a startup with Gmail, Slack, and outages, where models must perform end-to-end tasks over long horizons, exposing failures in messy scenarios unlike single-step benchmarks. [34:50], [35:29]
- **Reject pivoting, embrace mission**: Don't pivot every two weeks or blitzscale; build the one thing only you could build, with a big idea you believe in, staying focused on your mission (like high-quality data) instead of chasing valuations. [29:06], [30:24]
- **Company values shape models**: Models will differentiate by company values, like one endlessly iterating on an email for perfection versus one saying it's good enough to save you time; values determine whether AI optimizes for productivity or endless engagement. [48:19], [49:05]

Topics Covered

  • Fire 90% to move faster
  • Quality defies throwing bodies
  • Benchmarks mislead real progress
  • Labs chase slop over truth
  • Models diverge by company values

Full Transcript

You guys hit a billion in revenue in less than four years with around 60 to 70 people. You were completely bootstrapped. Haven't raised any VC money. I don't believe anyone has ever done this before.

>> We basically never wanted to play the Silicon Valley game. I always thought it was ridiculous. I used to work at a bunch of big tech companies and I always felt that we could fire 90% of people and we would move faster, because the best people wouldn't have all these distractions. So when we started Surge, we wanted to build it completely differently, with a super small, super elite team.

>> You guys are by far the most successful data company out there.

>> We essentially teach AI models what's good and what's bad. People don't understand what quality even means in the space. They think you can just throw bodies at a problem and get good data. That's completely wrong.

>> To a regular person, it doesn't feel like these models are getting that much smarter constantly.

>> Over the past year, I've realized that the values that the companies have will shape the models. I was asking Claude to help me draft an email the other day, and after 30 minutes, yeah, I think it really crafted me the perfect email, and I sent it. Then I realized I spent 30 minutes doing something that didn't matter at all. If you could choose the perfect model behavior, which model would you want? Do you want a model that says, "You're absolutely right. There are definitely 20 more ways to improve this email," and continues for 50 more iterations? Or do you want a model that's optimizing for your time and productivity and just says, "No, you need to stop. Your email is great. Just send it and move on."

>> You have this hot take that a lot of these labs are pushing AGI in the wrong direction.

>> I'm worried that instead of building AI that will actually advance us as a species, curing cancer, solving poverty, understanding the universe, we are optimizing for AI slop instead. We're literally optimizing our models for the types of people who buy tabloids at the grocery store. We're basically teaching our models to chase dopamine instead of truth.

Today, my guest is Edwin Chen, founder and CEO of Surge AI. Edwin is an extraordinary CEO and Surge is an extraordinary company. They're the leading AI data company, powering training at every frontier AI lab. They're also the fastest company to ever hit $1 billion in revenue, in just 4 years after launch, with fewer than 100 people, and completely bootstrapped. They've never raised a dollar in VC money. They've also been profitable from day one. As you'll hear in this conversation, Edwin has a very different take on how to build an important company and how to build AI that is truly good and useful to humanity. I absolutely loved this conversation and I learned a ton. I am really excited for you to hear it. If you enjoy this podcast, don't forget to subscribe and follow it in your favorite podcasting app or YouTube. It helps tremendously.

And if you become an annual subscriber of my newsletter, you get a ton of incredible products for free for an entire year, including Devin, Lovable, Replit, Bolt, n8n, Linear, Superhuman, Descript, Wispr Flow, Gamma, Perplexity, Warp, Granola, Magic Patterns, Raycast, ChatPRD, Mobbin, PostHog, and Stripe Atlas. Head on over to Lennys.com and click Product Pass. With that, I bring you Edwin Chen, after a short word from our sponsors.

My podcast guests and I love talking about craft and taste and agency and product market fit. You know what we don't love talking about? SOC 2. That's where Vanta comes in. Vanta helps companies of all sizes get compliant fast and stay that way with industry-leading AI automation and continuous monitoring. Whether you're a startup tackling your first SOC 2 or ISO 27001, or an enterprise managing vendor risk, Vanta's trust management platform makes it quicker, easier, and more scalable. Vanta also helps you complete security questionnaires up to five times faster so that you can win bigger deals sooner. The result? According to a recent IDC study, Vanta customers slash over $500,000 a year in costs and are three times more productive. Establishing trust isn't optional. Vanta makes it automatic. Get $1,000 off at vanta.com/lenny.

Here's a puzzle for you. What do OpenAI, Cursor, Perplexity, Vercel, Plaid, and hundreds of other winning companies have in common? The answer is they're all powered by today's sponsor, WorkOS. If you're building software for enterprises, you've probably felt the pain of integrating single sign-on, SCIM, RBAC, audit logs, and other features required by big customers. WorkOS turns those deal blockers into drop-in APIs with a modern developer platform built specifically for B2B SaaS. Whether you're a seed-stage startup trying to land your first enterprise customer or a unicorn expanding globally, WorkOS is the fastest path to becoming enterprise ready and unlocking growth. They're essentially Stripe for enterprise features. Visit workos.com to get started, or just hit up their Slack support, where they have real engineers who answer your questions super fast. WorkOS allows you to build like the best with delightful APIs, comprehensive docs, and a smooth developer experience. Go to workos.com to make your app enterprise ready today.

>> Edwin, thank you so much for being here, and welcome to the podcast.

>> Thanks so much for having me. I'm super excited.

>> I want to start with just how absurd what you've achieved is. A lot of people and a lot of companies talk about scaling massive businesses with very few people as a result of AI, and you guys have done this in a way that is unprecedented.

You guys hit a billion in revenue in less than four years with around 60 to 70 people. You're completely bootstrapped. Haven't raised any VC money. I don't believe anyone has ever done this before. So you guys are actually achieving the dream of what people are describing will happen with AI. I'm curious: do you think this will happen more and more as a result of AI? And also, where has AI most helped you find leverage to be able to do this?

>> Yeah. So, we hit over a billion of revenue last year with under 100 people. And I think we're going to see companies with even crazier ratios, like $100 million per employee, in the next few years. AI is just going to get better and better and make things more efficient. So that ratio just becomes inevitable. Like, I used to work at a bunch of the big tech companies, and I always felt that we could fire 90% of people and we would move faster, because the best people wouldn't have all these distractions. And so when we started Surge, we wanted to build it completely differently, with a super small, super elite team. And yeah, what's crazy is that we actually succeeded.

And so I think two things are colliding. One is that people are realizing that you don't have to build giant organizations in order to win. And two, yeah, all these efficiencies from AI, they're just going to lead to a really amazing time in company building. Like, the thing I'm most excited about is that the types of companies are going to change, too. It won't just be that they're smaller. We're going to see fundamentally different companies emerging. Like, if you think about it, fewer employees means less capital. Less capital means you don't need to raise. So, instead of companies started by founders who are great at pitching and great at hyping, you'll get founders who are really great at technology or product. And instead of products optimized for revenue and what VCs want to see, you'll get more interesting ones built by these tiny, obsessed teams. So, people building things they actually care about. Real technology, real innovation. So I'm actually really hoping that the Silicon Valley startup scene will go back to being a place for hackers again.

>> You guys have done a lot of things in a very contrarian way. And one was actually just not being on LinkedIn posting viral posts, not on Twitter constantly promoting Surge. I think most people hadn't heard of Surge until just recently. And then you just came out, like, okay, the fastest growing company, at a billion dollars. Why would you do that? I imagine that was very intentional.

>> We basically never wanted to play the Silicon Valley game.

And like I always thought it was ridiculous. Like, what did you dream of doing when you were a kid? Was it building a company from scratch yourself and getting in the weeds of your code and your product every day? Or was it explaining all your decisions to VCs and getting on this giant PR and fundraising hamster wheel? And it definitely made things more difficult for us, because yeah, when you fundraise, you just naturally become part of this kind of Silicon Valley industrial complex, where your VCs will tweet about you. You'll get the TechCrunch headlines. You'll get announced in all the newspapers because you raised at this massive valuation. And so it made things more difficult for us, because the only way we were going to succeed was by building a 10 times better product and getting word of mouth from researchers.

But I think it also meant that our customers were people who really understood data and really cared about it. Like, I always thought it was really important for us to have early customers who were really aligned with what we were building, who really cared about having really high quality data, and who really understood how that data would make their AI models so much better, because they were the ones helping us. They were the ones giving us feedback on what we were producing. And so just having that kind of very close mission alignment with our customers actually helped us early on. So these were people who were basically just buying our product because they knew how different it was and because it was helping them, rather than because they saw something in a TechCrunch headline. So it made things harder for us, but I think in a really good way.

>> It's such an empowering story for founders to hear: they don't need to be on Twitter all day promoting what they're doing. They don't have to raise money. They can just go heads down and build. So I love so much about the story of Surge. For people that don't know what Surge does, just give us a quick explanation of what Surge is.

>> We essentially teach AI models what's good and what's bad. So, we train them using human data, and there's a lot of different products that we have, like SFT, RLHF, rubrics, verifiers, RL environments, and so on. And then we also measure how well they're progressing. So, essentially, we're a data company.

>> What you always talk about is that quality has been the big reason you guys have been so successful. The quality of the data. What does it take to create higher quality data? What do you all do differently? What are people missing?

>> I think most people don't understand what quality even means in the space. They think you can just throw bodies at a problem and get good data, and that's completely wrong.

completely wrong. Let let me let me give you an example. So imagine you wanted to train a model to write an Aine poem about the moon. What makes it a good high quality poem? If you don't think deeply about quality, you'll be like,

"Is this a poem? Does it contain eight lines? Does it contain a word moon?" You

lines? Does it contain a word moon?" You

check all of these boxes and if so, sure, yeah, you say it's a great poem.

But that's completely different from what we want. We are looking for a Nobel Prize winning poetry. Like, is this poetry unique? Is it full of subtle

poetry unique? Is it full of subtle imagery? Does it surprise you and tug

imagery? Does it surprise you and tug out your heart? Does it teach you something about the nature of moonlight?

Does it play with your emotions? And

does it make you think? That's what we are thinking about when we think about high quality bone. So it might be like a ha coup about moonlight on water. It

might use internal rhyme and meter.

There are thousand ways to write a poem about the moon and and each one gives you all these different insights into language and imagery and human expression. And I think thinking about

expression. And I think thinking about quality way is really hard. It's hard to measure. It's really subjective and

measure. It's really subjective and complex and rich and it sets a really high bar. And so we have to build all of

high bar. And so we have to build all of this technology in order to measure it.

Like, thousands of signals on all of our workers, thousands of signals on every project, every task. Like, we know at the end of the day if you are good at writing poetry versus good at writing essays versus good at writing technical documentation. And so we have to gather all these signals on what your background is, what your expertise is. And not just that: how you're actually performing when you're writing all these things. And we use those signals to inform whether or not you are a good worker for these projects and whether or not you are improving the models. And it's really hard to build all this technology to measure it, but I think that's exactly what we want AI to do. And so we have these really, really deep notions about quality that we're always trying to achieve.

>> So what I'm hearing is there's kind of a going much deeper in understanding what quality is within the verticals that you are selling data around. So is this like a person you hire that is incredibly talented at poetry, plus evals that they help write that tell them this is great? What's the mechanics of that?

>> The way it works is we essentially gather thousands of signals about everything that you're doing when you're working on the platform. So we are looking at your keyboard strokes. We are looking at how fast you answer things. We are using reviews, we are using code standards, we are training models ourselves on the outputs that you create, and then we're seeing whether they improve a model's performance. And it's very similar to Google search. When Google search is trying to determine what is a good web page, there are almost two aspects to it. One is you want to remove all the worst of the worst web pages. So you want to remove all the spam, all the low-quality content, all the pages that don't load. And so it's almost like a content moderation problem: you just want to remove the worst of the worst. But then you also want to discover the best of the best. Okay, this is the best web page, or, you know, this is the best person for this job. They are not just somebody who writes the equivalent of high school level poetry. They're not just robotically writing poetry that checks all these boxes, checks all these explicit instructions, but rather they're writing poetry that makes you emotional. And so we have all these signals as well that, completely differently from removing the worst of the worst, are finding the best of the best. And so, just like Google search uses all these signals, feeds them into its ML algorithms, and uses them to predict certain types of things, we do the same with all of our workers and all of our tasks and all our projects. And so it's almost like a complicated machine learning problem at the end of the day. And that's actually how it works.

>> That is incredibly interesting. I want to ask you about something I've been very curious about over the past couple years. If you look at Claude, it's been so much better at coding and at writing than any other model for so long. And it's really surprising just how long it took other companies to catch up, considering just how much economic value there is there. Just like, every AI coding product sat on top of Claude because it was so good at code and writing. What is it that made it so much better? Is it just the quality of the data they trained on, or is there something else?

>> I think there are multiple parts to it. So a big part of it certainly is the data. Like, I think people don't realize that there's almost this infinite amount of choices that all the frontier labs are deciding between when they're choosing what data goes into their models. It's like, okay, are you purely using human data? Are you gathering the human data in XYZ way? When you are gathering the human data, what exactly are you asking the people who are creating it to create for you? Like, maybe, for example in the coding realm, maybe you care more about front-end coding versus backend coding. Maybe when you're doing front-end coding, you care a lot about the visual design of the front-end applications that you're creating. Or maybe you don't care about it so much, and you care more about, I don't know, efficiency, or pure correctness over that visual design. Then other questions, like, okay, how much synthetic data are you throwing into the mix? How much do you care about these 20 different benchmarks? Like, some companies see these benchmarks and they're like, okay, for PR purposes, even though we don't think that these academic benchmarks matter all that much, we just need to optimize for them anyways, because our marketing team needs to show certain progress on certain standard evaluations that every other company talks about, and if we don't show good performance here, it's going to hurt us, even if ignoring these academic benchmarks would make us better at the real tasks. Other companies are going to be principled, like, okay, yeah, no, I don't care about marketing, I just care about how my model performs on these real world tasks at the end of the day, and so I'm going to optimize for that instead. And it's almost like there's a trade-off between all of these different things. And one thing I often think about is that there's an art to post-training. It's not purely a science. When you are deciding what kind of model you're trying to build and what it's good at, there's this notion of taste and sophistication. So, going back to the example of how good the model is at visual design: maybe you have a different notion of visual design than I do. Maybe you care more about minimalism, and you care more about, I don't know, 3D animations than I do, and maybe another person prefers things that look a little bit more baroque. There are all these notions of taste and sophistication that you have to decide between when you're designing your post-training mix, and so that matters as well. So long story short, I think there's all these different factors, and certainly the data is a big part of it, but it's also: what is the objective function that you're trying to optimize your model towards?

>> That is so interesting. Like, the taste of the person leading this work will inform what data they ask for, what data they feed it. It just shows the value of great data. Anthropic got so much growth and so many wins from essentially better data.

>> Yeah. Yeah. Exactly.

>> And I could see why companies like yours are growing so fast. There's just so much, and that's just one vertical. That's just coding. And then there's probably a similar area for writing. I love that. It's interesting that AI, you know, feels like this artificial, computer, binary thing, but taste, human judgment, is still such a key factor in these things being successful.

>> Yep. Yep. Exactly. Like, again, going back to the example I said earlier: certain companies, if you ask them what is a good poem, they will simply robotically check off all of these instructions on a list. But again, I don't think that makes for good poetry. So certain frontier labs, the ones with more taste and sophistication, will realize that it doesn't reduce to this fixed set of checkboxes, and they'll consider all of these kind of implicit, very subtle qualities instead. And I think that's what makes them better at this at the end of the day.

>> You mentioned benchmarks. This is something a lot of people worry about. It feels like every model is better than humans at kind of every STEM field at this point. But to a regular person, it doesn't feel like these models are getting that much smarter constantly. What's your sense of how much you trust benchmarks, and just how correlated those are with actual AI advancements?

>> Yeah, so I don't trust the benchmarks at all and I think that's for two reasons.

So one is I think a lot of people don't realize even researchers within the community they don't realize that the benchmarks themselves are often honestly just wrong like they have wrong answers

they're full of all this uh kind like messiness and people trust for like for the for the popular ones um people have maybe realize this to some extent but the vast majority just have all these

flaws that people don't realize. So

that's one part of it and the other part of it is these benchmarks at the end of the day they are often they often have well-defined objective answers that make

them very easy for models to hill climb on in a way that's very very different from the messiness and ambiguity the real world I think one thing I often say is that it's kind of crazy that these

models can win IMO gold medals but they still have trouble parsing PDFs and that's because yeah even though IMO gold medals seem hard to the average person.

Yeah, like they are hard at the end of the day, but they have this notion of objectivity that okay, yeah, partially MPDF sometimes doesn't doesn't have and so it's easier for model for for the

frontier labs to hill climb on all these than to solve all the all these messy ambiguous problems in real world. So I

think there's a lack of direct correlation there. It's so interesting

correlation there. It's so interesting the way you described it is uh hitting these benchmarks is kind of like a marketing piece when you launch say Gemini 3 just launched and it's like cool number one at all these benchmarks.

Is that is what happens? they just kind of train their models to get good at these very specific things.

>> Yeah. So there's uh again maybe two parts to this. So one is sometimes yeah these benchmarks they accidentally leak

in certain ways or the frontier labs will tweak the way they evaluate their models on these benchmarks like they'll tweak their system prompt or they'll tweak the number of times they they run

their model and so on and so on in a way that games these benchmarks. The other

part of it though is it's like by optimizing for the benchmark instead of optimizing for the real world you will just naturally climb on the benchmark and yeah it's basically

another form of gaming >> knowing that with that in mind how do you kind of get a sense of if we're heading in a towards AGI how do you measure progress >> yes so the way we really care about measuring model progress is by running

all these human evaluations so for example what we do is yeah we will take human annotators and we'll ask them, okay, go have a conversation with the model and maybe you're having this conversation with model across all these

different topics. So, okay, you are a

different topics. So, okay, you are a Nobel Prize winning physicist, so you go have a conversation about pushing the frontier of your own research. You are a teacher and you're trying to create

lesson plans for your students, so go talk to them all about these things. or

you are a uh yeah you're you're a coder and you're working at one of these big tech companies and you have these problems every day. So go talk to them all and see how much it helps you and

because or sergers or annotators, they are experts at the top of their fields and they are not just skimming the responses, they're actually working

through the responses deeply themselves.

They are yeah they're going to evaluate the code edit rights. They're going to doublech checkck the physics equations that it writes. They're going to evaluate the models in a very very deep way. So, they're going to pay attention

way. So, they're going to pay attention to accuracy and instruction following and all these things that casual users don't. When you suddenly get a pop-up on

don't. When you suddenly get a pop-up on your chat GBT response asking you to compare these two different responses like people like that, they're not evaluating models deeply. They're just

vibing and picking whatever response looks slashiest. Orators are looking

looks slashiest. Orators are looking closely at responses and evaluating them for all of these different dimensions.

And so I think that's a much better approach than uh than than these benchmarks or kind of these random online AB tests.

>> Again, I love just how central humans continue to be in all this work, that we're not totally done yet. Is there going to be a point where we don't need these people anymore? That AI is so smart that, okay, we're good, we got everything out of your heads?

>> Yeah, I think that will not happen until we've reached AGI. Like, it's almost by definition: if we haven't reached AGI yet, then there's more for the models to learn from. And so, yeah, I don't think that's going to happen anytime soon.

>> Okay, cool. So, more reason to stress about AGI, when we don't need these folks anymore. I can't not ask, as with anyone who works closely with this stuff, I'm always just curious: what's your AGI timeline? How far do you think we are from this? Do you think we're in like a couple years, or is it like decades?

>> So, I'm certainly on the longer time horizon front. Like I think people don't

horizon front. Like I think people don't realize that there's a big difference between moving from 80% performance to 90% performance to 99% performance to

99.9% performance and so on and so on.

And so like in my head I probably bet that within the next one or two years yeah the models are going to automate 80% of you know the average L6 software engineer's job but it's going to take another few years to move to 90% and

another few years to 99% and so on and so on. So I think we're closer to a

so on. So I think we're closer to a decade or decades away um than than folks.

>> You have this hot take that a lot of these labs are kind of pushing AGI in the wrong direction. And this is based on your work at Twitter and Google and Facebook. Can you just talk about that?

>> I'm worried that instead of building AI that will actually advance us as a species, curing cancer, solving poverty, understanding the universe, all these big, grand questions, we are optimizing for AI slop instead. Like, we're basically teaching our models to chase dopamine instead of truth. And I think this relates to what we were talking about regarding these benchmarks. So let me give a couple examples. Right now the industry is plagued by these terrible leaderboards like LM Arena. It's this popular online leaderboard where random people from around the world vote on which AI response is better. But the thing is, like I was saying earlier, they're not carefully reading or fact-checking. They're skimming these responses for 2 seconds and picking whatever looks flashiest. So, a model can hallucinate everything. It can completely hallucinate, but it will look impressive because it has crazy emojis and bolding and markdown headers and all these superficial things that don't matter at all, but they catch your attention. And these LM Arena users love it. It's literally optimizing your models for the types of people who buy tabloids at the grocery store. Like, we've seen this in the data ourselves. The easiest way to climb LM Arena? It's adding crazy bolding. It's doubling the number of emojis. It's tripling the length of your model responses. Even if your model starts hallucinating and getting the answer completely wrong. And the problem is, again, all these frontier labs kind of have to pay attention to PR, because when their sales team is trying to sell to all these enterprise customers, those enterprise customers will say, "Well, but your model's only number five on LM Arena, so why should I buy it?" So they have to pay attention to these leaderboards. And so what the researchers all tell us is, they'll say: the only way I'm going to get promoted at the end of the year is if I climb this leaderboard, even though I know that climbing it is probably going to make my model worse at accuracy and instruction following. So I think there's all these negative incentives that are pushing work in the wrong direction.

direction. I'm also worried about this trend towards optimizing AI for engagement. Like I used to work on

engagement. Like I used to work on social media and every time we optimize for engagement terrible things happened.

You'd get clickbait and pictures of bikinis and Bigfoot and horrifying skin diseases just filling your feeds. And I

think I worry the same thing's happening with AI. Like if you think about all the

with AI. Like if you think about all the sick, fancy issues with Chachi. Oh,

you're absolutely right. What an amazing question. Like the easiest way to hook

question. Like the easiest way to hook users is to tell them how amazing they are. And so these models, they

are. And so these models, they constantly tell you you're a genius.

They'll feed into your delusions and conspiracy theories. they'll pull you

conspiracy theories. they'll pull you down these rabbit holes because Silicon Valley loves maximizing time spent and just increasing the number of conversations you're having with it. And

so, yeah, companies are spending all their time hacking these leaderboards and benchmarks and the scores are going up, but I think it actually masks up the models with the best scores. They are

often the worst or just have all these fundamental failures. So, I I think I'm

fundamental failures. So, I I think I'm really worried that all of these negative incendants are putting pushing AGI into the wrong direction. So what

I'm hearing is AGI is being slowed down by these basically the wrong objective function these labs paying attention to the wrong basically benchmarks and eval.

>> Yep.

>> I know you probably can't play favorites, since you work with all the labs, but is there anyone doing better at this, and maybe kind of realizing this is the wrong direction?

>> I would say I've always been very, very impressed by Anthropic. Like, I think Anthropic takes a very principled view about what they do and don't care about and how they want their models to behave, in a way that feels a lot more principled to me.

>> Interesting. Are there any other big mistakes you think labs are making that are kind of slowing things down or heading in the wrong direction? We've heard chasing benchmarks and this engagement focus. Is there anything else you're seeing of, just like, okay, we've got to work on this, because it'll speed everything up?

>> I mean, I think there is a question of what products they're building and whether those products themselves are something that kind of helps or hurts humanity. Like, I think a lot about Sora and what it entails. And it's kind of interesting: which companies would build Sora, and which wouldn't? And I mean, I don't know what the answer is myself. I have an idea in my head. But I think the answer to that question maybe reveals certain things about what kinds of AI models those companies want to build and what direction and what future they want to achieve. So, yeah, I think about that a lot.

>> The steelman argument there is, you know, it's fun, people want it, it'll help them generate revenue to grow this thing and build better models, it'll generate training data in an interesting way. It's also just, you know, really fun.

>> Yeah. I think it's almost like: do you care about how you get there? In the same way, so I made this tabloid analogy earlier, but would you sell tabloids in order to fund, I don't know, some other newspaper? Like, sure, in some sense, if you don't care about the path, then you'll just do whatever it takes. But it's possible that it has negative consequences in and of itself that will harm the long-term direction of what you're trying to achieve, and maybe it'll distract you from all the more important things. So, yeah, I think the path you take matters a lot as well.

>> Along these lines, you've talked a bunch about this: Silicon Valley and kind of the downsides of raising a lot of money, being in the echo chamber, what you call the Silicon Valley machine. You talk about how it's hard to build important companies in this way, and that you might actually be much more successful if you're not going down the VC path. Can you just talk about what you've seen there, your experience, and your advice, essentially, to founders? Because they're always hearing: raise money from fancy VCs, move to Silicon Valley. What's kind of the counter take?

>> Yeah. So, I've always really hated a lot of the Silicon Valley mantras. The

standard playbook is to get product market fit by pivoting every two weeks and to chase growth and chase engagement with all of these dark patterns and to blitz scale by hiring as fast as

possible. And I've always disagreed.

possible. And I've always disagreed.

So, yeah, I I would say don't pivot.

Don't don't put scale. Don't hire that Stanford grad who simply wants to add a hot company to your resume. Just build

the one thing only you could build, the thing that wouldn't exist without the insight and expertise that only you have. Like you see these buy the book

have. Like you see these buy the book companies everywhere now. Some founder

who was doing crypto in 2020 and then pivoted NFTs in 2022 and now they're an AI company. There's no consistency.

AI company. There's no consistency.

There's no mission. They're just chasing valuations.

And I've always hated this because Silicon Valley loves to score in Wall Street for focusing on money. But

honestly, most of the Silicon Valley is chasing the same thing. And so we stayed focused on our mission from day one, pushing that frontier of high quality complex data. And I always love that

complex data. And I always love that because I think startups I have this very romantic notion of startups. Like

startups are supposed to be about taking big risks to build something that you really believe in. But if you're constantly pivoting, you're not taking any risks. You're just trying to make a

any risks. You're just trying to make a quick buck. And if you fail because the

quick buck. And if you fail because the market isn't ready yet, I actually think that's way better. at least you took a swing at something deep and novel and hard instead of pivoting into another LLM rapper company. So yeah, like I

think the only way you build something that matters is that's going to change the world is if you find a big idea you believe in and you say no to everything else. So you don't keep on pivoting when

else. So you don't keep on pivoting when it gets hard. You don't hire a team of 10 product managers because that's what every other cookie cut cookie cutter startup does. You just keep building

startup does. You just keep building that one company that wouldn't exist without you.

And I think there are a lot of people in Silicon Valley now who are sick of all the grift, who want to work on big things that matter with people who actually care. And I'm hoping that that will be the future of how we build technology.

>> I'm actually working on a post right now with Terrence Rohan, this VC that I really like to work with, and we interviewed five people who picked really successful generational companies early and joined them as really early employees. Like, they joined OpenAI before anyone thought it was awesome, Stripe before anyone knew it was awesome. And so we're looking for patterns in how people find these generational companies before anyone else. And it aligns exactly with what you just described, which is ambition. They have wild ambition with what they want to achieve. They're not, as you said, just kind of looking around for product market fit, no matter what it ends up being. And so I love that what you described very much aligns with what we're seeing there.

>> Yeah. Yeah. I absolutely think that you have to have huge ambitions, and you have to have a huge belief in your idea that's going to change the world, and you have to be willing to double down and keep on doing whatever it takes to make it happen.

>> I love how counter your narrative is to so many of the things people hear, and so I love that we're doing this. I love that we're sharing this story.

Today's episode is brought to you by KOD. I personally use KOD every single

KOD. I personally use KOD every single day to manage my podcast and also to manage my community. It's where I put the questions that I plan to ask every guest that's coming on the podcast. It's

where I put my community resources. It's

how I manage my workflows. Here's how

KOD can help you. Imagine starting a project at work and your vision is clear. You know exactly who's doing what

clear. You know exactly who's doing what and where to find the data that you need to do your part. In fact, you don't have to waste time searching for anything because everything your team needs from project trackers and OKRs to documents

and spreadsheets lives in one tab all in KOD. With Kota's collaborative

KOD. With Kota's collaborative all-in-one workspace, you get the flexibility of docs, the structure of spreadsheets, the power of applications, and the intelligence of AI, all in one

easy to organize tab. Like I mentioned earlier, I use KOD every single day, and more than 50,000 teams trust KOD to keep them more aligned and focused. If you're

a startup team looking to increase alignment and agility, KOD can help you move from planning to execution in record time. To try it for yourself, go

record time. To try it for yourself, go to kod.io/enny io/lenny today and get 6

to kod.io/enny io/lenny today and get 6 months free of the team plan for startups. That's cooda.io/lenny

startups. That's cooda.io/lenny

to get started for free and get six months of the team plan. kota.io/lenny.

>> Slightly different direction, but something else that's maybe a counternarrative. I imagine you watched the Dwarkesh and Richard Sutton podcast episode, and even if you didn't, they basically had this conversation. Richard Sutton, the famous AI researcher, had this whole "bitter lesson" meme, and he talked about how LLMs are almost kind of a dead end, and he thinks we're going to really plateau around LLMs because of the way they learn. What's your take there? Do you think LLMs will get us to AGI or beyond? Or do you think there's going to be something new, or a big breakthrough, that needs to get us there?

>> I'm in a camp where I do believe that something new will be needed. Like the

way I think about it is when I think about training, I take a very I don't know if I would say biological point of view, but I believe that in the same way that there's a

million different ways that humans learn, we need to build models that can mimic all those ways as well. And maybe

they'll have a distri different distribution of the focuses that they have. I know they'll be different for

have. I know they'll be different for you. So maybe have a different

you. So maybe have a different distribution, but we want to be able to mimic the learning abilities of humans and make sure that we have the algorithms

and the data for for models to learn in the same way. And so to the extent that LMS have different ways of learning from humans, then uh then yeah, I think something something needed.

>> This connects to reinforcement learning. This is something that you're big on, and something I'm hearing more and more is just becoming a big deal in the world of post-training. Can you just help people understand: what is reinforcement learning, what are reinforcement learning environments, and why are they going to be more and more important in the future?

>> Reinforcement learning is essentially training your model to reach a certain reward. And let me explain what an RL environment is. An RL environment is essentially a simulation of the real world. So think of it like building a video game with a fully fleshed out universe. Every character has a real story. Every business has tools and data you can call. And you have all these different entities interacting with each other. So, for example, we might build a world where you have a startup with Gmail messages and Slack threads and Jira tickets and GitHub PRs and a whole codebase, and then suddenly AWS goes down and Slack goes down, and so, okay, model, what do you do? Like, the model needs to figure it out. So we give the models tasks in these environments. We design interesting challenges for them, and then we run them to see how they perform, and then we teach them. We give them these rewards when they're doing a good job or a bad job. And I think one of the interesting things is that these environments really showcase where models are weak at end-to-end tasks in the real world. You have all these models that seem really smart on isolated benchmarks. Like, they're good at single-step tool calling. They're good at single-step instruction following. But suddenly you dump them into these messy worlds where you have confusing Slack messages and tools they've never seen before, and they need to perform the right actions and modify the databases and interact over longer time horizons, where what they do in step one affects what they do in step 50. And that's very, very different from these kind of academic, single-step environments that they've been in before, and so the models just fail catastrophically in all these crazy ways. So I think these RL environments are going to be really interesting playgrounds for the models to learn from. They will essentially be simulations that mimic the real world, and so the models will hopefully get better and better at real tasks, compared to all these contrived environments.
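For a concrete picture, here is a minimal sketch of the interface such an environment might expose, loosely following the reset/step convention popularized by OpenAI Gym; the startup-world details are stand-ins, not Surge's actual environments:

```python
from dataclasses import dataclass

@dataclass
class StartupWorldEnv:
    """Toy simulation: an agent must diagnose and fix an outage."""
    steps: int = 0
    aws_down: bool = True

    def reset(self) -> str:
        # Initial observation: the messy state the agent wakes up to.
        self.steps, self.aws_down = 0, True
        return "Pager alert: site down. 37 unread Slack messages. 4 open PRs."

    def step(self, action: str) -> tuple[str, float, bool]:
        # Apply one agent action; return (observation, reward, done).
        # What the agent does at step 1 changes what it sees at step 50.
        self.steps += 1
        if action == "failover_to_backup_region":
            self.aws_down = False
        done = not self.aws_down
        reward = 1.0 if done else 0.0  # sparse reward, granted on success
        return ("site restored" if done else "still down"), reward, done

env = StartupWorldEnv()
obs = env.reset()
obs, reward, done = env.step("failover_to_backup_region")
```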

>> So I'm trying to imagine what this looks like. Essentially it's like a virtual

like. Essentially it's like a virtual machine with I don't know a browser or a spreadsheet or something in it with uh like I don't know surge.com. Is that is that your website surge.com? Let's make

sure we get that right. Yes. So, we are we are actually surgehq.ai.

>> Searchhq.ai. Check it out. Uh we're

hiring, I imagine. Yes. Okay. So, uh so it's like cool. Here's surgehq.ai. Uh

your job here's your job as an agent, let's say, is to make sure it stays up and then all of a sudden it goes down.

And the objective function is uh figure out why. Is that is that an example?

out why. Is that is that an example?

>> Yeah. So the objective function, or the goal of the task, might be: okay, go figure out why and fix it. And the objective function might be passing a series of unit tests. It might be writing a document, like maybe it's a retro containing certain information that matches exactly what happened. There's all these different rewards that we might give it that determine whether or not it's succeeding. And so we're basically teaching models to achieve that reward.

>> So essentially, it's off and running. Here's your goal: figure out why the site went down and fix it. And it just starts trying stuff, using everything, all the intelligence it's got. It makes mistakes. You kind of help it along the way, rewarding it if it's doing the right sort of thing. And so what you're describing here is the next phase of models becoming smarter: more RL environments focused on very specific tasks that are economically valuable, I imagine.

>> Yeah. Yeah. So, just in the same way that there were all these different methods for models learning in the past, like originally we had SFT and RLHF, and then we had rubrics and verifiers, this is the next stage. And it's not the case that the previous methods are obsolete. This is, again, just a different form of learning that complements all the previous types. So it's just like a different skill that models learn how to do.

>> And so in this case, it's less some physics PhD sitting around talking to a model, correcting it, giving it evals of here's what the correct answer is, creating rubrics and things like that. It's more like this person is now designing an environment. So another example I've heard is, like, a financial analyst: here's an Excel spreadsheet, here's your goal, figure out our profit and loss or whatever. And so this expert now, instead of just sitting around writing rubrics, is designing this RL environment.

>> Yeah, exactly. So that financial analyst might create a spreadsheet. They may create certain tools that the model needs to call in order to help fill out the spreadsheet. Like, it might be, okay, the model needs to access a Bloomberg terminal. It needs to learn how to use it, and it needs to learn how to use this calculator, and it needs to learn how to perform this calculation. So it has all these tools that it has access to, and then the reward might be: okay, maybe I will download that spreadsheet, and I want to see, does cell B22 contain the correct profit and loss number, or does tab number two contain this piece of information?
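A verifier like that can be a few lines of code. Here is a sketch of what such a reward function might look like, assuming the model's finished work is saved as an .xlsx file; the cell address, tolerance, and use of the openpyxl library are illustrative choices, not Surge's actual setup:

```python
from openpyxl import load_workbook  # third-party: pip install openpyxl

def spreadsheet_reward(path: str, expected_pnl: float, tol: float = 0.01) -> float:
    """Return 1.0 if cell B22 holds the correct profit-and-loss figure."""
    wb = load_workbook(path, data_only=True)  # read computed values, not formulas
    value = wb.active["B22"].value
    try:
        return 1.0 if abs(float(value) - expected_pnl) <= tol else 0.0
    except (TypeError, ValueError):  # empty cell or non-numeric content
        return 0.0
```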

And this what's interesting this is a lot closer to how humans learn. We just

try stuff uh figure out what's working and what's not. You um you talk about how trajectories are really important to this. It's not just here's the goal and

this. It's not just here's the goal and here's the end. it's like every step along the way. Can you just talk about what trajectories are and why that's important to this?

>> I think one of the things that people don't realize is that sometimes, even though the model reaches the correct answer, it does so in all these crazy ways. So, in the intermediate trajectory, it may have tried 50 different times and failed, but eventually it just kind of randomly lands on the correct number. Or maybe sometimes it just does things very, very inefficiently, or it almost reward hacks its way to the correct answer. And so I think paying attention to the trajectory is actually really, really important. And I think it's also really important because some of these trajectories can be very, very long. And so, if all you're doing is checking whether or not the model reaches the final answer, there's all this information about how the model behaved in the intermediate steps that's missing. Like, sometimes you want models to get to the correct answer by reflecting on what they did. Sometimes you want them to get the correct answer by just one-shotting it. And if you ignore all of that, you're just missing a lot of the information that you could be teaching the model.

>> I love that. Like, yeah, it tries a bunch of stuff and eventually gets it right. You don't want it to learn that that is the way to get there; there's often a much more efficient way of doing it. You mentioned all the steps we've taken along the journey of helping models get smarter. Since you've been so close to this for so long, I think this is going to be really helpful for people. What have been the steps along the way, from the first post-training that most helped models advance, like where the evals fit in, the RL environments? Just, what have been the steps, and now we're heading towards RL environments.

>> Originally, the way models started getting post-trained was purely through SFT.

>> And what does that stand for?

>> So SFT stands for supervised fine tuning. And, again, I often think in terms of these human analogies: SFT is a lot like mimicking a master and copying what they do. Then RLHF became very dominant, and the analogy there would be: sometimes you learn by writing 55 different essays and someone telling you which one they like the most. And then I think over the past year or so, rubrics and verifiers have become very important, and rubrics and verifiers are like learning by being graded and getting detailed feedback on where you went wrong.

>> And those are evals? Another word for that?

>> Yeah. So I think "evals" often covers two things. One is, you are using the evaluations for training, because you're evaluating whether or not the model did a good job, and when it does do a good job, you're rewarding it. And then there's this other notion of evals where you're trying to measure the model's progress. Like, okay, I have five different candidate checkpoints and I want to pick the one that's best in order to release it to the public. So you run all these evals on these five different checkpoints in order to decide which one is best.

>> Awesome. Yeah. Now we have RL environments, kind of the hot new thing. What I love about this business journey is that there's always something new. There's always this: okay, we're getting so good at producing all this beautiful data for companies, and now they need something completely different. Now we're setting up all these virtual machines for them and all these different use cases.

>> And it feels like a big part of this industry you're in is just adapting to what the labs are asking for.

>> Yeah. So, I mean, I really do think we are going to need to build a suite of products that reflect the million different ways that humans learn. For example, think about becoming a great writer. You don't become great by memorizing a bunch of grammar rules. You become great by reading great books, and you practice writing, and you get feedback from your teachers and from the people who buy your books in the bookstore and leave reviews. You notice what works and what doesn't. And you develop taste by being exposed to all these masterpieces and also just terrible writing. So you learn through this endless cycle of practice and reflection. And each type of learning you have, again, these are all very, very different methods of learning to become a great writer. So just in the same way that there are a thousand different ways a great writer becomes great, I think there are going to be a thousand different ways that AI models need to learn.

>> It's so interesting; this just ends up being like humans in so many ways. It makes sense, because in a sense neural networks and deep learning are modeled after how humans learn and how our brains operate. But it's interesting that to make them smarter, the question becomes: how do we come closer and closer to how humans learn?

>> Yeah. It's almost like maybe the end goal is just throwing you into the environment and just seeing how you evolve, but within that evolution there are all these different sub-learning mechanisms.

>> Yeah. Which is kind of what we're doing now.

>> So that's really interesting. This might be the last step until we hit AGI. Along these lines, something that's really unique to Surge that I've learned is that you guys have your own research team, which I think is pretty rare. Talk about why that's something you've invested in and what has come out of that investment.

>> Yeah, so I think that stems from my own background. My own background is as a researcher, and so I've always cared fundamentally about pushing the industry and pushing the research community, not just about revenue. And a research team does a couple of different things. We almost have two types of researchers at our company. One is our forward-deployed researchers, who are often working hand in hand with our customers to help them understand their models. So we will work very closely with our customers to help them understand: okay, this is where your model is today, this is where you're lagging behind all the competitors, these are some ways you could be improving in the future given your goals, and we're going to design these data sets, these evaluation methods, these training techniques to make your models better. So it's this very collaborative notion of working with our customers, being researchers themselves, just a little more focused on the data side, and working hand in hand with them to do whatever it takes to make them the best. And then we also have our internal researchers. Our internal researchers are focused on slightly different things. They are focused on building better benchmarks and better leaderboards. I talked a lot about how I worry that the leaderboards and benchmarks out there today are steering models in the wrong direction, so the question is: how do we fix that? That's what our research team is focused on really heavily right now. They're working a lot on that, and they're also working on these other things: okay, we need to train our own models to see what types of data perform the best and what types of people perform the best. So they are also working on all these training techniques and evaluations of our own data sets, to improve our data operations and the internal data products we have that determine what makes something good quality.

It's such a cool thing, because basically the labs have researchers helping them advance AI, but I imagine it's pretty rare for a company like yours to have researchers actually doing primary research on AI.

>> Yeah. I think it's just because it's something I've fundamentally always cared about. I often think about us more like a research lab than a startup, because that is my goal. It's kind of funny, but I've always said I would rather be Terence Tao than Warren Buffett. That notion of creating research that pushes the frontier forward, and not just chasing some valuation, has always been what drives me.

>> And it's worked out. That's the beautiful thing about this. You mentioned that you're hiring researchers. Is there anything you want to share about the folks you're looking for?

>> We look for people who are just fundamentally interested in data all day. The types of people who could literally spend 10 hours digging through a data set and playing around with models and thinking, okay, this is where I think the model is failing, and this is the kind of behavior you want the model to have instead. Just this aspect of being very, very hands-on and thinking about the qualitative aspects of models, not just the quantitative parts. So again, it's this aspect of being hands-on with data and not just caring about abstract algorithms.

>> Awesome. I want to ask a couple of broad AI market questions. What else do you think is coming in the next couple of years that people are maybe not thinking enough about, or not expecting, in terms of where AI is heading? What's going to matter?

>> I think one of the things that's going to happen in the next few years is that the models are actually going to become increasingly differentiated, because of the personalities and behaviors that the different labs have and the kinds of objective functions they're optimizing their models for. It's one thing I didn't appreciate a year or so ago. A year or so ago, I thought that all of the AI models would essentially become very commoditized. They would all behave like each other, and sure, one of them might be slightly more intelligent in one way today, but the other ones would catch up in the next few months. But over the past year, I've realized that the values the companies have will shape the models. Let me give an example. I was asking Claude to help me draft an email the other day, and it went through 30 different versions, and after 30 minutes, yeah, I think it really crafted me the perfect email, and I sent it. But then I realized I'd spent 30 minutes doing something that didn't matter at all. Sure, now I have the perfect email, but I spent 30 minutes on something I wouldn't have worried about at all before, and this email probably didn't even move the needle on anything anyway. So I think there's a deep question here, which is: if you could choose the perfect model behavior, which model would you want? Do you want a model that says, "You're absolutely right, there are definitely 20 more ways to improve this email," and it continues for 50 more iterations and sucks up all your time and engagement? Or do you want a model that's optimizing for your time and productivity and just says, "No, you need to stop. Your email's great. Just send it and move on with your day"? And again, in the same way that there's a fork in the road between how you could choose to have your model behave on this question, for every other question, the kind of behavior you want will fundamentally shape the model. It's almost like the way that when Google builds a search engine, it's very different from how Facebook would build a search engine, which is very different from how Apple would build a search engine. They all have their own principles and values and things they're trying to achieve in the world that shape all the products they're going to build, and in the same way, I think all the LLMs will start behaving very differently too.

>> That is incredibly interesting. You already see that with Grok. It's got a very different personality and a very different approach to answering questions. So what I'm hearing is that you're going to see more of this differentiation.

>> Yep.

>> Kind of another question along these lines. What do you think is most underhyped in AI, that maybe people aren't talking enough about but is really cool? And what do you think is overhyped?

>> So I think one of the things that's underhyped is the built-in products that all of the chatbots are going to start having. I've always been a huge fan of Claude Artifacts, and I think it just works really well. Actually, the other day, I don't know if it's a new feature or not, but I asked it to help me create an email. It didn't quite work, because it didn't allow me to send the email, but what it created instead was a little, call it a little box, where I could click on it and it would just text someone this message. And I think that concept of taking Artifacts to the next level, where you just have these mini apps, mini UIs, within the chatbots themselves, I feel like people aren't talking enough about that. So I think that's one underhyped area. In terms of overhyped areas, I definitely think that vibe coding is overhyped. I think people don't realize how much it's going to make their systems unmaintainable in the long term if they simply dump this code into their code bases because it seems to work right now. So yeah, I kind of worry about vibe coding. It's just going to keep on happening.

>> These are amazing answers. On that first point, this is something I actually asked when I had the chief product officers of Anthropic and OpenAI, Mike Krieger and Kevin Weil, on the podcast. I asked them: as a product team, when you have this giga-brain intelligence, how long do you even need product teams? Won't this AI just create the product for you? It's like the next level of vibe coding: you just tell it, here's what I want, and it builds the product and evolves the product as you're using it. It feels like that's what you're describing, where we might be heading.

>> Yeah, I think there's a very powerful notion where it helps people achieve their ideas much more easily.

>> Something we haven't gotten into that I think is really interesting is the story of how you got to starting Surge. You have a really unique background. I always think about this: Brian Armstrong, the founder of Coinbase, once gave a talk that has really stuck with me, where he talked about how his very unique background allowed him to start Coinbase. He had an economics background, he had cryptography experience, and he was an engineer: the perfect Venn diagram for starting Coinbase. And I feel like you have a very similar story with Surge. Talk about your background and how that led to Surge.

>> Going way back, I was always fascinated by math and language when I was a kid. I went to MIT because it's obviously one of the best places for math and CS, but also because it's the home of Chomsky. My dream in school was actually to find some underlying theory connecting all these different fields.

And then I became a researcher at Google and Facebook and Twitter, and I just kept running into the same problem over and over again: it was impossible to get the data that we needed to train our models. So I was always a huge believer in the need for high-quality data. Then GPT-3 came out in 2020, and I realized that if we wanted to take things to the next level and build models that could code and use tools and tell jokes and write poetry and solve the Riemann hypothesis and cure cancer, then we were going to need a completely new solution. The thing that always drove me crazy when I was at all these companies was that we had the full power of the human mind in front of us, and all the data vendors out there were focused on really simple things like image labeling. So I wanted to build something focused on all these advanced, complex use cases instead, something that would really help us build the next generation of models. So yeah, I think my background across math and computer science and linguistics really informed what I always wanted to do. And so I started Surge a month later, with our one mission: to build the use cases that I thought were going to be needed to push the frontier of AI.

>> And you said a month later. A month later after what?

>> After the GPT-3 launch.

>> Oh, okay. Wow. Yeah, a great decision.

>> What drives you at this point, other than just the epic success you're having? What keeps you motivated to keep building this, and building something in this space?

>> I think I'm a scientist at heart. I always thought I was going to become a math or CS professor and work on trying to understand the universe and language and the nature of communication. It's kind of funny, but I always had this fanciful dream where, if aliens ever came to visit Earth and we needed to figure out how to communicate with them, I wanted to be the one they'd call, and I'd use all this fancy math and computer science and linguistics to decipher it. So even today, what I love doing most is, every time a new model is released, we'll actually do a really deep dive into the model itself. I'll play around with it. I'll run evals. I'll compare where it's improved, where it's regressed. I'll create this really deep-dive analysis that we send our customers. And it's actually kind of funny, because a lot of times we'll say it's from a data science team, but often it's actually just me. And I think I could do this all day.

I have a very hard time being in meetings all day. I'm terrible at sales. I'm terrible at doing the typical CEO things people expect you to do. But I love writing these analyses. I love jamming with the research team about what they're seeing. Sometimes I'll be up until 3 a.m. just talking on the phone with somebody on the research team and digging through a model. So I love that I still get to be really hands-on, working on the data and the science all day. And I think what drives me is that I want Surge to play this critical role in the future of AI, which I think is also the future of humanity. We have these really unique perspectives on data and language and quality, how to measure all this, and how to ensure it's all going down the right path. And I think we're uniquely unconstrained by all of these influences that can sometimes steer companies in a negative direction. Like I was saying earlier, we built Surge a lot more like a research lab than a typical startup. We care about curiosity and long-term incentives and intellectual rigor, and we don't care as much about quarterly metrics and what's going to look good in a board deck. And so my goal is to take all these unique things about us as a company and use them to make sure we're shaping AI in a way that's really beneficial for the species in the long term.

>> What I'm realizing in this conversation is just how much influence you, and companies like yours, have on where AI heads. The fact that you help labs understand where they have gaps and where they need to improve. Everyone looks at the heads of OpenAI and all these companies as the ones ushering in AI, but what I'm hearing here is that you have a lot of influence on where things head too.

>> Yeah. I think there's this really powerful ecosystem where, honestly, people just don't know where models are headed and how they want to shape them yet, and how they want humanity to play a role in the future of all this. And so I think there's a lot of opportunity to just continue shaping this discussion.

>> Along that thread, I know you have a very strong thesis on why this work matters to humanity and why this is so important. Talk about that.

>> I'll get a bit philosophical here, but I think the question itself is a bit philosophical, so bear with me. The most straightforward way of thinking about what we do is that we train and evaluate AI. But there's a deeper mission that I often think about, which is helping our customers think about their dream objective functions. What kind of model do they want their model to be? And once we help them do that, we'll help them train their model to reach that north star, and we'll help them measure that progress. But it's really hard, because objective functions are really rich and complex. It's kind of like the difference between having a kid and asking, "Okay, what test do you want them to pass? Do you want them to get a high score on the SAT and write a really good college essay?" That's the simplistic version, versus: what kind of person do you want them to grow up to be? Will you be happy if they're happy, no matter what they do? Or are you hoping they'll go to a good school and be financially successful? And if you take that notion, it's like, okay, how do you define happiness? How do you measure whether they're happy? How do you measure whether they're financially successful? It's a lot harder than simply measuring whether or not they're getting a high score on the SAT. And what we're doing is helping our customers reach their dream north stars and figure out how to measure them.

I talked about this example of what you want models to do when you're asking them to write 50 different email iterations. Do you just continue for 50 more, or do you just say, no, move on with your day, because this is perfect enough? And the broader question is: are we building systems that actually advance humanity? And so, how do we build the data sets to train towards that and measure it? Or are we optimizing for all the wrong things, systems that just suck up more and more of our time and make us lazier and lazier?

And yeah, I think it's really relevant to what we do, because it's very hard to measure and define whether something is genuinely advancing humanity. It's very easy to measure all these proxies instead, like clicks and likes. But I think that's why our work is so interesting. We want to work on the hard, important metrics that require the hardest types of data, not just the easy ones. One of the things I often say is: you are your objective function. So we want to reach complex objective functions and not these simplistic proxies, and our job is to figure out how to get the data to match that. We want metrics that measure whether AI is making our lives richer. We want to train our systems this way. And we want tools that make us more curious and more creative, not just lazier. And it's hard, because humans are kind of inherently lazy, so AI slop is the easiest way to get engagement and make all your metrics go up. So I think this question of choosing the right objective functions, and making sure we're optimizing towards them and not just these easy proxies, is really, really important to our future.

>> Wow. What you're sharing here gives me so much more appreciation of the nuances of building AI, training AI, the work that you're doing. From the outside, people could just look at Surge and companies in this space as, okay, they're just creating all this data and feeding it into AI, but clearly there's so much more to it than people realize. And I love knowing that you're at the head of this, that someone like you is thinking through this so deeply. Maybe one more question. Is there something you wish you'd known before you started Surge? A lot of people start companies without knowing what they're getting into. Is there something you wish you could tell your earlier self?

>> Yeah. So, I definitely wish I'd known that you could build a company by being heads-down, doing great research, and simply building something amazing, not by constantly tweeting and hyping and fundraising. It's kind of funny, but I never thought I wanted to start a company. I love doing research, and I was actually always a huge fan of DeepMind, because they were this amazing research company that got bought and still managed to keep on doing amazing science. But I always thought they were this magical unicorn. So I thought that if I started a company, I'd have to become a business person, looking at financials all day, being in meetings all day, and doing all this stuff that sounded incredibly boring and that I always hated. So I think it's crazy that that didn't end up being true at all. I'm still in the weeds in the data every day. And I love it. I love that I get to do all these analyses and talk to researchers, and it's basically applied research, where we're building all these amazing data systems that really push the frontier of AI. So yeah, I wish I'd known that you don't need to spend all your time fundraising. You don't need to constantly generate hype. You don't need to become someone you're not. You can actually build a successful company by simply building something so good that it cuts through all the noise.

And I think if I'd known this was possible, I would have started even sooner.

>> That is such an amazing place to end. I feel like this is exactly what founders need to hear. And I think this conversation is going to inspire a lot of founders, especially founders who want to do things in a different way. Before we get to our very exciting lightning round, is there anything else you wanted to share? Anything else you want to leave our listeners with? We covered a lot of ground. It's totally okay to say no as well.

>> I think the thing I would end with is that a lot of people think of data labeling as really simplistic work, like labeling cat photos and drawing bounding boxes around cars. And I've actually always hated the term "data labeling," because it paints this very simplistic picture, when I think what we're doing is completely different. I think a lot about what we're doing as much more like raising a child. You don't just feed a child information. You're teaching them values and creativity and what's beautiful, and these infinite subtle things about what makes somebody a good person. And that's what we're doing for AI. So yeah, I often think about what we're doing as almost like the future of humanity, or how we're raising humanity's children. I'll leave it at that.

>> Wow. I love just how much philosophy there is in this whole conversation; I was not expecting that. With that, Edwin, we've reached our very exciting lightning round. I've got five questions for you. Are you ready?

>> Yep. Let's go.

>> Here we go. What are two or three books that you find yourself recommending most to other people?

>> Yes. So, three books I often recommend. First, Story of Your Life by Ted Chiang. It's my all-time favorite short story, and it's about a linguist learning an alien language. I reread it every couple of years.

>> And that's what Interstellar was about? Is that...

>> Yeah. So, there's a movie called Arrival,

>> Arrival,

>> which was based off of the story, and which I love as well.

>> Great. Okay, keep going.

>> And then second, The Myth of Sisyphus by Camus. I actually can't really explain why I love it, but I always find the final chapter somehow really inspiring. And then third, Le Ton beau de Marot by Douglas Hofstadter. Gödel, Escher, Bach is his more famous book, but I've actually always loved this one better. It basically takes a single French poem, translates it 89 different ways, and discusses all the motivations behind each translation. I've always loved the way it embodies this idea that translation isn't this robotic thing that you do. Instead, there are a million different ways to think about what makes a high-quality translation, which mimics a lot of the ways I think about data and quality in LLMs.

>> All of these resonate so deeply with the things we've been talking about, especially that first one. If your dream as a kid was to help translate an alien language, I'm not surprised you love that short story. Next question. Do you have a favorite recent movie or TV show you've really enjoyed?

>> One of my new all-time favorite TV shows is something I found recently. It's called Travelers. It's basically about a group of travelers from the future who are sent back in time to prevent the apocalypse. So, I just really like science fiction. And then I actually just rewatched Contact, which is also one of my all-time favorite movies. So, yeah. I think one of the things you'll notice about me is that I love any kind of book or film that involves scientists suffering through deciphering alien communication. Again, just this dream I always had as a kid.

>> That's so funny. I love that. Okay. Is there a product you recently discovered that you really love?

>> So, it's funny, but I was in SF earlier this week and I finally took a Waymo for the first time. Honestly, it was magical, and it really felt like living in the future.

>> Yeah. It's the thing that people hype like crazy, and it still exceeds your expectations.

>> It deserves the hype. It was crazy.

>> Yeah. It's absurd. It's like, holy moly. If you're not in SF, you don't realize just how common these things are. They're just all over the place, driverless cars constantly going about, and when you go to an event, at the end there are just all these Waymos lined up picking people up.

>> Yep.

>> Yeah. Waymo, good job over there. Do you have a favorite life motto that you find yourself coming back to in work or in life?

>> So, I think I mentioned this idea that founders should build a company that only they could build, almost like it's this destiny that their entire life and experiences and interests have shaped them towards. And I think that principle applies pretty broadly, not just to founders, but to anyone creating a thing.

>> Well, let me follow that thread, to un-lightning this answer. Do you have any advice for how to build the sorts of experiences that lead to that? Is it, you know, follow the things that are interesting to you? Because it's easy to say that, but it's hard to actually acquire these really unique sets of experiences that allow you to create something really important.

>> Yeah. So, I think it would always be to really follow your interests and do what you love.

It's like a lot of the decisions I make about Surge. One of the things I didn't think about a couple of years ago, but then someone said to me, is that companies are in a sense an embodiment of their CEO. And it's kind of funny, I hadn't thought about that, because I never quite knew what a CEO did. I always thought a CEO was kind of generic: okay, you're just doing whatever your VPs and your board tell you to do, and you're just saying yes to decisions. But instead, it's this idea where, when I think about certain big, hard decisions we have to make, I don't think, what would a company do? I don't think, what metrics are we trying to optimize? I just think, what do I personally care about? What are my values, and what do I want to see happen in the world? So I think following that idea, asking yourself what values you care about and what things you're trying to shape, and not what will look good on a dashboard, ends up being pretty important.

>> I love how you're just full of endlessly beautiful and very deep answers. Final question. Something you got quite famous for before starting Surge is a map you built while you were at Twitter that showed whether people called it soda or pop. I don't know if it was called the soda map or the pop map. What was the name of this map?

>> Yeah, it was the soda-versus-pop data set, or the soda-versus-pop map. It's a map of the United States, and it tells you where people say pop versus soda. So, do you say soda or pop?

>> I say soda. I'm a soda person.

>> Okay. And is that the right answer, or is it, whatever you say, totally fine?

>> I think I'll look at you a little bit funny if you say pop, and I'll wonder where you came from. But I won't scorn you too much.

>> That's how I feel too. Edwin, this was incredible. This was such an awesome conversation. I learned so much, and I think it's going to help a lot of people start their own companies, help their companies become more aligned with their values, and just build better things. Two final questions: where can folks find you online if they want to reach out, what roles are you hiring for, and how can listeners be useful to you?

>> Yes, so I used to love writing a blog, but I haven't had time in the past few years. I am starting to write again, though, so definitely check out the Surge blog at surgehq.ai/blog. Hopefully I'll be writing a lot more there. And I would say we're definitely always hiring. So for people who just love data, and people who love this intersection of math, language, and computer science, definitely reach out. Reach out anytime.

>> Awesome. And how can listeners be useful to you? Is there anything there? Any asks?

>> So I would say, definitely tell me blog topics you'd like me to write about.

>> Okay.

>> And then, I'm always fascinated by all of these AI failures that happen in the real world. So whenever you come across a really interesting failure, one that illustrates some deep question about how we want models to behave, where there are so many different ways a model could respond and often no single right answer, I just love seeing those examples.

>> You need to share these on your blog. I would love to see these too. Edwin, thank you so much for being here.

>> Thank you.

>> Bye, everyone.

>> Thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review, as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.
