
Fireside Chat on AI Agents — Michele Catasta & Shunyu Yao

By The Curve

Summary

## Key takeaways

- **Agents: LLM + Memory + Tools + While Loop**: An agent equals an LLM plus memory plus tools plus a while loop. Memory retains the trajectory history, tools enable external interaction such as API calls or code execution, and the loop iterates because agents rarely succeed on the first try. [00:45], [01:54]
- **Three Backgrounds Shape Agent Definitions**: Definitions of "agent" vary by background: software engineering sees LLMs as probabilistic control flow (as in LangChain); the language-modeling tradition adds tools, reasoning, and memory empirically; AI history emphasizes environment interaction, internal/external memory, action space, and planning. [02:43], [06:25]
- **ReAct Adds Thinking as Internal Action**: ReAct treats thinking as a special internal action that changes the agent's context without affecting the external environment, like a comment in code. This mirrors human-like internal state changes and motivates a symmetry between internal and external environments. [11:09], [11:39]
- **Internal Environment Powers LM Agents**: Unlike traditional agents focused on external environments like Atari or Go, LM agents have powerful internal environments where the agent autonomously decides what to write to memory. This human-like capacity for an infinite thinking space differentiates them, as shown by o1. [06:39], [08:49]
- **Human-in-Loop Beats Full Autonomy**: Fully autonomous agents are a pipe dream today; keeping humans in the loop, as in L2/L3 autonomous driving, avoids compounding errors over long trajectories and provides better UX than systems running endlessly without progress. [14:47], [15:39]
- **Agents Teach Coding via Reasoning Traces**: Users learn coding by reading agents' step-by-step reasoning traces, especially in debugging loops where the agent explains errors and fixes. This reverse imitation learning is more powerful than textbooks, like pair programming with a superior programmer. [20:19], [22:51]

Topics Covered

  • Agents Need Internal Environments
  • Thinking is Infinite Action Space
  • ReAct Adds Thinking to Agent Loops
  • Human-in-Loop Beats Full Autonomy
  • Agents Teach Coding via Traces

Full Transcript

All right, why don't we start? I can give a couple of definitions of what people believe an agent is, and then we can start riffing on top of that. I'm going to go by memory, so pardon me if I forget something. The first one I'm going to use is from Harrison Chase, the founder of LangChain. He says AI agents are software systems where large language models impact the control flow of the application. Until agents came along, you basically had deterministic code most of the time, unless you had crazy bugs. Now, when you build an agent, you have an LLM, which is fundamentally probabilistic, and that LLM can decide what the next action the application takes is going to be.

Another definition that I like, which I think was floating around on X and I'm borrowing here, is: agent equals LLM plus memory plus tools plus a while loop — and I hope I didn't forget anything in that equation. I like both of them because they describe two different main aspects of the problem. On one hand, you want to think of an agent as a non-deterministic piece of software. On the other hand, a lot of people still think an agent is purely making calls to an LLM and figuring out how to make progress; in reality it's quite a lot more than that. You need memory, which is the idea that for every single action the agent takes, you don't discard everything that happened before in the trajectory. You need tools, because you want your agent to interact with the external world — and the external world could be anything: you could be calling an API, or running code and then checking whether there are errors in it. Tools are basically the way the piece of intelligence at the core of the agent, the LLM, is able to interact with the external world. And last but not least, you want a while loop, or some way to iterate, because the unfortunate truth of building agents is that they never get things right on the first shot, at least the vast majority of the time. Luckily, LLMs — and the more advanced ways in which we're building agents — allow you to make some level of progress at each step, so you want to keep attempting until the outcome is successful.
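To make that equation concrete, here is a minimal sketch of the loop in code — `call_llm` and the two tools are hypothetical stubs for illustration, not Replit's implementation or any real API:

```python
# A minimal sketch of "agent = LLM + memory + tools + while loop".
# `call_llm` and both tool bodies are hypothetical stand-ins, not a real API.

def call_llm(context: str) -> dict:
    """Ask the model for the next step: a tool request or a final answer."""
    ...  # e.g. a chat-completion call returning {"tool": ..., "args": ...} or {"answer": ...}

TOOLS = {
    "run_code": lambda args: "stdout/stderr of the run",       # act on the external world
    "call_api": lambda args: "response from an external API",
}

def run_agent(task: str, max_steps: int = 20) -> str:
    memory = [f"Task: {task}"]              # memory: the whole trajectory so far
    for _ in range(max_steps):              # the while loop: rarely right on the first shot
        step = call_llm("\n".join(memory))  # probabilistic control flow: the LLM picks the next action
        if "answer" in step:
            return step["answer"]           # successful outcome reached, stop iterating
        observation = TOOLS[step["tool"]](step["args"])
        memory.append(f"Action: {step}")    # nothing earlier in the trajectory is discarded
        memory.append(f"Observation: {observation}")
    return "gave up after max_steps"
```

Every element of the equation shows up once: the LLM decides the control flow, the memory list keeps the full trajectory, the tools touch the outside world, and the loop keeps attempting until the outcome is successful.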

So speaking of while loops, I think we've got the best person here today to talk about them, because you worked on ReAct and on a lot of different architectures, and probably ReAct is the—

I didn't realize the while loop was associated with it; I just kind of got it.

There you go.

I actually kind of agree with both of your definitions. Different people definitely have different definitions, and that's because people come from different backgrounds. I can tell there are at least three very relevant backgrounds for how to define AI agents. One background is software engineering. From the perspective of software engineering — and Harrison was a software engineer, so it's very natural for him to have that definition — an agent is just another codebase, except you have calls to an LLM, which is a very special new kind of function in the codebase, and then you have to deal with all the problems associated with that function. I think that's a very reasonable view.

Another view comes from NLP, or you could say from models: the line of tradition that runs from RNNs to seq2seq to GPT-1 and GPT-2. In that tradition, the first thing that's agent-like is tool use — think Toolformer, that kind of stuff. You have some external tool that can give you external feedback, whether it's a translator or a calculator. That's the first step, and then you add reasoning, and you add memory, and all those pieces start to group together into this big LM-based system where the LM is the core and all those different components are supposed to augment it. That's a very language-model-oriented view; it's also very empirical and piecewise.

Personally, I like the third view most, which comes from the history of AI agents, because that's the one with the longest history. From the 1950s, people started to think about AI agents as human-like intelligence, and we had rule-based systems, and then all those different kinds of agents. I think a lot of things become simpler if we view this wave of LM agents through the history of all the AI agents that came before.

In that view, the first important thing is the environment: every agent has an environment to interact with. The agent gives actions to the environment, and the environment gives feedback back to the agent. That's the most canonical view of an agent. If you're a coding agent, your environment is the IDE.

I think the second biggest concept is memory. Like you said, because we have the context window, the weights of the LM, and all those information-storing devices, you almost have an internal environment — and that's very different from traditional agents. Once you have those external environments and internal environments, you have the action space, which includes both external actions, like tool calls, and internal actions. I call reasoning an internal action: you can imagine reasoning as having a working memory in your head and trying to append some information into that memory — that's interacting with the internal environment.

I think I forgot reasoning in that equation before.

No, I think there were four elements, so reasoning was one of them.

Spot on.

Once you define the action space, you just need to define what is called planning, or decision-making, which is how you choose an action out of the action space. This is a very abstract view, so maybe it's not as relevant to practice, but I think it's the more beautiful way to think about this: you have a symmetry between the external environment and the internal environment, and that was really the motivation of the ReAct paper when it came out.

You're right — it seems like we've been working on this for 70 years; now we just have a better piece of intelligence to put in the system.

I think through that history we have always been focusing on external environments — whether we were solving Atari, or Go. The internal environment part was kind of omitted, because the internal environment was usually just a neural network, or a piece of code; it's really not that complex, and you don't have a lot of autonomy over it. Only in this LM agent era can the agent actually decide a lot about what it wants to write into its own memory, what it can do to change itself — and do so autonomously. That makes the whole thing much more interesting.

I see — so your point is that the internal environment is becoming much more powerful than it was before.

Yeah. It's kind of like the difference between a little rat and a human. The rat is just operating by instinct; it's hard to argue it has an internal environment, because the degree of learning — the degree of what it can write down to its memory — is limited compared to ours. But for us it's very interesting, because we can think, and thinking is a very special type of action: you can think about anything, and it doesn't change the external environment — it changes your internal environment. If you think about something that makes you angry, it changes your internal state, and that might change a lot of things downstream. I think that's something very human-like, in some sense. And we have System 2 thinking, which previous models didn't have almost at all.

So speaking of that — given that you're at OpenAI, it seems like o1 is probably the first attempt at bringing that into modern LLMs.

Yeah, I think o1 kind of shows my point: thinking can be really long. I think there are two differences between thinking and other actions. For humans, you can jump, you can think, you can punch someone — all sorts of different actions — but what's really special about thinking is twofold. First, it doesn't change the external world immediately; it just changes your internal state. It doesn't change anything besides yourself. Second, you can think about anything — it's a huge space. You can think about a word, a paragraph, a whole passage; you can think longer to get more decision-making power, more capacity to decide things. I think o1 shows that point: the space to think is essentially infinite, and thinking can be arbitrarily long. That's very different from playing Go, for example, where you only have 361 possible moves; for thinking, it's combinatorial — you can think about infinitely many things. And through this kind of long thinking, you get to leverage the power of that infinite space.

That's very true. Sorry, I'm making the conversation more philosophical.

No, no, I enjoy it. Do we want to go back for a second and explain how ReAct maps to that while loop I was talking about, just so everyone knows the fundamentals of building agents? Then we can take it from there.

Yeah. The way I think about this is that for all the traditional agents people have been building for decades, the basic paradigm is this: you have an environment, and you have an agent. The agent issues an action to the environment, and the environment gives an observation back. Essentially, the agent maintains a context, which is all the previous observations and actions, and given this context, the job of the agent is to predict the next action to give to the environment. If you're playing Go, the context is all the turns — how the board looked over time and who took which pieces. All of that information is the context, and based on it you try to predict the next action, which is where you should put the next stone.

If you think about the implementation, it just looks like a trajectory: observation, action, observation, action — and you're essentially conditioning on all the previous steps to predict the next action. Then you realize that's very similar to what a language model is doing, because you have this ever-appending context window, and you're conditioning on all the context to predict the next token, or the next action. What ReAct adds is simply the capacity to think, which is a very special type of action. It's almost like a comment in code: it doesn't change what the code does; it just changes how you'll write what comes next. It just changes the context.
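As a rough sketch of that loop in code — following the pattern described here, with hypothetical `llm` and `env` interfaces rather than the paper's actual implementation:

```python
# ReAct-style agent loop (an illustrative sketch of the pattern, not the paper's code).
# `llm` and `env` are hypothetical: llm(prompt) -> str, env.execute(action) -> str.

def parse_step(text: str) -> tuple[str, str]:
    """Split a model output of the form 'Thought: ... / Action: ...' in two."""
    thought, _, action = text.partition("\nAction:")
    return thought.removeprefix("Thought:").strip(), action.strip()

def react_loop(question: str, llm, env, max_steps: int = 10) -> str:
    context = f"Question: {question}\n"
    for _ in range(max_steps):
        thought, action = parse_step(llm(context))
        # The thought is an internal action: like a comment in code, it changes
        # nothing in `env`; it only changes `context`, the internal environment.
        context += f"Thought: {thought}\nAction: {action}\n"
        if action.startswith("finish"):
            return action                           # terminal action ends the episode
        observation = env.execute(action)           # external action -> environment feedback
        context += f"Observation: {observation}\n"  # next step conditions on everything so far
    return "no answer within budget"
```

The asymmetry is visible in the code: external actions round-trip through `env`, while thoughts only ever append to `context`.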

Maybe we should also debunk something: there's a lot of hype around coding agents, and I'm part of the problem in a sense, because that's what I've been building these last few months at Replit.

It's courageous of you to say that.

At least I admit my guilt. And you also worked on SWE-bench, and on a lot of the building blocks we're using today to build coding agents. But the truth is, I wanted to start from the more generic definition because coding agents are by far not the only type of agent that can be built. The reason we build them is that LLMs are exceptionally good at writing and editing code — more so, I would say, than at their other capabilities — and at the same time, a powerful coding agent is a kind of Holy Grail on the way to some level of intelligence. But that doesn't mean it's the only application area for agents. There are different tools you can give an LLM access to; it doesn't have to be running your code. It could be calling an external API, or taking some specific action inside your software. I see a lot of hot debates online where people argue about whether something is an agent or not. I hope the framework we just went through is a good starting point for deciding that. I think one of the core features it needs is this reasoning — by means of ReAct or anything else — to actually make progress and figure out what the next step in the plan should be.

I think it's really about the degree of autonomy of the agent, because you can even argue that something very deterministic and very fixed — a rule-based agent that always takes the same kind of action — is an agent. If you have a fixed rule for playing Go, you can call that an agent too, even though it's not a very good agent. ReAct, on the other hand, is arguably one of the most autonomous forms of agent, because it freely chooses what it thinks and how it acts, via LM sampling — there's literally no constraint besides the model itself. You can argue that other things with more constraints are also a form of agent, just a less autonomous kind. That's how I think about it. But obviously you use the abstraction because it's useful to some degree; if something is too far away from an autonomous agent, it's less useful to think about it that way.

I think there's always this tension: people like us who come from a research background always hope to reach full autonomy as soon as possible, because that's the mindset we start with — you want to hit the most ambitious goal. The truth is, along that spectrum of autonomy there are a lot of interesting things to build that are far less than fully autonomous. What I'm building at Replit is kind of a middle ground, where we try to keep the human in the loop as much as possible. This is useful, first of all, because it leads to a completely different user experience: you're not building a system that runs for half a day, sends a notification, and — if it didn't make any progress — leaves you frustrated, feeling you spent a lot of money for nothing; because, as a side note, AI is very expensive today, as you've all realized. At the same time, it also allows you to correct what the agent is doing. One of the main curses of building agents is that over a very long trajectory, the compound probability of the agent going the wrong way gets very high, and having a human in the loop who checks whether the work being done is correct is a very good trick to avoid the trajectory going off course.
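To put an illustrative number on that compounding (a back-of-the-envelope calculation, not a figure from the talk): if each step succeeds independently with probability p, an n-step trajectory completes cleanly with probability

```latex
P(\text{trajectory succeeds}) = p^{n}, \qquad \text{e.g. } 0.95^{50} \approx 0.08
```

so even a 95%-reliable step fails most 50-step runs — exactly the failure mode a human checkpoint interrupts.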

Do you think it's similar to the situation in autonomous driving?

Yeah — L2 versus L4. I don't know exactly where we are now; we definitely don't have L5. Even in the world of coding we don't have an L5 coding agent, and I wouldn't know exactly where to put, for instance, Replit Agent today — somewhere between L2 and L3, that's my guess.

I think the subtlety is that driving is a simpler problem in terms of goal specification, because you have two places, and that specifies the problem.

Kind of — the planning is easy.

I think the problem definition is easy: A and B, and then you go from A to B. For coding, maybe the eventual bottleneck is how you define the problem — sometimes that's even harder than writing the thing. So the bottleneck is almost always the product manager, or whoever defines the product.

Just one comment on the difference there — you're right. We analyzed the prompts people submit to Replit Agent, and the distribution is very interesting: a lot of them are just one- or two-liners, which totally makes sense.

Is there a distribution of what people use the agent for in coding? Is it front end, back end, some other kind?

We started as a full-stack agent, so usually people try to build entire web applications, but the moment we made it powerful at building just front ends, we started getting a lot of requests like that. The other interesting insight is that it looks like any other power-law distribution: there are a few people who write very long prompts in markdown, perfectly structured — they look like something I'd read from good PMs at companies — and they hope we can actually accomplish everything they wrote.

Maybe those are just from the PMs.

I think so. But it makes me feel — the input prompts, high expectations.

Exactly — high expectations, which are very hard to meet; the hype is so big that you can always fall short of expectations. But at the same time there are — again, in the case of coding — people who never learned to write code and don't want to learn, and when they see something running based exclusively on the natural-language prompt they wrote, that's really magical. I've heard that feedback directly from them.

Do you have some heavy users who are using it all the time?

Yeah. People told me that, I think three weeks after we launched, someone had clocked 100 hours working with the agent.

That's like a full-time job.

Yeah, it's a full-time job. But again, I think the fact that we didn't attempt to make it fully autonomous — because today that's a pipe dream — makes it much more practical.

It's much more practical, and you also have the ability to revert and decide: okay, the last two minutes were useless, or my prompt was too ambiguous for the agent to understand what I wanted, so let me go back to the whiteboard and describe it again.

I think it would be interesting to look at a heavy user and see how their prompts evolved — what they had to learn, how they picked up tricks that are effective. It's kind of like a natural master class, if they become very heavy users.

You're right. I've been asked that a few times — people ask me, how do you train your users to best prompt an agent? And the truth is, I have absolutely no clue. You just have to try it out. I think we're teaching people to use agents in general by letting them try, and over time they figure out the limitations and capabilities. I feel this is a problem with anything AI today: even a chatbot is very limited in terms of user experience — you try it, and then you figure out what it does well and what it cannot do.

I know the original motivation of Replit was to make it easier for everybody to try coding — or, correct me, to make it easier for everybody to learn coding. I'm wondering, with these AI features, will it be easier for people to learn coding?

First of all, I think the mission, given what's happening with AI, has moved from enabling people to learn to write code efficiently to actually letting them create software regardless of the background they come from. I find that very exciting. When I joined, I thought getting to what we have today would take, I don't know, a decade — and I tend to be very optimistic about AI; I've been doing this for several years — but I didn't expect the evolution to move this fast. Regardless, now that you don't really have to learn to code for simple applications, I don't think teaching people to write code is the most important mission we have at the moment. And the interesting byproduct is that agents can also explain what they're doing step by step: if you surface the reasoning tokens, a user can understand exactly why the agent made certain choices.

When we first started working on that, I had the false assumption that people don't care about what the agent is doing under the hood. That turned out to be terribly wrong. I had pushed quite hard for the user experience to be very elegant, minimal, and transparent — we were saying, okay, users just care about the final product. That's what I thought; in fact they look at all the traces and the reasoning. I was wrong. There are quite a few people — especially the more advanced ones, or those who want to learn while using the agent — who keep telling us: why don't you show us every single step and why you made that choice? Even in our UI you can click a button and see what the thinking tokens were behind a certain action. So I guess people do care about them.

I guess there's motivation for both pros and laymen to look at a trajectory. For professionals, by default you wouldn't really trust the technology, because you think it's definitely not as good as you — so you want to look at how it works to make sure it's not doing something stupid or misaligned; you want to oversee it in some sense. The layman, on the other hand, is probably curious about what's going on, and can probably learn from the reasoning and the actions.

I think so — they're sitting there anyway, so rather than staring at the screen and hoping for the best, they want to invest their time in a useful way, and so they want to read what's going on.

And maybe that's a better way of learning coding — or learning anything: just read the agent, if the agent is already better than humans at that job.

One hundred percent, especially in the debugging loops, where the agent explains: okay, we tried to run this portion of code, it broke in this way, this is the exception, and this is how we fixed it.

It's almost like reverse imitation learning. The machines used to imitate humans; now the humans imitate what the machines are doing — which is scary.

But yeah, it's probably a very straightforward way to teach people, and honestly that's how I learned to write code many years ago. So why not? If you have a very good programmer showing you how to debug, how to do every step, how to think — that's much more powerful than any textbook. That's why pair programming is so powerful, especially at a company: you want to sit next to someone who knows what they're doing, and you learn from them. So I'd love to see more agents — even ones not focused on coding — move toward this kind of user experience. There's value to be gained there, for sure.

I'm also curious: have you seen any other interesting startups or projects in this era of AI coding? Is there any other form of product that you think is really cool?

I'm seeing a lot of startups focusing on specific verticals — pretty much the opposite of our approach at Replit, where we let you build almost anything, or hopefully eventually will. On the flip side, there are startups that let you take one specific function in a company and replace or augment it with an agent. Think of generating sales leads: before, you had people doing that manually — scraping LinkedIn, searching through a lot of different resources, and eventually putting everything into a spreadsheet. Part of that work can now be automated. I don't think, at least for now, these agents will be capable of replacing what humans do one to one, but they let people move much faster: a lot of the tedious work is done by systems rather than humans, and then humans do the last mile and really bring the value. That's probably why we're seeing so many startups this year where 10 people do an amazing amount of work that five years ago would have been unbelievable. I see very small startups with incredible revenues, shipping on a weekly basis, and then you realize the team behind it is minuscule. I think AI — and agents especially — is playing a big role in that.

The interesting thing, thinking about this, is that it feels like if you want to build a successful startup now, you have to be able to do two things. Obviously you need to be familiar with the technology, the models — but you also have to think about how AI and humans should interact with each other, because I think that's a very non-trivial problem.

Agreed.

The naive approach would be to just have the AI do whatever the human job does, but that's probably not the best way to make humans and AI collaborate, especially when AI can't really replace that job 100% yet. One thing we were chatting about before: take SWE-agent as an example. The naive way of doing AI coding would be to just replace the human programmer with a coding agent that takes a very formally specified problem and delivers a pull request. That sounds like a very natural choice, but in reality — at least for now — it's probably not the best way for humans and AI to collaborate.

I agree. I think SWE-bench showed us you can take the work junior engineers do at a company and gradually replace it with coding agents. But not only does that not prove you can build an application zero-to-one with an agent — which actually is possible, as you've seen with Replit — it also doesn't help us as a field make progress in other directions, because whenever there's a benchmark, everyone becomes obsessed with optimizing the numbers on it, and then they don't actually work on different use cases.

I think we're at 30 minutes now — let's do it.

Maybe we should do some Q&A.

Yeah, I'd love that. Everyone, feel free to raise your hand. But first, before we get to everyone's questions, I have some questions prepared based on people's submissions in Discord. How do users who are not coders use Replit differently from professional coders, like software engineers? How is their experience different, or how do they use Replit differently?

People with no background in engineering use the agent exclusively as the entry point — they only use natural language — and they have high expectations of what the agent can deliver for them. Those who eventually learn to use some of the other tools still tend to make all their edits through AI rather than typing code directly in the editor. The key difference we see with people who have a software-engineering background is that they either take the output of the agent, bring it back into their local editor, and keep working on it — because they need to do something we currently can't do on Replit — or they open the editor in Replit and do the last mile themselves. In a sense, they both find it useful. I even see people who could build whatever our agent is building, and they still love to use it because it saves them a lot of time. Personally: sure, I can build a web app — it'll take me maybe an hour, and the agent takes two minutes. I don't like writing boilerplate code, and my time is valuable, so why would I? I think that's the key difference we're seeing.

I have a question — I'd love to hear your thoughts on goals for agents. Do you consider a goal for an agent an intrinsic thing, or something that the environment, or the end user, passes in? And how dynamic is an agent about its goal?

It's a great question. Do you want to take it?

So the question is how to think about goals in agents — does an agent have a goal, and how dynamic is it?

Yeah, just curious.

Do you think humans have a goal? It's a very philosophical question: is it something installed in your brain, or something you make up for yourself?

I think humans have dynamic goals — at any given moment there's some purpose, and it changes over time.

Okay, so it's kind of like a temporary piece of reasoning sitting in your internal memory, I guess — if I were to model myself.

Yeah. I think it's a very interesting conceptual question, and it somehow depends on how you train the model — or, in the human case, how you update your own brain. If you're engaged in a very reinforcement-learning kind of activity — you have a goal and you keep practicing toward it — it becomes a very closed-loop thing. That's very different from having a goal and just doing open-ended inference in a very open way. Not sure if I addressed your question.

Can I just double-click? I want to push you a bit.

Sure.

Because there's a middle ground — a fine middle ground — between being a poetic humanist and being an overly reductive technologist who wants everything to fit into the current paradigm. And what I hear him asking about is: what is the space between the current RAG-based, LLM-style approaches and any model of the higher intentionality that we think exists, at least temporarily, in humans? You know, I left here for a minute because I had to grab a coffee; that goal was definitely guiding my behavior for at least three to five minutes. How do you think through the gap between where we are now and something more capable?

I think the concept of a goal is only useful if you can learn from it. If you aim to get a coffee and somehow you can't get it, you can learn from that; even if you do get it, you can still learn something. The problem with today's agents is that the goal seems to be more of a made-up thing, because the agent can't really learn from whether it achieved the goal or not — it's an inference-time thing, and there's this very clear separation between training time and inference time. That's very different from humans: we don't have a training time and an inference time; we're always in both modes. Conceptually, I think that's one of the biggest questions we have to think about: how to somehow engage agents in some kind of training mode, so that whatever experience occurs is useful for their own self-improvement. Right now the agent just doesn't improve: you do something, then it resets, and it's the same thing.

A related question to all this: what we're seeing, whether for reasoning or for agents, is that at the end of the day you really need traces — reasoning traces or agent-planning traces that are domain-specific. It could just be that today we have to go collect this very hard-to-gather, domain-specific data. Do you think we're stuck in that regime for a few years, or do you think we'll eventually generalize without it?

The traces are obviously very important. I have a follow-up paper to ReAct called FireAct, and the idea is exactly that: you generate a lot of traces, then you train on them, and that makes your model better.
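Roughly, that recipe looks like the following — a hypothetical sketch of the idea rather than FireAct's released code; `run_agent` and `is_success` are assumed helpers:

```python
# Distill successful agent trajectories into supervised fine-tuning data.
import json

def collect_traces(tasks, run_agent, is_success, out_path="traces.jsonl"):
    """run_agent(task) -> (trajectory, answer), where trajectory is a list of
    {"context": str, "step": str} pairs. All names here are hypothetical."""
    with open(out_path, "w") as f:
        for task in tasks:
            trajectory, answer = run_agent(task)
            if not is_success(task, answer):    # keep only trajectories that solved the task
                continue
            for turn in trajectory:             # each step becomes one training example:
                f.write(json.dumps({
                    "prompt": turn["context"],     # condition on the context so far...
                    "completion": turn["step"],    # ...and imitate the step the agent took
                }) + "\n")
```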

But the analogy I want to make is this: imagine a human world where every day you wake up and your memory resets — you don't really learn from yesterday — but you leave a lot of traces on the internet, and then somehow a better human can be trained on all those traces. It could probably still work, but it's probably not very good, because leaving a lot of traces and then training one big model on all of them is a very horizontal kind of learning, whereas what you learn every day in your own life is very vertical. I'm not sure one can totally replace the other. As a big research agenda, I think we still need to figure out how to make language models not reset everything when they wake up — otherwise we're stuck in this loop of collecting domain-specific data. But that's just apparently very hard when you have an API that you call and a separation between train time and test time.

People have been doing research on continual learning for years, and it's still nowhere near ready for production with large models. It's just very hard to scale up.

Yeah, that's the problem.

I was going to ask about your predictions for the future. You mentioned being surprised by how fast things have gone in the last few years — that you'd thought it would take ten years to get where we are now. I'm curious, for both of you: what do you think things will look like in a year? What will be the impressive things agents can do a year from now that they can't do today?

I want to see what the combination of scaling inference-time compute and agents looks like. I think o1 shows it's in the realm of feasibility, but it's not a model that's easy to use in production today: it doesn't have function calling, it doesn't have system prompts, it doesn't have a lot of the bells and whistles required to build agents around it. That's what I'm excited to see in 2025 — I know it's coming; I know several people are working on it. It's going to force us to rethink how we build models — sorry, how we build agents — because most of what we do right now is singular calls with a limited amount of reasoning, basically connected together by tool calls, ReAct, or other fancy approaches. I want to see how we'll build agents where a model is allowed to reason for several minutes. I think it's going to be exciting, but I can't tell exactly what it will unlock. Maybe you know more, since you see it in action internally.

As our CEO says: I think agents will be big in '25.

On a high level, people have been doing imitation learning for many years. You can argue that pretraining, SFT, and even most of RLHF are some kind of imitation — RLHF is just imitation with a weighting, kind of. o1 is really a new era of reinforcement learning, and you can argue that's fundamentally different for agents, so I think there will be something exciting coming out. But the biggest question — there are really two questions. One is a question of intelligence, or capacity, or capability. The second is how good you are at finding product-market fit, at finding the real application — how useful it is for your use case. Take Anthropic's computer use as an example: from the model alone it's probably very hard to imagine what uses will come out of it, but then, for example, you guys have this demo. I think building better models and finding good ways to use models are roughly equally important — if the latter isn't more important. The latter seems really undervalued, in my opinion.

I agree — we can already do more with current models. We could stop model development today, and I think we'd have another few years of building on top of them in a smart way. But I know we're not going to stop, so I think for people building agents this is as fantastic a time as it is for the labs training the models.

The other thing I expect from o1-style models in 2025 is higher reliability per step. Since we were talking about traces: you could take a previous trace your agent created, where it solved a specific bug, put that trace in context, and allow a model like o1 to reason for a long time — and I do expect the failure rate of the single step to become much smaller. That's probably my wishful thinking for 2025, but I think it's going to happen.

I have a question — since we're on the topic. Earlier we had the founder of Cursor come here to give a talk and demo Cursor, and we asked him a lot of market and user-experience questions. What he essentially said is that he's building Cursor for himself: the fact that a lot of people who are not engineers like to use it is just a nice side effect of building for himself. So I'm curious, for Replit: who are you building for? Do you have a target audience, a core segment of people you're after?

We're not building for ourselves, and that's a very hard exercise when you're a developer working on something that looks like a developer tool. Our target is what we call citizen developers: people with an eagerness to build software but without the necessary background. You can put them on a scale. Some are good enough at understanding, say, an error message in the console — back in the day they would copy-paste it into ChatGPT and ask the model how to fix it — and some others can at most edit a bit of code. But our North Star is to go down the spectrum to people who don't even want to look at code. So it's basically the complementary user base to what Cursor is targeting, which is great, because I think we need more powerful tools to make today's developers more efficient — but at the same time, I want to see the equivalent of what, say, Canva did for design happen for software. I don't know how to draw; I'm a terrible creative person. If I can go into Canva and make something decent to design a birthday card, I want the same to be possible with software. That's who we're targeting.

I have a more specific question. Consider this very ambitious sort of eval: you give an agent a giant fleet of GPUs and a long time — like a month — and you say, here's a bunch of papers, I want you to replicate all of them; get back to me when you're done and show me what you produced. When will an agent be able to do that? Are you thinking a year, five years, ten years, a hundred years?

How good do you want the replication to be? Because there are a lot of—

Just replicate — not fancy new science, just do the due diligence and reproduce the codebase necessary to implement each paper.

It depends on the paper, and I know that's an unsatisfactory answer.

You'd be surprised how solid the repos are for a lot of deep-learning papers, especially pre-LLM ones. Take all the papers from, say, 2023, or NeurIPS or something like that, and dispatch the agent — maybe the problem is just that an API isn't available anymore, or they're using GPT-3.5.

A question about agentic environments: how do you see that space evolving? Obviously there's a proliferation of different environments right now, and people aren't using the same ones. Do you think there's a need to standardize around one large, flexible OSS environment, or do you think it will be multiple competing ones?

I personally think building agent infra is extremely important, and I'd argue the interface to the environment is one of the most important pieces of that infra. There are a couple of dimensions that matter. One dimension is, when you're user-facing, how the interface makes AI-human collaboration good — that's the design part. But there's also the scalability and reliability part: how many agents can you run together on this interface, and how reliable is it? I think we're way underinvested there, because right now agents aren't running at very high volume or scale — everybody is still in demo mode. But if this becomes very successful and very high quality, we'll need better infra. Right now it's as if the car has just been invented, so you don't need very good roads — whatever road can hold those cars. But once you have ten million cars, very soon you need very good, high-quality roads, and that's a long-term job.

But you don't see many projects trying to build toward that; everyone's doing their own thing, and no one's worrying about the foundation.

For infra — if you want very high-quality, very reliable, very scalable infra — maybe there won't be that many companies that are good at it. Kind of like browsers, or some other kinds of infra.

Another question — and thank you for bringing up agent infra; there's been a lot of action recently, like Browserbase, or Stripe with financial APIs — I'd put them all under the agent-infra space. Another thing I want to ask: what do you think about pricing with regard to AI agents? This also came up during the Cursor talk. Right now everyone is using flat pricing, but obviously that's not going to work long term, because some users are much heavier than others — they're racking up hundreds or thousands of dollars in bills. So how do you think about pricing?

Flat pricing is a limitation, I totally agree, and it also has an issue with value capture: if you're providing a lot of value to the user — to the point of saving them real time — and you're charging proportionally to the amount of, say, tokens and CPU compute they use, there's usually a huge gap between what you're charging and what the thing your agent created is worth. I don't think anyone out there has cracked this problem yet. It's probably a bit easier for coding, especially for tools like Cursor that target developers: they know exactly how AI works and how much a token costs, so you can do something proportional. But when you're building an agent that solves, I don't know, some problem inside a company, it's very, very hard to price. Then you should anchor the value to how much you would pay an individual in the company to do that work, and maybe scale it down because the agent is less reliable. But I haven't seen any pricing model out there today that does this — we're inventing the car as we speak; right now we don't even know how much the car registration costs. That's where we stand.

I really think that's one of the promises of doing a startup: if you're targeting a vertical, you can build a very good pricing model — sorry, business model — that captures the value. For example, I assume you're talking about something like a subscription, $20 per month, where each user is the same price, or each token is the same price. But arguably, if you apply the same tokens to a legal case, the same amount of tokens might deliver $2,000 of value, versus $1 of value if you're just chatting. Whenever a startup can capture that value gap downstream, I think that's a huge opportunity — and that's one reason I think foundation-model companies can't really capture all the value.

Wait, I want to follow up on that. Are you saying that the application startups — the ones that either build really good vertical-specific AI agents or somehow find product-market fit — are the ones that will capture most of the value?

I'm not saying they'll capture most of the value, but I think that's one reason there are a lot of opportunities.

Yeah, I think there's a lot of value across the world. The fixation that only a few labs will capture the entire value of AI sounds a bit dystopian to me.

And beyond that vertical direction, foundation models can capture more dimensions, kind of.

Yeah, totally agree.

I have a question about accelerating agent learning. My company works in the cancer space, and I'm curious how you think about bootstrapping an agent in a particular domain: what are the levers you can pull to accelerate the agent's learning speed?

What do you mean by learning?

Learning about the domain — knowledge, representations, planning.

By learning, do you mean fine-tuning the model, or improving the prompt, or something else?

Multiple dimensions, I think. Some of the ways my team thinks about learning: there's in-context learning, where you dynamically update your prompt — you collect examples as you go and refer back to them in real time — and there's the more long-term kind, like fine-tuning, or maybe even pre-training. Curious about your thoughts on what the levers are.

I like one of the slides OpenAI used at their developer day, where you see that you can start from in-context learning, then RAG, then fine-tuning, and each one brings an additional delta in performance. I would start with in-context learning, because it takes practically no effort; then, if you feel your agent isn't reliable enough, that's when you invest in RAG or fine-tuning. But I don't think there's a standardized recipe that tells you exactly how to do it today. I've seen a lot of people build specialized agents on purely in-context learning. What we do at Replit is almost purely in-context learning right now — we've run experiments with a lot of fine-tuning on frontier models, but the truth is, the real alpha right now isn't there; it's in building infrastructure.
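As a concrete illustration of the first rung of that ladder — a hypothetical sketch, not Replit's system — "dynamically updating the prompt" can be as simple as keeping a store of solved cases and prepending the most similar ones to each new request:

```python
# In-context "learning": no weight updates, just better prompt construction over time.

example_store: list[dict] = []          # grows as the agent solves tasks

def record_example(task: str, solution: str) -> None:
    example_store.append({"task": task, "solution": solution})

def build_prompt(new_task: str, k: int = 3) -> str:
    # Naive relevance score: word overlap with the new task. A real system
    # would use embeddings and a vector store (that is the RAG rung).
    overlap = lambda e: len(set(e["task"].split()) & set(new_task.split()))
    shots = sorted(example_store, key=overlap, reverse=True)[:k]
    demos = "\n\n".join(f"Task: {e['task']}\nSolution: {e['solution']}" for e in shots)
    return f"{demos}\n\nTask: {new_task}\nSolution:"
```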

One hundred percent. One comment on that: it's equally important to improve the system versus improve the problem — to improve yourself. If it's not a good problem, change the problem. Maybe that's the most important kind of learning.

Do we have time for one more? Well, I guess maybe two last questions.

It'll be just you guys — I'll be quick. I want to ask you a success question: in the short run, say 6 to 12 months, what are the big agent wins you can see happening? Is it narrow, tied to specific benchmarks — hey, something does really well on SWE-bench — or are there things without a benchmark that you'd consider the landmark?

I think benchmark numbers will definitely keep increasing, that's for sure, but I won't count that as a big win. I'll count it a big win when agents are actually deployed and bringing value to humans.

Okay — so some company using it, generating revenue, gaining efficiency or a new process, where they talk about it and you say, wow, that's amazing.

Right. I'd count it as big if it actually creates a lot of good, real applications.

Do you know if anybody is getting there right now, or is everyone still in a very early experimentation phase?

I know some that are profitable.

Okay, so it's just a matter of scaling, and then they can say: we've done this.

It's a matter of how you do pricing. If it's value-based, it's relatively easy to be margin-positive; if you play the race to the bottom and price proportionally to your tokens, that's a harder company to scale.

Sorry, a similar question — you dodged the last prediction question a little bit.

No problem.

Since it's the last question, I do want to push you: do you have one prediction, for the next, say, 12 to 18 months, about which domains — other than pure internet search, other than pure online search agents — will prove really useful? From your vantage point, where do you see agents' utility showing up next?

That's a tough one. Besides web navigation and besides coding?

Yeah, besides the things that live purely on the internet.

What do you mean, purely on the internet?

Like, do a web search for me, or write me inline code.

I would expect one of the most important directions to be better human simulation, and better human-in-the-loop advances. And maybe also multimodality.

I love your prediction. Let's wrap it up here — thank you.

