Increasing Product Output with Concurrent Agents | TRAE Meetup @ San Francisco
By Trae
Summary
## Key takeaways
- **Trae's planning and UI streamline spec work**: Trae's planning capabilities and its context engine provide a good understanding of codebases. Solo mode offers a visual interface for generating specs and documentation, which is more convenient than manually rendering diagrams in a gist. [01:15]
- **Integrated browser for frontend debugging**: Trae's integrated browser offers tight feedback loops for frontend development, allowing agents to see browser errors and use a built-in selector to pinpoint UI issues. This is more efficient than text-only interfaces that require copy-pasting markup. [02:55]
- **RAG and context engine vs. agentic search**: While agentic search is useful, it struggles with large codebases. Trae's RAG-based context engine lets agents understand concepts and connections, providing a more robust way to navigate complex projects than tools that start cold each time. [04:48]
- **Concurrency boosts productivity with AI agents**: When individual AI agent tasks take longer, increasing concurrency is key to boosting overall product output. This approach minimizes idle time and maximizes the amount of work done simultaneously. [08:37]
- **Deep research primes AI agents effectively**: To improve AI agent performance, initiate tasks with deep-research prompts that cover codebase understanding, dependencies, and intent. This "warms up" the agent, enabling it to perform more effectively on subsequent tasks without needing massive, detailed prompts. [09:02]
- **Model-specific prompting for optimal results**: Different AI models have different preferences; for instance, GPT-5 benefits from concise prompts ("less is more"), while Grok benefits from narrow scopes. Understanding these model-specific nuances is crucial for maximizing their effectiveness. [11:20]
Topics Covered
- Visual Feedback Loops: Accelerating Frontend AI Development
- RAG: Why Context Engines Outperform Agentic Search in Large Codebases
- Concurrency: The Key to Productive AI-Assisted Coding
- Less is More: Prime Agents with Deep Research, Not Long Prompts
- What's Missing? Persistent Context and Smart Inter-Agent Communication
Full Transcript
So I'm going to be talking a bit about how I use agents in general, as well as how I use Trae to increase my overall product output by increasing concurrency. A little bit about me: I'm part of the web infra team at ByteDance, and we work on a lot of build tooling. I don't know if any of you have heard of Rspack or that ecosystem; you might have heard of Webpack. I was on the core team at Webpack, and we ported Webpack to Rust and created a whole ecosystem around it. I also created something called Module Federation, which is distributed code sharing at runtime, on the server and in the browser. And I'm a very heavy user of AI.
So, getting into it: this is going to be a hodgepodge of things I've found useful. One of the big things I've noticed overall is the planning aspect, so I'm going to start this talk with what I really enjoy about Trae itself, and one big aspect is planning. Trae is good at planning. I've found its context engine to be quite good at building a general understanding of your codebase, and, especially in Solo mode, it has a really nice user interface. When it goes to spec things out or create docs, there's a good viewer for it; it's quite visual, and overall it's just a convenient, pleasant interface to work with.
Before using this, I would usually generate similar docs and have to stick them into a gist to get the diagrams to render so I could check everything. So it's really nice when you see it build something out like this, and you get good UML and other visual cues in there as well. What I've found is that the visualizations aren't necessarily super useful to me, but when I pass them to other AI agents, they're a really good way for an agent to quickly understand that this connects to that. As the codebase gets larger and larger, just reading the Mermaid graphs helps an agent figure out where it should start exploring; they serve as a guide within larger codebases.
Another really nice thing I like about Trae is the build-and-iterate flow. Once it gets through its planning and does its docs, it has its ready-to-build step, where it can kick off the build process. A really nice thing, especially in frontend work, is that it has really tight feedback loops. If I compare this to, say, some CLI tools, the feedback loop there is a bit more tricky because it's mostly text-only. The thing I love the most about Trae is the integrated browser. If I'm working on something in the frontend space, yes, I can write a bunch of end-to-end tests, and that sometimes works. But — how many of you are frontend devs here, or work in frontend? Okay. A few.
Wow, not a lot. Okay. So, in the frontend world, you run into a lot of visual issues. A big one is: I'm using flexbox and maybe something's floating off to the side. Or: why is this blue? It's ugly; the rest of the site looks different. Now, I can type out, "Hey, the buttons are blue. Make them not blue." But then I have to paste in the link or copy the markup from the browser. It's a very cumbersome process for the model to figure out where the thing you're talking about actually is. So a really cool feature I love is the built-in selector. I can select something — you'll see I have a green box around it, and there's a little "add to chat" at the bottom — so I can just add that into the chat and say, "These shouldn't be blue," and now it knows what I'm talking about. On top of that, the agent can actually see the browser errors while it's running, so it can debug some runtime aspects as well. I've found this type of interface is much more efficient than the text-oriented one where you have to copy and paste or explain everything in words. It's really great for tweaks and modifications to your interface, so I really do appreciate that.
Obviously, a big thing I've noticed with these models, and with agentic coding in general, is that it's all about context. Context is very difficult to wrangle, and each model, I've discovered, has different preferences for context and prompting, but Trae does a really good job with its context engine. A really big difference between the Trae IDE and standalone CLI agents — and I use both quite heavily — is that the CLIs lack this idea of RAG, the ability to understand anything beyond the current invocation. Generally, when I'm using a CLI tool, it starts cold each time. It has to work out: what is this codebase about? How does it work? Where do things go? And so on. Agentic search got really big when CLI tools came out, and in the beginning I really fell in love with it, because it freed us from the need for RAG, which is slow and annoying since you have to index everything. But what I found is that as a codebase gets larger, you start to really need conceptual links. Agentic search is really good at finding a couple of things, but when there are 45,000 files in the codebase, it's not going to find every single instance, and if the words don't exactly match, it's a lot harder for it to remember to keep tracing and following imports, and so on. With RAG and the context engine, I can give it a concept and it can act immediately. What I really like as well: when I send the first message of a new chat in a CLI tool, there's some back and forth — you see it rummaging around. But when I do it in Trae, on the first message it goes, "Oh, okay. Yes, this," and just starts doing it. It always catches me off guard, because it jumps right in from the first message. I owe a lot of that to its really good context engine, so I really do enjoy the context engine that's available in Trae.
Solo is probably my favorite feature they've introduced. The big thing I've found is that as agentic coding gets better and better, the actual editor becomes the thing you look at only occasionally. For most of what I do, I look at the code in GitHub; that's where I read code, unless I need to go run the debugger myself and do something specific. The first time I'll see the code is in the diff of the PR, and then I can adjust it, tweak it, send it back, and so on. So what's really important to me in this kind of world is screen real estate, especially for the terminal. In the case where I want to use my Trae Solo chat but might also want a CLI chat, I absolutely love the layout, because now I have this big real estate for the terminal, yet I can still hop over to the editor, copy a file path, and drag it back in. It's a lot more convenient, since most of my time is spent talking. The layout is probably one of my favorite things to have available.
Okay, so that's a bit about the things I like and how I use them. Now I want to get into the workflow I use day-to-day and the things I've noticed. Generally, I work on around 11 codebases concurrently throughout the day, and I can manage about 50 agents in parallel before things start to get a little unwieldy. In doing all of that, some very interesting lessons have been learned along the way. The first thing — and I think somebody mentioned this — is the claim that agents aren't quick, or that research has shown you are less productive when doing AI coding. I think this is a sliding scale. Since I work in open source quite a bit, I'll usually see open source authors say that AI is slower than doing it themselves. Generally, that's because they know the codebase very, very well and they're usually quite specialized, so it most likely is quicker for them to do it by hand. I've felt that quite a bit, especially as models moved into thinking and got a lot slower. The way I solve this problem is to increase the concurrency. If it takes longer to do one task on average, you also end up with a ton of idle time, because you're just sitting there staring at the screen. So how do you solve that? Reduce the idle time: you can increase the amount of output you're capable of simply by having more concurrent work being done.
A big thing I also do: whenever I start on something, I'm a big fan of deep research — I got the idea from OpenAI. Whether it's a CLI or anything else, when I start on a subject I'm going to work on, the first thing I'll do is say, "Hey, go deep research this area." If I know we're generally going to be working on A, B, and C, I'll drag in the folders, or certain files where I know the work will begin, and say, "Hey, deep research this." Essentially, I prime the agent to understand: what does this import? Who depends on it? How is it connected in the codebase? What's the intent behind what the code is actually doing? I really try to warm the agents up, to make sure they generally understand where they're going, what they're doing, and how the codebase exists around them. Often I'll use sub-agent tasks when possible to speed up the research, because it can take 10 to 15 minutes to crawl through everything. The big thing is: let it have maybe 20 minutes to just dig around, look at things, and understand — and only from there would I actually start working.
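As a rough sketch, a priming prompt along these lines (the wording here is illustrative, not a fixed template from the talk):

```text
Deep research the folders I've attached before writing any code.
Answer for yourself:
- What does each module import, and who depends on it?
- How is it connected to the rest of the codebase?
- What is the intent behind what the code is actually doing?
- Where are the tests, and how do I run them?
Summarize what you learned. Do not make any changes yet.
```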
A big thing is — and I see this on LinkedIn a lot — people posting "here's my prompt to change the world," these massive, batty prompt write-ups. I come across a lot of that. What's interesting is that on Twitter, when I talk about some of the things I've done or how I'm making AI do something, people ask, "Oh, what's the prompt you're using?" And my usual reply is something like, "Fix this," and I pass it a file. I'm probably the laziest prompter in the world. Most of my prompts are one sentence, maybe two, and I still get really, really good results; I don't have a lot of churn. Granted, the projects I work on are generally large, complex, existing projects rather than new things. But I've found that if you prime it — "research this, understand how it works" — and then, when I want to warm it up a bit more, ask it questions ("How does this work? How does that connect over there?" — things I already know the answers to), the model has done the research and understands the why already. I don't have to write a massive, stenciled-out prompt for it to go do something. I can just say, "Hey, the test is failing on this. Go fix it," and it has enough context to understand what to do.
It's also important to know that, depending on the model you're using, there are different preferences. For example, I've found GPT-5 does better with less prompting. In Claude models, you would capitalize something to really emphasize it; that doesn't really work in GPT-5 — it doesn't really care what you think, and it's going to go off and do its own thing. So I've found less is more with that specific model. I've found things like the Grok code models are much better with a defined, narrow scope. Knowing your model and how it works is very advantageous; they just have different preferences for how you prompt them, and figuring that out gets you a lot more mileage out of what they do and how well they do it.
One last thing: a model is not a replacement for knowledge. The way I use all of these is, if my knowledge cap is here, I might push the model a little higher, but generally I'm not going to have it do something I wouldn't be able to do myself. Now, I've found my need to understand a language is slightly less, but your need to understand architecture, critical thinking, details, depth — things like that — is still very, very important. Ultimately, the model is going to be slightly worse than you are. It's not a replacement for how well you understand things; it's really just giving you more hands to go do stuff, maybe slightly worse than you could. In general, it's more hands, not a silver bullet.
So, a big thing again: I try to parallelize all the tasks. This is kind of tricky to do, and the industry is still warming up to how to do it well. I've done it in various ways. One scenario is doing it manually: I have lots of instances of Trae running, doing things. I'll use git worktrees to split things up, or I'll just have lots of terminal tabs open with different CLIs running, doing different things. Sometimes they'll be on the same branch; sometimes on different ones. The idea is: what can I do that's not related to this area, or won't cause a huge conflict, and how can I map that out? A usual case I'll find is: "Hey, let's go harden our tests here, because we could do better on testing, while you go work on this feature." Now there are two things running that speed it up or split it up. Or, since I work in open source a lot, I'll have agents go look at GitHub issues and try to solve them in the background on a different worktree. So it's either direct delivery I want to do, or delivery I'm not able to do due to capacity constraints — and as long as I can manage the concurrency, I have a lot more capacity to do stuff. The way I've described how I imagine this, the ideal case I'd want in the future, is almost like a TV with tons of channels, and my job is to page through the channels really quickly, see which programs I like, and keep going — because that's essentially where you can get to, depending on how autonomous you go. The interface to do that is still lagging behind a bit. But essentially: try to parallelize stuff.
Another big issue I've found is collision avoidance. This is still a really tough one that nobody has solved in a fully automatic way, but there are a few tricks that help work around it. It depends on the tool you're using, but most CLI tools have some kind of plug-in system. Usually what I'll do is implement something like a mutex — if anybody's from a Rust background, or other backend languages; I know mutexes from Rust — essentially a lock file. If multiple agents are going to edit a file, one of them can reserve it with a little Node script, and the others just await until the lock is removed. So if four or five of them want to edit the same file, they'll sit there and wait for the lock to release, and then the edits follow one after another. Something like that helps a huge amount. And for these, whatever tool I'm using, I'll just have AI write the handler itself, so it works pretty well and I don't have to do a whole lot.
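To make that concrete, here's a minimal sketch of the lock-file "mutex" idea as a Node script — the function names and polling details are illustrative, not from any specific tool:

```typescript
// Sketch: serialize concurrent agents editing the same file via a lock.
// mkdir() is atomic, so whichever agent creates the lock dir first wins;
// everyone else polls until it's released.
import { mkdir, rmdir } from "node:fs/promises";
import { setTimeout as sleep } from "node:timers/promises";

const lockPathFor = (file: string) => `${file.replace(/[\\/]/g, "_")}.lock`;

async function acquire(file: string, pollMs = 250): Promise<void> {
  for (;;) {
    try {
      await mkdir(lockPathFor(file)); // fails if another agent holds the lock
      return;
    } catch {
      await sleep(pollMs); // lock held elsewhere; wait and retry
    }
  }
}

const release = (file: string) => rmdir(lockPathFor(file));

// Agents wrap their edits in this, so writes to one file queue up in turn.
export async function withFileLock<T>(file: string, edit: () => Promise<T>): Promise<T> {
  await acquire(file);
  try {
    return await edit();
  } finally {
    await release(file);
  }
}
```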
Another good one is worktrees. I see worktrees as probably the best scenario; it's just kind of annoying to manage them by hand. But with worktrees, you can branch things off, and the question becomes: is this a separate pull request, or is it concurrent work you want to fold back in? What I do is have something like GPT-5 — the Codex models — rebase the worktrees back onto the main branch. So I can work on several conflicting things for a feature, and at the end of it, somebody rebases all the worktrees back together. The model can understand the intent behind each piece and collapse it all back into the one branch, and I've essentially avoided any potential collisions.
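A rough sketch of that worktree-per-task setup with plain git commands (the task and branch names are made up for illustration):

```typescript
// Give each concurrent agent its own checkout and branch via git worktrees.
import { execFileSync } from "node:child_process";

const git = (...args: string[]) => execFileSync("git", args, { stdio: "inherit" });

for (const task of ["harden-tests", "feature-x", "issue-123"]) {
  // New branch agent/<task> based on main, checked out in ../wt-<task>:
  git("worktree", "add", "-b", `agent/${task}`, `../wt-${task}`, "main");
}
// Agents now work in ../wt-<task> in parallel. Afterwards, fold each branch
// back (run inside its worktree) and clean up:
//   (cd ../wt-<task> && git rebase main)
//   git worktree remove ../wt-<task>
```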
And then another big one, really, is to try to keep the scope narrow. This is good for context, for token cost, and for making sure the model doesn't go too crazy trying to do stuff. How do you define: "Hey, you're going to work on this package; don't drift off into the other ones"? Again, I've found this is also somewhat difficult to describe within the current constructs of prompting.
So a big thing I always do is combine tools. One of my favorite combos is Trae plus an agent MCP — most CLI tools have an MCP for them — where I want Trae's context engine to drive the thing, but I want a workhorse to go off and do all of the work. Trae is really good at understanding the context, and I can farm the work out to a different model, with Trae essentially managing that model. If the model comes back and says, "Hey, these are the next steps," the Trae agent can just say, "Okay, yeah, do the next steps," or review it and make some suggestions. It's essentially those two talking to each other, and I'm not dealing with the dynamics between the models myself; there's a management layer. The other combo is a CLI with MCP agents, which can also give you a lot of concurrency, but it's tricky to tame: you do want to watch what they're doing and how many nested levels you allow them to create. A big thing here is task control, and the maturity of this area is not great yet.
Another really big thing I do, especially in frontend work: our web infra team created something called Midscene, which is computer vision for agents. At ByteDance we've trained a bunch of models on vision and they're really, really good, but Midscene also works with models from other providers. A big thing I like to do is give these agents machine vision, because I find that's one of the last areas missing from a model: it should be able to see what it's doing and look at it similar to how I'm looking at it. Midscene is also useful because, through the model — when I put it in Trae — Trae can then control my macOS, not just my browser the way browser use does. I've seen whole operating systems controlled, Android phones controlled, browsers obviously, and the craziest one I've seen is robotic arms inside certain manufacturers, where they'll actually have arms interacting with touchscreens to test physical hardware. It's really neat that you can squeeze all of this out of the SDKs or MCPs for these tools.
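For a feel of what vision-based driving looks like, here's a rough sketch against Midscene's web SDK — the package path and method names are from my reading of the Midscene docs, so verify them there; the URL and instructions are placeholders:

```typescript
// Drive a real browser by what the model *sees*, not by DOM selectors.
import puppeteer from "puppeteer";
import { PuppeteerAgent } from "@midscene/web/puppeteer";

async function main() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com"); // placeholder URL

  const agent = new PuppeteerAgent(page);
  // Natural-language action, resolved visually by the vision model:
  await agent.aiAction('click the "Sign in" button and wait for the form');
  // Ask a question about what's currently on screen:
  const heading = await agent.aiQuery("string, the main heading on the page");
  console.log(heading);

  await browser.close();
}

main();
```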
A big part of my endeavors into this came about because — I don't know if you remember when Anthropic said they were lowering the Max plan limits? That happened; it was some news recently. I don't know if anybody saw it. Well, I know a few people inside Anthropic, and my bad for that: it was me and, I think, three other people who were the top users, doing tens of thousands of dollars a month on it. What I discovered along the way is that token conservation is a really big problem, especially after doing something like $80,000 a month in subsidized usage.
My bad. So, it is a problem. I would say, though: definitely rely on subsidized tokens where you can, because you're getting a great deal. Now, some ways to conserve tokens, or make things better, especially when you're using sub-agents: I often see that they're all reading the same things. If the task is "fix the lint errors," what do all the subtasks do? They each run the lint command. So everybody's reading this massive output, or clogging up your process because they're all trying to run tests in parallel, unaware of one another. Generally, you want a good way — and this isn't something a lot of tools have yet — to do almost middle management: somebody runs it once, figures out what's needed, then breaks it up and says, "Okay, you're going to work on these things, and just work on that small area," rather than everybody charging off at once and doing the same discovery work. Context sharing and things like that will mature, I think, but ways I've done it: memory files or context files help a decent amount. Another good tool is Repomix; it creates an XML bundle of your entire codebase. Often I'll drag and drop that into ChatGPT if I want GPT Pro to review something in the codebase, and I can just say, "Here's an XML with the entire codebase in it" — or I'll drop it straight into a prompt. Then the model doesn't have to go off, search through all the code, and spend 10 minutes, provided it has a large enough context window. I can say, "Here's literally everything (or a section of everything); now look at it all and make your plan," and it charges off faster in the right direction.
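A tiny sketch of that flow — `npx repomix` is the tool's documented basic usage, but the default output file name below is from memory, so check `repomix --help`:

```typescript
// Pack the whole repo into one XML bundle, then attach it to a chat/prompt.
import { execFileSync } from "node:child_process";
import { statSync } from "node:fs";

execFileSync("npx", ["repomix"], { stdio: "inherit" });

const bundle = "repomix-output.xml"; // default name, as I recall it
const mb = statSync(bundle).size / (1024 * 1024);
console.log(`Drag ${bundle} (${mb.toFixed(1)} MB) into the chat.`);
```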
So, what's missing, in general, from what I've seen — or what I'd like to see in the future from some tools — is this idea I had of a context container. I actually got the term from the Trae team, because they've been talking about this idea of a container. The idea I had for it: if I work in a project every day, I'm in the IDE, and every time I start a new chat, I start from near zero. Now, some tools are a little better about understanding what I've been doing, but in general, it would be nice if somebody were watching what I do — what did I do this week as a whole — and understanding that in the background, so the tool understands more about the intent of the user and how they run things, without me having to write it all out in prompts. Something that just observes the habits of how things work and how they fit together. Longer term, persistent memory would be really nice, and I'd like to see inter-agent communication as well; I think that would be a really big one.
I've experimented with paging other models inside sub-agents. A big challenge with this is that if you build something where two models chat with each other, they will get stuck in an infinite loop of waste. The nearest scenario I've found that's quite useful is almost like notifications: "Hey, you have a new notification; check your notifications." There can be a message from a different model, but they can't spin each other off into a loop. They can only read their notifications, or broadcast one that another model may or may not decide to act on, now or sometime in the future.
communication. I think conflict
management especially for concurrency is
going to be a really big one um as well
and you know some solution to work tree
management kind of baked in would
probably solve a lot of these type of
challenges. So hoping to see that at
some point and yeah so you know that's
just kind of things that I've discovered
along the way tactics I use workflows
that I have uh to kind of show you. Let
me jump out of the, you know, just so
you can kind of see like this is what
I've been doing today and this is more
or less what every day looks like. But,
you know, so these are the active
projects I'm working on at the moment.
So something in Rust. Um, this is an MCP
system I've built for CLI tools where
when you use RSpack and you go build, if
it throws an error, the AI will
automatically start fixing it. Adds like
chat widgets in the front end through
the build plugins, MCPs for every tool
we've ever built. Uh so you know kind of
working on that improving that passively
in the background. Um you know over here
busy trying to solve some CI tests uh
with again a tandem of using these and
these together. Uh let's see what else
is kind of going on
over here. Also working on an examples
repo where I've been porting everything
from like Cypress to playright so on and
so forth. But you know the idea is I'm
doing this a lot
Some people say this is sad; some people don't. I can't really say either way. Personally, I love doing this, because yes, I could write this myself, but I could also do 10 other things in parallel if I don't have to. I just need to make sure that when it's done, it's how I would have done it — and again, that comes in later. So a lot of this is about how quickly I can navigate, digest, and manage all these windows, and potentially all these nested terminals, if I'm really getting into a lot of concurrency.
The last thing I wanted to get into: I've been using Trae to build out an early concept where I discovered a lot of these ideas. Trae is really great at frontend, so what I'd been using it for is to build out this concept of, okay, how could I manage these agent tools better? This is the UI it helped me come up with: building out worktrees and being able to switch to them, new tasks, recent sessions of the chats I've been having. Being able to edit the agents — and if I pop into the browser here, there's also access to my GitHub space: fix with AI, research every GitHub issue that comes in, or any pull requests we have. Here are all the PRs and their statuses. Oh, if that one's failing, I want to press a button, see it create a worktree, go off in the background, and fix it without me having to think about it — just message me when you have a solution. I would love a lot of that, where it's more about pushing notifications to me, instead of me having to poll and send things out to the agents to get them to work.
Anyway, another thing you should definitely check out is the Trae CLI. It's still getting built out, but it has performed pretty well. And again, I really do like these kinds of scenarios, because I've also had cases where it's a while loop until there's a breakout point. So you can take that, put in whichever model you choose, send it into a while loop, and it will break the loop once it solves whatever it's meant to solve. And I can go to sleep, wake up in the morning, and hopefully it's not still iterating.
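That loop is simple to wire up yourself. A sketch of the pattern — the CLI name and flag below are placeholders, so substitute whatever agent CLI you actually use:

```typescript
// Re-invoke an agent CLI until it reports a breakout condition.
import { execFileSync } from "node:child_process";

const DONE = "ALL_TESTS_PASS"; // ask the agent to print this when finished

for (let attempt = 1; ; attempt++) {
  console.log(`--- attempt ${attempt} ---`);
  const out = execFileSync(
    "my-agent-cli", // placeholder binary
    ["--prompt", `Fix the failing tests. Print ${DONE} once everything is green.`],
    { encoding: "utf8" },
  );
  console.log(out);
  if (out.includes(DONE)) break; // the breakout point
}
```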
But yeah, it's a combo of various CLI tools, whatever you like to mix together — and of course Trae, mostly because the UI is just super awesome for vibe coding and the context engine is really, really nice. I love using them together. Anyway, thank you.
>> Thank you so much for the amazing sharing. It was really nice. Does anyone have a question? Okay, let's do it.
>> How do you tell an agent, like in Trae — can you define coding styles, such as "don't put the CSS in your HTML" or "separate the controller logic from the service-layer logic"? I don't want to tell it every time I ask a question. Is there a way to define it once for all?
>> Yeah. So, okay, there are two scenarios. I don't know if you guys know somebody called Theo. He's a big YouTuber and streamer.
>> Theo. Yeah — uh, wait, no — T3. He usually goes by T3. So I was talking to him about these types of scenarios. Um, I've actually lost my train of thought. Can you ask the question again so I can remember it?
>> Okay.
>> So, there are two scenarios I was chatting with him about on this. One is: it would be really nice if somebody were watching. This doesn't exist, but in my head I think it would be really great. So: the ideal state, and then how I deal with it today. Ideal state: imagine you have somebody who passively watches. When you send these corrections in, there's a model that watches what you do all day, every day, and summarizes it — almost like ChatGPT Pulse, which just came out, where at night it thinks about everything you spoke about during the day and writes new things to show you. That would be great, because I don't want to have to prompt it — like you said, not every time. And I also really hate having to create prompt files and do it for every tool; especially if you use a lot of tools, it's a nightmare. It would be nice if, as I say these things, it noticed what I keep having to correct, remembered that, and automatically guided the agents with it. Now, how I deal with it today: usually it's really deep research. Deep research works so well, at least in all the cases I've tried it in. And again, I'm working on bundlers, compilers, module loading systems, so they're generally fairly in-depth, which is a nice test for this type of stuff. I often find: okay, deep research — how are things working now? If you have an existing codebase, which is what I mostly work in, it works quite well. Are the standards already set up? Awesome. Now, you can add something like an AGENTS.md file to give it some guidance, but generally it'll be: "Hey, research. How do the patterns work? Where do things already go? Who imports what? How does the codebase currently work? You should be minimally invasive when adding new features. You should try to follow all the existing patterns and guides" — for things like: don't define a type at the top of a file if there's a types file somewhere else. A lot of this relies on whether the model saw that that was there. You could almost say: ask ten questions to prime it. So you go, "Hey, here are the ten questions you need to answer before doing work." Now it understands through code — and I always think code is probably the best prompt you can give it. So: go look at how things work, follow those guides, and keep following them. I usually don't have to nag it, because it already sees, and its job is to match and align with the existing styles within the codebase. That has worked quite well for me so far. Not saying it's bulletproof.
>> Okay, does anyone have another question?
>> Let's do it one by one.
>> Thanks, Zach. That was a great talk. I'm interested in the concept you introduced as the warm-up. Can you elaborate a little bit more on that?
>> Say that again?
>> The warm-up — warming up the agent.
>> Yeah. So usually, when I start a new chat with something, it starts from zero. The biggest case I'll see is when I say, "Hey, run the tests." If I do that, probably half the time — depending on the tool; Trae is generally better about this, but I'm trying to speak broadly about all the tools I've encountered, since I imagine everybody uses a mix of tools — a common pitfall I see is that it'll try to use npm, or something silly like that. So by warming it up, the idea is: familiarize yourself with what we use. What's our linter? Where are the commands run, and what are the commands? That way it doesn't run six commands before realizing it's an Nx repo and then running the Nx commands. Go and understand how the codebase works; read through what it is we're working on. If I have the plan to add this feature, what areas do I know, in my head, we're going to be touching? Okay: "Hey, here are the files, the entry points. Go look at this, this, and this — maybe this folder. Now research how these aspects work together. What is it doing? Where does it go? Who depends on it? Where are the tests for it? What types does it use? What APIs are available?" You're just trying to let it look around, so that when you give it a feature, it doesn't go off the one sentence you gave it and try to figure out the whole codebase from that. You're saying: explore, look around, tell me how it works. Ask it questions you already know the answers to, but that are going to make it dig around, before you give it a task. This also helps because — and it depends on the model — some models, I've found, are too eager to start working. I see this with Grok. I love how fast Grok is, but I'm like, "Whoa, buddy. Okay, let's look around first. Let's not just start." What I've also found is that mixing models helps. Often I'll use GPT-5 with Grok as a sub-agent, so my writes are really cheap and fast, but I have a good brain driving and reviewing the work. Again, it comes down to: how can you warm this up? How can you make sure the model has a better bearing, so it's not just blindly grepping for "test" and looking at a million unrelated things? Point it in there, then have it go explore on its own, so you're not writing a massive prompt file and the model isn't aimlessly searching, either too broad or too narrow.
>> One more question, just based on what you said: what exactly is the sub-agent? You said you use GPT-5 and then Grok as a sub-agent.
>> So, the sub-agent: in most of these tools — and I know this because I've taken apart many of them — even inside Trae and in most CLI tools, when you're doing the work, it's not the chat you're talking with that's carrying the work out. The chat is probably talking to something like an edit agent that actually performs the code edits. Now, some tools allow you to modify which model each piece uses, or to define new sub-agents. So often I'll mix and match: I'll say, "Hey, use the code-editor sub-agent," which has access to all the same tools as the parent, but the parent might be a bigger, slower thinking model that can tell a dumb-but-fast model to go do something and then check it. It's also about mixing costs. The biggest cost in AI is egress — output tokens. So what I often think about in custom workflows is: how do I combine good thinking with fast, cheap output? If the model is good enough to review what's going on, I can use a really dumb model to just write it down, because writing is the least valuable part of the workflow and it's also the most expensive. So find a model that's just good at editing files but not good at thinking, have somebody correcting that one, and it drops the cost and also makes it faster. But it depends on the tools you have or are using.
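A toy sketch of that "big brain reviews, cheap model writes" split — `callModel` is a stand-in for whichever provider SDK you use, and the model names are placeholders:

```typescript
// Expensive model only reads and critiques; cheap model emits the tokens.
type Model = "big-thinking-model" | "fast-cheap-model";

async function callModel(model: Model, prompt: string): Promise<string> {
  // ...call your provider's API here (omitted in this sketch)
  return "";
}

async function taskLoop(task: string): Promise<void> {
  let feedback = "";
  for (let round = 0; round < 5; round++) {
    // The cheap model produces the actual edit (the bulk of output tokens).
    const edit = await callModel(
      "fast-cheap-model",
      `Apply this task as a code edit.\nTask: ${task}\nReviewer notes: ${feedback}`,
    );
    // The big model reads the edit and replies with a short verdict.
    feedback = await callModel(
      "big-thinking-model",
      `Review this edit for "${task}". Reply APPROVED or list fixes.\n${edit}`,
    );
    if (feedback.includes("APPROVED")) return;
  }
}

// usage: await taskLoop("fix the failing lint rule");
```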
>> Yeah. So you talked about splitting agents between tasks, and you showed us there that you had like eight different — I don't know if they're different projects, or if they're all...
>> They're all — so usually I have two that are the same project, and I just duplicated the repo: poor man's worktrees. But the rest of them are separate repos.
>> Okay. So say you were working on a web app or something. Would you open multiple instances of, say, Trae to work on different parts of the web app at the same time?
>> It depends on how deviated the task is. And I think this is also a challenge we need to solve in the interface aspect in general, even in CLIs — everywhere. I think a big challenge is that AI labs are usually producing the tools, and AI labs aren't building with the tools every day, so it takes some time for them to figure out how we're using them, I guess. But I think there are a couple of concepts here. One thing I've found — and maybe this is because of ADHD — is that often I'll be working on something and then I'll branch into, "Oh, wait, but I can go do this." And that's how a lot of dev workflows I see go: halfway through, you realize you could do something better. Now, do you derail what you're currently doing, or do you hope you remember it for later? So I think, when you're working on a certain task: is this something that's going to bubble up onto the same branch? It's technically part of the same deliverable, but you could tackle three parts of it in parallel. That's kind of akin to forking or branching off the conversation. And in doing that, I think a lot of it comes down to: what is the task? I'm trying to think of some examples I've had. Usually it's things where I know they're not necessarily going to conflict, so I don't have to get into conflict management. Like when I was building the UI out: okay, here's my chat interface; now I'm working on adding this worktree-management stuff, but I want my tool calls to render when they're streamed back into the chat. I'm not going to create two different PRs for that; I just want to quickly go fix it. I can spin something up, and I know this isn't that folder, and that isn't this one — the chances of them touching each other are slim. And if there is some overlap, usually the models can see, "Oh, it was recently edited after I did it," and they kind of know not to mess with it or override it again. But yeah, managing this really depends on what you're trying to do, and some of it is just thinking upfront: where are the areas I know I could split off, more or less? Or, as you get into it, what other areas can you scope it into? I find monorepos are a good example of where this is useful, because you also have separate builds you could run, and type checking you could run, that aren't necessarily going to mess with the single app.
>> Okay. All right, one last question. Okay, maybe we can do it.
>> Yeah, just following on Jack's question: does Solo have the ability to really see the frontend — to see what's actually on the web — or not? Because that's one of the big problems with Claude: you have to tell it, you have to input things, and it'll look at a file if you upload it. Can it do that inside the instance, or not?
>> Do you want machine vision?
>> I want it to see exactly what's there, yeah.
>> Okay. I think in the current version of Solo it's not there yet, but I have spoken to the team and said, "Hey, look, you've got everything already there; just take a screenshot and attach it as well." So, not currently. But I would say Trae is definitely the one where this would not be as difficult as in other tools, because of the integrated browser: it already gets your HTML; it has everything except the screenshot. It understands the terminal and the console errors. It can see the markup. It has a selector, so I can send it the XPath to the node I'm talking about. But I'm hoping by the next release I can convince them to add vision support. What I'd really love to see in there is computer use. Sure.
>> Now, if you wanted to sideload this, Midscene would be really useful — it's also from ByteDance — and it offers either an MCP or an SDK, and it will add computer use and browser use. It uses machine vision only, to interact with and navigate everything. So that's often what I'll do inside Trae as well: say, "Hey, here's an MCP; use that to click, navigate, and actually use the app." But it's all vision-based; it's not DOM- or style-based.
>> And just a second question: is it smart enough to make sure the CSS is organized — so the main frame's got all the positioning and the subframes are flex, you know — or not yet?
>> That really depends on the model interpreting the visuals.
>> Okay.
>> So this is not even a Trae thing or a tool thing. I've found a lot of models in the US are not as good at machine vision — I'm not 100% sure why — but it really does depend on which model it is. Some of them are trained better to understand layout, which I think is a big thing. If I ranked them: ByteDance's Doubao — I can't pronounce it — our Seed model, the Seed VLM, is the best we've seen in benchmarks. Then I would say it's Qwen's VLM — that's not a US model, it's open source, but it's one of the best we've used. And below that, I would say Gemini 2.5 Pro. So if you're looking for a model to use for machine vision, or for something like Midscene, one of those three models is going to give you really good results — really reliable clicking and interacting with navs.
>> Thank you so much. Due to the time limitation, if you have another question, reach out to Zach directly after the keynote. So let's welcome the next Trae power user. Okay, Red.