
Increasing Product Output with Concurrent Agents | TRAE Meetup @ San Francisco

By Trae

Summary

Key Takeaways

  • **Trae's planning and UI streamline work**: Trae's planning capabilities and its context engine provide a good understanding of codebases. Solo mode offers a visual interface for generating specs and documentation, which is more convenient than manually rendering diagrams in a gist. [01:15]
  • **Integrated browser for frontend debugging**: Trae's integrated browser offers tight feedback loops for frontend development, allowing agents to see browser errors and use a built-in selector to pinpoint UI issues. This is more efficient than text-only interfaces that require copy-pasting markup. [02:55]
  • **RAG and context engine vs. agentic search**: While agentic search is useful, it struggles with large codebases. Trae's RAG-based context engine lets agents understand concepts and connections, navigating complex projects more robustly than tools that start cold each time. [04:48]
  • **Concurrency boosts productivity with AI agents**: When individual AI agent tasks take longer, increasing concurrency is key to boosting overall product output. This minimizes idle time and maximizes the amount of work done simultaneously. [08:37]
  • **Deep research primes AI agents effectively**: To improve agent performance, initiate tasks with deep-research prompts that cover codebase understanding, dependencies, and intent. This "warms up" the agent, letting it perform well on subsequent tasks without massive, detailed prompts. [09:02]
  • **Model-specific prompting for optimal results**: Different models have different preferences; for instance, GPT-5 benefits from concise prompts ("less is more"), while Grok Code benefits from narrow scopes. Understanding these model-specific nuances is crucial for getting the most out of each model. [11:20]

Topics Covered

  • Visual Feedback Loops: Accelerating Frontend AI Development
  • RAG: Why Context Engines Outperform Agentic Search in Large Codebases
  • Concurrency: The Key to Productive AI-Assisted Coding
  • Less is More: Prime Agents with Deep Research, Not Long Prompts
  • What's Missing? Persistent Context and Smart Inter-Agent Communication

Full Transcript

So I'm going to talk a bit about how I use agents in general, as well as how I use Trae, to increase my overall product output by increasing concurrency. A little about me: I'm part of the web infra team at ByteDance, and we work on a lot of build tooling. I don't know if any of you have heard of Rspack or that ecosystem; you've probably heard of Webpack. I was on the core team at Webpack, and we ported Webpack to Rust and created a whole ecosystem around it. I also created something called Module Federation, which is distributed code sharing at runtime, on the server and in the browser. And I'm a very heavy user of AI.

So, getting into it: this is going to be a hodgepodge of things I've found useful. One of the big things I've noticed overall is the planning aspect, so I'm going to start this talk with what I really enjoy about Trae itself, and one big aspect is its planning. I've generally found the context engine to be quite good at getting a general understanding of your codebase, and, especially in Solo mode, it has a really nice user interface: when it specs things out or creates docs, there's a good, quite visual viewer for them. Overall it's just a convenient interface to work with. Before this, I would usually generate similar docs and have to stick them into a gist to get the diagrams to render so I could check everything. So it's really nice to watch it build something like this out and get good UML diagrams and other visual cues. What I've found is that the visualizations aren't necessarily that useful to me, but when I pass them to other AI agents, they're a really good way for an agent to quickly understand that this connects to that. As the codebase gets larger and larger, it helps a lot for the agent to know where it should start exploring, just by reading the Mermaid graphs and using them as a guide through a large codebase.

Another thing I really like about Trae is the build-and-iterate flow. Once it gets through its planning and does its docs, it's ready to build and can kick off the build process. And something I especially like on the frontend is that it has really tight feedback loops. If I compare it to, say, some CLI tools, the feedback loop there is a bit trickier because it's mostly text-only. The thing I love most about Trae is the integrated browser. When I'm working on something in the frontend space, yes, I can write a bunch of end-to-end tests, and that sometimes works. How many of you are frontend devs, or work in frontend? Okay, a few. Not a lot. So in the frontend world you run into a lot of visual issues. A big one is: I'm using flexbox and something's floating off to the side. Or: why is this blue? It's ugly; the rest of the site looks different. Now, I can type out, "Hey, the buttons are blue, make them not blue," but then I have to paste in the link or copy the markup from the browser. It's a very cumbersome process for the model to figure out where the thing I'm talking about actually is. So a really cool feature I love is the built-in selector. I can select something, you'll see a green box around it with a little "add to chat" at the bottom, and I can just add it to the chat and say, "These shouldn't be blue," and now it knows what I'm talking about. On top of that, the agent can actually see the browser errors while it's running, so it can debug some runtime aspects as well. I've just found this type of interface much more efficient than a text-oriented one where you have to copy and paste or explain things in. It's really great for tweaks and modifications to your interface, so I really appreciate it.

A big thing I've noticed with these models, or with agentic coding in general, is that it's all about context, and context is very difficult to wrangle; each model, I've discovered, has different preferences for context and prompting. But Trae does a really good job with its context engine. A really big difference between, say, the Trae IDE and standalone CLI agents, and I use both quite heavily, is that the CLIs lack this idea of RAG: the ability to understand anything beyond the current invocation. Generally, when I'm using a CLI tool, it starts cold each time. It has to work out what this codebase is about, how it works, where things go, and so on. Agentic search got really big when CLI tools came out, and in the beginning I really fell in love with it, because it freed us from needing a RAG index, which is slow and annoying to build. But what I found is that as a codebase gets larger, you really start to need conceptual links. Agentic search is good for finding a couple of things, but when there are 45,000 files in the codebase, it's not going to find every single instance, and if the words don't exactly match, it's much harder for it to remember to keep tracing and following imports and so on. With RAG and the context engine, I can give it a concept and it gets it immediately. What I really like as well: when I send the first message in a new chat in a CLI tool, there's some back and forth, and you see it rummaging around; in Trae, on the first message it's "oh, okay, yes, this," and it just starts doing it. It always catches me off guard how it jumps right in from the first message, and I owe a lot of that to the context engine. So I really do enjoy the context engine in Trae.

Solo is probably my favorite feature they've introduced. The big thing I've found is that as agentic coding gets better and better, the actual editor becomes something you only look at occasionally. For most of the stuff I do, I look at the code in GitHub; that's where I read code unless I need to go run the debugger myself or do something specific. The first time I see the code is in the diff of the PR, and then I can adjust it, tweak it, send it back, and so on. So what's really important to me in this world is screen real estate, especially for the terminal. In the case where I want to use my Trae Solo chat but might also want a CLI chat, I absolutely love the terminal layout, because I get this big area of real estate yet can still hop over to the editor, copy a file path, and drag it back in. It's a lot more convenient, since most of my time is spent talking. The layout is probably one of my favorite things to have available.

Okay, so that's a bit about the things I like and how I use them. Now I want to get into the workflow I use day-to-day, and things I've noticed. Generally I work on around 11 codebases concurrently throughout the day, and I can manage about 50 agents in parallel before things start to get a little unwieldy. In doing all of that, some very interesting lessons have been learned along the way. The first thing, and I think somebody mentioned this, is the claim that agents aren't quick, or the research showing you're less productive when doing AI coding. I think this is a sliding scale. Since I work in open source quite a bit, I'll usually see open-source authors say that AI is slower than doing it themselves. Generally that's because they know the codebase very, very well and are usually quite specialized, so it most likely is quicker for them to do it by hand. I've felt that quite a bit too, especially as models moved to thinking and got a lot slower. The way I solve this problem is to increase the concurrency. If it takes longer to do one task on average, you also end up with a ton of idle time, because you're just sitting there staring at the screen. So how do you solve that? By having less idle time: you increase the amount of output you're capable of by having more concurrent work being done.
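The same idea in miniature: serial tasks cost the sum of their runtimes, while backgrounded tasks cost roughly the longest single runtime, which is why concurrency absorbs the idle time. A toy shell sketch (the "agents" here are just stand-in sleeps):

```shell
# Three "agent" tasks that each take ~1s. Run serially they'd take ~3s;
# backgrounded with &, wall time is roughly the longest single task.
agent_task() { sleep "$1"; echo "task $2 done"; }

agent_task 1 A &
agent_task 1 B &
agent_task 1 C &
wait   # returns after ~1s: all three ran at once
```

The supervision cost is the catch: the more you background, the more your own job becomes checking results, which is the "paging through TV channels" workflow described later.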

A big thing I also do whenever I start on something: I'm a big fan of deep research. I got the idea from OpenAI. Whether it's a CLI or anything else, when I start on a subject or something I'm going to work on, the first thing I'll do is say, "Hey, go deep-research this area." If I know we're generally going to be working on A, B, and C, I'll drag in the folders, or the specific files where I know the work will begin, and say, "Hey, deep-research this": essentially, prime the agent to understand what the code imports, who depends on it, how it's connected in the codebase, and the intent behind what the code is actually doing, and really warm the agents up so they understand where they're going, what they're doing, and how the codebase exists around them. Often I'll use sub-agent tasks when possible to speed up the research, because it can take 10 to 15 minutes just to crawl through everything. But a big thing is, I'll let it have maybe 20 minutes to dig around, look at things, and understand, and only then actually start working.

The big thing is, a lot of people, and I see this on LinkedIn a lot, post things like "here's my prompt to change the world": these massive prompts. You've seen those. And what's interesting, especially on Twitter, is that when I talk about things I've done or how I'm making AI do something, people ask, "What's the prompt you're using?" My usual reply is something like "Fix this," and I pass it a file. I'm probably the laziest prompter in the world. Most of my prompts are one sentence, maybe two, but I still get really, really good results from them, and I don't have a lot of churn. Granted, the projects I work on are quite large, complex, existing projects rather than new things. But I've found that if you prime it (research this, understand how it works), and then, if I want to warm it up a bit more, ask it questions ("How does this work? How does that connect over there?" where I already know the answers), the model has done the research and already understands the why. I don't have to write a massive stenciled-out prompt for it to go do something. I can just say, "Hey, the test is failing on this; go fix it," and it has enough context to know what to do.

It's also important to note that different models have different preferences. For example, I've found GPT-5 does better with less prompting. In Claude models you'd capitalize something to really emphasize it; that doesn't really work with GPT-5, which doesn't really care what you think and is going to go off and do its own thing. So I've found less is more with that specific model. Things like the Grok Code models, I've found, are much better with a defined, narrow scope. Knowing your model and how it works is very advantageous; they just have different preferences in how you prompt them, and figuring that out gets you a lot more mileage out of what they do and how well they do it.

One last thing: a model is not a replacement for knowledge. The way I use all of this is: if my knowledge caps out here, I might push the model a little higher, but generally I'm not going to have it do something I wouldn't be able to do myself. I've found my need to understand a language is slightly less now, but your need to understand architecture, critical thinking, details, depth, things like that, is still very, very important. Ultimately the model is going to be slightly worse than you are, so it's not a replacement for how well you understand things. It's really just giving you more hands to go do stuff, maybe slightly worse than you could, but in general it's more hands. It's not a silver bullet.
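To make the earlier "deep research first" step concrete, here is a hedged sketch of what such a priming prompt might look like. The package paths and the `my_agent_cli` command are placeholders, not real names from the talk:

```shell
# Build a "deep research" priming prompt before any edits are requested.
# Paths and the agent CLI below are hypothetical; substitute your own.
prompt=$(cat <<'EOF'
Deep-research packages/bundler and packages/runtime before we change anything:
- What do they import, and who depends on them?
- How are they connected across the codebase?
- What is the intent behind the code?
Summarize your findings. Do not edit files yet.
EOF
)

# then hand it to whatever agent you use, e.g.: my_agent_cli "$prompt"
```

After the research pass, follow-up prompts can stay as short as "the test is failing on this; go fix it," because the context is already loaded.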

A big thing, again, is that I try to parallelize all the tasks, and this is kind of tricky to do; the industry is still warming up to how to do it well. I've done it various ways. One scenario is just doing it manually: I'll have lots of instances of Trae running, doing things. I'll use git worktrees to split things up, or I'll just have lots of terminal tabs open with different CLIs running, doing different things. Sometimes they're on the same branch, sometimes on different ones. The idea is: what can I do that's not related to this area, or won't cause a huge conflict, and how can I map that out? A usual case I'll find is: let's go harden our tests here, because we could do better on testing, while you go work on this feature. So there are two things I can do to speed it up or split it up. Or, since I work in open source a lot, I'll have agents look at git issues and try to solve them in the background on a different worktree. So whether it's direct delivery I want to do, or delivery I can't do due to capacity constraints, as long as I can manage the concurrency, I have a lot more capacity to do stuff. The way I've described how I imagine the ideal future case is almost like a TV with tons of channels: my job is to page through the channels really quickly, see which programs I like, and keep going, because that's essentially what you can get to, depending on how autonomous you go. The interface to do that is still lagging behind a bit. But essentially: try to parallelize stuff.
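The worktree setup mentioned above is plain git. A minimal, self-contained sketch (using a throwaway repo; the branch and directory names are just examples, and in practice you run the `git worktree` commands inside your real project):

```shell
# Give each concurrent agent its own checkout of the same repo so their
# edits land in separate directories, each on its own branch.
cd "$(mktemp -d)"
git init -q demo && cd demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"

# one worktree per agent
git worktree add ../agent-tests   -b harden-tests    # agent 1 works here
git worktree add ../agent-feature -b new-feature     # agent 2 works here

git worktree list    # shows demo plus both agent checkouts
# when an agent's branch is merged or abandoned:
# git worktree remove ../agent-tests
```

Each agent sees a normal working tree, so none of their tools need to know the others exist; merging or rebasing the branches back together is the cleanup step discussed below.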

Another big issue I've found is collision avoidance, and this is still a really tough one that nobody has solved in a fully automatic way, but there are a few tricks that help you work around it. The tricks depend on the tool you're using, but most CLI tools have some kind of plug-in system. Usually what I'll do is implement something like a mutex, if anybody here is from a Rust background or other backend languages (I just know mutexes from Rust): essentially a lock file. If multiple agents are going to edit a file, one of them can reserve it with a little Node script, and the others just await until the lock is removed. So if four or five of them want to edit the file, they'll sit and wait until the lock is released, and then the edits follow one another. Something like that helps a huge amount. And whatever tool I use for this, I'll just have AI write the handler, so it works pretty well and I don't really have to do a whole lot. Another good one is worktrees.
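The speaker's lock is a little Node script; the same idea can be sketched in plain shell. `mkdir` is atomic, so only one agent can create the lock directory at a time, and the rest poll until it disappears:

```shell
# Cooperative lock around a file edit: one agent holds the lock, the rest
# wait their turn, so concurrent edits to the same file serialize cleanly.
with_lock() {
  lockdir="$1.lock"
  until mkdir "$lockdir" 2>/dev/null; do
    sleep 1                     # another agent holds the lock; keep waiting
  done
  shift
  "$@"                          # run the guarded command (the edit)
  rmdir "$lockdir"              # release so the next agent can proceed
}

# e.g.: with_lock src/app.css prettier --write src/app.css
```

Wiring this into an agent is tool-specific (a pre-edit hook or plug-in), which is the part the speaker has AI generate for whichever tool he's using.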

I see worktrees as probably the best scenario; it's just kind of annoying to manage by hand. But with worktrees you can branch things off, and the question becomes: is this a separate pull request, or concurrent work you want to do? Then what I'll do is have something like GPT-5, the Codex models, rebase the worktrees back onto the main branch. So I can work on several conflicting things for a feature and then, at the end, have somebody rebase all the worktrees back together. The model can understand the intent behind each piece, collapse it all back into one branch, and I've essentially avoided any potential collisions. Another big one is to keep the scope narrow. This is good for context, for token cost, and for making sure the model doesn't go too crazy trying to do stuff: how can you define, okay, you're going to work on this package, don't drift off into the other ones? Again, this is also kind of difficult to describe within the current constructs of prompting.

A big thing I always do is combine tools. One of my favorite combos is Trae plus an agent MCP; most CLI tools have an MCP for them. I want Trae's context engine to drive, but I might want a workhorse to go off and do all the work. Trae is really good at understanding the context, and I can farm work out to a different model, with Trae essentially managing it. If the model comes back and says, "Hey, these are the next steps," the Trae agent can just say, "Okay, yeah, do the next steps," or review it and make some suggestions. It's essentially those two talking to each other; I'm not really dealing with the dynamics between the models myself, but there's a management layer. The other combo, a CLI with MCP agents, can also give you a lot of concurrency, but it's tricky to tame: you do want to watch what they're doing and how many nested levels you allow them to create. A big thing here is task control, and the maturity of this area is not great yet.

Another really big thing I do, especially in frontend work: our infra team created something called Midscene, which is computer vision. At ByteDance we've trained a bunch of models on vision and they're really, really good, but Midscene also works with models from other providers. A big thing I like to do is give these agents machine vision, because I find that's one of the last missing pieces in a model: it should be able to see what it's doing and look at it similar to how I'm looking at it. Midscene is also useful because, when I put it in Trae, Trae can then control my macOS, not just my browser the way browser-use tools do. I've seen whole operating systems controlled, Android phones controlled, browsers obviously, and the craziest one I've seen is robotic arms inside certain manufacturers, where they'll actually have arms interacting with touchscreens to test physical hardware. So it's really neat what you can squeeze out of the SDKs or MCPs for these tools.

A big thing in my endeavors here: most of this came about because, I don't know if you remember, Anthropic lowered the Max plan limits. That kind of happened; it was some news recently. Well, I know a few people inside Anthropic, and, my bad for that, it was me and I think three other people who were the top users, doing tens of thousands a month on it. What I discovered along the way is that token conservation is a really big problem, especially after doing something like $80,000 a month in subsidized usage.

My bad. So it is a problem. I would say, though: definitely rely on subsidized tokens where you can, because you're getting a great deal. Some ways to conserve tokens, or make things better, especially when you're doing things like sub-agents: I often see that they're all reading the same things. If the task is "fix the lint errors," what do all the subtasks do? They all run the lint command. So everybody's reading this massive output, or clogging up your process because they're all trying to run tests in parallel, unaware of each other. Generally you want a good pattern, and this isn't something a lot of tools have yet, that's almost like middle management: somebody runs it once, figures out what's needed, then breaks it up and says, okay, you work on these things, just this small area, so that not everybody charges off at once and repeats the same discovery work. Context sharing and things like that will mature, I think, but the ways I've done it so far are memory files or context files; those do help a decent amount. Another good tool is Repomix; it creates an XML bundle of your entire codebase. Often I'll drag and drop that into ChatGPT if I want GPT Pro to review something in the codebase: I can just say, here's an XML with the entire codebase in it, or drop it into a prompt. The model doesn't have to go off and search through all the code and spend ten minutes; if I know it has a large enough context window, I can just say, here's literally everything, or a section of everything. Now look at it all and make your plan. And it charges off faster in the right direction.
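Repomix itself is an npm CLI (typically run as `npx repomix`; check its documentation for exact flags and output format). The core trick it performs, flattening a codebase into one file an LLM can ingest, is simple enough to sketch directly; this toy version only illustrates the shape of the bundle, not Repomix's real format:

```shell
# Bare-bones "bundle the repo into one file" sketch: wrap each file in a
# tagged block so a model can tell where one file ends and the next begins.
bundle_repo() {
  out="$1"; shift
  : > "$out"                                  # create/truncate the bundle
  for f in "$@"; do
    printf '<file path="%s">\n' "$f" >> "$out"
    cat "$f" >> "$out"
    printf '</file>\n' >> "$out"
  done
}

# usage: bundle_repo bundle.xml src/*.ts   # then paste bundle.xml into chat
```

In practice, just run the real tool; the point is that a single pre-built bundle replaces ten minutes of the model searching the tree on every fresh chat.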

So what's missing, in general, from what I've seen, or what I'd like to see in the future from some of these tools? "Context container" is an idea I had; I actually got the term from the Trae team, because they've been talking about this idea of a container. The idea I had for it was: if I work in a project every day, I'm in the IDE, and every time I start a new chat, I start from near zero. Some tools are a little better about understanding what I've been doing, but in general it would be nice if somebody were watching what I do: what did I do this week as a whole? Understanding that in the background, so the tool learns more of the user's intent and how they run things, without me having to write it all out in prompts. Something that just observes the habits of how things work and how they work together. Longer-term persistent memory would also be really nice to see, and inter-agent communication as well; I think that would be a really big one.

I've experimented with paging other models inside sub-agents. A big challenge with this is that if you build something where two models chat with each other, they will get stuck in an infinite loop of waste. The nearest scenario I've found that's quite useful is almost like notifications: "Hey, you have a new notification; check your notifications." There can be a message from a different model, but they can't spin each other off into a loop. They can only read their notifications, or broadcast one that another agent may or may not decide to act on, now or sometime in the future. But I would love to see better inter-agent communication. I think conflict management, especially for concurrency, is going to be a really big one as well, and some solution for worktree management baked in would probably solve a lot of these types of challenges. I'm hoping to see that at some point.

So those are things I've discovered along the way, tactics I use, workflows I have. Let me jump out so you can see: this is what I've been doing today, and it's more or less what every day looks like. These are the active projects I'm working on at the moment. Something in Rust. This is an MCP system I've built for CLI tools where, when you use Rspack and run a build, if it throws an error, the AI automatically starts fixing it; it adds chat widgets in the frontend through the build plugins, and there are MCPs for every tool we've ever built. So I'm working on improving that passively in the background. Over here I'm busy trying to solve some CI tests, again with a tandem of these tools used together. Let's see what else is going on over here: I'm also working on an examples repo where I've been porting everything from Cypress to Playwright, and so on. The idea is, I'm doing this a lot. Some people say this is sad, some people don't; I can't really say either way. Personally, I love doing this, because yes, I could write this code myself, but I could also do ten other things in parallel if I don't have to. I just need to make sure that when it's done, it's how I would have done it, and again, that comes in later. So a lot of this is how quickly I can navigate, digest, and manage all these windows, and potentially all these nested terminals, if I'm really getting into a lot of concurrency.

The last thing I wanted to get into: I've actually been using Trae to build out an early concept where I discovered a lot of these ideas. Trae is really great at the frontend, so I've been using it to build out this concept of how I could manage these agent tools better. This is the UI it helped me come up with: building out worktrees and being able to switch to them, new tasks, recent sessions of chats I've been having, being able to edit the agents. If I pop into the browser here, there's also access to my GitHub space: fix with AI, research every git issue that comes in, or any pull requests we have. Here are all the PRs and their statuses; oh, if that one's failing, I want to press a button, see it create a worktree, and have it go off in the background and fix it without me having to think about it. Just message me when you have a solution. I would love more of that, where it's pushing notifications to me, instead of me having to pull and send things out to the agents to get them to work.

Anyway, another thing you should definitely check out is the Trae CLI. It's still being built out, but it has performed pretty well. And again, I really like these kinds of scenarios, because I've also had cases where it's a while loop until there's a breakout point: you can take it, choose your model, send it into a while loop, and it will break out once it solves whatever it needs to solve. And I can go to sleep, wake up in the morning, and hopefully it's not still iterating. But yeah, it's a combo of various CLI tools, whatever you like to mix together, and of course Trae, mostly because the UI is just super awesome for vibe coding and the context engine is really, really nice. So I love using them together. Anyway, thank you.

>> Thank you so much for the amazing sharing. It's really nice. Does anyone have a question? Okay, let's do it.

>> How do you tell an agent, like in Trae, to follow coding styles, such as "don't put the CSS in your HTML" or "separate the controller logic from the service-layer logic"? I don't want to tell it every time I ask a question. Is there a way to define it once for all?

>> Yeah. Okay, so there are two scenarios. I don't know if you guys know somebody called Theo. He's a big YouTuber and streamer.

>> Theo. Yeah. Uh, wait, no, T3. He usually goes by T3. So I was talking to him about these types of scenarios. I've actually lost track of my thought. Can you ask the question again so I can remember it?

>> Okay.

>> So, there are two scenarios I was chatting with him about. One is: it'd be really nice if somebody was watching. This doesn't exist, but in my head I think it would be really great. So, the ideal state, and then how I deal with it today. Ideal state: imagine you have somebody who passively watches. When you send these corrections in, there's a model that watches what you do all day, every day, and summarizes it, almost like ChatGPT Pulse, which just came out, where at night it thinks about everything you spoke about during the day and writes new things to show you. That would be great, because I don't want to have to prompt it, like you said, every time. And I also really hate having to create prompt files and do it every time; especially if you use a lot of tools, it's a nightmare. It would be nice if, as I say these things, it noticed: what am I having to correct a lot? Remember that automatically and guide these agents with it.

Now, how I deal with it today is usually a lot of deep research. Deep research works so well, at least in all the cases I've tried it in. And again, I'm working on bundlers, compilers, module-loading systems, so they're generally fairly in-depth, which is a nice test for this type of stuff. I'll often start with: okay, deep research, how are things working now? If you have an existing codebase, which is what I mostly work in, it works quite well. Are the standards already set up? Awesome. Now, you can add an AGENTS.md file or something to give it some guidance, but generally it'll be: hey, research. How do the patterns work? Where do things already go? Who imports what? How does the codebase currently work? You should be minimally invasive when adding new features. You should try to follow all the existing patterns and guides, for things like: don't define a type at the top of a file if there's a types file somewhere else. But a lot of this relies on whether the model saw that that was there. In a big way it's: how does this thing work? You could almost say: ask ten questions to prime it. So you go, hey, here are the ten questions you need to answer before doing work. Now it understands through code, and I always think code is probably the best prompt you can give it. So go and look at how things work, follow those guides, and keep following them. I usually don't have to nag it, because it already sees, and its job is to match and align with the existing styles within the codebase. That has worked quite well for me so far. Not saying it's bulletproof.

>> Okay, anyone have another question?

>> Let's do it one by one.

>> Thanks, Zack. That was a great talk. I'm interested in the concept you introduced as the warm-up. Can you elaborate a little bit more on that?

>> Say that again?

>> The warm-up. Warming up the agent.

>> Yeah. So, usually when I start a new chat with something, it starts from zero. The biggest case I'll see is I say, "Hey, run the tests." If I do that, probably half the time, depending... Trae is generally better about this, but I'm trying to speak more broadly about all the tools I've encountered, since I imagine everybody uses a mix of tools. A common pitfall I see is, oh, it'll try to use npm, or something silly like that.

So by warming it up, the idea is: familiarize yourself with what we use. What's our linter? Where are the commands run? What are the commands? So it doesn't run six commands before realizing it's an Nx repo and then running the Nx commands. Go and understand how the codebase works. Read through what it is we're working on. If I have the plan to add this feature, what, in my head, do I know are the areas we're going to be touching? Okay, hey, here are the files, the entry points. Go look at this, this, and this, maybe this folder. Now research how these aspects work together. What is it doing? Where does it go? Who depends on it? Where are the tests for it? What types does it use? What are the APIs available? Just trying to let it look around. So when you give it a feature, it doesn't just go off the one sentence you gave it and try to figure out the codebase from that sentence; instead you say: explore, look around, tell me how it works. Ask it questions you already know the answers to, but that are going to make it dig around before you give it a task.

This also helps, and it depends on the model, but some models I've found are too eager to start working. I see this with Grok. I love how fast Grok is, but I'm like, whoa, buddy. Okay, let's look around first. Let's not just start. But what I've also found is mixing the models. Often I'll use GPT-5 with Grok as a sub-agent, so that my writes are really cheap and fast, but I have a good brain driving and reviewing the work. Again, it comes down to: how can you warm this up? How can you make sure the model has a better bearing, so that it's not just blindly hunting around, you know, grepping for tests and looking at a million unrelated things? Point it in there and then have it go and explore on its own, so that you're not writing a massive prompt file and the model is not aimlessly searching, either too broad or too narrow.
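A minimal sketch of that warm-up as a reusable prompt template; every path and question below is a placeholder for whatever your own repo actually needs:

```python
def warmup_prompt(entry_points, questions):
    """Build a warm-up message that sends the agent exploring before it codes.

    The entry points and questions are supplied by you: things you already
    know the answers to, chosen to make the agent dig through the codebase.
    """
    lines = ["Before writing any code, familiarize yourself with this repo.",
             "Start from these entry points:"]
    lines += [f"- {path}" for path in entry_points]
    lines.append("Answer each question, citing the files you actually read:")
    lines += [f"{i}. {q}" for i, q in enumerate(questions, 1)]
    lines.append("Only after answering, propose a plan. Be minimally invasive "
                 "and match the existing patterns.")
    return "\n".join(lines)

# Hypothetical repo details, purely for illustration
prompt = warmup_prompt(
    ["src/index.ts", "packages/loader/"],
    ["What package manager and linter do we use?",
     "How do I run the tests, and from which directory?",
     "Who imports the module loader, and where do its types live?"],
)
```

The point is not this exact wording but the shape: point the agent at known entry points, force a read-before-write pass, and only then hand over the actual task.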

>> One more question, just based on what you said: what exactly is the sub-agent? You said you use GPT-5, then use Grok as a sub-agent?

>> So, the sub-agent. In most of these tools, and this is because I've taken apart many of them, a lot of the tools, even inside Trae, and in most CLI tools: when you're doing the work, it's not the chat that you're talking with that's carrying the work out. The chat is probably talking to something like an edit agent that's actually performing the code edits. Now, some tools allow you to modify or change which model each piece uses, or to define new sub-agents. So often I'll do that, where I'll mix and match. I'll say, "Hey, use the code-editor sub-agent, which has access to all the same tools as the parent one," but the parent one might be a bigger, slower thinking model that can tell a dumb-but-fast model to go and do something, and then it can check it. So it's also about mixing costs. The biggest cost in AI is egress: output tokens. What I often try to think about in custom workflows is: okay, how do I combine good thinking with fast, cheap output? Because if the model is good enough to review what's going on, I can use a really dumb model to just write it down, because writing is the least valuable part of the workflow and it's also the most expensive. So find a model that's just good at editing files, not good at thinking, tell it "do this and do that," and now you have somebody correcting that one. It drops the cost and also makes it faster. But it depends on the tools you have or are using.
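The cost argument can be made concrete with back-of-the-envelope arithmetic. All prices and token counts below are made up for illustration (they are not any provider's real pricing); the point is only that shifting the bulk of the output tokens to a cheap writer dominates the total:

```python
def blended_cost(plan_tokens_out, edit_tokens_out, smart_price, cheap_price):
    """USD cost when a smart 'driver' model only plans and reviews (few output
    tokens) while a cheap sub-agent emits the bulk of the edits.
    Prices are per million output tokens, purely illustrative.
    """
    return (plan_tokens_out / 1e6) * smart_price \
         + (edit_tokens_out / 1e6) * cheap_price

# Made-up numbers: 5k tokens of plan/review from the smart model,
# 95k tokens of actual diffs from whichever model does the writing.
split = blended_cost(5_000, 95_000, smart_price=10.0, cheap_price=0.5)
solo = blended_cost(5_000, 95_000, smart_price=10.0, cheap_price=10.0)
```

With these assumed prices, the split setup costs roughly a tenth of having the expensive model write everything itself, which is the whole motivation for the driver-plus-sub-agent arrangement.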

>> Yeah. So you talked about splitting agents between tasks, and you showed us there that you had, like, eight different... I don't know if they're different projects, or if they're all...

>> They're all... So usually I have two that are the same project, where I just duplicated the repo, poor man's worktrees, but the rest of them are separate repos.

>> Okay. So say you were working on a web app or something. Would you open multiple instances of, say, Trae, to work on different parts of the web app at the same time?

>> It depends on how deviated the task is. And I think this is also a challenge that we need to solve in the interface aspect in general, even in CLIs, everywhere. A big challenge is that the AI labs are usually producing the tools, and the AI labs aren't building with the tools every day. So it does take some time for them to figure out how we're using them, I guess. But I think there are a couple of concepts here. One way that I've found, and maybe this is because of ADHD, is that often I'll be working on something and then I'll branch into, "Oh wait, but I can go and do this." That's how a lot of the dev workflows I see go: halfway through, you realize you could do something better. Now, do you derail what you're currently doing, or do you hope you remember it for later?

So, when you're working on a certain task: is this something that's going to bubble up onto the same branch? It's technically part of the same deliverable, but you could tackle three parts of it in parallel. That's kind of akin to forking or branching off of the conversation. And in doing that, a lot of it comes down to: what is the task? I'm trying to think of some examples I've had. Usually it's things where I know they're not necessarily going to conflict, so I don't have to get into conflict management. Like when I was building the UI out, it was: okay, here's my chat interface, and now I'm working on adding this worktree-management stuff, but I also want my tool calls to render when they're streamed back in the chat. I'm not going to create two different PRs for that; I just want to quickly go and fix it. I can spin something up, and I know that this one is in that folder and this one isn't, so the chances of them touching each other are slim. And if there is some overlap, usually the models can see that, oh, it was recently edited after I did it, and they know not to mess with it or override it again.

But yeah, managing this really depends on what you're trying to do, and some of it is just thinking upfront: where are the areas I know I could split off, more or less? Or, as you get into it, what are other areas you can scope it into? I find monorepos are a good example of where this is useful, because you also have separate builds you could run, and type checking you could run, that aren't necessarily going to mess with a single app.
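One way to sketch that "will these tasks touch each other" judgment is a simple path-overlap check before launching two agents in parallel. The heuristic and the example paths are my own illustration, not a feature of any tool:

```python
from pathlib import PurePosixPath

def _overlaps(a, b):
    """True if one path equals or contains the other."""
    pa, pb = PurePosixPath(a).parts, PurePosixPath(b).parts
    n = min(len(pa), len(pb))
    return pa[:n] == pb[:n]

def tasks_can_run_in_parallel(paths_a, paths_b):
    """Rough pre-flight check before spinning up two concurrent agents:
    if the areas each task is expected to touch never contain one another,
    conflicts between the two branches of work are unlikely.
    """
    return not any(_overlaps(a, b) for a in paths_a for b in paths_b)
```

For example, a task scoped to `src/chat/` and one scoped to `src/worktrees/` pass the check, while a task editing `src/chat/render.tsx` conflicts with anything scoped to `src/chat/`. It's only a heuristic: shared configs, lockfiles, and generated files can still collide.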

>> Okay. All right. One last question. Okay, maybe we can do it.

>> Yeah, just following on Jack's question. Does Solo have the ability to really see the front end, what's actually on the web, or not? Because that's one of the big problems with Claude: you have to tell it, you have to input it, and it'll look at a file if you upload it. But is it able to do that inside of the instance or not?

>> You want machine vision?

>> I wanted to see what they see, exactly.

>> Okay. I think in the current version of Solo it's not there yet, but I have spoken to the team and said, hey, look, you've got everything already there; just take a screenshot and attach it as well. So, not currently, but I would say Trae is definitely the one where this would not be as difficult as in other tools, because of the integrated browser. It already gets your HTML. It has everything except the screenshot. It understands the terminal, the console errors. It can see the markup. It has a selector, so I can send it the XPath to the node I'm talking about. But I'm hoping by the next release I can convince them to add vision support. What I'd really love to see in there is computer use.

>> Sure.

>> Now, if you wanted to sideload this, Midscene would be really useful. It's also from ByteDance, and it offers, you can do it via MCP, and it will add computer use, browser use. It uses machine vision only to interact with and navigate everything. That's often what I'll do inside of Trae as well: say, hey, here's an MCP; use that to click, navigate, and actually use the app. But it's all vision-based. It's not DOM- or style-based.

>> And just a second question: is it smart enough to make sure the CSS is organized, so that the main frame has all the positioning and the subframes are flex, you know? Or not yet?

>> This really depends on the model interpreting the visuals.

>> Okay.

>> So this is not even a Trae thing or a tool thing. I've found a lot of models in the US are not as good at machine vision, and I'm not 100% sure why, but it really does depend on who the model is. Some of them are trained better to understand layout, which I think is a big thing. There's Qwen, the Qwen VLM, well, that's not a US model, but the Qwen VLM is an open-source model and one of the best that we've used. If I ranked them, ByteDance's Doubao, I can't pronounce Doubao, the Seed model, the Seed VLM model, is the best that we've seen in benchmarks. Then I would say Qwen's VLM, and below that, Gemini 2.5 Pro. So if you're looking for a model to use for machine vision, or for something like Midscene, one of those three models is going to give you really good results, really reliable clicking and interacting with navs. Yeah, so...

>> Thank you so much. Due to the time limitation, if you have another question, reach out to Zack directly after the keynote. So let's welcome the next Trae power user. Okay.
