How Claude Opus 4.5 DESTROYED Gemini 3 on Launch Day
By IndyDevDan
Summary
## Key takeaways

- **Opus 4.5 Crushes Gemini 3**: Opus 4.5 destroyed Gemini 3 on launch day thanks to its superior agentic coding for engineers. Gemini 3 topped the leaderboard for barely a week before Opus 4.5 made the choice a no-brainer for software engineering work. [00:40], [06:19]
- **Slashed Pricing**: Anthropic cut Opus 4.5 pricing to a third of the previous Opus 4.1's $15/$75 per million tokens. That's state-of-the-art performance at ~60 tokens per second, with premium pricing for premium compute. [05:10], [05:57]
- **Enhanced Agent Delegation**: Opus 4.5 excels at managing teams of sub-agents, enabling well-coordinated multi-agent systems, as stated in Anthropic's blog. The primary agent prompts the sub-agents, and the sub-agents respond back to the primary agent, not directly to you. [01:21], [03:02]
- **Five Sub-Agents, Parallel Testing**: One Opus 4.5 agent kicked off five Opus 4.5 sub-agents, each operating its own browser for tasks like summarizing the release, screenshotting images, and checking the system card. This deploys more compute at scale against a specific problem. [01:40], [01:47]
- **One-Shots Full-Stack Apps**: Opus 4.5 one-shotted full-stack applications, like live voice notes with ElevenLabs Scribe 2.5 transcriptions, in agent sandboxes via a plan-build-host-test workflow. Agents built, hosted, and browser-tested them autonomously. [11:57], [16:13]
- **Master Agents, Not Just Prompts**: The prompt is still the primitive, but the agent is now the compositional unit. Master the agent to master knowledge work: operate a single agent, then a better agent, then more agents via sub-agents and orchestration. [24:50], [25:16]
Topics Covered
- Opus 4.5 masters sub-agent delegation
- Premium pricing delivers engineering value
- Agent sandboxes scale full-stack prototypes
- Master agents to master engineering
Full Transcript
Engineers, the king is back. This model is like that top-tier engineer: when they walk into the meeting, everyone shuts up. Check this out.

Man, have I missed running this command: claude code opus.
I'm always looking for new capabilities that were impossible before the model's release. Let me show you two unique advantages Opus 4.5 can give you and your engineering. One is obvious, and it's the reason why Opus 4.5 destroyed Gemini 3 on launch. The other is less obvious, and it's the reason why Opus 4.5 is the best model for agentic coding for engineers. Let's take this Claude Code Opus 4.5 instance and have it delegate a non-trivial task to sub-agents with the generic browser test slash command. We're going to paste in a URL, a plan file, parallel true, headed true.
Anthropic has their eyes on a key pillar of great engineering. Anthropic briefly mentions this inside of their blog. If we search for sub-agents, they explicitly mention that Opus 4.5 is "also very effective at managing a team of subagents," enabling the construction of well-coordinated multi-agent systems. This is one of the key pillars of what makes Opus 4.5 so great that most engineers miss. Our Opus 4.5 agent is kicking off five Opus 4.5 sub-agents to accomplish work. Now, what are these agents actually doing? They're all operating their own browser. They're all running tasks on the Opus 4.5 release. This one is in the system card, and it's going to download and do some work there. You can see this one is doing a models overview check. We've deployed more compute against a specific problem at scale. Every time a new powerful model like this is released, specifically one of the Claude models, you don't just get the benefits one time. You get it N times, if and only if you're delegating to multiple agents. Opus out of the box is better at writing prompts for your agents and your sub-agents. This is the first capability Opus 4.5 unlocks for your engineering work: enhanced agent delegation.
This release concretely showcases that you can automate your UI testing. You can automate entire swaths of work when you spin up multiple agents. And to be clear here, every one of these agents is operating Claude Opus 4.5. So, you've seen this diagram before if you've been following the channel. This is what it looks like to automate with sub-agents. It can be easy to misinterpret what's happening in this workflow. You might think that you are prompting your sub-agents. You're not. Your sub-agents are not responding to you. What's happening here is: you prompt your primary agent. Your primary agent prompts your sub-agents. And then your sub-agents don't respond to you. They respond back to your primary agent. And your primary agent responds back to you. We covered this a while back in our first sub-agents video, but this is critical to understand, because with every great model release (you can see we're summarizing the workflow), with every one of these releases, Anthropic is dialed into this signal. To be super clear here, what am I saying? I'm saying they're training Opus 4.5 to be a better prompt engineer, just like you or I prompting our agents, prompting one to N agents. First, you want better agents, then you want more agents. They're training Opus to prompt sub-agents. And to be super clear here, what does that mean? They're calling the task tool and they're writing a prompt to the task tool. That means if you can prompt a sub-agent, you can prompt any agent.
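That fan-out/fan-in flow can be sketched in a few lines. This is an illustrative Python sketch, not Anthropic's implementation: `run_sub_agent` is a hypothetical stand-in for Claude Code's built-in Task tool, and the prompts are placeholders.

```python
import asyncio

# Sketch of the delegation flow: the user prompts the primary agent, the
# primary agent writes prompts for its sub-agents (the task-tool call),
# and results flow back up to the primary agent, never directly to the user.

async def run_sub_agent(prompt: str) -> str:
    # Placeholder for a real sub-agent invocation.
    await asyncio.sleep(0)  # simulate async work
    return f"result for: {prompt}"

async def primary_agent(user_prompt: str) -> str:
    # The primary agent decomposes the user's request into sub-agent prompts.
    sub_prompts = [f"{user_prompt} -- subtask {i}" for i in range(1, 6)]
    # Five sub-agents run in parallel; each responds to the primary agent.
    results = await asyncio.gather(*(run_sub_agent(p) for p in sub_prompts))
    # The primary agent summarizes and responds to the user.
    return f"{len(results)} sub-agents completed"

print(asyncio.run(primary_agent("browser-test the Opus 4.5 release page")))
```

The key property is in the structure itself: only `primary_agent` ever sees the sub-agent results, which mirrors the diagram described above.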
Okay, this is what makes Opus 4.5 in the Claude series distinct. They're specializing their models to be engineering models, where you delegate and hand off work to other units of compute. This is super, super critical for scaling your compute to scale your impact. All right, so what happened there? Here's our five-agent summary. You can roughly look at tool uses and token usage as value generated from your agents. Of course, they have to be doing useful work. We have a summarized document. And then you can see here everything that we asked them to do. Superior agentic capabilities for autonomous tasks. We love to hear that. Multi-agent orchestration for managing teams of agents, for managing sub-agent teams. If your agent can prompt sub-agents, it can prompt any agent. Agents will be calling agents in the future. This is a big theme we focus on on the channel. Make sure you're subscribed so you don't miss what we do with this incredible pattern. You know, we asked for a nice model recommendation. This is great. We can dial into these results as much as we want. Before we do that, let's talk about one of the most important pieces of this release: the pricing.
So, the pricing looks like this. Remember that previous Opus 4.1 abysmal pricing: $15 per million input tokens and $75 per million output tokens. They've slashed that all to a third. This is now a state-of-the-art model with state-of-the-art pricing. Okay, premium pricing for premium compute. I like this positioning from Anthropic. A lot of engineers want everything to be free. That is simply not reality. Valuable things are by nature not free. If something is free and it is valuable, someone put a lot of work into making it that way for you, or you are the product.
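For concreteness: a third of Opus 4.1's $15/$75 puts Opus 4.5 at $5 input / $25 output per million tokens. A quick sketch of what that cut means for a single agentic coding turn (the token counts below are made-up examples, not measurements from the video):

```python
# Cost sketch: Opus 4.1 charged $15/M input and $75/M output tokens;
# cutting both to a third gives $5/M input and $25/M output.
OPUS_41 = {"input": 15.00, "output": 75.00}   # $ per million tokens
OPUS_45 = {k: v / 3 for k, v in OPUS_41.items()}

def request_cost(prices: dict, input_tokens: int, output_tokens: int) -> float:
    # Price per token is price-per-million divided by 1,000,000.
    return (prices["input"] * input_tokens
            + prices["output"] * output_tokens) / 1_000_000

# Example: a hypothetical 50k-token-in / 5k-token-out agentic coding turn.
old = request_cost(OPUS_41, 50_000, 5_000)  # 0.75 + 0.375  = $1.125
new = request_cost(OPUS_45, 50_000, 5_000)  # 0.25 + 0.125  = $0.375
print(f"Opus 4.1: ${old:.3f}  Opus 4.5: ${new:.3f}")
```

Same turn, a third of the bill, which matters a lot once you're fanning out five sub-agents per prompt.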
Okay, that free-versus-product point is common knowledge. Keep it close to heart as you see these model prices drop and drop and drop. You see this on the free plans: all that data is being collected by the generative AI company, right? OpenRouter reports Opus at about 60 tokens per second. Somehow it's both very cheap, very fast, and state-of-the-art for engineering work. That's absolutely incredible. I have no idea how they broke that barrier. I assume there are some model optimizations happening here. Maybe they have some chip optimizations happening with their collaborations with Google and Amazon (AWS). Who knows? It's really wild that within just one week, Gemini 3 really was at the top of the leaderboard. I was using the Gemini CLI a lot more, and then Opus 4.5 dropped, and now it's a no-brainer. You can't compete with this model. Specifically, I want to make this super clear for software engineering work. Anthropic is putting their foot down and saying the most important thing for these models is to be great at engineering. And for you and I, the engineer with our boots on the ground every single day looking for the value, looking for the signal out of these models: this is the best model. It is the perfect model for scaling real engineering work. A nice side effect is that it is great for product development work as well, and many other areas. Hint: custom agents and the Claude Agent SDK. Let's go ahead and dial in and see what our agents did for us. If this model is so great at delegation, let's see what the results actually look like.
Open results in Cursor. Here are our browser UI testing results. Here's that original prompt. Remember, if I open the terminal and hit up a few times: we had a URL, and then we passed in a prompt, a markdown file, which itself was just a prompt. Let's go and copy this, open this up, and you can see exactly what this looks like. And so we have a task-based list: summarize the release, screenshot every image, get the price, get the system card, and summarize the orchestration details. I'm always looking for orchestration keywords, you know, sub-agent keywords, long-running, duration. So I had my agent look for this stuff, look for signals inside the system card of the Opus 4.5 release.
And then I wanted to understand how to use every one of these models, because remember, all of these models can be useful at specific points in time for different problems. Right? If Haiku can solve the problem, why would you use Sonnet or Opus? It's much faster. It's much cheaper for you. Of course, if you're using the Max or Pro plan and you don't care about waiting a little bit, just go ahead and use whatever model you want. But if you're deploying these agents as real custom agents into your applications, into your products, like you should be, it's not just about agentic engineering. It's about deploying agents against real problems that you and your users have. Right? So the way I like to think about this is the model stack: fast and cheap, then your workhorse, and then your powerful model. Now it looks like Opus is going to be both the workhorse and the powerful model, right?
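The model stack idea boils down to a routing rule: send each task to the cheapest tier that can handle it. A minimal sketch, where the model IDs follow Anthropic's current naming but the difficulty scoring and thresholds are purely illustrative assumptions:

```python
# Hypothetical model-stack router: fast/cheap -> workhorse -> powerful.
MODEL_STACK = [
    ("claude-haiku-4-5", "fast/cheap"),
    ("claude-sonnet-4-5", "workhorse"),
    ("claude-opus-4-5", "powerful"),
]

def pick_model(difficulty: float) -> str:
    """difficulty in [0, 1]: 0 = trivial lookup, 1 = hardest agentic job."""
    if difficulty < 0.3:
        return MODEL_STACK[0][0]   # Haiku for speed and cost
    if difficulty < 0.7:
        return MODEL_STACK[1][0]   # Sonnet as the workhorse
    return MODEL_STACK[2][0]       # Opus for long-running, hard work

print(pick_model(0.9))  # -> claude-opus-4-5
```

In practice the "difficulty" signal would come from your own heuristics or a cheap classifier model, not a hand-tuned float.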
But let's go ahead and see what Opus has said with these results. So, summarize the release: that was summarized right here. We had screenshots of every single image on the page, right? Browser test plans, and you can see all the images. If we just click through these, we can see here was that Aider polyglot performance. Shout out to Aider, the original AI coding tool. I don't know if you know this, but Claude Code is actually inspired by Aider benchmarks. A couple additional benchmark items here. And then we just have a bunch of SVGs that they have on the site. So, you know, just one of our agents went through the site, downloaded all the images, looked at all the images, and then named the files.
Very powerful workflow. What are the advantages we're getting out of this new Opus 4.5 model? We can delegate a lot of detailed work to our sub-agents. In combination with that, every single agent, sub-agent or not, can do longer-running, harder tasks. Those are the big two takeaway items from this release. And we're going to push that capability even further in just a second here. Right? So we got the pricing information here. We have the system card, and we want to summarize orchestration details. So we have the PDF here. Obviously this is not useful to read here, but it used PDF-to-text, extracted the text, and then from there it searched for specific keywords and reported on these details. Then we had another sub-agent look up all the prices for the current-generation Claude models. So, all the 4.5 models: summarize the pricing, and then tell us how to use each. Of course, we know that Haiku is for speed and cost. Sonnet? Now, it's actually not so clear. This is the model's recommended default, but I don't think that's true anymore. I think by default, we want to use that powerful workhorse. It's got great speed. It's got everything you really want. I'm using Claude Opus 4.5 24/7.
Now, this all happened with a browser automation task where we just dialed in what we want to have happen. You can redeploy prompts like this over and over. If you have specific browser automation workflows you want to run, this is running inside of our agent sandbox skill codebase. All right, I've made some enhancements to this. I want to show you the end results of putting together the right tooling with your agents: the right skills, the right tools. We have an entire browser suite tool that agents are using to operate. We're not blowing away our context window, right? We have better agents operating our agentic coding jobs. Browser automation is powerful, but the real value here is agentic browser testing at scale. You can have Opus 4.5 accomplish nasty, nasty engineering work on your behalf, at scale, over and over and over. Opus 4.5 running in Claude Code offers you top-tier multi-agent on-the-fly testing. What does that mean and why is it important?
Testing is important in general because it increases your review velocity, one of the two primary constraints of agentic coding. There's planning and there's reviewing. If you're engineering properly with agents, you're probably stuck spending most of your time in one of these two locations: planning and reviewing. Now, in last week's video, we gave Gemini 3, Claude Code, and Codex CLI their own computers for a total of 15 agent sandboxes. As you can see here in my agent sandbox UI, I have five sandboxes running. These, of course, were all built out by Opus. In fact, Opus one-shotted these applications. What are these? How is Opus able to one-shot these? And how much work can you really do with these models? The answer: it's a ton. Okay, let's run another prompt here. So, I'm going to open up the terminal here.
Then: fork terminal, claude code opus with summary, use the agent sandbox skill, then list sandboxes, then open every public URL in Chrome. Okay, so I'm going to fire this off. We haven't talked about forking agents on the channel yet. Make sure you like and comment so the YouTube algorithm doesn't miss you next week when we release that video. Forking your agents and spinning off work is ultra important when you're engineering on multiple different problems day after day. Notice how intricate this prompt is. You can see our forked terminal right there. It's immediately running Claude Code Opus in a brand new window. And here's all of our sandboxes. We'll hold that off for one second here. CD back to root, then fork terminal, grab some instructions for that skill, and then we have a prompt that we're passing in. All right. So this agent understands that, as of here, this is the prompt that we're passing to another agent. Okay.
Again, multi-agent orchestration is ultra powerful. And just to showcase what this really looks like: we understand that in our first prompt, we ran this workflow, right? User prompt; primary agent spun up five browser agents, which then responded back to the primary agent, which then responded back to us. But you can push this a lot further. That isn't what real engineering work looks like. It's one concise example of doing some engineering work with a single prompt. Really, we want something powerful like this: automate your UI testing. You want to spin up a plan. You then want to have your agents build. You then want to host that application. And then you want to have your agents test. Then those agents respond back to your host prompt, the agent that ran this, and then it responds to you. And there's actually a piece missing here: if there's something wrong that our browser agents report on, the host will actually run back to the build step, or back to a debug or resolver step. Right? And so this is what I've done with these agent sandboxes. This is how I created these agent sandboxes. So, this is all running in the agent skill codebase, which I'm going to have available to you for free, link in the description. And you can see here what's happening: it has tools to help it operate sandboxes. This agent listed all the sandboxes, right, with this tool here, and then opened all of them in Chrome. And now we have these links. So, what do we have here? Why are agent sandboxes so powerful? And why is it another avenue, another technique you can use for scaling your compute to scale your impact? Check this out. We have several applications here. Every single one of these applications was built out by this agentic workflow, this AI developer workflow: plan, build, host, browser test, respond back to host. If we need to make improvements, do it.
Otherwise, respond back to me. These are all running in their own dedicated agent environments. I'm using E2B as my sandbox host. I like this tool. I'm a big fan so far. But you can see here, we have multiple sandboxes that our agents are operating. Last week we were looking at Gemini 3. It did a decent job of spinning these up. Opus 4.5 is completing these long-running agent workflows, these long-running jobs, to build entire full-stack applications. So, just to break one down: we have a voice notes application. Let me just show you what this does. If I hit record, allow this time, and, you know, we're just talking into our computer here, and live, we're getting transcriptions. Okay, these are live transcriptions. This is running ElevenLabs' brand new model, I think it's called Scribe 2.5. Big shout out to ElevenLabs. Huge fan of that company. Huge fan of the products they're putting out. A lot of untapped potential there. We'll be talking about ElevenLabs on the channel in the future, but I'm going to stop talking now so that this can lock in. You know, that transcription came in all the way and it's here live. Right.
So, we have this live voice notes application. At any point, we can stop and the delta will lock in. There we go. And then we can, of course, stop. This creates a brand new voice note. If we go back to notes, and here's the important part: if we hit refresh, we can see that all the data is still there. This is hosted inside of one agent sandbox. I had Opus 4.5 create this entire sandbox from scratch. And I can show you the exact prompt I used to do this. It's a complex full-stack prompt. Right here it is, running inside the skill. There are a lot of steps to it, but you should be pushing your agents further and farther beyond, because with every model release, you can do more. And the only way you can find out what you can really do is by pushing your agents further. I have a single prompt now. Let me just make this super clear. Let me try to set the stage for you. This is how engineering is evolving. I have a single prompt that I can use to build full-stack applications, to quickly prototype, or to change new and existing applications.
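That plan, build, host, test loop, including the feedback edge back to a debug step when the browser agents report failures, can be sketched like this. Every function here is a hypothetical stand-in for an agent step, not code from the agent sandbox skill:

```python
# Sketch of the plan -> build -> host -> test workflow with a fix loop.
# The f-strings are placeholders for real agent invocations.

def browser_test(url: str, attempt: int) -> list[str]:
    # Stand-in: pretend the first test run finds one bug, then passes.
    return ["save button broken"] if attempt == 0 else []

def run_workflow(spec: str, max_fixes: int = 2) -> str:
    plan = f"plan({spec})"                      # 1. an agent writes a plan
    build = f"build({plan})"                    # 2. an agent builds the app
    failures: list[str] = []
    for attempt in range(max_fixes + 1):
        url = f"host({build})"                  # 3. host it in a sandbox
        failures = browser_test(url, attempt)   # 4. browser agents test it
        if not failures:
            return f"shipped {url}"             # respond back to the user
        build = f"fix({build})"                 # loop back to a debug step
    return f"needs human review: {failures}"

print(run_workflow("voice notes app"))
```

The bounded `max_fixes` loop is the important design choice: the host agent retries the build/debug step a few times on its own, and only escalates to you when it can't converge.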
Okay, we're talking greenfield codebases, brand new, and brownfield codebases, existing work. You know, you want your agents to create PRs. You want them to give you feedback on things. You want to engineer new work with them. Plan, build, host, test, the whole suite. All right, this is how you should be thinking about your engineering work. This is one of the critical ideas we break down in my take on how to use agents, Tactical Agentic Coding. I'll leave a link for that in the description. I don't want to pitch that too much here. Check out that landing page, see if you're interested, and push what you can do further. All right. Agentic workflows, specifically the evolution of agentic workflows, AI developer workflows where you have multi-step agents doing work, handing off work from agent to agent, is how you deliver more value as an engineer right now. So, check that out. Link in the description. Back to this prompt. We are doing a four-step workflow, and you can see that in the workflow step, right?
So, we have a great agentic prompt. Once again, notice the consistency. I show up every week sharing the exact same prompt format. Why is that? Because when you have a winning formula, you keep using it. Stay consistent. Make it easy to communicate to you, your team, and your agents. All right, same prompt format, and you'll see that throughout every single prompt I'm writing now. But if we dial into the workflow, you can see several steps here. We're activating our agent skill. We're initializing our sandbox. And then we start to run prompts. So: prompts, running prompts inside of skills. You need to be able to control your agentics. It doesn't really matter how you get it done. It matters that you can do it and that you know the options available to you.
Right? So there's that host, there's the test workflow, and this is what's doing all that browser testing work for us. Check this out. Step seven: browser UI testing. Again, I want to be super clear. These are full-stack applications, running in their own agent sandbox, that Claude Code running Opus 4.5 built out end to end. One-shotted applications. Scale your compute to scale your impact. These are the two massive advantages that Claude Code running Claude Opus 4.5 can offer your engineering: delegation and long-running engineering tasks. You can do much more than you think with these models. And, to be super clear, I'm not sponsored. I don't receive any funds or anything from these companies. I focus on the best tool for the job of engineering. And right now, very, very clearly to me, the best tool for engineering is an agent that lets you delegate and run long-running engineering tasks accurately. Right? So that's how we got these agent sandboxes. Let me showcase some of this functionality here. We have a simple kind of graphing tool. We can chart things out. We can change the color. We can add a data point here, and that's going to update. We can change the format here: bar, pie, line. We can download a PNG. We can update the titles, blah blah blah, XYZ. You get the point here, right? It's a graphing tool.
Then we have a more challenging, more technical, medium-level full-stack application. I like to prompt and evaluate these agents in tiers of difficulty. And then we have a design tool here. We also have a decision-making tool. So this is my favorite one. And it looks like we have an extra sandbox here, that one, two, three, four, five, that I think is paused. So this one just errored out for us. But check this out. So this is my favorite one: the decision matrix. If I click this one, this is one that I was playing with before: best agentic coding tool. We can add a couple options, and you can basically use this tool to help you make decisions. So you have options that you're weighing, that you want to decide between, and you can add criteria to help you determine which one you should pick. And so I just use this as a random example.
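Under the hood, a decision matrix is just weighted scoring: rate each option against weighted criteria and take the highest total. A minimal sketch, with made-up weights and scores rather than values from the app:

```python
# Weighted decision matrix: score = sum(weight[c] * rating[c]) per option.
def best_option(options: dict[str, dict[str, float]],
                weights: dict[str, float]) -> str:
    totals = {
        name: sum(weights[c] * score for c, score in scores.items())
        for name, scores in options.items()
    }
    return max(totals, key=totals.get)

weights = {"simplicity": 0.3, "features": 0.4, "model selection": 0.3}
options = {
    "Claude Code": {"simplicity": 8, "features": 9, "model selection": 7},
    "Gemini CLI":  {"simplicity": 7, "features": 7, "model selection": 6},
}
print(best_option(options, weights))  # -> Claude Code
```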
Claude Code versus Gemini CLI: simplicity, features, model selection. We can add other things, I don't know what you would want to add here, cost. And so this is effectively a winning tool for helping you compare things. And again, this is all just to showcase the point that you can do a lot with these powerful models if you have the right tooling, if you understand that you should be pushing these models really hard, really far. All right, so great, we have these tools. How was I able to increase the capabilities of Claude Opus 4.5? I just deployed more of it. Okay, I deployed more of it in a different way. So instead of just running plan-build-test with minimalistic pytest tests, you can actually have several agents boot up a browser and really test your application. Full stack, just look at the whole thing. All right, and that's what we've done here. And this greatly increases the chance that your agent delivers a working version to you.
All right, and let me show you exactly what that looks like. Here we have this prompt that I ran in the beginning: our generic browser test. I have N user story workflows. So you can imagine this is your user clicking through your application. You can see we have several simple steps, easy to verify. You want verifiable workflows to create closed-loop prompts. So we'll take this prompt and we'll run it against our workflow. We'll type generic browser test, there are our autocompletes coming in, and let's go ahead and have this run against... just copy this URL. Paste. True. True. Fire it off. Claude Opus 4.5 is going to spin up somewhere between four and eight sub-agents, and Claude Opus 4.5 is spinning up Claude Opus 4.5 agents to do this job. And, you know, I've detailed that here: the model inside this prompt is going to run Opus agents. So there we go. You can see it understanding the workflow: "I need to execute each workflow in a separate sub-agent." There we go. Our agent is queuing up these tasks, and we can see this prompt here. Again, same great agentic prompt format: there are the instructions, and here's the workflow. This is the piece that matters. And you can see here, step by step, we're just instructing our agent to do these things. Let me try to collapse here to a decent level. There we go. Setup.
Determine execution mode: we have a sequential mode and a... there we go, a bunch of windows here. Let me try to minimize this so you can kind of see what's happening here. All right. There's this. There's this one. There's this one. And it is challenging to display all these on screen, but I'll do my best here. I've lost track of how to set these up properly, but you get the point: we have agents operating on the browser. I'm not doing this work. An agent is running this browser in headed mode. It's actually making changes to the site. And you can see this throughout every one of these windows: an agent is running a specific user story, validating specific work. And this is valuable not because of classic Playwright or whatever testing framework you want to use inside of your code. That's all great. That is the old way of doing things: deterministic code. That's great, we still want that, and that's part of why ADWs are so powerful. But we also want these workflows where one agent can plan, another agent can build, and then another agent can actually run tests against the work that was spun up. Having that dynamic, natural-language interface that your agent can quickly operate on is very important.
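To make that contrast concrete, here is a sketch of the old deterministic check next to the natural-language user stories handed to browser sub-agents. The stories and function names are illustrative, not from the video's codebase:

```python
# Old way: a hard-coded, deterministic assertion (pytest/Playwright style).
def title_is_correct(page_title: str) -> bool:
    return page_title == "Voice Notes"

# New way: each user story is a verifiable natural-language workflow that a
# browser sub-agent executes end to end and reports on.
USER_STORIES = [
    "Record a note, stop, and confirm it appears in the notes list",
    "Refresh the page and confirm saved notes persist",
    "Delete a note and confirm it is removed",
]

def dispatch_to_sub_agents(stories: list[str]) -> list[dict]:
    # Stand-in for prompting one browser sub-agent per user story.
    return [{"story": s, "status": "queued"} for s in stories]

print(len(dispatch_to_sub_agents(USER_STORIES)))  # -> 3
```

The deterministic check is brittle but cheap; the user-story form costs tokens but survives UI changes that would break a hard-coded selector or title match.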
All right. And so you can see the agent is doing stuff to these UIs. Again, my hands are off. This is what it really looks like to deploy agents to do engineering work for you. All right, these are testing this full stack application. Okay, so very powerful stuff here. I just want to emphasize the point: engineering is changing. Okay? And the name of the game is not about what you can do. It's about what you can teach your agents to do, right? It's about how you can direct your agents. It's about how you can communicate to your agents. And then you need to be able to chain that work together throughout long chains of engineering work, right? Via agents. Okay, agents are the new compositional unit that you and I need to pay attention to. All right.
Last year, uh, two years ago, maybe even three years ago now, my top quote that I would share with engineers was: the prompt is the fundamental unit of knowledge work and programming. So if you master the prompt, you master knowledge work. Now, that is still true, but something has changed, right? We have a new unit. The prompt is still the primitive, but we have a compositional unit that is exponentially more valuable. It is the agent, right? The agent architecture wrapping the model. Now, the quote that I would share, you know, the idea, the core behind everything, is, um, let's go ahead and open up our agent here. You know, the core behind everything is now: if you master the agent, you master knowledge work. If you master the agent, you master engineering. Okay, it's not just about the prompt anymore. Of course, prompt engineering, context engineering, the core four, that's critical for knowing how to operate your agents, but now we've moved up. Okay, it's not just about the prompt. It's about the compositional unit, right? It's about
the agent and the agent architecture and what you can do with agentic systems. A clear framing you can use here: first, you want to learn how to operate a single agent. Then you want to learn how to operate a better agent, right? And you do this by learning how to prompt engineer and context engineer. And then you want to learn how to operate more agents. Okay? Scale them up like we're doing here: sub agents, and then prompt other agents eventually. All right? Then you have custom agents, right? Embed your agents into your applications. Embed them into your personal engineering workflows. Scale your compute to scale your impact. Lastly, we have the orchestration level, which really helps you manage every previous level. And the first version of this that you experience is likely with Claude Code sub agents, but you can go far beyond that. But the ideas here are all the same. You must master the agent, and you must look for tooling that gives you the best agent and the best model. And it is very clear: Opus 4.5 is that model for engineers. Now, when you couple that with powerful technology like agent sandboxes, you give your agents their own isolated devices, and they unlock three things for you: isolation, scale, and autonomy. I've been doing some wacky, crazy things with these agent sandboxes, with these powerful language models inside of the right agents, only because I have the right environment for my agents. All right, so there are many pieces here starting to come together.
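One of those pieces, the fan-out pattern running on screen here, is a primary agent prompting several sub-agents in parallel, with each sub-agent reporting back to the primary rather than to you. A rough sketch, where the `SubAgent` class is a hypothetical stand-in (a real one would wrap a model call plus tools like a browser or sandbox):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sub-agent stand-in; not a real SDK. A real sub-agent
# would wrap a model call plus tools (browser, filesystem, sandbox).
class SubAgent:
    def __init__(self, name: str):
        self.name = name

    def run(self, task: str) -> str:
        # Stub: a real sub-agent would do the work and return a report.
        return f"{self.name}: done -> {task}"

def primary_agent(tasks: list) -> list:
    """Primary agent fans tasks out to sub-agents in parallel and
    collects their reports; sub-agents answer the primary, not the user."""
    agents = [SubAgent(f"sub-{i}") for i in range(len(tasks))]
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        reports = list(pool.map(lambda pair: pair[0].run(pair[1]),
                                zip(agents, tasks)))
    return reports

reports = primary_agent([
    "summarize the release notes",
    "screenshot the landing page",
    "check the system card",
])
for r in reports:
    print(r)
```

The tasks mirror the ones from the five-sub-agent demo earlier in the video; the key design choice is that the primary owns the task list and collects every report, which is what makes the system orchestratable.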
We're going to be talking about many of these ideas on the channel as we move forward. But let's recenter around our, uh, test here. So, you know, I'll refresh this, and notice we have four charts. And you can see here that a fifth chart has been created by one of our agents doing some test, right? It's got a test chart here, right? So, agents are operating on this. They're testing. They're validating. And if we close this, we can see a nice summary, full complete file here. We can go ahead and open this up. All completed. All Opus agents. We have once again scaled our compute to scale our impact. So this is what's happening when I'm building, you know, these agent sandboxes: spinning up agents to accomplish engineering work, setting up prototypes, pushing the agents to understand the capabilities. Plan, build, host, and a browser testing step here with a bunch of browser agents. All right, so very powerful stuff. You know, when you do stuff like this and you do it properly, you are conducting multi-agent orchestration. All right? So in your engineering, you have an orchestrator agent managing agents that then execute work for you. Okay, this is a big trend. Orchestration is that final tier. You should be, you know, playing with it with sub agents and then delegating work to other agents as we move forward, right? And another example of orchestration that we ran here is we had that fork terminal call. And this is probably going to be a topic for next week's video. I'll showcase exactly how I built that fork terminal skill and really just build a skill from scratch. Understanding how to build the right agentics to help you engineer with your agents is a critical task.
So, next week we'll dive into building a skill from scratch. We covered a lot. We jumped around a little bit. All the ideas point to a few things. We're coming up against the end of the year, the end of 2025. What does the release of Gemini 3 and Claude Opus 4.5 really mean for us engineers? What high conviction bets can we make as engineers in the age of agents, with powerful compute at our fingertips? This is going to be the main topic of our 2026 predictions video coming up on the channel in December.
Make sure you're a part of the journey.
Let the algorithm know you're interested. Like, comment. You know what to do. Right away, Opus 4.5 makes three things crystal clear. You can hand off more work to powerful agents like Claude Code than you think you can. You need to be pushing your prompts harder, pushing your skills harder, pushing your agents harder, further, longer. Understand what you can really do. All right. Premium compute is absolutely worth the price.
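To make the "premium compute is worth it" point concrete, here's a back-of-envelope comparison for the fewer-tool-calls argument. The per-call token counts and the prices (roughly $5 input / $25 output per million tokens for Opus 4.5, a third of the old $15/$75 Opus pricing mentioned earlier, versus a cheaper model) are illustrative assumptions, not quoted figures:

```python
# Back-of-envelope: a pricier model that finishes in fewer tool calls
# can be cheaper overall. All numbers below are illustrative assumptions.

def job_cost(calls, in_tok_per_call, out_tok_per_call, in_price, out_price):
    """Cost of a job in dollars; prices are $ per million tokens."""
    in_cost = calls * in_tok_per_call * in_price / 1_000_000
    out_cost = calls * out_tok_per_call * out_price / 1_000_000
    return in_cost + out_cost

# Assumed: ~$5/$25 per MTok premium model, ~$3/$15 cheaper model,
# same context per call, but the premium model needs half the calls.
opus = job_cost(calls=5, in_tok_per_call=20_000, out_tok_per_call=2_000,
                in_price=5, out_price=25)
cheap = job_cost(calls=10, in_tok_per_call=20_000, out_tok_per_call=2_000,
                 in_price=3, out_price=15)

print(f"premium model, 5 calls:  ${opus:.2f}")   # -> $0.75
print(f"cheaper model, 10 calls: ${cheap:.2f}")  # -> $0.90
```

With these assumed numbers the premium model comes out cheaper in dollars and finishes in half the tool calls, which is the time-saved argument from the transcript in miniature.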
It's now more affordable than ever to run Claude Opus 4.5. You have to always consider the time that you're getting back from using this model. Keep in mind, you know, if Claude Opus does the job in five tool calls and it takes Claude Sonnet or Gemini 3 ten tool calls, you have still saved money. It's done it in half the time, right? This is why companies hire great engineers: because they get the job done not only faster. Sometimes the job is only possible by highly intelligent, by highly skilled, by highly experienced builders and agents and models. So there are new capabilities unlocked by this model, and you will only find out if you are pushing it, if you're writing bigger and better prompts. All right. And as we mentioned in our previous Gemini 3 agent sandbox video, I'll link that in the description. Definitely check that one out as well. It's not just about model intelligence anymore. All right. It's about the agent harness. So, you know, Claude Code. What are you putting the model in? And it's about the agentic tooling that you give your agents to operate. And so we're talking about everything from sub agents, other primary agents, agent sandboxes, your AI tooling in general, specifically your agentic tooling. What unique capabilities, what advantages are you giving your agents that allow you to scale your compute, to scale your impact? Orchestrating many agents to accomplish work is a massive theme for us here on the channel. One agent is not enough. All right, I'll leave a link to my agent sandbox codebase. If you
understand that valuable things are not always free, and if you want to accelerate your agentic coding, check out Tactical Agentic Coding. This is my take on how you can scale far beyond AI coding and vibe coding with agentic engineering, so powerful that your codebase runs itself. The big theme that we focus on here is: you want to build the system that builds the system, right? Build the agents that run your application. Don't build the application yourself anymore. You have agents for that. Focus on the agentic system. All right, link in the description for this. This is my handcrafted course I built from scratch by using the technology. This is for mid to senior level engineers. If you're a new engineer or if you're a vibe coder, this is not for you. You have to be a cracked vibe coder to understand what's happening here. So again, this is not for everyone. Check this out if you're interested. Link in
the description. Powerful compute is here. Once again, the question for you and I: what can we do with this technology that we couldn't do before? And the answer is very clear. We can delegate more and better than ever. And we can run longer, more challenging, more complex workflows with powerful models like Gemini 3 and especially the new Claude Opus 4.5 running in Claude Code. You know where to find me every single Monday. Stay focused and keep building.