
AI paradox: faster coding, slower shipping | by Addy Osmani

By JavaScript Conferences by GitNation

Summary

## Key takeaways

- **Prompt Engineering is Dead**: Prompt engineering is dead; enter context engineering, which equips AI agents with the right files, examples, history, tools, and constraints so output quality is proportional to context relevance. [07:26], [08:12]
- **PR Review Times Exploded 91%**: Google’s data shows pull request review times increased by up to 91% and PR sizes exploded by 154% because AI sped up code generation but human verification did not scale. [16:32], [16:53]
- **The 70% Problem**: AI gets you 70% of the way to solving a problem like generating an MVP, but the last 30%—edge cases, architecture, tests—is the hardest and often slower for seniors than writing from scratch. [19:37], [20:10]
- **46% Distrust AI Code**: Favorable views of AI for coding dropped from 70% to 60%, with 46% of developers actively distrusting AI accuracy and 30% having little to no trust in AI-generated code. [14:08], [14:19]
- **Trio Programming Beats Skill Erosion**: Evolve pair programming into trio programming with senior, junior, and AI to keep quality high and avoid isolation; use no-AI challenges to sharpen critical thinking and build resilience. [22:28], [22:47]

Topics Covered

  • Embrace AI Native Engineering
  • Context Engineering Beats Prompts
  • AI Productivity Paradox
  • Solve the 70% Problem
  • Evolve Reviews into Mentorship

Full Transcript

So we are going through a pretty big change in software engineering at the moment. AI and vibe coding are fundamentally reshaping our work, and there is a lot of hype to sift through.

So today I'm going to share some battle tested insights from the trenches um for making that transition successfully with some fun learnings along the way. But um

before we get started, I do have something I have to share. Uh we've all been replaced by AI. We can finally retire to the woods before they're cut down for more data centers. As you can

guess, this hasn't really happened just yet, but the future could look uh a little bit different.

I've been coding with AI for a few years, and I think the direction we're going in is sort of an AI native engineer. Instead of thinking, you know, AI might replace me, an AI native engineer might ask for every task: could AI help me do this faster, better, or differently in some way? And I think the future really belongs to engineers who can embrace that collaborative mindset. Now, if you zoom all the way out, this chart is really the road map for the next few years.

On one side you have the classic developer journeys, where most of our value came from purely human effort. Then we hit this AI augmented phase, which many of us are in right now. And as we continue shifting, we go from augmented to AI first, and eventually AI native developer experiences.

Now, AI first journeys ask a different question: how would this journey look if AI were assumed from the start and not sprinkled at the end? And then AI native DX is where the unit of work itself ends up shifting to agentic workflows. Now, in the future, some say that every engineer is going to evolve from a pure implementer to an orchestrator of agents. Maybe we'll all be managers of agents, where human judgment and critical thinking help guide AI coding towards the right outcomes. Some people are very much already doing this, and maybe we're going to move from "how do I code this" to "how do I get the right code built."

There are a lot of solutions already available for this. There's Cursor's background agent, which can hand tasks off to the background. It can be used by engineers and PMs. You can monitor progress on multiple tasks in real time, and you can be notified if the agent needs input.

There's Claude Code for the web. I've been enjoying orchestrating multiple tasks using Jules from Google Labs. There's also GitHub's Copilot agent. I use this both on desktop and on mobile.

Like if I'm on a hike, I love being able to pull out my phone and just spin up a bunch of tasks on side projects. It's kind of cool. There's also Conductor for Mac, which supports multiple agents. But we do have to ground ourselves in reality. It's sort of unsurprising that startups or people with greenfield code bases are already starting to play with some of these ideas like orchestration. Enterprise is a little bit different. What I hear when I talk to enterprise companies is a lot of skepticism around the readiness of these patterns for large, decade-old code bases. I'm excited there are folks poking at these problems, and I think eventually the tooling will get to a good place. But just to talk about these pattern changes a little bit more: today, most of us act as conductors. We are effectively guiding one agent through a single task. And tomorrow we might be those orchestrators, directing a fleet of agents working in harmony towards a shared goal.

If you think of an orchestra, you know, maybe you picture something like this.

Maybe you picture yourself leading an orchestra of agents, and it looks a little bit like this. Reality, however, is very different. Things are evolving fast, but best practices for teams and organizations are still in flux. So we're going to talk just a little bit about that today.

Now, multiple developer surveys, including DORA and Stack Overflow, showed that a majority of us are using AI for coding at work.

And what this data tells us is that it's no longer a novelty or an edge case, but it's rapidly become just a standard part of the software development toolkit. And

what I'd like to say is, you know, there's one thing you need to remember.

AI is really not about writing more code faster. It's about building better software at the end of the day. So, throughout this talk, we're going to explore what "better" could potentially mean.

Now, coding with AI is a little bit of a spectrum. Vibe coding is fast and exploratory. It prioritizes

speed and experimentation over deep review and engineering rigor. And then

we have AI assisted engineering. It's at

the opposite end of that spectrum. And

this is sort of the methodical integration of AI into a mature development life cycle. So we, the human engineers, stay very much in control, and we use AI as a collaborator to augment a more structured process.

And the challenge for us as builders is really just skillfully navigating this spectrum.

Just to get on the same page, let's quickly take a look at developer experience and where AI can help across this life cycle. This is kind of the baseline loop without any AI interventions. It's useful as a reference to compare where automation and intelligence can be added to accelerate velocity. So typically we're thinking about things like design, coding, testing, building, deploying. This highlights where AI is most useful during the early design phases. So think about chatbots in your editor for asking questions and iterating, and AI triage tools that can streamline planning and investigation. Then we have the inner development loop, where we spend a lot of our time: coding, testing, debugging. And AI can help with faster code generation, automating tests, catching errors, ideally before they go to production and definitely before you end up on Hacker News.

Here's the full overview of where AI can help with the developer experience. This is all of those loops: design, inner, submit, outer. And it highlights just every point where AI can meaningfully reduce toil.

AI coding

tools are also a spectrum. Um my team and I have been using a few of these.

You've got plenty of options these days, whether you are prototyping, non-technical, or building for production. We've of course also vibe coded a lot. And the real genius of vibe coding is you never have to confront the technical debt, because you have no idea who or what created it in the first place.

As your team goes through this journey, you're going to run into skeptics. You really want to turn these eventually into technical evangelists, people who have a very grounded perspective on what AI's weaknesses are, what its strengths are, so we can have a very balanced way of thinking about it.

We had a few big learnings on my teams. Things like, you know, AI tools don't necessarily replace expertise, but they amplify it. And the more skill that you can bring to this as an engineer,

the more powerful the results you're ultimately going to get.

Um, what about the impact of AI on coding at Google where I work? Well,

across the board, it's already been moving the needle on real productivity.

We're seeing pretty measurable velocity gains and faster coding speeds across multiple studies. Whether that is using it as autocomplete, for full code generation, for documentation, for testing, for code review, we've generally found that it's able to help us in a lot of ways.

We have many teams successfully using things like Gemini CLI um to build the next generation of applications and we're enjoying it.

You might remember, probably 3 to 6 months ago, which feels like a very long time ago in AI, when we had prompt engineers. I love this title. It might as well have been called AI code whisperer. Folks would test, evaluate, and iterate on prompts to improve the chances of getting a high quality output. But a smart prompt doesn't really guarantee you great outcomes.

Many of us tried prompt engineering. We

tried tweaking words endlessly. But as

this iceberg shows, the real issues are often below the surface with mismanaged context. A limited window means that the AI can forget key details. Unstructured content can lead to confusion, competing information can distract it, and overloading it can overwhelm the model. So context mismanagement is often the big issue here.

Context is king in this moment. If you haven't come across it, context engineering effectively means equipping an AI agent with everything that it needs to tackle a task effectively. We're talking about the right files, examples, history, tools, constraints, and the quality of your output is going to be proportional to the quality and relevance of the context that you provide. Many of the tools that we use, whether you're using Cursor or Cline, make it much easier these days for us to include files, folders, problems, URLs, more docs into your workflows to help with that context that we're missing. And increasingly, you also have other ways, like visual ways, multimodality, to pull in more context that can be useful.
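
To make that concrete, here's a minimal sketch in TypeScript of what assembling context might look like before you ever hit send. The `buildContext` helper, the file paths, and the constraint list are all hypothetical; the point is simply that you bundle files, examples, and constraints rather than relying on one clever sentence:

```typescript
// Hypothetical sketch: assemble context for an agent instead of relying on a
// clever one-line prompt. File paths and contents below are placeholders.
type ContextInput = {
  task: string;
  files: Record<string, string>;   // hand-picked source files, not the whole repo
  examples: string[];              // pointers to code that shows the style you want
  constraints: string[];           // hard rules the agent must respect
};

function buildContext({ task, files, examples, constraints }: ContextInput): string {
  const fileSections = Object.entries(files)
    .map(([path, contents]) => `--- ${path} ---\n${contents}`)
    .join("\n\n");

  return [
    `# Task\n${task}`,
    `# Relevant source files\n${fileSections}`,
    `# Examples\n${examples.map((e) => `- ${e}`).join("\n")}`,
    `# Constraints\n${constraints.map((c) => `- ${c}`).join("\n")}`,
  ].join("\n\n");
}

// Usage: the assembled string is what you hand to whatever model or agent you use.
const prompt = buildContext({
  task: "Add gift card support to checkout",
  files: { "src/cart/checkout.ts": "/* file contents */" },
  examples: ["Follow the error handling style in src/cart/applyDiscount.ts"],
  constraints: ["No new dependencies", "All money values are integer cents"],
});
console.log(prompt);
```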

We're also seeing our tools evolve a little bit because, again, those context window sizes can be pretty finite. So some tools, this is Cline, do smart things like automatic context summarization. So when your conversation approaches the context window size, it can actually start to summarize previous sessions so you can free up space and continue working, with the important parts, the important details, still preserved.

There are also other ideas like a memory bank, which is kind of a structured documentation system that allows you to maintain context across multiple sessions. So Cline, for example, can read memory bank files and rebuild its understanding of your project without you necessarily having to do a lot of manual work there yourself.

Here's a great template for context engineering from Anthropic. It's got ideas like background data, examples, conversation history. This type of thing makes for a really great tweet that you will completely forget. I forget this all the time. I decided to vibe code this into a quick web app that you can also check out. I've also completely forgotten to use this. We just generally need better tooling to handle this automatically for us, and I'm

looking forward to our tools evolving in this direction too.

Spec-driven development has become more popular. This is the practice of really just planning before you prompt, and it's providing that clear context that the AI needs to succeed rather than expecting it to read your mind. There are a few other patterns that drop out of this. Recently I've seen a few folks using a learnings markdown file, and this is kind of about creating a powerful self-improving cycle where the agent's performance gets better with every single task. So at the end of a task, imagine that you've been prompting it to write a component or make some additions to an app that you've already got, you can ask it to store key insights in a simple text file, and that acts as long-term memory. So it becomes sort of an external brain that just keeps getting better and helps steer the agent's performance over time.
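
As a rough illustration, and this is a made-up example rather than any standard format, such a learnings file might look something like this, with the agent appending a short entry after each task:

```markdown
# learnings.md — long-term memory for the agent (hypothetical example)

## Project conventions
- All money values are integer cents; never use floats for currency.
- API errors are returned as values, not thrown; follow the existing Result pattern.

## Past mistakes to avoid
- Generated tests once mocked the real payment client; use the in-repo fake instead.
- Checkout totals must be recomputed server-side; do not trust client-provided totals.
```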

Other things that we learned, uh, never commit code that you can't explain to other people, um, or even to yourself.

This is a golden rule for production AI coding. You have to assume that you and your team are going to have to manually change things at some point. And so again, human in the loop, a human reviewing the code, is going to continue being important. I think that reading code is going to become an even bigger part of the job in the future. We can expect to be spending more time reviewing, testing, and explaining than necessarily just typing. So we're going to have to continue building up this muscle. Over a program's lifetime, it usually ends up getting written once, maybe edited a number of times, but it ends up getting debugged, maintained, and enhanced a number of more times, and gets read quite a lot over its lifetime.

Vibe coding is a gift for prototyping. I think that static mock-ups often look good, but they're not really interactive, and that limits the feedback you can get about the true user experience. So we've really enjoyed using this for prototyping. You can quickly knock out something that is a clearer version of the vision that you're offering to people, and it just elevates the discussion. There are some companies that have really leaned into this idea, like Shopify. Shopify had sort of companywide change management around their use of AI across engineering, product, and UX, from the top, and what they now have is design-dev handoffs that are a little bit different. Designers can now just vibe code prototypes, and they can work directly with engineering to land these as artifacts. Now, that means that there is code review, there is iteration, there is work on quality, but we're starting to see a sort of blending of the roles in some cases, which is very interesting to see.

Voice dictation also changes the way that you can vibe code, because you can use your voice much faster than you can type, anywhere up to three to five times faster. When I first started hearing about this idea, I thought this works great if you are working remotely in your own office, and terribly if you were in any sort of open office situation. But of course, they now have vibe coding by whispering, which is also available. So you can just whisper your way to your MVPs. It actually works pretty well.

AI is a multiplier. So what helps the human can help the AI.

So invest in tests, CI, and clear documentation before you really expect very big gains. Tests are ultimately a huge safety net in this moment. They really derisk your use of AI for coding. So if things start going off the rails for any reason, you're going to know sooner. And I've talked to a lot of teams who will say, "This is all great if you have end-to-end tests already. What if we don't have those tests? Do we just use AI to write them?" And you can, you just, again, need to make sure that you are keeping humans in the loop to understand what tests are being generated. Are they sufficient?
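
If you're starting from zero, even a couple of coarse end-to-end tests give you that early warning. Here's a minimal sketch using Playwright; the URL, roles, and headings are placeholders for whatever your app actually has:

```typescript
import { test, expect } from "@playwright/test";

// A deliberately coarse "does the critical path still work?" test.
// If an AI-generated change breaks checkout, this fails before review even starts.
test("user can add an item to the cart and reach checkout", async ({ page }) => {
  await page.goto("http://localhost:3000"); // placeholder URL
  await page.getByRole("button", { name: "Add to cart" }).first().click();
  await page.getByRole("link", { name: "Cart" }).click();
  await expect(page.getByRole("heading", { name: "Your cart" })).toBeVisible();
  await page.getByRole("button", { name: "Checkout" }).click();
  await expect(page).toHaveURL(/checkout/);
});
```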

Uh, by the way, this looks like a mother mocking their child for some reason, but at least they're safe.

So the promise with AI was sort of this step change in productivity. And what does the data say? I love reading white papers, I love reading research. What the data shows is a little bit more nuanced of a reality. So let's take a look at some studies and some concrete actions that we can maybe take to lock in gains and manage risk. So what the latest data shows is that usage is high. It

continues to go up but trust is sliding.

Favorable views about AI for coding dropped from 70% to 60% in the last two years, and 46% of people actively distrust AI accuracy. This chart

shows that about 30% of the developers surveyed here had little to no trust in AI generated code. And this skepticism is not necessarily a bad thing. It's a

sign of you know this area maturing. It

explains some of the friction we're going to see in some of the other things I'm going to talk about.

But models continue to improve, agents continue to improve. However, we do start to learn about where things can go wrong. So when things go wrong with vibe coding, you have vibe debugging, where you have no idea how the code works or really how to fix it. On that note, I thought I'd give a shout out: on my team, we very recently released Chrome DevTools MCP, and this is to help your AI coding assistant sort of see and interact with a live browser. That means that your agent can see what's rendered. It can look at the console logs. It can look at network logs, automate performance optimization. It can find issues with any broken layouts. There's a lot of power that this unlocks. If you're coding with AI, maybe check it out. Might be useful.
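
If you want to try it, it registers like any other MCP server in your coding assistant's configuration. The exact file and keys depend on which client you use, so treat this as an approximate example and check the project's README:

```json
{
  "mcpServers": {
    "chrome-devtools": {
      "command": "npx",
      "args": ["chrome-devtools-mcp@latest"]
    }
  }
}
```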

But back to the data.

So at the individual level, the lift from AI productivity is real. Um

developers using it, you know, can see anywhere up to 21% more tasks completed.

People on the edges are merging anywhere up to 98% more pull requests. We'll talk about pull requests more in just a second. But a very significant number of developers will say that AI has a positive impact on their personal productivity. Some will even say that, when we're talking about positive impact, it can feel like the AI writes better code than they do in some cases. Now, when we're talking about common problems, average problems, I could certainly see that being true. There are people who will say that it has greatly increased their productivity; this shows maybe about 10 to 13%. And there will be people who say it's slightly increased their productivity, closer to 41%. And that last, largest group really suggests incremental benefit from AI across the workforce.

Now we get to the fun stuff. Here's the paradox. We

sped up code generation, but human verification did not scale with that.

Pull request review times have increased in some cases by anywhere up to 91%. PR sizes are also exploding. If you have reviewed a completely AI generated pull request, maybe you've seen cases where people make all kinds of changes that are not needed, to many more files than are required. So PR sizes have exploded anywhere up to 154%. This is soaking up a lot of those individual gains that we were just talking about. The system

moves at the speed of the slowest step and today that step is review and validation.

So developers are describing this common pattern: AI suggestions are often plausible, but they're not always correct. And 66% of people call this their biggest time sink. 45% say that debugging AI output is more work than it is worth. And this connects directly to those longer reviews that I was just talking about. The conclusion is kind of simple. Quality gates and validation are now the critical path. So as an industry, we're going to need to start evolving how we think about code review in order to avoid putting all of this extra strain on the senior engineers.

So what we found in general is that AI shines in low context situations. If you are starting fresh, if you are handling predictable patterns, writing generic functions, if you're treating AI as a junior engineer where tasks are well defined and context is low, it can do pretty well: especially with greenfield code, with boilerplate, generating unit tests, generating docs, and again, prototypes and MVPs. I think this can free up experts to focus time on more design work, integration, tricky edge cases, just building more polished experiences.

Where it struggles right now is high context novel problems. And this is where deep context is really required.

And if you are a team building anything non-trivial, this is going to require strong supervision. AI can't necessarily hold a decade of architectural decisions or informal business logic in its head without studying your code bases, your docs, and how your team has been working over time. In security sensitive areas, the risk of subtle mistakes continues to rise. And anytime your requirements are ambiguous or logic spans multiple systems, again, humans have to lead and AI should be assisting.

Now, some may say that it doesn't matter what you're vibe coding, no matter how brittle it might be, you can always fix it later. But for those of us that aren't comfortable yoloing it, there are a few things that you can do.

Our first priority is really mitigating skill erosion. We have to use AI to accelerate learning in our teams, but not necessarily replace it. And this

requires rethinking our core teaching and mentoring processes to ensure that engineers at all levels are developing and maintaining fundamental problem

solving capabilities.

This leads to something that I call the 70% problem. AI can get you about 70% of the way towards solving a problem, whether it's generating a full application or generating an MVP. But that last 30%, that final mile, is the hardest part. Your assistant might help you with scaffolding out a feature, but production readiness means those edge cases, those architecture issues, those tests, those cleanups, all of those things require work from us. For juniors, if you're somebody that's new to tech, that 70% can feel very magical, but for seniors, that last 30% is often slower than writing the code from scratch yourself.

So AI can help us close the demo gap quickly, but shipping to production, I think, still belongs to humans.

Sometimes that last 30% is something that the seniors see and the juniors don't, just because they don't know what to look for. In other words, the juniors can send that 70% off as a pull request as if they were done, and it then means that the actual hard part can end up falling into the code review, largely being done by seniors cleaning up mistakes that no human would have necessarily made. Sometimes humans would make them, but it means that again we have to rethink how we're approaching code reviews.

We don't like talking about AI's impact on the job market, but I think it is going to impact us in some way. 54% of leaders expect AI to reduce junior hiring in some way over the next couple of years. It is commoditizing the easy parts, forcing us to think about how the value chain is evolving towards more meaningful work. This could potentially mean fewer entry-level roles, or that junior roles evolve to focus on things that AI couldn't do quite as well. And if you're a tech lead, a manager, a leader, I think we're going to have to continue thinking about how we develop talent, to make sure that we don't inadvertently dry up our talent pipeline, because we are going to need those juniors to become seniors at some point. I still think we're going to need a lot of senior engineers.

So one of the first things we can do is try to evolve code reviews into more learning moments. I've heard of senior engineers sending back pull requests where it's clear that AI was used, but the person didn't really understand what they were doing. And when a junior submits an AI generated PR, the review kind of becomes one of your primary venues for mentorship. You can ask them Socratic questions that force them to explain the AI's output. You can help ensure a level of understanding and not just functionality. So those reviews become more about comprehension rather than just correctness.

I'd suggest that maybe we evolve pair programming into trio programming. It keeps the senior expert in the loop, but it ensures and assumes that the junior is going to be using AI in some way to leverage speed. So everybody's learning, quality ideally stays high, and we avoid the sort of isolation problems that pure AI-assisted work can create. It's still collaborative human-AI teamwork. I've

been suggesting in my teams that we try out regular no AI challenges. This is

another thing you can do to ensure critical thinking skills don't evaporate. You can help keep your team's skills sharp by designating a day of the week, or even a certain type of task on your backlog, where you just say, hey, maybe we don't use AI for that. And it just reinforces that AI is an assistant. It is a tool, but it is not a replacement for thinking. And this is essential for building resilience when AI fails or is not available.

AI often produces code that passes basic tests but may lack robustness. Its baseline is usually "well, it works, I guess; it kind of renders in the browser, I guess." But again, those code reviews are important for checking edge cases, performance, security, correctness, whether it is still matching your team's best practices and patterns. And our human role is really about ensuring production readiness and standards. Speaking of standards, this is what safety standards looked like back in the 80s. No belts were fine, we just had vibes, which is not completely different to what we have today.

So, I covered a lot today. If that's how you're feeling, you are absolutely right. The important thing about all of this is to just stay proactive in helping yourself and your teams along this journey.

Our job is to strike the right balance.

At the end of the day, it is to harness AI speed and capabilities to deliver value faster, but to keep the human element in charge. I think that our creativity, our judgment, and our

understanding of user needs is going to really guide how AI is used. So, I hope that you found something in this talk helpful. Thank you.


[applause]
