41% of the code your team shipped last quarter was written by a machine. If your team hasn't hit that number yet, you're headed there soon. But here's what I find staggering as I go around and help organizations: rarely is anyone in the leadership meeting asking the most important question about that code. Can we maintain it?
I'm Lance Dacy, otherwise known as Big Agile, and I'm an enterprise agile coach. I work with product and engineering leaders who are trying to move fast. And moving fast without breaking things is the important part of that.
Today I want to talk about a problem that is quietly building beneath most organizations right now. It has nothing to do with whether AI works. AI works. That's not the debate. The real question is what happens six months from now when the code that AI generated starts to rot, and you've perhaps already let go of the people who would have caught it. That's the tragedy.
The Scenario I Keep Seeing
I was at an engineering offsite last week, and the big question in the room was, "What AI tools are we going to adopt?" Great question. But it's the processes around those tools that concern me. I'm an AI fan; I use it all the time. It's the governance I want to go deeper on.
Here's the scenario I keep seeing play out. A VP of engineering walks into the quarterly business review. The slides look great. Deployment frequency is up, cycle time is down, features are shipping faster than ever. The CEO is thrilled. The board is thrilled. Everyone is thrilled. Those might even be the right numbers to track.
Three months later, that same VP is in a different meeting. Production incidents are spiking. A critical security vulnerability just showed up in a customer-facing service. The team is spending 60% of their sprint capacity fixing things that broke, and nobody can really explain exactly when it started to go sideways.
What happened? The team adopted AI coding tools and they worked. Code was generated faster. Pull requests flew through the system. Output metrics looked incredible. But here is the hard truth: the speed of code generation is not the same as the speed of value delivery. The team was producing more code, but the code lacked the architectural judgment that experienced humans bring. It looked clean. It passed the linter. The tests were green. And nobody caught the decay until it compounded into a crisis.
Let me walk you through three layers of this problem, because I don't think it's theoretical. It's measured.
Layer One: The Code Quality Gap
I'm a data guy at heart, so let's look at some data.
CodeRabbit published a study in December 2025 analyzing 470 real-world pull requests on GitHub. 320 were AI co-authored, 150 were human-only. The findings were stark. AI-generated pull requests contained 1.7 times more issues overall than human-written code. Critical defects were 1.4 times higher. Major defects were 1.7 times higher. And when you drill into security specifically, AI-generated code was 2.74 times more likely to introduce cross-site scripting vulnerabilities.
Before you say "it's just one study," Veracode confirmed the pattern in a separate study. They tested over 100 large language models across 80 coding tasks. They found that 45% of AI-generated code contained some kind of security vulnerability. Java was the worst, with over a 70% failure rate.
Here is the part that should concern all of us. Even as models improved functionally over 2025, security pass rates stayed flat. The models are getting better at writing code that runs. They are not getting better at writing safe code. That will probably come. I just don't think it's there right now.
Layer Two: The Review Capacity Gap
Somebody is going to say, "Sure, but my team reviews all of the AI-generated code." Let's talk about that, because the same problem exists even in human-generated code.
Think about how code reviews looked just two years ago. A developer submits a pull request, maybe 50 lines, maybe 200. A senior engineer reviews it. They know the code base. They know the conventions. They know the architecture. They catch the edge cases. That system worked because the volume was manageable and the reviewer brought real human context.
Now picture today. AI tools can generate 500 lines of code in seconds. The code looks clean. Formatting is consistent. Comments are thorough. But the comments tend to restate the code rather than explain the intent, and intent is the heart of what a human reviewer adds. Error handling blocks get copied and pasted. Logic errors get hidden behind a veneer of polish.
Stanford researchers identified what they call the false confidence effect. Developers using AI assistants actually write less secure code while simultaneously believing their code is more secure. That is a huge problem. The tool creates a sense of safety that does not match the reality, and few people are poking holes in that.
Your senior engineers are now being asked to review three times the volume of code at the same quality bar in the same number of hours. Maybe even fewer hours. That math does not work. When reviewer fatigue sets in, defects slip through the cracks. And they are not syntax errors. They are architectural flaws, security gaps, and logic failures that show up weeks or months later as production incidents that are very hard to track down.
Layer Three: The Workforce Gap
This one is what keeps me up at night, or it would if I ran a development shop.
A 2025 LeadDev survey found that 54% of engineering leaders believe junior developer hiring will drop in the long term as a result of AI coding tools. Entry-level hiring at the 15 largest tech companies already fell 25% from 2023 to 2024. Job postings for entry-level development roles have dropped 67% since 2022.
Here is why this matters specifically for technical debt. Junior developers are not just cheap labor. They are the pipeline. Those are the people who, like apprentices in any trade, become mid-level engineers and then senior architects: the ones who review the code and make the judgment calls that AI cannot. When you cut junior hiring because AI makes your senior developers more productive, you are simultaneously creating the technical debt and eliminating the people who could eventually catch it or fix it.
It's almost like the airline industry starving for pilots. Anybody can train and learn how to fly an airplane. What you are lacking is the experience and the oversight in the cockpit: somebody who has been through 20 maintenance issues or 30 in-flight emergencies and can help you get past them. It's not just about flying the plane. 99% of the job is boredom; 1% is when you really need that experience. Development is going down the same path if we don't watch it.
Matt Garman, the CEO of AWS, said it plainly. Cutting junior hires because of AI is, in his words, one of the dumbest things he has ever heard. "How's that going to work," he asked, "when 10 years from now you will have no one who has learned anything?" That is a powerful reflection, and I think most of us need to start paying more attention to it.
Three Questions Every Leader Should Ask This Week
So what do we do about this? I'm not sure either. We're all new to this. But as a coach, I want to share three questions that I think would help every product and engineering leader. And I'm not talking about next quarter. I'm talking about this week, or next week at the latest.
Question One: How much of our code is AI-generated, and how much of that is being reviewed by someone who understands the system?
If your team can't answer that question, I don't believe you have any governance. Or at least, not any that can be seen. You are relying on hope, and hope is not a strategy for production-level systems.
Question Two: What is our definition of done for AI-generated work, and does it include explicit quality gates that are different from human-authored code?
It should. This is not about slowing down. This is about being intentional. Make haste slowly. Go fast, but carefully. AI-generated code has a different failure pattern than human code, and your quality processes need to account for that. If your definition of done says "tests pass and code review," that bar is way too low for AI-generated work. Add an architectural review for anything that touches authentication, authorization, data handling, or infrastructure.
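Here is a minimal sketch of what that kind of gate could look like in CI. The sensitive path prefixes and the ARCH_REVIEW_APPROVED variable are assumptions; you would swap in your own repo layout and whatever mechanism your pipeline uses to record an architect's sign-off.

```python
# Sketch of a CI gate: fail the build when a change touches sensitive
# areas without an explicit architectural-review sign-off.
# The path prefixes and ARCH_REVIEW_APPROVED are assumptions; adapt
# them to your repo layout and CI system.
import os
import subprocess
import sys

# Hypothetical list of areas that warrant an architectural review.
SENSITIVE_PREFIXES = ("src/auth/", "src/authz/", "src/data/", "infra/")

def changed_files(base_branch: str = "origin/main") -> list[str]:
    """Return files changed relative to the base branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base_branch}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def main() -> int:
    touched = [f for f in changed_files() if f.startswith(SENSITIVE_PREFIXES)]
    if not touched:
        return 0  # nothing sensitive touched; normal review applies
    # Expect the pipeline to set this once an architect has signed off,
    # for example from a PR label. Until then, block the merge.
    if os.environ.get("ARCH_REVIEW_APPROVED") == "true":
        return 0
    print("Architectural review required for:", *touched, sep="\n  ")
    return 1

if __name__ == "__main__":
    sys.exit(main())
```

The point is not the script. The point is that the higher bar for AI-generated work is enforced by the system, not by someone remembering to ask.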
Question Three: Are we tracking the ratio of new feature work to rework and defect repair, and has that ratio shifted since we adopted AI tools?
That is a leading indicator we can track. If your teams are spending more of their capacity fixing things instead of building new things, AI may not actually be making you faster; it just appears that way while it slows you down in the long run. A tool that makes you feel faster while quietly eroding your delivery health is the hidden cost. You optimize for throughput at the expense of predictability, and the rework queue starts eating your capacity. We all know what queues do to a flow system.
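If you want a concrete starting point for that ratio, here is a small sketch that computes it from an issue-tracker export. The CSV column names and issue types are hypothetical; map them to whatever your tracker actually exports.

```python
# Sketch: compute the ratio of new feature work to rework, per month,
# from an issue-tracker export. The columns ("month", "type", "points")
# and the type values are hypothetical; adjust to your own export.
import csv
from collections import defaultdict

def feature_to_rework_ratio(path: str) -> dict[str, float]:
    feature = defaultdict(float)
    rework = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            month, points = row["month"], float(row["points"])
            if row["type"] in ("feature", "story"):
                feature[month] += points
            elif row["type"] in ("bug", "defect", "rework"):
                rework[month] += points
    # Only report months where rework exists, to avoid dividing by zero.
    return {m: feature[m] / rework[m] for m in sorted(feature) if rework.get(m)}

if __name__ == "__main__":
    for month, ratio in feature_to_rework_ratio("issues.csv").items():
        print(f"{month}: {ratio:.2f} points of feature work per point of rework")
```

Watch the trend line, not any single month. A ratio that drifts downward after AI adoption is the quiet signal this whole talk is about.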
The Data the Analysts Are Publishing
Forrester predicts that 75% of technology decision-makers will face moderate to severe technical debt by 2026. We are in 2026, and I'm already seeing it. This is not a future prediction. This is current reality for most of you watching, whether you know it or not.
Gartner goes further. They forecast that prompt-to-app approaches adopted by citizen developers will increase software defects by 2,500% by 2028. That is not 25%. That is 2,500%. These are not fringe analysts sounding the alarm. These are the firms your CFO and CEO read.
And yet in most of the leadership meetings I've sat in, the conversation is still about adoption. How many developers use the tools? Who's using AI? We have to start using AI. Okay. You do have to start using things to learn them. I'm a big fan of that. But how you adopt it, iteratively and incrementally, is the better question.
What to Do This Week
Believe in small, iterative improvements. What is one thing you can do this week? Sit down with your engineering leads for 30 minutes and ask them to pull two data points. First, the percentage of pull requests in the last 30 days that were AI-assisted or AI-generated, if you are tracking it. Second, the trend in change failure rate and time to restore service (both DORA metrics) over the same period.
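If your PRs are labeled, the first data point is a few lines of scripting. Here is a rough sketch against the GitHub search API; the repo name, token, and label names are placeholders, and your team's labels will differ.

```python
# Sketch: data point one, the share of recent PRs tagged as AI-involved.
# Assumes your team labels PRs "ai-generated" / "ai-assisted" (see the
# tagging suggestion below); repo name and token are placeholders.
import datetime as dt
import os
import requests

REPO = "your-org/your-repo"        # placeholder
TOKEN = os.environ["GITHUB_TOKEN"]  # a personal access token
SINCE = (dt.date.today() - dt.timedelta(days=30)).isoformat()

def count_prs(extra: str = "") -> int:
    """Count PRs created in the window, optionally filtered by label."""
    q = f"repo:{REPO} is:pr created:>={SINCE} {extra}".strip()
    r = requests.get(
        "https://api.github.com/search/issues",
        params={"q": q, "per_page": 1},
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    r.raise_for_status()
    return r.json()["total_count"]

total = count_prs()
# A PR carrying both labels is counted twice; close enough for a trend.
ai = count_prs('label:"ai-generated"') + count_prs('label:"ai-assisted"')
print(f"{ai}/{total} PRs in the last 30 days were AI-tagged "
      f"({100 * ai / max(total, 1):.0f}%)")
```

The second data point comes from your deployment and incident records; most DORA dashboards already report change failure rate and time to restore out of the box.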
If they can't track that data, you just found your first problem. If you do have the data, and AI adoption went up while delivery stability held, great. You are likely managing it well. But if AI adoption went up and your change failure rate also went up, you are exposing a quality gap that needs attention sooner rather than later. Don't scale a bad system. Scaling a bad system creates a huge amplitude in the sine wave of change.
Thirty minutes. Two data points. One conversation. That's it. You will know more about your actual risk posture than 90% of leadership teams out there right now.
If you want to go deeper, ask your team to tag every pull request as AI-generated, AI-assisted, or human-authored for the next couple of sprints. Don't change the process. Just make it visible. Like Kanban tells us, make it visible first. You can't govern what you cannot see.
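If you want the tagging to stick, a lightweight CI check can refuse to merge a PR that is missing its tag. This sketch assumes GitHub Actions conventions (the event payload path) and the three labels named above; verify the details against your own CI before relying on it.

```python
# Sketch: a CI check that every PR carries exactly one provenance label.
# GITHUB_EVENT_PATH is how GitHub Actions exposes PR metadata; the label
# names are the three categories from the text. Verify for your setup.
import json
import os
import sys

PROVENANCE = {"ai-generated", "ai-assisted", "human-authored"}

with open(os.environ["GITHUB_EVENT_PATH"]) as f:
    event = json.load(f)

labels = {label["name"] for label in event["pull_request"]["labels"]}
tags = labels & PROVENANCE

if len(tags) == 1:
    print(f"Provenance tag: {tags.pop()}")
    sys.exit(0)

print(f"Expected exactly one of {sorted(PROVENANCE)}, found {sorted(tags)}")
sys.exit(1)
```

Notice this changes nothing about how the work gets done. It only makes the provenance visible, which is the whole point.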
This Is Not an Anti-AI Message
I want to be clear. I am not anti-AI. I use AI tools every day. I love new technology. The productivity gains are real when these tools are used with good judgment and good discipline. But productivity without governance is just a faster accumulation of risk and technical debt.
The organizations that will win in the next two years will not necessarily be the ones who adopted AI the fastest. They will be the ones who adopted it the most responsibly, the most deliberately. You don't have a speed problem. You probably have a visibility problem. Fix that first, because you can't manage what you can't measure.
Keep Going
If you want to bring this conversation into your organization, we'd love to help. Explore the agile leadership and product management classes we offer — including our AI for Product Management class, which was built exactly for moments like this. Not from a coding perspective, but from the perspective of how to responsibly use AI and what it actually is.
We're all about helping you do better today than you did yesterday. See you next week.