The AI Productivity Paradox: Why Your Team Might Be Getting Slower With AI

Would you believe this? Experienced developers, working in repos they already know, using the AI tools everybody keeps telling them to use, were 19% slower than developers who were not using those tools.

Not faster. Slower.

Now, this is not a vibes-based take or me trying to be controversial for clicks. I'm a data guy at heart, so I want to look at the studies — and that number comes from a randomized controlled trial. The implications are bigger than most of us think.

Happy Monday. Lance Dacy here with Big Agile. I train Scrum Masters, product owners, leaders, coaches, developers — anybody I can get my hands on — and most of the time I coach product organizations on how to deliver more value with less friction. That's a full-time job, I can promise you that. I also help with tooling, and these days that means AI tooling.

This week I want to talk about the productivity paradox in AI tooling, why some teams are actually getting slower with AI, what might be causing it, and what to do about it without becoming the office Luddite.

The core idea
AI tools speed up the part of the system you can see, and slow down the part of the system you're not measuring.

The Team That Quietly Slowed Down

Let me paint a picture you'll probably recognize.

A team you would call high-performing. Senior engineers. A cleanish codebase, whatever that means to you. Decent, untarnished velocity metrics. Six months ago they rolled out Copilot, or Cursor, or whatever flavor your org is standardizing on. And their output went up. Pull requests went through the roof. Story points went up. Everybody felt good. The CTO was thrilled to tell the board, "We are embracing AI."

Then somebody like me — or a coach paying attention to real metrics — noticed that lead time was creeping up. Not a lot, but it was creeping up. Change failure rates, if you're tracking those, were ticking up too. And the team started having those uncomfortable retrospectives where somebody says: "We're shipping a lot of stuff, but it doesn't feel like anything is actually working well in the environments. Are we really looking at that?"

The conventional wisdom when you see quality taking a hit is to push harder. More AI. More tools. More automation. Maybe a productivity dashboard. Maybe a hackathon — that's a good one, let's have fun trying to solve those problems.

But the hard truth is that I don't believe any of those address the actual problem.

The team I'm describing didn't slow down because they needed more AI. They slowed down because AI sped up the part of the system that was already fast, and didn't help the part that was always slow.

Let that sit a moment. AI accelerates execution on known problems — no problem with that. But are we also accelerating the discovery of the right problems? Teams that confuse those two will get slower, not faster, every single time in my experience.

I took an HOV lane in the wrong direction last week. I was going really fast, laughing at the traffic — until I realized I couldn't exit. By the time I could turn around I had added twenty minutes to my trip. I was going really fast in the wrong direction.

Deciding whether we are solving the right problems still takes a person making sure we are pointed in the right direction. As technologists, we absolutely still need people who can judge whether a given piece of code makes sense in our ecosystem. Maybe one day we can worry less about that. But right now, this is all brand new to all of us. We do not have a solution for that yet.

The Data: Expectation vs. Reality

I'm a data guy, so this is not just my hunch. Nobody has enough experience yet to reach a perfect conclusion, but we should be paying attention to the data we do have.

METR ran a randomized controlled trial in July 2025. The study followed experienced developers working in repositories they already knew, on real tasks they actually do. Tasks were randomly assigned: on some, the developers could use AI tools; on others, they could not.

Going in, the AI users predicted they would be 24% faster. Independent observers predicted roughly the same. The actual result?

They were 19% slower.

Expectation: 24% faster. Reality: 19% slower. That is a 43-percentage-point gap between expectation and reality.

Here is the part that should bother you the most. The developers themselves still believed they were faster — even after the data came in. They felt more productive. The work felt smoother. Why wouldn't it? They just took longer to deliver the actual work.

Most of the time in development we are terrible at perceiving our own pace. We felt productive, but nothing actually shipped. Who is paying attention to that?

Stack Overflow's developer survey adds another layer. Trust in AI tools fell from 43% to 29% over eighteen months. Usage in the same window climbed to 84%.

Read that twice. Trust went down. Usage went up.

That is not a healthy adoption curve. That is a tool spreading because it is expected, not because it has earned the trust of the people using it. Choosers versus users — y'all know this issue. We are still facing it right now, except this time the people being mandated tools are the developers themselves.

Where the Bottleneck Actually Moved

I want to bring in a name some of you already know — and if you don't, you should: Don Reinertsen, author of The Principles of Product Development Flow. I feel like it's the most underrated book in our entire industry.

Reinertsen gave us math years ago that still works. I'll paraphrase: the cost of a queue is invisible until you measure it. The economic damage of a bottleneck, a dependency, or a queue usually does not show up where the bottleneck is. It shows up way further downstream.

Who is watching that downstream pain? The team? Likely not. Managers? Too busy.

That, to me, is exactly what is happening with AI right now. The acceleration is upstream. Great — code generation, cool, I don't have to type as much. But the bottleneck that grows is review. We still have to review the code. Code review now has to absorb a much larger volume of code, often from someone who didn't fully reason through it themselves. We have been fighting that problem for a long time.

Some argue we don't even have to review AI-generated code. I'll talk about that another time. But if we do have to review it, I can see how reviews get longer. Stories sit. Lead time — the metric I was watching creep up — starts climbing. The team feels productive because they are generating, not because they are delivering.
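
Reinertsen's queueing math makes the review problem concrete. Little's Law says the average time an item spends in a queue equals the work in process divided by throughput. Here is a toy sketch of that — the numbers are invented to match the story above, not measured from any real team:

```python
# Little's Law: average time in queue = work in process (WIP) / throughput.
# Invented numbers for illustration only.

review_throughput = 10   # PRs the team can review and merge per week (unchanged)
wip_before_ai = 12       # PRs sitting in or waiting for review
wip_after_ai = 24        # generation doubled; review capacity did not

print(f"Before AI: {wip_before_ai / review_throughput:.1f} weeks in the review queue")
print(f"After AI:  {wip_after_ai / review_throughput:.1f} weeks in the review queue")
```

Double the incoming code without touching review capacity and the time in queue doubles with it. That is lead time climbing, exactly the creep I was describing.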

The DORA 2024 report doubled down on something that should have been obvious all along. High-performing teams are not the ones shipping the most code.

I worked with a CEO once who thought we had to have butts in seats nine to five because he wanted to see people typing code. I pulled up a GitHub heatmap of where the check-ins were happening, and a lot of them were at 1:00 or 2:00 in the morning. I went into his office and asked him, "What were you doing at 2:00 in the morning?" Of course he was sleeping. I said, "Look at all this activity. The developers were doing this in the middle of the night because they could focus. Forcing them to work nine to five is not a productivity move."

Hands-on-keyboard time is not a sign of productivity. It feels like it, but it isn't. In lean terms, we actually want to ship less code, more simply. High-performing teams have short lead times, low change failure rates, fast time to restore, and frequent, reliable deployments. Volume of output does not predict performance. Flow predicts performance.

The Productivity Paradox in One Sentence

Put all of this together and you get what I call the productivity paradox in one sentence: AI tools speed up the part of the system you can see, and slow down the part of the system you are likely not measuring.

It's an easy thing to discover, but hard to figure out what to do with. So what can we actually do as coaches?

The AI Flow Check: Three Questions

I'd like to introduce a small diagnostic I've been using. I don't have a better name for it right now — I'm calling it the AI Flow Check. It's three questions you can run in any retrospective, any planning meeting, or any one-on-one with your engineering lead.

Question 1: Is this a known pattern or a novel pattern?

Known patterns are where AI shines. CRUD endpoints. Standard test scaffolding. Boilerplate. Documentation generated from code that already exists. Release notes. If you have solved that kind of problem fifty times before, congratulations — AI is going to accelerate the fifty-first.

Novel problems are different. A complex bug in a system you barely understand. An architectural decision with three competing trade-offs. A piece of business logic that depends on six edge cases nobody cared to document. AI does not accelerate discovery. It speeds up the typing, but typing was never the bottleneck. Reasoning was. If anything, having to type slowed us down in a useful way — similar to when I teach a class and have to draw out a diagram. It feels slow, but the class absorbs it better. Slow is smooth. Smooth is fast.

So question one is just: what kind of problem is this? Teach your team to sort their work that way. Classify it. If they are using AI on novel problems, they are paying a hidden tax in cognitive validation. They have to read every line, simulate it, check it against context the AI didn't even have. That takes longer than writing it themselves — just like it sometimes takes more time to write a prompt than to do the work yourself. Eventually, if we get the prompts right, we can produce outcomes faster and more repeatably. But you have to work at that.

Question 2: Is our review time growing as fast as code generation?

This is the bottleneck question. If your team is generating twice as much code but reviewing at the same pace they always have, the review queue just keeps filling up.

I classify every pull request that sits in review as inventory — one of the eight wastes of lean. Inventory is unfinished work the business has not received yet, and we want to minimize it. A retailer doesn't want a lot of inventory sitting in the warehouse. The problem is that most of the inventory on a technology team is not visible. The cost is there. It's heavy. But we can't put our finger on it. So we make it visible.

Coaches, this is where you come in. Pull up the last six sprints. Look at average time in review per pull request. If that number is climbing while AI usage is climbing, you have found your bottleneck. The answer is not more AI. The answer is better review discipline — smaller pull requests, faster review SLAs, maybe pair reviews on AI-generated code so the originating developer is in the room while it's being checked.
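
If you want a quick way to make that number visible, here is a minimal sketch of the check. The pull request records and field names are invented for illustration; in practice you would export the review-requested and merged timestamps from GitHub, GitLab, or whatever your team uses:

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M"

# Invented records: (sprint, review_requested_at, merged_at)
pull_requests = [
    ("Sprint 1", "2025-01-06 09:00", "2025-01-06 15:30"),
    ("Sprint 1", "2025-01-08 10:00", "2025-01-09 11:00"),
    ("Sprint 2", "2025-01-20 09:00", "2025-01-22 16:00"),
    ("Sprint 2", "2025-01-21 14:00", "2025-01-24 10:00"),
    ("Sprint 3", "2025-02-03 09:00", "2025-02-07 17:00"),
    ("Sprint 3", "2025-02-05 11:00", "2025-02-10 09:00"),
]

hours_in_review = defaultdict(list)
for sprint, requested, merged in pull_requests:
    delta = datetime.strptime(merged, FMT) - datetime.strptime(requested, FMT)
    hours_in_review[sprint].append(delta.total_seconds() / 3600)

# If this average climbs sprint over sprint while AI usage climbs,
# the bottleneck has moved to review.
for sprint, hours in hours_in_review.items():
    print(f"{sprint}: avg {mean(hours):.1f} hours in review across {len(hours)} PRs")
```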

Question 3: Are you measuring output or throughput?

This is a leadership question. Output is lines of code, pull requests merged, story points completed — y'all know what those are. Throughput is value delivered to the customer in a given window. If you are practicing Scrum, that is what the sprint should produce.

They are not the same thing.

Most teams that fall into the productivity paradox got there because their dashboards measure output. Velocity charts, commit counts, AI usage rates. It's exhausting. And none of those tell you whether anything actually got delivered to the customer.

What are we in business for? Are we in business to track output and satisfy leadership? Or are we in business to beat our competition and deliver the best products we can? That is the hard reality. We get so bogged down in the organizational hierarchy and political science of an organization that sometimes we lose sight of what we are really here for. That is where I get to come in — the neutral party, the wise fool. It is easy for me to say those things. I know it is hard inside the organization. I'm just trying to be a reasoning voice.

So switch your team's primary metric to lead time. Story commitment to story in production. Calendar time, not story points. Then watch what happens to your team's relationship with their AI tools. In my experience, it changes pretty quickly.

Leadership cue
Switch your team's primary metric from output (story points, pull requests merged, lines of code) to lead time — calendar time from story commitment to story in production. Watch how quickly that one change reshapes your team's relationship with their AI tools.
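
For the measurement itself, a minimal sketch — the story data below is invented for illustration; pull the real commitment and production dates from whatever tracker you use:

```python
from datetime import date
from statistics import median

# Invented stories: the date committed and the date the story reached production.
stories = [
    {"id": "STORY-101", "committed": date(2025, 3, 3),  "in_production": date(2025, 3, 7)},
    {"id": "STORY-102", "committed": date(2025, 3, 3),  "in_production": date(2025, 3, 14)},
    {"id": "STORY-103", "committed": date(2025, 3, 17), "in_production": date(2025, 3, 28)},
]

# Lead time = calendar days from story commitment to story in production.
for story in stories:
    days = (story["in_production"] - story["committed"]).days
    print(f"{story['id']}: {days} calendar days from commitment to production")

lead_times = [(s["in_production"] - s["committed"]).days for s in stories]
print(f"Median lead time: {median(lead_times)} days")
```

One number, in calendar time, not story points. That is the whole trick.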

A 30-Minute Exercise for Next Week

Next week, give yourself thirty minutes. No tools. No dashboards. No consultants — unless you want me there, in which case I'll be there. Just thirty minutes with your team.

Pull up your last sprint's pull request report. Pick the five pull requests that took the longest from open to merged. For each one, ask two questions: Was this a known pattern or a novel problem? And did the time in review surprise us, given that?
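
If you would rather script that ranking than sort a spreadsheet, here is a minimal sketch. The records are invented; export the real open and merge timestamps from your own repo:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Invented pull requests from the last sprint.
pull_requests = [
    {"id": 481, "opened": "2025-04-07 09:00", "merged": "2025-04-11 16:00"},
    {"id": 482, "opened": "2025-04-07 13:00", "merged": "2025-04-08 10:00"},
    {"id": 483, "opened": "2025-04-08 09:00", "merged": "2025-04-15 12:00"},
    {"id": 484, "opened": "2025-04-09 15:00", "merged": "2025-04-10 11:00"},
    {"id": 485, "opened": "2025-04-10 09:00", "merged": "2025-04-17 09:00"},
    {"id": 486, "opened": "2025-04-11 10:00", "merged": "2025-04-14 14:00"},
]

def open_to_merge_hours(pr):
    opened = datetime.strptime(pr["opened"], FMT)
    merged = datetime.strptime(pr["merged"], FMT)
    return (merged - opened).total_seconds() / 3600

# The five slowest PRs are the ones to ask the two questions about.
slowest = sorted(pull_requests, key=open_to_merge_hours, reverse=True)[:5]
for pr in slowest:
    print(f"PR #{pr['id']}: {open_to_merge_hours(pr):.1f} hours from open to merged")
```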

That's it. You are not solving anything yet. You are just looking. Transparency. Inspection. That's all.

What you will usually find is that two or three of those slow pull requests were AI-assisted novel problems, where the reviewer spent more time validating than the author spent writing — and the author may have been the AI. That's your data point.

The following sprint, try something small. Draw a line. Add something to your definition of done. Set a threshold. Maybe AI is reserved for known patterns and explicit boilerplate, and you classify work before pulling it into the sprint. Maybe novel problems get a human first draft with AI as a second-pass reviewer. Make AI the reviewer — see what it tells us about our code. It can often find things we weren't thinking about.

There is no single right answer. But you cannot pick the right rule until you see the pattern, and humans need to make that decision. Right now, most teams cannot see the pattern — because they are not looking. We need to go look. That is what coaches do.

Tools Amplify the System They Are In

Here is what I want you to take away. AI is a tool. A great tool. I use it all the time and I'm not trying to be anti-tool. But just like any tool, you have to use it the right way. You have to learn it. And tools amplify the system they are inserted into.

If your system flows well — short lead times, fast review, clear measurement of value — AI will likely make you faster. But if your system, like most of ours, has hidden bottlenecks, output-heavy dashboards, and weak review discipline, AI is going to make those problems worse, faster. More problems, faster. Is that what you want?

The teams that win, to me, are not the ones generating the most code. They are the ones who paid attention to where the bottleneck moved when the AI tool showed up — and then adjusted.

Want to Go Deeper?

If you are a leader trying to navigate AI adoption without losing your delivery discipline, this is exactly the kind of transition we coach product organizations through. There is no single right answer — nobody is the expert. You are the expert. We are there to spare you some scar tissue.

Explore our upcoming workshops and consulting.

Thanks for reading. I hope to see you this week in the comments, the newsletter, the blog, or over on LinkedIn. Have a great week.