Who Owns AI-Generated Code When It Ships to Production?

An AI agent writes a feature. It passes the automated tests. The pull request gets approved in a one-minute review. And off it goes, deployed to production. A week later, it creates a security vulnerability that exposes customer data. Now you're sitting in the security officer's office, and someone asks you to answer fast: who's accountable?

If you had to pause, that pause is the problem.

I want to talk about who owns AI-generated code when it actually ships, and why most teams can't answer that question without a half-hour meeting and a long Slack thread.

The core idea

AI didn't remove accountability from your engineering organization. It diffused it across so many hands that no one owns the outcome. That makes the accountability gap a leadership design problem, and you can solve it by attaching one named human to every AI-generated change that ships.

The Sprint Review That Looked Like a Win

Picture a recent sprint review. Stakeholders are in the room. The product owner kicks things off with a slide showing how many features shipped and what went well. The Scrum Master stands up and shows cycle time is down and velocity is up 40%. Everyone's clapping.

Two weeks later, the security team walks in and flags an authorization bypass on one of those features.

So the team huddles. We run the five whys, or we sketch out an Ishikawa diagram, whatever you use. The developer who's most familiar with it says, "Well, I prompted Copilot to add the endpoint. It generated the auth check. It generated the test." The person who reviewed it steps up: "It compiled. All the tests were green." And the product owner says, "Yeah, when I saw it at the end of the sprint, the functionality looked good."

Finally we bring in the engineering leader, the person who approved the AI tool rollout six months ago. But they never considered that approval to be a decision about accountability.

So what really happened here? We didn't lose accountability. It's right there in the room with us. What we did was diffuse it.

Diffused Accountability Is Not Accountability

Diffused accountability is a coordination failure, and coordination failures show up in a lot of ways.

The traditional model worked because the person who wrote the code owned it. Their name was on the commit. They knew the logic because they'd thought it through and put the words on the page. When something broke, you knew who to call, or at least who could start the investigation.

AI tends to break that chain at the very first link. The person whose name is on the commit didn't write the logic, so their instinct is, "I didn't put that in there." But they did prompt for it. And nobody downstream is sure where their job ends and the AI begins. That gap is where the trust problem actually lives.

What the Research Is Telling Us

I'm a data guy, so let's look at the research. And I want you to remember something: all of this is new to most of us. The best we can do is keep up with what others are seeing, then meld that with our own reality.

"Army of Juniors" and Insecure by Dumbness

In October 2025, a security firm called OX Security published a report titled "Army of Juniors." They analyzed more than 300 open-source repositories, including 50 that used tools like Copilot, Cursor, and Claude. They documented 10 specific architectural and security anti-patterns these AI tools typically generate, things like reinventing libraries from scratch, or ignoring established architecture principles, and what they bluntly called "insecure by dumbness."

That phrase didn't come from me. That's a direct quote from their VP of Research.

The report is not saying AI writes worse code per line. Per line, it writes about the same. The point is that AI removes the natural bottlenecks that used to control what reached production. Code reviews. The slower second pass of debugging. Those bottlenecks weren't bugs in our process. They were features. They saved us from a lot of this.

4x Velocity, 10x Vulnerabilities

Around the same time, Apiiro ran a parallel study and reached a similar conclusion. They used a deep code analysis engine across tens of thousands of repositories at Fortune 50 companies over about six months.

Here's what they found. AI-assisted developers were committing code three to four times faster than their peers. But the pull requests got bigger and fewer, which is a no-no in agile development. Instead of small, reviewable changes, reviewers were facing massive, multi-file pull requests that touched five services at once. We'd never do that on purpose.

Then the numbers. Monthly security findings went from about 1,000 to over 10,000 in six months. That's a tenfold jump.

And the kind of vulnerability shifted, too. Trivial syntax errors fell. Logic bugs fell. But privilege escalation paths surged 322%, and architectural design flaws rose 153%.

So we're not just catching more bugs. We're catching bigger, deeper, harder-to-find bugs, the kind that take weeks to remediate and sometimes weeks just to notice. The velocity is real. The output looks great on the dashboard. And the underlying risk is compounding. That's the scary part.

This Is a Leadership Design Problem

Be honest about your own organization. Someone, somewhere, probably approved all of the AI tooling for engineering in the last 18 months. Most leaders did. But if you didn't also approve a governance approach for how that AI gets used, then you signed up for the velocity without signing up for the accountability. And that one's on you, leader, not the developer. The developer using the tool you gave them isn't the problem.

So the accountability gap isn't a developer problem, and it isn't a security team problem. It's a leadership design problem. That sounds bad, but here's the good news: it's a solvable design problem.

Two Frameworks Worth Knowing: NIST and OWASP

Two governance frameworks are worth knowing here, even if you never read every page.

The first is the NIST AI Risk Management Framework, which covers how organizations should govern AI-related risk end to end. The second is the OWASP Top 10 for LLM Applications, tailored to large language model risks like prompt injection, insecure output handling, and over-reliance on AI output.

Don't bother memorizing either of them. You just need to know they exist, and you need to set your team's posture toward them until something newer comes along. Right now, those are the good ones.

A Practical Model: Three Layers of AI Accountability

I'm a big fan of practical models, and you don't need to be a security expert to use this one. I find it helps to look at AI accountability in three layers: individual, team, and organizational. Each layer has to actually exist, with a human name attached, or the layer below it has nothing to stand on.

Layer 1: Individual Accountability

For every piece of work that includes AI-generated output, there has to be one named human who reviewed the work and is willing to put their name on the outcome. This is what "human in the loop" really means. Not "the team approved it." Not "it passed all the tests." A name. A person. If something breaks at 2 a.m., this is the person who gets the call.

The job here isn't to gatekeep. The job is to be the last set of human eyes that actually understood what shipped. If your current process can't name that person for a given change, you have a gap.

Here's a real example. A product team I worked with had a Scrum Master propose a tiny change to their definition of done. Three words: reviewer of record. For any pull request that included AI-assisted code, the reviewer of record had to be named, it had to be a human, and that name had to appear in the PR description:

Reviewer of record: [human name here]

That was it. It cost them basically nothing, and it sparked more conversation about AI use in two weeks than we'd had in the previous six months.
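If a team wanted to enforce that line automatically, a small check could run against the pull request description. What follows is only a sketch under stated assumptions: the function name, the list of placeholder values, and the "[bot]" suffix rule are all hypothetical, and the team in this example did it by hand rather than with a script.

```python
import re
from typing import Optional

# Matches a line such as "Reviewer of record: Jane Doe" anywhere in the text.
# [ \t]* is used instead of \s* so the match never spills onto the next line.
REVIEWER_LINE = re.compile(
    r"^[ \t]*Reviewer of record:[ \t]*(.*?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)

# Assumption: values a team would not accept as a named human.
NOT_A_HUMAN = {"", "tbd", "n/a", "none", "[human name here]",
               "copilot", "cursor", "claude"}


def reviewer_of_record(pr_description: str) -> Optional[str]:
    """Return the named reviewer, or None if the line is missing or not a person."""
    match = REVIEWER_LINE.search(pr_description)
    if match is None:
        return None  # no "Reviewer of record:" line at all
    name = match.group(1).strip()
    lowered = name.lower()
    if lowered in NOT_A_HUMAN or lowered.endswith("[bot]"):
        return None  # placeholder text, an AI tool, or a bot account
    return name
```

A check like this can only confirm that a name is present. It can't confirm that the person understood what shipped, which is the part that matters.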

Layer 2: Team Accountability

The team has to agree on which categories of work can use AI without additional human review, and which categories require it. We're not trying to gatekeep; we're trying to maximize flow. Refactoring a private utility function might not need a deep review. But anything touching authentication, authorization, payment, or PII? The team writes that down. It lives in their working agreement, their definition of done, or honestly both.

This boundary protects individual reviewers from being asked to deeply audit every single path, which would slow everyone down. And it protects the organization from something important slipping through because nobody was sure if it counted.
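One way to make that written boundary checkable is to map each category to file patterns and flag any change that touches them. This is a rough sketch, not a recommended rule set: the category names follow the list above, but the glob patterns and the function name are assumptions a real team would replace with its own.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List

# Assumption: illustrative patterns only. A real working agreement would list
# the team's actual directories and modules.
REVIEW_REQUIRED: Dict[str, List[str]] = {
    "authentication/authorization": ["*auth*", "*permission*"],
    "payment": ["*payment*", "*billing*"],
    "PII": ["*customer*", "*profile*"],
}


def categories_requiring_review(changed_paths: Iterable[str]) -> List[str]:
    """Return the sensitive categories touched by a set of changed file paths."""
    hits = set()
    for path in changed_paths:
        lowered = path.lower()  # compare case-insensitively
        for category, patterns in REVIEW_REQUIRED.items():
            if any(fnmatchcase(lowered, pattern) for pattern in patterns):
                hits.add(category)
    return sorted(hits)
```

An empty result means the change falls outside the categories the team wrote down. Broad patterns like these will produce false positives (a file named author.py matches "*auth*"), which is usually the safer direction to be wrong in.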

Layer 3: Organizational Accountability

This is where leadership lives. The organization sets the policy that says: we use AI tools for code generation, let's be open and honest about that, here are the categories where they're appropriate, here's how we'll measure quality, here's who owns AI-related quality events at the leadership level, and here's how we review them quarterly.

If you've never seen a document like that in your org, it's not because it exists and someone forgot to send it to you. It probably doesn't exist. Most of us don't have it, and that's okay. Start where you can.

Leaders, you cannot delegate AI accountability downward if you haven't built the structures that make accountability real. If your developers are using AI with no team agreement, no reviewer of record in practice, and no leadership policy, you're not running an AI-assisted engineering organization. You're running an unsupervised one. That distinction will matter when something goes wrong, and right now the data says something will.

And I'm talking to the agile coaches and Scrum Masters too. Shame on us if we're not making our leaders do this. Me included.

The 30-Second Test

The three-layer model isn't about adding a ceremony or a role. It's something simpler. For every AI-generated artifact that reaches a customer, you should be able to trace a chain of human accountability in 30 seconds. Not 30 minutes. Thirty seconds.
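To picture what passing that test looks like, imagine each shipped change carrying one field per layer. The sketch below is hypothetical throughout: the record type, its field names, and the helper are illustrations of the three layers, not a tool or schema taken from any framework.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ShippedChange:
    """Hypothetical record of one AI-assisted change that reached a customer."""
    change_id: str
    reviewer_of_record: Optional[str]  # Layer 1: the named human
    team_agreement: Optional[str]      # Layer 2: where the team's rule is written down
    policy_owner: Optional[str]        # Layer 3: who owns AI quality at leadership level


def missing_layers(change: ShippedChange) -> List[str]:
    """List the accountability layers that have no answer for this change."""
    layers = {
        "individual": change.reviewer_of_record,
        "team": change.team_agreement,
        "organizational": change.policy_owner,
    }
    # A layer counts as missing when its value is None or an empty string.
    return [layer for layer, value in layers.items() if not value]
```

If the list comes back empty, the chain can be read off the record in seconds. Anything it returns is a layer where someone would have to go ask around.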

Leadership cue

Pick one team this week and require a single line, "Reviewer of record: [name]," on every AI-assisted pull request. The person willing to put their name on it is the person who takes the 2 a.m. call, and the PRs no one will sign are your clearest signal of where the real risk lives.

Try This Next Week: One Line in the Pull Request

So how do we start? Small, iterative, incremental improvements. Pick one team, not the whole org. Before any AI-assisted code on that team gets merged this week, the developer adds one line to the pull request description naming the reviewer of record.

That's it. The name is the person who looked at the AI-generated output, understood what it does, and is willing to take the 2 a.m. call if it breaks. No tool to buy. No workshop to schedule. No 40-page policy to write. Just one line.

Here's what you'll probably learn in about seven days. You'll find out who actually feels comfortable being named on AI-assisted work. You'll find out which PRs nobody wants to put their name on, which is a signal in itself. And you'll start a real conversation about what reviewing AI output should look like, which is where the actual work begins.

If nobody is willing to be the reviewer of record on a piece of AI-assisted work, you don't have an AI-quality problem. You have a clarity problem. And clarity is the easier one to solve first.

You don't need a 50-page AI governance policy to start closing this gap. Nobody's going to read it or remember it anyway. What you need is to engineer accountability into the workflow: a single named human attached to every AI-generated change that actually ships. Everything else builds up from there.

Closing the Gap, One Named Human at a Time

The velocity AI brings is real, and it isn't going away. The question is whether your accountability keeps pace with your output, or quietly falls behind it. Start with one team, one line, one named human. Then build outward.

If you want to go deeper on building governance, leadership structures, and agile practices that hold up under real AI velocity, explore the classes and coaching we offer at big-agile.com. It's the same practical, systems-level approach I bring to the teams I work with, and it's built for leaders who'd rather design accountability on purpose than discover its absence in the security officer's office.