<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Rokkit200 Insights]]></title><description><![CDATA[Practical frameworks for structured AI adoption and platform engineering — from the team helping organisations move beyond tool curiosity to strategic advantage.]]></description><link>https://blog.rokkit200.co</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1768464881639/3235ad78-9344-439a-a018-64be2dd4151d.png</url><title>Rokkit200 Insights</title><link>https://blog.rokkit200.co</link></image><generator>RSS for Node</generator><lastBuildDate>Mon, 27 Apr 2026 00:01:42 GMT</lastBuildDate><atom:link href="https://blog.rokkit200.co/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Five levels of organisational AI maturity]]></title><description><![CDATA[A company I was talking to recently had rolled out hundreds of ChatGPT Enterprise accounts across the organisation. When they finally pulled the usage data, a handful of people were responsible for al]]></description><link>https://blog.rokkit200.co/five-levels-of-organisational-ai-maturity</link><guid isPermaLink="true">https://blog.rokkit200.co/five-levels-of-organisational-ai-maturity</guid><category><![CDATA[AI]]></category><category><![CDATA[ ai maturity]]></category><category><![CDATA[ai strategy]]></category><category><![CDATA[AI Governance]]></category><category><![CDATA[shadow AI]]></category><category><![CDATA[Enterprise AI]]></category><category><![CDATA[AI Transformation]]></category><category><![CDATA[al leadership]]></category><category><![CDATA[cto]]></category><category><![CDATA[engineering leadership]]></category><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[Security]]></category><dc:creator><![CDATA[Daniel van der Merwe]]></dc:creator><pubDate>Sun, 26 Apr 2026 15:02:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6967aca3003c4ee297dd2aad/c84dc0e6-70a8-4bea-af64-d233a718fac1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A company I was talking to recently had rolled out hundreds of ChatGPT Enterprise accounts across the organisation. When they finally pulled the usage data, a handful of people were responsible for almost all the meaningful activity. A few others were using it for email drafts. Most hadn't logged in since week two. That's enterprise licensing spend with single-digit utilisation, and nobody had noticed until someone thought to check.</p>
<p>What struck me wasn't the numbers. It was that the leadership team had no framework for thinking about what to do next. They'd bought the tools. The tools existed. Some people used them. Was this a success? Failure? An intermediate state? They couldn't tell.</p>
<p>I've been having this conversation, or versions of it, with engineering leaders and portfolio directors about once a week for the past few months. The tools change. The numbers change. The shape of the problem doesn't. We've had to work through this ourselves at Rokkit200, redesigning how we operate from the ground up, so when I describe these levels I'm not standing outside the problem.</p>
<p>Companies have access to AI. They don't have a way to think about what they're doing with it, where they actually are, and what it would take to get somewhere better. McKinsey's <a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai">State of AI survey</a> found that 88% of organisations now use AI in at least one business function. A separate <a href="https://www.mckinsey.com/capabilities/tech-and-ai/our-insights/superagency-in-the-workplace-empowering-people-to-unlock-ais-full-potential-at-work">McKinsey workplace study</a> found that only 1% of leaders consider their companies mature in AI deployment. That gap between "using AI" and "being good at AI" is the entire problem.</p>
<p>We call the framework the AI maturity ladder. Five levels of organisational AI maturity, measured not by what tools you've bought but by how your organisation absorbs and compounds what AI makes possible. Most companies we assess are at level one or two. Some have pockets of level three. Very few have reached four. Five is rare enough that it's worth being honest about how far away it is for almost everyone.</p>
<h2>The lone wolf problem</h2>
<p>There's a distinction that matters before we get into the levels, and it's one I see leaders trip over constantly.</p>
<p>Individual AI capability and organisational AI maturity are not the same thing. They get treated as though they are, because the people who report on AI progress inside a company are usually the same people who are personally good at it. The developer who built a 17-agent orchestration system on GitHub Copilot that does the planning, execution, and review of an entire feature. The teammate who rebuilt an internal chatbot from scratch on his own using Custom GPTs. These people are real; they exist in most companies of any size, and they create a dangerous illusion.</p>
<p>The individual axis is well understood. Most frameworks describe it as a progression from task assistance (using AI to draft an email) through workflow support (generating code, processing documents) to agentic work (handing AI an end-to-end task and reviewing the output). That progression is about what a person can do. It's important, but it's not what the ladder measures, because a company full of people at level three on the individual axis can still be at level one organisationally. I've seen it. More than once.</p>
<p>I'll call her Sarah, because I keep meeting her. At a round table a few weeks ago, someone called this the "lone wolf" phenomenon, and the label stuck because every engineering leader in the room recognised it instantly. Sarah is your best individual practitioner. She's operating at the agentic level personally, building impressive workflows, sharing what she knows, and presenting at all-hands. Leadership points to Sarah as evidence that "we're doing AI." But what happens when Sarah goes on leave for three weeks? Or when she moves to another company? You find out very quickly that what looked like organisational maturity was actually one person's individual capability, borrowed and relabelled. The company doesn't have an AI practice. It has a person. This is the distinction that matters when AI progress gets reported upward, because the board is usually hearing about the Sarahs, not the organisation.</p>
<p>The ladder measures the organisational axis. What the company does with the aggregate, not what any single person can do with the tools. That's where the real operational leverage sits, and it's where the real gap is.</p>
<h2>Level 1: Exploring</h2>
<p>This is where most companies start, and a surprising number stay longer than they'd admit.</p>
<p>The ground-level picture is familiar. People are experimenting with AI individually. Some are using ChatGPT for drafting, others have tried GitHub Copilot or Claude, and a few have built small automations. There is no policy, or there is a policy that exists as a PDF somewhere that nobody reads. IT security hasn't formally approved anything, but hasn't formally blocked anything either. The result is shadow AI. Employees using personal accounts, pasting company data into consumer tools, and building workflows that nobody else knows about and nobody has audited.</p>
<p>At one company we assessed recently, the engineering team had individual ChatGPT accounts, no team setup, and a collection of side projects that various people had started on weekends, but nobody had documented or reviewed. The phrase that stuck with me was "weekend warriors." Not as a criticism, but because it captures the dynamic precisely. Enthusiasm without structure, effort without accumulation.</p>
<p><strong>What you see at level one:</strong></p>
<ul>
<li><p>Individual experimentation with no shared playbook</p>
</li>
<li><p>Shadow AI across the organisation, creating compliance and IP risk nobody has sized</p>
</li>
<li><p>No usage data, so no way to know what's working</p>
</li>
<li><p>Leadership can't distinguish between "we're doing AI" and "some people are playing with AI"</p>
</li>
<li><p>AI is a conversation topic, not an operational reality</p>
</li>
</ul>
<p><strong>What it costs.</strong> The direct cost is small. The opportunity cost is enormous, because every month at level one is a month where the AI practice you could be building isn't compounding. And the risk cost is unknowable until something goes wrong, which is the worst kind of cost because it makes the status quo feel free. Earlier this year, a prompt injection hidden in a GitHub issue title <a href="https://grith.ai/blog/clinejection-when-your-ai-tool-installs-another">compromised 4,000 developer machines</a> through the Cline AI coding tool. The entry point was plain text that an AI bot misread as an instruction and executed with full system credentials. That's what ungoverned AI tooling looks like when it meets a real threat. And if an auditor asked today how your employees' AI usage is governed, most organisations at this level couldn't answer.</p>
<p><strong>What moving up requires.</strong> Someone with authority has to make a decision. Not about which tool to buy, but about whether AI is going to be a thing the company does on purpose. Choosing not to decide is itself a decision, and it has a cost even if nobody's measuring it.</p>
<h2>Level 2: Equipping</h2>
<p>Level two is where most companies think they are, and where a significant number actually land. Leadership has committed. There's a budget. Enterprise licences have been purchased. There might be a policy document. There is probably a Slack channel. There is definitely a vendor relationship.</p>
<p>And adoption looks like this. A small percentage of the company generating almost all the meaningful usage, a series of proofs of concept that were impressive as demos but never made it to production, and a quiet persistence of shadow AI because the sanctioned tools don't quite fit how people actually work. The policy exists, but it's a compliance artefact, not an operational guide. It tells people what they can't do. It doesn't tell them what they should do. McKinsey's data confirms this. Among the 88% of organisations that report using AI, <a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai">roughly two-thirds remain in experiment or pilot mode</a>. Only a third have begun to scale. A <a href="https://hbr.org/2025/11/most-ai-initiatives-fail-this-5-part-framework-can-help">Harvard Business Review analysis</a> describes the pattern as a "technology-first trap" where organisations deploy AI department by department without linking it to enterprise goals, producing technically successful implementations that never reach production.</p>
<p>I see this pattern everywhere. A company with forty-odd people across development, QA, and product, spread across a portfolio of products on a shared platform. They've bought GitHub Copilot, they've got ChatGPT Enterprise, a few people are using Claude, someone in marketing started using Gemini, and they've spent the better part of a year dabbling across tools with no shared model or governance. Some teams are further along than others, but there's no way to tell which ones or by how much, because nobody is measuring anything.</p>
<p><strong>What you see at level two:</strong></p>
<ul>
<li><p>Enterprise licences are in place, and adoption is happening in isolated pockets</p>
</li>
<li><p>Multiple proofs of concept, few production deployments</p>
</li>
<li><p>Little or no cross-team measurement or observability</p>
</li>
<li><p>Shadow AI persists alongside sanctioned tools</p>
</li>
<li><p>AI as a budget line item, not as a practice</p>
</li>
</ul>
<p><strong>What it costs.</strong> Level two looks like progress from the inside, which is why it's hard to leave. You've spent the money. People are using the tools. Reports go up that say "AI adoption in progress." But there's no compounding happening. What someone learns on one project stays on that project. The instruction files that make AI effective (the system prompts, the context documents, the workflow templates) either don't exist or exist in someone's personal setup and disappear when they leave. Every new project starts from scratch.</p>
<p>And in some companies, sanctioned tooling without the learning time, the operating framework, or the organisational redesign to support it has actively slowed teams down. You've introduced a new paradigm without giving people the space to make sense of it. And in most well-run companies, writing code was never the bottleneck anyway. It's the PR queues, the QA cycles, the feedback loops between teams. Speeding up one step in the system without addressing the rest just moves the pile-up downstream.</p>
<p>A recent <a href="https://hbr.org/2026/03/the-last-mile-problem-slowing-ai-transformation">Harvard Business Review article</a> calls this the "last mile" problem. The primary obstacle is rarely model quality or data availability, but the point where technical capability has to meet organisational design. Most companies never get past it.</p>
<p><strong>What moving up requires.</strong> An owner, a practice, and measurement. The first is a specific person or team whose job it is to build and maintain how the company works with AI, not just manage the vendor relationship. Someone with enough authority to set standards across teams and enough technical fluency to know whether those standards are being followed. The second is documented ways of working that are shared, taught, and iterated on. The third is a way to know whether any of this is actually working. Most companies get the first two in some form and skip measurement entirely. Without it, you're flying blind, and the distance between level two and level three stays imaginary.</p>
<h2>Level 3: Practicing</h2>
<p>This is where the productivity multiplier actually kicks in, because the things that matter at level three don't look like what most people expect.</p>
<p>At level three, AI becomes a practice rather than a collection of tools. What distinguishes this from level two isn't better tools or more licences. It's three things that are fundamental to how the organisation works. Instruction files treated as infrastructure. Explicit ownership of the practice. And cross-tool observability. McKinsey's data supports this. The organisations generating the strongest AI returns are <a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai">nearly three times more likely than others to have fundamentally redesigned their workflows</a>. The differentiator is organisational rewiring, not technical access.</p>
<p>Instruction files as source code means the system prompts, context documents, coding standards, and workflow definitions that make AI effective aren't sitting in someone's head or personal account. They're in version control. They're reviewed, updated, and shared across projects the same way you'd treat any other critical infrastructure. When a new team member joins a project, the AI context is part of the onboarding, not something they have to discover or reverse-engineer.</p>
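<p>As a rough sketch of what treating instruction files as source code can look like in a repository (the file names and layout below are illustrative, not a standard; most assistants have their own conventions, such as GitHub Copilot's repository-level custom instructions file):</p>
<pre><code>product-service/
├── src/
└── ai/
    ├── coding-standards.md    # conventions the assistant is asked to follow
    ├── domain-context.md      # product and domain knowledge, updated from retrospectives
    └── workflows/
        ├── feature.md         # how a feature moves from spec to reviewed pull request
        └── bug-fix.md         # how a fix feeds back into the files above
</code></pre>
<p>The specific layout matters far less than the property it illustrates: the files are reviewed in pull requests, versioned alongside the code they govern, and travel with the project rather than with the person who wrote them.</p>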
<p>Explicit ownership means someone is responsible for the AI practice across the organisation. Not in an advisory capacity. Not as a side-of-desk thing. They own the playbook, they run the retrospectives, they make sure learnings propagate. This isn't an AI committee that meets monthly. It's an operational role.</p>
<p>Cross-tool observability means you can see how AI is being used, where it's generating value, and where it's not. Not because you're policing usage, but because you can't improve what you can't measure, and level three is about improvement.</p>
<p>The proof points at this level are dramatic for the work that's amenable to it. In controlled experiments, the gains are already clear. ANZ Bank's <a href="https://arxiv.org/abs/2402.05636">GitHub Copilot trial</a> showed a 42% reduction in task completion time across 100 engineers. But individual task speedups only translate into organisational performance when the practice infrastructure is in place: instruction files, context documents, and feedback loops from previous projects that mean the next project doesn't start from scratch. The <a href="https://dora.dev/dora-report-2025/">2025 DORA report</a> confirms this. AI doesn't fix a team; it amplifies what's already there. Only teams with solid workflows and practices see the gains compound.</p>
<p>The macro data is starting to reflect this too. While most companies are still using AI for narrow tasks, a small cohort of power users are <a href="https://fortune.com/2026/02/15/ai-productivity-liftoff-doubling-2025-jobs-report-transition-harvest-phase-j-curve/">compressing weeks of work into hours</a> by automating end-to-end workstreams. That's what level three looks like from the outside. In financial terms, that's weeks of senior capacity freed up per quarter, not through headcount changes but through removing the low-leverage work that was consuming it.</p>
<p><strong>What you see at level three:</strong></p>
<ul>
<li><p>Instruction files version-controlled and shared across projects</p>
</li>
<li><p>A dedicated owner (or owners) for the AI practice</p>
</li>
<li><p>Measurement in place, so you know what's working and what isn't</p>
</li>
<li><p>New team members onboarded into the AI practice, not just the tools</p>
</li>
<li><p>Measurable team-level productivity gains, with metrics defined per role</p>
</li>
</ul>
<p><strong>What it costs.</strong> Coordination overhead is real. Maintaining shared instruction files, running retrospectives, and propagating learnings across teams. This is work that didn't exist before, and it needs time and attention. If the instruction files and practices aren't maintained, they calcify, and you end up with a level-three structure generating level-two results.</p>
<p><strong>What moving up requires.</strong> Feedback loops. Not feedback as in "we do a retrospective." Feedback as a system. A bug traced back to a spec gap triggers an update to an instruction file, which prevents the same bug from appearing on the next project. The practice doesn't just exist; it improves itself. That's the transition from level three to level four, and it's a harder one than it sounds. It also requires governance that's actually operational, not just a policy document. Data classification, usage policies, audit trails. These aren't nice-to-haves at level three. They're what makes the practice safe to scale. Without them, level four's compounding amplifies risk as fast as it amplifies value.</p>
<h2>Level 4: Compounding</h2>
<p>Level four is where the AI practice stops being something the company maintains and starts being something that maintains itself. Or more precisely, where the output of using AI feeds back into the inputs, so the next cycle is better than the last, without anyone having to manually make it so.</p>
<p>In practice, this means instruction files that improve over quarters because retrospectives feed findings directly into them. Patterns that propagate across projects because the system surfaces what worked and pushes it out, not because someone remembered to share it in a Slack channel. Institutional memory forming in code and documentation rather than in people's heads. A concrete example from our own work: a production bug traced back to domain knowledge that existed only in the heads of two developers. The kind of thing that, in the past, you'd fix and move on from, because nobody reads internal documentation. Now, that fix goes into an instruction file that the AI reads on every future task. The person who knew it doesn't need to be in the room any more. The knowledge is in the system. Scale that across every project, every quarter, and the baseline the next team starts from is meaningfully higher than the one before.</p>
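<p>To make the mechanism concrete, here's a hypothetical, simplified excerpt of what such an entry can look like once the knowledge is written down where the AI (and the next developer) will actually read it; the file name and rules are invented for illustration:</p>
<pre><code>ai/domain-context.md (illustrative excerpt)

## Billing rules (added after the Q2 incident retrospective)

- Credit notes recalculate the invoice total at read time;
  never cache the total on the order record.
- Changes to rounding behaviour need finance sign-off before
  they ship; link the approval in the pull request.
</code></pre>
<p>Nothing about the entry is sophisticated. What makes it compound is that the feedback loop produced it, rather than someone's goodwill, and that every future task starts with it already in context.</p>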
<p>This is rare. Most companies that claim level four are actually running a good level three with occasional flashes of compounding. The difference is whether the feedback loop is structural (it happens because the system is designed that way) or incidental (it happens when someone remembers or has time).</p>
<p>And some companies mistake something else entirely for compounding. At one company, a developer had built an internal AI tool on his own initiative. It worked well in demos. Leadership was excited. But when we asked about test coverage, edge case handling, and what happens when that developer is unavailable, the answers got thin quickly. Nobody had stress-tested it. Nobody had documented it. Nobody else understood how it worked. That's not level four. That's level two with ambition.</p>
<p>The reason it's rare is partly technical and partly cultural. The technical part is that you need an observability infrastructure that most companies haven't built. A way to trace outcomes back through the AI-assisted process to the instruction files and practices that shaped those outcomes. The cultural part is that compounding requires a kind of discipline that looks different from the discipline of shipping. It requires people to slow down after a success and ask, "What made this work, and how do we make sure it keeps working?" Most engineering cultures reward the shipping part and treat the reflection part as optional.</p>
<p><strong>What you see at level four:</strong></p>
<ul>
<li><p>Instruction files evolving based on production outcomes, not just opinions</p>
</li>
<li><p>Patterns propagating across teams without manual effort</p>
</li>
<li><p>Observability connecting AI usage to business outcomes</p>
</li>
<li><p>The practice improving quarter over quarter, with measurable evidence</p>
</li>
<li><p>New projects starting at a higher baseline than the last ones</p>
</li>
</ul>
<p><strong>What it costs.</strong> A different kind of discipline. The investment isn't primarily in tools or even in time. It's in building and maintaining the feedback infrastructure, and in a cultural commitment to treating the AI practice as a system that compounds rather than a set of tools that depreciates.</p>
<p><strong>What moving up requires.</strong> A strategic reframe. At levels one through four, AI is an operational capability. It makes the company better at doing what it already does. Level five requires asking a different question entirely.</p>
<h2>Level 5: Directing</h2>
<p>I'll be straight with you. I haven't seen a company operating consistently at this level. This is where the ladder points, not where anyone has planted a flag.</p>
<p>Every level up to this point has had AI as the subject of the question. Should we do it? Why isn't it working? How do we make it stick? How does it keep getting better? At level five, the question changes.</p>
<p>It becomes "what's possible now that wasn't before?" Notice that AI isn't in the sentence. It dropped out, not because it stopped mattering but because it became assumed. Up to level four, the focus is on building a system that improves itself. At level five, the focus shifts to where that system is applied. The organisation is no longer limited by how well it can execute. It's limited by what it chooses to execute.</p>
<p>Work that used to be too risky becomes viable, because the cost of being wrong is lower. Work that used to be too small becomes worth doing, because it can be executed efficiently. Entire categories of projects open up, not because the market changed, but because the company's capability did. Ideas move from conversation to execution fast enough that the feedback becomes the strategy.</p>
<p>There's a board meeting I keep imagining when I think about what this looks like in practice. The agenda covers three things. A market opportunity in a new vertical. The product expansion planned for Q3. Integration priorities from the acquisition that closed in January. AI isn't on the agenda. Not because the board is unaware of it, but because asking about the AI strategy at this point would feel like asking about the electricity strategy. It's infrastructure. It's assumed. The conversation is about what the infrastructure makes possible, not about the infrastructure itself.</p>
<p>And that's the honest promise of this entire framework. Not just that AI makes you more efficient at the thing you were already doing. Not that you save headcount or trim a budget line. The promise is that the question in your rooms changes. From "should we?" to "why isn't it working?" to "how do we make this stick?" to "how does it keep improving?" to, finally, "what's possible now that wasn't before?"</p>
<p>That last question is worth working towards.</p>
<h2>Where does this leave you?</h2>
<p>The distribution in the market, from what I've seen, is heavy at levels one and two. Most companies that claim to be at three are actually at two with good intentions. Level four is aspirational for most. Level five is more of a direction than a destination right now.</p>
<p>Here's what I think matters most. The two-to-three transition is the most consequential move of the next three years. It's where the productivity gains materialise, where shadow AI gets resolved, and where the AI investment starts compounding instead of depreciating. Companies that make this transition in the next 12 to 18 months will find the value accumulating faster than they expected, because compounding is like that. The earlier it starts, the more dramatic it gets.</p>
<p>The three-to-four transition is where long-term value lives. It's harder, it takes longer, and most companies aren't ready for it yet. But the companies that build genuine feedback loops into their AI practices will eventually find themselves in a position that's very difficult to replicate, because you can't buy a compounding system off the shelf. You have to build it from accumulated practice.</p>
<p>None of this is glamorous. Governance, instruction files, measurement, feedback loops, giving people actual time to learn. The foundations are boring and they're what everything else is built on. If you skip them to chase the flashy stuff, levels three and four will collapse under their own weight.</p>
<p>If you're trying to figure out where your organisation sits on this ladder, I'd suggest starting with three questions:</p>
<ul>
<li><p>What happens to your AI capability if your two strongest practitioners leave tomorrow?</p>
</li>
<li><p>Can a new team member onboard into your AI practice, or just into the tools?</p>
</li>
<li><p>And is anyone measuring whether any of this is actually working?</p>
</li>
</ul>
<p>The answers will tell you more than any self-assessment, and in almost every conversation I've had, they land about one level lower than the person expected.</p>
<hr />
<p><em>Daniel van der Merwe is the Technical Director of</em> <a href="https://rokkit200.co"><em>Rokkit200</em></a><em>, an AI transformation agency that works with engineering and product organisations to build compounding AI practices.</em></p>
<p><em>Header image designed with ChatGPT.</em></p>
]]></content:encoded></item></channel></rss>