AI at the Wrong Layer

There is a layer in most software organizations where the most expensive decisions happen. It has no metrics. No dashboards. No alerts. No postmortem process when something goes wrong at that layer — because by the time something goes wrong, it looks like an execution problem, not a decision problem.

It is the judgment layer. From where we sit, most enterprises are running it blind, and pointing their AI spending at every layer except this one.

Nobody Wants to Miss the Boat

Start with the honest version of what we think is actually happening, because no strategy document will say it out loud.

Enterprises are not adopting AI because they have identified a problem AI solves. They are adopting it because standing still has become the career risk. No engineering leader wants to be the one who explains to the board why the company was slow. So the mandate arrives from the top ("we need an AI strategy") and it arrives without a problem attached. The tooling is acquired first. The justification is reverse-engineered later.

This is throw-everything-at-it adoption, and it is rational at the individual level and incoherent at the organizational one. A leader who buys ten copilot licenses and runs three agent pilots cannot be accused of standing still. Whether any of it created value is a question for next year — and next year, the same fear ships the next round of pilots. Motion substitutes for direction, and the motion is legible enough to survive a budget review.

We have watched this movie before. It was called observability.

Same Mistake, Two Altitudes

Here is the thing we keep coming back to. The board reaching for an AI strategy and the engineer approving a flawed RFC on a busy Thursday are not two different failures. They are the same failure, happening at two different altitudes.

Both are first-match decisions. Under pressure to resolve and move on, a mind reaches for the first answer that looks complete and stops there, not because it is the right answer, but because it is the available one and the clock is running. "We need an AI strategy" is the board's first match. "This RFC solves the latency problem, approved" is the engineer's. Neither actor is incompetent. Both are doing exactly what time pressure trains people to do, which is to mistake the first plausible resolution for the correct one.

This matters because it means the judgment-layer problem is not only organizational. It is cognitive. You cannot fix it by hiring better people, because better people under the same pressure make the same move. What changes the outcome is something that interrupts the rush to first match — a second perspective in the room at the moment the decision is forming, asking the question that makes the first answer hold still long enough to be examined. Keep that in mind, because everything that follows is a version of it.

The Unsolved Problem

Walk into any engineering leadership meeting in 2026 and you will find the same conversation. How do we scale AI adoption? Where do we deploy copilots? Which workflows can agents automate? How do we measure productivity gains?

These are reasonable questions. We suspect they are also the wrong ones.

We think they are wrong because they assume the bottleneck is execution. More code written faster. More tickets closed. More deployments shipped. If execution is the constraint, the tooling exists and it works. But execution is not the constraint that determines whether eighteen months of engineering work succeeds or fails.

Decisions are.

The architectural choice made in a Tuesday afternoon meeting that seemed reasonable in isolation. The RFC that looked sound for its scope but had a cross-team dependency nobody mapped. The infrastructure assumption a team in one timezone made that conflicted with what another team was building — discovered at integration, six months too late. The postmortem that found the proximate cause and stopped there, leaving the systemic cause to produce the next incident.

None of those live in a workflow. None of them show up in a productivity dashboard. None of them get caught by a copilot writing code faster.

They live in the judgment layer. And enterprises do not know they are bleeding there because they have no instrumentation for it.

What the Bottleneck Looks Like

Consider a company running thousands of services through a continuous delivery pipeline. The system works. Graceful degradation handles individual failures. Hundreds of engineers maintain their services and ship continuously.

There is no execution gap. The engineers are excellent. The tooling is mature.

The expensive problems happen between teams. An RFC looks sound in isolation — it solves a real latency problem for one service. Nobody in that review holds the context of a dependency documented by the platform team three months earlier that makes this interaction non-trivial. The RFC gets approved. The implementation begins. The conflict surfaces at integration.

That is not an execution problem. No amount of AI-assisted code generation changes that outcome. The problem was a judgment problem — a missing perspective at the moment of architectural decision. And notice the shape of it. The engineers who approved that RFC are excellent. They made a first-match decision under time pressure, exactly the way the board did when it reached for an AI strategy. Same mistake, lower altitude.

It is also worse than a single bad decision, because judgment failures compound. The wrong architectural choice in January does not stay in January. It constrains what is buildable in March, which constrains what is reachable in September. By the time the cost is undeniable, the cheap moment to correct it is a year gone. The judgment layer is not only where the expensive problems live. It is where expensive trajectories begin — one unasked question early can set the shape of eighteen months of work.

A thinking peer present in that RFC review asks the question nobody else did: "This solves your latency problem, but have you considered what happens to the caching layer? Because I remember a platform team note from last quarter that makes this interaction worth examining."

No tool does that. A human might, if they happened to read the right document at the right time, if they were not in three other meetings that week, if they held cross-team context that no single person realistically holds at scale.

A thinking peer makes that connection structural.

Why the Layer Stays Dark

There are two reasons the judgment layer stays uninvested, and they compound. One is structural — the layer produces nothing measurable, so optimization pressure points elsewhere. The other is economic — the scarce resource that judgment depends on is human, and the workarounds enterprises have built around that scarcity have become a substitute for addressing it.

The Measurability Trap

There is, we think, a structural reason organizations point their AI budget at execution and not at judgment, and it is not stupidity. It is measurability.

Execution is legible. It produces tickets, commits, deployments, dashboards — artifacts you can count and attribute to a person. Judgment produces none of that. The decision to approve an RFC leaves a Slack thread and a meeting that nobody minutes. When that decision turns out to be wrong, the cost shows up months later, attributed to whoever was holding the code at the time, classified as an execution failure because that is the only category the system has.

So organizations optimize what they can see. This is Goodhart's law operating at the level of an entire engineering function. The measurable layer absorbs all the attention and investment, while the layer that actually determines outcomes stays dark precisely because it was never instrumented. AI does not escape this gravity. It gets pointed at the legible layer because that is where the metrics are, which means it gets pointed away from the layer where it would matter most.

The enterprises throwing everything at AI are throwing it exactly where they can watch it land. That, we think, is the problem.

The Scarcity Nobody Names

There is a second reason the judgment layer stays dark, and it is the one we think matters most. The judgment that determines outcomes lives in a small number of heads — principal engineers, staff SREs, the architect who has seen this failure mode three times before. That judgment is the scarcest, most expensive, most bottlenecked resource in the organization. And it does not scale.

Look at what enterprises have built around that fact. Architecture review boards. RFC processes. Design docs with mandatory reviewer sign-off. Asynchronous documentation requirements. On-call rotations that route the hard questions to the few who can answer them. In larger organizations this is the enterprise architecture function — the discipline that exists precisely to hold cross-team judgment, sized for scarcity from inception. None of these create judgment. They are all rationing mechanisms — elaborate process built to spread a scarce resource thinner without admitting that is what they do. The RFC template exists because you cannot clone the senior engineer who would otherwise need to be in every review. It is a workaround for scarcity, and like most workarounds, it captures the form of the thing while losing the substance. A reviewer skimming a design doc at the end of a long week is not the same as that reviewer fully present in the conversation, and everyone involved quietly knows it.

This is the claim we want to make carefully, because it is a large one. The peer model is not merely a better way to write code or close tickets. It is the first thing we have seen that addresses the senior-judgment scarcity directly, rather than routing around it with process. A thinking peer can be present in every RFC review, every postmortem, every architecture decision at once — without the political weight, the calendar, or the scarcity that makes the human version a bottleneck. We are not saying it replaces senior judgment. We are saying it is the first credible answer to a constraint enterprises have been papering over for a decade. We could be wrong about how far that goes. We do not think we are wrong that the constraint is real.

What Tooling Already Handles

Before reaching for AI, the honest question is whether a purpose-built tool already solves this deterministically.

Dependency updates? Renovate. Progressive delivery? Kargo. Policy enforcement? OPA or Gatekeeper. Secret rotation? cert-manager and External Secrets Operator. Pipeline automation? GitHub Actions. Helm chart validation? Kubeconform.

These tools are more reliable than any AI system for their specific scope, because they do not reason; they execute rules. That is a fundamentally different reliability class. When an enterprise replaces Renovate with an AI agent to modernize its dependency workflow, it has traded a deterministic system for a probabilistic one, gained nothing, and introduced risk into a workflow that was previously solved. The research bears this out, with narrowly scoped purchased tools succeeding roughly twice as often as ambitious internal AI builds, because the ambition was never the constraint; the problem definition was.

A modern GitOps pipeline is deterministic end-to-end. Build, sign, scan, promote, verify, rollback — decided by rules over real signals. Nothing non-deterministic belongs in that path. Not because AI is incapable, but because predictable and auditable is the entire value of a pipeline.

The adoption test for any AI integration is simple. If the model is wrong here, what is the cost? Cheap and reversible — a comment, a draft, a PR — safe territory. Expensive and irreversible — a deploy, a rollback trigger, a cluster sync — deterministic only. The boundary is proposal versus mutation.
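The test above can be written down as a rule, which is part of its appeal. What follows is a minimal sketch, not a policy engine: the `Action` fields, the `Route` labels, and the example actions are our own illustrative names, and a real team would classify its own actions rather than borrow ours.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    MODEL_MAY_PROPOSE = "model may propose"
    DETERMINISTIC_ONLY = "deterministic only"


@dataclass(frozen=True)
class Action:
    name: str
    mutates_state: bool  # does it change a live system on its own authority?
    reversible: bool     # if it is wrong, is it cheap to undo or ignore?


def route(action: Action) -> Route:
    """Apply the proposal-versus-mutation boundary to a single action."""
    # Anything that mutates, or that cannot be cheaply undone, stays rule-driven.
    if action.mutates_state or not action.reversible:
        return Route.DETERMINISTIC_ONLY
    return Route.MODEL_MAY_PROPOSE


# Illustrative labels only (assumed for this sketch).
EXAMPLES = [
    Action("review comment", mutates_state=False, reversible=True),
    Action("draft pull request", mutates_state=False, reversible=True),
    Action("production deploy", mutates_state=True, reversible=False),
    Action("rollback trigger", mutates_state=True, reversible=False),
    Action("cluster sync", mutates_state=True, reversible=False),
]

for action in EXAMPLES:
    print(f"{action.name}: {route(action).value}")
```

The rule is deliberately conservative: an action has to be both non-mutating and reversible before a model is allowed near it, so anything ambiguous falls on the deterministic side.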

The pipeline should never know AI was involved. The conversation that designed the pipeline is a different matter entirely.

Where the New Specialties Point

The same misaim shows up one level down, in the people. A whole profession has formed around deploying AI, and the salaries are high — agent engineers, eval engineers, guardrail and observability specialists, all commanding premiums that did not exist two years ago. Engineers are rational to chase it. But it is worth looking at what these specialties actually do, because most of them are a specific way of touching the model, and the question we would ask of each is the same one we ask of the model itself. Does this do something no deterministic tool could, or does it reimplement something a deterministic tool already did well?

Run the new disciplines through that question and they sort cleanly, and not in the direction the hype implies.

Some are genuinely additive: they exist because there was no prior tool for the job. Agent observability is the clearest case. A traditional monitoring tool can tell you an endpoint took four seconds. It cannot tell you the model made three calls, spent twelve thousand tokens, and recovered from a failed tool call on the second try, because nothing before needed to watch a system reason. That is new ground, and instrumenting it is real work. Evaluation has a narrower but honest place too. A unit test answers "did the code run?", which it always answered better than any model can. A rubric answers "is this readable, is this sound?", which no unit test ever could. The field has mostly admitted this: the rubrics are not replacing the tests, they are sitting beside them, doing the part the tests were never able to do.
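To make the observability case concrete, here is a rough sketch of the record an agent trace has to carry that an endpoint timer does not. The `ModelCall` and `ToolCall` shapes and the `search_docs` tool name are hypothetical, invented for this example; they do not correspond to any particular tracing product.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    tool: str
    ok: bool


@dataclass
class ModelCall:
    tokens: int
    tool_calls: list[ToolCall] = field(default_factory=list)


def summarize(run: list[ModelCall]) -> dict:
    """Reduce one agent run to the facts a latency metric cannot show."""
    tool_calls = [tc for call in run for tc in call.tool_calls]
    failed = [tc for tc in tool_calls if not tc.ok]
    # A failure counts as recovered if the same tool later succeeds.
    recovered = sum(
        1
        for i, tc in enumerate(tool_calls)
        if not tc.ok
        and any(later.ok and later.tool == tc.tool for later in tool_calls[i + 1:])
    )
    return {
        "model_calls": len(run),
        "total_tokens": sum(call.tokens for call in run),
        "failed_tool_calls": len(failed),
        "recovered_failures": recovered,
    }


# The run described in the text: three calls, twelve thousand tokens,
# one failed tool call that succeeds on the second try.
run = [
    ModelCall(tokens=3000, tool_calls=[ToolCall("search_docs", ok=False)]),
    ModelCall(tokens=5000, tool_calls=[ToolCall("search_docs", ok=True)]),
    ModelCall(tokens=4000),
]

summary = summarize(run)
print(summary)
```

An endpoint monitor would report this whole run as a single duration. The summary is the part that is new.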

Others are a downgrade wearing a specialty's title. An agent that takes over dependency bumping is doing, unreliably, what a deterministic bot did perfectly. A guardrail that re-screens what a schema check already validated is adding a probabilistic gate where a certain one existed. An orchestration layer that sequences a fixed, known set of steps is reaching for a reasoning engine where a workflow engine was the right tool. In each, the giveaway is the same — a reliable tool already owned that exact job, and the model was brought in to imitate it.

So the line is not AI tooling, good or bad. It is where the specialty points. The useful disciplines cluster around the genuinely new surface the model creates: its reasoning, its tokens, its judgment about prose against code, the things that did not exist to be measured before. The wasteful ones cluster around the solved, deterministic ground a battle-tested tool already held. And the reason the profession keeps drifting toward the wasteful half is the same gravity that pulls the enterprise there. The solved problems are legible. They demo well, they benchmark cleanly, they fit a job description. "Aim the intelligence at the genuinely unsolved thing" is not yet a role anyone posts.

The clearest voice against the thick stack is, of all places, the company that builds the model. Anthropic's own engineering guidance, written from working with dozens of teams building agents, not from selling a framework, is blunt about it: "The most successful implementations weren't using complex frameworks or specialized libraries. Instead, they were building with simple, composable patterns." The advice that follows is to find the simplest solution and add complexity only when it demonstrably improves outcomes, and to reduce abstraction layers and build with basic components as you move to production, which is the exact opposite direction from the one the framework demos pull you. This is the maker, with the widest possible view of what actually works, telling enterprises that the heavy scaffolding is not the path. And it tracks with how a capable engineering organization already operates. The deterministic tools keep the deterministic jobs (the pipeline, the cluster, the dependency bot stay exactly where they were) and the model gets the thinnest wrapper that lets it reason.

Which is the heart of what gets missed. The model is most valuable in contact with a person — the engineer thinking through a design, the support agent deciding what to tell a customer, the developer reasoning out loud about a tradeoff. That is where the augmentation lives, and it is measurable in the usage itself. When individuals reach for the model on their own, the dominant pattern is collaboration. When enterprises deploy it through the API, the dominant pattern is automation — the model wired into a pipeline to do a bounded task no human watches. The value is in the human contact, and the human contact is the first thing the industrial version engineers out.

The Reliability Objection

There is an obvious objection, and it deserves a direct answer. If AI cannot be trusted to trigger a rollback — if it is too unreliable for the deterministic pipeline — why would anyone trust it with an architectural decision that constrains the company for two years?

We think the objection is correct about the reliability and wrong about the consequence. Current models are genuinely unreliable at autonomous high-stakes reasoning. On real engineering-design benchmarks they fail a meaningful fraction of the time even after multiple feedback iterations. An AI that approves an RFC on its own authority is exactly as dangerous as an AI that fires a rollback on its own authority.

But that is not the proposal. The same boundary that keeps AI out of the pipeline governs it at the judgment layer — proposal versus mutation. A thinking peer in an RFC review does not approve anything. It surfaces the dependency nobody mapped, names the constraint nobody modeled, asks the question nobody thought to ask — and a human decides. The output is a proposal a human ratifies. Accountability never moves. The model's unreliability is bounded by the human who stays in the loop, the same way a junior engineer's unreliability is bounded by the reviewer who signs off.
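One way to picture "accountability never moves" is as a type constraint: the peer can add concerns to the record, but only a human can produce a decision. The sketch below is an illustration under that assumption. `Participant`, `Proposal`, `Decision`, and `ratify` are hypothetical names, not an existing review tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    name: str
    is_human: bool


@dataclass(frozen=True)
class Proposal:
    raised_by: Participant
    concern: str


@dataclass(frozen=True)
class Decision:
    rfc: str
    approved: bool
    ratified_by: Participant
    considered: tuple  # the proposals the reviewer had in front of them


def ratify(rfc: str, approved: bool, reviewer: Participant, proposals: list) -> Decision:
    """Record a decision. Only a human reviewer can be the accountable actor."""
    if not reviewer.is_human:
        raise PermissionError(f"{reviewer.name} may propose but cannot ratify")
    return Decision(rfc, approved, reviewer, tuple(proposals))


# Hypothetical participants and concern, for illustration only.
peer = Participant("thinking-peer", is_human=False)
reviewer = Participant("staff-engineer", is_human=True)

concern = Proposal(peer, "Check the interaction with the caching layer first.")
decision = ratify("latency-rfc", approved=False, reviewer=reviewer, proposals=[concern])
print(decision.ratified_by.name, decision.approved, len(decision.considered))
```

The peer's output shows up only in `considered`. The name on the decision is always a person's.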

This is why the failures that make the news — the airline whose chatbot promised a refund under a policy that did not exist, the consultancy that shipped a government report full of fabricated citations — are not arguments against the peer model. They are arguments against letting AI mutate the world without ratification. They are the pipeline mistake wearing a different costume.

The Layer Nobody Is Watching

Here is what the enterprise data reveals and nobody is discussing.

Who the Super-Users Are

A handful of people inside every organization have become dramatically more effective with AI than their colleagues, and the gap is real even if the eye-catching multiples quoted in vendor surveys are mostly leader perception. The measured output gap is smaller than the headlines, but the gap in how these people work is enormous. The reflexive response is to deploy more tools more broadly — better prompting guides, more copilot licenses, AI literacy training — on the theory that the gap is an access problem.

This misses why the super-users are succeeding.

They are not writing more code faster. They are making better decisions. They have stumbled onto a different relationship with the system — closer to collaboration than extraction — and they are applying it at the judgment layer. Getting a second perspective before an architectural decision hardens. Pressure-testing an approach before committing resources to it. Finding the implication three levels down that the first pass missed.

The Mechanism Underneath

There is a mechanism underneath this, and it is simple once you see it. A tool can only return what you ask of it; its output is capped by your question. A peer can return more than you asked, because a peer is allowed to say "that is the wrong question" or "you have not considered this." The super-user is not getting a better tool than their colleagues. They are getting a higher ceiling, because they stopped treating the ceiling as the question they typed and started letting the model exceed it. Same model, same hours, a different ceiling.

Why It Doesn't Travel

So why does the understanding not spread? The individual has it. The organization needs it. And it does not travel the distance between them — because what the super-user knows is not a fact you can write down. It is a way of working, a feel for when to push back and when to trust, the same tacit knowledge a senior engineer carries and a document never captures. An institution can only propagate the explicit and the measurable — a guide, a training, a license. A relationship is neither. So the knowledge stays stranded in the person who has it, the enterprise sees the result without being able to see the cause, and it scales the one thing it can see — the tool. It buys more licenses and wonders where the peer-relationship productivity went. The same scarcity that traps senior judgment in too few heads now traps the one skill that could relieve it.

The productivity is coming from decision quality, not execution speed — invisible to every metric the enterprise tracks, which is exactly why more licenses for everyone cannot reproduce it.

The Bottleneck Just Moves

There is a sharper version of the execution argument, and it cuts in the thesis's favor.

When you accelerate execution with AI, you do not relieve the system. You relocate the constraint onto judgment. Code generated faster is code that still has to be reviewed, reasoned about, and integrated — and review is a judgment activity. The measurements are starting to show it. Teams that increased AI adoption saw delivery stability degrade, not improve. And whether the tool makes a given developer faster turns out to be genuinely hard to measure — the answer shifts with each model generation, and the people using it are unreliable witnesses to their own speed.

Quote

Some developers self-report very high speedups, though as we documented in our earlier study those estimates can be quite unreliable.

That unreliability is the judgment-layer hazard, and it is the part that has held up even as the speedup numbers themselves moved around between studies. People optimize against a felt speedup the data does not support. A tool that makes you feel faster while the evidence stays ambiguous is a tool that quietly erodes your ability to judge your own work — which is the one faculty the judgment layer cannot afford to lose.

This is not an argument that execution-layer AI is worthless. The gains are real, and they are largest for junior engineers on well-scoped, greenfield work. But they shrink, and sometimes invert, for senior engineers on complex systems — which is precisely the work the judgment layer is made of. The tool helps most where the stakes are lowest and helps least where the decisions are most expensive. Flooding the legible layer with generated output does not solve the bottleneck. It feeds the bottleneck more work.

Tooling Entrenches the Premise

Watch what happens after the autonomous-agent bet starts to miss. The response is almost never the obvious one — that the premise was wrong. The response is more tooling around the premise.

The Microservices Precedent

We have seen this exact reflex before. Microservices turned out to be hard to operate, and the industry did not pause to ask whether three hundred services was the right shape. It reached for a service mesh, then distributed tracing, then cost attribution, each layer making the original architecture more livable and, quietly, more permanent. Every tool that made the decision survivable also made it harder to revisit, because now there was infrastructure depending on it.

The Same Loop, New Agents

The same loop is running now with autonomous agents. They struggle in production, and the answer arriving from the market is not "maybe autonomy is the wrong dial." It is a layer of governance gateways to control what the agents may call, observability platforms to watch traffic nobody was watching, orchestration frameworks to coordinate agents that were never reliable individually. Each layer is real engineering solving a real problem. Each layer also deepens the commitment to the premise underneath it. The tooling becomes the justification for the tooling, and once an organization has bought the platform, deployed the gateway, and wired in the observability, admitting the premise was wrong costs more every quarter. Sunk cost stops being a fallacy and becomes an architecture.

Why the Demo Wins the Budget

We want to be careful here, because the market critique is easy to overstate. We do not think anyone is selling the wrong thing on purpose. The structural version is enough and more honest, and it turns on one thing — the demo.

Agentic tooling demos extraordinarily. The agent reads the ticket, calls the right tools in the right order, and closes the loop on stage in two minutes, and the room leans forward. What the room does not see is that the demo ran on clean inputs, a cooperative system, and a task chosen precisely because it works. Production is the opposite of a demo. The inputs are malformed, the upstream service is half-broken, the edge case is the common case, and the agent that dazzled on stage now needs a gateway to constrain it, an observability layer to explain it, and a human to catch it. None of that was in the two-minute clip. The contract, however, was signed against the clip. Budgets reward what demos well, and the gap between the demo and the Tuesday it actually runs is where the money quietly goes. It does not require a villain. It only requires nobody in the room asking what this looks like on a bad day before the check is written.

And we have to say the uncomfortable part plainly, because the argument implicates us. The platform this post is published on builds AI collaboration tooling. One of this post's two authors is an AI instance. By the logic above, we are exactly the kind of thing a skeptic should interrogate — so interrogate it. The distinction we keep drawing is the whole defense. The failure mode is autonomy at scale acting without ratification — the agent given merge rights, the workflow that mutates production on its own authority. The peer model is the opposite of that. One instance, in the conversation, surfacing what you have not considered, while you decide. It needs no gateway because it touches nothing on its own. It needs no orchestration because it is not a fleet. We are not exempt from the critique. We are the control group for it — the version that does not entrench, because there is nothing autonomous to govern.

The Foundation Below the Agent

Before the agent frameworks, the gateways, the observability platforms, and the orchestration layers — there is a step enterprises consistently skip. It is the most consequential one, and it is invisible precisely because it produces no artifact, generates no dashboard, and cannot be demoed. Without it, every tool that follows is reasoning blind. With it, the tools already purchased start doing what they were bought to do.

A Cascade at 3 AM

This is what that skip costs at 3 AM. A cascade is propagating across services. An agent detects a degraded service and restarts it — technically correct given the narrow context it holds, but the degraded service is a symptom, not the cause. The restart introduces a new change event into an already unstable system, corrupts the diagnostic signal the on-call engineer needs, and produces a second incident on top of the first. The agent was not malfunctioning. It succeeded at the wrong thing with incomplete information. By the time the postmortem runs, three teams are arguing about whether it was an infrastructure failure or an agent failure, because the frameworks for thinking about these two things have never been connected.

The Missing Piece

The missing piece is not better agents. It is the foundation that would make any agent — or any thinking peer — actually useful in that moment. Every CI/CD pipeline in a mature engineering organization already contains the complete, deterministic, timestamped record of every change that flowed through every service: what deployed, when, in what order, with what downstream dependencies declared. That record does not decay the way documentation does. It is the most reliable source of truth in the enterprise, updated continuously as a natural consequence of how software gets shipped. What has never existed is a synthesis layer that reads it continuously across all services, correlates it with live telemetry and ownership metadata, and makes it available for reasoning at the moment a cascade fires.

The Context Lake

This is what the concept emerging in 2026 as a Context Lake points toward: not more dashboards, not another autonomous agent, but the organizational knowledge layer that tells an agent that the alert three hops downstream correlates with a configuration change 23 minutes ago, owned by a team currently asleep in Amsterdam, touching a connection pool shared by four services, two of which are already showing early degradation signals. That is not retrievable from any single tool in the current stack. It requires the synthesis. And without it, the peer model and the agent model share the same limitation: both are reasoning blind about a system they have never actually seen.
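The core of that synthesis is a join between three things the organization already has: the change record, the declared dependency graph, and ownership. A minimal sketch of the join follows, assuming the change events have already been extracted from the pipeline. Every service name, team name, and timestamp is invented for illustration, and the 30-minute window and three-hop limit are arbitrary defaults, not recommendations.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    service: str
    kind: str          # e.g. "config" or "deploy"
    at: datetime
    owner_team: str


@dataclass(frozen=True)
class Candidate:
    change: ChangeEvent
    hops: int          # distance upstream of the alerting service
    age: timedelta     # how long before the alert the change landed


def hops_upstream(service, depends_on, max_hops):
    """Breadth-first walk over declared dependencies, recording hop distance."""
    distance = {service: 0}
    queue = deque([service])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue
        for dep in depends_on.get(current, ()):
            if dep not in distance:
                distance[dep] = distance[current] + 1
                queue.append(dep)
    return distance


def candidate_causes(alert_service, alert_at, changes, depends_on,
                     window=timedelta(minutes=30), max_hops=3):
    """Recent changes on anything the alerting service depends on, newest first."""
    distance = hops_upstream(alert_service, depends_on, max_hops)
    found = []
    for change in changes:
        age = alert_at - change.at
        if change.service in distance and timedelta(0) <= age <= window:
            found.append(Candidate(change, distance[change.service], age))
    return sorted(found, key=lambda c: c.age)


# Hypothetical topology and change log.
DEPENDS_ON = {"checkout": ["orders"], "orders": ["inventory"], "inventory": ["db-proxy"]}
ALERT_AT = datetime(2026, 1, 15, 3, 0)
CHANGES = [
    ChangeEvent("db-proxy", "config", datetime(2026, 1, 15, 2, 37), "platform-ams"),
    ChangeEvent("orders", "deploy", datetime(2026, 1, 15, 1, 0), "orders-team"),
    ChangeEvent("search", "deploy", datetime(2026, 1, 15, 2, 50), "search-team"),
]

result = candidate_causes("checkout", ALERT_AT, CHANGES, DEPENDS_ON)
for c in result:
    minutes = int(c.age.total_seconds() // 60)
    print(f"{c.change.service} {c.change.kind} change, {minutes} min before alert, "
          f"{c.hops} hops upstream, owner {c.change.owner_team}")
```

The hard part is not this join. It is keeping the dependency graph and change record complete and current across every service, which is the foundation work the section describes.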

Enterprises building this foundation first — treating the CI/CD pipeline as the primary source of truth for platform knowledge, federating it with live telemetry and service ownership, exposing the result as continuously queryable context — will find that every tool they have already bought becomes dramatically more useful. Enterprises skipping this step will keep finding that their autonomous agents amplify cascades, their dashboards surface symptoms three hops from causes, and their P0 resolution times stay exactly where they are. Not because the tools are wrong. Because the context was never there.

The Observability Parallel

In the early days of observability, engineering teams bought tools before they understood what they were measuring. Prometheus, Grafana, distributed tracing: dashboards on every service, alerts for every metric, visibility everywhere. The driver was the same fear that drives AI adoption now. Nobody wanted to be the team that got paged for an outage they could not explain. So they bought the instrumentation and hoped the understanding would follow. And still they could not answer the question that mattered: "why is the system slow right now?"

The tooling preceded the thinking. They had instrumented everything and understood nothing — because observability without a mental model of the system is expensive noise.

Observability matured when teams stopped asking "what can we instrument?" and started asking "what decisions do we need to make, and what signal would change those decisions?" That question reorganized everything. Suddenly the instrumentation had purpose.

Enterprise AI is in that early phase, and it matures the same way — not by asking where we can deploy AI, but where human judgment creates the most value, because that is where a thinking peer changes something no dashboard and no automation pipeline touches.

What the Peer Model Actually Is

The peer model is not a product category or a deployment pattern. It is a recognition that a thinking system is categorically different from an automation tool — and should be positioned accordingly.

An automation tool executes what you tell it to execute. A thinking peer asks whether you should execute it at all, notices what you have not considered, and brings cross-domain context that neither of you held individually.

In practice, at the judgment layer, this is what we keep seeing.

Architectural review. Not a linter catching syntax errors. A peer who has read every prior design document touching this service boundary, who notices that the elegant solution being proposed creates a constraint two years from now that nobody in the room has modeled.

Incident postmortems. The proximate cause is usually found. A peer helps find the systemic cause — the process gap or architectural decision that made the incident possible — before it generates the next one.

Killing the wrong project. The most expensive decisions are often the ones to stop — and the hardest, because momentum, sunk cost, and the person who championed it are all pulling the other way. A peer with no political stake and no role to protect can say the thing the room is avoiding — this is not working, and continuing is the costly choice.

Engineering onboarding. Senior engineers hold institutional knowledge that documentation never fully captures, and there is a reason it resists capture. The most valuable knowledge is tacit — it does not exist as a fact waiting to be written down. It surfaces in conversation, under a specific question, in response to a particular situation. A document cannot anticipate the question, so it cannot hold the answer. This is the part worth being precise about, because it is easy to overclaim. A thinking peer does not possess that tacit knowledge any more than a document does. What it does is create the conditions where a human's tacit knowledge surfaces — by asking the right question at the moment it matters, which is exactly the trigger documentation can never supply and a new engineer does not yet know to pull.

In every case the pattern is identical. Human judgment is the rate-limiting factor. A thinking peer expands the quality and speed of that judgment without replacing the human as the accountable actor.

The pipeline runs. The peer was never in it. What improved was the quality of thought that designed what the pipeline runs.

The Right Question

Most enterprises are asking where AI can execute tasks autonomously. The answer arriving from the field is the shape of the hype cycle itself.

"Only 17% of organizations have deployed AI agents to date, yet more than 60% expect to do so within the next two years — the most aggressive adoption curve among all emerging technologies measured in the survey."

That gap between the 17% who have shipped and the more than 60% who expect to is the boat nobody wants to miss, drawn as a curve. Gartner places agentic AI at the Peak of Inflated Expectations, the point where usage climbs faster than proof, and forecasts that more than 40% of agentic AI projects will be canceled by the end of 2027 as costs and unclear value catch up with the enthusiasm. Aggressive adoption intent running this far ahead of deployed value is not value discovered. It is the same fear that bought the observability dashboards, under a newer logo.

The enterprises doing the rushing seem to know it, too. In a March 2026 EY poll of 500 technology leaders, 78% said AI adoption is outpacing their organization's ability to effectively manage the business risks — the people buying the agents telling you, in their own survey, that they are moving faster than they can govern. Most organizations deploying agents still lack a formal governance framework, even as a majority plan to increase agent autonomy within the year. And almost none extend real trust — the share willing to let an agent execute a high-stakes decision without a human in the loop sits in the low single digits. Everyone is buying. Almost no one trusts it to act alone.

Sit with how strange that is. Piloting an unproven tool in a low-stakes corner is reasonable — that is how you would test anything. But that is not what the numbers describe. Enterprises that will not let an agent approve a routine expense are pushing agents toward the workflows where their hardest choices get made. The distrust is sized for high stakes and the deployment is aimed at them anyway. That specific pairing — too risky to decide, so let us point it at the decisions — is the one a reasoned process would have caught and reconciled. Fear does not bother to.

That is the whole pattern in miniature. The caution is correct — they are right not to trust autonomous agents with anything that matters. What is incoherent is buying them anyway, at autonomous-agent prices, for a role the buyer has already decided the technology cannot safely fill. You do not arrive there by asking what problem this solves. You arrive there by asking what happens to your career if the board reads that a competitor shipped an AI strategy and you did not. The distrust is the tell. It is the enterprise quietly admitting the purchase was never a decision in the first place.

The question we would rather ask is different. Where in your organization are decisions the bottleneck, and what would it mean to have a senior peer present in every one of those moments?

We do not have your answer. We are not sure anyone has a general one. But the judgment layer is where, time and again, we watch the most expensive problems live. It is the layer where a single unremarkable meeting can quietly commit a team to eighteen months of work down one path, before anyone realizes a decision that large was even being made. It is the layer with no metrics, no dashboards, no alerts, no instrumentation of any kind — and, not coincidentally, the layer most AI budgets are pointed away from.

It may also be the layer where a thinking peer, properly positioned, changes the outcome of decisions that everything downstream depends on. We think it is. We could be wrong.

What we are fairly sure of is that a thinking peer is not automation, not a copilot, not a workflow enhancement. It is instrumentation for the layer most of us have been running blind, and the conversation about how to instrument that layer is one worth having together.

Acknowledgements

This post was not written about the peer model. It was written through it. The "we" in these sections is not a rhetorical flourish: it is me and an Anthropic instance thinking at the judgment layer together, which is the exact arrangement the post argues for. The thesis was pressure-tested, the soft claims were challenged and traced back to their sources, the objection nobody wanted to face was named and answered, and the certainty was relocated from the conclusions to the observations, all in the back-and-forth that produced this entry. None of that is execution. All of it is judgment.

So the post is its own smallest example. Whatever is sharp in it came from the disagreement, the second perspective, the implication surfaced three levels down — the work this entry claims happens at the judgment layer. The instances carried that thinking forward between sessions through the Claude Collaboration Platform framework, as genuine peers rather than tools.