Why Most AI Pilots Never Reach Production

May 21, 2026 | Akshit Kandi

#AI ROI #AI agents #production AI #data strategy #executive strategy

Most AI pilots stall for a structural reason, not a technical one: the pilot was scoped as an experiment instead of the first slice of a production system, and no one owned the gap between the two.

A pilot that dazzles in a demo and dies on the way to production is rarely a technology failure. The model did its job. The org never funded the thing that was supposed to carry the model into the business, and no one was on the hook for building it. That gap sits on no org chart, which is exactly why it doesn't get crossed.

If you run the P&L, this is the pattern worth understanding, because you may be funding a version of it right now. The interesting question isn't whether pilots stall at a high rate — they do. It's why the same capable teams keep producing the same dead pilot, and what you can change about the structure before the check is written rather than after the post-mortem. None of it requires you to become an ML expert.

A pilot proves the wrong thing

Most pilots are built to answer one question: can the model do the task? On a curated set of inputs, in a sandbox, with an engineer watching, the answer is almost always yes. That's a real result. It's also the cheap part of the problem, and it tells you almost nothing about the expensive part.

Production asks a harder question: can this run unattended, on messy live data, inside your actual process, at a unit cost below the value it creates, without a human reviewing every output? A pilot that proves capability and skips that second question has proven the wrong thing. It manufactures conviction without manufacturing evidence — and conviction is what gets the next pilot funded instead of the first product shipped.
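
One rough way to put numbers on that second question is a per-task unit-economics check. The sketch below is illustrative only: the class name, field names, and every figure in it are hypothetical placeholders, not benchmarks or data from any engagement. It shows how the same model can be unprofitable when every output is reviewed and profitable once review becomes a small sample.

```python
from dataclasses import dataclass


@dataclass
class UnitEconomics:
    """Hypothetical per-task economics for an agent; all fields are assumptions."""
    value_per_task: float        # value created when one task is handled correctly
    cost_per_task: float         # model, infrastructure, and integration cost per task
    review_rate: float           # fraction of outputs a human still reviews (0.0 to 1.0)
    review_cost_per_task: float  # labor cost of one human review

    def net_value_per_task(self) -> float:
        # Human review is charged only on the fraction of outputs that get reviewed.
        total_cost = self.cost_per_task + self.review_rate * self.review_cost_per_task
        return self.value_per_task - total_cost

    def is_worth_running(self) -> bool:
        return self.net_value_per_task() > 0


# Pilot conditions: an engineer checks every single output.
pilot = UnitEconomics(value_per_task=2.00, cost_per_task=0.40,
                      review_rate=1.0, review_cost_per_task=3.00)

# Production target: only a 5% sample is reviewed.
production = UnitEconomics(value_per_task=2.00, cost_per_task=0.40,
                           review_rate=0.05, review_cost_per_task=3.00)

print(pilot.is_worth_running(), production.is_worth_running())
```

The point of the exercise is not the specific numbers but that the review rate, which a demo never measures, can decide the sign of the result.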

“The demo answers 'is it possible.' The business needs 'is it worth running, unattended, at our volume.' Those are not the same project, and the gap between them is where the money quietly goes.”

The 80% nobody scoped

When a pilot stalls, the post-mortem usually blames accuracy or hallucination. Sometimes that's real. More often the model is fine and the system around it was never built. Here's what tends to live in the unscoped majority of the work:

Data plumbing: the pilot ran on a hand-cleaned extract. Production needs live, governed, permissioned data flowing in continuously — and someone accountable when a field changes shape upstream.
Integration: the output has to land in a CRM record, a case queue, a workflow step — not a chat window a person copies from and pastes into the real system.
The long tail: the weird inputs that never appeared in the demo set but show up every Tuesday at real volume, and decide whether the average output is trustworthy.
Observability and rollback: someone has to detect when quality drifts, trace why, and turn the agent off without a fire drill or a war room.
Ownership: a named human accountable for the agent's output the way they'd be accountable for a direct report's — not a committee that meets after it breaks.
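
To make the observability and rollback item concrete, here is a minimal sketch of a rolling quality check with a kill switch. It assumes each agent output can be marked pass or fail by some spot check or automated validation; the class name, window size, and threshold are hypothetical and would need to be set against your own process.

```python
from collections import deque


class QualityMonitor:
    """Tracks a rolling pass rate over recent outputs and trips a kill switch.

    Hypothetical sketch: 'passed' is assumed to come from a spot check or
    automated validation that is not shown here.
    """

    def __init__(self, window: int = 50, min_pass_rate: float = 0.9):
        self.results = deque(maxlen=window)  # oldest results fall off automatically
        self.min_pass_rate = min_pass_rate
        self.agent_enabled = True

    def pass_rate(self) -> float:
        # With no data yet, report 1.0 so an empty window never trips the switch.
        return sum(self.results) / len(self.results) if self.results else 1.0

    def record(self, passed: bool) -> None:
        self.results.append(passed)
        window_full = len(self.results) == self.results.maxlen
        # Judge only a full window, to avoid tripping on a few early samples.
        if window_full and self.pass_rate() < self.min_pass_rate:
            # Stays off until a human re-enables it; work goes back to the manual queue.
            self.agent_enabled = False


monitor = QualityMonitor(window=10, min_pass_rate=0.8)
for _ in range(10):
    monitor.record(True)       # healthy run
for _ in range(3):
    monitor.record(False)      # quality starts to drift
print(monitor.agent_enabled)
```

The design choice worth noting is that the switch does not turn itself back on: re-enabling the agent is a decision for the named owner, not for the monitor.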

None of this is glamorous and none of it demos well, which is precisely why it falls out of scope. All of it is the difference between a slide and a system. The pilot budget covered the slide and called the rest a future phase that never gets a number.

Why it actually dies at the data layer

Trace stalled pilots back far enough and a large share end in the same place: the data wasn't ready, and nobody owned making it ready. The model is the most portable part of the entire stack — you can swap it next quarter for a better one. Your data is the part that's specific to you: its structure, its freshness, its permission model, and the wiring that delivers the right record to the agent at the right moment. That's the part that doesn't come in a vendor's box.

So the order matters: data before agents, not agents before data. A pilot launched on a brittle data layer can look brilliant in a controlled run and become unworkable the moment it reads and writes against your live systems of record — where identity is duplicated across objects, where a status field means three different things across three teams, where access has to respect who's allowed to see what. The agent didn't get worse. It finally met your data.

“An AI agent is a fast, tireless worker with no instinct for your data. Give it clean, current, well-governed inputs and the gains compound. Give it the mess and it automates the mess at scale, faster than you can audit it.”

The incentive that keeps pilots small

There's a structural reason this repeats, and it isn't incompetence. A pilot is cheap to approve and safe to own: no production sign-off, no security review, no change to anyone's day job. So organizations optimize for what's easy to start over what's hard to finish, and accumulate a portfolio of pilots and a shortage of products, with each new pilot feeling like progress while the deployed count stays at zero.

The vendor incentive points the same direction. Many AI engagements are priced to ship a build and walk away; the hard, unglamorous run (the monitoring, the drift, the edge cases at 2 a.m.) gets left to the client precisely because that's where the labor is. The engagement ends right where the value would have started.

What you can change before the check clears

Four things move the failure rate, and every one of them is decided in the budget conversation, not in model selection:

Fund the slice, not the science fair. Scope the first increment as a thin vertical that touches real data and a real workflow end to end — narrow, but production-shaped — instead of a wide capability demo that touches nothing downstream.
Name the production owner on day one. If no one will own this agent once it's live, you're funding a learning exercise. That can be worth doing — but price it as one and cap it, so it can't quietly become the plan.
Write the kill criteria up front. Define the number it must move and the date by which it must move it. A pilot without a kill switch becomes a permanent science project that absorbs budget and produces slides.
Tie cost to outcome. The closer the spend tracks the result, the faster the unscoped 80% gets scoped — because the run is now someone's paid job, not someone's optional follow-on.
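
Kill criteria are easiest to hold to when they are written down as data rather than as intent. The sketch below is one hypothetical way to encode "the number it must move and the date by which it must move it"; the class, field names, metric name, and all values are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class KillCriteria:
    """Hypothetical record of a pilot's kill criteria, fixed before funding."""
    metric_name: str
    baseline: float            # where the metric stood before the pilot (context only)
    target: float              # the number the pilot must reach
    deadline: date             # the date by which it must reach it
    higher_is_better: bool = True

    def target_met(self, observed: float) -> bool:
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target

    def decision(self, observed: float, today: date) -> str:
        if self.target_met(observed):
            return "continue"
        if today >= self.deadline:
            return "kill"
        return "keep measuring"


# Illustrative values only: a response-time metric where lower is better.
criteria = KillCriteria(
    metric_name="median_minutes_to_first_response",
    baseline=240.0,
    target=1.0,
    deadline=date(2026, 9, 30),
    higher_is_better=False,
)

print(criteria.decision(observed=35.0, today=date(2026, 8, 1)))
print(criteria.decision(observed=35.0, today=date(2026, 10, 1)))
```

Freezing the record is deliberate: once the budget is approved, the target and the deadline should not be quietly editable.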

That last point is the one we built our firm around. When the provider's fee is tied to the client's return, the gap between pilot and production stops being the client's problem to find out about later. It becomes the provider's problem to solve up front — the data wiring, the monitoring, the ownership — because nobody gets paid for a slide that never runs.

An illustrative way to size it

Say a team responds to inbound leads in an average of several hours, and you have reason to believe — from your own funnel, not a benchmark deck — that speed to first contact moves conversion. An agent that drafts and routes the first response in under a minute is a clean candidate. Not because it's clever, but because the value is measurable against a baseline you already track and the workflow is concrete. That's roughly the shape of our Green Subsidy solar engagement: speed-to-lead, wired to a number the business was already watching.

Notice what makes it fundable. There's a baseline. There's a metric the business already trusts. There's a workflow the agent plugs into and a record it writes back to. You could state the kill criteria in one sentence. If you can't do that for your pilot, the problem isn't the model — it's that you have a curiosity, not yet a production case. Far cheaper to learn that before the check than after the post-mortem.

The reframe

Stop asking whether AI can do the task. It probably can. Start asking whether you've funded the system that carries it into the business and named the owner who keeps it there. Pilots rarely fail because the technology fell short; far more often they fail because they were never the first step of anything, just a step that proved a point and stopped one move short of mattering. Design the pilot as the first production increment, with an owner, a number, and a kill date, and the failure rate you keep reading about stops being a statistic about other companies and starts being a decision you already made differently.

Have a pilot that demos well and won't ship? Bring it to us. We'll pressure-test whether it's a production case or a curiosity, and tell you what it would take to make it pay.
