Field note
When NOT to Deploy an AI Agent
Most agent failures are decided before a single prompt is written. Here is the checklist for the cases where the right move is to say no, defer, or buy something simpler.
The fastest way to lose money on an AI agent is to deploy one that works. A polished agent doing the wrong job at scale is more expensive than a broken pilot, because it ships errors faster, hides them better, and gets defended longer. The pilot that crashes gets killed in a week. The one that runs smoothly while quietly making the wrong call survives for a year, with your logo on every answer.
This is written for the person who signs off on the budget. The vendor demo always works; that is what demos are built to do. The question you actually have to answer is narrower: in this specific process, on this specific data, is an agent the cheapest way to move a number I care about? Often the honest answer is no, and you can tell before you spend. Five gates decide it.
Gate one: if you can't name the number, you can't deploy the agent
Every agent worth deploying changes a single, measurable quantity: revenue captured, cost removed, cycle time cut, or risk reduced. If the strongest case anyone in the room can make is "it'll make us more efficient" or "our competitors are doing it," you do not have a project. You have a mood.
The discipline is to write one sentence before anyone writes a prompt: this agent moves X from A to B, worth $C a year. Say you handle 2,000 inbound leads a month and convert 8% of them; an agent that lifts that to 9% by responding in seconds instead of hours has a defined prize you can size and check. If you cannot fill in those blanks with figures your finance team would defend, that is not a cue to estimate harder. It is the signal to stop. The clarity gap almost always means the underlying process isn't understood well enough to automate it safely yet.
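To make the one-sentence test concrete, here is a minimal sketch of the arithmetic using the lead example above. The function name and the $1,500 value per conversion are assumptions for illustration only; substitute the figure your finance team would defend.

```python
def annual_prize(leads_per_month, baseline_rate, target_rate, value_per_conversion):
    """Return (extra conversions per year, dollar value per year) for a conversion lift."""
    extra_per_month = leads_per_month * (target_rate - baseline_rate)
    extra_per_year = extra_per_month * 12
    return extra_per_year, extra_per_year * value_per_conversion

# 2,000 leads a month, 8% -> 9% conversion.
# value_per_conversion=1500 is a hypothetical placeholder, not a figure from this note.
extra, dollars = annual_prize(2000, 0.08, 0.09, value_per_conversion=1500)
print(round(extra), round(dollars))  # 240 extra conversions, $360,000 a year
```

If you cannot supply the last argument with a number someone will sign their name to, the sentence is not finished yet.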
“An agent with no owned metric isn't a small bet. It's an unbounded one. Nobody can tell you when it's done, whether it's working, or when it's safe to turn off.”
Gate two: don't put an agent on top of data you wouldn't show the board
This is the failure we see most, and the one demos hide best, because a demo runs on three clean records. An agent is only as good as the data it reasons over. If the same customer exists three times across CRM, billing, and support with no shared key, if your product catalog is half-stale, if "source of truth" is a meeting rather than a table, an agent will not fix that. It will launder it: fluent, confident, wrong answers delivered at machine speed to the people you most want to keep.
There is an order of operations the marketing skips. Data before agents, not data because of agents. Identity resolution, source reconciliation, and an explicit definition of what's true have to land first. On Salesforce that means the unification step in Data Cloud is real work, not a checkbox: deciding which system wins a field conflict, how often each source refreshes, and what an agent is allowed to act on versus only read. We sequence it as Data-to-Agent for exactly this reason. The readiness step is not a phase you rush past; it is the thing that decides whether the agent becomes an asset or a liability.
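"Deciding which system wins a field conflict" can be written down as an explicit precedence table before any agent reads the data. The sketch below is a generic illustration, not Data Cloud's API: the source names, field names, and precedence order are all hypothetical, and it assumes the records have already been matched to the same customer.

```python
# Hypothetical precedence: for each field, sources listed from most to least trusted.
FIELD_PRECEDENCE = {
    "email": ["crm", "support", "billing"],
    "billing_address": ["billing", "crm", "support"],
}

def unify(records_by_source, precedence=FIELD_PRECEDENCE):
    """Merge one customer's records into a single profile and list the conflicts found."""
    profile, conflicts = {}, []
    for field, order in precedence.items():
        # Collect the non-empty values each source holds for this field.
        values = {
            src: records_by_source[src][field]
            for src in order
            if records_by_source.get(src, {}).get(field)
        }
        if not values:
            continue
        winner = next(src for src in order if src in values)
        profile[field] = values[winner]
        if len(set(values.values())) > 1:
            conflicts.append((field, values))  # surface disagreements, don't hide them
    return profile, conflicts

records = {
    "crm": {"email": "a@x.com", "billing_address": "1 Old St"},
    "billing": {"email": "a@old.com", "billing_address": "2 New St"},
    "support": {"email": None},
}
profile, conflicts = unify(records)
print(profile)
print([field for field, _ in conflicts])
```

The useful output is often the conflict list rather than the merged profile: it tells you how much reconciliation work stands between you and an agent you can trust.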
The test is one question: "Would I let a new hire make decisions off this dataset on day one, unsupervised?" If the answer is no, an agent shouldn't either. Fund the data work first, or don't fund the agent. There is no version where the model compensates for inputs it has no way of knowing are wrong.
Gate three: high-stakes, low-volume work is the wrong place to start
Agents earn their keep on volume. The build cost is roughly fixed whether a workflow runs a hundred times or a hundred thousand, so the economics only clear when the same decision repeats often enough that a small per-instance lift compounds into real money. Speed-to-lead has that shape: high frequency, every minute of delay measurably costs conversions, and the right answer is known fast enough to learn from.
Now flip it. A decision you make forty times a year, each one bespoke, where a wrong call is catastrophic, is the worst possible starting point. The volume is too low to amortize the build. The stakes are too high to tolerate any error rate. And there isn't enough repetition for the agent, or your team, to find the edges before one of them bites. You'll spend six figures automating the exact judgment you should be paying a senior human to exercise.
- Good fit: high volume, a single error is survivable, and you learn within minutes or days whether the answer was right.
- Bad fit: low volume, large blast radius per error, and feedback that arrives weeks later or never.
- The trap: the high-stakes decision feels more impressive to automate, so it gets pitched first. Resist it. Earn trust on volume, then climb toward the harder calls once the agent has a track record.
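The volume argument reduces to a break-even calculation. The sketch below assumes a fixed build cost and a constant net value per run; every figure in it is hypothetical and chosen only to show the shape of the curve.

```python
import math

def break_even_runs(build_cost, value_per_run, run_cost_per_run=0.0):
    """Runs needed before cumulative net value covers the fixed build cost."""
    net = value_per_run - run_cost_per_run
    if net <= 0:
        return None  # the agent never pays back at these unit economics
    return math.ceil(build_cost / net)

# Hypothetical figures: $150,000 build, $2.50 of value and $0.50 of cost per run.
runs = break_even_runs(150_000, 2.50, 0.50)
print(runs)            # 75000 runs to break even
print(runs / 100_000)  # 0.75 years at 100,000 runs a year
print(runs / 40)       # 1875.0 years at 40 runs a year
```

At forty decisions a year the same build only pays back if each decision carries thousands of dollars of net value, which is exactly the high-stakes territory where you cannot tolerate the error rate.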
Gate four: if there's no one to run it on day ninety, don't launch it
An agent is not software you install. It's a system you operate. Models get swapped, prompts rot as the business shifts, edge cases surface that no test anticipated, and an upstream field changes shape without warning. A workflow that passed evals in March can drift in July with nothing in the release notes to blame. Accuracy doesn't fail loudly. It erodes.
So the procurement question that actually matters is rarely asked: who watches this in production? Running an agent means a held-out eval set you re-run on every change, monitoring on the output distribution so you catch drift before a customer does, a defined escalation path when confidence drops, and a named human accountable for the metric, not just the uptime. If the plan is "the vendor builds it and hands it to a team that's already underwater," the realistic outcome is decay. Build-and-walk-away is how good pilots become quiet failures. It is also why we tie our fee to the outcome and stay on the hook through the run, not just the launch.
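As one illustration of "monitoring on the output distribution," the sketch below compares the mix of decisions an agent made during a baseline period against a recent window, using total variation distance. The label names and the 0.15 threshold are assumptions, and this is only one of several reasonable drift measures; it complements, rather than replaces, re-running the held-out eval set.

```python
from collections import Counter

def distribution(labels):
    """Convert a list of output labels into a {label: share} mapping."""
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def total_variation(p, q):
    """Half the sum of absolute differences between two distributions (0 to 1)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def check_drift(baseline_labels, recent_labels, threshold=0.15):
    """Flag for human escalation when the output mix has shifted past the threshold."""
    tvd = total_variation(distribution(baseline_labels), distribution(recent_labels))
    return {"tvd": tvd, "escalate": tvd > threshold}

# Hypothetical lead-triage outputs: March baseline versus a July window.
baseline = ["qualified"] * 60 + ["nurture"] * 30 + ["reject"] * 10
recent = ["qualified"] * 35 + ["nurture"] * 30 + ["reject"] * 35
result = check_drift(baseline, recent)
print(result["escalate"])  # True: a quarter of decisions moved category
```

A shift in the output mix does not prove the agent is wrong, since the inputs may have changed for good reasons. It does tell the named owner where to look before a customer notices.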
Gate five: when a rule, a form, or a search would do, use that
Not every problem wants a reasoning engine. If the logic is deterministic, a rule is cheaper, faster, auditable, and never hallucinates. If users need to retrieve a known answer, search beats a chatbot. If you need structured input, a good form beats a conversation. Agents are for genuine ambiguity: the input varies, judgment is required, and the value of getting each instance right justifies the cost and the failure modes you're taking on.
Spending agent money on a problem a switch statement solves is a common, expensive mistake. It costs more to build, more to run, and it introduces a failure mode the simpler tool never had: the confident wrong answer. The most senior thing you can say in a deployment meeting is often, "this doesn't need an agent." It saves the budget and the credibility for the case that does.
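For contrast, here is what the simpler tool looks like when the logic really is deterministic. The refund policy below is invented for illustration; the point is that every branch is visible, testable, and gives the same answer every time.

```python
def route_refund(amount, days_since_purchase, item_opened):
    """Hypothetical refund policy expressed as plain rules, checked top to bottom."""
    if days_since_purchase > 30:
        return "deny"
    if item_opened and amount > 200:
        return "manual_review"
    return "approve"

print(route_refund(50, 10, item_opened=False))   # approve
print(route_refund(500, 10, item_opened=True))   # manual_review
print(route_refund(50, 45, item_opened=False))   # deny
```

If your process can be written this way without losing anything that matters, an agent adds cost and a new failure mode without adding judgment.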
The cases where you absolutely should deploy
None of this is an argument against agents. It's an argument for putting them where they win. The pattern is consistent: a high-volume process, sitting on data you'd defend, with a named metric, a survivable per-error cost, and a team accountable for running it. When those five line up, an agent stops being a risk and becomes one of the highest-return moves on the table, and the cost of waiting is real.
- A named number, sized in dollars your finance team would sign off on.
- Data clean and unified enough to act on, not just demo on.
- Enough volume that a fixed build cost amortizes fast.
- Errors you can survive and detect, not ones that surface in a lawsuit.
- An owner for day ninety, with monitoring and a metric they're held to.
Run that checklist honestly. Miss two of the five and the disciplined move is to fix the gap or pass, not to deploy and hope. The firms getting real return aren't the ones shipping the most agents. They're the ones who said no to the bad four out of five so the one that clears the bar gets the data, the run function, and the attention it needs to actually move the number.
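The checklist is simple enough to write down as a scoring function, which makes the answers harder to fudge. This is a minimal sketch: the field names are ours, and treating a single missed gate as "fix the gap before deploying" is an assumption, since the rule above only speaks to two or more misses.

```python
from dataclasses import dataclass, fields

@dataclass
class Gates:
    named_number: bool        # sized in dollars finance would sign off on
    defensible_data: bool     # clean and unified enough to act on
    enough_volume: bool       # fixed build cost amortizes fast
    survivable_errors: bool   # errors can be survived and detected
    day_ninety_owner: bool    # named owner with monitoring and a metric

def assess(gates):
    """Return (decision, list of missed gates) for a candidate process."""
    missed = [f.name for f in fields(gates) if not getattr(gates, f.name)]
    if not missed:
        decision = "deploy"
    elif len(missed) == 1:
        decision = "fix the gap before deploying"  # assumption, see the note above
    else:
        decision = "fix the gaps or pass"
    return decision, missed

candidate = Gates(
    named_number=True,
    defensible_data=False,
    enough_volume=True,
    survivable_errors=True,
    day_ninety_owner=False,
)
decision, missed = assess(candidate)
print(decision, missed)
```

The value is less in the code than in forcing each gate to a yes or a no, with a name next to whoever gave the answer.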
The goal was never to deploy an agent. It was to move a number. Sometimes the agent is the way. Knowing when it isn't is what separates an ROI program from a science fair.
Want a clear-eyed read on which of your processes actually clear these five gates, and which don't? Run the numbers on our ROI calculator, then book a working session to pressure-test the shortlist.