Field note
A Change-Management Playbook for AI Adoption
Most AI rollouts fail at adoption, not at the model. Here is a field playbook for getting people to actually use the agents you build, structured around the one variable that predicts whether they will: calibrated trust.
The agent works in the demo. It passes eval. It clears the pilot. Then it ships, and three weeks later the dashboard shows most of the team has quietly gone back to doing the task by hand. Nobody filed a ticket. Nobody complained at the all-hands. They just stopped using it.
This is the most common way AI projects die, and it almost never shows up in the post-mortem as a change-management failure. It gets logged as "accuracy wasn't there" or "the use case was wrong." Usually the model was fine. Adoption was the thing that broke, and because nobody instrumented adoption, nobody can prove it.
Change management for an agent is also not the same problem as a CRM migration or a new expense tool. With a new tool you ask people to learn a different set of buttons; with an agent you ask them to delegate judgment to software and stay accountable for the result anyway. That is a harder ask, and the generic adoption playbook does not cover it. This is the version that does.
The real adoption curve is a trust curve
Set aside the standard "awareness, desire, knowledge" change models for a moment. For agents, the single variable that predicts whether someone keeps using the thing is trust — and trust moves through three distinct states you have to manage separately, because the lever that helps in one actively hurts in another.
- Verify everything: the user re-checks every output by hand. Net productivity is often negative here — they are doing the work twice. This phase is unavoidable; budget for it instead of pretending it away.
- Spot-check: the user trusts the common cases and inspects only the unusual ones. This is where real productivity shows up, and where the durable value lives.
- Over-trust: the user stops checking entirely, including the cases the agent gets wrong. This is not the goal. An agent that is trusted blindly is a liability, especially in regulated or revenue-facing work.
The job of change management is to move people from state one to state two quickly, and to keep them out of state three. Most rollouts get this backwards: they push for blanket trust on day one ("just let the AI handle it"), a user gets burned by an early miss, and they snap back to verifying everything — permanently. A single bad experience in week one can cost you months of adoption. Calibrated trust, not maximal trust, is the target.
Pick the first use case for its blast radius, not its wow factor
Engineers and execs both tend to choose the flashiest possible first agent — the one that demos well to the board. That instinct is wrong. The first production agent should be chosen to make trust cheap to build, which means three properties, all of them present at once:
- The output is verifiable in seconds. A drafted email a rep can read and approve beats an autonomous pricing decision they cannot audit.
- A wrong answer is recoverable, not catastrophic. You want failures that are annoying, not failures that lose a deal or breach a policy.
- The user already does the task, so they can judge quality. Do not debut an agent on work nobody on the team understands well enough to supervise.
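Because all three properties must hold at once, they work as a screen rather than a score. A minimal Python sketch of that screen; the `CandidateUseCase` fields, the 60-second cutoff standing in for "verifiable in seconds", and both example candidates are hypothetical:

```python
from dataclasses import dataclass

# Hypothetical cutoff for "verifiable in seconds"; tune per team.
MAX_SECONDS_TO_VERIFY = 60.0


@dataclass(frozen=True)
class CandidateUseCase:
    """A candidate first agent, described only by the three screening properties."""

    name: str
    seconds_to_verify: float      # typical time for a user to check one output
    failure_recoverable: bool     # is a wrong answer annoying rather than catastrophic?
    team_already_does_task: bool  # can today's users judge output quality?


def is_good_first_agent(candidate: CandidateUseCase) -> bool:
    """All three properties must hold at once; demo appeal is deliberately not an input."""
    return (
        candidate.seconds_to_verify <= MAX_SECONDS_TO_VERIFY
        and candidate.failure_recoverable
        and candidate.team_already_does_task
    )


candidates = [
    CandidateUseCase("draft lead-response email", 30, True, True),
    CandidateUseCase("autonomous pricing decision", 900, False, True),
]
shortlist = [c.name for c in candidates if is_good_first_agent(c)]
print(shortlist)  # ['draft lead-response email']
```

Note that the pricing agent fails on two counts at once: nobody can audit it in seconds, and a miss is not cheaply recoverable.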
In our Green Subsidy solar engagement, the entry point was speed-to-lead: getting a qualified human response to an inbound lead faster. It is a good first case precisely because everyone in the building already knows what a good lead response looks like, and can tell within minutes whether the agent helped or hurt. They can supervise it on day one. That is what makes the trust curve climbable — and it is the opposite of starting with the autonomous decision nobody can check.
“Choose your first agent the way you would choose a first case for a new surgeon: not the hardest one you can find, but the one where the team can clearly see it went well.”
Design the human's new job before you design the agent's
Here is the part most implementations skip. When you put an agent into a workflow, you are not removing a human task — you are converting it. The rep who used to draft the email now reviews a draft; the analyst who pulled the report now validates one. That is a real role change, and if you do not name it, people experience it as "my job is being eroded" instead of "my job moved up a level." So write the new role description explicitly. What does the human now own? Usually the exceptions, the edge cases, the relationship, and accountability for the final output. Make that the prestigious part of the job, because it genuinely is: the agent handles volume, the human handles judgment. People adopt tools that make them more valuable and quietly sabotage tools that make them feel replaceable — and they are rational to do so. Let compensation and recognition follow the new shape of the work, or the org chart will keep rewarding the task you just automated.
Instrument adoption, not just accuracy
For the architects reading this: your observability stack is probably measuring the wrong things for adoption. Model accuracy, latency, and token cost tell you the agent is healthy. They tell you nothing about whether anyone is using it. You need a separate adoption telemetry layer, wired in from day one, keyed on the user and not just the request.
- Override rate: how often does a user edit or reject the agent's output? A rate that falls over time means trust is building. A rate that stays flat and high means the agent is not earning it.
- Abandonment: users who tried it, then stopped. This is your silent-quit signal — the people who slipped back to manual without telling anyone.
- Bypass paths: are people routing around the agent entirely? If a manual fallback exists, instrument it and watch how often it gets used.
- Time-to-confidence: how many interactions before a given user moves from verify-everything to spot-check? Shorter is better, and it varies a lot by team and by role.
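A minimal sketch of that telemetry layer in Python, assuming a hypothetical per-user event log in which each agent output ends as `accepted`, `edited`, or `rejected`, work done through the manual fallback is logged as `manual_bypass`, and an `inspected` flag records whether the user opened the output before acting on it. The 14-day idle window for abandonment, and the rule that a user has reached spot-check once no more than half of their last five outputs were inspected, are illustrative assumptions, not standards:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Outcomes of an agent output; "manual_bypass" marks work done via the fallback path.
AGENT_OUTCOMES = {"accepted", "edited", "rejected"}
DEFAULT_IDLE = timedelta(days=14)  # hypothetical abandonment window


@dataclass(frozen=True)
class Event:
    user: str
    ts: datetime
    kind: str                # "accepted", "edited", "rejected", or "manual_bypass"
    inspected: bool = False  # did the user open the output before acting on it?


def override_rate(events: list[Event]) -> float:
    """Share of agent outputs that the user edited or rejected."""
    outputs = [e for e in events if e.kind in AGENT_OUTCOMES]
    if not outputs:
        return 0.0
    return sum(e.kind != "accepted" for e in outputs) / len(outputs)


def bypass_rate(events: list[Event]) -> float:
    """Share of units of work that went through the manual fallback instead of the agent."""
    work = [e for e in events if e.kind in AGENT_OUTCOMES or e.kind == "manual_bypass"]
    if not work:
        return 0.0
    return sum(e.kind == "manual_bypass" for e in work) / len(work)


def abandoned_users(events: list[Event], now: datetime, idle: timedelta = DEFAULT_IDLE) -> list[str]:
    """Users who used the agent at least once but not within the idle window."""
    last_use: dict[str, datetime] = {}
    for e in events:
        if e.kind in AGENT_OUTCOMES:
            last_use[e.user] = max(last_use.get(e.user, e.ts), e.ts)
    return sorted(u for u, ts in last_use.items() if now - ts > idle)


def abandonment_rate(events: list[Event], now: datetime, idle: timedelta = DEFAULT_IDLE) -> float:
    """Abandoned users as a share of everyone who ever tried the agent."""
    tried = {e.user for e in events if e.kind in AGENT_OUTCOMES}
    if not tried:
        return 0.0
    return len(abandoned_users(events, now, idle)) / len(tried)


def time_to_confidence(events: list[Event], user: str, window: int = 5,
                       max_inspected_share: float = 0.5) -> Optional[int]:
    """Agent interactions until the user's trailing window is mostly uninspected, or None."""
    outputs = sorted((e for e in events if e.user == user and e.kind in AGENT_OUTCOMES),
                     key=lambda e: e.ts)
    for i in range(window, len(outputs) + 1):
        recent = outputs[i - window:i]
        if sum(e.inspected for e in recent) / window <= max_inspected_share:
            return i
    return None


# Hypothetical log: alice moves to spot-check; bob tried it, then slipped back to manual.
now = datetime(2024, 6, 1)
day = timedelta(days=1)
events = [
    Event("alice", now - 10 * day, "edited", inspected=True),
    Event("alice", now - 9 * day, "edited", inspected=True),
    Event("alice", now - 8 * day, "accepted", inspected=True),
    Event("alice", now - 3 * day, "accepted"),
    Event("alice", now - 2 * day, "accepted"),
    Event("alice", now - 1 * day, "accepted"),
    Event("bob", now - 30 * day, "rejected", inspected=True),
    Event("bob", now - 29 * day, "edited", inspected=True),
    Event("bob", now - 5 * day, "manual_bypass"),
]
print(override_rate(events))                # 0.5
print(abandoned_users(events, now))         # ['bob']
print(time_to_confidence(events, "alice"))  # 6
```

Keying every event on `user` is the design choice that matters here: the same log aggregated per request would still look healthy and would hide the user who quietly stopped.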
Treat these as first-class SLOs alongside the model metrics. An agent at 95% accuracy with a 70% abandonment rate is a failed deployment, full stop — and only the adoption telemetry will tell you that, because the model dashboard will look green the whole time. This is also why "build it and hand it over" is the wrong operating model: the signals that determine ROI only appear after launch, under real load, with real users. Someone has to be accountable for them continuously, not just at go-live.
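One way to make adoption a first-class SLO is to evaluate it in the same pass as the model metrics, so a deployment can only read as healthy when both sets pass. A sketch in Python; the metric names and every threshold below are hypothetical placeholders, since the right targets depend on the workflow:

```python
from dataclasses import dataclass

# Hypothetical targets as (direction, threshold). The article does not prescribe numbers.
MODEL_SLOS = {"accuracy": (">=", 0.90), "p95_latency_ms": ("<=", 3000.0)}
ADOPTION_SLOS = {"override_rate": ("<=", 0.40), "abandonment_rate": ("<=", 0.20)}


@dataclass(frozen=True)
class DeploymentSnapshot:
    accuracy: float          # from the model dashboard
    p95_latency_ms: float    # from the model dashboard
    override_rate: float     # from adoption telemetry
    abandonment_rate: float  # from adoption telemetry


def failing_metrics(snapshot: DeploymentSnapshot, slos: dict) -> list[str]:
    """Names of the metrics in `slos` that miss their target."""
    failed = []
    for metric, (direction, threshold) in slos.items():
        value = getattr(snapshot, metric)
        ok = value >= threshold if direction == ">=" else value <= threshold
        if not ok:
            failed.append(metric)
    return failed


def evaluate(snapshot: DeploymentSnapshot) -> dict:
    """Healthy only if the model SLOs and the adoption SLOs both pass."""
    model_failures = failing_metrics(snapshot, MODEL_SLOS)
    adoption_failures = failing_metrics(snapshot, ADOPTION_SLOS)
    return {
        "model_dashboard": "red" if model_failures else "green",
        "deployment": "failed" if (model_failures or adoption_failures) else "healthy",
        "failing": model_failures + adoption_failures,
    }


# The example from the text: 95% accuracy, 70% abandonment.
snapshot = DeploymentSnapshot(accuracy=0.95, p95_latency_ms=1200,
                              override_rate=0.25, abandonment_rate=0.70)
print(evaluate(snapshot))
# {'model_dashboard': 'green', 'deployment': 'failed', 'failing': ['abandonment_rate']}
```

Putting both sets into one verdict is the point: the model dashboard alone would report green for this deployment.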
Make the feedback loop visible, or trust stalls
When a user corrects an agent and nothing visibly changes, they learn that correcting it is pointless — so they stop, and you lose both the signal and their engagement. The most underrated adoption mechanic is closing that loop out loud. When a correction leads to a prompt change, a new guardrail, or a retrieval fix, tell the person who flagged it: "You caught the agent misclassifying renewal emails — that is fixed as of this week." This costs almost nothing and changes the psychology entirely. The user goes from babysitting dumb software to training a system that listens. That second framing is the engine of durable adoption — and it only works if the data layer underneath the agent is wired to absorb the feedback, which is the unglamorous reason data readiness precedes agent quality every single time. A correction the pipeline cannot act on is a promise you will break twice.
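Closing the loop can be as simple as tracking each correction from the moment it is flagged to the moment its fix ships, then telling the person who flagged it. A minimal Python sketch; `FeedbackLoop`, its methods, and the message wording are hypothetical, and in practice the fix record would come from wherever prompt changes, guardrails, and retrieval fixes are tracked:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Correction:
    id: int
    flagged_by: str
    summary: str                    # what the user caught, phrased as a verb phrase
    fix: Optional[str] = None       # e.g. "prompt change", "new guardrail", "retrieval fix"
    fixed_on: Optional[date] = None
    notified: bool = False


class FeedbackLoop:
    """Hypothetical tracker that tells each flagger when their correction ships."""

    def __init__(self) -> None:
        self._items: dict[int, Correction] = {}
        self._next_id = 1

    def flag(self, user: str, summary: str) -> int:
        correction = Correction(self._next_id, user, summary)
        self._items[correction.id] = correction
        self._next_id += 1
        return correction.id

    def resolve(self, correction_id: int, fix: str, fixed_on: date) -> None:
        correction = self._items[correction_id]
        correction.fix, correction.fixed_on = fix, fixed_on

    def drain_notifications(self) -> list[tuple[str, str]]:
        """Return (user, message) for each fixed, unannounced correction, then mark it sent."""
        messages = []
        for c in self._items.values():
            if c.fix is not None and not c.notified:
                messages.append((c.flagged_by,
                                 f"You caught the agent {c.summary}; that is fixed as of "
                                 f"{c.fixed_on.isoformat()} ({c.fix})."))
                c.notified = True
        return messages

    def open_corrections(self) -> list[Correction]:
        """Corrections with no fix yet: each one is a promise still outstanding."""
        return [c for c in self._items.values() if c.fix is None]


loop = FeedbackLoop()
renewals = loop.flag("dana", "misclassifying renewal emails")
loop.flag("eli", "ignoring the lead's time zone")
loop.resolve(renewals, "prompt change", date(2024, 6, 3))
print(loop.drain_notifications())
# [('dana', 'You caught the agent misclassifying renewal emails; that is fixed as of 2024-06-03 (prompt change).')]
```

`open_corrections` doubles as a health check on the data layer: if it only ever grows, the pipeline is not absorbing the feedback it collects.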
Sequence the rollout: champions, then the skeptic, never everyone at once
Big-bang rollouts maximize the number of people who can have a bad first experience simultaneously. Stage it instead. Start with a small group of genuine volunteers — not your most senior people, your most curious ones — and let them hit the rough edges while expectations are low and the relationship is forgiving. Fix what they find before anyone else sees it. Then comes the non-obvious move: recruit a vocal skeptic next, not the rest of the enthusiasts. A skeptic who converts becomes your most credible internal advocate, because everyone knows they were not predisposed to like it. Enthusiasts convince no one; they were always going to nod along. Win the person who folded their arms in the kickoff and you have won the room.
Write down what the agent is not allowed to do
Counterintuitively, the fastest way to build trust in an autonomous system is to publish its limits. A one-page operating boundary — what the agent decides on its own, what it must escalate, what it will never touch — does more for adoption than any accuracy claim. It tells the cautious user exactly where their judgment is still required, which lets them relax everywhere else. That document is also your governance artifact and your accountability anchor. When something does go wrong — and it will — "was this inside the boundary we agreed to?" is a far healthier first question than a hunt for blame. For executives, it is the difference between a managed risk and an open-ended one. Adoption and governance turn out to be the same discipline viewed from two angles.
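The boundary can also live as data that the agent runtime checks, so the page people read and the rule the system enforces cannot drift apart. A sketch in Python, assuming a hypothetical speed-to-lead agent; the action names are illustrative, and treating any action missing from the page as "must escalate" is a design choice of this sketch, not something the page itself has to say:

```python
from enum import Enum


class Mode(Enum):
    AUTONOMOUS = "decides on its own"
    ESCALATE = "must escalate to a human"
    NEVER = "will never touch"


# Hypothetical one-page boundary for a speed-to-lead agent; action names are illustrative.
OPERATING_BOUNDARY = {
    "draft_lead_reply": Mode.AUTONOMOUS,
    "schedule_callback": Mode.AUTONOMOUS,
    "quote_price": Mode.ESCALATE,
    "offer_discount": Mode.ESCALATE,
    "change_contract_terms": Mode.NEVER,
}


def mode_for(action: str) -> Mode:
    # Design choice of this sketch: anything not written down escalates rather than runs.
    return OPERATING_BOUNDARY.get(action, Mode.ESCALATE)


def was_inside_boundary(action: str, escalated: bool) -> bool:
    """The post-incident first question: did the agent act within what we agreed to?"""
    mode = mode_for(action)
    if mode is Mode.NEVER:
        return False      # touching it at all is outside the boundary
    if mode is Mode.ESCALATE:
        return escalated  # inside only if a human was pulled in
    return True


def render_boundary() -> str:
    """The human-readable page, generated from the same data the runtime checks."""
    lines = []
    for mode in Mode:
        actions = sorted(a for a, m in OPERATING_BOUNDARY.items() if m is mode)
        lines.append(f"{mode.value}: {', '.join(actions)}")
    return "\n".join(lines)


print(render_boundary())
# decides on its own: draft_lead_reply, schedule_callback
# must escalate to a human: offer_discount, quote_price
# will never touch: change_contract_terms
```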
The uncomfortable truth: adoption is an operating cost, not a launch event
The version of change management most vendors sell ends at go-live: training delivered, comms sent, project closed. But every signal that matters — override rate, abandonment, time-to-confidence — only moves after launch, and it keeps moving as the data drifts, the team turns over, and the edge cases evolve. An agent that was well-adopted in Q1 can quietly decay by Q3 if nobody is watching the trust curve. That is why we tie our fee to the outcome and stay on to run the thing: it forces the discipline of caring about adoption long after the impressive demo, because the demo was never where the return lived. If your rollout plan ends at launch day, that is the gap to close first — with us or without us.
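Watching the trust curve after launch can start with a single trend check: compare the latest window of override rates against the best window the team reached earlier. A minimal Python sketch; the four-week window, the 10-point tolerance, and both weekly series are hypothetical:

```python
from statistics import mean


def rolling_means(values: list[float], window: int) -> list[float]:
    """Mean of each trailing window, oldest first."""
    return [mean(values[i - window:i]) for i in range(window, len(values) + 1)]


def decay_alert(weekly_override_rates: list[float], window: int = 4,
                tolerance: float = 0.10) -> bool:
    """True if the latest window's mean override rate sits more than `tolerance`
    (absolute) above the best, i.e. lowest, earlier window."""
    means = rolling_means(weekly_override_rates, window)
    if len(means) < 2:
        return False  # not enough history to compare against
    best_earlier = min(means[:-1])
    return means[-1] - best_earlier > tolerance


# Illustrative series: trust builds and plateaus, then either holds or erodes.
eroding = [0.60, 0.50, 0.40, 0.30, 0.25, 0.20, 0.20, 0.20, 0.25, 0.30, 0.35, 0.45]
steady = [0.60, 0.50, 0.40, 0.30, 0.25, 0.20, 0.20, 0.20, 0.20, 0.20, 0.20, 0.20]
print(decay_alert(eroding), decay_alert(steady))  # True False
```

Comparing against the best earlier window, rather than the first weeks after launch, matters because the first weeks are the verify-everything phase, when a high override rate is expected.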
Planning an AI rollout and want a candid read on where adoption is likely to break? Book a working session with us.