Field note
How to Run a Salesforce Data-Readiness Audit (DIY)
A runnable, agent-specific audit you can do in your own org this week with reports, SOQL, and a spreadsheet. Readiness isn't a property of your data; it's a property of the one decision you're asking an agent to make.
Most data-readiness audits fail before they start because they ask a question with no answer: "is our data clean?" Clean for what? An org that runs a sales pipeline flawlessly for ten years can be hopelessly unready for an agent that drafts renewal quotes — same data, different verdict. Readiness is not a property of your data. It is a property of the specific decision you are asking an agent to make. Get that backwards and you'll spend a quarter scrubbing fields the agent never reads while it trips over the three it does.
So the first move in a real audit is not opening a profiler. It is writing one sentence: the exact action the agent will take, and the exact fields it must read to take it. Everything below is scoped to that sentence. This is a DIY guide: no new tooling, no procurement cycle. Reports, SOQL, the Developer Console, and a spreadsheet will get you to a defensible go/no-go call this week.
Step 0: Write the agent's decision sentence
Pick one agent action you actually intend to ship. Not the roadmap: one action. Write it as "Given X, the agent will do Y." For example: "Given an inbound lead, the agent routes it to the right rep and sends a first-touch reply within five minutes." That sentence names your inputs (lead source, product interest, territory, current owner) and your output, a routing decision plus a message. Now you have a closed set of fields. The audit cares about those and nothing else; the ones outside the sentence can stay messy, and you'll guardrail around them later. This scoping is the first step of our Data-to-Agent method, and it's the cheapest insurance you can buy: a global "clean up Salesforce" initiative can run for a year and still not tell you whether one agent can ship, while auditing the eight fields that agent touches takes an afternoon and tells you exactly that.
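One way to keep the audit honest is to write the decision sentence down as data, so every later check can ask whether a field is in scope. A minimal sketch, assuming you track fields as (object, API name) pairs; `DecisionSentence`, `audit_scope`, and the two custom field names are hypothetical illustrations, not part of any Salesforce API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionSentence:
    """One agent action, written as 'Given X, the agent will do Y'."""
    given: str
    action: str
    # (object, field API name) pairs the agent must read to take the action.
    input_fields: frozenset[tuple[str, str]]

    def in_scope(self, obj: str, field: str) -> bool:
        """True only for fields the decision sentence names."""
        return (obj, field) in self.input_fields


speed_to_lead = DecisionSentence(
    given="an inbound lead",
    action="route it to the right rep and send a first-touch reply within five minutes",
    input_fields=frozenset({
        ("Lead", "LeadSource"),
        ("Lead", "Product_Interest__c"),  # hypothetical custom field
        ("Lead", "Territory__c"),         # hypothetical custom field
        ("Lead", "OwnerId"),
    }),
)


def audit_scope(sentence: DecisionSentence, candidate_fields):
    """Filter a full field inventory down to the closed set the audit cares about."""
    return sorted(f for f in candidate_fields if sentence.in_scope(*f))
```

Anything `audit_scope` drops is, by construction, a field you can leave messy for now.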
“If you can't name the fields the agent reads to make its decision, you are not ready to audit. You are ready to scope.”
Step 1: Populated isn't the same as trustworthy
The classic audit measures fill rate: what percentage of records carry a value. Fill rate is necessary, and it lies. A field that's 100% populated because a validation rule makes it required, and reps satisfy the rule by picking the first picklist option, is more dangerous than an empty one, because the agent will read it as signal and act on it. So pair every fill rate with a distribution. For each field in the decision sentence, run a GROUP BY: `SELECT Industry, COUNT(Id) total FROM Account GROUP BY Industry ORDER BY COUNT(Id) DESC`, and read the shape. Blank values come back as a single null group, so the same query also shows you the empty share.
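If you export the raw column instead of running the aggregate query, the same two numbers take a few lines of Python. This sketch assumes one exported value per record, with None or an empty string meaning blank; the SOQL strings are included only for reference, and `profile_field` is a hypothetical helper.

```python
from collections import Counter

# Reference queries. COUNT(field) counts only non-null values, so
# total vs. filled gives the fill rate in one round trip.
FILL_RATE_SOQL = "SELECT COUNT(Id) total, COUNT(Industry) filled FROM Account"
DISTRIBUTION_SOQL = (
    "SELECT Industry, COUNT(Id) total FROM Account "
    "GROUP BY Industry ORDER BY COUNT(Id) DESC"
)


def profile_field(values):
    """Fill rate and value distribution for one field.

    `values` holds one entry per exported record; None and "" count as empty.
    """
    total = len(values)
    filled = [v for v in values if v not in (None, "")]
    return {
        "fill_rate": len(filled) / total if total else 0.0,
        # most_common() orders by count descending, like ORDER BY COUNT(Id) DESC.
        "distribution": Counter(filled).most_common(),
    }
```

For example, `profile_field(["Technology", "Technology", None, "Retail", ""])` reports a 0.6 fill rate with Technology on top.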
Three tells repeat across orgs. One value holding 70-80% of records is usually a default masquerading as data — check whether it's the first picklist entry or the integration's fallback. A long tail of near-duplicates ("USA," "U.S.," "United States," "us") means the agent will mis-segment on a string match. And a restricted picklist with free-text spillover in a paired field means your schema already lost a fight with reality, and the real value lives somewhere the model isn't looking.
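The first two tells are easy to flag mechanically. A rough sketch, assuming the distribution arrives as (value, count) pairs like the GROUP BY result. The 70% cutoff comes from the range in the text; the normalization (lowercase, strip punctuation and spaces) and the hand-maintained alias map are assumptions. Normalization alone will not link "USA" to "United States", which is why the alias map exists. The third tell, spillover into a paired free-text field, still needs a human reading both columns.

```python
import re
from collections import defaultdict

DOMINANT_SHARE = 0.70  # low end of the 70-80% range; tune per field


def dominant_value(distribution):
    """Return (value, share) if one value holds >= DOMINANT_SHARE of filled records."""
    total = sum(count for _, count in distribution)
    if not total:
        return None
    value, count = max(distribution, key=lambda vc: vc[1])
    share = count / total
    return (value, share) if share >= DOMINANT_SHARE else None


def normalize(value):
    """Crude key: lowercase, keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def near_duplicate_groups(distinct_values, aliases=None):
    """Group values that collapse to the same key; aliases maps key -> canonical key."""
    aliases = aliases or {}
    groups = defaultdict(list)
    for v in distinct_values:
        key = normalize(v)
        groups[aliases.get(key, key)].append(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Without aliases, `["USA", "U.S.", "United States", "us"]` only yields the `U.S.`/`us` pair; adding `{"usa": "us", "unitedstates": "us"}` folds all four together.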
- Fill rate per field — but only the fields in the decision sentence.
- Value distribution via GROUP BY — hunt for a dominant default and near-duplicate variants.
- Recency: when the field itself was last edited (from field history, which exists only if history tracking is enabled for that field), not the record's LastModifiedDate, which moves every time anything on the record changes.
- Provenance: is this field written by a human, an integration, or a formula? Each fails differently, and the agent can't tell them apart.
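Recency and provenance can both come out of the field history object. A sketch, assuming history tracking is on for the field, that you exported AccountHistory rows with `CreatedDate` already parsed into timezone-aware datetimes, and that you know which user Ids belong to integrations (`integration_user_ids` is your own list). Records with no history row for the field were set at creation and never edited, or predate tracking; treat their CreatedDate as the edit time. Formula fields cannot be history-tracked at all, which is itself a provenance signal.

```python
from datetime import datetime, timezone

# Reference query; rows exist only for fields with history tracking enabled.
HISTORY_SOQL = (
    "SELECT AccountId, Field, CreatedById, CreatedDate FROM AccountHistory "
    "WHERE Field = 'Industry' ORDER BY CreatedDate DESC"
)


def last_field_edits(history_rows):
    """Latest edit per record for one field: {record_id: (edited_at, editor_id)}."""
    latest = {}
    for row in history_rows:
        record_id, edited_at = row["AccountId"], row["CreatedDate"]
        if record_id not in latest or edited_at > latest[record_id][0]:
            latest[record_id] = (edited_at, row["CreatedById"])
    return latest


def stale_share(latest, max_age_days, now=None):
    """Share of records whose field was last edited more than max_age_days ago."""
    if not latest:
        return 0.0
    now = now or datetime.now(timezone.utc)
    stale = sum(1 for edited_at, _ in latest.values() if (now - edited_at).days > max_age_days)
    return stale / len(latest)


def integration_share(latest, integration_user_ids):
    """Share of latest edits made by integration users rather than people."""
    if not latest:
        return 0.0
    by_integration = sum(1 for _, editor in latest.values() if editor in integration_user_ids)
    return by_integration / len(latest)
```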
Step 2: Test referential integrity, because agents follow relationships
Humans tolerate broken joins. They see a blank account name, shrug, and keep working. An agent traversing Lead to Account to Opportunity to Contract does not shrug — it either errors out mid-action or, worse, silently stitches the wrong records together and proceeds with full confidence. So audit the joins, not just the fields. Count the orphans your agent will walk into: leads with no associated campaign, opportunities with zero OpportunityContactRole rows, accounts whose Owner is an inactive user.
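Two of those orphan counts translate directly into SOQL. A sketch, assuming standard objects and fields: lead-to-campaign membership lives in CampaignMember, and the lead check is limited to unconverted leads on the assumption that those are what a routing agent touches. `orphan_rates`, `over_threshold`, and whatever threshold you pass are hypothetical helpers for turning raw counts into comparable rates.

```python
# Each query returns a bare count via SELECT COUNT().
ORPHAN_SOQL = {
    # "No campaign" means no CampaignMember row for the lead (anti-join).
    "open_leads_without_campaign": (
        "SELECT COUNT() FROM Lead WHERE IsConverted = false "
        "AND Id NOT IN (SELECT LeadId FROM CampaignMember)"
    ),
    "accounts_with_inactive_owner": (
        "SELECT COUNT() FROM Account WHERE Owner.IsActive = false"
    ),
}

# Matching denominators, so checks of different sizes compare as rates.
BASE_SOQL = {
    "open_leads_without_campaign": "SELECT COUNT() FROM Lead WHERE IsConverted = false",
    "accounts_with_inactive_owner": "SELECT COUNT() FROM Account",
}


def orphan_rates(orphan_counts, base_counts):
    """Convert raw orphan counts into rates; an empty base yields 0.0."""
    return {
        name: (orphan_counts[name] / base_counts[name]) if base_counts[name] else 0.0
        for name in orphan_counts
    }


def over_threshold(rates, threshold):
    """Names of checks whose orphan rate exceeds the threshold you chose."""
    return sorted(name for name, rate in rates.items() if rate > threshold)
```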
The OpportunityContactRole gap is the one that catches almost everyone. Reps close deals without ever marking who the buyer was, so an agent told to "follow up with the decision-maker" has no decision-maker to follow — it either stalls or guesses. One query surfaces it: count opportunities in your target stages that have no contact-role rows. Quantify that before you build, not after the agent emails a renewal to a champion who left the company two years ago.
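That query, plus a Python version that works on exported rows (from Data Loader or a report export, say) and returns the specific opportunities to fix. The stage names are examples from Salesforce's default sales process and are assumptions; swap in the stages your agent acts on. The helper names are hypothetical.

```python
# Example stages; replace with the ones your agent actually acts on.
TARGET_STAGES = ("Proposal/Price Quote", "Negotiation/Review")

# In-scope opportunities with no contact-role rows (anti-join).
MISSING_OCR_SOQL = (
    "SELECT COUNT() FROM Opportunity "
    "WHERE StageName IN ('Proposal/Price Quote', 'Negotiation/Review') "
    "AND Id NOT IN (SELECT OpportunityId FROM OpportunityContactRole)"
)


def missing_contact_roles(opportunities, contact_roles, stages=TARGET_STAGES):
    """Ids of in-stage opportunities that have zero OpportunityContactRole rows."""
    with_roles = {row["OpportunityId"] for row in contact_roles}
    return sorted(
        opp["Id"]
        for opp in opportunities
        if opp["StageName"] in stages and opp["Id"] not in with_roles
    )


def gap_rate(opportunities, contact_roles, stages=TARGET_STAGES):
    """Share of in-stage opportunities with no decision-maker on record."""
    in_scope = [o for o in opportunities if o["StageName"] in stages]
    if not in_scope:
        return 0.0
    return len(missing_contact_roles(in_scope, contact_roles, stages)) / len(in_scope)
```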
Step 3: Find the duplicates the agent will act on twice
Duplicate accounts and contacts are a reporting nuisance for a human. For an agent they are an action multiplier: two records become two outreach messages, one annoyed prospect, and one nick in your sending-domain reputation. Use your existing matching rules and duplicate rules as a first signal, then verify by hand, because those rules are tuned for data entry, not for autonomous action. Pull 50 records the agent would actually touch and check how many have a near-twin when you match more loosely than the rules do, for example on a normalized name or a shared web domain. If more than a couple do, dedupe is a precondition for shipping, not a backlog item.
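A deliberately loose match key is one way to run that hand check at scale before eyeballing the hits. A sketch, assuming exported Account rows with `Name` and `Website` and a sample drawn from the same population; the legal-suffix list, the key (name minus punctuation and suffixes, or a shared web domain), and reading "a couple" as two of fifty are all assumptions to tune. This is not how Salesforce matching rules work internally.

```python
import random
import re
from collections import Counter

LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh"}  # assumption; extend


def loose_key(account):
    """(core name, web domain): looser than typical data-entry matching."""
    words = re.sub(r"[^a-z0-9 ]", " ", account["Name"].lower()).split()
    core = " ".join(w for w in words if w not in LEGAL_SUFFIXES)
    site = (account.get("Website") or "").lower()
    domain = re.sub(r"^(https?://)?(www\.)?", "", site).split("/")[0]
    return core, domain


def draw_sample(population, k=50, seed=None):
    """Random sample of up to k records the agent would touch."""
    return random.Random(seed).sample(population, min(k, len(population)))


def twins_in_sample(sample, population):
    """Count sampled records sharing a core name or non-empty domain with another record."""
    keys = [loose_key(a) for a in population]
    names = Counter(core for core, _ in keys)
    domains = Counter(domain for _, domain in keys if domain)

    def has_twin(account):
        core, domain = loose_key(account)
        return names[core] > 1 or (bool(domain) and domains[domain] > 1)

    return sum(1 for a in sample if has_twin(a))


def dedupe_is_precondition(sample, population, max_twins=2):
    """True when more than 'a couple' of sampled records have a near-twin."""
    return twins_in_sample(sample, population) > max_twins
```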
The honest part the readiness checklists skip: you almost never need a clean org. You need the slice the agent touches to be clean, and you need a measured error rate on the rest so you can put a proportionate guardrail in front of it. "Clean everything first" is how readiness projects die without an agent ever shipping.
Step 4: Score timeliness against the agent's clock
Data has a shelf life relative to the action, and the action sets the clock. A speed-to-lead agent that must answer in minutes cannot lean on an enrichment field refreshed by a nightly batch: by the time the agent reads it, the value is stale or, for a brand-new lead, simply not there yet. So map each input field to its write latency: how long after the trigger event does a correct value actually land? Then compare that latency to the agent's required response time. Any field slower than the agent is a silent failure waiting to fire. In our Green Subsidy solar work, the entire value of the agent was acting on a fresh inbound the moment it arrived, which only holds if every field it reads is written synchronously, before the agent reads it, not by an overnight job.
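A latency map can be as small as a list of fields with their worst observed write delay. A sketch, assuming you have already measured those delays (for example, trigger time versus the field-history timestamp when the value landed); the field names, writers, and durations below are illustrative placeholders, not measurements, and `slower_than_agent` is a hypothetical helper.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class FieldLatency:
    field: str
    writer: str               # what writes it, e.g. "lead form", "nightly batch"
    write_latency: timedelta  # worst observed gap from trigger event to a correct value


def slower_than_agent(fields, agent_deadline):
    """Fields whose write latency exceeds the agent's response deadline."""
    return [f.field for f in fields if f.write_latency > agent_deadline]


# Illustrative map for the speed-to-lead example; not real measurements.
LATENCY_MAP = [
    FieldLatency("Lead.LeadSource", "lead form", timedelta(seconds=0)),
    FieldLatency("Lead.OwnerId", "assignment rule", timedelta(seconds=5)),
    FieldLatency("Lead.Employee_Band__c", "nightly enrichment batch", timedelta(hours=24)),
]
```

Against a five-minute deadline, `slower_than_agent(LATENCY_MAP, timedelta(minutes=5))` returns only the hypothetical enrichment field, which is the one that needs a synchronous writer or a fallback.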
Step 5: If Data Cloud is in the picture, audit the resolution, not the records
Data Cloud changes the audit because the agent usually reads a unified profile, not a single object. The readiness question shifts from "is this field clean?" to "does identity resolution merge the right records, and only those?" You can't answer that from aggregates — you have to inspect resolution by hand. Pick ten known customers, find each one's unified individual, and verify two things: every source record that should have merged did, and nothing foreign got pulled in. Then confirm the calculated insights and data graphs the agent consumes are refreshing on a cadence the agent can live with, using the same clock test as Step 4.
- Match-rule precision: are distinct people being merged into one profile? False merges are catastrophic — the agent acts for the wrong human.
- Match-rule recall: are one person's records staying split across profiles? Fragmentation starves the agent of context it assumes it has.
- Data-stream freshness versus the agent's clock — the latency map from Step 4, applied to the streams feeding the profile.
- Field mapping: confirm the source field the agent thinks it reads is the one actually mapped into the data model object (DMO) the agent queries. Renames and remappings break this quietly.
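The ten-customer spot check reduces to two set differences per customer. A sketch, assuming you have written down the source record Ids each known customer should resolve to and exported the Ids actually linked to that customer's unified individual. How you export those links depends on your data streams and ruleset, so the sketch takes plain sets; the `source:id` labels and helper names are hypothetical.

```python
def resolution_check(expected, unified):
    """Compare expected vs. actual membership of each customer's unified profile.

    expected: {customer: set of source record ids that SHOULD merge}
    unified:  {customer: set of source record ids actually in the unified individual}
    """
    report = {}
    for customer, should in expected.items():
        actual = unified.get(customer, set())
        report[customer] = {
            "missing": sorted(should - actual),  # recall gap: profile is fragmented
            "foreign": sorted(actual - should),  # precision gap: false merge
        }
    return report


def summarize(report):
    """Counts of customers with at least one false merge or one split."""
    return {
        "customers_with_false_merges": sum(1 for r in report.values() if r["foreign"]),
        "customers_split": sum(1 for r in report.values() if r["missing"]),
    }
```

Given how costly false merges are, a nonzero `customers_with_false_merges` across just ten customers is reason enough to revisit the match rules before the agent reads these profiles.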
Step 6: Turn findings into a go / guardrail / no-go call
An audit that ends in a spreadsheet of red cells is a report, not a decision. End yours with a verdict per field. Green: the agent reads it directly. Yellow: it ships behind a guardrail — a confidence threshold, a human-in-the-loop checkpoint, or a fallback path when the field is missing or stale. Red: the agent must not depend on it until the data is fixed, full stop. That table is the audit's actual deliverable, and it's where the build, run, and accountability disciplines meet. You cannot stand behind an agent's outcomes if you never priced the data risk going in — the verdicts are how you price it.
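Writing the verdict rules as a small, explicit function keeps the call reproducible field by field. A sketch with hypothetical thresholds: the fill, error, and freshness cutoffs are placeholders you would set per agent, and `error_rate` assumes the error rate you measured on a hand-checked sample.

```python
from enum import Enum


class Verdict(Enum):
    GREEN = "agent reads it directly"
    YELLOW = "ships behind a guardrail"
    RED = "agent must not depend on it"


def field_verdict(fill_rate, error_rate, fresh_enough, *,
                  min_green_fill=0.98, max_green_error=0.01, max_yellow_error=0.10):
    """Map one field's audit numbers to a verdict; thresholds are placeholders."""
    if error_rate > max_yellow_error:
        return Verdict.RED  # too wrong to guardrail: fix the data first
    if fill_rate >= min_green_fill and error_rate <= max_green_error and fresh_enough:
        return Verdict.GREEN
    # Usable only with a confidence threshold, human checkpoint, or fallback path.
    return Verdict.YELLOW


def verdict_table(metrics):
    """metrics: {field: {"fill_rate": ..., "error_rate": ..., "fresh_enough": ...}}"""
    return {field: field_verdict(**m) for field, m in metrics.items()}
```

Stale-but-accurate fields land in YELLOW here on the reasoning that a fallback path can cover a late value; if your agent has no safe fallback, tighten that rule to RED.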
“Don't fix the whole org. Decide, field by field: can the agent trust this today, can it trust it behind a guardrail, or not at all?”
What this actually saves you
Teams that skip the agent-specific audit don't avoid the cost — they defer it to production, in front of customers, the first time the agent confidently does the wrong thing on bad data. A scoped two-day audit converts that hidden cost into a list of known risks you chose deliberately. For a buyer that's the entire point: you can't tie a fee to an outcome you can't predict, and you can't predict an outcome that rests on data nobody measured. Readiness work isn't overhead in front of the ROI. It is the part of the ROI you actually control.
Want a second set of eyes on your decision sentence and the fields under it? Book a working session and we'll pressure-test your agent's data readiness together — and tell you honestly which fields are green, yellow, and red.