Field note
How to Get Your Salesforce Data AI-Ready in 30 Days
A 30-day plan for Salesforce architects that scopes data readiness around what an agent actually reads at runtime — identity resolution and grounded retrieval — instead of a boil-the-ocean cleanup that never ships.
Most "get your data AI-ready" projects fail the same way. Someone opens a data-quality report, sees 40% of accounts missing an industry code, and declares a six-month cleanup before any agent can ship. Six months later the cleanup is at 60%, the budget is gone, and there is still no agent in production.
Here is the part the readiness frameworks skip. An agent does not read your whole org. It reads a narrow slice — the records, fields, and documents it retrieves to answer one specific question. AI readiness is not a property of your data. It is a property of the path between a user's question and the answer. Scope the work that way and 30 days is enough to put a real agent in front of real users.
Stop fixing fields. Start fixing retrieval paths.
The instinct from years of reporting and integration work is to think in tables: clean the Account object, dedupe Contacts, backfill the picklists. That is the wrong unit. An Agentforce agent works backward from an intent — "what's the status of my order," "summarize this account before my call" — and touches only the data on that intent's path.
So your first move is not a quality audit. It is to list the five to ten questions your agent will actually answer, then trace each one to the exact objects, fields, and unstructured sources it reads. That trace is your scope. Everything off the path can wait. A field that is 40% null doesn't matter if no agent reads it; a field that is 5% null but sits on a hot path can poison every answer that depends on it.
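To make the trace concrete, here is a minimal Python sketch under stated assumptions. Each question is mapped by hand to the `Object.Field` pairs it reads (the questions and field names below are hypothetical). The scope is the union of those fields, and null-rate triage is limited to fields inside that scope. The null rates would come from your own profiling and are invented here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalPath:
    """One agent question and the "Object.Field" pairs its answer depends on."""
    question: str
    fields: frozenset


def scope_fields(paths):
    """Union of every field on any traced path: this set is the readiness scope."""
    scope = set()
    for path in paths:
        scope |= path.fields
    return scope


def prioritize(null_rates, paths):
    """Rank on-path fields by null rate weighted by how many questions read them.

    Fields that sit on no path are dropped, however bad their null rate looks.
    Returns (field, null_rate, dependent_question_count) tuples, worst first.
    """
    ranked = []
    for field in scope_fields(paths):
        dependents = sum(1 for p in paths if field in p.fields)
        ranked.append((field, null_rates.get(field, 0.0), dependents))
    ranked.sort(key=lambda item: item[1] * item[2], reverse=True)
    return ranked


# Hypothetical trace for two questions; field names are illustrative only.
paths = [
    RetrievalPath("What's the status of my order?",
                  frozenset({"Order.Status", "Order.AccountId"})),
    RetrievalPath("Summarize this account before my call",
                  frozenset({"Case.Status", "Case.AccountId", "Order.Status"})),
]
null_rates = {"Account.Industry": 0.40, "Order.Status": 0.05}

for field, rate, dependents in prioritize(null_rates, paths):
    print(f"{field}: {rate:.0%} null, read by {dependents} question(s)")
```

The 40%-null `Account.Industry` never enters the ranking because no traced question reads it. The 5%-null `Order.Status` rises to the top because two questions depend on it.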
“Readiness is measured per question, not per table. The smallest useful unit is one retrieval path you can ground and trust.”
Week 1: Identity resolution before anything else
The single biggest difference between a Data Cloud project and the integrations you've done before is that the unit of work is the resolved entity, not the row. If your agent can't reliably say "this lead, this contact, and this case are the same human," every downstream answer is built on sand. Splintered identity is how an agent confidently tells a customer they have no open tickets while three sit under a duplicate profile.
Spend week one defining unification on the entities your hot paths touch — usually Individual and Account. Decide where deterministic match keys (email, normalized phone, a stable external ID) carry the load and where you're forced into fuzzier rules, because the probabilistic matches are exactly the ones that misfire. Set reconciliation order for conflicting fields so you know which source wins when CRM and a billing sync disagree. Then pressure-test against the records you already know are messy: the same company under three spellings, the personal email that breaks the join. Don't chase a perfect golden record across the org. Aim for trustworthy identity on the paths, and nowhere else yet.
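Data Cloud configures match rules and reconciliation rules declaratively, so you would not write this code in production. It can still help to pressure-test your rules against known-messy records before you configure them. Below is a minimal Python model of the same two ideas. Deterministic exact-match keys (normalized email and phone) are grouped with union-find, and a source-priority order decides which value wins. The source names, the priority order, and the North American phone normalization are all assumptions for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical reconciliation order: lower number wins when sources disagree.
SOURCE_PRIORITY = {"crm": 0, "billing": 1, "web_form": 2}


@dataclass
class SourceRecord:
    record_id: str
    source: str
    email: Optional[str]
    phone: Optional[str]
    fields: dict = field(default_factory=dict)


def normalize_email(email):
    return email.strip().lower() if email else None


def normalize_phone(phone):
    """Digits only; assumes North American numbers with an optional leading 1."""
    if not phone:
        return None
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None


def match_keys(record):
    """Deterministic keys only: exact email or exact normalized phone."""
    keys = []
    email = normalize_email(record.email)
    phone = normalize_phone(record.phone)
    if email:
        keys.append(("email", email))
    if phone:
        keys.append(("phone", phone))
    return keys


def resolve(records):
    """Group records that share any match key (union-find over record IDs)."""
    parent = {r.record_id: r.record_id for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    first_seen = {}
    for r in records:
        for key in match_keys(r):
            if key in first_seen:
                parent[find(r.record_id)] = find(first_seen[key])
            else:
                first_seen[key] = r.record_id

    clusters = {}
    for r in records:
        clusters.setdefault(find(r.record_id), []).append(r)
    return list(clusters.values())


def reconcile(cluster):
    """Build one profile: for each field, the highest-priority non-empty value wins."""
    ordered = sorted(cluster, key=lambda r: SOURCE_PRIORITY.get(r.source, len(SOURCE_PRIORITY)))
    profile = {}
    for r in ordered:
        for name, value in r.fields.items():
            if value and name not in profile:
                profile[name] = value
    return profile


# Hypothetical records: one person split across three sources, plus one other person.
records = [
    SourceRecord("003A", "crm", "Pat@Example.com ", "(415) 555-0100", {"Name": "Pat Lee", "Tier": "Gold"}),
    SourceRecord("B-17", "billing", "pat@example.com", None, {"Name": "Patricia Lee", "BillingCity": "Oakland"}),
    SourceRecord("W-9", "web_form", "pat.lee@gmail.com", "+1 415-555-0100", {"Name": "P. Lee"}),
    SourceRecord("003B", "crm", "sam@example.org", None, {"Name": "Sam Ortiz"}),
]
for cluster in resolve(records):
    print(sorted(r.record_id for r in cluster), reconcile(cluster))
```

In the sample data, the web-form record carries a personal Gmail address that would break an email-only join. The shared phone number is what pulls it into Pat's cluster. That is exactly the kind of case to check by hand, because a fuzzier rule that fixed it could just as easily merge two different people.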
Week 2: Ground the unstructured data — and govern what's visible
Half the answers a useful agent gives come from text, not fields: knowledge articles, PDFs, email bodies, case comments. Getting these AI-ready means making them retrievable by meaning, which in Data Cloud means modeling them as a search index your agent can ground against. This is where retrieval-augmented generation stops being a buzzword and becomes a config decision you own.
Two things determine quality here, and both are boring on purpose. First, chunking: a 60-page PDF retrieved whole is noise; the same PDF split into titled sections is signal, because the retriever can return the one section that answers the question instead of burying it. Second, freshness: a vector index built once and never refreshed will confidently cite a policy you retired last quarter. Decide the refresh cadence now, before launch, not after the first wrong answer reaches a customer.
- Index only the documents on your traced paths — not the whole knowledge base. Scope is what keeps retrieval relevant.
- Carry source metadata (last-updated, owner, record link) into each chunk so every answer is auditable back to a source.
- Respect sharing rules and field-level security in retrieval. An agent that retrieves past your permission model is a data breach with a friendly tone.
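The chunking and metadata points can be sketched in a few lines of Python. This is a minimal illustration, not how Data Cloud's search index works internally. It assumes the source text has already been extracted with headings marked by `## `. It splits on those headings so each titled section becomes its own chunk, stamps every chunk with source metadata, and flags chunks older than a refresh window. The `Chunk` fields, the 90-day window, and the sample policy and URL are hypothetical. Embedding and vector storage are deliberately left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Chunk:
    title: str
    text: str
    source_url: str    # record link or document URL, for auditability
    owner: str
    last_updated: date


def chunk_by_headings(doc_text, source_url, owner, last_updated, marker="## "):
    """Split a document into one chunk per titled section.

    Assumes section headings are lines starting with `marker` (for example,
    after converting a PDF or article to Markdown). Text before the first
    heading becomes an "Introduction" chunk.
    """
    chunks = []
    title, body = "Introduction", []

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk(title, text, source_url, owner, last_updated))

    for line in doc_text.splitlines():
        if line.startswith(marker):
            flush()
            title, body = line[len(marker):].strip(), []
        else:
            body.append(line)
    flush()
    return chunks


def stale_chunks(chunks, today, max_age_days=90):
    """Chunks whose source has not been updated within the refresh window."""
    cutoff = today - timedelta(days=max_age_days)
    return [c for c in chunks if c.last_updated < cutoff]


# Hypothetical policy document and metadata.
policy = """Applies to all retail orders.
## Refund window
Refunds are accepted within 30 days of delivery.
## Shipping
Orders ship within 2 business days."""

chunks = chunk_by_headings(policy, "https://example.com/kb/returns-policy",
                           "support-ops", date(2024, 1, 15))
for c in chunks:
    print(c.title, "|", c.text)
print(len(stale_chunks(chunks, today=date(2024, 6, 1))), "stale chunk(s)")
```

Splitting on the document's own headings keeps each chunk about one topic, so a question about refunds can retrieve only the refund section. The staleness check is the cheap half of a freshness policy. It tells you which chunks to re-index or review before they start citing retired content.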
That last point is non-negotiable for architects. The agent acts with the visibility of the user it runs as, so confirm grounding queries honor sharing and FLS exactly as a report run by that user would, and test with a low-privilege user, not an admin. Convenience is never a reason to widen what the agent can see.
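To make the low-privilege test concrete, here is a minimal Python sketch of the property to check. Records are filtered by visibility and field-level access before anything is passed to the model, and the test then asserts on what a restricted user actually gets back. On the platform, sharing and FLS are evaluated for you. The `UserAccess` class below is a hypothetical, precomputed stand-in for that evaluation, and the record IDs and the `Internal_Notes__c` field are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserAccess:
    """Simplified stand-in for what the platform computes from sharing and FLS."""
    visible_record_ids: frozenset
    readable_fields: frozenset


def ground(candidates, access):
    """Keep only records the user can see, and only fields the user can read.

    Runs before anything reaches the model, so restricted text never enters
    the prompt in the first place.
    """
    grounded = []
    for record in candidates:
        if record["Id"] not in access.visible_record_ids:
            continue
        grounded.append({k: v for k, v in record.items()
                         if k == "Id" or k in access.readable_fields})
    return grounded


# Hypothetical candidates returned by a retriever before permission checks.
candidates = [
    {"Id": "500A", "Subject": "Login issue", "Internal_Notes__c": "VIP escalation"},
    {"Id": "500B", "Subject": "Billing dispute", "Internal_Notes__c": "Legal hold"},
]

admin = UserAccess(frozenset({"500A", "500B"}), frozenset({"Subject", "Internal_Notes__c"}))
support_rep = UserAccess(frozenset({"500A"}), frozenset({"Subject"}))

# The check that matters: what does the low-privilege user get back?
rep_view = ground(candidates, support_rep)
assert all(r["Id"] in support_rep.visible_record_ids for r in rep_view)
assert all("Internal_Notes__c" not in r for r in rep_view)
print(rep_view)
```

The admin run passes trivially, which is exactly why it proves nothing. Keep the assertions on the restricted user's view in a regression suite.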
Week 3: Write the data contract the agent reads from
An agent reads your field metadata the way a new hire reads a glossary. A field literally named "Status2__c" with no description tells the model nothing, so the model guesses — and guesses are how you get answers that are plausible and wrong. Treat the metadata the agent sees — field labels, descriptions, and the retrievers and instructions you expose — as the contract between your data and the model. It is, quite literally, part of the prompt.
In week three, write descriptions for every field and object on the hot paths: what the field means, its allowed values, and when it's null versus zero. Then define each retriever narrowly — one job, one clear name, an explicit set of returnable fields, a built-in filter. A retriever scoped to "open cases for this account, last 90 days" beats a generic "query anything" tool every time, because a bounded tool is one you can actually test, and one the model can't misuse to wander off-path.
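Here is what "one job, one clear name, an explicit set of returnable fields, a built-in filter" can look like. It is sketched in Python rather than in any real Agentforce configuration format. The `Retriever` class, its parameters, and the sample rows are hypothetical. `CaseNumber`, `Subject`, `Status`, `AccountId`, `IsClosed`, and `CreatedDate` mirror standard Case field names, with `CreatedDate` simplified to a date. The sketch also includes a small lint for missing field descriptions, the cheapest week-three check to automate.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Retriever:
    """One job, one clear name, a fixed field list, and a built-in filter."""
    name: str
    description: str
    returnable_fields: tuple
    lookback_days: int

    def run(self, cases, account_id, today):
        cutoff = today - timedelta(days=self.lookback_days)
        hits = [c for c in cases
                if c["AccountId"] == account_id
                and not c["IsClosed"]
                and c["CreatedDate"] >= cutoff]
        # Only the declared fields leave the retriever, whatever the rows contain.
        return [{f: c[f] for f in self.returnable_fields} for c in hits]


OPEN_CASES_90D = Retriever(
    name="open_cases_for_account_last_90_days",
    description="Open cases for a single account created in the last 90 days.",
    returnable_fields=("CaseNumber", "Subject", "Status"),
    lookback_days=90,
)


def missing_descriptions(hot_path_fields, descriptions):
    """Hot-path fields with no written description: the gaps the model will guess at."""
    return sorted(f for f in hot_path_fields if not descriptions.get(f, "").strip())


# Hypothetical rows; CreatedDate is simplified to a date.
cases = [
    {"CaseNumber": "00001001", "Subject": "Router offline", "Status": "New",
     "AccountId": "001X", "IsClosed": False, "CreatedDate": date(2024, 5, 20)},
    {"CaseNumber": "00001002", "Subject": "Invoice copy", "Status": "Closed",
     "AccountId": "001X", "IsClosed": True, "CreatedDate": date(2024, 5, 25)},
    {"CaseNumber": "00000900", "Subject": "Old outage", "Status": "Working",
     "AccountId": "001X", "IsClosed": False, "CreatedDate": date(2023, 12, 1)},
    {"CaseNumber": "00001003", "Subject": "Other account", "Status": "New",
     "AccountId": "001Y", "IsClosed": False, "CreatedDate": date(2024, 5, 30)},
]

print(OPEN_CASES_90D.run(cases, "001X", today=date(2024, 6, 1)))
print(missing_descriptions({"Case.Status", "Case.Status2__c"},
                           {"Case.Status": "Lifecycle stage: New, Working, Escalated, Closed."}))
```

The filter and field list live inside the retriever. That lets a test pin down exactly what it returns, and it leaves the model no parameter it can use to widen the query.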
“If a sharp new analyst couldn't answer the question from your field descriptions alone, neither can the model. The metadata is the prompt.”
Week 4: Evaluate honestly, then watch it in production
Readiness isn't a checkbox you tick — it's a number you can show. Before launch, build a small evaluation set: 30 to 50 real questions with known-correct answers, drawn from actual users, including the awkward edge cases. Run the agent against them and score two things separately. Groundedness: did every claim trace to a record or document it actually retrieved, with no invented detail? Accuracy: given what it retrieved, was the answer correct? Splitting them matters, because an answer can be perfectly grounded and still wrong if the retriever pulled the wrong slice. If the agent can't pass your own test set, it is not ready, however clean the dashboard looks.
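Scoring the two dimensions separately is easy to get wrong in a spreadsheet, so here is a minimal Python sketch of the bookkeeping. It assumes each agent run records which sources were retrieved and which source each claim cites. How you capture that from your own agent logs is outside this sketch. The accuracy check is a deliberately naive exact match standing in for a human or rubric-based grader. The sample questions, IDs, and answers are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    text: str
    source_id: Optional[str]  # record or chunk the agent cited, if any


@dataclass
class AgentRun:
    answer: str
    claims: list
    retrieved_ids: set  # everything the retriever actually returned


@dataclass
class EvalCase:
    question: str
    expected: str


def is_grounded(run):
    """Every claim cites a source that was actually retrieved on this run."""
    return all(c.source_id is not None and c.source_id in run.retrieved_ids
               for c in run.claims)


def is_accurate(run, case):
    """Placeholder grader: normalized exact match.

    Real evals usually need a human reviewer or a rubric-based grader here.
    """
    return run.answer.strip().lower() == case.expected.strip().lower()


def score(cases, runs):
    if len(cases) != len(runs):
        raise ValueError("one run per eval case")
    n = len(cases)
    return {
        "groundedness": sum(is_grounded(r) for r in runs) / n,
        "accuracy": sum(is_accurate(r, c) for c, r in zip(cases, runs)) / n,
    }


cases = [
    EvalCase("Status of order 1001?", "Shipped"),
    EvalCase("Any open cases for Acme?", "One open case"),
    EvalCase("Refund window?", "30 days"),
]
runs = [
    # Grounded and correct.
    AgentRun("Shipped", [Claim("Order 1001 shipped", "801A")], {"801A"}),
    # Grounded but wrong: the retriever returned a closed case, not the open one.
    AgentRun("No open cases", [Claim("Case 900 is closed", "500Z")], {"500Z"}),
    # Correct but ungrounded: cites a chunk that was never retrieved.
    AgentRun("30 days", [Claim("Refunds within 30 days", "kb-12")], {"kb-07"}),
]
print(score(cases, runs))
```

The second sample run is fully grounded and still wrong. The third is right but cites a source it never retrieved. A single pass/fail score would mark both as failures without telling you whether to fix the retriever or the generation.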
And readiness decays. New records arrive malformed, a synced source schema drifts, someone bulk-loads duplicates and identity resolution quietly degrades. The teams who stay ready are the ones who keep the eval running on a schedule and alert when groundedness drops below a line they set in advance. This is the unglamorous discipline behind any agent you'd actually trust in front of a customer — and the part most projects skip, because it has no launch date and nothing to demo.
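A minimal sketch of the alerting half, in Python, assuming your scheduler of choice runs the eval set and appends each run's groundedness score to a history. The 0.90 floor and the three-run trend window are placeholder values. What matters is that they are fixed in advance, so a drop triggers action instead of a debate about whether it matters.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EvalResult:
    run_date: date
    groundedness: float


# Hypothetical floor, agreed before launch rather than after the first incident.
GROUNDEDNESS_FLOOR = 0.90


def check(history, floor=GROUNDEDNESS_FLOOR, window=3):
    """Return alert messages for the latest scheduled eval run, if any."""
    if not history:
        return []
    alerts = []
    latest = history[-1]
    if latest.groundedness < floor:
        alerts.append(f"{latest.run_date}: groundedness {latest.groundedness:.2f} "
                      f"below floor {floor:.2f}")
    recent = history[-window:]
    if len(recent) == window and all(a.groundedness > b.groundedness
                                     for a, b in zip(recent, recent[1:])):
        alerts.append(f"{latest.run_date}: groundedness declined across the last {window} runs")
    return alerts


history = [
    EvalResult(date(2024, 6, 3), 0.96),
    EvalResult(date(2024, 6, 10), 0.93),
    EvalResult(date(2024, 6, 17), 0.88),
]
for alert in check(history):
    print(alert)
```

The trend check is meant to catch slow decay, such as identity resolution degrading as duplicates accumulate, before it crosses the hard floor.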
What 30 days does and doesn't buy you
Thirty days, scoped this way, gets you a small number of paths that are genuinely AI-ready: identity resolved, text grounded, metadata legible, answers evaluated. It does not get you an org-wide golden record or a clean 360-degree everything. That's the point. You ship a real agent on real paths, prove the value, and expand path by path — which is exactly how we run Data-to-Agent: get the data ready for one job, launch, then scale the surface area instead of stalling on a cleanup that never ends.
We took this approach on Green Subsidy's speed-to-lead work: ready the narrow slice the agent needed to act on inbound interest fast, rather than waiting on a perfect data estate. The discipline that makes outcome-tied fees possible is the same one that makes a 30-day plan possible — nobody pays for a clean schema. They pay for an answer that's right, fast, every time.
Want a second set of eyes on which retrieval paths to ready first? Map your highest-value agent question with our team.