Field note

Why Your AI Strategy Should Start With Data, Not Models

Akshit Kandi

SkySync

The model is the part everyone can rent and nobody can differentiate. The data is the part you already own and almost nobody has organized. That asymmetry is the whole strategy.


The part of your AI strategy you spend the most time arguing about is the part that matters least. Which model. Which vendor. Whether this week's release beats last month's on some benchmark. Meanwhile the thing that actually decides whether your AI works — your data — gets a line item and a junior owner. That inversion is the single most common reason expensive AI programs quietly underperform, and it almost never shows up in the postmortem.

I built Agentforce as a senior PM at Salesforce, and now I run AI agents in production for companies that pay us on the outcome, not the hours. From that seat you stop watching model leaderboards almost entirely. One question replaces them: when the agent reaches for a fact about a customer, is the fact there, and is it true? That is a data question. It is always a data question.

Models are a commodity. Your data is the only input your competitor can't buy.

Consider what a frontier model is from a strategy standpoint. It is a capability that you, your competitor, and a startup in a garage can all rent for the same per-token price by this afternoon. Same API, same weights, same context window. Whatever edge a model gives you, it gives everyone, and it gives them simultaneously. There is no moat in a thing your rival can subscribe to before lunch.

Your data is the opposite. Nobody else has your fifteen years of orders, your service histories, your pricing exceptions, or the unwritten rules your best rep keeps in her head. That asset is unique to you by definition. So the strategic logic is almost embarrassingly simple: build your advantage on the input that is scarce and yours, not the one that is abundant and everyone's. A strategy that opens with model selection is optimizing the commodity and ignoring the moat.

Picking your AI strategy by choosing a model is like picking a restaurant by the brand of its oven. The oven is fine. The ingredients decide whether anyone comes back.

Every demo assumes the data already exists

Every impressive AI demo you have seen runs on data that was clean, complete, and correctly connected, usually because someone curated it for the demo the night before. The model didn't know your customer was a returning enterprise account with an open escalation and a contract that expires in March. Someone handed it that context. In production, that someone has to be your data layer, automatically, for every interaction, all day, with no one curating anything.

That is the gap that surprises executives. The model is genuinely capable: it can reason, summarize, plan, and hold a conversation. What it cannot do is know things about your business that live nowhere it can reach, or that live in six systems under three spellings of the same customer's name. A capable reasoner with no access to the truth doesn't go quiet when it's missing a fact; it fills the gap confidently. And a confident wrong answer is more expensive than no answer, because it sounds right long enough to act on.

Where AI strategies actually die: the boring middle

When AI programs fail, the autopsy almost never says "we chose the wrong model." It says some version of this:

  • The agent couldn't tell that "Acme Corp," "Acme Corporation," and "ACME Inc." were the same account, so it answered three different ways.
  • Half the fields it needed were blank, free-text, or last touched in 2019.
  • The fact it needed lived in a system the project never got permission to connect to.
  • Two sources disagreed, and there was no rule for which one wins.
  • Nobody owned keeping the data current, so accuracy decayed quietly after launch — and trust went with it.

None of these are model problems. All of them are unglamorous, expensive, and invisible from the boardroom until the agent embarrasses someone in front of a customer. This is exactly the part the AI marketing skips, and exactly the part that determines your return. The order of operations is not optional: make the data the agent will stand on true before you put the agent on it.
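For builders, the first failure in the list above is the easiest to make concrete. What follows is a minimal Python sketch, assuming a simple string-normalization approach; the `account_key` function and the `LEGAL_SUFFIXES` list are hypothetical names invented for illustration, not part of any platform. It shows only how the three spellings could collapse to one matching key.

```python
import re

# Hypothetical, deliberately short list of legal-form suffixes to ignore.
# A real deployment would need a longer, locale-aware list.
LEGAL_SUFFIXES = {"corp", "corporation", "inc", "incorporated", "llc", "ltd", "co"}


def account_key(name: str) -> str:
    """Reduce a free-text account name to a crude matching key."""
    # Lowercase, then turn every run of non-alphanumeric characters into a space.
    tokens = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    # Drop trailing legal-form suffixes such as "corp" or "inc".
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


names = ["Acme Corp", "Acme Corporation", "ACME Inc."]
keys = {account_key(n) for n in names}  # all three names share the key "acme"
```

String normalization on its own is not identity resolution. It will miss true matches (abbreviations, renamed companies) and can merge distinct companies that happen to share a name, so a key like this is best treated as a candidate match to be confirmed against other attributes.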

What "true data" actually requires under the hood

For the people who will build this, "organize the data" is too vague to act on. Concretely, the work has a shape:

  • Identity resolution: a single key that unifies the same customer, account, or asset across CRM, billing, support, and whatever spreadsheet the renewals team actually trusts, so the agent retrieves one entity, not three near-duplicates.
  • A precedence rule for every field that more than one system claims to own, so a conflict resolves deterministically instead of returning whichever record the query happened to hit first.
  • Freshness you can prove: a timestamp and a source on each fact the agent is allowed to rely on, so a stale value can be detected rather than silently served.

None of this is exotic. On Salesforce, Data Cloud exists to provide precisely that identity-resolution and unification layer, but the platform doesn't decide your precedence rules or your freshness thresholds. You do. That's the design work, and it's where most of the actual return is won or lost.
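The precedence and freshness rules described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Data Cloud or any specific platform implements it. The `Fact` record, the `PRECEDENCE` and `MAX_AGE` tables, the field names, and all dates and thresholds are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Fact:
    field: str
    value: str
    source: str       # system the value came from
    as_of: datetime   # when that system last confirmed the value


# Hypothetical precedence: earlier in the list wins a conflict for that field.
PRECEDENCE = {
    "billing_email": ["billing", "crm", "support"],
    "contract_end": ["crm", "billing"],
}

# Hypothetical freshness thresholds per field.
MAX_AGE = {
    "billing_email": timedelta(days=180),
    "contract_end": timedelta(days=30),
}


def resolve(field: str, candidates: list[Fact], now: datetime) -> Optional[Fact]:
    """Return the fresh candidate from the highest-precedence source, or None."""
    order = PRECEDENCE[field]
    fresh = [
        f for f in candidates
        if f.field == field and f.source in order and now - f.as_of <= MAX_AGE[field]
    ]
    if not fresh:
        return None  # nothing trustworthy: the caller should not guess
    # Source rank decides first, then recency, then value as a final tie-break,
    # so the result does not depend on the order the records arrived in.
    return min(fresh, key=lambda f: (order.index(f.source), -f.as_of.timestamp(), f.value))


# Made-up example data: two systems disagree about the contract end date.
now = datetime(2025, 1, 15, tzinfo=timezone.utc)
candidates = [
    Fact("contract_end", "2025-06-30", "billing", now - timedelta(days=1)),
    Fact("contract_end", "2025-03-31", "crm", now - timedelta(days=3)),
]
winner = resolve("contract_end", candidates, now)  # the CRM record wins
```

One design choice worth making explicit: in this sketch a stale value from the preferred source is skipped in favor of a fresh value from a lower-ranked one. Some teams may prefer the opposite, or to return nothing at all, and that is exactly the kind of decision the platform will not make for you.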

Data-first doesn't mean a two-year data project

Here is where leaders overcorrect. "Start with data" gets heard as "freeze everything for a multi-year master-data initiative before anyone is allowed to touch AI." That is the wrong lesson, and it is how data-first quietly becomes a reason to do nothing. You do not need all of your data to be perfect. You need the specific slice that one valuable use case depends on to be trustworthy.

So scope it to the job. Pick one use case with a number attached (speed-to-lead, case deflection, renewal risk) and ask only three things: what facts does the agent need to do this one task, where do they live, and are they true? That is answerable in weeks, not years. It also does quiet compounding work, because the identity keys, precedence rules, and ownership you establish for the first agent are reusable for the next ten. Get the foundation right once on a narrow slice, then extend it.

You don't need clean data everywhere. You need true data exactly where the agent reaches — and a named owner who keeps it true after launch.

The executive version: what to ask in the next meeting

You do not need to know the difference between a vector index and a data model to govern this well. You need to change which questions you ask. When someone brings you an AI proposal, the tell of a serious plan versus a science project is whether it answers the data questions before the model questions — so ask these, in this order:

  • What single, measurable outcome does this move, and by roughly how much if it works? (If the proposal can't put a candidate number on it, illustrative or not, it isn't a business case yet.)
  • What facts does the agent need to be right, and which of our systems hold them today?
  • How do we know those facts are current and unique — who resolves duplicates and which source wins a conflict?
  • Who owns the data after launch so accuracy doesn't decay the week the project team rolls off?
  • When the agent is wrong, how fast do we find out, and how do we roll it back?

Notice the model never came up. A team that can answer these has done the unglamorous work that makes any decent model perform. A team that only wants to talk about which LLM is hottest this quarter has skipped it. The model choice is real — it's just the last ten percent of the decision, not the first.

The one place this gets opinionated

I'll go past the safe version of this argument. In two years the model you pick today will look quaint, and you'll have swapped it for something better without anyone outside the team noticing — because models are getting cheaper and more interchangeable, not less. The thing you build now that still pays off then is the connected, governed, trustworthy data layer underneath. You rent the model. You own the data. Spend your strategy budget on the asset that survives the next three model releases.

This is also why, when we take an engagement, our fee is tied to the outcome rather than the hours. It forces an honest order of operations on us: we don't get paid for a clever model demo. We get paid when the agent produces a real result — which means we are the ones who have to make the data true first, run the agent on it, and stay accountable when the world shifts underneath it. Outcome accountability and data-first discipline turn out to be the same discipline wearing two hats.

So if you take one thing from this: your AI strategy is mostly a data strategy with a more exciting name. Lead with the data your agents will stand on, scope it to one use case that moves a number, and treat the model as the easy, swappable part — because it is. Do that and the agent earns its keep. Skip it, and no model on the leaderboard will save you.

Want to pressure-test whether your data can carry the AI you're planning? That's the conversation we have on every first call.