Field note
How to Unify Data Across Systems With Data Cloud
Unifying customer data in Data Cloud is not a pipe-laying exercise. It is an identity and modeling problem, and the real test is whether an agent can safely act on the result.
Most teams treat "unify our data" as a plumbing project. Connect the sources, land everything in one place, draw a dashboard on top, declare victory. Then six months later someone asks why the same customer shows up three times, why the agent quoted a stale balance, and why the "single view" disagrees with the system it was built from. The plumbing worked. The unification didn't.
Unifying data across systems with Data Cloud is not primarily a movement problem. It is a modeling and identity problem wearing a pipeline costume. The connectors are the easy 20%. The hard 80% is deciding what an entity is, which record wins, and how fresh the answer has to be for whatever consumes it. This piece is about that 80%.
Ingestion is the part the demos skip past
Every Data Cloud demo opens at the same beautiful moment: data already mapped, identities already resolved, a clean unified profile glowing on screen. The interesting decisions all happened off-camera. Data Cloud's job is to bring data in without copying it everywhere — through zero-copy federation to a warehouse like Snowflake or BigQuery, native connectors to Salesforce orgs, and streaming ingestion for events. Each mode has a different freshness, cost, and query-latency profile, and choosing wrong is where projects quietly go sideways.
A blunt rule of thumb: federate what is large and analytical, ingest what you need to act on in near real time, and stream what is event-shaped. If you copy your 400-million-row warehouse into Data Cloud because copying felt safer, you have signed up to pay for and re-sync data you were never going to query at low latency. The flip side is just as real: zero-copy federation reads live from the source, so a slow warehouse becomes your profile's slow warehouse at the worst possible moment — when an agent is mid-conversation waiting on a value. Pick the mode against the read pattern, not the comfort level.
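The rule of thumb above can be written down as a small decision function. This is a sketch, not anything Data Cloud ships: `ReadPattern`, `choose_mode`, and the `LARGE_ROWS` cut-off are hypothetical names and numbers chosen for illustration, and the fallback for small, non-live tables is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    FEDERATE = "zero-copy federation"
    INGEST = "ingestion"
    STREAM = "streaming ingestion"


@dataclass(frozen=True)
class ReadPattern:
    """Hypothetical description of how a dataset will be consumed."""
    event_shaped: bool       # discrete events rather than slowly changing records
    read_at_inference: bool  # an agent or live action reads it mid-conversation
    row_count: int           # rough size of the source table


# Assumed cut-off for "large"; tune it to your own cost and latency numbers.
LARGE_ROWS = 100_000_000


def choose_mode(p: ReadPattern) -> Mode:
    """Pick an ingestion mode from the read pattern, not the comfort level."""
    if p.event_shaped:
        return Mode.STREAM
    if p.read_at_inference:
        # A live read should not wait on a remote warehouse query.
        return Mode.INGEST
    if p.row_count >= LARGE_ROWS:
        # Large and analytical: leave it where it is and query it in place.
        return Mode.FEDERATE
    # Small and not latency-sensitive: either mode works; ingest is an assumed default.
    return Mode.INGEST
```

The point of writing it this way is that size only matters after the read pattern has been settled: a 400-million-row table that nothing reads at low latency federates, while the handful of attributes an agent needs mid-conversation get ingested regardless of how small they are.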
The data model is the actual product
The center of gravity in Data Cloud is the harmonization layer, where messy source schemas get mapped onto a shared model. In Salesforce terms that's mapping Data Lake Objects to Data Model Objects against a canonical shape. Skip the modeling discipline and you get a unified store of contradictions — which is worse than separate systems, because now the contradictions are centralized and look authoritative.
Three modeling decisions carry most of the weight, and they're worth fighting over before any data moves:
- Grain. What does one row mean? A customer, an account, a household, a subscription, a device? Mixed grain is the root cause of most "my numbers are wrong" tickets — and the hardest thing to unwind once activations and reports depend on it.
- Canonical entities. Pick the handful of objects that everything maps to — Individual, Account, Order, Case — and refuse to let each source invent its own. The map from source field to canonical attribute is the real contract; write it down before you wire anything.
- Semantics over fields. "status = 3" in one system and "Active" in another must resolve to the same meaning, not just live in adjacent columns. The harmonization layer is where you encode that meaning, or where you silently fail to.
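One way to make the source-to-canonical contract concrete is to write it as data before wiring anything. The sketch below is illustrative only: the source names (`billing`, `crm`), the field names, and the status codes are all hypothetical, and in Data Cloud itself this mapping would live in the harmonization layer rather than in Python.

```python
# Hypothetical contract: which source field feeds which canonical attribute.
FIELD_MAP = {
    "billing": {"cust_email": "email", "stat_cd": "status"},
    "crm": {"Email": "email", "Status": "status"},
}

# Hypothetical semantics: what each source's status value actually means.
STATUS_MEANING = {
    "billing": {"3": "active", "4": "suspended", "9": "closed"},
    "crm": {"Active": "active", "On Hold": "suspended", "Closed": "closed"},
}


def harmonize(source: str, record: dict) -> dict:
    """Map one source record onto the canonical shape, resolving meaning, not just names."""
    out = {}
    for src_field, canonical in FIELD_MAP[source].items():
        if src_field not in record:
            continue
        value = record[src_field]
        if canonical == "status":
            try:
                value = STATUS_MEANING[source][str(value)]
            except KeyError:
                # Fail loudly: an unmapped code is a modeling gap, not a value to pass through.
                raise ValueError(f"unmapped status {value!r} from {source}") from None
        elif canonical == "email":
            value = value.strip().lower()
        out[canonical] = value
    return out
```

With this contract, `harmonize("billing", {"cust_email": " A@X.com ", "stat_cd": 3})` and `harmonize("crm", {"Email": "a@x.com", "Status": "Active"})` produce the same canonical record. Raising on an unmapped code is a deliberate choice: it is the difference between encoding the meaning and silently failing to.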
“A unified schema you can't explain to a new engineer in ten minutes isn't unified. It's just consolidated confusion with better marketing.”
Identity resolution is where it's actually won or lost
Once data is modeled, Data Cloud resolves identities into unified profiles using match rules: exact matches on keys like email or phone, and fuzzier matching across name, address, and device. This is the single highest-leverage and highest-risk step in the whole build. Tune match rules too loose and you merge two different people into one profile. Tune them too tight and one person fractures into five, and your "360 view" is really five 72-degree views.
Treat identity resolution as a tunable model with a precision-versus-recall tradeoff, not a checkbox. Build a hand-labeled set of record pairs you know to be the same person and pairs you know to be different, then track two numbers as you tune: the false-merge rate (distinct people collapsed into one) and the fragmentation rate (one person split across profiles). Start strict so precision stays high, then loosen deliberately while watching false merges climb. A wrong merge is not a cosmetic bug — it can cross a privacy boundary, exposing one person's data under another's profile, which is a consent and compliance event, not a data-quality footnote. Get this right before anything downstream reads from it.
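A minimal sketch of that scoring loop, assuming you can export which unified profile each source record landed in. The function name `merge_quality`, the pair format, and the pairwise definitions of the two rates are assumptions made for illustration; they are one reasonable way to define the metrics, not a Data Cloud feature.

```python
from typing import Iterable

# A hand-labeled pair: (record_id_a, record_id_b, same_person)
LabeledPair = tuple[str, str, bool]


def merge_quality(pairs: Iterable[LabeledPair], profile_of: dict[str, str]) -> dict[str, float]:
    """Score one resolution run against labeled pairs.

    profile_of maps each source record id to the unified profile id it was assigned.
    """
    false_merges = distinct = fragmented = same = 0
    for a, b, same_person in pairs:
        merged = profile_of[a] == profile_of[b]
        if same_person:
            same += 1
            fragmented += not merged   # one person split across profiles
        else:
            distinct += 1
            false_merges += merged     # two people collapsed into one profile
    return {
        "false_merge_rate": false_merges / distinct if distinct else 0.0,
        "fragmentation_rate": fragmented / same if same else 0.0,
    }
```

Run it after every change to the match rules and keep the history. Loosening a rule should lower the fragmentation rate; the question each time is how much false-merge rate you paid for it.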
Decide who wins: reconciliation and freshness
When three systems disagree about a customer's phone number or lifetime value, something has to decide the answer. Data Cloud lets you set reconciliation rules — most recent, source priority, most frequent — at the attribute level. Don't apply one global rule. The marketing system might be authoritative for consent, billing for balance, support for case history. Encode that source-of-truth map explicitly, attribute by attribute, because if you don't, the default will quietly pick for you — usually some flavor of "last writer wins," which is exactly wrong for a field like consent.
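The source-of-truth map is easier to argue about when it is written down as an explicit table of rules. The sketch below models the three rule types named above in plain Python; `Observation`, `RULES`, and the source names are hypothetical, and none of this is Data Cloud's own configuration syntax.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """One source's claim about one attribute of one customer (hypothetical shape)."""
    source: str
    value: str
    updated_at: datetime


def most_recent(obs):
    return max(obs, key=lambda o: o.updated_at).value


def most_frequent(obs):
    return Counter(o.value for o in obs).most_common(1)[0][0]


def source_priority(*order):
    """Build a rule that prefers sources in the given order; unlisted sources rank last."""
    rank = {source: i for i, source in enumerate(order)}

    def pick(obs):
        return min(obs, key=lambda o: rank.get(o.source, len(rank))).value

    return pick


# Assumed source-of-truth map, one rule per attribute.
RULES = {
    "consent": source_priority("marketing"),
    "balance": source_priority("billing"),
    "phone": most_recent,
}


def reconcile(attribute, obs):
    """Apply the attribute's own rule; refuse to guess when no rule is documented."""
    if attribute not in RULES:
        raise KeyError(f"no reconciliation rule documented for {attribute!r}")
    return RULES[attribute](obs)
```

Note what happens with consent under this map: if marketing recorded an opt-out in January and support recorded an opt-in in June, the marketing value still wins. Under "last writer wins" the June value would have replaced it, which is the failure the paragraph above warns about.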
Freshness deserves the same explicitness. Federated data is as fresh as the underlying warehouse and may add query latency. Streamed events can be near-instant. Batch-ingested data is as stale as its last run. A single unified profile silently blends all three, so the profile's effective freshness is whatever its stalest load-bearing attribute happens to be. The fix is to decide, per attribute, how old the answer can be before it's wrong for the use case — a marketing segment tolerates yesterday; a balance check tolerates seconds — and then choose the ingestion mode to meet that bar, not the other way around.
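The "stalest load-bearing attribute" idea can be checked mechanically if each attribute carries the time it was last loaded. A minimal sketch, assuming a per-use-case budget table; `MAX_AGE`, the attribute names, and the budgets themselves are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness budgets for one use case (a balance check).
# Attributes not listed here are not load-bearing for this use case.
MAX_AGE = {
    "balance": timedelta(seconds=30),
    "consent": timedelta(hours=1),
}


def stale_attributes(loaded_at: dict[str, datetime], now: datetime) -> list[str]:
    """Return load-bearing attributes that are older than their budget or missing."""
    stale = []
    for attr, budget in MAX_AGE.items():
        ts = loaded_at.get(attr)
        if ts is None or now - ts > budget:
            stale.append(attr)
    return stale


def effective_age(loaded_at: dict[str, datetime], now: datetime) -> timedelta:
    """Age of the stalest load-bearing attribute; raises KeyError if one was never loaded."""
    return max(now - loaded_at[attr] for attr in MAX_AGE)
```

A caller such as an agent action would check `stale_attributes(...)` before answering and decline or re-fetch when the list is non-empty. Attributes outside `MAX_AGE` are ignored on purpose: a two-day-old marketing segment does not make a balance answer wrong.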
Activation is the only proof that counts
Here's the contrarian part. A unified profile that only feeds dashboards has never really been tested. Reports forgive a lot — a duplicate here, a day-old number there, nobody notices. The moment that data drives an action, every flaw becomes visible and consequential. So the honest acceptance test for unification isn't "does the segment look right," it's "would I let an automated system act on this unprompted?"
That question changes the bar. An agent that answers "what's my balance," reprices a quote, or routes a lead needs the right entity, the winning value, and adequate freshness — at inference time, not in last night's batch. This is the bridge from Data Cloud to Agentforce, and it's also why we tell clients the sequence is data before agents, never the reverse. An agent on unresolved data doesn't fail quietly; it fails confidently, in front of a customer, citing the wrong person's record as if it were settled fact.
A sane sequence to build it
If you're starting a real unification effort, resist the urge to connect everything at once. Narrow it to one decision worth getting right, then build the thinnest unified slice that supports it:
- Pick one downstream action that has to be correct — a renewal nudge, a service answer, a lead route — and work backward from the data it needs.
- Map only the entities and attributes that action touches. You'll model more later; you don't need the whole enterprise on day one.
- Resolve identity strict-first and validate merges against labeled samples before trusting them.
- Set reconciliation and freshness per attribute, with a documented source of truth.
- Activate to one channel, measure against ground truth, and only then widen the aperture.
This is roughly the Agent Ready stage of how we think about delivery: get the data correct and actionable before a single agent goes live. It's slower to start and far cheaper to live with, because you never ship centralized confusion to production — and at the end of week one you have one working action to point at instead of a half-built platform with nothing acting on it.
What "unified" should actually mean
Unified data isn't one place where all your data lives. It's one place where, for any entity you care about, you can name the canonical record, defend why a given value won, and state how fresh it is — and then let something act on that with confidence. Data Cloud gives you the machinery. The judgment about grain, identity, and source-of-truth is the work, and it's the part no connector does for you.
Get those decisions right and the downstream stops being scary: dashboards reconcile, segments behave, and agents act on reality instead of three contradictory versions of it. Get them wrong and no amount of pipeline throughput will save you.
Planning a Data Cloud unification you intend to put agents on top of? We can pressure-test your identity and modeling decisions before they reach production.