Data Unification
Also known as: Identity Resolution, Customer Data Unification, Entity Resolution
Data unification is the process of matching, merging, and reconciling records that describe the same real-world entity — a person, account, or product — scattered across separate systems into one consistent, queryable profile. It is not a one-time data dump; it is a continuous pipeline that ingests, standardizes, resolves identity, and reconciles conflicts as source data keeps changing. The output is a single version of an entity that downstream systems — analytics, marketing, and AI agents — can act on without guessing which record is true.
The part the marketing skips: unification is a process, not a table
Vendors sell unification as an outcome — "one customer profile" — and architects inherit it as a problem that never stops moving. Source systems drift. A CRM, a billing system, and a support tool each spell the same customer three different ways, update on different schedules, and disagree about which email is current. Unification is the standing pipeline that reconciles that disagreement on every change, not a migration you run once. Treat it as a one-time merge and the unified profile is stale within a quarter: the duplicates you cleaned up grow back because nothing is resolving new records against the old ones, and every source that keeps writing keeps reintroducing the conflict you thought you'd closed.
How it actually works
- Ingestion and standardization: pull records from each source and normalize formats (casing, phone and address formats, date types, encoding) so that two records of the same person can be compared at all. Much of what looks like a matching failure is actually a standardization gap upstream.
- Blocking (the scaling step nobody mentions): you cannot compare every record to every other record — that is quadratic and dies at scale. Blocking groups records into candidate sets by a cheap key (postal code, email domain, name prefix) so matching only runs within plausible neighborhoods. Pick the blocking key badly and two records that should match never get compared, no matter how good your scoring is.
- Matching: decide which candidate records refer to the same entity. Deterministic matching uses exact keys (email, tax ID) — precise but blind to messy real-world data. Probabilistic matching scores fuzzy similarity across fields and accepts a match above a threshold — catches more, but that threshold is a dial that trades false merges against missed ones. Production systems use both: deterministic where the keys are clean, probabilistic where they are not.
- Identity resolution: cluster matched records under a stable, durable ID so the same entity keeps the same identity even as source IDs churn and new records arrive.
- Survivorship (the golden record): when sources conflict, rules decide which value wins — most recent, most trusted source, most complete. The subtle part is that survivorship is field-level, not record-level: the freshest phone number and the most authoritative billing address often live in different source records, so the canonical profile is assembled field by field rather than copied from one winning row.
- Continuous reconciliation: re-run blocking and matching in near real time as new and changed records arrive, so the profile stays current. This is also where the expensive case shows up: a new record reveals that two previously separate clusters are actually the same entity, and they have to be merged after downstream systems have already treated them as distinct.
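The standardization step above can be sketched in a few lines of Python. This is a minimal illustration, assuming each record carries `name`, `email`, and `phone` fields; the rules themselves (lowercasing the whole email address, reading a bare 10-digit phone number as North American, stripping accents from names) are illustrative assumptions, not a standard.

```python
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Strip accents, collapse runs of whitespace, and casefold."""
    decomposed = unicodedata.normalize("NFKD", raw)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.split()).casefold()


def normalize_email(raw: str) -> str:
    """Trim and lowercase (assumes the mailbox part is case-insensitive, common but not guaranteed)."""
    return raw.strip().lower()


def normalize_phone(raw: str, default_country: str = "1") -> str:
    """Keep digits only; treat a bare 10-digit number as North American (an assumption)."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:
        digits = default_country + digits
    return "+" + digits


def standardize(record: dict) -> dict:
    """Apply the per-field rules so records from different sources become comparable."""
    return {
        "name": normalize_name(record["name"]),
        "email": normalize_email(record["email"]),
        "phone": normalize_phone(record["phone"]),
    }


crm = {"name": "José  GARCIA", "email": " Jose.Garcia@Example.com ", "phone": "(555) 010-2345"}
billing = {"name": "jose garcia", "email": "jose.garcia@example.com", "phone": "+1 555 010 2345"}
print(standardize(crm) == standardize(billing))  # True
```

Without this step, any matcher, exact or fuzzy, would see two different phone strings and two different emails for what is one person.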
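The blocking step above, sketched in Python. The key (postal code plus surname initial) and the record fields are hypothetical; what matters is that candidate pairs are generated only inside each block rather than across the whole dataset.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record: dict) -> str:
    """Hypothetical cheap key: postal code plus the first letter of the surname."""
    return f"{record['postal']}|{record['last'][:1].lower()}"


def candidate_pairs(records: list[dict]) -> list[tuple[str, str]]:
    """Group records by key, then pair records only within each block."""
    blocks: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec["id"])
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(ids, 2))
    return pairs


records = [
    {"id": "crm-1", "last": "Garcia", "postal": "94107"},
    {"id": "bill-7", "last": "garcia", "postal": "94107"},
    {"id": "sup-3", "last": "Nguyen", "postal": "94107"},
    {"id": "crm-2", "last": "Garcia", "postal": "10001"},
]
print(candidate_pairs(records))  # [('crm-1', 'bill-7')]
```

Four records would need six all-pairs comparisons; blocking leaves one. The same output shows the failure mode: `crm-2` is never compared with `crm-1`, so if that Garcia simply moved, the postal-code key has hidden a true match no matter how good the scorer is.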
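A sketch of the two-tier matching step: a deterministic rule on an exact key first, then a probabilistic fallback. The fuzzy tier here is a simple weighted average of standard-library `difflib.SequenceMatcher` ratios, a stand-in for a full probabilistic model such as Fellegi-Sunter; the field weights, the 0.85 threshold, and the record fields are hypothetical, and records are assumed to be standardized already.

```python
from difflib import SequenceMatcher

# Hypothetical weights and threshold; real systems tune these on labeled pairs.
WEIGHTS = {"name": 0.5, "email": 0.3, "phone": 0.2}
THRESHOLD = 0.85


def similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1] from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def is_match(a: dict, b: dict) -> tuple[bool, str]:
    """Return the decision plus a short reason that can be stored as evidence."""
    # Deterministic tier: a shared, non-empty tax ID is treated as conclusive.
    if a.get("tax_id") and a.get("tax_id") == b.get("tax_id"):
        return True, "deterministic: tax_id"
    # Probabilistic tier: weighted fuzzy score across fields, accepted above a threshold.
    score = sum(w * similarity(a[f], b[f]) for f, w in WEIGHTS.items())
    return score >= THRESHOLD, f"probabilistic: score={score:.2f}"


jon = {"name": "jon smith", "email": "jon.smith@example.com",
       "phone": "+15550102345", "tax_id": "12-3456789"}
same_tax_id = {"name": "j smith", "email": "accounts@smithco.com",
               "phone": "+15550000000", "tax_id": "12-3456789"}
spelling_variant = {"name": "john smith", "email": "jon.smith@example.com", "phone": "+15550102345"}
different_person = {"name": "jane smith", "email": "jsmith88@mail.net", "phone": "+13125559876"}

print(is_match(jon, same_tax_id))          # (True, 'deterministic: tax_id')
print(is_match(jon, spelling_variant)[0])  # True
print(is_match(jon, different_person)[0])  # False
```

`THRESHOLD` is the dial described above: lower it far enough and `jane smith` merges into `jon smith`; raise it too far and legitimate spelling variants start to fall through.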
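Identity resolution and continuous reconciliation can be sketched together as a union-find structure over source record IDs, where each cluster carries a durable entity ID. The class, its method names, and the `ENT-` ID format are hypothetical, and the rule that the older entity ID survives a merge is one assumed policy among several.

```python
from typing import Optional


class IdentityGraph:
    """Union-find over source record IDs; each cluster keeps a durable entity ID.

    Assumed policy: when two clusters merge, the entity ID minted first survives
    and the other is returned as retired so downstream systems can be notified.
    """

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.entity_of_root: dict[str, str] = {}  # cluster root -> entity ID
        self.mint_order: dict[str, int] = {}      # entity ID -> when it was minted

    def add(self, record_id: str) -> str:
        """Register a source record as its own cluster with a fresh entity ID."""
        if record_id not in self.parent:
            self.parent[record_id] = record_id
            n = len(self.mint_order) + 1
            entity = f"ENT-{n:04d}"
            self.entity_of_root[record_id] = entity
            self.mint_order[entity] = n
        return self.resolve(record_id)

    def _find(self, x: str) -> str:
        # Path halving keeps repeated lookups cheap as clusters grow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def resolve(self, record_id: str) -> str:
        """Current durable entity ID for a source record."""
        return self.entity_of_root[self._find(record_id)]

    def link(self, a: str, b: str) -> Optional[str]:
        """Record that a and b match; return the retired entity ID if two clusters merged."""
        ra, rb = self._find(a), self._find(b)
        if ra == rb:
            return None  # already the same entity
        order_a = self.mint_order[self.entity_of_root[ra]]
        order_b = self.mint_order[self.entity_of_root[rb]]
        keep, drop = (ra, rb) if order_a <= order_b else (rb, ra)
        self.parent[drop] = keep
        return self.entity_of_root.pop(drop)


graph = IdentityGraph()
for rid in ("crm-1", "bill-7", "sup-3"):
    graph.add(rid)
graph.link("crm-1", "bill-7")   # CRM and billing describe the same person
print(graph.resolve("bill-7"))  # ENT-0001

# Later a new support record matches both sup-3 and crm-1,
# revealing that two existing clusters were one entity all along.
graph.add("sup-9")
graph.link("sup-9", "sup-3")
retired = graph.link("sup-9", "crm-1")
print(retired, graph.resolve("sup-3"))  # ENT-0003 ENT-0001
```

The retired ID (`ENT-0003`) is the expensive part in practice: any downstream system that already acted on it has to be told it now resolves to `ENT-0001`.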
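Field-level survivorship, as a minimal Python sketch. The source trust ranking, the per-field rules, and the record layout are hypothetical, and "most complete" is crudely approximated as the longest value. What the example shows is that each field is chosen independently, so the resulting profile mixes values from all three sources.

```python
from datetime import date
from typing import Optional

# Hypothetical trust ranking (lower = more authoritative) and per-field rules.
SOURCE_TRUST = {"billing": 0, "crm": 1, "support": 2}
FIELD_RULES = {
    "phone": "most_recent",
    "email": "most_recent",
    "billing_address": "most_trusted",
    "job_title": "most_complete",
}


def pick(field: str, records: list[dict]) -> Optional[str]:
    """Choose one value for a single field from every source record that has it."""
    having = [r for r in records if r.get(field)]
    if not having:
        return None
    rule = FIELD_RULES[field]
    if rule == "most_recent":
        winner = max(having, key=lambda r: r["updated"])
    elif rule == "most_trusted":
        winner = min(having, key=lambda r: SOURCE_TRUST[r["source"]])
    else:  # "most_complete", approximated here as the longest value
        winner = max(having, key=lambda r: len(r[field]))
    return winner[field]


def golden_record(records: list[dict]) -> dict:
    # Assemble the profile field by field; no single source row is copied wholesale.
    return {field: pick(field, records) for field in FIELD_RULES}


cluster = [
    {"source": "billing", "updated": date(2023, 1, 5), "phone": "+15550102345",
     "email": "jon@old-isp.net", "billing_address": "100 Main St, Springfield"},
    {"source": "crm", "updated": date(2024, 6, 1), "phone": "+15559990000",
     "email": "jon.smith@example.com", "billing_address": "100 Main Street", "job_title": "VP"},
    {"source": "support", "updated": date(2024, 9, 12), "phone": "+15558887777",
     "job_title": "VP, Finance Operations"},
]
profile = golden_record(cluster)
print(profile["phone"], profile["email"], profile["billing_address"])
# +15558887777 jon.smith@example.com 100 Main St, Springfield
```

No single source row wins: the phone and job title come from support, the email from the CRM, and the billing address from billing.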
The trade-offs an architect actually owns
Unification is a tuning problem between two failure modes, and you cannot zero out both. Match too aggressively and you over-merge: two real people collapse into one profile, and now your agent emails the wrong person or exposes one customer's data to another. Match too conservatively and you under-merge: the same person stays split, and your "single view" is three views in a trench coat. The threshold between them is a business decision, not a default. In regulated or high-stakes contexts, bias toward fewer false merges and accept some duplicates; for broad marketing reach, the cost calculus flips. Two engineering rules hold whichever way you lean. Keep merges reversible: a bad merge you cannot split back apart is a data-quality incident with no clean recovery. And store the matching evidence, so that when someone asks why two records became one, you can answer instead of guessing.
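The threshold trade-off can be made concrete by counting both failure modes on a set of human-labeled pairs. The scores and labels below are toy values invented for illustration; in practice they would come from your matcher and a reviewed sample.

```python
# Toy labeled pairs: (matcher score, whether a reviewer confirmed the same entity).
LABELED = [
    (0.97, True), (0.93, True), (0.91, False), (0.88, True),
    (0.84, False), (0.80, True), (0.72, False), (0.65, True),
]


def error_counts(threshold: float) -> tuple[int, int]:
    """Return (false merges, missed matches) if pairs at or above threshold are merged."""
    false_merges = sum(1 for score, same in LABELED if score >= threshold and not same)
    missed = sum(1 for score, same in LABELED if score < threshold and same)
    return false_merges, missed


for t in (0.70, 0.85, 0.95):
    fm, mm = error_counts(t)
    print(f"threshold={t:.2f} false_merges={fm} missed_matches={mm}")
# threshold=0.70 false_merges=3 missed_matches=1
# threshold=0.85 false_merges=1 missed_matches=2
# threshold=0.95 false_merges=0 missed_matches=4
```

On this toy data no threshold reaches zero on both counts, which is the point: choosing 0.95 buys zero false merges at the cost of four missed matches.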
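A sketch of the two engineering rules together: every merge is recorded with its evidence and the exact records it moved, so it can be explained and undone. `MergeLedger` and its methods are hypothetical names, and the sketch assumes the moved records have not been merged anywhere else since; undoing nested merges would need more bookkeeping than shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class MergeEvent:
    """One merge decision, stored with the evidence that justified it."""
    event_id: int
    kept: str
    absorbed: str
    moved: list[str]  # source record IDs that changed entity
    evidence: dict    # e.g. rule and score from the matcher
    at: datetime
    undone: bool = False


class MergeLedger:
    """Hypothetical merge log that can explain and reverse merges."""

    def __init__(self) -> None:
        self.members: dict[str, set[str]] = {}  # entity ID -> source record IDs
        self.events: list[MergeEvent] = []

    def add_entity(self, entity: str, record_ids: set[str]) -> None:
        self.members[entity] = set(record_ids)

    def merge(self, kept: str, absorbed: str, evidence: dict) -> int:
        moved = sorted(self.members.pop(absorbed))
        self.members[kept].update(moved)
        event = MergeEvent(len(self.events), kept, absorbed, moved, evidence,
                           datetime.now(timezone.utc))
        self.events.append(event)
        return event.event_id

    def unmerge(self, event_id: int) -> None:
        # Assumes the moved records have not been merged anywhere else since.
        event = self.events[event_id]
        if event.undone:
            raise ValueError(f"merge {event_id} already undone")
        self.members[event.kept].difference_update(event.moved)
        self.members[event.absorbed] = set(event.moved)
        event.undone = True

    def why(self, entity: str) -> list[dict]:
        """Evidence for every merge still in effect that folded records into entity."""
        return [e.evidence for e in self.events if e.kept == entity and not e.undone]


ledger = MergeLedger()
ledger.add_entity("ENT-1", {"crm-1"})
ledger.add_entity("ENT-2", {"bill-7"})
eid = ledger.merge("ENT-1", "ENT-2", {"rule": "probabilistic", "score": 0.91})
print(ledger.why("ENT-1"))  # [{'rule': 'probabilistic', 'score': 0.91}]
ledger.unmerge(eid)         # a reviewer decides these are two different people
print(ledger.members)       # {'ENT-1': {'crm-1'}, 'ENT-2': {'bill-7'}}
```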
Where it fits — and why AI raised the stakes
Unification used to be a back-office concern feeding dashboards and campaign segments; a slightly stale profile cost you a mistargeted email. AI agents changed the blast radius. An agent doesn't just read the profile — it acts on it, taking governed actions against records in real time. If identity resolution is wrong, the agent isn't wrong on a chart; it's wrong in production, on a live customer, before a human reviews it. That is why a credible AI-agent deployment starts at the data layer rather than the prompt: resolution errors don't stay contained, they get executed. In the Salesforce stack specifically, Data Cloud is where ingestion, identity resolution, and survivorship happen, producing the unified profile an Agentforce agent reasons over — but the principle holds on any stack. At SkySync we run it in that order on purpose — data before agents — because an agent grounded on un-reconciled data automates your worst records faster, not your best outcomes. If you're scoping an agent program, the honest first question isn't which model; it's whether your entities resolve. You can pressure-test that at /start.
Frequently asked
What is the difference between data unification and data integration?
Integration moves data between systems and gets it into one place; unification decides which of those records describe the same entity and reconciles them into one profile. You can integrate a hundred sources into a warehouse and still have ten duplicate versions of every customer. Integration is plumbing; unification is the identity-resolution judgment that runs on top of it.
Is a golden record the same thing as data unification?
The golden record is the output — the canonical, survivorship-resolved version of an entity, often assembled field by field from several source records. Unification is the ongoing process that produces and maintains it. A golden record that isn't continuously reconciled against new source data is just a snapshot that's already decaying.
Why can't I just match every record against every other record?
Because the number of comparisons grows roughly with the square of your record count, and at a few million records all-pairs matching is computationally hopeless. Blocking solves this by pre-grouping records into candidate sets by a cheap key, so matching only runs within plausible neighborhoods. Blocking is where much of the real trade-off lives: too fine and you miss true matches, too coarse and the job won't finish.
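The arithmetic behind that answer, as a quick Python check. All-pairs matching needs n(n-1)/2 comparisons; blocking replaces that with the same formula summed over each block. The record count and block sizes are hypothetical round numbers.

```python
def all_pairs(n: int) -> int:
    """Comparisons needed to check every record against every other: n(n-1)/2."""
    return n * (n - 1) // 2


def blocked_pairs(block_sizes: list[int]) -> int:
    """Comparisons when matching runs only inside each block."""
    return sum(all_pairs(size) for size in block_sizes)


n = 3_000_000
blocks = [1_000] * 3_000  # hypothetical: 3,000 evenly sized blocks of 1,000 records
print(f"{all_pairs(n):,}")           # 4,499,998,500,000
print(f"{blocked_pairs(blocks):,}")  # 1,498,500,000
print(f"{all_pairs(100_000):,}")     # 4,999,950,000 for one oversized block
```

Three million records need about 4.5 trillion comparisons all-pairs versus about 1.5 billion with 3,000 even blocks, roughly 3,000 times fewer. The last line is the catch: a single block of 100,000 records, from a key that lumps too many records together, costs more than all 3,000 even blocks combined.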
Why do AI agents need unified data specifically?
Because an agent takes actions, not just reads. If identity resolution merges two people or splits one, an agent acts on that error live — contacting the wrong person, applying the wrong entitlement, or exposing data across customers. With analytics a resolution error is a wrong number; with an agent it's a wrong action. Unified, resolved data is what lets the agent act on one version of the truth instead of a fragment.
Related terms
Salesforce Data Cloud
Salesforce Data Cloud is the layer that ingests, unifies, and resolves identity across your data sources into a single real-time customer profile that the rest of Salesforce — including Agentforce AI agents — can reason and act on.
Agentforce
Agentforce is Salesforce’s platform for building AI agents — software that reasons over your business data, makes decisions, and takes actions inside Salesforce, governed by your existing permissions and audit trail. Unlike a chatbot that only replies, an agent can complete a task end to end.