Identity Resolution

Also known as: Identity Stitching, Entity Resolution, Profile Unification, Record Matching

Identity resolution is the process of deciding which records scattered across your systems refer to the same real-world person or account, then linking them into one profile. It is fundamentally a probability problem: each candidate pair gets a match score, and a threshold turns that score into a yes-or-no decision. Set the threshold wrong and you either fragment one customer into many profiles or collapse two customers into one.

Why it matters

Every downstream system inherits the identity decisions made here. Personalization, attribution, consent and suppression lists, and especially AI agents all act on whatever profile resolution hands them — and they act fast, at scale, without a human sanity-checking each one. That is the part the marketing skips: an AI agent does not soften a bad merge, it amplifies it. If resolution collapses two customers, the agent emails one person about another's order, quotes the wrong contract, or honors a closed account's permissions. If it over-fragments, the agent treats a known customer as a cold lead and asks them to start over. Identity resolution is the layer that decides whether 'act on one version of the truth' is real or just a slide in the deck.

How it works

  • Blocking — you can't compare every record to every other. That's N-squared: N records yield N(N-1)/2 pairs, so a few million records means trillions of comparisons. Instead you first bucket records by cheap keys (same email domain, same ZIP, same phone area code) and only score pairs that land in the same bucket, shrinking the candidate set before any scoring runs.
  • Matching — within each block, pairs are scored by deterministic rules (exact email, hashed phone) and probabilistic similarity (fuzzy name via edit distance or Jaro-Winkler, normalized address, common-name down-weighting) combined into a single match confidence.
  • Thresholding — a high score auto-links, a low score stays separate, and the band in between is the hard part: you either route it to human review or pick a default and accept the error it implies. This is precision versus recall stated as an operational dial.
  • Link vs. merge — deciding two records match is separate from deciding what to do about it. An identity graph keeps records distinct and stores the match as an edge (cheap to revise); a hard merge collapses them into one row (cheap to query, painful to undo). The choice determines how recoverable your mistakes are.
  • Survivorship — when you do unify fields, survivorship rules decide which value wins (most recent, most complete, most trusted source) so the profile isn't a contradiction of two truths.
  • Reconciliation over time — identity isn't a one-time job. New data arrives, people change emails and addresses, and yesterday's confident merge may need to split. Good systems keep the lineage so any link can be audited and reversed.
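To make blocking and the three-band threshold concrete, here is a minimal Python sketch. The records, the field names, the choice of ZIP as the blocking key, and the cut-offs of 0.90 and 0.60 are all hypothetical. It only generates candidate pairs and turns a score into a decision; the scoring itself is left out.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample records; field names are illustrative assumptions.
RECORDS = [
    {"id": "crm-1", "name": "Jon Smith", "zip": "94105"},
    {"id": "web-7", "name": "John Smith", "zip": "94105"},
    {"id": "pos-3", "name": "Ana Diaz", "zip": "10001"},
    {"id": "crm-9", "name": "Anna Diaz", "zip": "10001"},
    {"id": "web-2", "name": "Lee Wong", "zip": "60601"},
]

def all_pairs_count(n: int) -> int:
    """Comparisons needed without blocking: n choose 2."""
    return n * (n - 1) // 2

def candidate_pairs(records, block_key):
    """Bucket records by a cheap key, then pair only within each bucket."""
    blocks = defaultdict(list)
    for rec in records:
        key = block_key(rec)
        if key:  # records missing the key are skipped here; real systems try other keys
            blocks[key].append(rec)
    for bucket in blocks.values():
        yield from combinations(bucket, 2)

def decide(score: float, link_at: float = 0.90, separate_below: float = 0.60) -> str:
    """Three-band thresholding: auto-link, route to human review, or keep separate."""
    if score >= link_at:
        return "link"
    if score < separate_below:
        return "separate"
    return "review"

pairs = list(candidate_pairs(RECORDS, lambda r: r["zip"]))
print(len(pairs), "candidate pairs vs", all_pairs_count(len(RECORDS)), "without blocking")
print("unblocked pairs for 2M records:", all_pairs_count(2_000_000))
print(decide(0.95), decide(0.75), decide(0.40))
```

With five records, blocking on ZIP leaves two candidate pairs instead of ten, and the unblocked count for two million records is just under two trillion, which is why blocking runs first. A real pipeline would usually block on several keys and combine the results, so that one missing or mistyped key doesn't hide a true match.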

The trade-off no threshold escapes

There is no setting that eliminates both errors at once. This is precision versus recall, and you are choosing where to sit, not whether to choose. Loosen the threshold and recall rises: you catch more true matches but manufacture false merges (two people become one). Tighten it and precision rises: you avoid false merges but leave the same customer fragmented across several profiles. The right point is not a universal number; it depends on the cost of each error in your domain. For a marketing suppression list, a false merge that wrongly suppresses someone is cheap. For a healthcare or financial record, merging two identities can be a reportable privacy incident. So architects should treat the threshold as a governed, per-use-case decision, versioned, auditable, and reviewable, not a default left wherever the tool installed it. And measure it: without a labeled sample of known matches and known non-matches, you have a confidence score and no idea what it's worth.
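Measuring a threshold needs nothing exotic once you have labeled pairs. This sketch assumes a small, made-up sample of reviewed pairs with their match scores and computes precision and recall at a few candidate thresholds; the numbers are illustrative, not benchmarks.

```python
# Hypothetical labeled sample: (match score, is_true_match) for human-reviewed pairs.
LABELED = [
    (0.98, True), (0.93, True), (0.91, False), (0.85, True),
    (0.80, False), (0.72, True), (0.65, False), (0.40, False),
]

def precision_recall(labeled, threshold):
    """Treat score >= threshold as a predicted match and compare against the labels."""
    tp = sum(1 for s, y in labeled if s >= threshold and y)        # correct links
    fp = sum(1 for s, y in labeled if s >= threshold and not y)    # false merges
    fn = sum(1 for s, y in labeled if s < threshold and y)         # missed matches (fragments)
    # Convention (an assumption): precision is 1.0 when nothing is predicted.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

for t in (0.95, 0.90, 0.70):
    p, r = precision_recall(LABELED, t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

On this toy sample, loosening from 0.95 to 0.70 moves recall from 0.25 to 1.00 while precision falls from 1.00 to about 0.67: the trade-off described above, in miniature. A real evaluation needs a much larger, representative labeled set.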

Where it fits

On Salesforce, identity resolution is a core job of Data Cloud: it ingests records from Salesforce and outside systems, resolves them into a unified profile, and exposes that profile to Sales Cloud, Service Cloud, analytics, and Agentforce agents. The sequence is the point. Resolution is foundation work that comes before the agent, not cleanup you bolt on after, which is exactly why SkySync's Data-to-Agent method starts at 'Agent Ready' (getting the data and identity right) before 'Agent Launch.' An agent built on unresolved data inherits every fragment and every bad merge and acts on them at machine speed. If you're about to put an agent in front of customers, resolution quality is the ceiling on how far you can trust it, and it is worth pressure-testing before you ship, not after (see /solutions/agentforce-implementation).

Frequently asked

What's the difference between identity resolution and deduplication?

Deduplication finds and removes copies of the same record within one system, usually on exact or near-exact matches. Identity resolution is broader: it links records that refer to the same entity across many systems and formats (different emails, nicknames, old addresses) using a mix of deterministic and probabilistic matching, and produces one durable profile (or identity graph) rather than just deleting duplicates.

Is identity resolution deterministic or probabilistic?

Both, and good systems combine them. Deterministic matching links records on exact shared keys (a verified email or hashed phone) with high confidence. Probabilistic matching scores fuzzy signals (similar names, normalized addresses) for the common case where no clean key exists. Most real profiles are stitched from a mix, with a confidence threshold deciding the ambiguous middle.
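A sketch of how the two kinds of evidence can be combined into one confidence, assuming simple record dictionaries. Exact email and hashed phone act as deterministic rules; otherwise a normalized Levenshtein name similarity (standing in for Jaro-Winkler) and a same-ZIP signal are blended. The 0.97 phone score and the 0.7/0.3 weights are invented for illustration and would need tuning against labeled data.

```python
import hashlib
import re

def normalize_email(email):
    return email.strip().lower() if email else None

def hash_phone(phone):
    """Hash the digits only, so formatting differences don't block an exact match."""
    if not phone:
        return None
    digits = re.sub(r"\D", "", phone)
    return hashlib.sha256(digits.encode()).hexdigest() if digits else None

def levenshtein(a, b):
    """Classic edit distance using a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution
        prev = curr
    return prev[-1]

def name_similarity(a, b):
    """Edit distance scaled to 0..1 by the longer string's length."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

def match_score(r1, r2):
    # Deterministic rules: an exact shared key links with high confidence.
    e1, e2 = normalize_email(r1.get("email")), normalize_email(r2.get("email"))
    if e1 and e1 == e2:
        return 1.0
    p1, p2 = hash_phone(r1.get("phone")), hash_phone(r2.get("phone"))
    if p1 and p1 == p2:
        return 0.97  # hypothetical confidence for a phone-only match
    # Probabilistic fallback: weighted fuzzy signals (weights are assumptions).
    name = name_similarity(r1.get("name", ""), r2.get("name", ""))
    same_zip = 1.0 if r1.get("zip") and r1.get("zip") == r2.get("zip") else 0.0
    return 0.7 * name + 0.3 * same_zip

print(match_score({"name": "Jon Smith", "zip": "94105"},
                  {"name": "John Smith", "zip": "94105"}))
```

Here 'Jon Smith' and 'John Smith' in the same ZIP score about 0.93 with no shared key at all, while a shared email short-circuits to 1.0. Real systems typically normalize phone numbers to a standard format such as E.164 before hashing, since the hashes only match when the digit strings match exactly.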

Should I store identities as merged records or as a graph?

A graph keeps source records intact and represents each match as an edge, so links are easy to inspect, weight, and reverse when new data contradicts them — at the cost of more complex queries. A hard merge collapses records into one row that's simple to read but expensive to unwind. Many platforms expose a unified profile over a graph underneath, giving you read simplicity without permanently destroying lineage. Choose based on how often your matches will need correcting.
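To show why a graph keeps mistakes recoverable, here is a minimal in-memory sketch: source records are stored untouched, each match is an edge with a reason, clusters are computed with union-find, and a unified profile is assembled on read. The class name, the 'updated' field, and the "newest non-empty value wins" survivorship rule are assumptions for illustration, not any platform's API.

```python
from collections import defaultdict

class IdentityGraph:
    """Hypothetical identity graph: source records stay intact; matches are removable edges."""

    def __init__(self):
        self.records = {}  # record id -> source record (never mutated)
        self.edges = {}    # frozenset({id_a, id_b}) -> reason for the link

    def add_record(self, rec_id, record):
        self.records[rec_id] = record

    def link(self, a, b, reason):
        if a != b:  # ignore self-links
            self.edges[frozenset((a, b))] = reason

    def unlink(self, a, b):
        # Splitting a bad match is just deleting one edge; no source data is lost.
        self.edges.pop(frozenset((a, b)), None)

    def clusters(self):
        """Connected components via union-find over the current edges."""
        parent = {r: r for r in self.records}

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for edge in self.edges:
            a, b = tuple(edge)
            parent[find(a)] = find(b)
        groups = defaultdict(set)
        for r in self.records:
            groups[find(r)].add(r)
        return sorted(groups.values(), key=lambda grp: sorted(grp))

    def profile(self, cluster):
        """Survivorship (assumed rule): per field, the newest non-empty value wins."""
        unified = {}
        # Oldest first, so newer non-empty values overwrite older ones.
        for rec_id in sorted(cluster, key=lambda r: self.records[r]["updated"]):
            for field, value in self.records[rec_id].items():
                if value:
                    unified[field] = value
        return unified

graph = IdentityGraph()
graph.add_record("crm-1", {"email": "jon@example.com", "phone": "", "updated": "2023-01-10"})
graph.add_record("web-7", {"email": "jon.smith@example.com", "phone": "4155550100", "updated": "2024-05-02"})
graph.add_record("pos-3", {"email": "", "phone": "4155550100", "updated": "2022-11-30"})
graph.link("crm-1", "web-7", "fuzzy name + same ZIP")
graph.link("web-7", "pos-3", "hashed phone")
print(graph.clusters())
print(graph.profile(graph.clusters()[0]))
graph.unlink("crm-1", "web-7")  # new evidence says this link was wrong
print(graph.clusters())
```

Undoing the bad link is one `unlink` call followed by recomputing clusters. With a hard merge, the same correction would mean reconstructing the original rows from whatever history you kept.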

Why does identity resolution matter so much for AI agents?

An agent takes governed actions on whatever profile it is handed — sending messages, updating records, honoring permissions — without a human checking each one. If identities are merged wrong, it acts on the wrong person; if they're fragmented, it fails to recognize a known customer. Resolution quality sets the ceiling on how trustworthy an agent can be, and a bad merge a human would have caught becomes an action the agent already took.
