
Field note

Data Lake Object vs Data Model Object in Salesforce Data Cloud

Akshit Kandi
#Data Cloud #Salesforce #Data Architecture #Agentforce

SkySync

In Data Cloud, the DLO is where your data lands and the DMO is where it becomes usable. Confuse the two and you pay for it in mapping rework, query cost, and an agent that reasons over the wrong shape of data.


Two objects, two acronyms one letter apart, and a difference that decides whether your Data Cloud build is clean or a swamp. The Data Lake Object and the Data Model Object are not two flavors of the same thing. They sit at different stages of the pipeline, they answer to different rules, and the moment you treat them as interchangeable you start accruing the kind of debt that surfaces six months later as duplicate identities and a query bill nobody can explain.

This is the plumbing under every Customer 360 and every Agentforce deployment. If you are the architect on the hook for it, here is the distinction that actually matters in production — not the marketing version.


The one-line version, then the part that matters

A Data Lake Object (DLO) is where ingested data lands, in the shape the source gave it. A Data Model Object (DMO) is where that data is reshaped into a canonical model you can resolve identity on, segment, and serve to agents. DLO is storage. DMO is meaning.

Every data stream you create (a CSV upload, a Salesforce CRM connector, an S3 ingestion, a Marketing Cloud feed) produces a DLO automatically. The DLO mirrors the source: its fields, its types, its quirks. Apart from data transforms, which can read DLOs and write their output to other DLOs, little of the interesting work in Data Cloud runs directly against a DLO. The DLO exists so you have a stable, queryable copy of the raw data to map from.

The DMO is the target you map that raw data into. It conforms to a schema — usually one of the standard objects in Salesforce's data model (Individual, Contact Point Email, Sales Order, Case) or a custom DMO you define. Identity resolution, calculated insights, segmentation, and Agentforce grounding all read the DMO. The DLO never participates in those directly.


Why the two-layer split exists at all

It would be simpler to ingest straight into a model. Data Cloud splits the layers on purpose, and the reason is worth internalizing because it changes how you design.

Sources change and you do not control them. A vendor renames a column, adds a field, swaps an ID format. If your downstream logic read the raw source, every such change would ripple into your segments and agents. With the DLO/DMO split, the blast radius of a source change stops at the mapping layer. You fix one mapping; everything downstream keeps reading the same stable DMO contract.

  • The DLO is the staging contract with the source — faithful to whatever arrived, including its mess.
  • The mapping is the translation layer — the only place that knows both the source's shape and your canonical shape.
  • The DMO is the contract with everything downstream — identity resolution, insights, segments, Agentforce.
  • Change isolation is the whole point: a source-side change should never reach a segment without passing through a mapping you control.
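
A minimal Python sketch of that change isolation, assuming a hypothetical CRM source whose vendor renames the `EmailAddr` column to `Email`. The `Individual` class only loosely echoes a canonical person DMO, and every field and function name here is illustrative, not Data Cloud's API. Only the mapping function changes; the downstream consumer keeps reading the same contract.

```python
from dataclasses import dataclass
from typing import Callable

# A DLO row is just the source's shape: whatever field names the source used.
DloRow = dict[str, str]


@dataclass(frozen=True)
class Individual:
    """Hypothetical canonical shape, loosely modeled on a person DMO."""
    individual_id: str
    first_name: str
    email: str


def map_crm_v1(row: DloRow) -> Individual:
    """Mapping written against the original source schema."""
    return Individual(row["CustId"], row["FName"], row["EmailAddr"].strip().lower())


def map_crm_v2(row: DloRow) -> Individual:
    """The only code that changes after the vendor renames EmailAddr to Email."""
    return Individual(row["CustId"], row["FName"], row["Email"].strip().lower())


def build_dmo(rows: list[DloRow], mapping: Callable[[DloRow], Individual]) -> list[Individual]:
    """Apply a mapping to every DLO row to produce DMO records."""
    return [mapping(row) for row in rows]


def gmail_segment(people: list[Individual]) -> list[str]:
    """Downstream consumer: reads only the DMO contract, never a DLO row."""
    return [p.individual_id for p in people if p.email.endswith("@gmail.com")]


if __name__ == "__main__":
    before = [{"CustId": "C1", "FName": "Ana", "EmailAddr": " Ana@Gmail.com "}]
    after = [{"CustId": "C1", "FName": "Ana", "Email": " Ana@Gmail.com "}]
    print(gmail_segment(build_dmo(before, map_crm_v1)))  # ['C1']
    print(gmail_segment(build_dmo(after, map_crm_v2)))   # ['C1']
```

The point of the shape: `gmail_segment` never sees a DLO row, so the rename cannot reach it unless a mapping lets it through.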

If you have ever maintained an integration where a producer's schema change silently broke a consumer three systems away, you already understand why this separation earns its keep.


What only a DMO can do

This is the practical test for whether you have modeled correctly. Many Data Cloud capabilities simply do not operate on DLOs:

  • Identity resolution runs on DMOs. Match and reconciliation rules unify records into a Unified Individual built from mapped DMOs — never from raw DLOs.
  • Calculated insights and most segmentation target DMOs, because they assume a known, canonical shape.
  • Agentforce and Einstein grounding read the unified, modeled data — the DMO side — so the agent reasons over one resolved customer, not seven raw source rows.
  • Data graphs and the Customer 360 profile are assembled from DMOs and the relationships between them.

So if you find yourself trying to resolve identity or ground an agent and the data still looks like the source, the diagnosis is almost always the same: you stopped at the DLO and never finished the mapping to a DMO. The DLO did its job. The modeling didn't happen.
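
To make "one resolved customer, not seven raw source rows" concrete, here is a toy Python sketch of what identity resolution does with mapped DMO records. It assumes two hypothetical match rules (exact match on normalized email, exact match on phone digits) and uses union-find to merge transitive matches. Data Cloud's real match and reconciliation rules are configured in the platform and are not implemented like this; `MappedIndividual` and the record IDs are illustrative, and IDs are assumed unique across sources.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedIndividual:
    source: str      # data stream the mapped DMO record came from
    record_id: str   # assumed unique across all sources
    email: str
    phone: str


def normalize_email(email: str) -> str:
    return email.strip().lower()


def normalize_phone(phone: str) -> str:
    return "".join(ch for ch in phone if ch.isdigit())


def resolve(records: list[MappedIndividual]) -> list[set[str]]:
    """Cluster records that share a normalized email or phone, transitively."""
    parent = {r.record_id: r.record_id for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    first_seen: dict[tuple[str, str], str] = {}
    for r in records:
        # Hypothetical match rules: exact normalized email, exact phone digits.
        for rule, value in (("email", normalize_email(r.email)),
                            ("phone", normalize_phone(r.phone))):
            if not value:
                continue  # blank values must not match each other
            seen = first_seen.get((rule, value))
            if seen is None:
                first_seen[(rule, value)] = r.record_id
            else:
                union(r.record_id, seen)

    clusters: dict[str, set[str]] = defaultdict(set)
    for r in records:
        clusters[find(r.record_id)].add(r.record_id)
    return list(clusters.values())


if __name__ == "__main__":
    records = [
        MappedIndividual("crm", "crm-1", "Ana@Example.com", "555-0100"),
        MappedIndividual("ecommerce", "ecom-9", " ana@example.com", ""),
        MappedIndividual("support", "sup-4", "ana.work@example.com", "(555) 0100"),
        MappedIndividual("crm", "crm-2", "bo@example.com", "555-0199"),
    ]
    for cluster in resolve(records):
        print(sorted(cluster))
    # ['crm-1', 'ecom-9', 'sup-4']
    # ['crm-2']
```

Note that `sup-4` joins Ana's cluster only through the phone match with `crm-1`; that transitive merge is why resolution needs every source mapped into a comparable shape first.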


The mapping is where the real design happens

Mapping a DLO to a DMO is not a clerical drag-and-drop, even though the UI makes it look like one. This is the step where you decide what your data means, and the decisions are load-bearing.

It starts before you match a single field. Every DLO carries a category (Profile, Engagement, or Other), and that choice constrains how the data behaves downstream. Profile data describes who someone is; Engagement data is time-series behavior keyed on an event timestamp. Mislabel a stream of order events as Profile data and you have quietly told the platform to overwrite history instead of accumulating it. Fix the category before you touch the fields.
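
A simplified Python model of that category mistake, assuming order events keyed on the customer. The two loader functions are hypothetical illustrations of "overwrite" versus "accumulate", not Data Cloud internals: loaded Profile-style, the events keep one row per customer, so any lifetime-spend figure computed afterward is wrong.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class OrderEvent:
    order_id: str
    customer_id: str
    amount: float
    event_time: datetime


def load_as_profile(events: list[OrderEvent]) -> dict[str, OrderEvent]:
    """Profile-style load (simplified): one current row per customer."""
    current: dict[str, OrderEvent] = {}
    for event in events:
        current[event.customer_id] = event  # a later order replaces the earlier one
    return current


def load_as_engagement(events: list[OrderEvent]) -> list[OrderEvent]:
    """Engagement-style load (simplified): every event kept, ordered by event time."""
    return sorted(events, key=lambda event: event.event_time)


if __name__ == "__main__":
    events = [
        OrderEvent("O-1", "C1", 40.0, datetime(2024, 3, 1)),
        OrderEvent("O-2", "C1", 25.0, datetime(2024, 4, 2)),
    ]
    print(sum(e.amount for e in load_as_profile(events).values()))  # 25.0: first order lost
    print(sum(e.amount for e in load_as_engagement(events)))        # 65.0: history intact
```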

Then come three field-level decisions that cause most of the pain we see in audits. First, the primary key: the DLO needs a field that uniquely identifies a row. Choose a non-unique field and rows that share a value collapse into one record on load, which quietly corrupts everything downstream. Second, the event time field on engagement DMOs, which determines how records are ordered in time and how recency is computed; get it wrong and your "latest" record isn't. Third, relationship fields between DMOs, which are what let a query traverse from a person to their orders to their cases. Skip those and your data is technically present but practically siloed.
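
The first two of those decisions can be checked before you map anything. This Python sketch uses hypothetical helper names and sample case rows: it flags candidate primary keys that are not unique, shows a simplified keyed load collapsing rows that share a key, and shows how the choice of event time field changes which record counts as "latest". It models the failure, not Data Cloud's ingestion engine.

```python
from collections import Counter
from datetime import datetime

Row = dict[str, object]  # a DLO row: source field name -> value


def duplicate_keys(rows: list[Row], key_field: str) -> list[object]:
    """Values of a candidate primary key that occur more than once."""
    counts = Counter(row[key_field] for row in rows)
    return [value for value, n in counts.items() if n > 1]


def upsert_by_key(rows: list[Row], key_field: str) -> dict[object, Row]:
    """Simplified keyed load: rows sharing a key collapse to the last one seen."""
    table: dict[object, Row] = {}
    for row in rows:
        table[row[key_field]] = row
    return table


def latest(rows: list[Row], event_time_field: str) -> Row:
    """'Latest' is only as correct as the field you designate as event time."""
    return max(rows, key=lambda row: row[event_time_field])


if __name__ == "__main__":
    cases = [
        {"case_id": "K1", "email": "ana@example.com",
         "created": datetime(2024, 1, 5), "modified": datetime(2024, 2, 1)},
        {"case_id": "K2", "email": "ana@example.com",
         "created": datetime(2024, 1, 3), "modified": datetime(2024, 3, 9)},
    ]
    print(duplicate_keys(cases, "case_id"))    # []: usable as a primary key
    print(duplicate_keys(cases, "email"))      # ['ana@example.com']: not a key
    print(len(upsert_by_key(cases, "email")))  # 1: one case silently dropped
    print(latest(cases, "created")["case_id"])   # K1
    print(latest(cases, "modified")["case_id"])  # K2
```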

A DLO with no DMO mapping is just a more expensive copy of your source system. The value isn't in landing the data. It's in modeling it.

One source field can map to multiple DMOs, and that is often correct: an email column might feed both an Individual's contact point and a marketing engagement DMO. Conversely, several DLOs from different sources commonly map into one DMO — that convergence is exactly what makes cross-source identity resolution possible in the first place.
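
A small Python sketch of both directions, under stated assumptions: field names such as `ContactId`, `email_addr`, and `visitor_id` are hypothetical, and the classes only loosely echo the standard Contact Point Email DMO rather than reproducing its schema. Two DLOs converge into one contact point DMO (fan-in), and the web signup's single email column also feeds an engagement DMO (fan-out). The `party_id` field stands in for the relationship field that ties each row back to a person.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ContactPointEmail:
    party_id: str   # relationship field pointing back to the person record
    email: str
    source: str     # which data stream produced this row


@dataclass(frozen=True)
class EmailEngagement:
    party_id: str
    email: str
    engagement_type: str


DmoRow = Union[ContactPointEmail, EmailEngagement]


def map_crm_contact(row: dict[str, str]) -> list[DmoRow]:
    """Fan-in, source 1: the CRM contact DLO feeds the contact point DMO."""
    return [ContactPointEmail(row["ContactId"], row["Email"].strip().lower(), "crm")]


def map_web_signup(row: dict[str, str]) -> list[DmoRow]:
    """Fan-in, source 2, plus fan-out: one email column feeds two DMOs."""
    email = row["email_addr"].strip().lower()
    return [
        ContactPointEmail(row["visitor_id"], email, "web"),
        EmailEngagement(row["visitor_id"], email, row["action"]),
    ]


def run_mappings(crm_rows: list[dict[str, str]], web_rows: list[dict[str, str]]) -> list[DmoRow]:
    """Run every DLO through its mapping and collect the resulting DMO rows."""
    out: list[DmoRow] = []
    for row in crm_rows:
        out.extend(map_crm_contact(row))
    for row in web_rows:
        out.extend(map_web_signup(row))
    return out


if __name__ == "__main__":
    rows = run_mappings(
        [{"ContactId": "C1", "Email": "Ana@Example.com"}],
        [{"visitor_id": "W7", "email_addr": " ana@example.com", "action": "newsletter_signup"}],
    )
    for r in rows:
        print(r)
```

Because both sources land in the same DMO with the same normalized email, a match rule has something to compare; had each source gone into its own custom object, there would be nothing to converge.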


Custom DMO or standard DMO?

Reach for a standard DMO whenever your data plausibly fits one. Standard objects come pre-wired into the data model that identity resolution, ready-made insights, and the Customer 360 profile already expect — you inherit that integration for free. A custom DMO is the right call when you genuinely have an entity Salesforce doesn't ship: a domain-specific record, an industry object, a proprietary event type.

The failure mode to avoid is reflexively cloning every source table into a custom DMO because it feels like a faithful translation. That rebuilds the modeling work the standard objects already did, and you forfeit the downstream wiring. Map to the standard model first; go custom only where the standard model can't honestly represent what you have.


Where this bites you in cost and in agents

Two consequences are worth stating plainly for the people who sign off on the budget and the people who answer for the agent's behavior.

On cost: Data Cloud consumption is driven heavily by the volume you process and the queries you run, not by the elegance of your model. Ingest raw data into a DLO and never map it, and you can still pay to store and reprocess data that yields no resolved profile and no segment — overhead with no payoff. Bringing in only what feeds a real downstream use tends to be the cheaper architecture as well as the cleaner one. The two goals point the same direction more often than they conflict.

On agents: an Agentforce agent grounded on poorly modeled data doesn't fail loudly. It answers confidently from the wrong shape of data — reading a duplicate record, missing a relationship, citing a stale row because the event time field was mismapped. This is the unglamorous truth behind "data before agents." The agent inherits the quality of your DMO layer exactly, and the DMO layer inherits the quality of your mapping. You can't fix at the prompt what you broke at the model.


A quick mental model to carry

When you are staring at a Data Cloud org and trying to reason about it, hold this: the DLO answers "what arrived, and from where?" The DMO answers "what does it mean, and who is it about?" Ingestion fills the first. Mapping fills the second. Identity resolution, segmentation, and every agent live entirely in the second.

Most Data Cloud projects that go sideways do so because the team treated landing the data as the finish line. Landing it is the easy part. Modeling it — choosing keys, setting categories, conforming to standard objects, wiring relationships, converging multiple sources into one resolved entity — is the work that decides whether you built a Customer 360 or an expensive lake nobody can fish in.

That modeling discipline is the front half of how we run Data-to-Agent at SkySync: get the data right and resolved first, because every agent we build and run afterward — with our fee tied to its return — is only ever as good as the DMO layer underneath it.

Talk to us about getting your Data Cloud model right before the agents