Field note

A Salesforce Org-Health Scorecard

May 27, 2026 · Akshit Kandi

Most org-health reviews count debt. This one scores carrying capacity: a reusable, seven-dimension scorecard that tells you what your org can safely carry next, not just what's wrong with it.

Every Salesforce health check produces the same artifact: a long list of things that are wrong. Too many custom fields. Hardcoded IDs in flows. Apex with no test coverage. Profiles nobody can explain. The list is always true, always alarming, and almost never acted on, because a list of problems is not a decision.

The question a leader actually has is not "how broken is my org?" It is "what can my org safely carry next?" Those are different questions, and only the second one is worth scoring.

So this scorecard inverts the usual exercise. Instead of measuring debt as a sin to be confessed, it measures carrying capacity: the org's ability to absorb the next thing you put on it (a new business unit, a CPQ rollout, an integration, an autonomous agent) without something downstream catching fire. Seven dimensions, each scored 1 to 5, each with a test you can run in an afternoon. The output isn't a grade; it's a verdict on what to greenlight, and on where the next initiative will fail first if you don't fix the right thing before you start.

Why "health" is the wrong frame, and "capacity" is the right one

A perfectly healthy org by checklist standards can be completely unable to carry your next initiative, and a messy org can carry it just fine. Health is absolute; capacity is relative to a load. A 12-year-old org with 400 custom fields might be rock-solid for the sales process it runs every day and a death trap the moment you ask it to feed a real-time agent. The debt only matters where the new load lands. That reframing is what keeps a scorecard from turning into a year-long cleanup project nobody scoped — and what lets you put a price on the risk instead of the mess.

“Technical debt is not a number you owe. It's a number you owe relative to what you're about to build. Score the org against the load, not against perfection.”

Dimension 1: Data trust — can a system act on what's in here?

Humans tolerate bad data because they eyeball it and route around it. Automations and agents don't. This dimension scores whether the fields that drive decisions can be trusted by something that won't second-guess them. Don't measure fill rate alone — fill rate lies. A required-field rule that forces reps to pick the first picklist value gives you 100% populated and 0% trustworthy.

Score 5: key decision fields are validated at entry, distributions look real, ownership and source are known.
Score 3: fields are populated but one default value dominates, or near-duplicate variants ("USA" / "U.S.") are common.
Score 1: critical fields are free-text where they should be constrained, and nobody can say where the values come from.
Test: for each field that drives a decision, run a GROUP BY and read the distribution, not the count — a single value above ~70% of records is a default masquerading as data.
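The GROUP BY test is easy to script once you have the rows. Below is a minimal Python sketch. It assumes you have already run an aggregate query such as `SELECT BillingCountry, COUNT(Id) FROM Account GROUP BY BillingCountry` and turned the result into a `{value: count}` dictionary.

A few parts are illustrative, not a Salesforce API:

- the `field_profile` and `normalize` helpers,
- the `ALIASES` table,
- the 0.70 cut-off, which encodes the ~70% rule of thumb.

```python
from collections import Counter

# Illustrative cut-off taken from the ~70% rule of thumb; tune per field.
DEFAULT_SHARE_THRESHOLD = 0.70

# Hypothetical alias table; extend it with the variants you actually find.
ALIASES = {"US": "USA", "UNITED STATES": "USA"}


def normalize(value):
    """Fold near-duplicate spellings ('USA', 'U.S.', ' united states ')."""
    if value is None:
        return None
    key = value.strip().upper().replace(".", "")
    return ALIASES.get(key, key)


def field_profile(counts):
    """Read a GROUP BY result as a distribution, not a fill count.

    counts: {field_value: record_count}; None stands for blank records.
    """
    if not counts:
        raise ValueError("no rows to profile")
    total = sum(counts.values())
    top_value, top_count = max(counts.items(), key=lambda kv: kv[1])

    # Merge spelling variants to see how many distinct values collapse.
    merged = Counter()
    for value, n in counts.items():
        merged[normalize(value)] += n

    return {
        "total": total,
        "null_share": counts.get(None, 0) / total,
        "top_value": top_value,
        "top_share": top_count / total,
        "looks_defaulted": top_count / total > DEFAULT_SHARE_THRESHOLD,
        "merged_variants": len(counts) - len(merged),
    }
```

Read the two flags together. `looks_defaulted` set to `True` alongside a near-zero `null_share` is the "100% populated, 0% trustworthy" pattern. A nonzero `merged_variants` is the Score 3 symptom of near-duplicate values.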

Dimension 2: Automation legibility — can one person explain what fires when a record saves?

The most dangerous orgs aren't the ones with the most automation. They're the ones where nobody can predict the cascade. A single Account update kicks off three flows, two of them built by a contractor who left, and one of them silently overwrites a field another one set.

Watch especially for overlapping logic across old and new tooling. Orgs that migrated from Workflow Rules to Process Builder to Flow without decommissioning the old layer carry three generations of automation on the same save. Salesforce's order of execution, not anyone's design, decides how those layers interleave. Multiple processes on the same object have never had a guaranteed order among themselves.

The test is brutal and simple: pick your busiest object and ask one person to whiteboard, from memory, everything that fires on save. If they can't, and they usually can't, you score low, and you now know why your last "small change" took three weeks. Legibility, not elegance, is the thing to score: an ugly flow you understand is safer than a beautiful one nobody dares touch.
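The whiteboard exercise needs something to check against, and a hand-built inventory of what fires on save is enough. The Python sketch below is hypothetical. It assumes you have listed each automation's name, object, tooling type, and the fields it writes, for instance by reading the flows and triggers themselves.

Two limits to keep in mind:

- It never queries Salesforce.
- It doesn't model the order of execution, so it shows where automations contend for the same field, not which one wins.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Automation:
    """One hand-inventoried automation (hypothetical record shape)."""
    name: str
    sobject: str
    kind: str          # e.g. "Workflow Rule", "Process Builder", "Flow", "Apex Trigger"
    writes: frozenset  # API names of fields it sets on save


def write_collisions(inventory, sobject):
    """Fields on `sobject` written by more than one automation on save."""
    writers = defaultdict(list)
    for a in inventory:
        if a.sobject == sobject:
            for f in a.writes:
                writers[f].append(a.name)
    return {f: sorted(names) for f, names in writers.items() if len(names) > 1}


def generations(inventory, sobject):
    """Distinct tooling generations still active on one object."""
    return sorted({a.kind for a in inventory if a.sobject == sobject})


def legibility_gap(inventory, sobject, recalled):
    """Automations on `sobject` that the whiteboard exercise missed."""
    actual = {a.name for a in inventory if a.sobject == sobject}
    return sorted(actual - set(recalled))
```

Read the three results this way:

- A non-empty `legibility_gap` is the low score.
- `write_collisions` points at the silent overwrites.
- More than one entry from `generations` flags an unfinished migration.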

Dimension 3: Integration blast radius — what breaks outside Salesforce when something changes inside it?

Every org has hidden dependencies on the outside world: a middleware job that reads a field nightly, a marketing tool that writes leads through an API, a warehouse syncing on a schema it assumes is frozen. The scorecard question is the blast radius of a change. The test: open Setup, pull every integration user, connected app, and named credential, and ask which fields and objects each one touches — the names you can't account for are your blast radius. A field's API name becomes a contract the moment one external system reads it. Score 5 means those contracts are inventoried and contract-tested. Score 1 means you find out what's connected by breaking it.
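The inventory step maps naturally to a lookup from each external system to the fields it touches. Below is a minimal Python sketch. It assumes you have recorded that mapping by hand as `Object.Field` API names. The `Integration` record and both functions are illustrative names, and nothing here calls a Salesforce API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Integration:
    """One external dependency found in Setup (hypothetical record shape)."""
    name: str
    kind: str              # integration user, connected app, or named credential
    owner: Optional[str]   # None when nobody can account for it
    touches: frozenset     # "Object.Field" API names it reads or writes


def blast_radius(inventory, api_name):
    """Integrations that would notice if `api_name` were renamed or retyped."""
    return sorted(i.name for i in inventory if api_name in i.touches)


def unaccounted(inventory):
    """Dependencies nobody owns: the part of the blast radius you can't see."""
    return sorted(i.name for i in inventory if i.owner is None)
```

Run `blast_radius` before renaming or retyping a field. An empty result is only trustworthy when `unaccounted` is empty too.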

Dimension 4: Security and access coherence — does permission still map to intent?

Access models rot quietly. Roles and profiles accrete over years of one-off requests until "who can see what" is an archaeological dig. This matters more than ever the moment you add automation that runs as a user, or an agent that acts on a customer's behalf, because both inherit whatever sprawl you've accumulated. The test: pick three sensitive objects and three real users across different functions, and verify their access is what someone would design on purpose today. If the answer to "why can this person edit that?" is "a ticket from 2022," you score low.
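The three-users, three-objects check is a diff between designed access and actual access. The Python below is a hypothetical sketch with a narrow scope:

- It covers object-level access only (none, read, edit).
- It ignores sharing rules and record-level visibility.
- It assumes you write down the intended matrix yourself and collect the actual one separately.

For object-level permissions, the `ObjectPermissions` and `PermissionSetAssignment` objects are a reasonable place to start collecting. That step is outside the sketch.

```python
# Ordered access levels for comparison; object-level only (an assumption).
LEVELS = {"none": 0, "read": 1, "edit": 2}


def access_drift(intended, actual):
    """Compare designed access with what users actually have.

    intended / actual: {(user, sobject): "none" | "read" | "edit"};
    missing keys mean "none".
    Returns (excess, missing): entries where actual exceeds or falls short of intent.
    """
    excess, missing = [], []
    for key in sorted(set(intended) | set(actual)):
        want = LEVELS[intended.get(key, "none")]
        have = LEVELS[actual.get(key, "none")]
        if have > want:
            excess.append((key, actual[key]))
        elif have < want:
            missing.append((key, intended[key]))
    return excess, missing
```

Every entry in `excess` is a candidate for the "a ticket from 2022" conversation.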

“An agent or automation doesn't get its own permissions. It inherits yours. If you can't explain a human's access, you can't reason about a system's.”

Dimension 5: Schema discipline — is the object model still telling the truth?

Object models drift from the business they're supposed to describe. A field called "Status" that encodes three different concepts depending on record type. Two custom objects modeling the same real-world thing because two teams built in parallel. Record types whose original distinction stopped mattering years ago. Score this by sampling: take ten fields a newcomer would assume they understand from the name, and check whether the data inside matches the label. The gap between name and reality is your schema debt — and it's the debt that most reliably misleads anything reading the org programmatically, because a language model grounding on your fields trusts the label and never sees the drift.
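Most of the ten-field sample needs a human reading the data. One symptom from the paragraph above can be flagged mechanically, though: a field that means different things per record type.

Below is a hypothetical Python sketch. It assumes you have exported `(record_type, value)` pairs for one field, and it compares the value sets between record types using Jaccard similarity. The 0.2 threshold is an arbitrary starting point.

```python
from itertools import combinations


def label_drift_by_record_type(rows, threshold=0.2):
    """Flag record-type pairs whose values for one field barely overlap.

    rows: iterable of (record_type, value) for a single field.
    Returns [(type_a, type_b, jaccard)] for pairs below `threshold`.
    """
    values = {}
    for record_type, value in rows:
        values.setdefault(record_type, set()).add(value)

    flagged = []
    for a, b in combinations(sorted(values), 2):
        # Jaccard similarity: shared values over all values seen in either type.
        union = values[a] | values[b]
        jaccard = len(values[a] & values[b]) / len(union)
        if jaccard < threshold:
            flagged.append((a, b, round(jaccard, 2)))
    return flagged
```

Near-zero overlap suggests the label is doing more than one job. That is a prompt to read the records, not proof.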

Dimension 6: Change safety — can you ship without holding your breath?

This is the dimension most health checks skip, and the one that predicts your future velocity. It's not about the org's contents; it's about your ability to change them safely. Do you have a sandbox that mirrors production? Real test coverage on the Apex that matters, or just enough to clear the 75% deploy gate? A deployment path that isn't one admin clicking through Setup in production on a Friday? Score 5 is a team that ships changes calmly because they can roll back. Score 1 is a team that batches changes for months because every release is a coin flip.

Sandbox parity: can you reproduce a production bug in a lower environment, or does the data and config diverge too much to trust?
Test coverage that asserts behavior: tests with real assertions, not tests written only to execute lines and clear the gate.
A rollback story: when a deploy goes wrong, what's the recovery, and how fast can you reach it?
Release cadence: small and frequent (healthy) versus large and rare (brittle).
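Of the four checks, release cadence is the one you can read straight off a deployment log. The Python below is a sketch under two assumptions:

- You have a list of production deploys as `(date, component_count)` pairs from wherever you track releases.
- The 14-day and 25-component thresholds are illustrative, not Salesforce guidance.

```python
from datetime import date
from statistics import median


def release_cadence(deploys, max_gap_days=14, max_components=25):
    """Classify production releases as small/large and frequent/rare.

    deploys: list of (deploy_date, component_count).
    Thresholds are illustrative assumptions; calibrate them to your team.
    """
    if len(deploys) < 2:
        raise ValueError("need at least two production deploys")
    ordered = sorted(deploys)
    # Days between consecutive production releases.
    gaps = [(later[0] - earlier[0]).days for earlier, later in zip(ordered, ordered[1:])]
    gap = median(gaps)
    size = median([n for _, n in ordered])
    small = size <= max_components
    frequent = gap <= max_gap_days
    return {
        "median_gap_days": gap,
        "median_components": size,
        "pattern": f"{'small' if small else 'large'} and {'frequent' if frequent else 'rare'}",
    }


if __name__ == "__main__":
    log = [(date(2026, 1, 5), 8), (date(2026, 1, 12), 12), (date(2026, 1, 19), 5)]
    print(release_cadence(log))
```

Medians rather than means keep one emergency hotfix or one giant migration from masking the team's usual rhythm.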

Dimension 7: Observability — would you know if something quietly stopped working?

The failures that hurt most are the silent ones: a flow that errored for three weeks and nobody noticed, an integration dropping 5% of records, an automation that stopped firing after an update. This dimension scores whether your org tells you when it's failing. Are flow and Apex errors monitored, or do they pile up in an email nobody reads? Is there alerting on integration drops? When a number looks wrong in a report, can you trace it to a cause, or do you shrug? An org you can't observe is one you can only react to — and once you put autonomous behavior on it, reacting after the fact means reacting after a customer already felt it.
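Silent failures become visible once you compare what should have arrived with what did. Below is a minimal Python sketch. It assumes you can get:

- daily record counts from both ends of an integration (the source system and Salesforce),
- a last-successful-run date for each scheduled job or automation.

The function names, the 1% drop threshold, and the three-day quiet window are all illustrative assumptions.

```python
from datetime import date


def reconcile(sent, received, max_drop=0.01):
    """Flag days where an integration lost more than `max_drop` of its records.

    sent / received: {day: record_count}; days missing from `received` count as zero.
    """
    alerts = []
    for day in sorted(sent):
        expected = sent[day]
        if expected == 0:
            continue
        drop = (expected - received.get(day, 0)) / expected
        if drop > max_drop:
            alerts.append((day, round(drop, 3)))
    return alerts


def gone_quiet(last_success, today=None, max_quiet_days=3):
    """Jobs with no successful run for longer than `max_quiet_days`."""
    today = today or date.today()
    return sorted(
        name for name, last in last_success.items()
        if (today - last).days > max_quiet_days
    )
```

Either check is only useful if something pages a person when it returns a non-empty list. Printing to an inbox nobody reads recreates the problem.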

Scoring it: the lowest number is the one that matters

Resist the urge to average the seven dimensions into a tidy single score. An average hides the exact thing the scorecard exists to surface. An org that scores 5 on six dimensions and 1 on integration blast radius is not "mostly healthy"; it is one schema change away from breaking a system it forgot it was connected to. Read the minimum, not the mean.

Then weight the dimensions by the load you're about to add, not in the abstract. Rolling out a new sales process? Schema discipline and automation legibility dominate. Adding an agent that acts in real time? Data trust, observability, and change safety move to the top: an autonomous system on an org you can't observe is a liability with a polished UI. The same org can be ready for one initiative and dangerously unready for another. Your real carrying capacity is the weakest score among the dimensions the new load leans on hardest, because that's where the spend gets burned first.
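Here is one way to make "the minimum, not the mean" concrete. The Python below rests on choices the article doesn't specify, all of them illustrative:

- the dimension keys,
- the two load profiles and their 1-to-3 weights,
- the risk formula `weight * (5 - score)`.

The function reports the mean only to show how much it hides.

```python
from statistics import mean

DIMENSIONS = [
    "data_trust", "automation_legibility", "integration_blast_radius",
    "security_coherence", "schema_discipline", "change_safety", "observability",
]

# Hypothetical load profiles: 3 = the load leans on it hard; unlisted = 1.
LOADS = {
    "new_sales_process": {"schema_discipline": 3, "automation_legibility": 3},
    "realtime_agent": {"data_trust": 3, "observability": 3, "change_safety": 3},
}


def carrying_capacity(scores, load):
    """scores: {dimension: 1..5} for every entry in DIMENSIONS."""
    weights = {d: LOADS[load].get(d, 1) for d in DIMENSIONS}
    # Risk grows with both the gap to a perfect 5 and how much the load needs it.
    risk = {d: weights[d] * (5 - scores[d]) for d in DIMENSIONS}
    # Highest risk binds; ties go to the lower raw score.
    binding = max(DIMENSIONS, key=lambda d: (risk[d], -scores[d]))
    return {
        "mean": round(mean(list(scores.values())), 2),
        "minimum": min(scores.values()),
        "binding_dimension": binding,
        "binding_risk": risk[binding],
    }
```

Take six 5s and a 1 on integration blast radius. The mean comes out around 4.4, while the binding dimension is the 1. The same scores fed through two different load profiles can also name two different binding dimensions.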

From scorecard to a go / fix-first / no-go call

A scorecard that ends in seven numbers is still just a report. Turn it into a decision. For the initiative in front of you, each weighted dimension gets a verdict: green, build on it as-is; yellow, build behind a guardrail or fix only the narrow slice the new load touches; red, remediate before the initiative starts or it fails in production. You don't fix the whole org. You fix the part the next thing stands on, and you ship the rest knowing exactly what risk you accepted.

This is the same discipline behind how we build and run agents at SkySync: you can't stand behind an outcome you never priced the risk of. Scoped to the initiative, the scorecard is the cheapest insurance you'll buy, a two-day exercise that turns "we think it'll be fine" into a list of risks you chose on purpose. Run it on your own org this week. The lowest number on your card is the most useful sentence in your roadmap.
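Writing the green / yellow / red rules down helps two reviewers apply them the same way. Below is a hypothetical Python sketch with assumed cut-offs:

- 4 and up is green.
- A 1 is always red.
- A 2 is red only where the load weights the dimension as critical.
- Everything else is yellow.

The sketch stops at go or fix-first. A no-go also depends on what remediation would cost, which these numbers don't capture.

```python
def verdict(score, weight, critical_weight=3):
    """Map one dimension's 1-5 score to green / yellow / red for this load.

    Illustrative cut-offs; weight uses a 1-3 scale (3 = the load leans on it).
    """
    if score >= 4:
        return "green"   # build on it as-is
    if score == 1 or (score == 2 and weight >= critical_weight):
        return "red"     # remediate before the initiative starts
    return "yellow"      # guardrail, or fix only the slice the load touches


def go_call(scores, weights):
    """Return ('go' | 'fix-first', per-dimension verdicts) for one initiative."""
    verdicts = {d: verdict(s, weights.get(d, 1)) for d, s in scores.items()}
    call = "fix-first" if "red" in verdicts.values() else "go"
    return call, verdicts
```

Treat yellow verdicts as part of a "go": each one names a guardrail or a narrow fix that should ship with the initiative.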

Want a second set of eyes on your scorecard before you greenlight the next big initiative? Book a working session and we'll pressure-test your org's carrying capacity against what you're about to build.
