Field note

How to Forecast When Your Pipeline Data Is Messy

Akshit Kandi
Tags: Forecasting, Data Strategy, Salesforce, RevOps, AI Agents
SkySync

Most forecasting advice assumes clean data you don't have. Here's how to build a forecast that survives stale stages, missing close dates, and reps who edit the past.


Every forecasting guide starts in the same fantasy world: opportunities have accurate amounts, stages move forward in order, close dates mean something, and reps update records the day things change. If that were your CRM, you wouldn't be reading this.

The real problem isn't that your forecast model is wrong. It's that the data feeding it is lying to you in specific, predictable ways — and most teams patch the model when they should be measuring the lie. This is a piece about doing the opposite.


Stop cleaning the data. Start measuring how dirty it is.

The instinct when pipeline data is messy is to launch a hygiene project: backfill close dates, enforce required fields, retrain reps. Worth doing eventually. But it's a multi-quarter effort, and you have to forecast on Monday.

The faster move is to treat dirtiness as a measurable property of the data, not a moral failing to be fixed. Every messy field has a signature you can quantify, and a quantified error is something you can correct for. You don't need clean data to forecast well. You need a calibrated estimate of how wrong your data is, applied consistently.

That reframe is the whole game. A forecast is a claim about the future built on claims about the present. If you know which present-tense claims are unreliable, you can discount them instead of trusting them blindly.


The five lies your pipeline tells, and the signature of each

Messy CRM data isn't random noise. It fails in a handful of recognizable patterns, and each one distorts the forecast in a different direction. Name them before you model them:

  • Stale stage: the opportunity sits in 'Negotiation' but hasn't been touched in 40 days. Signature: the stage-entry date is far in the past, and LastModifiedDate (or the last logged activity) hasn't moved since. These inflate your committed number with deals that are quietly dead.
  • Phantom close dates: every deal closes on the last day of the quarter or the first of next month. Signature: close dates cluster on period boundaries instead of spreading across the calendar. The pipeline isn't timed; it's defaulted.
  • Amount drift: the amount was set at creation and never revised, or it mirrors the list price with no discount applied. Signature: the amount is a round number or exactly matches a price book entry.
  • Stage-skipping and rewind: deals jump from stage 2 to 5, or move backward when a rep cleans up. Signature: non-monotonic stage history in the field audit trail.
  • Survivorship gaps: lost deals get deleted or reassigned instead of marked Closed Lost, so your win rates look better than reality. Signature: a hole in the historical record where losses should be.

You can detect every one of these with queries you already have permission to run. The audit trail (OpportunityFieldHistory in Salesforce) is the single most underused asset in forecasting — it tells you not just what a field says, but whether anyone has believed it lately.
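As a rough illustration, the first four signatures can be checked with a few lines of Python over exported opportunity records. Everything here is an assumption made for the sketch: the `Opp` record and its field names are hypothetical (not Salesforce API names), and the 30-day idle threshold and $5,000 roundness step are placeholders, not recommended values. Survivorship gaps need a comparison against created-record counts and are not shown.

```python
from dataclasses import dataclass, field
from datetime import date
import calendar

@dataclass
class Opp:
    # Hypothetical export of one open opportunity; field names are illustrative.
    name: str
    amount: float
    close_date: date
    last_touched: date                                  # most recent edit or logged activity
    stage_history: list = field(default_factory=list)   # stage numbers, oldest first

def is_stale(opp, today, max_idle_days=30):
    """Stale stage: nobody has touched the record in max_idle_days."""
    return (today - opp.last_touched).days > max_idle_days

def has_default_close_date(opp):
    """Phantom close date: last day of a quarter, or the first of any month."""
    d = opp.close_date
    last_day = calendar.monthrange(d.year, d.month)[1]
    quarter_end = d.month in (3, 6, 9, 12) and d.day == last_day
    return quarter_end or d.day == 1

def has_round_amount(opp, step=5000):
    """Amount drift proxy: the amount is an exact multiple of a round step."""
    return opp.amount > 0 and opp.amount % step == 0

def has_irregular_stages(opp):
    """Skip or rewind: any move other than staying put or advancing one stage."""
    h = opp.stage_history
    return any(b - a not in (0, 1) for a, b in zip(h, h[1:]))

def signatures(opp, today):
    return {
        "stale_stage": is_stale(opp, today),
        "phantom_close_date": has_default_close_date(opp),
        "amount_drift": has_round_amount(opp),
        "stage_skip_or_rewind": has_irregular_stages(opp),
    }

today = date(2025, 3, 10)
opp = Opp("Acme renewal", 200_000, date(2025, 3, 31), date(2025, 1, 20), [1, 2, 5, 4])
flags = signatures(opp, today)
```

Summing the flagged dollars per signature across the open pipeline gives you one number for each lie, which is the point of the exercise.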


Build a trust score before you build a forecast

Here's the practitioner move that changes everything downstream. Before you weight a single opportunity, score how much you trust the record. Not the deal — the data about the deal.

A simple version: start every open opportunity at 1.0 and dock it for each lie it shows. Stale by more than one sales cycle? Subtract. Default close date? Subtract. Amount never edited since creation? Subtract. Owner changed twice this quarter? Subtract. The output is a per-record data-trust score between 0 and 1 that's completely independent of stage probability.
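A minimal sketch of that scoring, assuming each record already carries a boolean flag for each of the four checks named above. The penalty sizes are placeholders chosen for illustration, not calibrated values; in practice you would tune them against how flagged deals actually closed.

```python
# Hypothetical penalties per flag; the sizes are illustrative, not calibrated.
PENALTIES = {
    "stale": 0.40,                # idle longer than one sales cycle
    "default_close_date": 0.20,   # close date looks defaulted
    "amount_never_edited": 0.15,  # amount unchanged since creation
    "owner_churn": 0.15,          # owner changed twice this quarter
}

def trust_score(flags):
    """Start at 1.0, subtract a penalty for each flag that is set, clamp to [0, 1]."""
    score = 1.0 - sum(p for name, p in PENALTIES.items() if flags.get(name))
    return round(min(1.0, max(0.0, score)), 2)

fresh = trust_score({})
neglected = trust_score({"stale": True, "default_close_date": True})
worst = trust_score({name: True for name in PENALTIES})
```

Because the score depends only on flags about the record, it stays independent of stage probability, which is what lets the two be multiplied later.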

Now your forecast has two dimensions instead of one. The classic forecast multiplies amount by stage probability. The honest forecast multiplies amount by stage probability by trust score. A $200k deal in Negotiation that nobody has touched in six weeks stops counting like a $200k deal in Negotiation that moved yesterday — even though the CRM shows them identically.
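To make the arithmetic concrete, here is that comparison in a few lines of Python. The 60% stage probability and both trust scores are assumed values for illustration only; your CRM and your own scoring would supply the real ones.

```python
def weighted_value(amount, stage_probability, trust=1.0):
    """Expected value of a deal; trust=1.0 reproduces the classic forecast."""
    return amount * stage_probability * trust

AMOUNT = 200_000
NEGOTIATION_PROB = 0.60   # assumed stage probability

classic = weighted_value(AMOUNT, NEGOTIATION_PROB)
moved_yesterday = weighted_value(AMOUNT, NEGOTIATION_PROB, trust=0.95)  # assumed score
idle_six_weeks = weighted_value(AMOUNT, NEGOTIATION_PROB, trust=0.40)   # assumed score

print(f"classic: {classic:,.0f}")
print(f"fresh:   {moved_yesterday:,.0f}")
print(f"stale:   {idle_six_weeks:,.0f}")
```

Under these assumptions the classic forecast counts both deals at about $120,000, while the trust-weighted values come out near $114,000 for the fresh deal and $48,000 for the stale one.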

Stage probability tells you how likely a deal is to close. Trust score tells you how likely the stage is even true. You need both, and almost nobody computes the second one.

The elegance is that trust scoring degrades gracefully. As your data gets cleaner, scores drift toward 1.0 and the adjustment fades on its own. You're not building a permanent crutch. You're building a measurement that retires itself.


Forecast the gap, not just the deals

Even a trust-weighted bottom-up forecast misses what isn't in the CRM at all: the deals reps are working but haven't logged, the renewals that haven't been created yet, the upside nobody enters because the stage doesn't fit.

So run a second forecast from the top down — off closed-won history, seasonality, and rep capacity — and compare it to your bottom-up number. The two will disagree. The size and direction of that disagreement is itself a signal. A bottom-up that runs chronically above top-down means your pipeline is padded. Chronically below means reps are sandbagging or under-logging. The gap is information, not an error to reconcile away.
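One way to read the disagreement, sketched in Python. The quarterly figures are made up, and the 5% tolerance and the definition of "chronic" (every logged quarter leaning the same way) are assumptions you would adjust to your own history.

```python
def forecast_gap(bottom_up, top_down):
    """Signed gap as a fraction of top-down (positive means bottom-up is higher)."""
    return (bottom_up - top_down) / top_down

def read_gap(gaps, tolerance=0.05):
    """Classify a run of quarterly gaps by whether they all lean the same way."""
    if not gaps:
        return "no consistent bias"
    if all(g > tolerance for g in gaps):
        return "padded pipeline"
    if all(g < -tolerance for g in gaps):
        return "sandbagging or under-logging"
    return "no consistent bias"

# Made-up quarterly (bottom_up, top_down) pairs, in $ millions.
history = [(5.6, 4.8), (6.1, 5.0), (5.9, 5.2)]
gaps = [forecast_gap(b, t) for b, t in history]
verdict = read_gap(gaps)
```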

Track that gap over several quarters and it becomes a correction factor you can apply with confidence. You're no longer forecasting from the data alone — you're forecasting from the data plus a learned model of how your data tends to mislead.
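One simple way to turn that history into a correction factor, sketched under assumptions: this version calibrates against what actually closed each quarter (which assumes you log it), takes the median ratio of actual closed-won to the bottom-up forecast, and scales the current bottom-up number by it. The figures are invented, and the median is one reasonable choice among several.

```python
from statistics import median

# Invented history: (bottom-up forecast, actual closed-won), in $ thousands.
past_quarters = [(5000, 4000), (6000, 5100), (5500, 4400), (6200, 4960)]

# How much of each bottom-up forecast actually closed.
ratios = [actual / forecast for forecast, actual in past_quarters]
correction = median(ratios)   # the median resists one unusual quarter

current_bottom_up = 5800
adjusted = current_bottom_up * correction
```

With only a handful of quarters the factor is a rough prior, so it is worth recomputing each quarter as new actuals arrive.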


Where AI agents help — and where they quietly make it worse

It's tempting to throw a model at this. AI is genuinely good at parts of the problem and genuinely dangerous at others, and the line between them is worth being honest about.

Where it helps: pattern detection at scale. An agent reading the activity history, email and call logs, and field changes can flag a 'Negotiation' deal that hasn't had a real customer touch in weeks far faster than a manager scanning a board. It can draft the nudge to the rep, propose a revised close date from actual engagement signals, and keep the record honest between forecast calls. That's real leverage — keeping data fresh is a volume problem, and agents are good at volume.

Where it hurts: a model trained on your historical pipeline learns your historical lies. If reps have always set close dates to quarter-end, the model will confidently predict quarter-end. If losses were deleted instead of logged, the model inherits your inflated win rate and reports it back as insight. A forecast model trained on dirty data doesn't clean it; it launders it, turning bias into a number that looks objective.
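The win-rate inflation is easy to see with toy numbers. The counts below are hypothetical; the point is only that deleted losses vanish from the denominator a model learns from.

```python
def win_rate(won, lost):
    """Share of closed deals that were won."""
    return won / (won + lost)

won = 30
lost_logged = 45    # marked Closed Lost
lost_deleted = 25   # deleted or reassigned, never marked lost (hypothetical count)

observed = win_rate(won, lost_logged)               # what the CRM, and any model trained on it, sees
actual = win_rate(won, lost_logged + lost_deleted)  # what really happened
```

Here the CRM shows a 40% win rate against a true 30%, and a model trained on that history would apply the higher figure to every open deal.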

The rule we hold to: use agents to improve the data's freshness and completeness first, and only then to predict from it. An agent that keeps stages and close dates honest is worth more to your forecast than any clever prediction layer sitting on top of records nobody maintains. Data before agents, every time.


What good looks like in 30 days

You don't need a data warehouse rebuild to start. A realistic first month, in order:

  • Week 1 — Quantify the lies. Run the five signature queries against your open pipeline and put a number on each. You'll likely find a quarter of your committed dollars sitting in stale or defaulted records.
  • Week 2 — Ship a trust score. Even a crude one, computed nightly, that lives next to stage probability. Show the trust-weighted forecast beside the raw one in your forecast call.
  • Week 3 — Stand up the top-down comparison and start logging the gap. One number, tracked over time.
  • Week 4 — Point an agent or a simple automation at the worst offenders to refresh stages and close dates, so next quarter starts cleaner than this one.

None of this requires perfect data. It requires admitting your data is imperfect in measurable ways and building the measurement in. The teams that forecast accurately through the mess aren't the ones with the cleanest CRMs. They're the ones who know exactly how dirty theirs is and price it in.

If you want a second set of eyes on where your pipeline data is lying, and an agent that keeps it honest between forecast calls, that's the kind of work we build and run, with our fee tied to the result.

Pressure-test your pipeline data with us