Base Rates and Bayesian Reasoning in WFM

From WFM Labs
With a 1% base rate, even a fraud detector that catches 90% of real fraud and false-alarms on only 5% of legitimate interactions produces mostly false alarms: of 585 flags per 10,000 interactions, only 90 are real.

Base Rates and Bayesian Reasoning in WFM concerns how to combine the underlying frequency of an event (its base rate or prior) with new evidence (a test, a flag, a signal) to reach a correct probability. The recurring error — base-rate neglect — is to judge a case on the strength of the evidence while ignoring how rare or common the outcome is to begin with. In workforce management, where rare events are flagged constantly — fraud, attrition risk, quality failures, anomaly alerts — neglecting base rates produces systematically overconfident conclusions and floods of false positives.[1]

Bayes' rule in words

Bayesian reasoning updates a prior belief into a posterior in light of evidence: the probability of a hypothesis given the evidence depends not only on how diagnostic the evidence is, but on how likely the hypothesis was beforehand.[2] The practical consequence is decisive: when the prior (base rate) is very low, even strong evidence may leave the posterior probability modest. A test that is "90% accurate" does not make a flagged case 90% likely to be positive — that conclusion confuses the accuracy of the test with the probability of the event, and the gap between the two is governed by the base rate.
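The verbal statement above can also be written as a formula. The symbols are chosen here for illustration and do not appear in the original text: H is the hypothesis (for example, "this interaction is fraudulent"), E is the evidence (for example, "the detector flagged it"), and P(H) is the base rate.

```latex
P(H \mid E) \;=\;
  \frac{P(E \mid H)\,P(H)}
       {P(E \mid H)\,P(H) \;+\; P(E \mid \neg H)\,\bigl(1 - P(H)\bigr)}
```

When P(H) is small, the second term in the denominator (false alarms from the large non-H population) can outweigh the first even if P(E | ¬H) is itself small, which is why the posterior stays modest.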

The false-positive trap

The clearest WFM illustration is a flagging system applied to a rare event. Suppose 1% of interactions are fraudulent and a detector catches 90% of real fraud while false-alarming on 5% of legitimate interactions — a genuinely good detector. Applied to 10,000 interactions:

Actual status       Flagged   Not flagged    Total
Fraud (1%)               90            10      100
Legitimate (99%)        495         9,405    9,900
Total                   585         9,415   10,000

Of the 585 flags, only 90 are real fraud, about 15% (90 / 585 ≈ 0.154). Although the detector catches 90% of real fraud, the 9,900 legitimate interactions generate 495 false alarms, more than five times the 90 true ones. The base rate, not the detector's accuracy, dominates the result. The same arithmetic governs attrition-risk scores, QA anomaly flags, and any alert aimed at a rare condition.
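The table's arithmetic can be reproduced in a few lines. The following is a minimal sketch, not a production tool: the function name `flag_counts` and its parameter names are hypothetical, and the counts are expected values rather than results from real data.

```python
def flag_counts(population, base_rate, sensitivity, false_positive_rate):
    """Expected flag counts for a detector applied to a population.

    All names here are illustrative assumptions, not part of any library.
    """
    positives = population * base_rate           # e.g. fraudulent interactions
    negatives = population - positives           # e.g. legitimate interactions
    true_pos = positives * sensitivity           # real cases that get flagged
    false_pos = negatives * false_positive_rate  # legitimate cases flagged anyway
    flagged = true_pos + false_pos
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "flagged": flagged,
        # Precision: the share of flags that are real.
        "precision": true_pos / flagged if flagged else float("nan"),
    }


# The worked example from the table: 10,000 interactions, 1% fraud,
# 90% of fraud caught, 5% of legitimate interactions falsely flagged.
result = flag_counts(10_000, 0.01, 0.90, 0.05)
print(result)

# Same detector, different base rates: only the prior changes.
for rate in (0.001, 0.01, 0.05, 0.20):
    precision = flag_counts(10_000, rate, 0.90, 0.05)["precision"]
    print(f"base rate {rate:.1%}: precision {precision:.1%}")
```

Holding the detector fixed and varying only the base rate moves precision from roughly 2% at a 0.1% base rate to roughly 82% at a 20% base rate, which is the sense in which the base rate dominates.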

Where base rates bite in WFM

  • Fraud and risk flagging. Fraud and security alerts target rare events; without base-rate awareness, teams over-trust flags and over-staff investigation queues for caseloads that are mostly false positives.
  • Attrition prediction. A "high churn risk" flag is often read as near-certainty, yet against a modest base attrition rate most flagged agents will in fact stay; interventions sized to the flag count overshoot.
  • Quality monitoring. Rare serious defects mean a sensitive QA flag is mostly noise unless the base rate is accounted for, wasting calibration and coaching effort.
  • Planning priors. Bayesian thinking also strengthens forecasting: blending a stable prior with new data avoids overreacting to a single surprising period. This blending is the formal basis of Bayesian forecasting methods, and it complements signal-versus-noise judgment.
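One simple way to illustrate the "planning priors" idea is a Beta-Binomial update, in which a long-run rate acts as the prior and a new period's counts act as the evidence. This is a sketch under stated assumptions, not the method of any particular forecasting tool: the function `blend_rate`, the prior weight, and all numbers below are hypothetical.

```python
def blend_rate(prior_rate, prior_weight, events, trials):
    """Posterior mean of a rate under a Beta prior and binomial data.

    prior_weight is the number of pseudo-observations behind the prior:
    the larger it is, the less a single period can move the estimate.
    """
    alpha = prior_rate * prior_weight + events
    beta = (1 - prior_rate) * prior_weight + (trials - events)
    return alpha / (alpha + beta)  # mean of Beta(alpha, beta)


# Hypothetical example: a long-run monthly attrition rate of 3%, treated
# as worth 1,000 agent-months of evidence, followed by one surprising
# month in which 12 of 200 agents leave (6%).
blended = blend_rate(prior_rate=0.03, prior_weight=1000, events=12, trials=200)
print(f"raw month: {12 / 200:.1%}, blended estimate: {blended:.1%}")
```

The blended estimate lands between the prior and the surprising month, closer to the prior because the prior carries more weight. Choosing that weight is a judgment call about how stable the underlying rate is believed to be.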

Reasoning well with base rates

  • Start from the base rate. Anchor on how common the outcome is before weighing the evidence; the flag updates the prior, it does not replace it.
  • Think in natural frequencies. Expressing the problem as counts out of a population, as in the table, makes the volume of false positives obvious where percentages and "accuracy" mislead.[3]
  • Report precision of a flag, not just accuracy. "What fraction of flags are real?" is the operational question, and it depends on the base rate; track it explicitly.
  • Set thresholds to the caseload, not the score. Calibrate alert thresholds against the volume of false positives the operation can actually work, not against the model's headline accuracy.
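The last two habits in the list above, tracking precision and setting thresholds to the caseload, can be sketched together. This is an illustrative assumption about how such a rule might look: the function names, scores, labels, and capacity are all hypothetical, and in practice labels would come from investigated cases.

```python
def threshold_for_capacity(scores, capacity):
    """Lowest threshold such that at most `capacity` cases score >= it.

    If a tie straddles the capacity boundary, the threshold steps up to
    the next distinct score so the caseload never exceeds capacity.
    """
    if capacity <= 0 or not scores:
        return float("inf")          # flag nothing
    ranked = sorted(scores, reverse=True)
    if capacity >= len(ranked):
        return ranked[-1]            # everything fits in the queue
    cutoff = ranked[capacity - 1]
    if ranked[capacity] == cutoff:   # tie would overflow the queue
        higher = [s for s in ranked if s > cutoff]
        return min(higher) if higher else float("inf")
    return cutoff


def precision_at(scores, labels, threshold):
    """Share of flagged cases (score >= threshold) that are real (label 1)."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    return sum(flagged) / len(flagged) if flagged else float("nan")


# Hypothetical scores and outcomes; the team can work 3 cases.
scores = [0.95, 0.90, 0.80, 0.80, 0.40, 0.20]
labels = [1, 0, 1, 0, 0, 0]
threshold = threshold_for_capacity(scores, capacity=3)
print(threshold, precision_at(scores, labels, threshold))
```

Here the two cases tied at 0.80 cannot both fit in a three-case queue, so the threshold rises to 0.90 and two cases are flagged. Stepping up on ties is one design choice among several; breaking ties at random or by a secondary criterion would also respect the capacity limit.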

Base-rate neglect is the statistical face of a cognitive bias: vivid, specific evidence (a confident flag) crowds out abstract background frequency. Sound Bayesian habits are the corrective.

Maturity Model Position

In the WFM Labs Maturity Model™, whether decisions account for base rates separates calibrated judgment from alarm-driven reaction.

  • Level 1–2 (Emerging / Foundational) — flags and risk scores are trusted at face value; "90% accurate" is read as "90% likely," and false-positive floods are absorbed as workload.
  • Level 3 (Progressive) — base rates are built into how flags are interpreted, precision is tracked alongside accuracy, and thresholds are set to manageable caseloads.
  • Level 4–5 (Advanced / Pioneering) — Bayesian updating is embedded in models and decision tooling, priors are managed deliberately, and automated alerting is tuned to posterior probability rather than raw detector accuracy.

References

  1. Tversky, A., & Kahneman, D. (1974). "Judgment under Uncertainty: Heuristics and Biases". Science, 185(4157), 1124–1131. doi:10.1126/science.185.4157.1124.
  2. McGrayne, S. B. (2011). The Theory That Would Not Die. Yale University Press. ISBN 978-0-300-16969-0.
  3. Gigerenzer, G., & Hoffrage, U. (1995). "How to Improve Bayesian Reasoning Without Instruction: Frequency Formats". Psychological Review, 102(4), 684–704.