Regression to the Mean in WFM
Regression to the Mean in WFM covers how regression to the mean affects workforce management. Regression to the mean is the statistical tendency for an extreme measurement to be followed by one closer to the average, simply because the extreme value was partly due to chance. The phenomenon was first described by Francis Galton, who observed that the children of unusually tall parents tended to be closer to average height than their parents.[1] In contact center operations, where almost every metric contains a large random component, regression to the mean is one of the most common reasons that interventions appear to work when they do not, and a frequent source of misattributed credit and blame.
The mechanism
Any observed metric can be thought of as a stable underlying level plus random noise: an agent's handle time this week reflects their true skill plus the particular mix of calls, system behavior, and personal factors they happened to encounter. When a measurement is extreme — the best or worst result in a group, or an unusually good or bad period — it is likely that the random component was also extreme in the same direction. The next measurement, drawing fresh noise, will on average be less extreme. The more chance contributes to a metric, the stronger the regression.[2]
Regression to the mean is not a force that pulls results toward the average; it is a consequence of imperfect correlation between successive measurements. It appears whenever cases are selected because they are extreme and then measured again.
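The "stable level plus noise" description can be written as a small worked formula. This is a sketch under stated assumptions, not a derivation taken from the cited sources: each agent's true level and the period-to-period noise are assumed independent, the noise variance is assumed equal in both periods, and the symbols below are introduced here for illustration only.

```latex
\begin{align*}
X_{i,t} &= \mu_i + \varepsilon_{i,t},
  \qquad \mathbb{E}[\mu_i] = \mu,\ \operatorname{Var}(\mu_i) = \sigma_\mu^2,
  \qquad \mathbb{E}[\varepsilon_{i,t}] = 0,\ \operatorname{Var}(\varepsilon_{i,t}) = \sigma_\varepsilon^2 \\
\rho &= \operatorname{Corr}(X_{i,1}, X_{i,2})
  = \frac{\sigma_\mu^2}{\sigma_\mu^2 + \sigma_\varepsilon^2} \\
\mathbb{E}[X_{i,2} \mid X_{i,1} = x] &\approx \mu + \rho\,(x - \mu)
\end{align*}
```

The last line is the best linear prediction, and it is exact when the two measurements are jointly normal. With purely illustrative numbers (a group mean handle time of 360 s, a true-skill spread of 20 s, and a noise spread of 40 s), the correlation is 400 / (400 + 1600) = 0.2, so an agent observed at 460 s would be expected at about 360 + 0.2 × 100 = 380 s in the next period with no intervention at all. The noisier the metric, the smaller the correlation and the larger the expected reversion.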
Why it matters in workforce management
The danger is systematic: WFM and operations teams routinely select on extremes and then act, which is exactly the condition under which regression to the mean creates the illusion of an effect.
- Coaching the bottom performers. Agents with the worst results in a period are identified and given an intervention. Their scores improve at the next measurement, but the bottom group would have improved on average even with no intervention, because their selection was partly bad luck. Without a control group (A/B test) or a longer baseline, the coaching effect is confounded with regression and is typically overstated. The mirror image, top performers selected for reward who then "decline", invites unwarranted blame.[2]
- Crediting a new model or process. A forecast model, schedule change, or routing tweak is introduced right after an unusually bad month. The following month looks better, and the change is declared a success. Some or all of the improvement may be regression from an extreme starting point. The corrective discipline is to compare against the expected range of variation (see Statistical Thinking in WFM), not against the worst recent point.
- Reacting to outlier days. A single very poor service-level day is usually followed by a better one regardless of any intraday action, tempting teams to credit whatever lever they happened to pull. This is the regression counterpart of the common-cause versus special-cause distinction.
- Vendor and pilot evaluation. Sites or queues chosen for a pilot because they are performing badly will tend to improve on their own. A pilot judged only on before-versus-after at the worst sites will overstate the benefit.
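The selection effect behind all four cases can be reproduced with a short simulation. The sketch below is illustrative only: the function name, the handle-time figures, and the normal distributions are assumptions made for the example, not data from any real operation. It measures every agent twice, does nothing in between, and reports what happens to the group that looked worst in the first period.

```python
import random
import statistics


def simulate_uncoached_bottom_group(n_agents=2000, mean_aht=360.0, skill_sd=20.0,
                                    noise_sd=40.0, bottom_frac=0.10, seed=7):
    """Measure the worst-AHT agents twice with NO intervention in between.

    All parameters are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    # Stable underlying level per agent (true average handle time, seconds).
    true_aht = [rng.gauss(mean_aht, skill_sd) for _ in range(n_agents)]
    # Two observed periods: same true level, fresh noise each time.
    period1 = [t + rng.gauss(0.0, noise_sd) for t in true_aht]
    period2 = [t + rng.gauss(0.0, noise_sd) for t in true_aht]

    # Select the agents with the highest (worst) observed AHT in period 1.
    k = int(n_agents * bottom_frac)
    worst = sorted(range(n_agents), key=lambda i: period1[i], reverse=True)[:k]

    return {
        "bottom_before": statistics.mean(period1[i] for i in worst),
        "bottom_after": statistics.mean(period2[i] for i in worst),
        "all_before": statistics.mean(period1),
        "all_after": statistics.mean(period2),
    }


result = simulate_uncoached_bottom_group()
for name, value in result.items():
    print(f"{name}: {value:.1f} s")
```

Under these assumptions the selected group's average handle time falls noticeably in the second period while the overall average barely moves, even though nothing was done. Any coaching delivered between the two periods would have been credited with that drop.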
Distinguishing regression from real effects
The practical defenses are the standard ones for separating signal from noise:
- Use a control or comparison group. Compare the intervention group to a similar untreated group exposed to the same conditions; both regress, so the difference isolates the real effect. This is the core argument for controlled experiments in WFM.
- Establish a stable baseline. Judge change against the metric's normal range over many periods, not against a single extreme point.
- Select on something other than the outcome. Where possible, choose intervention targets on a leading indicator rather than on the very metric being evaluated.
- Expect partial reversion. When a result is extreme, the default expectation should be that the next result will be less extreme. Only movement beyond that expected reversion should be attributed to an action.
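The control-group defense can be sketched in the same style. In this hypothetical example the worst-performing agents are split at random into a coached half and an uncoached half, and coaching is assumed to have a true effect of -10 s on handle time. The function name, the effect size, and every distribution parameter are assumptions for illustration, not a prescribed evaluation method.

```python
import random
import statistics


def evaluate_coaching(n_agents=20000, mean_aht=360.0, skill_sd=20.0, noise_sd=40.0,
                      bottom_frac=0.10, true_effect=-10.0, seed=11):
    """Compare a naive before/after estimate with a control-group estimate.

    All parameters, including true_effect, are hypothetical illustration values.
    """
    rng = random.Random(seed)
    true_aht = [rng.gauss(mean_aht, skill_sd) for _ in range(n_agents)]
    period1 = [t + rng.gauss(0.0, noise_sd) for t in true_aht]

    # Select the worst (highest AHT) agents, then randomize them into two halves.
    k = int(n_agents * bottom_frac)
    worst = sorted(range(n_agents), key=lambda i: period1[i], reverse=True)[:k]
    rng.shuffle(worst)
    treated, control = worst[: k // 2], worst[k // 2:]
    treated_set = set(treated)

    # Period 2: fresh noise for everyone; only treated agents get the true effect.
    period2 = [
        t + rng.gauss(0.0, noise_sd) + (true_effect if i in treated_set else 0.0)
        for i, t in enumerate(true_aht)
    ]

    def mean_change(group):
        return statistics.mean(period2[i] - period1[i] for i in group)

    naive = mean_change(treated)                  # regression + real effect
    controlled = naive - mean_change(control)     # regression cancels out
    return {"naive_before_after": naive, "controlled": controlled}


estimates = evaluate_coaching()
print(f"naive before/after estimate: {estimates['naive_before_after']:.1f} s")
print(f"control-group estimate:      {estimates['controlled']:.1f} s")
```

Both halves regress by roughly the same amount, so subtracting the control group's change leaves an estimate close to the assumed true effect, while the naive before-versus-after figure is several times larger. Random assignment within the selected group is what makes the two halves comparable.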
Maturity Model Position
In the WFM Labs Maturity Model™, awareness of regression to the mean separates operations that learn from their data from those that are misled by it.
- Level 1–2 (Emerging / Foundational) — extreme results are treated at face value. Bottom-performer coaching and post-bad-month process changes are credited on before-versus-after evidence, and the cycle of misattribution repeats.
- Level 3 (Progressive) — the operation expects reversion from extremes, baselines its comparisons, and reserves causal claims for changes that exceed the normal range of variation.
- Level 4–5 (Advanced / Pioneering) — interventions are evaluated with control groups and controlled experiments as a matter of routine, and automated systems are designed not to over-react to extreme readings.
See also
- Statistical Thinking in WFM
- Signal and Noise in WFM
- Correlation and Causation in WFM
- A/B Testing for WFM Experiments
- Variance Harvesting
- Forecast Bias Detection and Correction
- Causal Inference in Workforce Management
References
- Galton, F. (1886). "Regression Towards Mediocrity in Hereditary Stature". The Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246–263. doi:10.2307/2841583.
- Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux. ISBN 978-0-374-27563-1.
