Goodhart's Law and Metric Gaming

From WFM Labs

Goodhart's Law and Metric Gaming concerns a failure mode at the heart of metrics-driven workforce management: when a measure is turned into a target, people optimize the measure rather than the outcome it was meant to represent, and the measure stops being a good indicator. The principle is named for economist Charles Goodhart, whose 1975 observation about monetary policy[1] was later generalized by anthropologist Marilyn Strathern into the widely quoted form: "When a measure becomes a target, it ceases to be a good measure."[2] A closely related formulation, Campbell's Law, warns that the more a quantitative indicator is used for social decision-making, the more it will distort and corrupt the process it monitors.[3]

Why measures decay into targets

Most operational metrics are proxies — convenient, measurable stand-ins for an outcome that is harder to observe directly. Average handle time is a proxy for efficiency; adherence is a proxy for reliability; a quality score is a proxy for good service. A proxy works only as long as the link between it and the true outcome holds. When the proxy is tied to incentives, evaluation, or ranking, people find the cheapest way to move the number — and the cheapest way is usually not the one that improves the underlying outcome. The correlation that made the proxy useful is precisely what the pressure breaks.
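
The mechanism can be made concrete with a toy model. The sketch below is illustrative only: the two actions, their effort costs, and their effects on the true outcome are invented numbers, and the assumption that all effort flows to the cheapest way of moving the proxy is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One way an agent could move the proxy. All values are hypothetical."""
    name: str
    effort_per_proxy_point: float   # effort needed to move the proxy by one point
    outcome_per_proxy_point: float  # change in the true outcome per proxy point


def respond_to_target(actions, effort_budget):
    """Spend the whole budget on the cheapest proxy mover (simplifying assumption).

    Returns (chosen action name, proxy points gained, true outcome change).
    """
    chosen = min(actions, key=lambda a: a.effort_per_proxy_point)
    proxy_points = effort_budget / chosen.effort_per_proxy_point
    return chosen.name, proxy_points, proxy_points * chosen.outcome_per_proxy_point


ACTIONS = [
    Action("fix the cause of long calls", 4.0, 1.0),      # slow, but helps customers
    Action("rush the customer off the call", 1.0, -0.5),  # cheap, but harms customers
]

name, proxy_points, outcome_change = respond_to_target(ACTIONS, effort_budget=8.0)
print(name, proxy_points, outcome_change)
```

With these made-up numbers the proxy improves by eight points while the true outcome falls by four. The same budget spent on the expensive action would have moved the proxy only two points, but in the right direction for customers.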

How it appears in contact centers

Contact centers are unusually exposed because so much of the work is measured in real time and tied to performance management:

  • AHT gaming. When AHT becomes a target, agents can lower it by rushing customers, transferring or escalating prematurely, or disconnecting calls early, which improves the number while degrading first contact resolution and customer outcomes.
  • Adherence theater. Adherence measured mechanically can be satisfied by being in the right state at the right time without doing the underlying work it was meant to ensure.
  • Occupancy and the speed–quality trade-off. Pushing occupancy or productivity targets without balancing measures invites the failures catalogued under The Occupancy Trap.
  • Quality-score inflation. When QA scores drive rankings, calibration drifts, easy interactions are selected for review, and scores rise without service improving.
  • AI containment gaming. When a self-service or AI deflection rate becomes the target, containment can be inflated by making escalation hard — raising the metric while harming resolution and trust.

In each case the metric improves and the outcome it represented does not, or moves the opposite way.
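
The containment case is a convenient one to illustrate, because the gap between the metric and the outcome can be computed from the same session data. The following is a minimal sketch, assuming each self-service session records whether it was escalated and whether the customer contacted again about the same issue within some chosen window; the `BotSession` type and its field names are hypothetical, not taken from any particular platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BotSession:
    """A self-service session. Field names are hypothetical."""
    escalated: bool    # the customer reached a human agent
    recontacted: bool  # the customer came back about the same issue within the window


def containment_rates(sessions):
    """Return (raw containment, resolved containment) as fractions of all sessions."""
    total = len(sessions)
    if total == 0:
        raise ValueError("no sessions to score")
    contained = [s for s in sessions if not s.escalated]
    resolved = [s for s in contained if not s.recontacted]
    return len(contained) / total, len(resolved) / total


# Ten invented sessions: escalation is hard, so eight are "contained",
# but five of those eight customers had to contact again.
sessions = (
    [BotSession(escalated=False, recontacted=True)] * 5
    + [BotSession(escalated=False, recontacted=False)] * 3
    + [BotSession(escalated=True, recontacted=False)] * 2
)

raw, resolved = containment_rates(sessions)
print(raw, resolved)
```

In this invented sample the headline containment rate is 80 percent, while only 30 percent of sessions ended without escalation or a repeat contact. A wide gap between the two figures suggests that containment is being achieved by blocking escalation rather than by resolving the issue.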

Mitigations

Goodhart effects cannot be eliminated, but they can be contained:

  • Pair every efficiency metric with a quality or outcome metric. Balanced sets — AHT with FCR, occupancy with agent welfare, containment with resolution — make single-metric gaming visible because the cost shows up elsewhere.
  • Measure outcomes, not just proxies, where feasible. Resolution, repeat-contact rate, and verified customer outcomes are harder to game than activity counts.
  • Use metrics for learning before judgment. Indicators used mainly for coaching distort less than indicators tied directly to ranking and pay, which is the condition Campbell's Law warns about.
  • Rotate and audit measures. Periodically change which measures carry weight, and audit for a metric improving while related outcomes stall, which is the signature of gaming rather than genuine gain.
  • Separate the target from the measure. Set goals on the true objective and treat proxies as evidence about it, not as the objective itself.
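
The pairing and auditing ideas above can be combined into a simple divergence check. This is a sketch under stated assumptions rather than a recommended detector: the function names, the 5 percent improvement threshold, and the sample series are all hypothetical, and a flag only marks a pattern consistent with gaming, not proof of it.

```python
def relative_gain(series, lower_is_better):
    """Fractional improvement from the first to the last observation."""
    first, last = series[0], series[-1]
    if first == 0:
        raise ValueError("first observation must be non-zero")
    change = (last - first) / abs(first)
    return -change if lower_is_better else change


def looks_gamed(proxy, outcome, *, proxy_lower_is_better, outcome_lower_is_better,
                min_proxy_gain=0.05, outcome_tolerance=0.0):
    """True when the proxy improved noticeably while the paired outcome did not.

    The thresholds are arbitrary defaults chosen for illustration.
    """
    proxy_gain = relative_gain(proxy, proxy_lower_is_better)
    outcome_gain = relative_gain(outcome, outcome_lower_is_better)
    return proxy_gain >= min_proxy_gain and outcome_gain <= outcome_tolerance


# Invented weekly figures: AHT in seconds (lower is better), FCR as a fraction.
aht = [420, 400, 380, 360]
fcr_falling = [0.78, 0.76, 0.74, 0.71]
fcr_rising = [0.78, 0.79, 0.80, 0.81]

suspicious = looks_gamed(aht, fcr_falling,
                         proxy_lower_is_better=True, outcome_lower_is_better=False)
healthy = looks_gamed(aht, fcr_rising,
                      proxy_lower_is_better=True, outcome_lower_is_better=False)
print(suspicious, healthy)
```

Comparing only the first and last observations keeps the sketch short; a real audit would more plausibly use a trend over a longer window and review flagged cases by hand.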

Maturity Model Position

In the WFM Labs Maturity Model™, how an operation uses its metrics is a clearer maturity signal than how many it tracks.

  • Level 1–2 (Emerging / Foundational) — single proxy metrics are set as hard targets and tied to incentives; gaming is common and often misread as improvement.
  • Level 3 (Progressive) — metrics are balanced (efficiency paired with quality and outcome measures), and the operation actively watches for proxy–outcome divergence.
  • Level 4–5 (Advanced / Pioneering) — measurement emphasizes verified outcomes, metrics are used primarily for learning, and automated optimization is constrained by balancing measures so systems do not game their own objective functions at scale.
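
One way to read "constrained by balancing measures" is that an optimizer may only choose among options that keep the paired outcome measures within agreed limits. The sketch below assumes a hypothetical list of candidate policies with projected AHT, FCR, and repeat-contact rate; the names, limits, and figures are invented for illustration.

```python
def pick_policy(candidates, *, min_fcr, max_repeat_rate):
    """Return the lowest-AHT candidate that respects both guardrails, or None."""
    feasible = [
        c for c in candidates
        if c["fcr"] >= min_fcr and c["repeat_rate"] <= max_repeat_rate
    ]
    if not feasible:
        return None  # no option is acceptable; do not fall back to the raw optimum
    return min(feasible, key=lambda c: c["aht_seconds"])


# Hypothetical projected results for three candidate policies.
CANDIDATES = [
    {"name": "A", "aht_seconds": 300, "fcr": 0.62, "repeat_rate": 0.30},
    {"name": "B", "aht_seconds": 360, "fcr": 0.76, "repeat_rate": 0.18},
    {"name": "C", "aht_seconds": 390, "fcr": 0.80, "repeat_rate": 0.15},
]

unconstrained = min(CANDIDATES, key=lambda c: c["aht_seconds"])
guarded = pick_policy(CANDIDATES, min_fcr=0.75, max_repeat_rate=0.20)
print(unconstrained["name"], guarded["name"])
```

Optimizing AHT alone selects policy A, which has the worst resolution figures; with the guardrails in place the choice moves to policy B. Returning `None` when nothing is feasible is a design choice that forces a human decision instead of silently relaxing the constraint.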

References

  1. Goodhart, C. A. E. (1975). "Problems of Monetary Management: The U.K. Experience". In Papers in Monetary Economics, Vol. 1. Reserve Bank of Australia.
  2. Strathern, M. (1997). "'Improving ratings': audit in the British University system". European Review, 5(3), 305–321. doi:10.1002/(SICI)1234-981X(199707)5:3<305::AID-EURO184>3.0.CO;2-4.
  3. Campbell, D. T. (1979). "Assessing the impact of planned social change". Evaluation and Program Planning, 2(1), 67–90. doi:10.1016/0149-7189(79)90048-X.