Propensity Score Matching in WFM

From WFM Labs

Propensity Score Matching in WFM (PSM) is a causal-estimation method that approximates a randomized comparison from observational data by pairing treated units with untreated units that had a similar estimated probability of being treated. That probability, the propensity score, collapses many confounding characteristics into a single number, so that comparing treated and untreated units with similar scores removes confounding from the measured characteristics. It is a workhorse for workforce management evaluations where an intervention was not randomized and the treated group differs systematically from everyone else.[1]

The idea

The problem PSM addresses is selection: the agents who took a training program, or the sites that adopted a tool, are usually different from those that did not — in tenure, skill, volume, or motivation — and those differences also drive the outcome. Rosenbaum and Rubin showed that, rather than match on every covariate at once (infeasible with many variables), it suffices to match on the single propensity score, the estimated probability of treatment given the covariates. Conditional on the score, treated and untreated units have, on average, balanced covariates — mimicking randomization for the measured variables.[2]
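In symbols, the result can be sketched as follows. The notation is an assumption introduced here for illustration, not taken from this article: T is the treatment indicator, X the vector of measured covariates, e(X) the propensity score, and Y(0), Y(1) the outcomes a unit would have without and with treatment.

```latex
% Propensity score: probability of treatment given measured covariates
e(x) = \Pr(T = 1 \mid X = x)

% Balancing property: among units with the same score,
% covariates are independent of treatment status
X \perp\!\!\!\perp T \mid e(X)

% If treatment is unconfounded given X and every unit has some chance
% of either status, it is also unconfounded given the score alone
\bigl(Y(0), Y(1)\bigr) \perp\!\!\!\perp T \mid X, \quad 0 < e(X) < 1
\;\Longrightarrow\;
\bigl(Y(0), Y(1)\bigr) \perp\!\!\!\perp T \mid e(X)
```

The last line is what justifies matching on one number instead of on every covariate. It says nothing about variables that are not in X.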

How it is done

  1. Estimate the propensity score — model treatment as a function of observed covariates (commonly logistic regression), producing each unit's probability of treatment.
  2. Match, weight, or stratify — pair each treated unit with one or more untreated units of similar score (or weight units by the score, or group into strata).
  3. Check balance — confirm the matched groups now have similar covariate distributions; this, not the model's fit, is the success criterion.
  4. Estimate the effect — compare outcomes between matched groups, typically yielding the average treatment effect on the treated (ATT).[3]
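The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production implementation. The data are simulated, with a true effect of 2.0 built in. The covariates (`tenure`, `perf`), all function names, the hand-rolled gradient-descent logistic regression, and the caliper of 0.05 are hypothetical choices made for this example. A real analysis would normally use an established statistics library.

```python
import math
import random
import statistics

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate(n=600, true_effect=2.0, seed=0):
    """Hypothetical agents: two standardized covariates drive both uptake and outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        tenure, perf = rng.gauss(0, 1), rng.gauss(0, 1)
        treated = 1 if rng.random() < sigmoid(0.8 * tenure + 0.5 * perf) else 0
        outcome = true_effect * treated + 1.5 * tenure + 1.0 * perf + rng.gauss(0, 1)
        rows.append({"x": (tenure, perf), "t": treated, "y": outcome})
    return rows

def fit_propensity(rows, lr=1.0, epochs=300):
    """Step 1: logistic regression by batch gradient descent; stores row['ps']."""
    k = len(rows[0]["x"])
    w = [0.0] * (k + 1)  # intercept followed by one weight per covariate
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for r in rows:
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], r["x"]))
            err = sigmoid(z) - r["t"]
            grad[0] += err
            for j, xj in enumerate(r["x"]):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(rows) for wj, gj in zip(w, grad)]
    for r in rows:
        r["ps"] = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], r["x"])))
    return w

def match(rows, caliper=0.05):
    """Step 2: 1:1 nearest-neighbour matching with replacement, within a caliper."""
    controls = [r for r in rows if r["t"] == 0]
    pairs = []
    for tr in (r for r in rows if r["t"] == 1):
        best = min(controls, key=lambda c: abs(c["ps"] - tr["ps"]))
        if abs(best["ps"] - tr["ps"]) <= caliper:  # treated units without a close match are dropped
            pairs.append((tr, best))
    return pairs

def smd(a, b):
    """Step 3: standardized mean difference between two samples of one covariate."""
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_sd

def att(pairs):
    """Step 4: mean outcome difference across matched pairs (an ATT estimate)."""
    return statistics.fmean([tr["y"] - c["y"] for tr, c in pairs])

rows = simulate()
fit_propensity(rows)
pairs = match(rows)

naive = (statistics.fmean([r["y"] for r in rows if r["t"] == 1])
         - statistics.fmean([r["y"] for r in rows if r["t"] == 0]))
smd_before = smd([r["x"][0] for r in rows if r["t"] == 1],
                 [r["x"][0] for r in rows if r["t"] == 0])
smd_after = smd([tr["x"][0] for tr, _ in pairs], [c["x"][0] for _, c in pairs])

print(f"naive difference:     {naive:.2f}")
print(f"tenure SMD before:    {smd_before:.2f}")
print(f"tenure SMD after:     {smd_after:.2f}")
print(f"matched ATT estimate: {att(pairs):.2f}")
```

Because the simulation builds in the true effect, the naive difference and the matched estimate can each be compared against it. In real data no such benchmark exists, which is why the balance check carries so much weight. Matching with replacement, as done here, lets one untreated unit serve several treated units. It is one of several matching designs, not the only reasonable one.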

WFM examples

  • Voluntary program uptake. Estimating the effect of an opt-in upskilling program by matching participants to non-participants with the same propensity to enroll (based on tenure, prior performance, schedule).
  • Tool adoption. Comparing teams that adopted an agent-assist tool to similar non-adopting teams matched on size, channel mix, and baseline metrics.
  • Shift or site comparisons. Estimating the effect of a shift pattern by matching agents with similar characteristics across patterns.

Cautions

  • Only measured confounders. PSM balances observed covariates; it does nothing for unmeasured confounding such as motivation. This is its central limitation and the reason it is far weaker than a randomized experiment, which balances unmeasured confounders as well. Instrumental variables can also handle unmeasured confounding, but only when a valid instrument is available.
  • Overlap is required. If treated units have no comparable untreated counterparts (non-overlapping scores), they cannot be matched and must be dropped, changing the population the estimate applies to.
  • Balance, not the model, is the goal. A propensity model can fit well yet leave covariates unbalanced; balance must be checked directly after matching.
  • Garbage in. Conditioning on the wrong variables — a collider or a mediator — can introduce bias rather than remove it, so covariate choice should follow the causal diagram, not convenience.
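The overlap requirement can be made concrete with a small sketch. It uses one simple convention, assumed here for illustration: keep only units whose score lies between the larger of the two group minimums and the smaller of the two group maximums. The function names and the six hypothetical scores are invented for the example. Other trimming rules exist and may suit a given dataset better.

```python
def common_support(treated_scores, control_scores):
    """Return (lo, hi): the propensity-score range covered by both groups."""
    lo = max(min(treated_scores), min(control_scores))
    hi = min(max(treated_scores), max(control_scores))
    return lo, hi

def trim_to_support(scores, lo, hi):
    """Keep only scores inside the common-support range."""
    return [s for s in scores if lo <= s <= hi]

# Hypothetical estimated propensity scores for six agents
treated = [0.35, 0.60, 0.95]
control = [0.10, 0.40, 0.70]

lo, hi = common_support(treated, control)
kept_treated = trim_to_support(treated, lo, hi)
kept_control = trim_to_support(control, lo, hi)

print(lo, hi)        # 0.35 0.7
print(kept_treated)  # [0.35, 0.6]
print(kept_control)  # [0.4, 0.7]
```

The treated agent with a score of 0.95 has no comparable untreated counterpart and is dropped. The estimate therefore applies only to treated agents like the two that remain, so it is worth reporting how many units were trimmed and which ones.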

Maturity Model Position

In the WFM Labs Maturity Model™, PSM is a common step up from naive treated-vs-untreated comparison.

  • Level 1–2 (Emerging / Foundational) — program participants are compared directly to non-participants, with selection differences mistaken for program effects.
  • Level 3 (Progressive) — analysts adjust for selection by matching or weighting on a propensity score and verify covariate balance before claiming an effect.
  • Level 4–5 (Advanced / Pioneering) — matching is combined with sensitivity analysis for unmeasured confounding and chosen against alternatives (DiD, IV, RD, experiments) based on which assumptions are most defensible.

References

  1. Rosenbaum, P. R., & Rubin, D. B. (1983). "The Central Role of the Propensity Score in Observational Studies for Causal Effects". Biometrika, 70(1), 41–55. doi:10.1093/biomet/70.1.41.
  2. Stuart, E. A. (2010). "Matching Methods for Causal Inference: A Review and a Look Forward". Statistical Science, 25(1), 1–21. doi:10.1214/09-STS313.
  3. Caliendo, M., & Kopeinig, S. (2008). "Some Practical Guidance for the Implementation of Propensity Score Matching". Journal of Economic Surveys, 22(1), 31–72.