Causal Diagrams (DAGs) in WFM

From WFM Labs
Figure: The three causal structures a DAG makes visible: fork (confounder), chain (mediator), and collider. Each implies a different rule about what to adjust for.

Causal Diagrams (DAGs) in WFM are directed acyclic graphs used to represent assumptions about cause and effect in workforce management data. In a causal DAG, each node is a variable (volume, tenure, coaching, attrition), each arrow is a claim of direct causal influence from one variable to another, and acyclic means no variable can cause itself through a loop. A DAG is not a picture of the data; it is an explicit statement of the causal structure a practitioner believes generated the data, and that structure determines which statistical comparisons are valid.[1] Where the article on correlation and causation explains why association is not causation, the DAG is the tool for reasoning about that distinction precisely; where causal inference estimates the size of an effect, the DAG is the step that comes first and decides what to estimate.
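
As a minimal sketch of the definition above, a DAG can be held as a plain list of (cause, effect) pairs and checked for loops. The arrows below are illustrative assumptions connecting the variables named in the text, not findings, and `is_acyclic` is a hypothetical helper written for this example.

```python
# Hypothetical arrows among the variables named above (assumptions only).
EDGES = [
    ("tenure", "coaching"),
    ("tenure", "attrition"),
    ("coaching", "attrition"),
    ("volume", "attrition"),
]

def is_acyclic(edges):
    """Kahn's algorithm: the graph is acyclic if every node can be removed
    in an order where each removed node has no remaining incoming arrows."""
    nodes = {node for edge in edges for node in edge}
    indegree = {node: 0 for node in nodes}
    children = {node: [] for node in nodes}
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1
    ready = [node for node in nodes if indegree[node] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return removed == len(nodes)

print(is_acyclic(EDGES))                              # True
print(is_acyclic(EDGES + [("attrition", "tenure")]))  # False: a feedback loop
```

Keeping the structure as data rather than as a drawing makes it straightforward to review an individual arrow and to reuse the same graph in later checks.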

Why diagram a causal problem

Most WFM analysis disputes are really disagreements about causal structure that stay hidden because no one writes the structure down. Drawing a DAG forces the assumptions into the open: it makes explicit which variables are believed to cause which, exposes the confounders that must be accounted for, and — critically — identifies the variables that must not be adjusted for. The same diagram doubles as a communication device between WFM analysts, operations leaders, and data-science teams, who can argue about an arrow rather than about a regression output they cannot see inside.[2]

The three building blocks

Every DAG, however large, is built from three elementary three-node structures. Knowing how association flows through each is enough to reason about most WFM analyses.

  • Fork (common cause / confounder): Schedule ← Tenure → Attrition. A single variable causes two others, creating a non-causal association between them. Tenured agents may both prefer certain schedules and leave at different rates, so schedule and attrition will appear related even if the schedule has no effect. The fork is open by default and must be closed by adjusting for the confounder.
  • Chain (mediator): Coaching → Adherence → CSAT. The middle variable transmits the effect of the first to the last. Adjusting for the mediator blocks the very pathway being studied — controlling for adherence here would hide coaching's effect on CSAT rather than reveal it. Whether to adjust depends on the question (total versus direct effect).
  • Collider: Training → Promoted ← Aptitude. Two variables both cause a third. Unlike a fork, a collider is closed by default — training and aptitude are unrelated in the full population — but adjusting for the collider opens a spurious association between its causes.
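
The three structures can be illustrated with a small simulation. This is a sketch under stated assumptions: all relationships are linear with unit coefficients and standard normal noise, promotion is treated as a continuous score, and `corr`, `residuals`, and `adjusted_corr` are hypothetical helpers written for this example. Adjusting is done by removing a straight-line fit on the third variable.

```python
import random

random.seed(7)
N = 20_000

def corr(xs, ys):
    """Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def residuals(ys, zs):
    """What is left of ys after a straight-line fit on zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    sxy = sum((z - mz) * (y - my) for z, y in zip(zs, ys))
    sxx = sum((z - mz) ** 2 for z in zs)
    slope = sxy / sxx
    return [y - my - slope * (z - mz) for y, z in zip(ys, zs)]

def adjusted_corr(xs, ys, zs):
    """Correlation between xs and ys after linearly adjusting both for zs."""
    return corr(residuals(xs, zs), residuals(ys, zs))

def noise():
    return [random.gauss(0, 1) for _ in range(N)]

# Fork: Schedule <- Tenure -> Attrition (schedule has no effect on attrition).
tenure = noise()
schedule = [t + e for t, e in zip(tenure, noise())]
attrition = [-t + e for t, e in zip(tenure, noise())]
fork = (corr(schedule, attrition), adjusted_corr(schedule, attrition, tenure))

# Chain: Coaching -> Adherence -> CSAT (all of the effect runs via adherence).
coaching = noise()
adherence = [c + e for c, e in zip(coaching, noise())]
csat = [a + e for a, e in zip(adherence, noise())]
chain = (corr(coaching, csat), adjusted_corr(coaching, csat, adherence))

# Collider: Training -> Promoted <- Aptitude (the two causes are independent).
training = noise()
aptitude = noise()
promoted = [t + a + e for t, a, e in zip(training, aptitude, noise())]
collider = (corr(training, aptitude), adjusted_corr(training, aptitude, promoted))

for name, (raw, adj) in [("fork", fork), ("chain", chain), ("collider", collider)]:
    print(f"{name:9s} unadjusted={raw:+.2f}  adjusted={adj:+.2f}")
```

Under these assumptions the fork shows an association of roughly -0.5 that disappears after adjusting for tenure, the chain shows a real association of roughly +0.6 that disappears after adjusting for adherence, and the collider shows no association until promotion is adjusted for, at which point a spurious one of roughly -0.5 appears.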

The collider trap

The collider is the most counterintuitive structure and the most frequently violated rule in workforce analysis. Conditioning on a collider — by filtering, segmenting, or statistically controlling for it — manufactures a correlation that does not exist in the population.[3] In WFM this most often appears as an analysis run on a selected population:

  • Studying whether training improves performance using only agents who were promoted conditions on a collider — promotion is caused by both training and aptitude — and can make training look useless or even harmful among the promoted, even when it helps everyone.
  • Analyzing drivers of CSAT using only agents who survived to 12 months of tenure conditions on retention, which is itself an outcome of the things under study. This is the mechanism behind survivorship and selection effects (see regression to the mean and selection bias).
  • Comparing vendors using only contacts that reached an agent conditions on non-deflection, distorting the comparison when deflection depends on the same factors as the outcome.

The rule that falls out of the diagram is simple and powerful: adjust for confounders, leave mediators and colliders alone, and be suspicious of any analysis restricted to a subgroup defined by an outcome.
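
The promoted-agents example can be reproduced with simulated data. The sketch below assumes training is assigned at random, truly raises performance by 1.0 for everyone, and that promotion depends on both training and aptitude; every coefficient and the promotion threshold are invented for illustration.

```python
import random

random.seed(11)
TRUE_EFFECT = 1.0  # assumed benefit of training for every agent

agents = []
for _ in range(40_000):
    trained = random.random() < 0.5          # training assigned at random
    aptitude = random.gauss(0, 1)
    performance = TRUE_EFFECT * trained + 2.0 * aptitude + random.gauss(0, 1)
    promoted = 2.0 * trained + aptitude > 1.5  # collider: caused by both
    agents.append((trained, performance, promoted))

def training_gap(rows):
    """Mean performance of trained agents minus that of untrained agents."""
    trained_perf = [perf for trained, perf, _ in rows if trained]
    untrained_perf = [perf for trained, perf, _ in rows if not trained]
    return (sum(trained_perf) / len(trained_perf)
            - sum(untrained_perf) / len(untrained_perf))

gap_all = training_gap(agents)
gap_promoted = training_gap([row for row in agents if row[2]])

print(f"all agents:     {gap_all:+.2f}")       # close to the true +1.0
print(f"promoted only:  {gap_promoted:+.2f}")  # negative under these assumptions
```

Among the promoted, untrained agents could only have been promoted on unusually high aptitude, so they outperform trained agents who were promoted with ordinary aptitude. The comparison restricted to promoted agents therefore makes training look harmful even though it helps every agent in the simulation.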

The backdoor criterion

A DAG tells the analyst which variables to adjust for to estimate a causal effect. A backdoor path is any non-causal path connecting the supposed cause and effect that begins with an arrow pointing into the cause; these are the paths that create confounding. The backdoor criterion says: choose an adjustment set that blocks every backdoor path, contains no descendant of the cause, and does not open any collider.[4] Any set that meets the criterion is a valid list of variables to control for, and more than one such set may exist. Adding "everything available" is not safe: throwing a collider or a mediator into a regression actively introduces bias, a result that surprises practitioners trained to control for as many variables as possible.
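
The criterion can be checked mechanically on a small graph. The following is a sketch, not a production implementation: it enumerates every path, which is only practical for small diagrams. The graph extends the coaching example with three hypothetical variables (`manager_style`, `shift_bid`, `seniority_rules`) invented to show a collider sitting on a backdoor path, and all function names are assumptions of this example.

```python
EDGES = [
    ("tenure", "coaching"), ("tenure", "csat"),        # confounder (fork)
    ("coaching", "adherence"), ("adherence", "csat"),  # mediator (chain)
    ("manager_style", "coaching"),                     # hypothetical
    ("manager_style", "shift_bid"),                    # hypothetical collider
    ("seniority_rules", "shift_bid"),                  # hypothetical
    ("seniority_rules", "csat"),                       # hypothetical
]

def descendants(edges, node):
    """Every variable reachable from node by following arrows forward."""
    children = {}
    for cause, effect in edges:
        children.setdefault(cause, []).append(effect)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def simple_paths(edges, start, end):
    """All non-repeating paths from start to end, ignoring arrow direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    paths = []
    def walk(path):
        if path[-1] == end:
            paths.append(path)
            return
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in path:
                walk(path + [nxt])
    walk([start])
    return paths

def is_blocked(edges, path, adjust):
    """A path is blocked by an adjusted non-collider, or by a collider that
    is neither adjusted for nor has an adjusted descendant."""
    edge_set = set(edges)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if is_collider:
            if mid not in adjust and not (descendants(edges, mid) & adjust):
                return True
        elif mid in adjust:
            return True
    return False

def satisfies_backdoor(edges, cause, effect, adjust):
    adjust = set(adjust)
    if adjust & descendants(edges, cause):
        return False  # the set contains a descendant of the cause
    backdoor = [p for p in simple_paths(edges, cause, effect)
                if (p[1], cause) in set(edges)]  # first arrow points into cause
    return all(is_blocked(edges, p, adjust) for p in backdoor)

for candidate in [set(), {"tenure"}, {"tenure", "adherence"},
                  {"tenure", "shift_bid"},
                  {"tenure", "shift_bid", "manager_style"}]:
    print(sorted(candidate),
          satisfies_backdoor(EDGES, "coaching", "csat", candidate))
```

In this hypothetical graph, adjusting for tenure alone is sufficient. Adding the mediator (adherence) fails the descendant condition, and adding the collider (shift_bid) opens a path that was previously closed unless manager_style is also included.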

Using DAGs in workforce management

The diagram is a planning step, drawn before any model is fit:

  1. Draw the structure. List the variables and the causal arrows believed to connect them, using domain knowledge. Disagreements here are the real analysis.
  2. Find the backdoor paths between the intervention and the outcome.
  3. Choose the adjustment set that blocks confounding without opening colliders.
  4. Estimate the effect with the chosen controls using the methods in Causal Inference in Workforce Management — regression, difference-in-differences, instrumental variables — or, where feasible, sidestep the whole problem with a randomized controlled experiment, which balances confounders by design.
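
The four steps can be sketched end to end on simulated data. The example assumes a diagram in which tenure causes both coaching and CSAT (the only backdoor path), so the adjustment set is tenure alone. The coefficients are invented, and the regression adjustment is done by residualizing both variables on tenure, which gives the same coefficient as a linear regression of CSAT on coaching and tenure.

```python
import random

random.seed(3)
N = 20_000
TRUE_EFFECT = 0.5  # assumed causal effect of coaching on CSAT

# Step 1 (assumed structure): Tenure -> Coaching, Tenure -> CSAT, Coaching -> CSAT.
tenure = [random.gauss(0, 1) for _ in range(N)]
coaching = [0.8 * t + random.gauss(0, 1) for t in tenure]
csat = [TRUE_EFFECT * c + 1.0 * t + random.gauss(0, 1)
        for c, t in zip(coaching, tenure)]

def slope(ys, xs):
    """Least-squares slope of ys on xs (with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def residuals(ys, zs):
    """What is left of ys after a straight-line fit on zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    b = slope(ys, zs)
    return [y - my - b * (z - mz) for y, z in zip(ys, zs)]

# Steps 2-3: the only backdoor path is Coaching <- Tenure -> CSAT,
# so the adjustment set is {tenure}.
naive = slope(csat, coaching)
# Step 4: estimate the effect while adjusting for tenure.
adjusted = slope(residuals(csat, tenure), residuals(coaching, tenure))

print(f"naive slope:    {naive:.2f}")     # inflated by the open backdoor path
print(f"adjusted slope: {adjusted:.2f}")  # close to the assumed 0.5
```

The naive slope is close to double the assumed effect because tenured agents in this simulation both receive more coaching and score higher on CSAT. The adjusted slope recovers the assumed value only because the simulated data follow the assumed diagram.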

The diagram also explains, structurally, why Simpson's paradox happens: it is the visible result of a fork left open, and the DAG tells the analyst whether the aggregated or the segmented number answers the question.
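
A small worked example shows the reversal. The counts below are made up, and the diagram assumed is the fork Schedule ← Tenure → Attrition, with new agents placed on the flex schedule far more often than tenured agents. The schedule labels and function names are hypothetical.

```python
# (tenure band, schedule) -> (agents, leavers); all counts are invented.
COUNTS = {
    ("new", "flex"): (400, 160),
    ("new", "fixed"): (100, 45),
    ("tenured", "flex"): (100, 10),
    ("tenured", "fixed"): (400, 60),
}
BANDS = ("new", "tenured")
SCHEDULES = ("flex", "fixed")

def crude_rate(schedule):
    """Attrition rate for a schedule with tenure bands pooled together."""
    agents = sum(COUNTS[(band, schedule)][0] for band in BANDS)
    leavers = sum(COUNTS[(band, schedule)][1] for band in BANDS)
    return leavers / agents

def stratum_rate(band, schedule):
    """Attrition rate for a schedule within one tenure band."""
    agents, leavers = COUNTS[(band, schedule)]
    return leavers / agents

def standardized_rate(schedule):
    """Band-specific rates weighted by each band's share of all agents,
    which is what adjusting for tenure amounts to here."""
    total = sum(agents for agents, _ in COUNTS.values())
    rate = 0.0
    for band in BANDS:
        band_size = sum(COUNTS[(band, s)][0] for s in SCHEDULES)
        rate += band_size / total * stratum_rate(band, schedule)
    return rate

for schedule in SCHEDULES:
    print(schedule,
          "crude:", round(crude_rate(schedule), 3),
          "by band:", [round(stratum_rate(b, schedule), 3) for b in BANDS],
          "standardized:", round(standardized_rate(schedule), 3))
```

Pooled, the flex schedule shows higher attrition (34% versus 21%), yet within each tenure band it shows lower attrition (40% versus 45%, and 10% versus 15%). If tenure is a confounder, as this fork assumes, the segmented comparison, summarized by the standardized rates of 25% versus 30%, is the one that answers the causal question.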

Limitations

A DAG encodes assumptions, not evidence. Its conclusions are only as good as the arrows drawn, and the data cannot confirm that the structure is correct — an omitted arrow for an unobserved confounder will quietly invalidate the adjustment set. DAGs in their basic form represent direction, not magnitude or functional form, and they assume the no-feedback (acyclic) condition, which is a simplification for systems with genuine feedback loops over time. They are a discipline for reasoning, not a substitute for good data or for experimentation where it is possible.[5]

Maturity Model Position

In the WFM Labs Maturity Model™, explicit causal diagramming marks the transition from correlational analytics to causal analytics.

  • Level 1–2 (Emerging / Foundational) — relationships are inferred from dashboards and regressions with no stated causal structure; "control for everything" is the norm, and collider and selection bias go unrecognized.
  • Level 3 (Progressive) — analysts sketch causal diagrams before important analyses, choose adjustment sets deliberately, and avoid conditioning on mediators and colliders.
  • Level 4–5 (Advanced / Pioneering) — DAGs are standard artifacts in analysis design, shared with data-science partners, and embedded in how automated decision systems are specified, so that learned models estimate causal effects rather than fragile associations.

References

  1. Pearl, J., & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books. ISBN 978-0-465-09760-9.
  2. Greenland, S., Pearl, J., & Robins, J. M. (1999). "Causal Diagrams for Epidemiologic Research". Epidemiology, 10(1), 37–48.
  3. Cinelli, C., Forney, A., & Pearl, J. (2022). "A Crash Course in Good and Bad Controls". Sociological Methods & Research. doi:10.1177/00491241221099552.
  4. Pearl, J. (2009). Causality: Models, Reasoning, and Inference. 2nd ed. Cambridge University Press. ISBN 978-0-521-89560-6.
  5. Hernán, M. A., & Robins, J. M. (2020). Causal Inference: What If. Chapman & Hall/CRC.