Applied Measurement and Estimation for WFM
Applied Measurement and Estimation for WFM addresses the most common estimation challenge in workforce management: you need a number — headcount, handle time, volume, cost — and you have no historical data. A new product launch, a channel migration, a technology investment, an organizational restructure. The business wants a staffing plan by Friday. You have nothing to go on except judgment and whatever analogues you can find.
This page provides a structured methodology for these situations, built primarily on Douglas Hubbard's calibrated estimation framework from How to Measure Anything (2014), adapted for the specific estimation problems that WFM practitioners face. The core argument is simple: you can measure more than you think, you need less data than you think, and the structured approaches described here will consistently outperform gut instinct and back-of-envelope math.
Overview
Every WFM team encounters situations where standard forecasting methods break down because the prerequisite — historical data — does not exist. Common triggers include:
- New product or service launch — no call history for a product that doesn't exist yet
- New channel deployment — chat, messaging, or video support with no baseline
- Technology implementation — estimating the impact of a new IVR, chatbot, or AI agent on volume and handle time
- Organizational change — consolidating queues, offshoring, insourcing, or splitting a line of business
- Marketing campaign — estimating the volume spike from a campaign that hasn't run before
- Schedule policy change — projecting the attrition impact of moving from fixed to rotating schedules
In each case, the WFM analyst must produce an estimate that will drive hiring, scheduling, and budgeting decisions — decisions with real financial consequences. The temptation is to throw up your hands and say "we can't forecast this." The reality is that you can produce a useful estimate — one that is better than random and honest about its uncertainty — using the methods described here.
The distinction matters: a useful estimate is not a precise prediction. It is a calibrated range that reduces uncertainty enough to support a decision. Going from "I have absolutely no idea how many calls this will generate" to "I'm 90% confident it will be between 200 and 800 calls per day in the first quarter" is an enormous improvement, even though the range is wide. That range tells you something actionable: you need capacity for at least 200 and probably not more than 800. That's a staffing plan you can build.
The Measurement Objection
When asked to estimate something without data, most people respond with some version of "that can't be measured." Hubbard identifies several forms of this objection and systematically dismantles each one.[1]
"It's completely unique"
No WFM estimation problem is truly unprecedented. A "brand new" product support queue still involves customers contacting an organization through known channels about a product in a known category. The customers have demographics. The product has a complexity profile. The channel has behavioral norms. There are always analogues — other product launches, other companies' experiences with similar products, other queues in your own organization that serve a similar customer base.
The belief in uniqueness is a cognitive bias. It feels unique because it's your situation, but the structural parameters (contact rate per customer, handle time by complexity, first-contact resolution rate by channel) vary within knowable ranges across the industry.
"There's not enough data"
This objection confuses "no data" with "no perfect data." In practice, WFM teams sit on enormous amounts of adjacent data:
- Historical product launches (even if the product differs, the launch dynamics are similar)
- Industry benchmarks for contact rates, AHT, and resolution rates by product category
- Vendor-provided benchmarks from implementations at other organizations
- Agent-reported estimates from subject matter experts who handle adjacent queues
- Pilot data from soft launches, beta tests, or limited rollouts
The question is not "do we have data?" but "what data do we have, and how much does it reduce our uncertainty?"
"The stakeholders won't accept a range"
This is a communication problem, not a measurement problem. Stakeholders who demand a single number are asking for false precision. The responsible answer is a range with a stated confidence level, accompanied by a clear explanation of what drives the width of the range and what would narrow it. In practice, most stakeholders prefer honest uncertainty to confident wrongness — especially after they've been burned by a point estimate that missed badly.
Hubbard's Five Principles
Hubbard's framework rests on five principles that directly apply to WFM estimation problems.[1]
1. It's Not as Unique as You Think
Whatever you're estimating has been done before. A new AI chatbot deployment? Hundreds of organizations have deployed chatbots. A new product line? Your company or your competitors have launched products before. The key is finding the right analogues — situations similar enough to yours that their outcomes constrain your estimate.
For WFM, useful analogue sources include:
- Your own organization's history of similar events
- Industry analyst reports (Gartner, Forrester, ICMI, ContactBabel) that aggregate cross-industry data
- Vendor case studies and implementation benchmarks
- Professional networks and peer organizations willing to share anonymized data
- Published academic research on contact center operations
2. There's More Data Than You Think
Internal data assets are routinely underutilized. A WFM analyst estimating volume for a new product support queue may not have data for that queue, but likely has:
- Contact rates for existing products (contacts per customer per month)
- Marketing's projected customer acquisition numbers
- Product complexity assessments from the development team
- Customer demographic data that correlates with contact behavior
- Seasonal patterns from existing queues that will likely apply to the new one
Each of these is a data point that constrains the estimate. The contact rate may be uncertain, but it's not unbounded — it will almost certainly fall between 0.01 and 0.5 contacts per customer per month, because that's the range observed across all your existing products.
3. You Need Less Data Than You Think
This is perhaps the most counterintuitive principle. Hubbard demonstrates that even 5-10 observations can reduce the width of a confidence interval by more than 50%.[1] The mathematics of sampling explain why: the standard error of a mean is inversely proportional to the square root of sample size, so each additional observation buys less than the one before. Going from 0 to 5 observations is dramatically more valuable than going from 500 to 505.
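Two small calculations make this concrete. Both assume simple random sampling, and the function names are illustrative. The first is the probability that the population median lies between the smallest and largest of n random observations. For n = 5 this is 0.9375, which is Hubbard's "Rule of Five." The second shows how the standard error of a mean shrinks as 1/√n.

```python
import math


def median_capture_probability(n: int) -> float:
    """P(population median lies between the min and max of n random draws).

    Each draw falls below the median with probability 0.5. The sample range
    misses the median only if all n draws land on the same side of it.
    """
    return 1.0 - 2.0 * 0.5 ** n


def relative_standard_error(n: int) -> float:
    """Standard error of a mean, in units of one observation's standard deviation."""
    return 1.0 / math.sqrt(n)


for n in (1, 2, 5, 10):
    print(n, round(median_capture_probability(n), 4))

# Value of five extra observations early versus late.
early_gain = relative_standard_error(5) - relative_standard_error(10)
late_gain = relative_standard_error(500) - relative_standard_error(505)
print(f"5 -> 10 obs: {early_gain:.4f}; 500 -> 505 obs: {late_gain:.6f}")
```

Going from 5 to 10 observations cuts the standard error by roughly 0.13 standard deviations. Going from 500 to 505 cuts it by about 0.0002.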
For WFM, this means:
- Three days of pilot data from a soft launch provide enormous information about AHT and contact complexity
- Five similar product launches in your history provide a meaningful distribution of contact rates
- Ten agent interviews about expected complexity provide calibrated estimates that beat uninformed guesses
The implication: don't wait for a statistically complete dataset. Collect a small amount of real data as early as possible and use it to update your estimate.
4. The First Observations Reduce Uncertainty Dramatically
The value of information is front-loaded. Going from "I have no idea" to "I have 5 data points" is a massive reduction in uncertainty. Going from 500 data points to 505 is negligible. This means that even informal, imperfect early data collection — a one-day pilot, a small survey, a handful of test calls — provides disproportionate value.
In practical WFM terms: if you're launching a new queue in 90 days and you can run a 3-day pilot with 10 agents in week 1, do it. Those 3 days of data will improve your 90-day staffing plan more than any amount of theoretical analysis.
5. Creative Measurement Beats Expensive Studies
Proxy measures, indirect observation, and creative data collection often provide more practical value than formal research. Hubbard calls this "measuring it like a scientist" — finding observable indicators that correlate with what you actually need to know.[1]
WFM examples of creative measurement:
- Proxy for new-product contact rate: Count the number of questions asked in the product's beta forum. The ratio of forum questions to beta users is a proxy for contact rate.
- Proxy for schedule change attrition impact: Measure eNPS (employee Net Promoter Score) before and after announcing the change. eNPS decline correlates with attrition acceleration.
- Proxy for AI chatbot containment rate: Test the chatbot on 100 randomly sampled historical contacts. What fraction does it handle without escalation?
- Proxy for marketing campaign volume impact: Analyze web traffic spikes from past campaigns and correlate with contact volume spikes to establish a traffic-to-contact conversion ratio.
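The traffic-to-contact proxy in the last example reduces to a ratio estimate. Below is a minimal sketch, assuming you can measure incremental web visits and incremental contacts for each past campaign against a pre-campaign baseline. The campaign figures and the planned campaign's expected traffic are hypothetical.

```python
from __future__ import annotations


def conversion_ratio(past_campaigns: list[tuple[float, float]]) -> float:
    """Incremental contacts per incremental web visit, pooled across campaigns.

    Each tuple is (incremental visits, incremental contacts) for one campaign.
    Pooling (sum / sum) weights large campaigns more heavily than averaging
    the per-campaign ratios would.
    """
    visits = sum(v for v, _ in past_campaigns)
    contacts = sum(c for _, c in past_campaigns)
    return contacts / visits


# Hypothetical history of three past campaigns.
history = [(120_000, 1_500), (80_000, 900), (200_000, 2_700)]
ratio = conversion_ratio(history)

# Apply to a planned campaign expected to drive 150,000 incremental visits.
forecast_contacts = ratio * 150_000
print(f"{ratio:.5f} contacts per visit -> about {forecast_contacts:,.0f} contacts")
```

The spread of the per-campaign ratios around the pooled value gives a first sense of how wide the resulting range should be.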
Calibration Training
Before making estimates, practitioners must be calibrated, meaning their stated confidence intervals actually contain the true value at the stated rate. If you say you're "90% confident the AHT will be between 4 and 12 minutes," that statement is one of many 90% intervals you will give. Across all of them, about 90% should turn out to contain the true value.
The Overconfidence Problem
Research consistently shows that untrained estimators are dramatically overconfident. When asked for 90% confidence intervals, most people capture the true value only 50-60% of the time.[1] They set their ranges too narrow because they anchor on what feels "reasonable" rather than genuinely considering the extremes.
Lichtenstein, Fischhoff, and Phillips (1982) demonstrated this calibration gap across a wide range of estimation tasks, finding that overconfidence is one of the most robust cognitive biases in judgment under uncertainty.[2]
In WFM, this manifests as:
- AHT estimates that are too narrow ("it'll be around 6 minutes" when the actual range of plausible outcomes spans 3-15 minutes)
- Volume forecasts that ignore tail scenarios
- ROI projections that assume the median outcome without accounting for downside risk
How Calibration Training Works
Hubbard's calibration training protocol involves answering 10-20 trivia questions (not WFM questions — general knowledge) with 90% confidence intervals. After each round, the trainee sees how many intervals captured the true answer. If the capture rate is 50% instead of 90%, the trainee learns viscerally that their ranges are too narrow.[1]
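Scoring a calibration round takes only a few lines. The sketch below computes the capture rate. It also computes the binomial probability that a perfectly calibrated estimator would score that low by chance, which helps separate genuine overconfidence from bad luck on a short quiz. The data structure and the example round are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class IntervalAnswer:
    low: float    # stated lower bound of the 90% interval
    high: float   # stated upper bound of the 90% interval
    truth: float  # true answer, revealed after the round


def capture_rate(answers: list[IntervalAnswer]) -> float:
    """Fraction of stated intervals that contain the true answer."""
    hits = sum(1 for a in answers if a.low <= a.truth <= a.high)
    return hits / len(answers)


def prob_at_most(hits: int, n: int, p: float = 0.90) -> float:
    """Binomial P(X <= hits) for an estimator whose intervals capture with probability p."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(hits + 1))


# Hypothetical round: 10 questions, 5 intervals captured the answer.
round_1 = [IntervalAnswer(0, 10, 5)] * 5 + [IntervalAnswer(0, 10, 50)] * 5
print(f"capture rate: {capture_rate(round_1):.0%}")
print(f"P(score this low if calibrated): {prob_at_most(5, 10):.4f}")
```

A 5/10 score has well under a 1% chance of coming from a calibrated estimator. An 8/10 score still has roughly a one-in-four chance, which is why training uses several rounds rather than a single short quiz.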
The training typically takes 2-4 hours and produces lasting improvement. Trained estimators improve their calibration from ~50-60% to ~80-90% capture rates — not perfect, but dramatically better. The key insight participants report: "I had to make my ranges much wider than felt comfortable."
For WFM teams, running a calibration session before a major estimation exercise (new LOB launch, annual capacity planning, technology business case) materially improves the quality of the resulting estimates. It costs nothing but a few hours and produces better decision inputs.
Equivalent Bets
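One technique Hubbard uses for calibration is the equivalent bet. After stating a 90% confidence interval, imagine choosing between two ways to win $1,000. In the first, you win if the true value falls inside your range. In the second, you win by spinning a wheel that pays out 90% of the time. If you would rather spin the wheel, you are less than 90% confident in your range, so widen it until the two options feel equally attractive. In odds terms, a genuine 90% interval is one you would back at 9:1 (risking $900 to win $100). This technique forces emotional engagement with the stated probability and counteracts the tendency toward false precision.[1]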
The Estimation Process
A step-by-step method for producing a calibrated WFM estimate when historical data is unavailable or insufficient.
Step 1: Define the Measurement
Vague estimation targets produce vague estimates. "How many calls will we get?" is not a measurement definition. What calls, when, through which channel, from which customers, measured how?
A proper measurement definition: "Average daily inbound voice calls to the Tier 1 product support queue for Product X, from retail customers in the US and Canada, during weekdays in months 1-3 post-launch, counted at the IVR routing point."
This precision matters because different definitions yield different numbers. Calls counted at the IVR may be 20% higher than calls counted at the agent queue (due to IVR containment). Including weekends changes the daily average. Including business customers changes the volume profile.
Step 2: Determine What's Already Known
Before estimating, inventory what you already know:
- Internal analogues: What happened when you launched Product Y last year? Product Z two years ago?
- External benchmarks: What do industry reports say about contact rates for this product category?
- Adjacent data: What's the contact rate for your existing products? What's the AHT for the closest comparable queue?
- Expert knowledge: What do the product team, the marketing team, and experienced agents estimate?
Document each data source and what it tells you. Even partial, imprecise information constrains the estimate.
Step 3: Anchor on the Extremes
Kahneman and Tversky's work on anchoring bias shows that people's estimates are heavily influenced by their starting point.[3] To counteract this, start from the extremes rather than the middle:
- Lower bound: What is the absolute minimum this could possibly be? What scenario produces the lowest number? (Product is a total flop, no one calls, only a trickle of early adopters.)
- Upper bound: What is the absolute maximum? What's the nightmare scenario? (Product has a critical defect, every customer calls, social media drives additional volume.)
These absurd extremes are easy to set. Now work inward: given what you know from Step 2, where within this range is the outcome most likely to fall?
Step 4: Develop the 90% Confidence Interval
Using your calibration training, set a range you believe will contain the true value 90% of the time. This range should be wider than feels comfortable — that discomfort is calibration working correctly.
Example: "I'm 90% confident that daily call volume for the new product queue will be between 150 and 600 calls per day in months 1-3."
This is not a point forecast. It's a probability statement about a range. The staffing plan should account for the full range — a flexible plan that can handle 150 calls (minimum staffing with flex capacity) and scale to 600 calls (overtime, borrowed agents, outsource contingency).
Step 5: Calculate the Value of Additional Information
Before spending money or time on research, calculate whether the information would change your decision. Hubbard formalizes this as Expected Value of Perfect Information (EVPI) — the maximum you should pay for perfect knowledge.[1]
If your 90% CI for daily volume is 150-600, and you'd staff the same way whether it's 200 or 400 (flexible plan with contingency), then narrowing the range has low value. But if staffing for 150 versus 600 requires fundamentally different approaches (internal team versus outsourced overflow), and the wrong choice costs $500K, then information that narrows the range is worth investing in.
Step 6: Decompose Complex Estimates
Never estimate a complex quantity directly. Break it into components, estimate each, and combine.
Example: Estimating FTE for a new product queue
Instead of estimating "we need X agents," decompose:
- Customer base × Contact rate = Monthly volume
- Monthly volume × AHT = Monthly workload (hours)
- Monthly workload ÷ Productive hours per agent = Base FTE
- Base FTE × Shrinkage multiplier × Occupancy adjustment = Required FTE
Each component can be estimated more accurately than the compound result. Customer base comes from sales projections. Contact rate can be analogued from similar products. AHT can be estimated from complexity assessment. Shrinkage and occupancy use your known operational parameters.
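The decomposition chain above can be written as a single function. This is a minimal sketch, assuming the base FTE is computed from paid hours per FTE, the shrinkage multiplier is 1 / (1 − shrinkage), and the occupancy adjustment is 1 / occupancy. It treats occupancy as a fixed planning input and ignores the Erlang-style interaction between occupancy and service level. All input values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueueInputs:
    customers: float           # customer base (sales projection)
    contact_rate: float        # contacts per customer per month
    aht_minutes: float         # average handle time
    paid_hours_per_fte: float  # paid hours per FTE per month (assumed)
    shrinkage: float           # fraction of paid time unavailable, e.g. 0.30
    occupancy: float           # planned fraction of available time on contacts


def required_fte(q: QueueInputs) -> float:
    monthly_volume = q.customers * q.contact_rate               # contacts
    workload_hours = monthly_volume * q.aht_minutes / 60.0      # hours
    base_fte = workload_hours / q.paid_hours_per_fte
    shrinkage_multiplier = 1.0 / (1.0 - q.shrinkage)
    occupancy_adjustment = 1.0 / q.occupancy
    return base_fte * shrinkage_multiplier * occupancy_adjustment


# Hypothetical inputs for illustration only.
example = QueueInputs(customers=200_000, contact_rate=0.05, aht_minutes=6,
                      paid_hours_per_fte=160, shrinkage=0.30, occupancy=0.85)
print(f"{required_fte(example):.1f} FTE")
```

Because each input is a separate argument, swapping in the low and high ends of each component's range shows directly how the FTE answer moves.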
Decomposition also reveals which component contributes the most uncertainty. If contact rate ranges from 0.02 to 0.10 per customer per month (5× range) while AHT ranges from 5 to 8 minutes (1.6× range), the contact rate is the dominant uncertainty. Focus your information-gathering effort there.
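One way to quantify "dominant uncertainty" in a multiplicative model is to compare each component's share of the variance of log(FTE). This sketch assumes independent log-normal components whose stated ranges are 90% intervals. Under that assumption, each share is proportional to log(high / low)².

```python
from __future__ import annotations

import math


def variance_shares(ranges: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Share of the variance of log(output) contributed by each input.

    Assumes a purely multiplicative model with independent log-normal inputs
    whose 90% intervals are (low, high). Each input's log-variance is
    proportional to log(high / low) ** 2, so the common scale factor cancels.
    """
    weights = {name: math.log(hi / lo) ** 2 for name, (lo, hi) in ranges.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


shares = variance_shares({"contact_rate": (0.02, 0.10), "aht_minutes": (5.0, 8.0)})
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.0%}")
```

With the ranges from the example, contact rate accounts for roughly 92% of the variance and AHT for about 8%.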
Step 7: Iterate with Data
Once actual data begins arriving, update the estimate using Bayesian updating: your initial estimate is the prior, new data shifts the posterior. The formal mathematics connect to Bayesian Methods for Workforce Forecasting, but the practical version is straightforward:
- Week 1 actual data arrives — compare to your prior estimate
- If actual data falls within your range, narrow the range using the new data
- If actual data falls outside your range, widen the range and investigate why your prior was wrong
- Repeat weekly or daily as data accumulates
Within 2-4 weeks of actual data, your estimate should converge to a narrow range. The structured prior from Steps 1-6 ensures you make better decisions during those critical early weeks than you would with no estimate at all.
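For daily call counts, one standard way to do this updating is a Gamma prior on a Poisson arrival rate, which updates in closed form. This is a minimal sketch, assuming Poisson daily counts. Real contact volumes are usually overdispersed, so the posterior below will be narrower than it should be. The prior and the first-week actuals are hypothetical.

```python
from __future__ import annotations

import random


def gamma_poisson_update(shape: float, rate: float,
                         daily_counts: list[int]) -> tuple[float, float]:
    """Conjugate update of a Gamma(shape, rate) prior on a Poisson daily rate."""
    # Each observed day adds its count to the shape and one day of exposure to the rate.
    return shape + sum(daily_counts), rate + len(daily_counts)


def credible_interval(shape: float, rate: float, level: float = 0.90,
                      draws: int = 20_000, seed: int = 7) -> tuple[float, float]:
    """Equal-tailed interval estimated by sampling the Gamma posterior."""
    rng = random.Random(seed)
    # random.gammavariate takes (alpha=shape, beta=scale), and scale = 1 / rate.
    samples = sorted(rng.gammavariate(shape, 1.0 / rate) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    return samples[int(tail * draws)], samples[int((1.0 - tail) * draws) - 1]


# Hypothetical prior centred on 300 calls/day; shape 4 keeps it wide
# (its 90% range runs from roughly 100 to 580 calls/day).
shape, rate = 4.0, 4.0 / 300.0
week_1 = [410, 380, 395, 420, 400]  # hypothetical first-week actuals
shape, rate = gamma_poisson_update(shape, rate, week_1)
low, high = credible_interval(shape, rate)
print(f"posterior mean {shape / rate:.0f} calls/day, 90% interval {low:.0f} to {high:.0f}")
```

The update moves the estimate toward the observed data automatically. The manual "widen and investigate" step remains useful because a conjugate model cannot tell you that the prior was structurally wrong.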
WFM Applications
New Marketing Campaign Volume Impact
Decomposition:
- Campaign reach (impressions/sends) → Marketing provides this
- Response rate → Use past campaign data; industry average for email campaigns is 1-5%, for TV campaigns 0.01-0.1%[4]
- Contact rate per respondent → What fraction of people who respond to the campaign also contact the support center? Estimate from past campaigns.
- Timing curve → How quickly does volume ramp and decay? Most campaigns show a 3-5 day spike followed by exponential decay.
Calibrated estimate: "90% CI for incremental daily calls during the campaign week: 50-400. Peak day likely 2× the daily average."
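The campaign decomposition can be turned into a daily profile. This is a minimal sketch, assuming volume peaks on launch day and then decays geometrically. The 0.75 daily decay factor is a hypothetical choice; over a 7-day window it happens to give a peak of about 2× the daily average. Reach, response rate, and contact rate are also hypothetical.

```python
from __future__ import annotations


def campaign_profile(reach: float, response_rate: float, contact_rate: float,
                     days: int = 7, daily_decay: float = 0.75) -> list[float]:
    """Incremental contacts per day: launch-day peak, then geometric decay."""
    total = reach * response_rate * contact_rate
    weights = [daily_decay ** d for d in range(days)]
    scale = total / sum(weights)  # normalise so the profile sums to the total
    return [w * scale for w in weights]


profile = campaign_profile(reach=1_000_000, response_rate=0.02, contact_rate=0.05)
average = sum(profile) / len(profile)
print([round(x) for x in profile])
print(f"peak / average = {profile[0] / average:.1f}")
```

Running the function at the low and high ends of each input's range gives a profile for each end of the campaign's calibrated interval.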
New Line of Business Starting AHT
Analogues: Identify 3-5 existing queues with similar characteristics (product complexity, customer sophistication, system tools available). Pull their AHT histories from their first 90 days. Plot the range.
Complexity adjustment: If the new LOB is more complex (more systems to navigate, more regulatory requirements, longer verification steps), adjust upward. A common planning rule of thumb is a complexity multiplier of 1.0× (comparable), 1.2-1.5× (moderately more complex), or 1.5-2.0× (significantly more complex).
Learning curve: New agents on a new product will be slower initially. Expect AHT in month 1 to be 1.3-1.8× the steady-state AHT, declining to steady state over 60-90 days as agents gain proficiency.[5]
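The three adjustments combine multiplicatively. This is a minimal sketch, assuming the learning-curve multiplier falls linearly to 1.0 over the ramp period. The linear shape is an assumption. The defaults (1.5× and 75 days) are picked from inside the 1.3-1.8× and 60-90 day ranges above, and the analogue AHT and complexity values are hypothetical.

```python
def expected_aht(day: int, analogue_aht: float, complexity: float,
                 initial_multiplier: float = 1.5, ramp_days: int = 75) -> float:
    """AHT (minutes) on a given day after launch.

    Assumes the learning multiplier declines linearly from `initial_multiplier`
    on day 0 to 1.0 at `ramp_days`, then stays at steady state.
    """
    progress = min(day / ramp_days, 1.0)
    learning = initial_multiplier + (1.0 - initial_multiplier) * progress
    return analogue_aht * complexity * learning


# Hypothetical: analogue queues average 7 minutes; new LOB is moderately more complex.
for day in (0, 30, 60, 90):
    print(day, round(expected_aht(day, analogue_aht=7.0, complexity=1.3), 2))
```

Running the function at both ends of the complexity and learning-curve ranges gives an AHT band rather than a single trajectory.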
Technology ROI
Decompose into measurable components:
- Handle time reduction (seconds saved per interaction × interactions per year × cost per second)
- Volume deflection (percentage of contacts automated × current volume × cost per contact)
- Quality improvement (reduction in repeat-contact rate × annual contact volume × cost per contact)
- Agent satisfaction impact (reduction in annual agent departures × replacement cost per agent)
Each component gets its own 90% CI. Combine them using Monte Carlo simulation: draw 10,000 random values from each input's distribution, compute each component's product, add the components together for every draw, and plot the resulting distribution of total ROI. This produces an honest ROI range rather than a single number built on stacked best-case assumptions.
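This is a minimal Monte Carlo sketch of that procedure for two of the four components (handle time reduction and volume deflection). The other two would be added the same way. It assumes each input is log-normal with its stated 90% interval as the 5th and 95th percentiles, and that inputs are independent apart from a shared draw of annual volume. Every range shown is hypothetical.

```python
from __future__ import annotations

import math
import random
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # about 1.645; a 90% interval spans mu +/- Z90 * sigma


def draw_from_ci(rng: random.Random, low: float, high: float) -> float:
    """One draw from a log-normal whose 5th and 95th percentiles are low and high."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return rng.lognormvariate(mu, sigma)


def simulate_annual_benefit(n: int = 10_000, seed: int = 42) -> list[float]:
    """Sorted Monte Carlo draws of annual benefit in dollars."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n):
        # Hypothetical 90% intervals for each input.
        volume = draw_from_ci(rng, 1.8e6, 2.2e6)          # interactions per year
        seconds_saved = draw_from_ci(rng, 10, 45)          # per interaction
        cost_per_second = draw_from_ci(rng, 0.012, 0.018)  # dollars
        deflection = draw_from_ci(rng, 0.05, 0.20)         # fraction automated
        cost_per_contact = draw_from_ci(rng, 4.0, 7.0)     # dollars
        handle_time = seconds_saved * volume * cost_per_second
        deflected = deflection * volume * cost_per_contact
        # Components are added; each is a product of its own uncertain inputs.
        totals.append(handle_time + deflected)
    return sorted(totals)


benefits = simulate_annual_benefit()
p5, p50, p95 = (benefits[int(q * len(benefits))] for q in (0.05, 0.50, 0.95))
print(f"90% interval: ${p5:,.0f} to ${p95:,.0f} (median ${p50:,.0f})")
```

Subtracting an uncertain cost draw inside the loop would turn this benefit distribution into an ROI distribution.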
Attrition Impact of Schedule Changes
Natural experiments: If you've changed schedules before (even for a subset of agents), use the before/after attrition data. If not, look for natural variation — sites or teams with different schedule policies and different attrition rates.
Proxy measures: Survey agents about schedule satisfaction before the change. Correlate historical schedule satisfaction scores with subsequent attrition rates. Use the correlation to project the impact.
Decomposition: Attrition impact = (number of agents who strongly dislike the new schedule) × (probability that dissatisfaction leads to departure within 12 months). Survey data provides the first term. Historical voluntary attrition rates among dissatisfied agents provide the second.
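With ranges for both terms, the bounds of the product follow from simple interval arithmetic, because both factors are non-negative. This sketch assumes a known headcount and expresses the survey result as a share of agents. The headcount and ranges are hypothetical. The resulting bounds are deliberately conservative: they are wider than a 90% interval, because both factors are unlikely to sit at their extremes together.

```python
from __future__ import annotations


def departures_range(headcount: int, dislike_share: tuple[float, float],
                     p_leave_if_dislike: tuple[float, float]) -> tuple[float, float]:
    """Low/high projected departures among dissatisfied agents within 12 months.

    For non-negative factors, the product's minimum is low * low and its
    maximum is high * high.
    """
    low = headcount * dislike_share[0] * p_leave_if_dislike[0]
    high = headcount * dislike_share[1] * p_leave_if_dislike[1]
    return low, high


# Hypothetical: 400 agents; survey says 15-25% strongly dislike the new rotation;
# historically 10-30% of strongly dissatisfied agents leave within 12 months.
low, high = departures_range(400, (0.15, 0.25), (0.10, 0.30))
print(f"{low:.0f} to {high:.0f} departures")
```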
AI Containment Rate for New Use Case
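Pilot testing: Run the AI system against 100-500 randomly sampled historical contacts of the relevant type. Score each as contained or escalated. This is the most direct measurement available, and even these modest sample sizes give a usefully narrow confidence interval. For an observed containment rate near 30%, 100 contacts give roughly ±9 percentage points at 95% confidence, and 500 contacts give roughly ±4.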
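The pilot result can be turned into a confidence interval with a standard binomial interval. This sketch uses the Wilson score interval, which is one reasonable choice for proportions; the document does not prescribe a method. The pilot counts are hypothetical.

```python
from __future__ import annotations

import math
from statistics import NormalDist


def wilson_interval(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a proportion such as a containment rate."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half


# Hypothetical pilot: 30 of 100 sampled contacts contained without escalation.
low, high = wilson_interval(30, 100)
print(f"containment 95% CI: {low:.1%} to {high:.1%}")
```

For 30 of 100, this gives roughly 21.9% to 39.6%. The same observed rate on 500 contacts narrows the interval to about ±4 points.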
Analogue analysis: What containment rates have similar AI deployments achieved in comparable use cases? Published vendor data, industry analyst reports, and peer organization experiences provide ranges. Typical first-deployment containment rates for conversational AI range from 15-40% for complex product support to 50-80% for simple transactional inquiries.[6]
The Value of Information
One of Hubbard's most powerful contributions: before investing in measurement, calculate whether the measurement is worth its cost.
Expected Value of Perfect Information (EVPI)
EVPI is the maximum you should pay for perfectly accurate information. The calculation:[1]
- Identify the decision that depends on the measurement
- Determine what you would decide under each possible outcome
- Calculate the cost of being wrong for each scenario
- Weight by the probability of each scenario
- The expected cost of being wrong, given the best decision you could make today, is the EVPI
WFM example: You're deciding whether to hire 20 or 40 agents for a new queue. If volume is low (40% probability), 20 agents is right and hiring 40 wastes $600K in unnecessary salary. If volume is high (60% probability), 40 agents is right and hiring only 20 causes $900K in customer impact and overtime. The expected cost of being wrong is (0.40 × $600K) + (0.60 × $0) = $240K if you hire 40, or (0.40 × $0) + (0.60 × $900K) = $540K if you hire 20. Without further information you would hire 40. The EVPI (the value of knowing the answer in advance) is therefore the expected cost of that best decision: $240K. Perfect information would eliminate that expected loss, so no study of volume is worth more than $240K.
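If EVPI is $50K, spending $30K on a pilot study can be worthwhile. If EVPI is $5K, spending $30K on research destroys value. Because no pilot delivers perfect information, EVPI is an upper bound on what any real study is worth.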
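The hiring example can be checked with a short calculation. This is a minimal sketch that takes a table of opportunity losses (the cost of each decision relative to the best decision in each scenario) and returns the best decision under current information together with its expected loss, which is the EVPI. The function name and data layout are illustrative.

```python
from __future__ import annotations


def evpi(probabilities: dict[str, float],
         loss: dict[str, dict[str, float]]) -> tuple[str, float]:
    """Return (best decision without more information, EVPI).

    loss[decision][scenario] is the opportunity loss: 0 when the decision is
    right for that scenario, otherwise the cost of having chosen it.
    """
    expected_loss = {
        decision: sum(probabilities[s] * cost for s, cost in by_scenario.items())
        for decision, by_scenario in loss.items()
    }
    best = min(expected_loss, key=expected_loss.get)
    # With perfect information the loss is always 0, so EVPI equals the
    # expected opportunity loss of the best decision available today.
    return best, expected_loss[best]


probabilities = {"low": 0.40, "high": 0.60}
loss = {
    "hire_40": {"low": 600_000, "high": 0},
    "hire_20": {"low": 0, "high": 900_000},
}
decision, value = evpi(probabilities, loss)
print(decision, round(value))
```

Adding scenarios (for example a "medium" volume case) only requires extending both dictionaries.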
When to Research vs. When to Decide
Apply the EVPI framework pragmatically:
- Low EVPI: The decision doesn't change much regardless of what you learn. Decide now and adjust later.
- High EVPI, cheap information available: Run a pilot, pull benchmarks, survey experts. The information costs less than the EVPI.
- High EVPI, expensive information required: Consider whether a phased approach (decide now, build in flexibility, adjust as data arrives) is cheaper than the research.
Most WFM estimation problems fall into the third category. The information is worth something, but formal research takes too long. The answer: make the best estimate you can now, build flexibility into the plan (overtime capacity, outsource agreements, cross-training), and iterate aggressively as real data arrives.
Fermi Estimation for WFM
Enrico Fermi was famous for his ability to estimate quantities from first principles by decomposing them into components — the classic example being "how many piano tuners are there in Chicago?"[7] The same decomposition principle applies directly to WFM estimation.
The Method
- Identify what you need to estimate
- Break it into factors you can estimate (or bound)
- Estimate each factor independently
- Multiply (or add) the factors
- Check the result against sanity bounds
WFM Fermi Example: Contact Volume for a New Mobile App
"We're launching a mobile app. How many support contacts will it generate?"
Decomposition:
- Expected app downloads in first 90 days: 200,000-500,000 (from Marketing projections)
- Fraction of downloaders who actively use the app: 30-50% (industry benchmark for utility apps)
- Contact rate per active user per month: 0.01-0.05 (based on existing digital channel contact rates)
- Monthly contacts: 200,000 × 0.30 × 0.01 = 600 (low) to 500,000 × 0.50 × 0.05 = 12,500 (high)
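- 90% CI: roughly 1,000-8,000 contacts per month. This is narrower than the 600-12,500 product of extremes, because all three factors are unlikely to land at the same end of their ranges at once.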
That range is wide but actionable. It tells you this is a 5-40 FTE problem, not a 1-FTE or 200-FTE problem. You can staff for the midpoint with contingency plans for the extremes.
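One way to get from the factor ranges to a 90% interval is to treat each range as a 90% interval of an independent log-normal. Under that assumption the product is also log-normal and can be computed exactly. The document does not say how its 1,000-8,000 range was derived, so this is an illustrative reconstruction, not the author's method.

```python
from __future__ import annotations

import math
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # about 1.645


def lognormal_params(low: float, high: float) -> tuple[float, float]:
    """(mu, sigma) of a log-normal whose 5th/95th percentiles are low/high."""
    return (math.log(low) + math.log(high)) / 2, (math.log(high) - math.log(low)) / (2 * Z90)


def product_interval(ranges: list[tuple[float, float]]) -> tuple[float, float, float]:
    """(5th percentile, median, 95th percentile) of a product of independent log-normals.

    Logs of independent log-normals add, so the product is log-normal with
    mu = sum of mus and sigma**2 = sum of sigma**2.
    """
    params = [lognormal_params(lo, hi) for lo, hi in ranges]
    mu = sum(m for m, _ in params)
    sigma = math.sqrt(sum(s**2 for _, s in params))
    return math.exp(mu - Z90 * sigma), math.exp(mu), math.exp(mu + Z90 * sigma)


low, median, high = product_interval([
    (200_000, 500_000),  # app downloads in the first 90 days
    (0.30, 0.50),        # fraction of downloaders who are active
    (0.01, 0.05),        # contacts per active user per month
])
print(f"monthly contacts: {low:,.0f} to {high:,.0f} (median {median:,.0f})")
```

This gives roughly 1,050 to 7,160 contacts per month with a median near 2,740, close to the hand-rounded range in the example. Because the distribution is right-skewed, the median sits well below the arithmetic midpoint of the range.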
Why Decomposition Beats Direct Estimation
Psychological research on judgment under uncertainty shows that people estimate compound quantities poorly but estimate individual components reasonably well.[3] Errors in component estimates tend to partially cancel rather than compound, because overestimates on some factors are offset by underestimates on others. The net result is that decomposed estimates tend to have lower total error than direct estimates of the same quantity.
For WFM capacity planners, this means: never estimate total FTE directly. Always decompose it into volume, AHT, shrinkage, occupancy, and the service level requirement. Estimate each component, propagate the uncertainty, and let the math produce the FTE range.
Connection to Probabilistic Methods
Hubbard's framework is fundamentally Bayesian. The initial calibrated estimate is a prior distribution. New data updates it to a posterior distribution. The iterative process described in Step 7 above is informal Bayesian updating.
Formally connecting these approaches:
- Calibrated confidence intervals become prior distributions in a Bayesian model. A 90% CI of 4-12 minutes for AHT can be modeled as a log-normal distribution with parameters chosen to match the stated quantiles.
- Decomposition corresponds to a probabilistic graphical model where volume, AHT, and shrinkage are random variables with their own distributions, and FTE is a derived quantity.
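- Monte Carlo simulation (drawing random samples from each component's distribution and computing the compound result) produces the full distribution of the quantity of interest without requiring closed-form mathematics. When the component distributions are posteriors updated with data, the result is the posterior distribution of that quantity.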
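For the first mapping, treat the endpoints of a stated 90% interval [L, U] as the 5th and 95th percentiles of a log-normal. The parameters follow directly, as worked below for the 4-12 minute AHT interval. The mean exceeds the median, reflecting the right skew that a symmetric "midpoint" estimate would miss.

```latex
\mu = \frac{\ln L + \ln U}{2}, \qquad
\sigma = \frac{\ln U - \ln L}{2\, z_{0.95}}, \qquad z_{0.95} \approx 1.645

\text{AHT example } (L = 4,\ U = 12):\quad
\mu = \ln\sqrt{48} \approx 1.936, \quad
\sigma = \frac{\ln 3}{3.290} \approx 0.334

\text{median} = e^{\mu} = \sqrt{48} \approx 6.93 \text{ min}, \qquad
\text{mean} = e^{\mu + \sigma^2/2} \approx 7.33 \text{ min}
```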
For a deeper treatment of these methods, see Bayesian Methods for Workforce Forecasting, Probabilistic Forecasting, and Ensemble Forecasting Methods for WFM.
The practical takeaway: Hubbard's approach is the entry point. When you need more rigor — larger teams, higher stakes, recurring estimation problems — the natural progression is into formal probabilistic forecasting with explicit Bayesian models. But the calibrated estimation framework provides 80% of the value with 20% of the mathematical machinery.
Maturity Model Position
In the WFM Maturity Model, applied measurement and estimation capabilities map as follows:
- Level 1 (Reactive): No structured estimation. New initiatives use "best guess" numbers from whoever speaks loudest.
- Level 2 (Developing): Basic analogue-based estimation. WFM team pulls data from similar past events but uses point estimates without ranges.
- Level 3 (Defined): Structured estimation process with decomposition, documented analogues, and stated confidence intervals. Calibration training for key estimators.
- Level 4 (Advanced): Formal probabilistic estimation with Monte Carlo simulation, EVPI calculations, and Bayesian updating as data arrives.
- Level 5 (Optimized): Estimation is embedded in organizational decision-making. Calibration is tracked over time. Estimation accuracy is measured and improved. Decomposition models are reused and refined across initiatives.
Most WFM organizations operate at Level 1-2. Moving to Level 3 requires no new technology — only training and process discipline. The return on that investment is substantial: better staffing decisions, more honest business cases, and fewer "surprised by reality" moments when new initiatives go live.
See Also
- Bayesian Methods for Workforce Forecasting
- Probabilistic Forecasting
- Forecasting Methods
- Judgmental Forecasting
- Ensemble Forecasting Methods for WFM
- Scenario Planning and Contingency Staffing
- Workforce Planning Templates and Frameworks
References
- [1] Hubbard, Douglas W. How to Measure Anything: Finding the Value of Intangibles in Business, 3rd edition. John Wiley & Sons, 2014. ISBN 978-1-118-53927-9.
- [2] Lichtenstein, Sarah; Fischhoff, Baruch; Phillips, Lawrence D. "Calibration of Probabilities: The State of the Art to 1980." In Kahneman, Daniel; Slovic, Paul; Tversky, Amos (eds.), Judgment Under Uncertainty: Heuristics and Biases. Cambridge University Press, 1982. pp. 306–334. ISBN 978-0-521-28414-1.
- [3] Kahneman, Daniel. Thinking, Fast and Slow. Farrar, Straus and Giroux, 2011. ISBN 978-0-374-27563-1.
- [4] Data & Marketing Association. Response Rate Report 2023. DMA/ANA, 2023.
- [5] Gans, Noah; Koole, Ger; Mandelbaum, Avishai. "Telephone Call Centers: Tutorial, Review, and Research Prospects." Manufacturing & Service Operations Management, vol. 5, no. 2, 2003, pp. 79–141. doi:10.1287/msom.5.2.79.16071.
- [6] ContactBabel. The Inner Circle Guide to AI, Chatbots & Machine Learning, 9th edition. ContactBabel, 2024.
- [7] Weinstein, Lawrence; Adam, John A. Guesstimation: Solving the World's Problems on the Back of a Cocktail Napkin. Princeton University Press, 2008. ISBN 978-0-691-12949-5.
