Erlang Sensitivity and the Staffing Cliff
Erlang Sensitivity and the Staffing Cliff describes the phenomenon where the relationship between staffing and Service Level in the Erlang C model is highly nonlinear near the tipping point, such that adding or removing 2–3 agents can swing service level by 15–20+ percentage points. This "cliff" in the Erlang curve has profound implications for capacity planning, budget negotiations, and operational risk management.
Overview

The Erlang C function maps agent count to service level for a given offered load. The relationship is not linear — it's an S-curve with a steep transition zone. Below the cliff, adding agents produces massive SL improvement. Above it, adding agents produces diminishing returns. And at the cliff itself, tiny changes in staffing produce wild swings in SL.
This matters because:
- A Finance request to "just cut 3 agents" may push the operation off the cliff
- Small shrinkage variance (a few extra sick calls) can collapse SL if you're operating near the cliff
- Forecast errors that would be harmless in the flat region become catastrophic near the cliff
- The daily aggregate SL can look acceptable while half the intervals are in freefall
Understanding where you sit on the Erlang curve — and how close to the cliff — is one of the most operationally important pieces of knowledge in WFM.
The Cliff: A Worked Example
Consider a queue with 130 Erlangs of offered load (e.g., 1,300 calls/hour at 6-minute AHT). Target SL: 80/20.
| Agents | Occupancy | SL (80/20) | ASA (sec) | Abandon (est.) | SL Change per Agent Added |
|---|---|---|---|---|---|
| 133 | 97.7% | 7% | 245 | 22% | — |
| 135 | 96.3% | 16% | 152 | 16% | +4.5 pts/agent |
| 137 | 94.9% | 30% | 88 | 10% | +7.0 pts/agent |
| 139 | 93.5% | 46% | 52 | 7% | +8.0 pts/agent |
| 141 | 92.2% | 59% | 33 | 5% | +6.5 pts/agent |
| 143 | 90.9% | 70% | 21 | 3.5% | +5.5 pts/agent |
| 145 | 89.7% | 78% | 14 | 2.5% | +4.0 pts/agent |
| 147 | 88.4% | 84% | 10 | 1.8% | +3.0 pts/agent |
| 149 | 87.2% | 89% | 7 | 1.2% | +2.5 pts/agent |
| 152 | 85.5% | 93% | 4 | 0.7% | +1.3 pts/agent |
| 155 | 83.9% | 96% | 2 | 0.3% | +1.0 pts/agent |
| 160 | 81.3% | 98% | 1 | 0.1% | +0.4 pts/agent |
The cliff zone for this queue is approximately 137–145 agents. Within this range:
- Going from 137 to 139 agents (adding 2): SL jumps from 30% to 46% — a 16-point swing from 2 agents
- Going from 139 to 141 (adding 2): SL jumps from 46% to 59% — 13 points from 2 agents
- Going from 141 to 143 (adding 2): SL jumps from 59% to 70% — 11 points from 2 agents
Meanwhile, above the cliff:
- Going from 149 to 152 (adding 3): SL goes from 89% to 93% — 4 points from 3 agents
- Going from 152 to 155 (adding 3): SL goes from 93% to 96% — 3 points from 3 agents
Per agent added, the SL impact in the cliff zone (5.5–8 points) is roughly four to eight times the impact above it (1.0–1.3 points). This is why treating the Erlang curve as linear is dangerous: it creates the illusion that staffing changes have moderate, predictable effects, when in reality the effect depends entirely on where you are on the curve.
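The per-agent slope can be explored with a minimal Erlang C sketch. The function names are illustrative, and the model assumes Poisson arrivals, exponential handle times, and no abandonment, so the figures it prints need not match the table above. What it shows is how the gain per added agent changes along the curve.

```python
import math

def erlang_c(agents: int, load: float) -> float:
    """Probability that a contact has to wait (Erlang C)."""
    if agents <= load:
        return 1.0  # unstable: every contact waits
    b = 1.0  # Erlang B, built up with the standard recursion
    for k in range(1, agents + 1):
        b = load * b / (k + load * b)
    rho = load / agents
    return b / (1.0 - rho * (1.0 - b))

def service_level(agents: int, load: float, aht_sec: float, threshold_sec: float) -> float:
    """Fraction of contacts answered within threshold_sec."""
    if agents <= load:
        return 0.0
    p_wait = erlang_c(agents, load)
    return 1.0 - p_wait * math.exp(-(agents - load) * threshold_sec / aht_sec)

LOAD = 1300 * 360 / 3600  # 1,300 calls/hour at 6-minute AHT = 130 Erlangs

prev = None
for n in range(133, 161, 2):
    sl = service_level(n, LOAD, aht_sec=360, threshold_sec=20)
    gain = "" if prev is None else f"  ({(sl - prev) / 2 * 100:+.1f} pts/agent)"
    print(f"{n} agents: SL {sl * 100:5.1f}%{gain}")
    prev = sl
```

Changing `LOAD`, the handle time, or the threshold shows how the steep region moves with the queue's parameters.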
Why the Cliff Exists
The mathematical explanation: in a queuing system, wait times scale roughly with ρ/(1−ρ), where ρ = a/n (offered load / agents). The relationship is exact for a single-server queue, and the same blow-up appears in the multi-server Erlang C wait formula through its 1/(n−a) term. As ρ approaches 1, this ratio explodes hyperbolically. Near the cliff, you're in the steep part of the hyperbola: each agent removed pushes ρ closer to 1, and the wait-time response is disproportionate.
Intuitively: when you have "just enough" agents, removing one doesn't just make things slightly worse — it creates a persistent queue that never clears, because the remaining agents can't work fast enough to catch up with arrivals during any sustained burst. The queue grows, waits compound, and the system enters a congestion spiral.
Above the cliff, there's enough surplus capacity that temporary bursts create temporary queues that resolve quickly. Below the cliff, every burst creates a queue that persists and compounds.
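A quick numeric check of the ρ/(1−ρ) term for the 130-Erlang example. This is only the congestion factor named above, not a full wait-time model, and the helper name is illustrative.

```python
def congestion_factor(load: float, agents: int) -> float:
    """rho / (1 - rho) with rho = load / agents; equals load / (agents - load)."""
    rho = load / agents
    if rho >= 1.0:
        return float("inf")  # no steady state: the queue grows without bound
    return rho / (1.0 - rho)

for n in (160, 150, 145, 140, 135, 133, 131):
    print(f"{n} agents: rho = {130 / n:.3f}, rho/(1-rho) = {congestion_factor(130, n):.1f}")
```

Because the factor equals a/(n−a), removing an agent when only a few agents of surplus remain raises it far more than removing one when the surplus is large.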
Implication 1: "Just Cut 5 Agents"
When Finance says "cut 5 agents," the impact depends entirely on where you currently sit:
Scenario A: Safely above the cliff (155 agents, SL = 96%). Cutting 5 agents → 150 agents, SL ≈ 90%. Manageable. SL dropped 6 points. The operation is still healthy.
Scenario B: On the cliff (145 agents, SL = 78%). Cutting 5 agents → 140 agents, SL ≈ 52% (midway between the 139- and 141-agent rows of the table). Catastrophic. SL dropped 26 points. The operation is in crisis: occupancy trap territory, abandonment spiking, agents burning out.
Same 5-agent cut. Radically different outcomes. WFM's job is to know which scenario applies and communicate it before the cut happens.
The conversation: "You're asking to cut 5 agents. Here's the Erlang curve for our queue. We're currently at 145 agents, right on the cliff. Cutting 5 puts us at roughly 50% SL. If we were at 155, I'd say yes. At 145, the answer is no unless you're prepared for roughly 50% SL and everything that comes with it."
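One way to prepare for that conversation is to read the impact of a proposed cut straight off the curve. The sketch below linearly interpolates the worked-example table. The helper names are illustrative, and interpolating linearly between table rows is an assumption.

```python
from bisect import bisect_right

# (agents, SL %) pairs taken from the worked-example table
CURVE = [(133, 7), (135, 16), (137, 30), (139, 46), (141, 59), (143, 70),
         (145, 78), (147, 84), (149, 89), (152, 93), (155, 96), (160, 98)]

def sl_at(agents: float) -> float:
    """Linearly interpolate SL (%) between table rows; clamp outside the table."""
    xs = [a for a, _ in CURVE]
    if agents <= xs[0]:
        return float(CURVE[0][1])
    if agents >= xs[-1]:
        return float(CURVE[-1][1])
    i = bisect_right(xs, agents)
    (x0, y0), (x1, y1) = CURVE[i - 1], CURVE[i]
    return y0 + (y1 - y0) * (agents - x0) / (x1 - x0)

def cut_impact(current: int, cut: int) -> float:
    """SL points lost by removing `cut` agents from `current`."""
    return sl_at(current) - sl_at(current - cut)

for start in (155, 145):
    print(f"{start} -> {start - 5} agents: SL {sl_at(start):.0f}% -> "
          f"{sl_at(start - 5):.1f}% (loses {cut_impact(start, 5):.1f} points)")
```

The same lookup can be run for any proposed cut size before the request reaches a budget meeting.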
Implication 2: Forecast Error Amplification
A ±5% forecast error is normal and manageable — except near the cliff. With 130 Erlangs offered load:
- Staffed at 145 agents (target: SL = 78%)
- Actual load comes in 5% high: 136.5 Erlangs
- Effective agents needed for 78% SL at 136.5 Erlangs: ~152
- Actual agents: 145
- Realized SL: approximately 55% — a 23-point miss from a 5% forecast error
The same 5% forecast error at 155 agents (SL = 96%):
- Effective agents needed for 96% at 136.5 Erlangs: ~161
- Actual agents: 155
- Realized SL: approximately 89% — a 7-point miss from the same 5% error
Near the cliff, forecast errors are amplified 3–4x. This is why Probabilistic Planning — planning to a forecast distribution rather than a point forecast — matters more for operations near the cliff. And it's why staffing to the 70th or 80th percentile of forecast rather than the mean is critical during cliff-adjacent intervals.
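A minimal sketch of staffing to a forecast percentile rather than the mean. It assumes forecast error is normally distributed with a 5% standard deviation, which is an illustrative choice and not something the example above specifies. It also uses a textbook Erlang C model without abandonment, so its agent counts need not match the worked-example table. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

def service_level(agents: int, load: float, aht_sec: float = 360, threshold_sec: float = 20) -> float:
    """Erlang C service level (no abandonment); 0.0 when agents <= load."""
    if agents <= load:
        return 0.0
    b = 1.0  # Erlang B via the standard recursion
    for k in range(1, agents + 1):
        b = load * b / (k + load * b)
    p_wait = b / (1.0 - (load / agents) * (1.0 - b))
    return 1.0 - p_wait * math.exp(-(agents - load) * threshold_sec / aht_sec)

def agents_for(load: float, target: float = 0.80) -> int:
    """Smallest agent count whose Erlang C service level meets the target."""
    n = math.floor(load) + 1
    while service_level(n, load) < target:
        n += 1
    return n

# Assumed forecast distribution: mean 130 Erlangs, 5% standard deviation
forecast = NormalDist(mu=130.0, sigma=0.05 * 130.0)
for pct in (0.50, 0.70, 0.80):
    load = forecast.inv_cdf(pct)
    print(f"P{round(pct * 100)} load {load:6.1f} Erlangs -> {agents_for(load)} agents")
```

The difference between the P50 and P80 agent counts is the buffer that percentile staffing buys for cliff-adjacent intervals.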
Implication 3: Shrinkage Variance
You planned for 32% shrinkage and staffed accordingly. But shrinkage on any given day varies — some days it's 28%, some days it's 38%. If you're near the cliff, the high-shrinkage days are devastating.
Example: planned 145 agents available (SL target 78%). Gross scheduled: 145 / (1 - 0.32) = 213 agents.
- Normal day (32% shrinkage): 213 × 0.68 = 145 available → SL = 78% ✓
- Good day (28% shrinkage): 213 × 0.72 = 153 available → SL ≈ 94% (great)
- Bad day (38% shrinkage): 213 × 0.62 = 132 available → SL = 5% (collapse)
The gap between 78% SL and 5% SL here is 13 agents, about six extra points of shrinkage on 213 scheduled, and per the table even the first four absences take SL from 78% to 59%. This is not an edge case: shrinkage variance of ±5 points is routine. Flu season, school holidays, Monday mornings: shrinkage spikes are predictable in aggregate and variable in specifics.
The lesson: if you're operating near the cliff, you need shrinkage buffer or standby capacity. The cost of that buffer is much less than the cost of cliff events.
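The shrinkage arithmetic in this example can be sketched as follows. The function names are illustrative, and rounding to the nearest whole agent is chosen to match the figures above.

```python
def gross_scheduled(available_needed: int, planned_shrinkage: float) -> int:
    """Heads to schedule so that `available_needed` remain after planned shrinkage."""
    return round(available_needed / (1.0 - planned_shrinkage))

def available(scheduled: int, actual_shrinkage: float) -> int:
    """Agents actually available once the day's real shrinkage is applied."""
    return round(scheduled * (1.0 - actual_shrinkage))

scheduled = gross_scheduled(145, 0.32)
for label, shrink in (("good", 0.28), ("normal", 0.32), ("bad", 0.38)):
    n = available(scheduled, shrink)
    print(f"{label:>6} day ({shrink:.0%} shrinkage): {n} available, {n - 145:+d} vs plan")
```

Feeding the available counts into the SL curve for the queue turns a shrinkage distribution into an SL distribution, which is what sizing a buffer requires.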
Implication 4: The Interval Problem
Daily SL of 80% can mask interval-level disaster. In a typical day:
- Morning ramp (07:00–09:00): understaffed, on or below the cliff
- Mid-morning (09:00–11:30): fully staffed, well above the cliff
- Lunch (11:30–13:00): understaffed, near the cliff
- Afternoon (13:00–16:00): adequately staffed, above the cliff
- Late afternoon (16:00–18:00): trailing off, near the cliff
The daily average SL might be 80% because the good intervals compensate for the bad ones mathematically. But the customer experience is bimodal: callers in the understaffed intervals experience 20–40% SL while callers in the well-staffed intervals experience 95%+. The average is a fiction that no one actually lives.
This is why interval-level SL reporting matters, and why interval-level staffing decisions (scheduling the right number of agents per 15-minute or 30-minute interval) are where the cliff analysis has its real impact. The WFM Labs Risk Score™ explicitly accounts for interval-level cliff proximity.
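A small sketch of how a volume-weighted daily SL can hide interval misses. The interval volumes and SL values are made-up illustrative data, not measurements, and the function names are hypothetical.

```python
# Hypothetical interval data: (label, contacts offered, interval SL as a fraction)
intervals = [
    ("07:00-09:00", 300, 0.40),
    ("09:00-11:30", 1000, 0.96),
    ("11:30-13:00", 450, 0.60),
    ("13:00-16:00", 1100, 0.94),
    ("16:00-18:00", 400, 0.72),
]

def daily_sl(rows):
    """Volume-weighted SL: contacts answered in threshold / contacts offered."""
    offered = sum(vol for _, vol, _ in rows)
    within = sum(vol * sl for _, vol, sl in rows)
    return within / offered

def below_target(rows, target=0.80):
    """Labels of intervals whose own SL misses the target."""
    return [label for label, _, sl in rows if sl < target]

print(f"Daily SL: {daily_sl(intervals):.1%}")
print("Intervals below target:", below_target(intervals))
```

With these made-up numbers the daily figure clears 80% even though three of the five blocks miss it.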
The Pooling Effect: Larger Centers Have Gentler Cliffs
The square-root staffing law (associated with the Halfin-Whitt, or quality-and-efficiency-driven, regime) shows that the number of agents needed above the offered load scales with the square root of the load:
- Agents needed ≈ a + c × √a
where a is offered load and c is a constant determined by the SL target.
This means:
- A 50-Erlang queue needs ~50 + c×7.1 agents → the "buffer" (c×7.1) is a large fraction of total
- A 500-Erlang queue needs ~500 + c×22.4 agents → the "buffer" is a smaller fraction of total
In percentage terms, the larger queue needs a smaller proportional buffer, which means the cliff is gentler relative to total staffing:
| Offered Load | Agents for 80/20 | Buffer Above Load | Buffer as % of Total | SL Impact of Removing 1 Agent |
|---|---|---|---|---|
| 30 Erlangs | 39 | 9 | 23.1% | ~3–4 points |
| 130 Erlangs | 148 | 18 | 12.2% | ~2–3 points |
| 500 Erlangs | 534 | 34 | 6.4% | ~1 point |
| 2000 Erlangs | 2068 | 68 | 3.3% | ~0.5 points |
At 2,000 Erlangs, removing a single agent barely registers. At 30 Erlangs, removing a single agent can drop SL by 3–4 points. This is the pooling effect — and it's why large centers are more resilient to staffing variation and why small centers must be more conservative in their planning.
The practical implication: if you're running a 50-agent center, you're always near the cliff. Your planning must account for shrinkage variance, forecast error, and demand spikes more aggressively than a 500-agent center. This is also the economic argument for virtual pooling across sites — combining two 50-agent centers into one 100-agent virtual queue reduces cliff risk for both.
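The square-root rule is easy to sketch. The value c = 1.5 below is an assumption chosen for illustration; with that value and rounding up, the rule gives the agent counts shown in the table above. The pooling comparison uses two 50-Erlang queues as a hypothetical example.

```python
import math

def sqrt_staffing(load: float, c: float) -> int:
    """Agents suggested by the square-root rule: load + c * sqrt(load), rounded up."""
    return math.ceil(load + c * math.sqrt(load))

C = 1.5  # assumed safety factor; in practice it is set by the SL target

for load in (30, 130, 500, 2000):
    agents = sqrt_staffing(load, C)
    buffer = agents - load
    print(f"{load:>5} Erlangs: {agents} agents, buffer {buffer} ({buffer / agents:.1%} of total)")

# Pooling: two separate 50-Erlang queues versus one pooled 100-Erlang queue
separate = 2 * sqrt_staffing(50, C)
pooled = sqrt_staffing(100, C)
print(f"separate: {separate} agents, pooled: {pooled} agents")
```

The pooled queue needs fewer agents for the same rule because the buffer grows with the square root of the load, not with the load itself.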
Connecting to Risk Management
The staffing cliff is fundamentally a risk management problem. The WFM Labs Risk Score™ framework incorporates cliff proximity as a risk factor:
- How close to the cliff are we in each interval? — measured as the difference between scheduled staffing and cliff-onset staffing
- What's the forecast error distribution for this interval? — wider distributions mean higher probability of landing on the cliff
- What's the shrinkage variance for this day/time? — higher variance increases cliff probability
- What's the cost of a cliff event? — abandonment, SL miss penalties, customer impact
The resulting risk score identifies the intervals where the operation is most vulnerable and where buffer investment has the highest return. Probabilistic Planning extends this by planning to a distribution of outcomes rather than a single point estimate.
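The first factor, cliff proximity, is the simplest to sketch. The snippet below is not the WFM Labs Risk Score™, whose scoring is not described here. It only computes the gap between scheduled staffing and an assumed cliff-onset staffing per interval and flags small gaps. All names, the sample data, and the 3-agent threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    label: str
    scheduled: int    # agents scheduled and expected to be available
    cliff_onset: int  # staffing at which SL starts to collapse (assumed known)

def cliff_margin(iv: Interval) -> int:
    """Agents of headroom between scheduled staffing and cliff onset."""
    return iv.scheduled - iv.cliff_onset

def at_risk(intervals, min_margin: int = 3):
    """Labels of intervals whose headroom is min_margin agents or fewer."""
    return [iv.label for iv in intervals if cliff_margin(iv) <= min_margin]

# Hypothetical sample day
day = [
    Interval("08:00", 146, 145),
    Interval("10:00", 158, 145),
    Interval("12:00", 140, 138),
    Interval("14:00", 150, 142),
]
print(at_risk(day))
```

A fuller treatment would weight each margin by the forecast-error and shrinkage distributions for that interval, as the remaining factors in the list describe.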
Maturity Model Position
- Level 1 — Initial (Emerging Operations). The Erlang curve is treated as linear — "add 5 agents, get X% more SL." No awareness of the cliff. Staffing decisions made on averages. Interval-level SL not analyzed.
- Level 2 — Foundational (Traditional WFM Excellence). WFM understands the cliff conceptually ("we can't cut below N agents or SL collapses"). The Erlang curve is used for planning but cliff sensitivity not systematically analyzed. Daily SL reported; interval-level review spotty.
- Level 3 — Progressive (Breaking the Monolith). Cliff proximity calculated per interval. WFM communicates cliff risk to stakeholders: "we have 12 intervals per day within 3 agents of the cliff." Shrinkage buffer sized to account for cliff risk. Forecast error impact modeled at the interval level.
- Level 4 — Advanced (The Ecosystem Emerges). Probabilistic Planning standard — staffing to the 70th–80th percentile of forecast to manage cliff risk. WFM Labs Risk Score™ incorporates cliff proximity, forecast uncertainty, and shrinkage variance. Simulation validates Erlang-based cliff analysis for complex queues. Virtual pooling used to reduce cliff steepness.
- Level 5 — Pioneering (Enterprise-Wide Intelligence). Real-time cliff proximity monitored and managed. Demand shaping and routing adjustments triggered automatically when cliff proximity exceeds threshold. The cost of cliff events is quantified and visible in operational dashboards. Staffing decisions explicitly trade off cliff risk against cost.
See Also
- Erlang C
- Erlang-A
- Service Level
- Traffic Intensity and Server Utilization
- Occupancy
- The Occupancy Trap
- The Service Level Savings Fallacy
- The True Cost of Understaffing
- The ASA-SL-Abandon Relationship
- Average Speed of Answer (ASA)
- Abandonment
- Waiting Time Distributions
- Shrinkage
- WFM Labs Risk Score™
- Probabilistic Planning
- Staffing to Percentile vs Mean Forecast
