Online Optimization and Real-Time Decisions

From WFM Labs

Online Optimization and Real-Time Decisions addresses the fundamental challenge of making irrevocable decisions under incomplete information. In classical (offline) optimization, the problem is fully specified before solving begins. In online optimization, inputs arrive sequentially, and decisions must be made before future inputs are known. This is the daily reality of workforce management: you schedule staff before knowing actual demand, route calls before knowing what calls will arrive next, and offer VTO before knowing whether a late-afternoon spike will materialize.

Overview

Online decisions as information arrives over time

The distinction between offline and online optimization maps directly to WFM operational layers:

WFM Decision Offline Version Online Version
Scheduling Build weekly schedule knowing all demand Build schedule with forecast uncertainty; adjust intraday as actuals deviate
Routing Assign all calls to agents optimally (knowing all calls in advance) Route each call as it arrives, not knowing what comes next
Staffing adjustments Decide all VTO/overtime at once with complete information Decide each VTO offer in real time as conditions evolve
Capacity planning Hire knowing exact future demand Hire based on projections that may be wrong

Online optimization theory provides two things: (1) algorithms designed for sequential decision-making, and (2) competitive analysis — a framework for measuring how well any online algorithm can perform relative to an omniscient offline optimum.

Mathematical Foundation

Competitive Analysis

An online algorithm ALG has a competitive ratio c if for all input sequences σ:

ALG(σ)cOPT(σ)+b

where OPT(σ) is the offline optimum and b is a constant independent of the input. A competitive ratio of 2 means the online algorithm's cost is at most twice the cost of a solution computed with perfect foresight.

The competitive ratio is a worst-case measure. It asks: "how badly can an adversary exploit my algorithm's lack of information?" This pessimism is useful — it bounds the cost of uncertainty itself.

The Secretary Problem

The secretary problem is the canonical optimal stopping problem. You interview n candidates sequentially. After each interview, you must immediately hire or reject (no callbacks). The goal: hire the best candidate.

Optimal strategy: Reject the first n/e37% of candidates (the "exploration phase"), then hire the first candidate better than all those seen so far.

This strategy selects the best candidate with probability 1/e36.8% — provably optimal.

WFM parallel: when to commit to a staffing decision. A center needs to fill 5 overtime slots for next Saturday. Agents volunteer throughout the week. Should you accept the first 5 volunteers (risking that better-suited agents volunteer later)? Or wait (risking that not enough volunteer)? The secretary problem framework provides the optimal stopping rule: observe without committing for approximately 37% of the decision window, then accept the first candidate better than the best seen so far.

The Ski Rental Problem

A skier faces a choice each day: rent skis for $1/day or buy skis for $B. The skier doesn't know how many days they'll ski (the season could end at any time due to weather, injury, etc.).

Optimal deterministic strategy: Rent for B1 days, then buy on day B. This achieves a competitive ratio of 21/B2.

Optimal randomized strategy: Buy on day d where d is drawn from a specific distribution. This achieves a competitive ratio of e/(e1)1.58.

WFM parallel — the overtime vs. hire decision: paying overtime costs $r/day (rent), while hiring a new agent costs $B upfront (buy) but eliminates the overtime cost. If you knew how long the volume increase would last, the decision would be trivial. Without that knowledge, the ski-rental analysis says: use overtime for roughly B/r days, then hire.

More specifically, if hiring costs $5,000 (including training) and overtime costs $200/day, the break-even point is 25 days. The ski-rental strategy: pay overtime for 24 days, then hire on day 25. The worst case occurs when the volume increase ends on day 25 (you just hired unnecessarily) — you pay at most 2× the optimal cost.

Online Assignment and Matching

The online bipartite matching problem: agents (known in advance) must be matched to requests (arriving online). Each request must be assigned to an eligible agent immediately and irrevocably.

RANKING algorithm (Karp, Vazirani, Vazirani 1990): Randomly permute agents, assign each request to the highest-ranked eligible agent. Achieves a competitive ratio of 11/e0.632 for the maximization version — meaning it captures at least 63.2% of the optimal matching.

This is directly applicable to call routing: agents are known, calls arrive online, and each call must be assigned to an eligible (skilled) agent immediately. The RANKING result provides a baseline guarantee for any online routing algorithm.

Online Primal-Dual Methods

The primal-dual framework provides a systematic way to design and analyze online algorithms:

  1. Formulate the problem as an LP (primal)
  2. Write the dual LP
  3. Maintain a feasible dual solution that grows with each online decision
  4. Use dual variable updates to guide primal decisions

For the online set cover problem (relevant to online scheduling), the primal-dual approach achieves an O(logmlogn) competitive ratio, where m is the number of intervals and n is the number of available shifts.

The dual variables have a natural interpretation in WFM: the dual variable for a coverage constraint represents the marginal value of coverage in that interval. High dual variables indicate intervals where coverage is scarce and valuable — exactly the intervals where the algorithm should prioritize assignment.

WFM Applications

Real-Time VTO Offers

VTO (voluntary time off) decisions are online problems. At 10 AM, the system observes that actual volume is below forecast. Should it release agents?

The adversary: Volume could spike at 2 PM (making the VTO decision costly) or remain low (making the VTO decision correct).

Online approach: Model the VTO decision as an online covering problem. Define a "coverage margin" for each future interval. Release agents from intervals with the highest current margin, but maintain a buffer calibrated by the competitive ratio analysis. The buffer decreases as the day progresses (less future uncertainty), enabling more aggressive VTO offers in the afternoon.

A practical threshold: release agents only when the coverage margin exceeds ρminimum acceptable, where ρ is the competitive ratio (e.g., 1.5). This guarantees that even under worst-case demand realizations, coverage remains acceptable.

Intraday Overtime Decisions

The inverse of VTO: when should you call in overtime agents? Each hour, you decide whether to extend shifts or call additional agents. Calling too early wastes money if volume drops; calling too late means understaffing during the spike.

This is a variant of the ski-rental problem with multiple rental periods. The optimal strategy involves thresholds based on the ratio of overtime cost to understaffing cost:

  • If understaffing cost > 3× overtime cost: call overtime aggressively (low regret from false alarms)
  • If understaffing cost ≈ overtime cost: use the randomized ski-rental strategy (commit with probability increasing over time)
  • Monitor the "information value" — if additional data arriving in the next 30 minutes significantly reduces uncertainty, waiting is worth the risk

Dynamic Routing Adjustments

Real-time routing is an online matching problem. As calls arrive, the router assigns each to an available agent with the required skill. The question: should the router assign greedily (give each call the best available agent) or strategically (reserve top agents for potentially harder future calls)?

Greedy routing: Assigns the highest-priority agent for each call. Competitive ratio can degrade to O(n) in adversarial sequences — the greedy approach can be forced into terrible assignments for later calls.

Balance-based routing: Assigns calls to equalize utilization across agents. Better competitive properties because it preserves option value — agents aren't burned out when needed for future demand.

Predictive routing with hedging: Uses demand forecasts but maintains a reserve margin. If the forecast predicts a complex-call spike at 3 PM, the router conserves specialist agents starting at 2:30 PM. The online algorithm uses the forecast as a "prediction" in the learning-augmented framework (Lykouris & Vassilvitskii, 2018) — trusting the forecast when it has been historically accurate, but hedging based on the competitive ratio when forecast confidence is low.

Callback Queue Management

Callback scheduling is a pure online problem: customers request callbacks throughout the day, and the system must schedule callback times without knowing future callback request volume.

Online approach: model available callback capacity as a resource that depletes over time. Use an online packing algorithm: accept callbacks for each time slot until capacity reaches a threshold, then defer to later slots. The threshold is set using competitive analysis — too aggressive means later slots are over-packed; too conservative means early slots are underutilized.

Worked Example

Problem: A center with 80 agents experiences a volume shortfall at 11 AM. Actual volume is running 20% below forecast. The system must decide whether to offer VTO to 10 agents for the 12-5 PM window.

Online analysis:

Remaining uncertainty: 6 hours remain. Historical data shows afternoon volume deviates from AM-adjusted forecast by up to ±15%.

Worst case (adversarial): Volume returns to forecast levels at 12 PM. Releasing 10 agents creates a 10-agent shortfall for 5 hours.

Cost of worst case: 10 agents × 5 hours × estimated understaffing cost ($50/agent-hour) = $2,500.

Cost of inaction: If volume stays low, 10 agents × 6 hours × VTO savings ($25/agent-hour) = $1,500 in potential savings foregone.

Ski-rental framing:

  • "Rent" (keep paying for surplus agents) = $25/agent-hour
  • "Buy" (offer VTO, incur risk of recall failure) = potential $50/agent-hour understaffing cost if volume spikes
  • Break-even: 25t=50(6t)t=4 hours

Strategy: Wait until 1 PM (2 more hours of observation). If volume remains 15%+ below forecast, offer VTO for the 2-5 PM window (3 hours, lower risk). If volume begins recovering, hold staff.

Competitive guarantee: This strategy has competitive ratio ≤ 2 against an adversary: in the worst case, you pay at most 2× the cost of the optimal decision made with perfect foresight.

Maturity Model Position

Level Description
Level 1 (Manual) All adjustments are reactive and unstructured; supervisors make gut-feel decisions
Level 2 (Developing) Threshold-based rules (e.g., "offer VTO when 20% over-staffed"); no formal analysis of threshold quality
Level 3 (Defined) Break-even analysis for overtime/VTO decisions; basic scenario planning for intraday adjustments
Level 4 (Quantitative) Online algorithms with competitive guarantees deployed; decisions incorporate remaining uncertainty quantification
Level 5 (Optimizing) Learning-augmented online algorithms that combine forecasts with worst-case hedging; continuous competitive ratio monitoring; adaptive threshold adjustment based on observed forecast accuracy

See Also

References

  • Borodin, A. & El-Yaniv, R. (1998). Online Computation and Competitive Analysis. Cambridge University Press.
  • Karp, R.M., Vazirani, U.V., & Vazirani, V.V. (1990). "An optimal algorithm for on-line bipartite matching." Proceedings of STOC 1990.
  • Lykouris, T. & Vassilvitskii, S. (2018). "Competitive caching with machine learned advice." Proceedings of ICML 2018.
  • Kleinberg, R. & Tardos, É. (2006). Algorithm Design. Pearson.
  • Buchbinder, N. & Naor, J. (2009). "The design of competitive online algorithms via a primal-dual approach." Foundations and Trends in Theoretical Computer Science, 3(2-3), 93–263.