Machine Learning for Volume Forecasting

Machine learning (ML) methods for contact volume forecasting encompass a family of data-driven approaches — including gradient boosting, recurrent neural networks, and additive regression models — that learn predictive patterns from historical data without requiring explicit specification of a statistical model structure. These methods have seen significant adoption in large-scale forecasting applications, spurred in part by competitive benchmarks, most notably the M4 and M5 forecasting competitions, which provided evidence about when ML methods outperform classical statistical approaches and when they do not. In workforce planning, ML-based volume forecasting is most valuable when contact volumes are influenced by a rich set of external predictors, when interaction effects between variables are complex, or when the number of time series to be forecast simultaneously is large enough that manual model-building is impractical. ML methods are associated with WFM Labs Maturity Model Levels 4 and 5, though the boundary between statistical and ML methods is increasingly blurred by hybrid approaches.
Gradient Boosting Methods
Overview
Gradient boosting constructs an ensemble of decision trees, each tree learning from the residuals of the previous ensemble, iteratively reducing forecast error. XGBoost (Extreme Gradient Boosting), introduced by Chen and Guestrin (2016), and the later libraries LightGBM and CatBoost became dominant algorithms in tabular data forecasting by combining strong predictive performance with computational efficiency.[1]
Application to Contact Volume Forecasting
In contact center volume forecasting, gradient boosting is applied by converting the time-series forecasting problem into a supervised regression problem. Lagged volume values, calendar features (day of week, hour of day, week of year, holiday indicators), and external predictors (marketing send counts, website traffic, CRM pipeline activity) are constructed as features. The model learns the relationship between these features and future volume, producing a point forecast.
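A minimal end-to-end sketch of this reframing, assuming LightGBM, a daily series in a local CSV file, and illustrative column names; a fuller feature set is sketched under Implementation Considerations below.

```python
# Minimal sketch: time series reframed as supervised regression with LightGBM.
# The CSV path and 'volume' column name are illustrative assumptions.
import pandas as pd
import lightgbm as lgb

df = pd.read_csv("daily_volume.csv", index_col=0, parse_dates=True)

for lag in (1, 7, 14, 28):                     # lagged volume features
    df[f"lag_{lag}"] = df["volume"].shift(lag)
df["dow"] = df.index.dayofweek                 # day-of-week calendar feature
df = df.dropna()

X, y = df.drop(columns="volume"), df["volume"]
model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(X.iloc[:-28], y.iloc[:-28])          # hold out the last 28 days
point_forecast = model.predict(X.iloc[-28:])   # point forecast, no intervals
```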
Key advantages in this context:
- Naturally incorporates non-linear interactions between calendar effects and external drivers.
- Handles missing values without imputation.
- Scales efficiently to thousands of simultaneous time series (skill queues, channel-level forecasts).
- Supports Hierarchical Forecasting applications where forecasts must be produced at multiple aggregation levels.
Key limitations:
- Does not extrapolate outside the range of the training data (a structural property of tree-based models); structural breaks (new channels, major product changes) require retraining on post-break data.
- Feature engineering quality — particularly the construction of lag features and calendar encodings — has substantial influence on model performance.
- Produces point forecasts by default; Probabilistic Forecasting requires additional techniques (quantile regression, conformal prediction).
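One route to the probabilistic forecasts mentioned in the last limitation is LightGBM's built-in quantile objective. The sketch below trains one model per quantile on stand-in data; the feature matrix, target, and quantile levels are illustrative assumptions.

```python
# Sketch: quantile regression with LightGBM to wrap a prediction interval
# around the median forecast. Data here is synthetic stand-in material.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))               # stand-in feature matrix
y = X[:, 0] * 10 + rng.normal(size=500)     # stand-in volume target

quantile_models = {}
for q in (0.1, 0.5, 0.9):                   # lower bound, median, upper bound
    m = lgb.LGBMRegressor(objective="quantile", alpha=q, n_estimators=200)
    m.fit(X[:-50], y[:-50])
    quantile_models[q] = m

lower  = quantile_models[0.1].predict(X[-50:])
median = quantile_models[0.5].predict(X[-50:])
upper  = quantile_models[0.9].predict(X[-50:])
```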
Recurrent Neural Networks and LSTMs
Long Short-Term Memory networks (LSTMs) are a class of recurrent neural networks designed to learn long-range temporal dependencies in sequential data. Unlike gradient boosting, LSTMs operate directly on the sequence structure of a time series, learning temporal patterns through gated memory cells that can retain information across many time steps.
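A minimal PyTorch sketch of this architecture, mapping a window of past volumes to a one-step-ahead forecast; the hidden size, window length, and random input are illustrative, not tuned.

```python
# Sketch: an LSTM that maps a window of past interval volumes to the next
# value. Architecture and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class VolumeLSTM(nn.Module):
    def __init__(self, hidden_size: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window_length, 1) sequence of scaled volumes
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])   # forecast from the last time step

model = VolumeLSTM()
window = torch.randn(32, 56, 1)           # 32 samples, 8-week daily windows
next_volume = model(window)               # (32, 1) one-step-ahead forecasts
```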
In contact volume forecasting, LSTMs are most applicable when:
- Long-range seasonal dependencies (e.g., annual cycles with complex intraday patterns) are important and difficult to encode as tabular features.
- Multiple related time series can be jointly modeled, allowing the network to learn shared patterns across skill queues or geographies.
- The forecasting horizon is medium-term (days to weeks) rather than short-interval (next 15 minutes).
LSTMs require substantially more training data than gradient boosting or classical statistical methods, and are sensitive to hyperparameter tuning. The M4 competition findings (Makridakis et al., 2018) demonstrated that pure ML methods — including LSTMs — did not consistently outperform statistical methods on the full competition dataset, though hybrid approaches (statistical + ML) performed strongly.[2]
Prophet
Prophet, developed by Taylor and Letham at Facebook (now Meta) and released as an open-source library in 2017, is an additive regression model designed for business time series with strong seasonal patterns and irregular holidays.[3]
Prophet decomposes a time series into trend, weekly seasonality, annual seasonality, and holiday effects, modeling the seasonal components with Fourier series and the trend with piecewise linear (or logistic) segments. Primary advantages for contact center forecasting:
- Automatic change point detection — identifies structural changes in trend without manual specification.
- Holiday and event modeling — named holidays and one-time events can be registered directly (see the sketch after this list), preventing these observations from distorting seasonality estimation.
- Analyst-friendly interface — parameters are interpretable and adjustable by domain experts without deep statistical training, aligning with the role of Judgmental Forecasting overlays in forecast governance.
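A minimal Prophet usage sketch showing holiday registration; the CSV path, holiday name, and dates are illustrative assumptions, while the `ds`/`y` column names follow Prophet's required input convention.

```python
# Sketch: Prophet with a named recurring event registered as a holiday.
# The CSV path, holiday name, and dates are illustrative assumptions.
import pandas as pd
from prophet import Prophet

holidays = pd.DataFrame({
    "holiday": "year_end_peak",
    "ds": pd.to_datetime(["2023-12-26", "2024-12-26"]),
    "lower_window": 0,
    "upper_window": 3,                    # effect persists 3 days afterward
})

df = pd.read_csv("daily_volume.csv")      # columns: ds (date), y (volume)
m = Prophet(holidays=holidays, weekly_seasonality=True,
            yearly_seasonality=True)
m.fit(df)

future = m.make_future_dataframe(periods=28)  # 4-week horizon
forecast = m.predict(future)                  # yhat, yhat_lower, yhat_upper
```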
Prophet's limitations include reduced performance on highly irregular time series, limited ability to incorporate complex external predictors compared to gradient boosting, and a tendency to underperform specialized statistical models on short time series with strong autocorrelation structure.
M4 and M5 Competition Findings
The M4 competition (2018, 100,000 time series across multiple domains and frequencies) produced several findings directly relevant to contact volume forecasting practice:
- No single ML method dominated. Pure ML methods (neural networks, gradient boosting) did not systematically outperform classical statistical methods across the full dataset.
- Hybrid methods won. The top-performing entries combined statistical base forecasts with ML-based error correction or ensemble weighting, achieving accuracy gains over either method alone.
- Forecast Combination outperforms selection. Simple averages of multiple model forecasts consistently outperformed single best-model selection, consistent with the broader forecasting literature.
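An equal-weight combination is a one-liner; the sketch below averages illustrative forecasts from three hypothetical models over the same horizon.

```python
# Sketch: simple forecast combination. The three arrays stand in for
# forecasts from an ETS model, a gradient boosting model, and Prophet.
import numpy as np

ets_fc     = np.array([520, 480, 610, 590], dtype=float)  # illustrative
lgbm_fc    = np.array([500, 470, 640, 600], dtype=float)
prophet_fc = np.array([530, 490, 620, 580], dtype=float)

combined = np.mean([ets_fc, lgbm_fc, prophet_fc], axis=0)
```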
The M5 competition (2020, Walmart retail sales data) extended these findings to a hierarchical forecasting context and confirmed the strong performance of gradient boosting (specifically LightGBM) when external predictors and hierarchical structure are present.
When ML Beats Classical Methods — and When It Does Not
| Condition | Favors ML | Favors Classical (ARIMA/ETS) |
|---|---|---|
| Number of time series | Many (>100) | Few (<20) |
| External predictors available | Yes (marketing, web traffic, CRM) | No or few |
| Historical data length | Long (2+ years, high frequency) | Short or sparse |
| Structural breaks present | With retraining | Intervention modeling |
| Interpretability required | Lower | Higher |
| Irregular holidays/events | Prophet, gradient boosting | Manual adjustment |
| Intermittent demand | Rarely; specialized methods preferred | Croston's method |
Implementation Considerations
Feature Engineering
The quality of features engineered from raw time series and external data typically determines ML forecast quality more than algorithm selection. Standard feature sets for contact volume forecasting include:
- Lagged volume values (t-1, t-7, t-14, t-28 for daily data)
- Rolling means and standard deviations over multiple windows
- Calendar indicators: day of week, hour of day, month, week of year
- Holiday and event flags
- Promotion/campaign flags from marketing systems
- Prior-period service level (captures feedback loops between volume and deflection)
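A pandas sketch assembling this feature set; the column names (`volume`, `service_level`) and the assumption that holiday and campaign flags are already joined onto the frame are illustrative, not a prescribed data model.

```python
# Sketch: building the standard feature set from a daily frame with a
# DatetimeIndex and columns 'volume' and 'service_level' (assumed schema).
import pandas as pd

def build_feature_frame(df: pd.DataFrame) -> pd.DataFrame:
    f = df.copy()
    for lag in (1, 7, 14, 28):                       # lagged volume values
        f[f"vol_lag_{lag}"] = f["volume"].shift(lag)
    for window in (7, 28):                           # rolling statistics,
        shifted = f["volume"].shift(1)               # shifted to avoid leakage
        f[f"vol_mean_{window}"] = shifted.rolling(window).mean()
        f[f"vol_std_{window}"] = shifted.rolling(window).std()
    f["dow"] = f.index.dayofweek                     # calendar indicators
    f["month"] = f.index.month
    f["week_of_year"] = f.index.isocalendar().week.astype(int)
    # holiday, event, and campaign flags assumed joined upstream
    f["service_level_lag_1"] = f["service_level"].shift(1)  # feedback loop
    return f.dropna()
```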
Retraining Cadence
ML models for contact volume must be retrained on a defined cadence to incorporate recent data and adapt to structural changes. Monthly or quarterly full retraining with weekly incremental updates is a common pattern.
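A minimal sketch of that cadence expressed as a decision function, assuming stored timestamps for the last full retrain and last incremental update; scheduler integration (cron, Airflow, and the like) is omitted.

```python
# Sketch: retraining-cadence check. Thresholds mirror the monthly-full /
# weekly-incremental pattern described above; values are illustrative.
from datetime import datetime, timedelta

FULL_RETRAIN_EVERY = timedelta(days=30)    # monthly full retrain
INCREMENTAL_EVERY = timedelta(days=7)      # weekly incremental update

def retraining_action(last_full: datetime, last_update: datetime,
                      now: datetime) -> str:
    if now - last_full >= FULL_RETRAIN_EVERY:
        return "full_retrain"              # refit on the complete history
    if now - last_update >= INCREMENTAL_EVERY:
        return "incremental_update"        # refresh with recent observations
    return "no_action"

action = retraining_action(last_full=datetime(2024, 1, 1),
                           last_update=datetime(2024, 1, 22),
                           now=datetime(2024, 2, 5))   # -> "full_retrain"
```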
Evaluation and Governance
ML forecasts should be evaluated against MAPE, WAPE, and Forecast Bias benchmarks using a proper holdout set, and compared to classical baselines (seasonal naive, ETS) before deployment. Production ML forecasting systems require monitoring for data drift — changes in the statistical properties of input features that signal model degradation.
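A sketch of that holdout comparison, with simple implementations of the named metrics and a seasonal naive baseline (season length 7 for daily data); the history values are synthetic and illustrative.

```python
# Sketch: MAPE, WAPE, and bias on a holdout week, compared against a
# seasonal naive baseline (last week's actuals). Data is synthetic.
import numpy as np

def mape(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs((actual - forecast) / actual)) * 100

def wape(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.abs(actual - forecast).sum() / np.abs(actual).sum() * 100

def bias(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return (forecast - actual).sum() / actual.sum() * 100  # + = over-forecast

rng = np.random.default_rng(1)
base = np.array([500, 520, 480, 610, 590, 300, 280] * 8, dtype=float)
history = base + rng.normal(0, 20, size=base.size)  # noisy weekly pattern

holdout = history[-7:]
seasonal_naive = history[-14:-7]        # same weekdays, one week earlier
print(wape(holdout, seasonal_naive))    # the ML forecast must beat this
```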
Maturity Model Considerations
At Level 3 (Integrated), organizations may deploy Prophet or simple gradient boosting models for specific high-volume queues, typically as pilot implementations alongside existing statistical models.
At Level 4 (Optimized), ML-based forecasting is systematically deployed across the queue portfolio, with formal model comparison and governance. Hybrid approaches combining statistical and ML methods are used where evidence supports performance improvement.
At Level 5 (Adaptive), ML forecasting systems self-retrain on new data, detect their own performance degradation, and flag anomalies for human review. Forecasts cover human and AI agent capacity simultaneously as part of a unified workforce demand model.
Related Concepts
- Forecasting Methods
- ARIMA Models
- Exponential Smoothing
- Probabilistic Forecasting
- Forecast Combination
- Hierarchical Forecasting
- Reforecast and Rolling Forecast Methodology
- MAPE, WAPE, and Forecast Bias
- Time Series Decomposition
- Judgmental Forecasting
References
- [1] Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794).
- [2] Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2018). The M4 competition: Results, findings, conclusion and way forward. International Journal of Forecasting, 34(4), 802–808.
- [3] Taylor, S. J., & Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1), 37–45.
