Business Continuity Planning for Contact Centers

From WFM Labs

Business Continuity Planning (BCP) for Contact Centers is the discipline of ensuring that customer service operations can survive, adapt to, and recover from disruptive events such as natural disasters, pandemics, technology failures, and cyberattacks. Contact centers are among the most operationally sensitive functions in any organization: they are the customer-facing voice of the enterprise, and their failure during a crisis is immediately visible, reputationally damaging, and often financially costly.

The COVID-19 pandemic in 2020 delivered a generation-defining lesson in BCP for the contact center industry. Operations that had business continuity plans — particularly those that had invested in work-from-home infrastructure — adapted in days. Operations without plans lost weeks of productivity, experienced severe service degradation, and in some cases never fully recovered their pre-crisis operating models.

BCP Framework

Business continuity planning follows a four-phase framework:

Prevention

Reducing the likelihood and potential impact of disruption before it occurs:

  • Geographic redundancy — operating from multiple sites so that a single-site event does not eliminate capacity. The minimum for a critical operation is two geographically separated sites.
  • Technology redundancy: redundant network paths, failover telephony (typically easier to achieve with cloud-based telephony than with an on-premises PBX), backup data centers, and redundant WFM system access.
  • Work-from-home readiness — agents provisioned and tested for remote work before a crisis requires it. The infrastructure (VPN, laptops, headsets, softphones, WFM remote access) must exist before the emergency.
  • Cross-training — agents trained across multiple skill groups so that the remaining workforce can cover critical queues even with reduced headcount. See Cross-Training and Skill Mix Strategy.
  • Vendor diversification — not relying on a single telecom carrier, cloud provider, or technology vendor for all critical systems.

Preparedness

Plans, training, and resources positioned for rapid deployment:

  • BCP documentation — written plans covering each disaster scenario, with defined roles, decision authority, communication protocols, and activation triggers.
  • Emergency contact trees — current contact information for all employees, organized by role and escalation path. Must be maintained outside the primary communication systems (since those systems may be the ones that failed).
  • Crisis communication templates — pre-drafted customer communications (IVR messages, website banners, social media statements, email templates) for each scenario.
  • Regular testing — tabletop exercises and live drills (see Testing section below).
  • Insurance and contracts — business interruption insurance, SLA provisions with vendors for emergency response, mutual aid agreements with BPO partners.

Response

Actions taken during the disruptive event:

  • Activation decision — who declares a BCP event and triggers the plan. This must be a named role with alternates, not a committee.
  • Assessment — rapid assessment of the scope and expected duration of the disruption. Determines which BCP tier to activate.
  • Workforce mobilization — activating remote work, redirecting volume to surviving sites, deploying surge resources.
  • Customer communication — informing customers of service impact and expected recovery (IVR updates, website notices, proactive outreach).
  • Stakeholder communication — internal updates to executive leadership, affected business units, and regulatory bodies if applicable.
  • Prioritized service — when capacity is reduced, which contact types get service and which are deferred? This prioritization must be pre-defined, not improvised during the crisis.
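
The pre-defined prioritization described in the last item can be kept as data rather than left to memory. The following is a minimal sketch, assuming a hypothetical priority table; the contact type names, rankings, and agent requirements are illustrative and not drawn from any real operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContactType:
    name: str
    priority: int         # 1 = most critical; assumed ranking agreed in advance
    agents_required: int  # agents needed to serve this type at the crisis target


def allocate_capacity(contact_types, agents_available):
    """Walk contact types in priority order, serving each one that fits.

    Returns (served, deferred) as lists of contact type names. A type is
    served only if its full requirement fits in the remaining capacity.
    """
    served, deferred = [], []
    remaining = agents_available
    for ct in sorted(contact_types, key=lambda c: c.priority):
        if ct.agents_required <= remaining:
            served.append(ct.name)
            remaining -= ct.agents_required
        else:
            deferred.append(ct.name)
    return served, deferred


# Hypothetical priority table, written down before any crisis occurs.
PRIORITY_TABLE = [
    ContactType("fraud_and_safety", 1, 20),
    ContactType("billing_disputes", 2, 35),
    ContactType("general_inquiries", 3, 60),
    ContactType("sales", 4, 25),
]

served, deferred = allocate_capacity(PRIORITY_TABLE, agents_available=70)
print("served:", served)
print("deferred:", deferred)
```

Keeping the table in a reviewed artifact means the response team only has to supply the current headcount; the ordering itself is not up for debate during the event.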

Recovery

Returning to normal operations:

  • Phased restoration — bring capacity back online in priority order. Critical services first, then standard operations, then non-essential.
  • After-action review: what worked, what failed, and what needs to change in the plan. This is the most important recovery step and the one most often skipped.
  • Backlog management — the crisis likely created a contact backlog (deferred contacts, unresolved issues, follow-ups). The WFM function must plan for the post-crisis surge.
  • Employee recovery — the crisis may have affected employees personally (especially natural disasters and pandemics). Support services, schedule flexibility, and compassion are not optional.
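
The post-crisis surge in the backlog management item can be sized with simple arithmetic. The sketch below is an assumption-laden estimate, not a staffing model: it treats the backlog as a fixed pile of work, assumes a single average handle time, and assumes a known number of agents free beyond what live demand requires. All parameter names are hypothetical.

```python
import math


def days_to_clear_backlog(backlog_contacts, aht_seconds, spare_agents,
                          productive_hours_per_agent_per_day):
    """Estimate working days needed to clear a backlog with spare capacity.

    spare_agents is the headcount available beyond what live demand needs.
    """
    if spare_agents <= 0:
        raise ValueError("no spare capacity: the backlog will not shrink")
    workload_hours = backlog_contacts * aht_seconds / 3600
    daily_capacity_hours = spare_agents * productive_hours_per_agent_per_day
    return math.ceil(workload_hours / daily_capacity_hours)


# Illustrative numbers: 4,500 deferred contacts at 480 seconds each is
# 600 hours of work; 15 spare agents at 6 productive hours clear 90 per day.
print(days_to_clear_backlog(4500, 480, 15, 6))
```

If the result is longer than customers can reasonably wait, the gap has to be closed with the same levers used for demand surges, such as overtime or partner capacity.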

Disaster Scenarios

Natural Disaster

Hurricanes, earthquakes, floods, wildfires, ice storms, and tornadoes can render a contact center site uninhabitable. The impact varies from temporary (power outage resolved in hours) to permanent (facility destroyed).

WFM response:

  • Redirect volume to unaffected sites via ACD routing changes
  • Activate work-from-home for agents in unaffected areas who normally work on-site
  • Deploy reduced-service IVR with self-service options
  • Temporarily relax service level targets so that the available staff can cover as many contacts as possible
  • Re-forecast for modified demand (natural disasters often produce demand surges as affected customers call in)

Pandemic

COVID-19 demonstrated the pandemic scenario. The unique challenge: a pandemic affects all sites simultaneously and persists for months, not days.

WFM response:

  • Rapid work-from-home migration for the entire workforce
  • Absenteeism surge management (illness, childcare, mental health)
  • Re-forecasting for radically changed demand patterns (volume may spike, shift between channels, or change composition)
  • Extended operation under degraded conditions (sustained higher AHT, sustained higher absenteeism)
  • Schedule redesign for new working conditions (home environment constraints)

Key COVID lesson: operations that had tested work-from-home capabilities adapted in 3-5 days. Operations that had to build the capability from scratch required 3-6 weeks and experienced severe service degradation during the transition.

Technology Failure

ACD failure, WFM system outage, CRM downtime, network outage, or cloud provider failure.

WFM response:

  • Activate backup telephony (cloud failover, PSTN backup)
  • Manual schedule management if WFM system is unavailable (pre-printed emergency schedules, email-based shift communication)
  • Simplified IVR with reduced routing complexity
  • Extended handle times expected (agents working without full system support)

Cyberattack

Ransomware, DDoS, data breach, or targeted attack on contact center systems.

WFM response:

  • Isolation of affected systems (may mean intentional shutdown of compromised platforms)
  • Paper-based or phone-based manual operations if digital systems are compromised
  • See Contact Center Security for detailed incident response
  • Customer communication about potential data exposure
  • Regulatory notification within required timeframes (GDPR: notify the supervisory authority within 72 hours of becoming aware of a personal data breach; state breach notification laws vary)

Crisis Staffing Models

When capacity is reduced, the operation must choose a staffing posture:

Skeleton Crew

Minimum viable staffing: enough agents to handle only the most critical contact types (emergencies, safety issues, urgent financial matters). All other contact types are deferred to IVR, callback queue, or web/email.

Triggers: severe facility damage, mass absenteeism (>40%), complete technology failure requiring manual operations.

Essential Services Only

Moderate staffing focused on the top 2-3 contact types by business criticality. Non-essential contact types are deflected with messaging about expected return to normal service.

Triggers: moderate absenteeism (20-40%), partial technology failure, single-site outage with multi-site operation.
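
The triggers for the two reduced-capacity postures above can be written as an explicit decision rule, so that the activation decision is mechanical rather than debated. This is a minimal sketch: the function name, the labels for technology failure severity, and the treatment of exactly 40% absenteeism as the lower tier are assumptions made for illustration.

```python
from enum import Enum


class Posture(Enum):
    NORMAL = "normal"
    ESSENTIAL_SERVICES_ONLY = "essential_services_only"
    SKELETON_CREW = "skeleton_crew"


def select_posture(absenteeism_rate, severe_facility_damage=False,
                   technology_failure="none", single_site_outage=False):
    """Map the listed triggers to a staffing posture.

    absenteeism_rate is a fraction (0.25 means 25%). technology_failure is
    one of "none", "partial", "complete" (labels assumed for this sketch).
    """
    if technology_failure not in ("none", "partial", "complete"):
        raise ValueError(f"unknown technology_failure: {technology_failure!r}")
    # The most severe triggers are checked first so they always win.
    if (severe_facility_damage or absenteeism_rate > 0.40
            or technology_failure == "complete"):
        return Posture.SKELETON_CREW
    if (absenteeism_rate >= 0.20 or technology_failure == "partial"
            or single_site_outage):
        return Posture.ESSENTIAL_SERVICES_ONLY
    return Posture.NORMAL


print(select_posture(0.45))
print(select_posture(0.25))
print(select_posture(0.05, single_site_outage=True))
```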

Surge Capacity

The opposite scenario: a crisis drives demand up (natural disaster affecting customers, product recall, service outage). Staffing must scale beyond normal.

Surge levers:

  • Overtime for existing agents (voluntary first, mandatory if necessary)
  • Cross-trained agents activated on affected queues
  • BPO partner surge agreements (pre-contracted capacity that can be activated within 24-48 hours)
  • Callback scheduling to smooth the demand peak
  • Self-service deflection for contacts that can be resolved without an agent
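
One way to reason about the capacity-adding levers above (overtime, cross-trained agents, BPO surge) is to stack them in a preferred order against the staffing gap. The sketch below assumes hypothetical lever names and agent-hour limits; callbacks and self-service deflection reduce or shift demand rather than add capacity, so they would lower the required hours before this calculation runs.

```python
def plan_surge(required_agent_hours, baseline_agent_hours, levers):
    """Apply surge levers in the given order until the gap is closed.

    levers is a list of (name, max_agent_hours) pairs. Returns the list of
    (name, hours_used) actually drawn on and any gap left uncovered.
    """
    gap = max(0, required_agent_hours - baseline_agent_hours)
    plan = []
    for name, max_hours in levers:
        if gap <= 0:
            break
        used = min(gap, max_hours)
        plan.append((name, used))
        gap -= used
    return plan, gap


# Illustrative daily figures only.
levers = [
    ("voluntary_overtime", 120),
    ("cross_trained_agents", 150),
    ("bpo_surge", 300),
]
plan, uncovered = plan_surge(required_agent_hours=1200,
                             baseline_agent_hours=800, levers=levers)
print(plan, uncovered)
```

A non-zero uncovered gap is the signal to escalate, for example to mandatory overtime or a lower service level target.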

WFM Role in BCP

The WFM function plays a central role in crisis response:

Rapid Re-Forecasting

During a crisis, normal forecasting models break down. Historical patterns lose most of their predictive value when the operating environment has fundamentally changed. WFM must:

  • Abandon automated forecasts temporarily and switch to judgmental methods
  • Build short-horizon forecasts (hours and days, not weeks) based on real-time observation
  • Update continuously as the situation evolves
  • Model multiple scenarios (optimistic recovery, baseline, pessimistic) to bound the staffing requirement
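
The scenario modeling in the last item can be sketched as follows. This is a deliberately simplified workload calculation (contacts times handle time, divided by interval length and an assumed occupancy), not Erlang C or any particular WFM product's algorithm. The scenario multipliers, the 30-minute interval, and the 85% occupancy figure are assumptions for illustration.

```python
import math

INTERVAL_SECONDS = 1800  # assumed 30-minute planning interval


def agents_required(contacts, aht_seconds, target_occupancy=0.85):
    """Agents needed to absorb the interval's workload at the given occupancy."""
    workload = contacts * aht_seconds / INTERVAL_SECONDS
    return math.ceil(workload / target_occupancy)


def scenario_staffing(observed_contacts, aht_seconds, scenarios):
    """Scale the latest observed volume by each scenario multiplier."""
    return {
        name: agents_required(observed_contacts * multiplier, aht_seconds)
        for name, multiplier in scenarios.items()
    }


# Hypothetical multipliers applied to the most recent interval's volume.
SCENARIOS = {"optimistic": 0.8, "baseline": 1.0, "pessimistic": 1.4}

bounds = scenario_staffing(observed_contacts=300, aht_seconds=420,
                           scenarios=SCENARIOS)
for name, agents in bounds.items():
    print(name, agents)
```

Re-running this each interval with the newest observed volume gives crisis leadership a range rather than a single number, which is usually more honest when the pattern is still shifting.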

Emergency Schedule Generation

Normal optimization constraints may be relaxed during a crisis:

  • Shift length limits may be temporarily extended
  • Break scheduling may be simplified
  • Skill routing may be collapsed to fewer, broader skill groups
  • Schedule changes may be deployed with shortened notification periods (with appropriate employee communication)

The WFM system must be capable of generating emergency schedules rapidly — within hours, not the normal multi-day planning cycle.

Cross-Trained Agent Activation

The cross-training strategy pays its highest dividend during a crisis. Agents trained in multiple skill groups can be redeployed to critical queues when the normal staffing plan is disrupted. The value of the cross-training investment is the option value of flexibility during disruption.

Communication

The WFM function is often the communication hub during a crisis because it has the most current information about:

  • Who is scheduled and available
  • What the current service level is
  • What the forecast demand looks like
  • Where the staffing gaps are

This operational intelligence must be communicated to crisis leadership in real time.

Testing and Drills

A BCP that has not been tested is a document, not a plan.

Tabletop Exercises

Scenario-based discussion exercises in which the crisis team walks through a scenario step by step. No systems are actually changed; the team describes what it would do. This identifies gaps in the plan without operational risk.

Frequency: quarterly for critical operations; semi-annually for others.

Functional Drills

Specific components tested in isolation:

  • Failover drill: Switch telephony to backup and verify call routing.
  • Work-from-home drill: Send a team home to work remotely for a shift and verify all systems function.
  • Manual operations drill: Run for one hour without the WFM system or CRM, using backup procedures.
  • Communication drill: Activate the emergency contact tree and measure response time.

Frequency: each component at least annually.
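
For the communication drill, "measure response time" can be made concrete with a few summary figures. The sketch below assumes the drill produces a mapping from employee identifier to minutes until acknowledgement, with None for anyone who never responded; the 30-minute cutoff and the metric names are hypothetical.

```python
from statistics import median


def contact_tree_metrics(response_minutes, cutoff_minutes=30):
    """Summarize a contact tree drill.

    response_minutes maps employee id to minutes until acknowledgement,
    or None if the employee did not respond during the drill.
    """
    total = len(response_minutes)
    if total == 0:
        raise ValueError("no employees in the drill")
    responded = [m for m in response_minutes.values() if m is not None]
    within = [m for m in responded if m <= cutoff_minutes]
    return {
        "response_rate": len(responded) / total,
        "within_cutoff_rate": len(within) / total,
        "median_minutes": median(responded) if responded else None,
    }


# Illustrative drill results.
drill = {"emp_a": 4, "emp_b": 12, "emp_c": None, "emp_d": 45}
print(contact_tree_metrics(drill))
```

Tracking these figures from one drill to the next shows whether the contact list is being kept current.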

Full-Scale Exercise

A simulated crisis that activates the full BCP, including workforce mobilization, technology failover, and crisis communication. The gold standard but operationally expensive.

Frequency: annually for critical operations.

Maturity Model Position

In the WFM Labs Maturity Model™:

  • Level 1 — Initial organizations have no documented BCP for the contact center. Crisis response is improvised. "We'll figure it out" is the plan.
  • Level 2 — Foundational organizations have documented BCPs with defined scenarios and response procedures. Emergency contact lists are maintained. Technology redundancy exists (cloud telephony, backup network). Testing is ad hoc.
  • Level 3 — Progressive organizations test their BCPs regularly (quarterly tabletops, annual drills). Work-from-home infrastructure is pre-deployed and tested. Cross-training supports crisis flexibility. The WFM function has defined rapid-reforecast and emergency schedule procedures. Surge agreements with BPO partners are in place.
  • Level 4 — Advanced organizations treat BCP as an integrated operating discipline, not a separate plan. Real-time dashboards monitor resilience indicators (single-site concentration risk, cross-training coverage, remote-ready percentage). Automated failover handles routine technology disruptions without human intervention. Post-crisis after-action reviews produce documented improvements.
  • Level 5 — Pioneering organizations operate in a permanently resilient posture — distributed workforce, cloud-native technology, continuous cross-training, and adaptive scheduling that responds to disruption in real time without a formal "BCP activation." The distinction between normal operations and crisis operations has collapsed because the operating model is inherently resilient.
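
The resilience indicators named at Level 4 are not defined precisely in this article. The sketch below shows one possible set of definitions, all of them assumptions: concentration as the share of agents at the largest site, cross-training coverage as the share of agents holding two or more skills, and remote-ready percentage as the share flagged as provisioned for remote work. The roster fields are hypothetical.

```python
from collections import Counter


def resilience_indicators(roster):
    """Compute three illustrative resilience indicators from a roster.

    roster is a list of dicts with "site", "skills" (a collection), and
    "remote_ready" (bool). All three definitions are assumptions.
    """
    total = len(roster)
    if total == 0:
        raise ValueError("empty roster")
    site_counts = Counter(agent["site"] for agent in roster)
    return {
        "single_site_concentration": max(site_counts.values()) / total,
        "cross_training_coverage":
            sum(1 for a in roster if len(a["skills"]) >= 2) / total,
        "remote_ready_pct":
            sum(1 for a in roster if a["remote_ready"]) / total,
    }


# Illustrative four-agent roster.
roster = [
    {"site": "A", "skills": {"billing", "tech"}, "remote_ready": True},
    {"site": "A", "skills": {"billing"}, "remote_ready": True},
    {"site": "A", "skills": {"billing", "tech", "sales"}, "remote_ready": False},
    {"site": "B", "skills": {"sales"}, "remote_ready": False},
]
print(resilience_indicators(roster))
```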

References

  • Business Continuity Institute. Good Practice Guidelines 2018 Edition. BCI, 2018.
  • Cleveland, B. Call Center Management on Fast Forward (4th ed.). ICMI Press, 2019.
  • Deloitte. COVID-19: Managing the Customer Service Impact. 2020.
  • ICMI. Lessons from the Pandemic: Contact Center Business Continuity. Research report, 2021.
  • ISO 22301:2019. Security and resilience — Business continuity management systems — Requirements.