WFM Data Models and Schemas

From WFM Labs

WFM Data Models and Schemas defines the canonical data entities, relationships, and schema design patterns that underpin workforce management systems. Every WFM capability — forecasting, scheduling, adherence, reporting — depends on a well-structured data layer. This page provides the engineering reference for building or integrating that layer.

Overview

A WFM system's data model must represent two fundamentally different concerns simultaneously: planning (what should happen) and reality (what actually happened). The schema must bridge these domains at every level — from individual agent state to aggregate queue performance — while maintaining the temporal grain needed for Erlang-based staffing calculations.

The entities described here appear in every WFM system regardless of vendor. Vendor-specific implementations vary in naming and normalization, but the conceptual model is stable across NICE, Verint, Genesys, Calabrio, and custom-built systems.

Canonical Entities

Agent

The central human entity. Every WFM operation ultimately maps to or from an agent record.

Field Type Description
agent_id UUID/string Unique identifier (often maps to ACD agent ID)
employee_id string HRIS employee number (may differ from agent_id)
display_name string Name for wallboards and reports
skills array<skill_id> Routing skills the agent can handle
skill_proficiency map<skill_id, float> Proficiency weight per skill (0.0–1.0)
team_id FK → Team Organizational assignment
site_id FK → Site Physical or virtual location
hire_date date Used for tenure calculations and ramp curves
employment_type enum Full-time, part-time, contingent, GigCX
schedule_group_id FK → ScheduleGroup Shift pattern eligibility
timezone string (IANA) Agent's local timezone (critical for scheduling)
status enum Active, on-leave, terminated, in-training

Design note: The agent entity must serve as a join key between ACD data (uses agent_id), HRIS data (uses employee_id), and scheduling data (uses either). Maintaining a reliable mapping between these identifiers is one of the hardest data quality problems in WFM.

Interval

The atomic time unit for WFM planning. Nearly all WFM math operates at the interval level.

Field Type Description
interval_id bigint/composite Unique identifier (often date + time-slot index)
interval_start timestamp(tz) Start of interval in UTC
interval_end timestamp(tz) End of interval in UTC
interval_length_minutes int 15 or 30 (rarely 60)
day_of_week int 1–7 for pattern matching
is_holiday boolean Flag for special-day handling
fiscal_week int Business calendar alignment

Why 15 minutes is standard: The Erlang C formula assumes arrivals follow a Poisson process with a stationary arrival rate within the interval. At 30 minutes, arrival rates often shift enough to violate stationarity — morning ramp-up, lunch dips. At 15 minutes, the stationarity assumption holds for most contact center patterns. Going below 15 minutes (e.g., 5-minute intervals) improves theoretical accuracy but creates practical problems: too many intervals to manage manually, most ACD platforms report at 15-minute minimum, and the staffing granularity exceeds what scheduling can actually deliver. The 15-minute interval is the equilibrium between mathematical precision and operational reality.

Queue

The logical grouping of work demand that agents serve.

Field Type Description
queue_id UUID/string Maps to ACD queue/skill
queue_name string Display name
channel enum Voice, chat, email, messaging, back-office
service_level_target float Target SL (e.g., 0.80)
service_level_seconds int Target answer time (e.g., 20 seconds)
asa_target float Target average speed of answer
occupancy_ceiling float Max acceptable occupancy (burnout guard)
concurrency int Simultaneous contacts per agent (1 for voice, 2-4 for chat)
priority int Routing priority relative to other queues

Forecast

A prediction of future demand for a specific queue and interval.

Field Type Description
forecast_id UUID Unique forecast version identifier
queue_id FK → Queue What queue this predicts
interval_id FK → Interval What interval this covers
forecast_version string Version label (e.g., "2026-W20-final")
volume float Predicted contact count
aht_seconds float Predicted average handle time
volume_lower float Lower confidence bound (e.g., P10)
volume_upper float Upper confidence bound (e.g., P90)
confidence_level float Confidence interval width (e.g., 0.80)
method string Algorithm used (ARIMA, Prophet, ensemble)
created_at timestamp(tz) When this forecast was generated
is_active boolean Currently used for staffing

Design note: Store multiple forecast versions. The ability to compare "forecast at publish time" vs "forecast at execution time" vs "actuals" is fundamental to forecast accuracy measurement. Never overwrite — always version.

Schedule

The planned assignment of agents to activities across intervals.

Field Type Description
schedule_id UUID Schedule version identifier
agent_id FK → Agent Who is scheduled
schedule_date date Calendar date
shift_start timestamp(tz) Shift start time
shift_end timestamp(tz) Shift end time
activities array<ScheduleActivity> Ordered list of activities within the shift

Each ScheduleActivity contains:

Field Type Description
activity_type enum Queue-work, break, lunch, training, coaching, meeting, admin
start_time timestamp(tz) Activity start
end_time timestamp(tz) Activity end
queue_ids array<queue_id> Which queues (for queue-work activities)
is_paid boolean Paid vs unpaid (payroll integration)
is_adherence_tracked boolean Whether adherence monitoring applies

Adherence Event

The comparison between planned schedule and actual agent behavior.

Field Type Description
event_id UUID Unique event identifier
agent_id FK → Agent Which agent
timestamp timestamp(tz) When the state was observed
planned_state enum What the schedule says (queue-work, break, etc.)
actual_state enum What the ACD reports (available, on-call, after-call-work, aux, logged-off)
is_adherent boolean Computed: does actual match planned?
variance_seconds int How long the deviation lasted
exception_code string If approved (e.g., "manager-approved-late-break")

Contact

An individual interaction between a customer and the contact center.

Field Type Description
contact_id UUID/string Unique contact identifier from ACD
queue_id FK → Queue Queue that handled the contact
agent_id FK → Agent Agent who handled (null if abandoned)
interval_id FK → Interval Interval of arrival
channel enum Voice, chat, email, messaging
arrived_at timestamp(tz) When contact entered queue
answered_at timestamp(tz) When agent connected (null if abandoned)
completed_at timestamp(tz) When interaction ended
handle_time_seconds int Total handle time (talk + hold + ACW)
talk_time_seconds int Talk/active time only
hold_time_seconds int Hold time
acw_seconds int After-call work time
wait_time_seconds int Time in queue before answer
is_abandoned boolean Customer disconnected before answer
transfer_count int Number of transfers (critical for deduplication)
disposition_code string Outcome classification

Entity Relationships

The core relationships form a hierarchy and a set of temporal associations:

Organizational hierarchy:

  • Agent → Team → Site → Business Unit
  • Agent → ScheduleGroup (shift pattern eligibility)
  • Agent → Skill[] (many-to-many with proficiency)

Temporal associations:

  • Schedule covers Agent × Date → ScheduleActivity[]
  • Forecast predicts Queue × Interval → volume, AHT
  • Contact arrives at Queue × Interval, handled by Agent
  • AdherenceEvent compares Schedule(Agent, Interval) vs ActualState(Agent, Interval)

Key joins for analytics:

  • Contact JOIN Agent ON agent_id → agent-level performance
  • Contact JOIN Interval ON interval_id → interval-level volume
  • Forecast JOIN Contact ON (queue_id, interval_id) → forecast accuracy
  • Schedule JOIN AdherenceEvent ON (agent_id, interval_id) → adherence rate

Schema Design Patterns

Star Schema for Analytics (OLAP)

The dominant pattern for WFM reporting and historical analysis:

Fact tables:

  • fact_contact — one row per contact: handle time, wait time, abandoned flag, disposition. Grain: individual contact.
  • fact_adherence — one row per agent per interval: planned state, actual state, adherent flag, variance seconds. Grain: agent × interval.
  • fact_staffing — one row per queue per interval: required staff, scheduled staff, actual staff, forecast volume, actual volume. Grain: queue × interval.

Dimension tables:

  • dim_agent — agent attributes (team, site, skills, tenure, employment type). Slowly-changing dimension Type 2 for tracking team changes over time.
  • dim_interval — interval attributes (date, time-of-day, day-of-week, holiday flag, fiscal period). Pre-populated for 2+ years forward.
  • dim_queue — queue attributes (channel, service level target, concurrency, priority).
  • dim_activity — schedule activity types (queue-work, break, training, etc.).
  • dim_date — calendar dimension with business-calendar fields (fiscal week, pay period, holiday).

Example query — forecast accuracy by queue and week:

SELECT
  dq.queue_name,
  di.fiscal_week,
  SUM(fs.forecast_volume) AS forecast_vol,
  SUM(fs.actual_volume)   AS actual_vol,
  ABS(SUM(fs.forecast_volume) - SUM(fs.actual_volume))
    / NULLIF(SUM(fs.actual_volume), 0) AS mape
FROM fact_staffing fs
JOIN dim_queue dq ON fs.queue_key = dq.queue_key
JOIN dim_interval di ON fs.interval_key = di.interval_key
WHERE di.calendar_date BETWEEN '2026-01-01' AND '2026-03-31'
GROUP BY dq.queue_name, di.fiscal_week
ORDER BY dq.queue_name, di.fiscal_week;

Normalized Schema for Operations (OLTP)

The operational WFM database that powers real-time scheduling, adherence, and intraday management uses a normalized (3NF) design:

  • Separate tables for agents, schedules, schedule_activities, forecasts, forecast_versions
  • Foreign key constraints enforced to prevent orphaned records
  • Indexes on (agent_id, schedule_date) and (queue_id, interval_start) for fast lookups
  • Temporal columns stored in UTC with timezone metadata — never local time in the database

Denormalized for Real-Time

Real-time adherence and wallboard systems need sub-second reads. Common pattern: maintain a materialized current-state table updated by event stream:

agent_current_state:
  agent_id        UUID PRIMARY KEY
  current_acd_state   enum        -- available, on-call, acw, aux, logged-off
  state_since         timestamp   -- when they entered this state
  planned_state       enum        -- what schedule says right now
  is_adherent         boolean     -- computed on write
  queue_id            UUID        -- current queue assignment
  updated_at          timestamp   -- last event processed

This table is overwritten on every state change (not appended). Historical adherence events go to the fact table; this table serves only the real-time view.

Historical Data Retention

Data Type Operational (hot) Analytical (warm) Archive (cold)
Contact detail 90 days 2–3 years (aggregated to interval) 7+ years for regulated industries
Agent state events 30 days 1 year (aggregated to interval adherence) Typically not archived
Schedules Current + 8 weeks forward, 90 days back 2 years (for pattern analysis) Per retention policy
Forecasts Current + all versions for active period 2 years (for accuracy trending) Compress to weekly aggregates
Interval aggregates 2 years at 15-min grain 5 years at daily grain 7+ years at weekly grain

Aggregation strategy: Roll up interval-level data to daily, weekly, and monthly aggregates on a scheduled basis. Keep the finest grain for the shortest period. Most WFM analytics beyond 6 months operate at daily or weekly grain anyway — storing 3 years of 15-minute data is expensive and rarely queried.

Data Quality: Common Issues

Missing Intervals

ACD platforms occasionally drop intervals during outages or maintenance windows. A missing interval looks like zero volume, which poisons forecasts. Detection: compare expected interval count (96 per day at 15-min grain) against actual count. Mitigation: interpolate from adjacent intervals or flag as excluded from forecast training data.

Timezone Errors

The single most common data quality issue in WFM. The ACD reports in UTC. The WFM system stores in site-local time. The HRIS uses corporate headquarters timezone. DST transitions create duplicate or missing hours twice per year. Prevention: store everything in UTC internally. Convert to local time only at the presentation layer. Carry the IANA timezone identifier with every agent and site record.

Duplicate Contacts

Transfers generate multiple contact records for what the customer experiences as a single interaction. A contact transferred twice appears as three records, inflating volume by 200%. Detection: join on transfer correlation ID (if available) or look for contacts from the same ANI within 60 seconds. Impact: overstated volume → overstated staffing requirements → wasted labor budget.

Orphaned Schedules

Agent terminated in HRIS but still has future schedules in WFM. Creates phantom staffing — the schedule shows adequate coverage, but nobody will work those shifts. Detection: daily reconciliation of active agents between HRIS and WFM schedule. Impact: understaffing on the day of execution.

AHT Outliers

A single 3-hour call in a 15-minute interval with 20 contacts can skew the interval AHT from 300 seconds to 1,200 seconds. Mitigation: cap AHT at a configurable percentile (P95 or P99) for forecasting purposes. Store the raw value but use the capped value in Erlang calculations.

Common Pitfalls

  • Designing the schema around a single vendor's export format. Vendor schemas are optimized for their product, not for your analytics. Build a canonical model and map vendors into it.
  • Storing local times in the database. This creates unsolvable problems during DST transitions and multi-site operations. UTC internally, always.
  • Skipping the forecast version dimension. Without versioning, you cannot measure forecast accuracy over time or compare algorithm performance.
  • Over-normalizing the real-time layer. Joins kill real-time performance. Denormalize aggressively for anything that needs sub-second response.
  • Ignoring the agent identity problem. The mapping between ACD agent ID, HRIS employee ID, and WFM agent ID is fragile. Invest in a master identity service early.

Maturity Model Position

Level Characteristics
Foundational Vendor-provided schema only. No canonical model. Data quality issues discovered ad hoc. No retention policy.
Progressive Canonical entities defined. Star schema for analytics. Basic data quality checks. Retention policy documented.
Advanced Event-sourced data layer. Automated quality monitoring. Schema versioned and tested. Multi-vendor canonical mapping. Real-time and analytical layers separated with clear sync patterns.

See Also

References

  • Mehrotra, V. (1997). "Ringing Up Big Business." OR/MS Today, 24(4). — Foundational work on interval-level staffing models.
  • Gans, N., Koole, G., & Mandelbaum, A. (2003). "Telephone Call Centers: Tutorial, Review, and Research Prospects." Manufacturing & Service Operations Management, 5(2), 79–141.
  • Kimball, R. & Ross, M. (2013). The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling. 3rd ed. Wiley. — Star schema design patterns.
  • Cleveland, B. (2012). Call Center Management on Fast Forward. ICMI Press. — Interval planning and Erlang application.