WFM Data Models and Schemas

WFM Data Models and Schemas defines the canonical data entities, relationships, and schema design patterns that underpin workforce management systems. Every WFM capability — forecasting, scheduling, adherence, reporting — depends on a well-structured data layer. This page provides the engineering reference for building or integrating that layer.

Overview

A WFM system's data model must represent two fundamentally different concerns simultaneously: planning (what should happen) and reality (what actually happened). The schema must bridge these domains at every level — from individual agent state to aggregate queue performance — while maintaining the temporal grain needed for Erlang-based staffing calculations.

The entities described here appear in every WFM system regardless of vendor. Vendor-specific implementations vary in naming and normalization, but the conceptual model is stable across NICE, Verint, Genesys, Calabrio, and custom-built systems.

Canonical Entities

Agent

The central human entity. Every WFM operation ultimately maps to or from an agent record.

Field	Type	Description
agent_id	UUID/string	Unique identifier (often maps to ACD agent ID)
employee_id	string	HRIS employee number (may differ from agent_id)
display_name	string	Name for wallboards and reports
skills	array<skill_id>	Routing skills the agent can handle
skill_proficiency	map<skill_id, float>	Proficiency weight per skill (0.0–1.0)
team_id	FK → Team	Organizational assignment
site_id	FK → Site	Physical or virtual location
hire_date	date	Used for tenure calculations and ramp curves
employment_type	enum	Full-time, part-time, contingent, GigCX
schedule_group_id	FK → ScheduleGroup	Shift pattern eligibility
timezone	string (IANA)	Agent's local timezone (critical for scheduling)
status	enum	Active, on-leave, terminated, in-training

Design note: The agent entity must serve as a join key between ACD data (uses agent_id), HRIS data (uses employee_id), and scheduling data (uses either). Maintaining a reliable mapping between these identifiers is one of the hardest data quality problems in WFM.

Interval

The atomic time unit for WFM planning. Nearly all WFM math operates at the interval level.

Field	Type	Description
interval_id	bigint/composite	Unique identifier (often date + time-slot index)
interval_start	timestamp(tz)	Start of interval in UTC
interval_end	timestamp(tz)	End of interval in UTC
interval_length_minutes	int	15 or 30 (rarely 60)
day_of_week	int	1–7 for pattern matching
is_holiday	boolean	Flag for special-day handling
fiscal_week	int	Business calendar alignment

Why 15 minutes is standard: The Erlang C formula assumes arrivals follow a Poisson process with a stationary arrival rate within the interval. At 30 minutes, arrival rates often shift enough to violate stationarity — morning ramp-up, lunch dips. At 15 minutes, the stationarity assumption holds for most contact center patterns. Going below 15 minutes (e.g., 5-minute intervals) improves theoretical accuracy but creates practical problems: too many intervals to manage manually, most ACD platforms report at 15-minute minimum, and the staffing granularity exceeds what scheduling can actually deliver. The 15-minute interval is the equilibrium between mathematical precision and operational reality.

Queue

The logical grouping of work demand that agents serve.

Field	Type	Description
queue_id	UUID/string	Maps to ACD queue/skill
queue_name	string	Display name
channel	enum	Voice, chat, email, messaging, back-office
service_level_target	float	Target SL (e.g., 0.80)
service_level_seconds	int	Target answer time (e.g., 20 seconds)
asa_target	float	Target average speed of answer
occupancy_ceiling	float	Max acceptable occupancy (burnout guard)
concurrency	int	Simultaneous contacts per agent (1 for voice, 2-4 for chat)
priority	int	Routing priority relative to other queues

Forecast

A prediction of future demand for a specific queue and interval.

Field	Type	Description
forecast_id	UUID	Unique forecast version identifier
queue_id	FK → Queue	What queue this predicts
interval_id	FK → Interval	What interval this covers
forecast_version	string	Version label (e.g., "2026-W20-final")
volume	float	Predicted contact count
aht_seconds	float	Predicted average handle time
volume_lower	float	Lower confidence bound (e.g., P10)
volume_upper	float	Upper confidence bound (e.g., P90)
confidence_level	float	Confidence interval width (e.g., 0.80)
method	string	Algorithm used (ARIMA, Prophet, ensemble)
created_at	timestamp(tz)	When this forecast was generated
is_active	boolean	Currently used for staffing

Design note: Store multiple forecast versions. The ability to compare "forecast at publish time" vs "forecast at execution time" vs "actuals" is fundamental to forecast accuracy measurement. Never overwrite — always version.

Schedule

The planned assignment of agents to activities across intervals.

Field	Type	Description
schedule_id	UUID	Schedule version identifier
agent_id	FK → Agent	Who is scheduled
schedule_date	date	Calendar date
shift_start	timestamp(tz)	Shift start time
shift_end	timestamp(tz)	Shift end time
activities	array<ScheduleActivity>	Ordered list of activities within the shift

Each ScheduleActivity contains:

Field	Type	Description
activity_type	enum	Queue-work, break, lunch, training, coaching, meeting, admin
start_time	timestamp(tz)	Activity start
end_time	timestamp(tz)	Activity end
queue_ids	array<queue_id>	Which queues (for queue-work activities)
is_paid	boolean	Paid vs unpaid (payroll integration)
is_adherence_tracked	boolean	Whether adherence monitoring applies

Adherence Event

The comparison between planned schedule and actual agent behavior.

Field	Type	Description
event_id	UUID	Unique event identifier
agent_id	FK → Agent	Which agent
timestamp	timestamp(tz)	When the state was observed
planned_state	enum	What the schedule says (queue-work, break, etc.)
actual_state	enum	What the ACD reports (available, on-call, after-call-work, aux, logged-off)
is_adherent	boolean	Computed: does actual match planned?
variance_seconds	int	How long the deviation lasted
exception_code	string	If approved (e.g., "manager-approved-late-break")

Contact

An individual interaction between a customer and the contact center.

Field	Type	Description
contact_id	UUID/string	Unique contact identifier from ACD
queue_id	FK → Queue	Queue that handled the contact
agent_id	FK → Agent	Agent who handled (null if abandoned)
interval_id	FK → Interval	Interval of arrival
channel	enum	Voice, chat, email, messaging
arrived_at	timestamp(tz)	When contact entered queue
answered_at	timestamp(tz)	When agent connected (null if abandoned)
completed_at	timestamp(tz)	When interaction ended
handle_time_seconds	int	Total handle time (talk + hold + ACW)
talk_time_seconds	int	Talk/active time only
hold_time_seconds	int	Hold time
acw_seconds	int	After-call work time
wait_time_seconds	int	Time in queue before answer
is_abandoned	boolean	Customer disconnected before answer
transfer_count	int	Number of transfers (critical for deduplication)
disposition_code	string	Outcome classification

Entity Relationships

The core relationships form a hierarchy and a set of temporal associations:

Organizational hierarchy:

Agent → Team → Site → Business Unit
Agent → ScheduleGroup (shift pattern eligibility)
Agent → Skill[] (many-to-many with proficiency)

Temporal associations:

Schedule covers Agent × Date → ScheduleActivity[]
Forecast predicts Queue × Interval → volume, AHT
Contact arrives at Queue × Interval, handled by Agent
AdherenceEvent compares Schedule(Agent, Interval) vs ActualState(Agent, Interval)

Key joins for analytics:

Contact JOIN Agent ON agent_id → agent-level performance
Contact JOIN Interval ON interval_id → interval-level volume
Forecast JOIN Contact ON (queue_id, interval_id) → forecast accuracy
Schedule JOIN AdherenceEvent ON (agent_id, interval_id) → adherence rate

Schema Design Patterns

Star Schema for Analytics (OLAP)

The dominant pattern for WFM reporting and historical analysis:

Fact tables:

fact_contact — one row per contact: handle time, wait time, abandoned flag, disposition. Grain: individual contact.
fact_adherence — one row per agent per interval: planned state, actual state, adherent flag, variance seconds. Grain: agent × interval.
fact_staffing — one row per queue per interval: required staff, scheduled staff, actual staff, forecast volume, actual volume. Grain: queue × interval.

Dimension tables:

dim_agent — agent attributes (team, site, skills, tenure, employment type). Slowly-changing dimension Type 2 for tracking team changes over time.
dim_interval — interval attributes (date, time-of-day, day-of-week, holiday flag, fiscal period). Pre-populated for 2+ years forward.
dim_queue — queue attributes (channel, service level target, concurrency, priority).
dim_activity — schedule activity types (queue-work, break, training, etc.).
dim_date — calendar dimension with business-calendar fields (fiscal week, pay period, holiday).

Example query — forecast accuracy by queue and week:

SELECT
  dq.queue_name,
  di.fiscal_week,
  SUM(fs.forecast_volume) AS forecast_vol,
  SUM(fs.actual_volume)   AS actual_vol,
  ABS(SUM(fs.forecast_volume) - SUM(fs.actual_volume))
    / NULLIF(SUM(fs.actual_volume), 0) AS mape
FROM fact_staffing fs
JOIN dim_queue dq ON fs.queue_key = dq.queue_key
JOIN dim_interval di ON fs.interval_key = di.interval_key
WHERE di.calendar_date BETWEEN '2026-01-01' AND '2026-03-31'
GROUP BY dq.queue_name, di.fiscal_week
ORDER BY dq.queue_name, di.fiscal_week;

Normalized Schema for Operations (OLTP)

The operational WFM database that powers real-time scheduling, adherence, and intraday management uses a normalized (3NF) design:

Separate tables for agents, schedules, schedule_activities, forecasts, forecast_versions
Foreign key constraints enforced to prevent orphaned records
Indexes on (agent_id, schedule_date) and (queue_id, interval_start) for fast lookups
Temporal columns stored in UTC with timezone metadata — never local time in the database

Denormalized for Real-Time

Real-time adherence and wallboard systems need sub-second reads. Common pattern: maintain a materialized current-state table updated by event stream:

agent_current_state:
  agent_id        UUID PRIMARY KEY
  current_acd_state   enum        -- available, on-call, acw, aux, logged-off
  state_since         timestamp   -- when they entered this state
  planned_state       enum        -- what schedule says right now
  is_adherent         boolean     -- computed on write
  queue_id            UUID        -- current queue assignment
  updated_at          timestamp   -- last event processed

This table is overwritten on every state change (not appended). Historical adherence events go to the fact table; this table serves only the real-time view.

Historical Data Retention

Data Type	Operational (hot)	Analytical (warm)	Archive (cold)
Contact detail	90 days	2–3 years (aggregated to interval)	7+ years for regulated industries
Agent state events	30 days	1 year (aggregated to interval adherence)	Typically not archived
Schedules	Current + 8 weeks forward, 90 days back	2 years (for pattern analysis)	Per retention policy
Forecasts	Current + all versions for active period	2 years (for accuracy trending)	Compress to weekly aggregates
Interval aggregates	2 years at 15-min grain	5 years at daily grain	7+ years at weekly grain

Aggregation strategy: Roll up interval-level data to daily, weekly, and monthly aggregates on a scheduled basis. Keep the finest grain for the shortest period. Most WFM analytics beyond 6 months operate at daily or weekly grain anyway — storing 3 years of 15-minute data is expensive and rarely queried.

Data Quality: Common Issues

Missing Intervals

ACD platforms occasionally drop intervals during outages or maintenance windows. A missing interval looks like zero volume, which poisons forecasts. Detection: compare expected interval count (96 per day at 15-min grain) against actual count. Mitigation: interpolate from adjacent intervals or flag as excluded from forecast training data.

Timezone Errors

The single most common data quality issue in WFM. The ACD reports in UTC. The WFM system stores in site-local time. The HRIS uses corporate headquarters timezone. DST transitions create duplicate or missing hours twice per year. Prevention: store everything in UTC internally. Convert to local time only at the presentation layer. Carry the IANA timezone identifier with every agent and site record.

Duplicate Contacts

Transfers generate multiple contact records for what the customer experiences as a single interaction. A contact transferred twice appears as three records, inflating volume by 200%. Detection: join on transfer correlation ID (if available) or look for contacts from the same ANI within 60 seconds. Impact: overstated volume → overstated staffing requirements → wasted labor budget.

Orphaned Schedules

Agent terminated in HRIS but still has future schedules in WFM. Creates phantom staffing — the schedule shows adequate coverage, but nobody will work those shifts. Detection: daily reconciliation of active agents between HRIS and WFM schedule. Impact: understaffing on the day of execution.

AHT Outliers

A single 3-hour call in a 15-minute interval with 20 contacts can skew the interval AHT from 300 seconds to 1,200 seconds. Mitigation: cap AHT at a configurable percentile (P95 or P99) for forecasting purposes. Store the raw value but use the capped value in Erlang calculations.

Common Pitfalls

Designing the schema around a single vendor's export format. Vendor schemas are optimized for their product, not for your analytics. Build a canonical model and map vendors into it.
Storing local times in the database. This creates unsolvable problems during DST transitions and multi-site operations. UTC internally, always.
Skipping the forecast version dimension. Without versioning, you cannot measure forecast accuracy over time or compare algorithm performance.
Over-normalizing the real-time layer. Joins kill real-time performance. Denormalize aggressively for anything that needs sub-second response.
Ignoring the agent identity problem. The mapping between ACD agent ID, HRIS employee ID, and WFM agent ID is fragile. Invest in a master identity service early.

Maturity Model Position

Level	Characteristics
Foundational	Vendor-provided schema only. No canonical model. Data quality issues discovered ad hoc. No retention policy.
Progressive	Canonical entities defined. Star schema for analytics. Basic data quality checks. Retention policy documented.
Advanced	Event-sourced data layer. Automated quality monitoring. Schema versioned and tested. Multi-vendor canonical mapping. Real-time and analytical layers separated with clear sync patterns.

References

Mehrotra, V. (1997). "Ringing Up Big Business." OR/MS Today, 24(4). — Foundational work on interval-level staffing models.
Gans, N., Koole, G., & Mandelbaum, A. (2003). "Telephone Call Centers: Tutorial, Review, and Research Prospects." Manufacturing & Service Operations Management, 5(2), 79–141.
Kimball, R. & Ross, M. (2013). The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling. 3rd ed. Wiley. — Star schema design patterns.
Cleveland, B. (2012). Call Center Management on Fast Forward. ICMI Press. — Interval planning and Erlang application.

Anonymous

Search