WFM Data Models and Schemas
WFM Data Models and Schemas defines the canonical data entities, relationships, and schema design patterns that underpin workforce management systems. Every WFM capability — forecasting, scheduling, adherence, reporting — depends on a well-structured data layer. This page provides the engineering reference for building or integrating that layer.
Overview
A WFM system's data model must represent two fundamentally different concerns simultaneously: planning (what should happen) and reality (what actually happened). The schema must bridge these domains at every level — from individual agent state to aggregate queue performance — while maintaining the temporal grain needed for Erlang-based staffing calculations.
The entities described here appear in every WFM system regardless of vendor. Vendor-specific implementations vary in naming and normalization, but the conceptual model is stable across NICE, Verint, Genesys, Calabrio, and custom-built systems.
Canonical Entities
Agent
The central human entity. Every WFM operation ultimately maps to or from an agent record.
| Field | Type | Description |
|---|---|---|
| agent_id | UUID/string | Unique identifier (often maps to ACD agent ID) |
| employee_id | string | HRIS employee number (may differ from agent_id) |
| display_name | string | Name for wallboards and reports |
| skills | array<skill_id> | Routing skills the agent can handle |
| skill_proficiency | map<skill_id, float> | Proficiency weight per skill (0.0–1.0) |
| team_id | FK → Team | Organizational assignment |
| site_id | FK → Site | Physical or virtual location |
| hire_date | date | Used for tenure calculations and ramp curves |
| employment_type | enum | Full-time, part-time, contingent, GigCX |
| schedule_group_id | FK → ScheduleGroup | Shift pattern eligibility |
| timezone | string (IANA) | Agent's local timezone (critical for scheduling) |
| status | enum | Active, on-leave, terminated, in-training |
Design note: The agent entity must serve as a join key between ACD data (uses agent_id), HRIS data (uses employee_id), and scheduling data (uses either). Maintaining a reliable mapping between these identifiers is one of the hardest data quality problems in WFM.
Interval
The atomic time unit for WFM planning. Nearly all WFM math operates at the interval level.
| Field | Type | Description |
|---|---|---|
| interval_id | bigint/composite | Unique identifier (often date + time-slot index) |
| interval_start | timestamp(tz) | Start of interval in UTC |
| interval_end | timestamp(tz) | End of interval in UTC |
| interval_length_minutes | int | 15 or 30 (rarely 60) |
| day_of_week | int | 1–7 for pattern matching |
| is_holiday | boolean | Flag for special-day handling |
| fiscal_week | int | Business calendar alignment |
Why 15 minutes is standard: The Erlang C formula assumes arrivals follow a Poisson process with a stationary arrival rate within the interval. At 30 minutes, arrival rates often shift enough to violate stationarity — morning ramp-up, lunch dips. At 15 minutes, the stationarity assumption holds for most contact center patterns. Going below 15 minutes (e.g., 5-minute intervals) improves theoretical accuracy but creates practical problems: too many intervals to manage manually, most ACD platforms report at 15-minute minimum, and the staffing granularity exceeds what scheduling can actually deliver. The 15-minute interval is the equilibrium between mathematical precision and operational reality.
Queue
The logical grouping of work demand that agents serve.
| Field | Type | Description |
|---|---|---|
| queue_id | UUID/string | Maps to ACD queue/skill |
| queue_name | string | Display name |
| channel | enum | Voice, chat, email, messaging, back-office |
| service_level_target | float | Target SL (e.g., 0.80) |
| service_level_seconds | int | Target answer time (e.g., 20 seconds) |
| asa_target | float | Target average speed of answer |
| occupancy_ceiling | float | Max acceptable occupancy (burnout guard) |
| concurrency | int | Simultaneous contacts per agent (1 for voice, 2-4 for chat) |
| priority | int | Routing priority relative to other queues |
Forecast
A prediction of future demand for a specific queue and interval.
| Field | Type | Description |
|---|---|---|
| forecast_id | UUID | Unique forecast version identifier |
| queue_id | FK → Queue | What queue this predicts |
| interval_id | FK → Interval | What interval this covers |
| forecast_version | string | Version label (e.g., "2026-W20-final") |
| volume | float | Predicted contact count |
| aht_seconds | float | Predicted average handle time |
| volume_lower | float | Lower confidence bound (e.g., P10) |
| volume_upper | float | Upper confidence bound (e.g., P90) |
| confidence_level | float | Confidence interval width (e.g., 0.80) |
| method | string | Algorithm used (ARIMA, Prophet, ensemble) |
| created_at | timestamp(tz) | When this forecast was generated |
| is_active | boolean | Currently used for staffing |
Design note: Store multiple forecast versions. The ability to compare "forecast at publish time" vs "forecast at execution time" vs "actuals" is fundamental to forecast accuracy measurement. Never overwrite — always version.
Schedule
The planned assignment of agents to activities across intervals.
| Field | Type | Description |
|---|---|---|
| schedule_id | UUID | Schedule version identifier |
| agent_id | FK → Agent | Who is scheduled |
| schedule_date | date | Calendar date |
| shift_start | timestamp(tz) | Shift start time |
| shift_end | timestamp(tz) | Shift end time |
| activities | array<ScheduleActivity> | Ordered list of activities within the shift |
Each ScheduleActivity contains:
| Field | Type | Description |
|---|---|---|
| activity_type | enum | Queue-work, break, lunch, training, coaching, meeting, admin |
| start_time | timestamp(tz) | Activity start |
| end_time | timestamp(tz) | Activity end |
| queue_ids | array<queue_id> | Which queues (for queue-work activities) |
| is_paid | boolean | Paid vs unpaid (payroll integration) |
| is_adherence_tracked | boolean | Whether adherence monitoring applies |
Adherence Event
The comparison between planned schedule and actual agent behavior.
| Field | Type | Description |
|---|---|---|
| event_id | UUID | Unique event identifier |
| agent_id | FK → Agent | Which agent |
| timestamp | timestamp(tz) | When the state was observed |
| planned_state | enum | What the schedule says (queue-work, break, etc.) |
| actual_state | enum | What the ACD reports (available, on-call, after-call-work, aux, logged-off) |
| is_adherent | boolean | Computed: does actual match planned? |
| variance_seconds | int | How long the deviation lasted |
| exception_code | string | If approved (e.g., "manager-approved-late-break") |
Contact
An individual interaction between a customer and the contact center.
| Field | Type | Description |
|---|---|---|
| contact_id | UUID/string | Unique contact identifier from ACD |
| queue_id | FK → Queue | Queue that handled the contact |
| agent_id | FK → Agent | Agent who handled (null if abandoned) |
| interval_id | FK → Interval | Interval of arrival |
| channel | enum | Voice, chat, email, messaging |
| arrived_at | timestamp(tz) | When contact entered queue |
| answered_at | timestamp(tz) | When agent connected (null if abandoned) |
| completed_at | timestamp(tz) | When interaction ended |
| handle_time_seconds | int | Total handle time (talk + hold + ACW) |
| talk_time_seconds | int | Talk/active time only |
| hold_time_seconds | int | Hold time |
| acw_seconds | int | After-call work time |
| wait_time_seconds | int | Time in queue before answer |
| is_abandoned | boolean | Customer disconnected before answer |
| transfer_count | int | Number of transfers (critical for deduplication) |
| disposition_code | string | Outcome classification |
Entity Relationships
The core relationships form a hierarchy and a set of temporal associations:
Organizational hierarchy:
- Agent → Team → Site → Business Unit
- Agent → ScheduleGroup (shift pattern eligibility)
- Agent → Skill[] (many-to-many with proficiency)
Temporal associations:
- Schedule covers Agent × Date → ScheduleActivity[]
- Forecast predicts Queue × Interval → volume, AHT
- Contact arrives at Queue × Interval, handled by Agent
- AdherenceEvent compares Schedule(Agent, Interval) vs ActualState(Agent, Interval)
Key joins for analytics:
- Contact JOIN Agent ON agent_id → agent-level performance
- Contact JOIN Interval ON interval_id → interval-level volume
- Forecast JOIN Contact ON (queue_id, interval_id) → forecast accuracy
- Schedule JOIN AdherenceEvent ON (agent_id, interval_id) → adherence rate
Schema Design Patterns
Star Schema for Analytics (OLAP)
The dominant pattern for WFM reporting and historical analysis:
Fact tables:
- fact_contact — one row per contact: handle time, wait time, abandoned flag, disposition. Grain: individual contact.
- fact_adherence — one row per agent per interval: planned state, actual state, adherent flag, variance seconds. Grain: agent × interval.
- fact_staffing — one row per queue per interval: required staff, scheduled staff, actual staff, forecast volume, actual volume. Grain: queue × interval.
Dimension tables:
- dim_agent — agent attributes (team, site, skills, tenure, employment type). Slowly-changing dimension Type 2 for tracking team changes over time.
- dim_interval — interval attributes (date, time-of-day, day-of-week, holiday flag, fiscal period). Pre-populated for 2+ years forward.
- dim_queue — queue attributes (channel, service level target, concurrency, priority).
- dim_activity — schedule activity types (queue-work, break, training, etc.).
- dim_date — calendar dimension with business-calendar fields (fiscal week, pay period, holiday).
Example query — forecast accuracy by queue and week:
SELECT
dq.queue_name,
di.fiscal_week,
SUM(fs.forecast_volume) AS forecast_vol,
SUM(fs.actual_volume) AS actual_vol,
ABS(SUM(fs.forecast_volume) - SUM(fs.actual_volume))
/ NULLIF(SUM(fs.actual_volume), 0) AS mape
FROM fact_staffing fs
JOIN dim_queue dq ON fs.queue_key = dq.queue_key
JOIN dim_interval di ON fs.interval_key = di.interval_key
WHERE di.calendar_date BETWEEN '2026-01-01' AND '2026-03-31'
GROUP BY dq.queue_name, di.fiscal_week
ORDER BY dq.queue_name, di.fiscal_week;
Normalized Schema for Operations (OLTP)
The operational WFM database that powers real-time scheduling, adherence, and intraday management uses a normalized (3NF) design:
- Separate tables for agents, schedules, schedule_activities, forecasts, forecast_versions
- Foreign key constraints enforced to prevent orphaned records
- Indexes on (agent_id, schedule_date) and (queue_id, interval_start) for fast lookups
- Temporal columns stored in UTC with timezone metadata — never local time in the database
Denormalized for Real-Time
Real-time adherence and wallboard systems need sub-second reads. Common pattern: maintain a materialized current-state table updated by event stream:
agent_current_state: agent_id UUID PRIMARY KEY current_acd_state enum -- available, on-call, acw, aux, logged-off state_since timestamp -- when they entered this state planned_state enum -- what schedule says right now is_adherent boolean -- computed on write queue_id UUID -- current queue assignment updated_at timestamp -- last event processed
This table is overwritten on every state change (not appended). Historical adherence events go to the fact table; this table serves only the real-time view.
Historical Data Retention
| Data Type | Operational (hot) | Analytical (warm) | Archive (cold) |
|---|---|---|---|
| Contact detail | 90 days | 2–3 years (aggregated to interval) | 7+ years for regulated industries |
| Agent state events | 30 days | 1 year (aggregated to interval adherence) | Typically not archived |
| Schedules | Current + 8 weeks forward, 90 days back | 2 years (for pattern analysis) | Per retention policy |
| Forecasts | Current + all versions for active period | 2 years (for accuracy trending) | Compress to weekly aggregates |
| Interval aggregates | 2 years at 15-min grain | 5 years at daily grain | 7+ years at weekly grain |
Aggregation strategy: Roll up interval-level data to daily, weekly, and monthly aggregates on a scheduled basis. Keep the finest grain for the shortest period. Most WFM analytics beyond 6 months operate at daily or weekly grain anyway — storing 3 years of 15-minute data is expensive and rarely queried.
Data Quality: Common Issues
Missing Intervals
ACD platforms occasionally drop intervals during outages or maintenance windows. A missing interval looks like zero volume, which poisons forecasts. Detection: compare expected interval count (96 per day at 15-min grain) against actual count. Mitigation: interpolate from adjacent intervals or flag as excluded from forecast training data.
Timezone Errors
The single most common data quality issue in WFM. The ACD reports in UTC. The WFM system stores in site-local time. The HRIS uses corporate headquarters timezone. DST transitions create duplicate or missing hours twice per year. Prevention: store everything in UTC internally. Convert to local time only at the presentation layer. Carry the IANA timezone identifier with every agent and site record.
Duplicate Contacts
Transfers generate multiple contact records for what the customer experiences as a single interaction. A contact transferred twice appears as three records, inflating volume by 200%. Detection: join on transfer correlation ID (if available) or look for contacts from the same ANI within 60 seconds. Impact: overstated volume → overstated staffing requirements → wasted labor budget.
Orphaned Schedules
Agent terminated in HRIS but still has future schedules in WFM. Creates phantom staffing — the schedule shows adequate coverage, but nobody will work those shifts. Detection: daily reconciliation of active agents between HRIS and WFM schedule. Impact: understaffing on the day of execution.
AHT Outliers
A single 3-hour call in a 15-minute interval with 20 contacts can skew the interval AHT from 300 seconds to 1,200 seconds. Mitigation: cap AHT at a configurable percentile (P95 or P99) for forecasting purposes. Store the raw value but use the capped value in Erlang calculations.
Common Pitfalls
- Designing the schema around a single vendor's export format. Vendor schemas are optimized for their product, not for your analytics. Build a canonical model and map vendors into it.
- Storing local times in the database. This creates unsolvable problems during DST transitions and multi-site operations. UTC internally, always.
- Skipping the forecast version dimension. Without versioning, you cannot measure forecast accuracy over time or compare algorithm performance.
- Over-normalizing the real-time layer. Joins kill real-time performance. Denormalize aggressively for anything that needs sub-second response.
- Ignoring the agent identity problem. The mapping between ACD agent ID, HRIS employee ID, and WFM agent ID is fragile. Invest in a master identity service early.
Maturity Model Position
| Level | Characteristics |
|---|---|
| Foundational | Vendor-provided schema only. No canonical model. Data quality issues discovered ad hoc. No retention policy. |
| Progressive | Canonical entities defined. Star schema for analytics. Basic data quality checks. Retention policy documented. |
| Advanced | Event-sourced data layer. Automated quality monitoring. Schema versioned and tested. Multi-vendor canonical mapping. Real-time and analytical layers separated with clear sync patterns. |
See Also
- WFM Ecosystem Architecture
- WFM Technology Selection and Vendor Evaluation
- WFM Data Infrastructure and Integration Architecture
- Python for Workforce Management
- Erlang B
- Traffic Intensity and Server Utilization
References
- Mehrotra, V. (1997). "Ringing Up Big Business." OR/MS Today, 24(4). — Foundational work on interval-level staffing models.
- Gans, N., Koole, G., & Mandelbaum, A. (2003). "Telephone Call Centers: Tutorial, Review, and Research Prospects." Manufacturing & Service Operations Management, 5(2), 79–141.
- Kimball, R. & Ross, M. (2013). The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling. 3rd ed. Wiley. — Star schema design patterns.
- Cleveland, B. (2012). Call Center Management on Fast Forward. ICMI Press. — Interval planning and Erlang application.
