WFM Data Governance and Quality

From WFM Labs

WFM Data Governance and Quality addresses the organizational and technical practices that ensure WFM systems operate on trustworthy data. Every WFM output — forecasts, schedules, adherence scores, staffing recommendations — is only as reliable as its input data. This page provides the framework for governing that data across the WFM ecosystem.

Overview

WFM is uniquely sensitive to data quality problems because its outputs are operationally binding. A forecast error becomes overstaffing or understaffing the next day. A schedule built on phantom agents creates real coverage gaps. An adherence score calculated from misaligned timestamps falsely flags compliant agents.

Unlike BI dashboards where a data error produces a wrong number on a screen, WFM data errors produce wrong decisions that affect labor costs, service levels, and employee experience. This makes data governance not a compliance exercise but an operational necessity.

Data Quality Dimensions for WFM

Completeness

Every expected data point must exist. Missing data in WFM creates silent failures — the system produces results that look valid but are built on partial information.

Data Element | Completeness Check | Impact of Missing Data
Interval volume | 96 intervals per day per queue (15-min grain) | Missing intervals appear as zero volume → forecast underestimates demand
Agent schedule | Every active agent has a schedule for each scheduled day | Unscheduled agents appear as unstaffed → coverage calculations undercount supply
Contact records | CDR count matches ACD contact counter | Missing contacts → understated volume → understated staffing
Agent skills | Every agent has at least one skill assigned | Skill-less agents are unroutable → phantom staffing
AHT components | Talk + hold + ACW = total handle time | Missing ACW → understated AHT → understated staffing

Detection pattern: Run daily completeness checks that compare expected record counts against actual counts. For intervals: expected = queues × 96 × days. For agents: expected = active_agents from HRIS. Alert on any deficit > 1%.
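The daily check described above can be sketched as follows; the function and its threshold default are illustrative, not a specific vendor API:

```python
INTERVALS_PER_DAY = 96  # 15-minute grain: 24 hours x 4 intervals


def completeness_deficit(actual_counts: dict[str, int], queues: list[str],
                         days: int, threshold: float = 0.01) -> list[str]:
    """Return alert strings for queues whose interval-record count
    falls short of the expected count by more than `threshold` (1% default)."""
    expected = INTERVALS_PER_DAY * days
    alerts = []
    for q in queues:
        actual = actual_counts.get(q, 0)
        deficit = (expected - actual) / expected
        if deficit > threshold:
            alerts.append(f"{q}: {actual}/{expected} intervals ({deficit:.1%} missing)")
    return alerts


# One week of data: expected = 96 x 7 = 672 intervals per queue.
# "billing" is missing 32 intervals (4.8%), "tech" is complete.
alerts = completeness_deficit({"billing": 640, "tech": 672},
                              ["billing", "tech"], days=7)
```

The same pattern generalizes to the agent-schedule check: expected count comes from HRIS active headcount rather than `queues × 96 × days`.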

Accuracy

Data values must reflect reality.

Data Element | Accuracy Risk | Verification Method
Handle time | ACW timer left running (agent forgot to close) → inflated AHT | Cap at P99; flag contacts > 2× median AHT for review
Contact volume | Transfers double-counted as separate contacts | Reconcile with unique ANI/session count; compare ACD volume report vs CDR count
Agent state duration | Clock drift between ACD and WFM servers | NTP sync validation; compare state duration totals against shift length
Forecast vs actuals | Forecast generated with wrong timezone → shifted by hours | Compare forecast peak-hour to actual peak-hour; shift should be < 1 interval
Shrinkage rates | Planned shrinkage includes categories that aren't actually shrinkage | Audit shrinkage categories quarterly; validate against actual non-productive time

Timeliness

Data must arrive fast enough for its intended use.

Data Feed | Required Freshness | Common Failure
Agent state (adherence) | < 2 seconds | ACD event API down; webhook queue backing up; network latency
Queue metrics (wallboard) | < 10 seconds | Polling interval too long; API rate-limited
Contact records (intraday) | < 5 minutes | CDR generation delayed; batch job stuck
HRIS updates (scheduling) | < 24 hours | HRIS sync job failed silently; delta detection broken
Forecast refresh | < 15 minutes for intraday | Reforecast engine crashed; model training timed out

Monitoring pattern: Track data freshness for each feed. Maintain a last_successful_sync timestamp per integration. Alert when freshness exceeds the required threshold by 2×.

Consistency

The same entity must be represented the same way across all systems.

The double-count problem: A transferred contact appears as two contacts in the ACD CDR but as one contact in the CRM. If WFM uses ACD volume and CRM resolution rate, the metrics are inconsistent — 200 ACD contacts with 150 CRM resolutions looks like 75% resolution, but 50 of those ACD contacts are transfers, so the real resolution rate is 100%.

The identity problem: Agent "Jane Smith" is agent_id "a-4821" in the ACD, employee_id "E-10042" in HRIS, and user "jsmith@company.com" in the QA system. If the mapping table has a stale entry, Jane's QA scores don't appear on her WFM profile, and her schedule adherence can't be reconciled with her HRIS contracted hours.

The timezone problem: The ACD stores timestamps in UTC. The WFM system converts to site-local time. The HRIS uses corporate HQ timezone. During DST transitions, a 2:30 AM agent state event could map to two different intervals depending on which timezone is applied first.

Common Data Quality Issues

Timezone Mismatches

The most pervasive data quality issue in multi-site WFM operations.

Scenario: Site A is in Eastern Time (UTC-5). Site B is in Pacific Time (UTC-8). The ACD reports all timestamps in UTC. The WFM system's "9:00 AM" interval must map to different UTC ranges for each site.

Failure modes:

  • Forecast trained on UTC intervals without site-local alignment → peak hour is wrong
  • Schedule published in site-local time but adherence evaluated against raw UTC timestamps → every agent appears hours out of adherence (5 hours at Site A, 8 at Site B)
  • DST transition: 2:00 AM event on "spring forward" day doesn't exist in local time; "fall back" creates duplicate 1:00–2:00 AM intervals

Prevention:

  • Store all timestamps in UTC with IANA timezone metadata
  • Convert to local time only at the presentation layer
  • Pre-compute interval boundaries in UTC for each site, accounting for DST
  • Test DST transitions explicitly in your integration test suite — twice per year
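The store-UTC, convert-at-presentation rule can be sketched with Python's standard `zoneinfo` (the function name and interval-label format are illustrative):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def local_interval(utc_ts: datetime, site_tz: str, grain_min: int = 15) -> str:
    """Map a stored UTC timestamp to its site-local 15-minute interval label.

    Conversion happens only here, at the presentation layer; storage stays UTC,
    so DST transitions cannot corrupt the underlying data.
    """
    local = utc_ts.astimezone(ZoneInfo(site_tz))
    minute = (local.minute // grain_min) * grain_min
    return local.replace(minute=minute, second=0, microsecond=0).strftime(
        "%Y-%m-%d %H:%M %Z")


# Spring-forward day 2024-03-10 in US Eastern: 2:30 AM local does not exist.
# 07:30 UTC lands cleanly in the 03:30 EDT interval after the jump.
ts = datetime(2024, 3, 10, 7, 30, tzinfo=timezone.utc)
print(local_interval(ts, "America/New_York"))  # → 2024-03-10 03:30 EDT
```

Because the ambiguous or nonexistent local hour is never used as a storage key, the spring-forward gap and fall-back duplicate resolve deterministically.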

Double-Counted Transfers

Scenario: A customer calls billing (Queue A). The billing agent transfers to tech support (Queue B). The ACD generates two CDRs: one for Queue A (talk time = 45 seconds, transfer = true) and one for Queue B (talk time = 320 seconds, disposition = resolved).

Impact on WFM:

  • Volume inflated by 100% for the transferred contact (counted in both queues)
  • AHT for Queue A is understated (45 seconds of transfer-out vs a typical 300-second handle)
  • Staffing for Queue A is overstated (the inflated volume and deflated AHT partially cancel, but not exactly)

Remediation:

  • For volume: Count arrivals to Queue A. Separately count arrivals to Queue B. Do not sum them for "total contacts" without deduplicating transfers. Use the transfer correlation ID or session ID to identify linked contacts.
  • For AHT: Decide on a convention. Option 1: attribute full handle time to the originating queue (customer's perspective). Option 2: attribute handle time to each queue separately (agent's perspective). Document the convention and apply it consistently.
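The volume deduplication can be sketched as a set count over a session identifier; the `session_id` field name is illustrative and stands in for whatever transfer correlation ID your ACD emits:

```python
def dedup_volume(cdrs: list[dict]) -> int:
    """Count unique customer contacts by collapsing transfer legs
    that share the same correlation/session ID."""
    return len({cdr["session_id"] for cdr in cdrs})


# The billing-to-tech-support transfer from the scenario above:
cdrs = [
    {"session_id": "s1", "queue": "billing", "transfer": True},   # leg 1
    {"session_id": "s1", "queue": "tech", "transfer": False},     # leg 2, same contact
    {"session_id": "s2", "queue": "billing", "transfer": False},  # separate contact
]
# Raw CDR count is 3; deduplicated contact count is 2.
```

Per-queue arrival counts can still use the raw legs; only cross-queue totals need this deduplication.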

Missing After-Call Work

Scenario: Agents bypass the ACW state by going directly to "available" after a call. The ACD records zero ACW time.

Impact: AHT is understated by 30–60 seconds per contact (typical ACW duration). Over a day, this understatement reduces the Erlang-calculated staffing requirement by 5–10%, creating chronic understaffing.

Detection: Compare the percentage of contacts with zero ACW against a baseline. If historically 5% of contacts have zero ACW and the rate suddenly jumps to 30%, either the agents' desktop configuration changed or agents have found a workaround.

Prevention: Configure the ACD to enforce minimum ACW duration (even 5 seconds) so every contact generates an ACW record. If the business allows immediate availability, accept the AHT measurement gap and add a fixed ACW offset to forecasting inputs.
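The baseline comparison above reduces to a simple threshold check; the baseline rate and tolerance multiplier here are illustrative defaults, not prescribed values:

```python
def acw_drift(zero_acw_contacts: int, total_contacts: int,
              baseline_rate: float = 0.05, tolerance: float = 2.0) -> bool:
    """Flag when today's zero-ACW rate exceeds `tolerance` times the
    historical baseline rate, indicating a config change or workaround."""
    if total_contacts == 0:
        return False
    return (zero_acw_contacts / total_contacts) > tolerance * baseline_rate
```

With the defaults, the check fires once more than 10% of contacts show zero ACW against a 5% historical baseline.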

Phantom Agents

Scenario: Agent terminated in HRIS on Friday. WFM sync runs Monday. Agent has published schedules for the next two weeks. Schedule shows adequate coverage, but a phantom shift covers nobody.

Impact: Understaffing on the day of execution. The severity scales linearly — 5 phantom agents in a 100-agent center means 5% understaffing during their scheduled shifts.

Prevention:

  • Run HRIS-to-WFM reconciliation daily, not weekly
  • Flag any agent scheduled for future dates who is not "active" in HRIS
  • Automate: termination event triggers immediate schedule unpublish for the agent

Data Lineage

Tracing a Forecast

Every forecast number should be traceable back to its source data:

Staffing requirement (45 agents for Queue A, 10:00-10:15 Monday)
  ← Erlang C calculation
    ← Inputs: forecast volume (142 contacts), forecast AHT (295 sec),
       SL target (80/20), shrinkage (32%)
      ← Forecast volume (142)
        ← Model: ARIMA(2,1,1) trained on 52 weeks of historical volume
          ← Historical volume source: ACD CDRs, 15-min aggregation
            ← Data quality: 99.8% interval completeness,
               transfers deduplicated, outliers capped at P99
      ← Forecast AHT (295 sec)
        ← Model: 8-week weighted moving average
          ← Historical AHT source: ACD CDRs, ACW included
      ← Shrinkage (32%)
        ← Components: breaks (8%), training (4%), coaching (3%),
           meetings (2%), PTO (6%), absenteeism (5%), other (4%)
          ← Source: 12-week rolling actual shrinkage from adherence data

Why lineage matters: When the Monday 10:00 AM interval is understaffed by 8 agents, you need to determine if the forecast was wrong (volume was 180, not 142), the AHT was wrong (actual was 340 seconds, not 295), the shrinkage was wrong (38%, not 32%), or the schedule was wrong (only 38 agents scheduled against a 45-agent requirement). Without lineage, you debug by guessing.

Lineage Implementation

  • Tag every forecast with a version ID and the source data date range
  • Store the model parameters used for each forecast version
  • Log shrinkage component values at the time the staffing calculation ran
  • Maintain an immutable audit log of forecast → staffing → schedule transformations
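One way to sketch the version-tagging idea is an immutable record attached to every published forecast; all field names here are illustrative, not a specific product schema:

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class ForecastVersion:
    """Immutable lineage record stored alongside each published forecast."""
    version_id: str
    queue: str
    model: str                  # e.g. "ARIMA(2,1,1)"
    model_params: dict          # parameters used for this version
    source_start: date          # historical range the model trained on
    source_end: date
    shrinkage_components: dict  # component values at calculation time
    created_at: datetime


fv = ForecastVersion(
    version_id="fc-2024-06-03-a",
    queue="billing",
    model="ARIMA(2,1,1)",
    model_params={"order": (2, 1, 1), "weeks_history": 52},
    source_start=date(2023, 6, 5),
    source_end=date(2024, 6, 2),
    shrinkage_components={"breaks": 0.08, "pto": 0.06},
    created_at=datetime(2024, 6, 3, 4, 0),
)
```

The `frozen=True` flag makes the record read-only after creation, which is what "immutable audit log" demands: debugging Monday's miss means reading this record, never editing it.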

Data Ownership and RACI

Data Domain | Responsible | Accountable | Consulted | Informed
ACD contact data | IT / Telecom | VP Technology | WFM, QA | BI, Finance
Agent master data | HRIS / HR Ops | CHRO | WFM, Payroll, IT | All
Schedule data | WFM team | WFM Director | Operations, HR | Agents, Payroll
Forecast data | WFM Forecasting | WFM Director | Operations, Finance | Scheduling, RT
Adherence data | WFM Real-Time | WFM Director | Operations | HR, QA
Quality scores | QA team | QA Director | WFM, Training | Operations
Shrinkage inputs | WFM Planning | WFM Director | HR, Training, Operations | Finance

Key ownership disputes and resolution:

  • Who owns AHT? The ACD produces it, WFM consumes it, QA influences it through coaching. Resolution: IT owns the data pipeline; WFM owns the metric definition (what's included/excluded); QA owns improvement initiatives.
  • Who owns the agent skill profile? HRIS tracks certifications, training tracks completions, WFM maps skills to queues, the ACD uses skills for routing. Resolution: HRIS is the system of record for skill certifications. WFM translates certifications to routing skills. Changes flow HRIS → WFM → ACD.
  • Who owns schedule changes? WFM publishes the schedule. Supervisors swap shifts. Agents trade shifts. The automation platform modifies activities. Resolution: WFM owns the schedule of record. All modifications flow through the WFM system API, with the modification source tagged for audit.

Privacy and Compliance

GDPR and Workforce Data

WFM systems process personal data extensively. Under GDPR (and similar regimes such as the CCPA, LGPD, and PIPA):

  • Data minimization: Collect only what's needed for WFM operations. Agent demographic data beyond what's required for scheduling (e.g., ethnicity, religion) should not be in the WFM system unless required for accommodation scheduling.
  • Purpose limitation: Data collected for scheduling cannot be repurposed for performance management without separate legal basis. Adherence data showing a bathroom break pattern is scheduling data, not performance data.
  • Right to erasure: When an agent leaves and requests deletion, their personal data must be removable from WFM systems. Practical approach: pseudonymize rather than delete — replace agent name with "Agent-XXXX" in historical records to preserve aggregate accuracy while removing PII.
  • Data retention limits: Keeping 10 years of detailed agent adherence data is hard to justify under data minimization. Define retention periods aligned with legitimate business need. See retention policies below.

PCI-DSS for Contact Centers

Contact centers handling payment data must comply with PCI-DSS. WFM implications:

  • WFM systems should never store payment card data; verify this during PCI scoping rather than assuming it
  • Screen recording paused during payment capture — this affects adherence if the "recording paused" state isn't mapped to a valid schedule activity
  • Agent desktop monitoring tools integrated with WFM must not capture card data from screen shares

Agent Monitoring Disclosure

Most jurisdictions require disclosure that agents are being monitored. WFM adherence monitoring qualifies:

  • Disclose adherence monitoring in employment agreements
  • In two-party consent jurisdictions, ensure agents acknowledge real-time state tracking
  • Union environments may have additional constraints — see Union Environments and WFM

Data Retention Policies

Data Category | Operational (Hot) | Analytical (Warm) | Compliance (Cold) | Justification
Contact detail records | 90 days | 2 years (aggregated to interval) | 7 years for regulated industries | Tax, audit, dispute resolution
Agent state events | 30 days | 1 year (daily adherence summaries) | Not retained | No regulatory requirement for second-level state data
Schedules (published) | 90 days + 8 weeks forward | 2 years | 7 years (shift records for labor law) | Wage-and-hour dispute evidence
Forecast versions | Current + 90 days of history | 2 years (for accuracy trending) | Not retained | No regulatory requirement for forecast history
QA evaluations | 90 days | 2 years | Per regulatory requirement | May be needed for dispute resolution
Agent PII | While employed + 90 days | Pseudonymized in historical records | Per jurisdiction | GDPR: minimum necessary; US: varies by state

Aggregation schedule:

Daily job:
  - Raw contact records > 90 days → aggregate to interval grain → archive raw
  - Raw agent state events > 30 days → aggregate to daily adherence summary → delete raw

Weekly job:
  - 15-minute interval data > 2 years → aggregate to daily grain → archive 15-min
  - Daily interval data > 5 years → aggregate to weekly grain → archive daily

Monthly job:
  - Audit: verify aggregation completeness (no gaps)
  - Report: storage consumption by data category
  - Compliance: flag any data past retention policy for review
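The selection step at the heart of the daily job can be sketched as a cutoff filter; the record schema (a `day` date field per raw record) is illustrative:

```python
from datetime import date, timedelta


def records_past_retention(records: list[dict], hot_days: int,
                           today: date) -> list[dict]:
    """Select raw records older than the hot-retention window.

    These are the rows the daily job aggregates to coarser grain
    and then archives or deletes, per the retention table above.
    """
    cutoff = today - timedelta(days=hot_days)
    return [r for r in records if r["day"] < cutoff]


# With a 90-day hot window ending 2024-06-01, the cutoff is 2024-03-03:
records = [
    {"day": date(2024, 1, 15), "queue": "billing", "volume": 142},  # past cutoff
    {"day": date(2024, 5, 1), "queue": "billing", "volume": 150},   # still hot
]
stale = records_past_retention(records, hot_days=90, today=date(2024, 6, 1))
```

The monthly audit job then verifies the complementary property: every record past the cutoff either has an aggregate row or an archive entry, with no gaps.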

Master Data Management

Agent Master

The agent master record is the single source of truth for who an agent is and what they can do:

Attribute | Source System | Sync Frequency | Conflict Resolution
Name, employee ID | HRIS | Daily | HRIS always wins
Team, site assignment | HRIS | Daily | HRIS always wins; WFM overrides for same-day moves logged as exceptions
Skills / proficiency | Training (certification) → WFM (routing skill mapping) | On certification event | Training certifies; WFM maps to routing skill; ACD consumes
Schedule group | WFM | Real-time | WFM owns
ACD agent ID | ACD provisioning | On hire/system change | IT provisions; mapping stored in WFM
Contact information | HRIS | Daily | HRIS always wins

Agent identity reconciliation: Run a daily reconciliation that joins HRIS active employees, ACD provisioned agents, and WFM agent records. Flag:

  • Active in HRIS but not provisioned in ACD (can't receive contacts)
  • Provisioned in ACD but not active in HRIS (potential security issue)
  • Active in HRIS but no WFM agent record (won't be scheduled)
  • WFM agent record but terminated in HRIS (phantom agent)
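The four flag conditions map directly onto set differences, assuming all three systems have first been resolved to a common employee ID (the function and key names are illustrative):

```python
def reconcile_identities(hris_active: set[str], acd_provisioned: set[str],
                         wfm_agents: set[str]) -> dict[str, set[str]]:
    """Daily three-way identity reconciliation across HRIS, ACD, and WFM.

    Keys correspond to the four flag conditions listed above.
    """
    return {
        "not_provisioned_in_acd": hris_active - acd_provisioned,  # can't receive contacts
        "orphaned_acd_account": acd_provisioned - hris_active,    # potential security issue
        "missing_wfm_record": hris_active - wfm_agents,           # won't be scheduled
        "phantom_agent": wfm_agents - hris_active,                # terminated but schedulable
    }


flags = reconcile_identities(
    hris_active={"E-10042", "E-10043", "E-10044"},
    acd_provisioned={"E-10043", "E-10044", "E-10099"},
    wfm_agents={"E-10044", "E-10050"},
)
```

The hard part in practice is the upstream ID mapping (ACD `a-4821` → HRIS `E-10042` → QA `jsmith@company.com`); once that table is trustworthy, the reconciliation itself is trivial.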

Queue Master

Attribute | Source | Notes
Queue ID | ACD | Unique identifier from ACD
Queue name | WFM | Business name may differ from ACD technical name; WFM maintains friendly names
Channel | ACD | Voice, chat, email, messaging
SL target | Business / Operations | Reviewed quarterly; stored in WFM
Concurrency | Business / Operations | Agents handling multiple simultaneous contacts
Skills required | WFM / Routing | Maps queues to agent skills
Forecast group | WFM | Groups queues for forecast modeling

Shift Catalog

Standardized shift definitions that constrain scheduling:

Attribute | Description
Shift ID | Unique identifier
Start time range | Earliest and latest start time (e.g., 7:00–9:00 AM)
Duration | Shift length in hours (e.g., 8.0, 10.0)
Break rules | Break count, duration, earliest/latest placement
Lunch rules | Duration, earliest/latest placement, paid/unpaid
Eligible schedule groups | Which agent groups can work this shift
Effective dates | When this shift pattern is valid

Common Pitfalls

  • No data quality monitoring. Data quality checks run once during implementation, then never again. Implement automated daily checks with alerting. A data quality dashboard reviewed weekly costs less than one bad forecast.
  • Governance by document only. Data ownership policies in a SharePoint document nobody reads. Embed governance in the system: automated reconciliation, enforced naming conventions, required metadata fields.
  • Treating HRIS as optional. "We'll maintain agent data directly in WFM." This creates a maintenance burden that scales linearly with headcount and eventually diverges from HRIS reality.
  • No retention policy enforcement. Policy says "retain 2 years." Database has 7 years of detail data because nobody implemented the cleanup job. Storage costs grow; query performance degrades; GDPR risk accumulates.
  • Ignoring the shrinkage data problem. Shrinkage is the hardest WFM input to measure accurately because it's composed of many small categories (breaks, training, coaching, meetings, PTO, absenteeism), each with its own data source and accuracy issues. Garbage shrinkage data → garbage staffing calculations → chronic over/understaffing.

Maturity Model Position

Level Characteristics
Foundational No formal data governance. Data quality issues discovered when reports look wrong. No retention policy. Agent data maintained in multiple systems with manual reconciliation. No data lineage.
Progressive Data ownership RACI documented. Automated daily quality checks on key metrics (completeness, volume reconciliation). Retention policy defined and partially enforced. Agent master managed in HRIS with automated sync to WFM.
Advanced Full data lineage from source to forecast to staffing decision. Automated reconciliation across all integrated systems with exception alerting. Retention policy enforced by automated jobs. Privacy impact assessment completed for WFM data. Data quality SLAs defined and monitored. Master data management with conflict resolution rules.


References

  • DAMA International. (2017). DAMA-DMBOK: Data Management Body of Knowledge. 2nd ed. Technics Publications. — Comprehensive data governance framework.
  • Redman, T. C. (2008). Data Driven: Profiting from Your Most Important Business Asset. Harvard Business Press. — Business case for data quality.
  • European Parliament. (2016). "General Data Protection Regulation (GDPR)." Regulation (EU) 2016/679.
  • PCI Security Standards Council. (2022). "PCI DSS v4.0." https://www.pcisecuritystandards.org/
  • Cleveland, B. (2012). Call Center Management on Fast Forward. ICMI Press. — Shrinkage measurement and forecasting accuracy.
  • Loshin, D. (2010). Master Data Management. Morgan Kaufmann. — MDM patterns and conflict resolution.