ImpactAudit v2 | Repo Index

ImpactAudit v2

Decision-grade workforce risk and business impact modeling with conservative, audit-first design. Built for planning, prioritization, and executive decision support. Not for automated employment actions.

Production · Time-split validated · Leakage audits enforced · Versioned artifacts · Artifact-backed metrics

Executive summary

ImpactAudit v2 connects workforce risk signals to downstream business exposure using a staged, auditable pipeline. The system favors validity and governance over inflated metrics. It is designed to rank and prioritize cohorts, quantify exposure, and support scenario discussions with executives and finance.

Core idea: In low base-rate environments, the decision value is cohort prioritization and lift, not perfect individual prediction.

Design principles

  • Conservatism over optimism
  • Auditability over automation
  • Interpretability over complexity
  • Human review over prescriptions
  • Uncertainty surfaced, not hidden

Pipeline architecture

  • Stage 1: Employee-period attrition risk model (time-aware training, leakage checks)
  • Stage 2: BU-period aggregation layer (stable KPIs suitable for executive reporting)
  • Stage 3: Revenue and KPI exposure modeling (production)
  • Stage 4: Intervention effectiveness and ROI reporting (production)

Each stage writes versioned outputs and diagnostics to support traceability and governance. The mainline remains validation-first with reproducible artifacts.
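
The versioned-output discipline can be sketched as follows. This is a minimal illustration, not the repo's actual writer: the function name, directory layout, and manifest keys here are assumptions, but the idea matches the design (every stage output lands in a run-stamped directory with a manifest that pins its content hash).

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def write_versioned_artifact(out_dir: Path, stage: str, payload: dict) -> Path:
    """Write a stage output plus a small manifest so every run is traceable."""
    run_id = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    stage_dir = out_dir / stage / run_id
    stage_dir.mkdir(parents=True, exist_ok=True)

    # Canonical serialization so the content hash is stable across runs.
    body = json.dumps(payload, sort_keys=True).encode()
    (stage_dir / "output.json").write_bytes(body)

    manifest = {
        "stage": stage,
        "run_id": run_id,
        "content_sha256": hashlib.sha256(body).hexdigest(),
    }
    (stage_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return stage_dir
```

A downstream stage (or an auditor) can re-hash output.json and compare against the manifest to confirm nothing was edited after the run.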

Leakage-validated performance

In low-attrition environments, lift and prioritization are the decision-relevant metrics, and PR-AUC must be interpreted relative to the base rate. Higher-attrition environments can be supported after model recalibration.

Attrition model (current run)

  • AUC: 0.622 (manifest-backed)
  • PR-AUC: 0.0169 (manifest-backed); interpret as enrichment over baseline, not an absolute score
  • Split date: 2025-07-01. Horizon: 3 months. Test base rate: 0.75%.

Lift (typical guidance)

  • Lift@Top-10%: ~2.0x to 2.5x (operational)
  • Best for prioritization queues and cohort interventions
Important: unusually high test performance should be treated as leakage risk until audits pass. This repo prioritizes validity over inflated metrics.
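
Lift at the top decile is simple to compute; the sketch below shows the standard definition (the function name and tie handling are illustrative, not the repo's implementation).

```python
def lift_at_top_k(scores, labels, k_frac=0.10):
    """Lift of the top-k% riskiest cohort over the overall base rate.

    scores: model risk scores; labels: 1 = attrited, 0 = stayed.
    """
    n = len(scores)
    k = max(1, int(n * k_frac))
    # Rank by risk score, highest first.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    top_rate = sum(lbl for _, lbl in ranked[:k]) / k
    base_rate = sum(labels) / n
    return top_rate / base_rate if base_rate > 0 else float("nan")
```

A lift of 2.0x means the flagged decile attrites at twice the base rate, which is the enrichment that makes prioritization queues worthwhile at a 0.75% base rate.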

Why this model is decision-grade for its intended use

Decision capability in low-event workforce settings rests on the combination below, not on any single headline metric.

What matters (the combination)

  • Low base rate environments where raw accuracy is not the decision metric
  • Stable structural features that remain interpretable and governance-safe
  • Lift-driven prioritization to enrich high-risk cohorts for planning and focus
  • Artifact-backed validation with reproducible runs, manifests, and feature hashes
  • Governance constraints that prevent misuse and enforce human review
  • Scenario-aware outputs that support planning and impact discussions
Why this matters: that combination is rare in people analytics.

Where most systems fail

  • Chasing higher AUC with leaky features
  • Overfitting volatile behavioral signals
  • Making implicit causal claims
  • Encouraging misuse at the individual level

Quickstart

Commands below are representative. Adjust paths to match your local environment and config layout. Keep time splits and leakage audits enabled.

1. Create environment

python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate

pip install -r requirements.txt

2. Configure

  • Confirm config.toml points to input and output locations
  • Verify date parsing and consistent time zone assumptions
  • Verify stable IDs: employee_id, business_unit_id
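
A config along these lines covers the three checks above. The section and key names here are hypothetical; match them to your repo's actual config schema.

```toml
# Hypothetical layout; actual keys depend on your config schema.
[io]
input_dir     = "data"
output_dir    = "reports"
artifacts_dir = "artifacts"

[time]
split_date     = "2025-07-01"   # train/test boundary; keep time-aware splits on
horizon_months = 3
timezone       = "UTC"          # one consistent zone for all date parsing

[ids]
employee_key      = "employee_id"
business_unit_key = "business_unit_id"
```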

3. Run the pipeline

python run_pipeline.py --config config.toml

4. Run Stage 4 reports

python run_gap4.py --config config.toml
Rule: if a run shows unusually high test performance, treat it as leakage risk until audits pass.
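
That rule is easy to automate as a gate in CI or in the run script. The threshold and messages below are illustrative; tune them to your base rate and historical runs.

```python
def leakage_guard(test_auc: float, threshold: float = 0.90) -> str:
    """Flag suspiciously high test AUC as leakage risk until audits pass.

    The 0.90 threshold is illustrative, not a repo constant; in a 0.75%
    base-rate setting, an AUC near it almost always means a leaky feature.
    """
    if test_auc >= threshold:
        return "HOLD: treat as leakage risk; rerun audits before publishing"
    return "OK: performance within expected range for a low base-rate setting"
```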

Data inputs

  • employees.csv: employee_id, hire_date, term_date, business_unit_id, optional job_family, level, location, manager_id
  • comp_history.csv: employee_id, effective_date, base_pay, bonus_target, compa_ratio, merit_pct, promo_flag
  • perf_history.csv: employee_id, review_date, rating, calibrated_rating, goal_attainment_pct
  • business_kpi.csv: business_unit_id, period_start, revenue, operating_margin_pct, productivity_index, avg_vacancy_days, quality_incidents, headcount_proxy
Data quality stance: the pipeline should degrade conservatively and surface warnings rather than silently produce confident outputs.
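
That stance can be enforced at ingest with a schema check that warns loudly instead of failing silently. This is a minimal sketch using only the standard library; the REQUIRED map covers one file here and the column sets are taken from the list above.

```python
import csv
import io
import warnings

# Required columns per input file (subset shown; extend per the list above).
REQUIRED = {
    "employees.csv": {"employee_id", "hire_date", "term_date", "business_unit_id"},
}

def check_columns(name: str, text: str) -> bool:
    """Warn (rather than fail silently) when a required column is missing."""
    header = set(next(csv.reader(io.StringIO(text))))
    missing = REQUIRED.get(name, set()) - header
    if missing:
        warnings.warn(f"{name}: missing columns {sorted(missing)}")
        return False
    return True
```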

Outputs (aligned to repo)

Stage 1 outputs (employee-level, referenced only)

  • Location: artifacts/attrition/<client_id>/<run_id>/
  • manifest.json: run metadata, split date, base rates, AUC, PR-AUC, config, feature hash
  • feature_cols.json: exact feature list for the run
  • Model bundle files (model binary and any supporting files) stored alongside the manifest
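
The feature hash in manifest.json lets a scoring run prove it used exactly the features the training run pinned. The hashing scheme below is illustrative (the repo's actual scheme may differ), but the verification pattern is the point.

```python
import hashlib
import json

def feature_hash(feature_cols: list[str]) -> str:
    """Order-sensitive hash of the feature list; scheme is illustrative."""
    return hashlib.sha256(json.dumps(feature_cols).encode()).hexdigest()

def verify_run(manifest: dict, feature_cols: list[str]) -> bool:
    """Check a scoring run uses exactly the features the manifest pinned."""
    return manifest.get("feature_hash") == feature_hash(feature_cols)
```

Reordered or substituted features fail the check, which is what you want before any Stage 2 aggregation consumes the scores.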

Stage 2 outputs (business-unit aggregation)

  • Primary file: bu_period_features.csv
  • Typical location: reports/ or configured output directory
  • Contains BU-period features derived from Stage 1 scoring and structure
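
The aggregation step is conceptually a group-by over (business_unit_id, period). A minimal sketch, assuming a fixed high-risk cutoff (the 0.5 default and the KPI names are illustrative, not the repo's columns):

```python
from collections import defaultdict
from statistics import mean

def aggregate_bu_period(rows, high_risk_cutoff=0.5):
    """Roll employee-level risk scores up to stable BU-period KPIs.

    rows: iterable of (business_unit_id, period, risk_score) tuples.
    """
    groups = defaultdict(list)
    for bu, period, score in rows:
        groups[(bu, period)].append(score)
    return {
        key: {
            "headcount": len(scores),
            "mean_risk": mean(scores),
            "high_risk_share": sum(s >= high_risk_cutoff for s in scores) / len(scores),
        }
        for key, scores in groups.items()
    }
```

Aggregates like these are far more stable than individual scores, which is why Stage 2 is what executive reporting consumes.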

Stage 3 outputs (revenue and KPI exposure)

  • Primary file: bu_period_scored.csv
  • Scenario outputs: kpi_impact_output.csv, kpi_impact_summary.csv
  • Typical location: reports/ or configured output directory
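
The exposure arithmetic for one BU-period reduces to expected leavers times the revenue they carry during a vacancy/ramp gap. This sketch is an associational what-if in the spirit of Stage 3, not the repo's model; the gap length and per-head revenue inputs are assumptions you would source from business_kpi.csv.

```python
def revenue_exposure(headcount, expected_attrition_rate, revenue_per_head,
                     replacement_gap_months=3):
    """Conservative revenue-at-risk for one BU-period (association, not causal)."""
    expected_leavers = headcount * expected_attrition_rate
    monthly_rev_per_head = revenue_per_head / 12
    return expected_leavers * monthly_rev_per_head * replacement_gap_months
```

For example, a 100-person BU at the 0.75% test base rate and $240k revenue per head carries $45k of exposure over a three-month gap.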

Stage 4 outputs (intervention effectiveness and ROI)

  • Stage 4 writes two report files (paths printed by run_gap4.py)
  • Typical location: reports/ or configured output directory
  • Effectiveness report: intervention effectiveness vs controls
  • ROI report: conservative financial return estimates
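
"Conservative" in the ROI report means understating benefits by construction. One way to encode that, sketched here with an explicit effectiveness haircut (the 0.5 default and the function shape are illustrative, not the repo's method):

```python
def conservative_roi(avoided_exits, value_per_retention, program_cost,
                     effectiveness_haircut=0.5):
    """ROI with an explicit haircut on estimated effectiveness.

    The haircut forces the estimate to understate, not inflate.
    Returns (net_benefit, roi_ratio).
    """
    benefit = avoided_exits * value_per_retention * effectiveness_haircut
    net = benefit - program_cost
    return net, (benefit / program_cost if program_cost > 0 else float("inf"))
```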

Diagnostics

  • debug_hard_excluded_cols.csv, debug_change_cols_detected.csv, debug_static_cols_used.csv
  • debug_single_feature_scan_no_change.csv
  • Typical location: analysis/ or configured output directory
Note: If your repo uses different exact filenames for Stage 4 outputs, keep the “paths printed by run_gap4.py” language and list the two conceptual outputs. That stays accurate across config changes.

Repository structure

impactaudit/
  data/                    # Curated inputs (not raw dumps)
  src/impactaudit/         # Core library code
  jobs/                    # Entry points (train, score, aggregate)
  artifacts/               # Versioned model bundles and manifests
  analysis/                # Audits, ablations, diagnostics
  reports/                 # Executive-ready outputs (configured)
  docs/                    # Methodology, governance, legal

Governance

  • Intended for workforce planning, scenario modeling, and executive decision support.
  • Not intended for automated employment decisions or disciplinary action.
  • KPI what-if outputs are model-based associations, not causal claims.
  • Apply appropriate access controls and human review.
Prohibited use: do not use this system to make or support individual employment decisions.

Status

  • Architecture complete
  • Leakage suite implemented and enforced
  • Production-appropriate for intended use: structural risk stratification, planning, and impact exposure
  • Further gains typically require new data signals, not more tuning

Author: Ryan Brush
Enterprise Data, People Analytics, and AI Modeling