ImpactAudit v2 | Repo Index

ImpactAudit v2

Decision-grade workforce risk and business impact modeling with conservative, audit-first design. Built for planning, prioritization, and executive decision support. Not for automated employment actions.

Production · Time-split validated · Leakage audits enforced · Versioned artifacts · Artifact-backed metrics

Executive summary

ImpactAudit v2 connects workforce risk signals to downstream business exposure using a staged, auditable pipeline. The system favors validity and governance over inflated metrics. It is designed to rank and prioritize cohorts, quantify exposure, and support scenario discussions with executives and finance.

Core idea: In low base-rate environments, the decision value is cohort prioritization and lift, not perfect individual prediction.

Design principles

  • Conservatism over optimism
  • Auditability over automation
  • Interpretability over complexity
  • Human review over prescriptions
  • Uncertainty surfaced, not hidden

Pipeline architecture

  • Stage 1: Employee-period attrition risk model (time-aware training, leakage checks)
  • Stage 2: BU-period aggregation layer (stable KPIs suitable for executive reporting)
  • Stage 3: Revenue and KPI exposure modeling (production)
  • Stage 4: Intervention effectiveness and ROI reporting (production)

Each stage writes versioned outputs and diagnostics to support traceability and governance. The mainline remains validation-first with reproducible artifacts.
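
The versioned-output discipline can be sketched as follows. This is a minimal illustration, not the repo's actual writer: the function name, directory layout, and manifest keys here are assumptions, but the idea matches the design (every stage output lands in a run-stamped directory with a manifest that pins its content hash).

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def write_versioned_artifact(out_dir: Path, stage: str, payload: dict) -> Path:
    """Write a stage output plus a small manifest so every run is traceable."""
    run_id = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    stage_dir = out_dir / stage / run_id
    stage_dir.mkdir(parents=True, exist_ok=True)

    # Canonical serialization so the content hash is stable across runs.
    body = json.dumps(payload, sort_keys=True).encode()
    (stage_dir / "output.json").write_bytes(body)

    manifest = {
        "stage": stage,
        "run_id": run_id,
        "content_sha256": hashlib.sha256(body).hexdigest(),
    }
    (stage_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return stage_dir
```

A downstream stage (or an auditor) can re-hash output.json and compare against the manifest to confirm nothing was edited after the run.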

Leakage-validated performance

In low-attrition environments, lift and prioritization are the decision-relevant metrics, and PR-AUC must be interpreted relative to the base rate. Higher-attrition environments can be supported after model recalibration.

Attrition model (current run)

  • AUC: 0.622 (manifest-backed)
  • PR-AUC: 0.0169 (manifest-backed); interpret as enrichment over baseline, not an absolute score
  • Split date: 2025-07-01. Horizon: 3 months. Test base rate: 0.75%.

Lift (typical guidance)

  • Lift@Top-10%: ~2.0x to 2.5x (operational)
  • Best for prioritization queues and cohort interventions
Important: unusually high test performance should be treated as leakage risk until audits pass. This repo prioritizes validity over inflated metrics.
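
Lift at the top decile is simple to compute; the sketch below shows the standard definition (the function name and tie handling are illustrative, not the repo's implementation).

```python
def lift_at_top_k(scores, labels, k_frac=0.10):
    """Lift of the top-k% riskiest cohort over the overall base rate.

    scores: model risk scores; labels: 1 = attrited, 0 = stayed.
    """
    n = len(scores)
    k = max(1, int(n * k_frac))
    # Rank by risk score, highest first.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    top_rate = sum(lbl for _, lbl in ranked[:k]) / k
    base_rate = sum(labels) / n
    return top_rate / base_rate if base_rate > 0 else float("nan")
```

A lift of 2.0x means the flagged decile attrites at twice the base rate, which is the enrichment that makes prioritization queues worthwhile at a 0.75% base rate.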

Why this model is decision-grade for its intended use

Decision capability in low-event workforce settings rests on the combination below, not on any single headline metric.

What matters (the combination)

  • Low base rate environments where raw accuracy is not the decision metric
  • Stable structural features that remain interpretable and governance-safe
  • Lift-driven prioritization to enrich high-risk cohorts for planning and focus
  • Artifact-backed validation with reproducible runs, manifests, and feature hashes
  • Governance constraints that prevent misuse and enforce human review
  • Scenario-aware outputs that support planning and impact discussions
Why this matters: that combination is rare in people analytics.

Where most systems fail

  • Chasing higher AUC with leaky features
  • Overfitting volatile behavioral signals
  • Making implicit causal claims
  • Encouraging misuse at the individual level

Quickstart

Commands below are representative. Adjust paths to match your local environment and config layout. Keep time splits and leakage audits enabled.

1. Create environment

python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate

pip install -r requirements.txt

2. Configure

  • Confirm config.toml points to input and output locations
  • Verify date parsing and consistent time zone assumptions
  • Verify stable IDs: employee_id, business_unit_id
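
A config along these lines covers the three checks above. The section and key names here are hypothetical; match them to your repo's actual config schema.

```toml
# Hypothetical layout; actual keys depend on your config schema.
[io]
input_dir     = "data"
output_dir    = "reports"
artifacts_dir = "artifacts"

[time]
split_date     = "2025-07-01"   # train/test boundary; keep time-aware splits on
horizon_months = 3
timezone       = "UTC"          # one consistent zone for all date parsing

[ids]
employee_key      = "employee_id"
business_unit_key = "business_unit_id"
```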

3. Run the pipeline

python run_pipeline.py --config config.toml

4. Run Stage 4 reports

python run_gap4.py --config config.toml
Rule: if a run shows unusually high test performance, treat it as leakage risk until audits pass.
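
That rule is easy to automate as a gate in CI or in the run script. The threshold and messages below are illustrative; tune them to your base rate and historical runs.

```python
def leakage_guard(test_auc: float, threshold: float = 0.90) -> str:
    """Flag suspiciously high test AUC as leakage risk until audits pass.

    The 0.90 threshold is illustrative, not a repo constant; in a 0.75%
    base-rate setting, an AUC near it almost always means a leaky feature.
    """
    if test_auc >= threshold:
        return "HOLD: treat as leakage risk; rerun audits before publishing"
    return "OK: performance within expected range for a low base-rate setting"
```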

Data inputs

  • employees.csv: employee_id, hire_date, term_date, business_unit_id, optional job_family, level, location, manager_id
  • comp_history.csv: employee_id, effective_date, base_pay, bonus_target, compa_ratio, merit_pct, promo_flag
  • perf_history.csv: employee_id, review_date, rating, calibrated_rating, goal_attainment_pct
  • business_kpi.csv: business_unit_id, period_start, revenue, operating_margin_pct, productivity_index, avg_vacancy_days, quality_incidents, headcount_proxy
Data quality stance: the pipeline should degrade conservatively and surface warnings rather than silently produce confident outputs.
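
That stance can be enforced at ingest with a schema check that warns loudly instead of failing silently. This is a minimal sketch using only the standard library; the REQUIRED map covers one file here and the column sets are taken from the list above.

```python
import csv
import io
import warnings

# Required columns per input file (subset shown; extend per the list above).
REQUIRED = {
    "employees.csv": {"employee_id", "hire_date", "term_date", "business_unit_id"},
}

def check_columns(name: str, text: str) -> bool:
    """Warn (rather than fail silently) when a required column is missing."""
    header = set(next(csv.reader(io.StringIO(text))))
    missing = REQUIRED.get(name, set()) - header
    if missing:
        warnings.warn(f"{name}: missing columns {sorted(missing)}")
        return False
    return True
```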

Outputs (aligned to repo)

Stage 1 outputs (employee-level, referenced only)

  • Location: artifacts/attrition/<client_id>/<run_id>/
  • manifest.json: run metadata, split date, base rates, AUC, PR-AUC, config, feature hash
  • feature_cols.json: exact feature list for the run
  • Model bundle files (model binary and any supporting files) stored alongside the manifest
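
The feature hash in manifest.json lets a scoring run prove it used exactly the features the training run pinned. The hashing scheme below is illustrative (the repo's actual scheme may differ), but the verification pattern is the point.

```python
import hashlib
import json

def feature_hash(feature_cols: list[str]) -> str:
    """Order-sensitive hash of the feature list; scheme is illustrative."""
    return hashlib.sha256(json.dumps(feature_cols).encode()).hexdigest()

def verify_run(manifest: dict, feature_cols: list[str]) -> bool:
    """Check a scoring run uses exactly the features the manifest pinned."""
    return manifest.get("feature_hash") == feature_hash(feature_cols)
```

Reordered or substituted features fail the check, which is what you want before any Stage 2 aggregation consumes the scores.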

Stage 2 outputs (business-unit aggregation)

  • Primary file: bu_period_features.csv
  • Typical location: reports/ or configured output directory
  • Contains BU-period features derived from Stage 1 scoring and structure
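
The aggregation step is conceptually a group-by over (business_unit_id, period). A minimal sketch, assuming a fixed high-risk cutoff (the 0.5 default and the KPI names are illustrative, not the repo's columns):

```python
from collections import defaultdict
from statistics import mean

def aggregate_bu_period(rows, high_risk_cutoff=0.5):
    """Roll employee-level risk scores up to stable BU-period KPIs.

    rows: iterable of (business_unit_id, period, risk_score) tuples.
    """
    groups = defaultdict(list)
    for bu, period, score in rows:
        groups[(bu, period)].append(score)
    return {
        key: {
            "headcount": len(scores),
            "mean_risk": mean(scores),
            "high_risk_share": sum(s >= high_risk_cutoff for s in scores) / len(scores),
        }
        for key, scores in groups.items()
    }
```

Aggregates like these are far more stable than individual scores, which is why Stage 2 is what executive reporting consumes.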

Stage 3 outputs (revenue and KPI exposure)

  • Primary file: bu_period_scored.csv
  • Scenario outputs: kpi_impact_output.csv, kpi_impact_summary.csv
  • Typical location: reports/ or configured output directory
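
The exposure arithmetic for one BU-period reduces to expected leavers times the revenue they carry during a vacancy/ramp gap. This sketch is an associational what-if in the spirit of Stage 3, not the repo's model; the gap length and per-head revenue inputs are assumptions you would source from business_kpi.csv.

```python
def revenue_exposure(headcount, expected_attrition_rate, revenue_per_head,
                     replacement_gap_months=3):
    """Conservative revenue-at-risk for one BU-period (association, not causal)."""
    expected_leavers = headcount * expected_attrition_rate
    monthly_rev_per_head = revenue_per_head / 12
    return expected_leavers * monthly_rev_per_head * replacement_gap_months
```

For example, a 100-person BU at the 0.75% test base rate and $240k revenue per head carries $45k of exposure over a three-month gap.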

Stage 4 outputs (intervention effectiveness and ROI)

  • Stage 4 writes two report files (paths printed by run_gap4.py)
  • Typical location: reports/ or configured output directory
  • Effectiveness report: intervention effectiveness vs controls
  • ROI report: conservative financial return estimates
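
"Conservative" in the ROI report means understating benefits by construction. One way to encode that, sketched here with an explicit effectiveness haircut (the 0.5 default and the function shape are illustrative, not the repo's method):

```python
def conservative_roi(avoided_exits, value_per_retention, program_cost,
                     effectiveness_haircut=0.5):
    """ROI with an explicit haircut on estimated effectiveness.

    The haircut forces the estimate to understate, not inflate.
    Returns (net_benefit, roi_ratio).
    """
    benefit = avoided_exits * value_per_retention * effectiveness_haircut
    net = benefit - program_cost
    return net, (benefit / program_cost if program_cost > 0 else float("inf"))
```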

Diagnostics

  • debug_hard_excluded_cols.csv, debug_change_cols_detected.csv, debug_static_cols_used.csv
  • debug_single_feature_scan_no_change.csv
  • Typical location: analysis/ or configured output directory
Note: If your repo uses different exact filenames for Stage 4 outputs, keep the “paths printed by run_gap4.py” language and list the two conceptual outputs. That stays accurate across config changes.

Repository structure

impactaudit/
  data/                    # Curated inputs (not raw dumps)
  src/impactaudit/         # Core library code
  jobs/                    # Entry points (train, score, aggregate)
  artifacts/               # Versioned model bundles and manifests
  analysis/                # Audits, ablations, diagnostics
  reports/                 # Executive-ready outputs (configured)
  docs/                    # Methodology, governance, legal

Governance

  • Intended for workforce planning, scenario modeling, and executive decision support.
  • Not intended for automated employment decisions or disciplinary action.
  • KPI what-if outputs are model-based associations, not causal claims.
  • Apply appropriate access controls and human review.
Prohibited use: do not use this system to make or support individual employment decisions.

Status

  • Architecture complete
  • Leakage suite implemented and enforced
  • Production-appropriate for intended use: structural risk stratification, planning, and impact exposure
  • Further gains typically require new data signals, not more tuning

Author: Ryan Brush
Enterprise Data, People Analytics, and AI Modeling