Executive summary
ImpactAudit v2 connects workforce risk signals to downstream business exposure using a staged, auditable pipeline. The system favors validity and governance over inflated metrics. It is designed to rank and prioritize cohorts, quantify exposure, and support scenario discussions with executives and finance.
Design principles
- Conservatism over optimism
- Auditability over automation
- Interpretability over complexity
- Human review over prescriptions
- Uncertainty surfaced, not hidden
Pipeline architecture
- Stage 1: Employee-period attrition risk model (time-aware training, leakage checks)
- Stage 2: BU-period aggregation layer (stable KPIs suitable for executive reporting)
- Stage 3: Revenue and KPI exposure modeling (production)
- Stage 4: Intervention effectiveness and ROI reporting (production)
Each stage writes versioned outputs and diagnostics to support traceability and governance. The mainline remains validation-first with reproducible artifacts.
Leakage-validated performance
In low-attrition environments, lift and prioritization are the decision-relevant metrics, and PR-AUC must be interpreted relative to the base rate. The model can also serve higher-attrition environments after recalibration.
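The relationship between lift and the base rate can be checked directly from scores and outcomes. A minimal sketch in pure Python (toy data; the function name and cutoff are illustrative, not part of the pipeline):

```python
def lift_at_k(scores, labels, k_frac=0.10):
    """Lift of the top k fraction of the ranking over the overall base rate."""
    n = len(scores)
    k = max(1, int(n * k_frac))
    # Rank by descending risk score and take the top-k cohort.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top = order[:k]
    top_rate = sum(labels[i] for i in top) / k
    base_rate = sum(labels) / n
    return top_rate / base_rate if base_rate else float("nan")

# Toy example: 10% base rate, events concentrated among high scores.
scores = [0.9, 0.8, 0.7, 0.6] + [0.1] * 16
labels = [1, 1, 0, 0] + [0] * 16
print(lift_at_k(scores, labels, k_frac=0.20))  # 5.0
```

A lift of 5.0 means the top quintile is five times as event-dense as the population, which is the enrichment that matters for prioritization when raw accuracy is uninformative.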
Why this model is decision-grade for its intended use
No single headline metric makes the model decision-grade; the combination of properties below is the basis for decision capability in low-event workforce settings.
What matters (the combination)
- Low base rate environments where raw accuracy is not the decision metric
- Stable structural features that remain interpretable and governance-safe
- Lift-driven prioritization to enrich high-risk cohorts for planning and focus
- Artifact-backed validation with reproducible runs, manifests, and feature hashes
- Governance constraints that prevent misuse and enforce human review
- Scenario-aware outputs that support planning and impact discussions
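Artifact-backed validation rests on being able to prove that a rerun used exactly the same feature set. One common way to do this is an order-insensitive digest of the feature list; a sketch (function name is illustrative, not the repo's API):

```python
import hashlib
import json

def feature_hash(feature_cols):
    """Order-insensitive SHA-256 digest of a feature list.

    Stored alongside the run manifest so a rerun can be checked
    against the original feature set byte-for-byte.
    """
    canonical = json.dumps(sorted(feature_cols)).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

h1 = feature_hash(["tenure_years", "compa_ratio", "level"])
h2 = feature_hash(["level", "tenure_years", "compa_ratio"])
print(h1 == h2)  # True: column order does not change the hash
```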
Where most systems fail
- Chasing higher AUC with leaky features
- Overfitting volatile behavioral signals
- Making implicit causal claims
- Encouraging misuse at the individual level
Quickstart
Commands below are representative. Adjust paths to match your local environment and config layout. Keep time splits and leakage audits enabled.
1. Create environment
```shell
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate
pip install -r requirements.txt
```
2. Configure
- Confirm `config.toml` points to input and output locations
- Verify date parsing and consistent time zone assumptions
- Verify stable IDs: `employee_id`, `business_unit_id`
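A configuration might look like the following; the section and key names are assumptions for illustration, not the repo's actual schema:

```toml
# Illustrative layout only -- key names are assumed, not the repo's schema.
[paths]
input_dir    = "data"
output_dir   = "reports"
artifact_dir = "artifacts"

[ids]
employee_key      = "employee_id"
business_unit_key = "business_unit_id"

[time]
timezone    = "UTC"
date_format = "%Y-%m-%d"
split_date  = "2023-01-01"  # train/validation time split; keep leakage audits on
```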
3. Run the pipeline
```shell
python run_pipeline.py --config config.toml
```
4. Run Stage 4 reports
```shell
python run_gap4.py --config config.toml
```
Data inputs
- `employees.csv`: employee_id, hire_date, term_date, business_unit_id; optional job_family, level, location, manager_id
- `comp_history.csv`: employee_id, effective_date, base_pay, bonus_target, compa_ratio, merit_pct, promo_flag
- `perf_history.csv`: employee_id, review_date, rating, calibrated_rating, goal_attainment_pct
- `business_kpi.csv`: business_unit_id, period_start, revenue, operating_margin_pct, productivity_index, avg_vacancy_days, quality_incidents, headcount_proxy
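Before running the pipeline it is worth confirming each input file carries its required columns. A minimal header check using only the standard library (the required-column sets here are abbreviated examples, not the full contract):

```python
import csv
import io

# Abbreviated required-column sets for illustration.
REQUIRED = {
    "employees.csv": ["employee_id", "hire_date", "term_date", "business_unit_id"],
    "comp_history.csv": ["employee_id", "effective_date", "base_pay"],
}

def missing_columns(header, required):
    """Return the required columns absent from a CSV header row."""
    return [c for c in required if c not in header]

# Toy check against an in-memory file standing in for employees.csv.
sample = io.StringIO("employee_id,hire_date,term_date,business_unit_id\n1,2020-01-01,,BU1\n")
header = next(csv.reader(sample))
print(missing_columns(header, REQUIRED["employees.csv"]))  # []
```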
Outputs (aligned to repo)
Stage 1 outputs (employee-level, referenced only)
- Location: `artifacts/attrition/<client_id>/<run_id>/`
- `manifest.json`: run metadata, split date, base rates, AUC, PR-AUC, config, feature hash
- `feature_cols.json`: exact feature list for the run
- Model bundle files (model binary and any supporting files) stored alongside the manifest
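Writing the manifest amounts to persisting run metadata next to the model bundle. A sketch of the idea (field names mirror the list above; the function itself is illustrative, not the repo's implementation):

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def write_manifest(run_dir, split_date, base_rate, auc, pr_auc, feature_hash):
    """Persist run metadata as manifest.json in the run directory."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    manifest = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "split_date": split_date,
        "base_rate": base_rate,
        "auc": auc,
        "pr_auc": pr_auc,
        "feature_hash": feature_hash,
    }
    path = run_dir / "manifest.json"
    path.write_text(json.dumps(manifest, indent=2))
    return path

# Demo run directory under a temp dir; real runs use artifacts/attrition/...
run_dir = Path(tempfile.mkdtemp()) / "attrition" / "demo" / "run_001"
p = write_manifest(run_dir, "2023-01-01", 0.05, 0.74, 0.18, "abc123")
print(json.loads(p.read_text())["split_date"])  # 2023-01-01
```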
Stage 2 outputs (business-unit aggregation)
- Primary file:
bu_period_features.csv - Typical location:
reports/or configured output directory - Contains BU-period features derived from Stage 1 scoring and structure
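Conceptually, the aggregation rolls employee-period scores up to stable BU-period KPIs. A toy sketch of that roll-up (the 0.5 cutoff and field names are illustrative assumptions, not the pipeline's):

```python
from collections import defaultdict
from statistics import mean

def bu_period_features(rows):
    """Aggregate (business_unit_id, period_start, risk_score) rows
    into BU-period KPIs suitable for reporting."""
    groups = defaultdict(list)
    for bu, period, score in rows:
        groups[(bu, period)].append(score)
    return {
        key: {
            "mean_risk": mean(scores),
            # Illustrative threshold; the real cutoff is configured.
            "high_risk_share": sum(s >= 0.5 for s in scores) / len(scores),
        }
        for key, scores in groups.items()
    }

rows = [("BU1", "2023-Q1", 0.8), ("BU1", "2023-Q1", 0.2), ("BU2", "2023-Q1", 0.1)]
feats = bu_period_features(rows)
print(feats[("BU1", "2023-Q1")]["high_risk_share"])  # 0.5
```

Aggregating before exposure modeling keeps executive reporting at the cohort level, which is also what the governance constraints require.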
Stage 3 outputs (revenue and KPI exposure)
- Primary file:
bu_period_scored.csv - Scenario outputs:
kpi_impact_output.csv,kpi_impact_summary.csv - Typical location:
reports/or configured output directory
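As a shape for what exposure modeling produces, a deliberately conservative revenue-at-risk figure can be framed as revenue times expected exit share times a damping factor below 1. The formula and parameter names below are illustrative assumptions, not the pipeline's fitted model:

```python
def revenue_exposure(revenue, expected_exits, headcount, sensitivity=0.5):
    """Conservative revenue-at-risk: revenue x exit share x damping factor.

    sensitivity < 1 deliberately understates impact, consistent with
    the conservatism-over-optimism principle. All values are illustrative.
    """
    exit_share = expected_exits / headcount
    return revenue * exit_share * sensitivity

print(revenue_exposure(1_000_000, 5, 100))  # 25000.0
```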
Stage 4 outputs (intervention effectiveness and ROI)
- Stage 4 writes two report files (paths printed by `run_gap4.py`)
- Typical location: `reports/` or configured output directory
- Effectiveness report: intervention effectiveness vs. controls
- ROI report: conservative financial return estimates
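A conservative ROI estimate can make the conservatism explicit by applying a haircut to benefits before netting against cost. A sketch under assumed parameters (the formula and names are illustrative, not the report's actual method):

```python
def conservative_roi(avoided_exits, cost_per_exit, program_cost, haircut=0.5):
    """ROI with an explicit haircut on benefits so the estimate errs low."""
    benefit = avoided_exits * cost_per_exit * haircut
    return (benefit - program_cost) / program_cost

# 10 avoided exits at $50k each, haircut to $250k benefit, vs $200k spend.
print(round(conservative_roi(10, 50_000, 200_000), 2))  # 0.25
```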
Diagnostics
- `debug_hard_excluded_cols.csv`, `debug_change_cols_detected.csv`, `debug_static_cols_used.csv`, `debug_single_feature_scan_no_change.csv`
- Typical location: `analysis/` or configured output directory
Repository structure
```text
impactaudit/
  data/             # Curated inputs (not raw dumps)
  src/impactaudit/  # Core library code
  jobs/             # Entry points (train, score, aggregate)
  artifacts/        # Versioned model bundles and manifests
  analysis/         # Audits, ablations, diagnostics
  reports/          # Executive-ready outputs (configured)
  docs/             # Methodology, governance, legal
```
Governance
- Intended for workforce planning, scenario modeling, and executive decision support.
- Not intended for automated employment decisions or disciplinary action.
- KPI what-if outputs are model-based associations, not causal claims.
- Apply appropriate access controls and human review.
Status
- Architecture complete
- Leakage suite implemented and enforced
- Production-appropriate for intended use: structural risk stratification, planning, and impact exposure
- Further gains typically require new data signals, not more tuning
Author: Ryan Brush
Enterprise Data, People Analytics, and AI Modeling
Legal disclaimer, liability limitation, and restricted use
No legal, employment, financial, or professional advice
ImpactAudit and all associated models, analyses, outputs, documentation, and artifacts are provided for informational and decision-support purposes only. Nothing in this repository constitutes legal advice, employment or labor advice, financial, accounting, actuarial, or investment advice, compliance determinations, or professional opinions requiring licensure. Users are responsible for obtaining independent professional advice before acting.
No employment decisions or individual action
Under no circumstances may outputs be used to make or support hiring, firing, promotion, demotion, discipline, termination, or compensation decisions, target or profile individual employees, automate individual employment actions, or serve as evidence in disputes, investigations, or litigation. Any such use is explicitly prohibited.
No causal claims or guarantees
ImpactAudit does not assert causation or guarantee outcomes. Results are sensitive to data quality, assumptions, and uncertainty. No warranty is made that any action evaluated will produce any particular outcome.
Limitation of liability and no warranty
To the maximum extent permitted by law, the authors, contributors, licensors, and distributors are not liable for any direct, indirect, incidental, consequential, special, punitive, or exemplary damages arising from use, inability to use, or reliance on this system or its outputs. ImpactAudit is provided "as is" and "as available" without warranties of any kind, express or implied.
Indemnification
Users agree to indemnify, defend, and hold harmless the authors, contributors, licensors, and distributors from claims, damages, losses, liabilities, costs, or expenses arising from unauthorized or prohibited use, employment or financial decisions made using the system, or violations of law, regulation, or policy.
Full terms are maintained in LEGAL.md. Use of this repository constitutes acceptance of these restrictions.