Replication workflow
Use this workflow to reproduce the binary classification, multiclass, regression, ensured, and fast calibrated explanations studies published by the team.
1. Provision the evaluation environment
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[dev,eval]"
The [eval] extra installs xgboost, venn-abers, and plotting
libraries referenced throughout the studies.
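To confirm the extras landed in the active virtual environment before running any study, a quick check along these lines may help. It is a minimal sketch: the two distribution names come from the sentence above, the plotting libraries are not enumerated here so they are left out, and `installed_versions` is a hypothetical helper name rather than part of the project.

```python
from importlib import metadata

# Distribution names taken from the text above. The plotting libraries are not
# named in this guide, so add them here once you know what your checkout declares.
EVAL_DISTRIBUTIONS = ("xgboost", "venn-abers")


def installed_versions(names=EVAL_DISTRIBUTIONS):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'MISSING'}")
```

Recording the reported versions alongside your results makes it easier to explain small numerical differences later.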
2. Match the published datasets
Use the manifests under evaluation/ for dataset sources, preprocessing notes, and random seeds.
3. Execute the scripted pipelines
Run the notebooks and scripts in the evaluation directory that align with your study:
- Classification_Experiment_sota.py covers the 25-dataset binary baseline and persists results_sota.pkl for diffs.
- multiclass/ and regression/ notebooks implement the multiclass and interval regression papers.
- ensure/ and fastCE/ contain ensured-explanations and fast plugin artefacts, each with accompanying result archives.
4. Compare outputs
Each evaluation asset ships with *.pkl or *.zip archives so you can diff
your outputs against the published tables. Preserve the bundled random
seeds (0 or 42, depending on the asset) so that your results align with
the published distributions.
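One way to diff a reproduced archive against a published one is sketched below. The internal layout of the shipped pickles is not described here, so the sketch assumes, purely for illustration, nested dictionaries with numeric or otherwise directly comparable leaves. `load_pickle` and `diff_results` are hypothetical helper names, and arrays or DataFrames would need their own comparison logic.

```python
import math
import pickle
from pathlib import Path


def load_pickle(path):
    """Load a pickled results object. Only unpickle files you trust."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def diff_results(published, reproduced, rel_tol=1e-9, prefix=""):
    """Return human-readable differences between two result objects.

    Assumption: nested dicts whose leaves are numbers or values that
    support a plain `!=` comparison.
    """
    differences = []
    if isinstance(published, dict) and isinstance(reproduced, dict):
        for key in sorted(set(published) | set(reproduced), key=str):
            label = f"{prefix}/{key}"
            if key not in reproduced:
                differences.append(f"{label}: missing from reproduced run")
            elif key not in published:
                differences.append(f"{label}: missing from published results")
            else:
                differences.extend(
                    diff_results(published[key], reproduced[key], rel_tol, label)
                )
    elif isinstance(published, (int, float)) and isinstance(reproduced, (int, float)):
        # Compare numbers with a relative tolerance instead of exact equality.
        if not math.isclose(published, reproduced, rel_tol=rel_tol):
            differences.append(f"{prefix}: {published} != {reproduced}")
    elif published != reproduced:
        differences.append(f"{prefix}: {published!r} != {reproduced!r}")
    return differences
```

For example, `diff_results(load_pickle(published_path), load_pickle(my_path))` returns an empty list when the two runs agree within tolerance; both paths are placeholders for wherever the archives live in your checkout. With identical seeds the differences should be small, so a loose `rel_tol` that hides real gaps is worth avoiding.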
5. Document deviations
Record any dataset or calibrator changes in your replication log, and cross-link active ADRs via the Release checklist before you publish.