Integrate with scikit-learn pipelines
WrapCalibratedExplainer can manage preprocessing pipelines so calibration and inference use the same transformations.
Configure a preprocessing pipeline
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from calibrated_explanations.api.config import ExplainerConfig
from calibrated_explanations import WrapCalibratedExplainer
dataset = load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data,
    dataset.target,
    test_size=0.2,
    stratify=dataset.target,
    random_state=0,
)
x_proper, x_cal, y_proper, y_cal = train_test_split(
    x_train,
    y_train,
    test_size=0.25,
    stratify=y_train,
    random_state=0,
)
numeric = [0, 1, 2]
preprocessor = ColumnTransformer(
    [
        (
            "num",
            Pipeline(
                steps=[
                    ("impute", SimpleImputer(strategy="median")),
                    ("scale", StandardScaler()),
                ]
            ),
            numeric,
        )
    ],
    remainder="drop",
)
config = ExplainerConfig(
    model=RandomForestClassifier(random_state=0),
    preprocessor=preprocessor,
)
explainer = WrapCalibratedExplainer._from_config(config)  # private helper; see Tips
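To make the effect of this preprocessor layout concrete, the following sketch applies the same ColumnTransformer structure to a small made-up array. The array and the build_preprocessor helper are illustrative assumptions and not part of calibrated_explanations; only scikit-learn and NumPy are used.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_preprocessor(columns):
    """Hypothetical helper: median-impute then scale the selected columns."""
    numeric_steps = Pipeline(
        steps=[
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]
    )
    # remainder="drop" discards every column that is not listed.
    return ColumnTransformer([("num", numeric_steps, columns)], remainder="drop")


# Made-up data: five columns, one missing value in column 1.
toy = np.array(
    [
        [1.0, 10.0, 100.0, 7.0, 7.0],
        [2.0, np.nan, 200.0, 7.0, 7.0],
        [3.0, 30.0, 300.0, 7.0, 7.0],
        [4.0, 40.0, 400.0, 7.0, 7.0],
    ]
)

toy_transformed = build_preprocessor([0, 1, 2]).fit_transform(toy)
print(toy_transformed.shape)  # (4, 3): only the three selected columns survive
print(np.isnan(toy_transformed).any())  # False: the missing value was imputed
```

Because remainder="drop" removes every unlisted column, the model behind the wrapper would only ever see the three numeric features selected here.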
When you call fit and calibrate, the wrapper fits both the underlying model
and the preprocessing pipeline.
explainer.fit(x_proper, y_proper)
explainer.calibrate(x_cal, y_cal)
factual = explainer.explain_factual(x_test)
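As a rough illustration of why a single fitted preprocessor matters, the sketch below does by hand, with plain scikit-learn, the bookkeeping that the wrapper is described as handling. It assumes the preprocessor is fitted on the proper training split only and then reused unchanged; that mirrors common practice and is not a description of the wrapper's internals.

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
X_proper, X_cal, y_proper, y_cal = train_test_split(
    X_train, y_train, test_size=0.25, stratify=y_train, random_state=0
)

# Fit the preprocessing once, on the proper training split.
manual_pre = ColumnTransformer(
    [("num", StandardScaler(), [0, 1, 2])], remainder="drop"
).fit(X_proper)

# Train the model on the transformed proper training split.
manual_model = RandomForestClassifier(random_state=0)
manual_model.fit(manual_pre.transform(X_proper), y_proper)

# Reuse the same fitted preprocessor for calibration and test data.
cal_proba = manual_model.predict_proba(manual_pre.transform(X_cal))
test_proba = manual_model.predict_proba(manual_pre.transform(X_test))
print(X_proper.shape[0], X_cal.shape[0], X_test.shape[0])
```

Refitting the preprocessor on the calibration or test split would shift the feature scale between stages, which is the misalignment the wrapper is meant to prevent.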
Inspect telemetry snapshots
The runtime attaches preprocessing metadata to telemetry so you can audit which transformers executed:
telemetry = explainer.runtime_telemetry
pre = telemetry.get("preprocessor", {})
print(pre.get("transformer_id"))
print(pre.get("mapping_snapshot"))
Each entry includes the transformer identifier, mapping snapshot (when available), and whether auto-encoding is enabled.
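For audit logs it can help to reduce the telemetry payload to a small, stable summary. The sketch below is one way to do that defensively. The summarize_preprocessor helper and the sample payload are hypothetical; only the "preprocessor", "transformer_id", and "mapping_snapshot" keys come from the snippet above, and any other key (such as the auto-encoding flag, whose name is not given here) is reported generically.

```python
from typing import Any, Dict, Mapping

KNOWN_KEYS = {"transformer_id", "mapping_snapshot"}


def summarize_preprocessor(telemetry: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: condense preprocessing telemetry for an audit log."""
    # Tolerate telemetry that is disabled or missing the preprocessor entry.
    pre = telemetry.get("preprocessor") or {}
    return {
        "transformer_id": pre.get("transformer_id"),
        "has_mapping_snapshot": pre.get("mapping_snapshot") is not None,
        # List unrecognised keys by name instead of guessing their meaning.
        "extra_keys": sorted(key for key in pre if key not in KNOWN_KEYS),
    }


# Made-up payload for illustration; "auto_encode" is an assumed key name.
sample_telemetry = {
    "preprocessor": {
        "transformer_id": "ColumnTransformer",
        "mapping_snapshot": None,
        "auto_encode": False,
    }
}
summary = summarize_preprocessor(sample_telemetry)
print(summary)
```

With a real explainer, the argument would be explainer.runtime_telemetry as read in the snippet above.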
Tips
Keep preprocessing deterministic so calibration and inference remain aligned.
Prefer column selectors over positional slicing when working with pandas DataFrames.
When migrating to a public configuration API, replace
_from_config with the official constructor documented in the release notes.
Telemetry note: Runtime telemetry remains opt-in. Enable it only when governance teams need pipeline provenance, then follow
the configure_telemetry guide to capture preprocessing metadata for each batch.