# Integrate with scikit-learn pipelines

`WrapCalibratedExplainer` can manage a preprocessing pipeline for you, so that
calibration and inference apply the same transformations.

## Configure a preprocessing pipeline

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from calibrated_explanations.api.config import ExplainerConfig
from calibrated_explanations import WrapCalibratedExplainer

dataset = load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data,
    dataset.target,
    test_size=0.2,
    stratify=dataset.target,
    random_state=0,
)
x_proper, x_cal, y_proper, y_cal = train_test_split(
    x_train,
    y_train,
    test_size=0.25,
    stratify=y_train,
    random_state=0,
)

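# Positional indices of the columns to preprocess; with remainder="drop",
# every other column is discarded before the model sees the data.
numeric = [0, 1, 2]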
preprocessor = ColumnTransformer(
    [
        (
            "num",
            Pipeline(
                steps=[
                    ("impute", SimpleImputer(strategy="median")),
                    ("scale", StandardScaler()),
                ]
            ),
            numeric,
        )
    ],
    remainder="drop",
)

config = ExplainerConfig(
    model=RandomForestClassifier(random_state=0),
    preprocessor=preprocessor,
)
# `_from_config` is a private helper; see the Tips section on migrating.
explainer = WrapCalibratedExplainer._from_config(config)
```

Pass the raw, untransformed splits: when you call `fit` and `calibrate`, the
wrapper fits both the underlying model and the preprocessing pipeline.

```python
explainer.fit(x_proper, y_proper)
explainer.calibrate(x_cal, y_cal)
factual = explainer.explain_factual(x_test)
```
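
To make the alignment concrete, here is a minimal plain-Python sketch of the
fit-once, transform-everywhere pattern that the wrapper is described as
managing. `MedianScaler` is a hypothetical stand-in for an impute-then-scale
step, not part of scikit-learn or `calibrated_explanations`, and the numbers are
made up; the sketch only shows why the calibration and test splits should reuse
statistics learned from the proper training split.

```python
from statistics import fmean, median, pstdev


class MedianScaler:
    """Hypothetical impute-then-scale step (illustration only)."""

    def fit(self, values):
        # Learn every statistic from the proper training split only.
        observed = [v for v in values if v is not None]
        self.fill_ = median(observed)
        filled = [self.fill_ if v is None else v for v in values]
        self.mean_ = fmean(filled)
        self.scale_ = pstdev(filled) or 1.0  # guard against zero variance
        return self

    def transform(self, values):
        # Reuse the learned statistics; never refit on new data.
        filled = [self.fill_ if v is None else v for v in values]
        return [(v - self.mean_) / self.scale_ for v in filled]


proper = [1.0, 2.0, None, 3.0]   # made-up proper training values
calibration = [2.0, None]        # made-up calibration values
test = [4.0]                     # made-up test value

scaler = MedianScaler().fit(proper)
cal_scaled = scaler.transform(calibration)
test_scaled = scaler.transform(test)
```

Refitting the scaler on the calibration or test split would shift the fill value
and the scale, so the model would see differently distributed inputs at
calibration and inference time.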

## Inspect telemetry snapshots

The runtime attaches preprocessing metadata to telemetry so you can audit which
transformers executed:

```python
telemetry = explainer.runtime_telemetry
pre = telemetry.get("preprocessor", {})
print(pre.get("transformer_id"))
print(pre.get("mapping_snapshot"))
```

The `preprocessor` entry includes the transformer identifier, the mapping
snapshot (when available), and a flag indicating whether auto-encoding is
enabled.
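
For audit logs, it can help to reduce the entry to a few stable fields. The
helper below is a sketch under assumptions: `summarize_preprocessor` is a
hypothetical name, the `auto_encode` key is a guess at how the auto-encoding
flag is stored, and the sample payload is invented for illustration rather than
captured from a real run.

```python
def summarize_preprocessor(telemetry):
    """Return a small, printable summary of the preprocessor telemetry entry."""
    # Tolerate missing telemetry or a missing "preprocessor" entry.
    pre = (telemetry or {}).get("preprocessor") or {}
    return {
        "transformer_id": pre.get("transformer_id", "<not recorded>"),
        "has_mapping_snapshot": pre.get("mapping_snapshot") is not None,
        # Assumed key name for the auto-encoding flag; adjust to your version.
        "auto_encode": pre.get("auto_encode"),
    }


# Invented payload; in practice pass explainer.runtime_telemetry instead.
sample = {
    "preprocessor": {
        "transformer_id": "sklearn.compose:ColumnTransformer",
        "mapping_snapshot": None,
    }
}
summary = summarize_preprocessor(sample)
empty_summary = summarize_preprocessor({})
```

Using `.get` throughout keeps the helper safe when telemetry is disabled or the
entry is absent.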

## Tips

- Keep preprocessing deterministic so calibration and inference remain aligned.
- Prefer column selectors over positional slicing when working with pandas
  DataFrames.
- When migrating to a public configuration API, replace `_from_config` with the
  official constructor documented in the release notes.
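
With pandas DataFrames, `ColumnTransformer` accepts column names directly. For
NumPy arrays, such as the breast cancer data used on this page, a similar effect
can be approximated by resolving names to positions once, up front. The sketch
below assumes a hypothetical helper, `resolve_columns`, and uses the first four
feature names of that dataset.

```python
def resolve_columns(feature_names, wanted):
    """Map column names to positional indices, failing loudly on typos."""
    index = {name: i for i, name in enumerate(feature_names)}
    missing = [name for name in wanted if name not in index]
    if missing:
        raise KeyError(f"Unknown columns: {missing}")
    return [index[name] for name in wanted]


feature_names = ["mean radius", "mean texture", "mean perimeter", "mean area"]
numeric = resolve_columns(feature_names, ["mean radius", "mean perimeter"])

# If the column order changes upstream, the names still resolve correctly.
reordered = ["mean area", "mean perimeter", "mean radius", "mean texture"]
numeric_reordered = resolve_columns(reordered, ["mean radius", "mean perimeter"])
```

Hard-coded positions would silently select different features after such a
reorder, whereas a name lookup either finds the right column or raises.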

> **Telemetry note:** Runtime telemetry remains opt-in. Enable it only when governance teams need pipeline provenance, then follow :doc:`configure_telemetry` to capture preprocessing metadata for each batch.