# Normalization and Difficulty Estimation

For regression tasks, Calibrated Explanations can use **normalized residuals** and **difficulty estimation** to improve interval calibration, especially for heteroscedastic data (where prediction variance differs across instances).

## What is Difficulty Estimation?

Difficulty estimation adjusts prediction intervals based on how "difficult" each instance is to predict. Instances where the underlying model's errors are expected to be larger get wider intervals, while easier-to-predict instances get narrower ones.

**Without difficulty estimation**: At a given confidence level, all instances get the same interval width (homoscedastic assumption).

**With difficulty estimation**: Interval width scales with the estimated prediction difficulty.
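
As a rough sketch of the idea (the notation is an assumption made for illustration, not taken from the library), let {math}`\hat{y}(x)` be the model prediction and {math}`\sigma(x) > 0` the difficulty estimate. Calibration residuals are normalized into scores {math}`\alpha_i`, and the interval for a new instance is obtained by rescaling lower and upper percentiles {math}`q_{\text{low}}` and {math}`q_{\text{high}}` of those scores:

```{math}
\alpha_i = \frac{y_i - \hat{y}(x_i)}{\sigma(x_i)}, \qquad
\left[\hat{y}(x) + \sigma(x)\, q_{\text{low}},\; \hat{y}(x) + \sigma(x)\, q_{\text{high}}\right]
```

Setting {math}`\sigma(x) = 1` for every instance recovers the case without difficulty estimation, where the width {math}`q_{\text{high}} - q_{\text{low}}` is the same everywhere.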

## When to Use

Consider difficulty estimation when:

* **Heteroscedastic data**: Prediction errors vary systematically across the feature space
* **Mixed complexity**: Some regions of input space are inherently harder to predict
* **Instance-specific intervals**: You need intervals that reflect per-instance uncertainty
* **Residual patterns**: You observe that residuals have non-constant variance

```{admonition} Signs You Need Difficulty Estimation
:class: tip

* Residual plots show "fan" or "funnel" shapes
* Some subgroups have much larger prediction errors
* Interval coverage varies significantly across the feature space
* Simple conformal intervals are too wide for easy cases or too narrow for hard cases
```
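
One way to look for these signs numerically is to compare the typical residual size across bins of the predicted value. The sketch below is illustrative only: the helper `mean_abs_residual_by_bin` and the synthetic data are hypothetical and not part of `calibrated_explanations`. It uses only the standard library so it runs as-is.

```python
import random
from statistics import mean


def mean_abs_residual_by_bin(predictions, residuals, n_bins=4):
    """Mean absolute residual within equal-sized bins ordered by prediction."""
    order = sorted(range(len(predictions)), key=lambda i: predictions[i])
    size = len(order) // n_bins
    result = []
    for b in range(n_bins):
        # The last bin absorbs any remainder
        idx = order[b * size:] if b == n_bins - 1 else order[b * size:(b + 1) * size]
        result.append(mean(abs(residuals[i]) for i in idx))
    return result


# Synthetic data: the noise standard deviation grows with the prediction
rng = random.Random(0)
predictions = [rng.uniform(1.0, 10.0) for _ in range(2000)]
residuals = [rng.gauss(0.0, 0.2 * p) for p in predictions]

by_bin = mean_abs_residual_by_bin(predictions, residuals)
print([round(v, 2) for v in by_bin])
```

If the values differ markedly between bins, as they do for this synthetic data, the residual variance is not constant and difficulty estimation is worth trying. On real data, pass the model's calibration-set predictions and residuals instead.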

## How It Works

The internal `IntervalRegressor` builds on conformal predictive systems from the `crepes` library, which support several normalization strategies:

1. **Standard (no normalization)**: Residuals are used directly
2. **Sigma normalization**: Residuals are divided by an estimated standard deviation
3. **Difficulty-based**: Uses a secondary model to estimate per-instance difficulty

The difficulty estimator predicts how uncertain each instance is, and the conformal intervals are scaled accordingly.
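
The scaling step can be sketched in a few lines of plain Python. This is a simplified illustration, not the `crepes` implementation: the helpers `percentile_score` and `scaled_interval` are hypothetical, and the nearest-rank percentile rule is an assumption chosen for brevity.

```python
import math
import random


def percentile_score(sorted_scores, percentile):
    """Simplified nearest-rank percentile, clipped to valid indices."""
    n = len(sorted_scores)
    rank = math.ceil(percentile / 100 * (n + 1))
    return sorted_scores[min(max(rank, 1), n) - 1]


def scaled_interval(y_hat, sigma, cal_residuals, cal_sigmas, low=5, high=95):
    """Interval from percentiles of normalized residuals, rescaled by sigma."""
    scores = sorted(r / s for r, s in zip(cal_residuals, cal_sigmas))
    return (
        y_hat + sigma * percentile_score(scores, low),
        y_hat + sigma * percentile_score(scores, high),
    )


# Synthetic calibration set where the noise level equals the difficulty score
rng = random.Random(1)
cal_sigmas = [rng.uniform(0.5, 3.0) for _ in range(500)]
cal_residuals = [rng.gauss(0.0, s) for s in cal_sigmas]

easy = scaled_interval(10.0, 0.5, cal_residuals, cal_sigmas)
hard = scaled_interval(10.0, 3.0, cal_residuals, cal_sigmas)
print(f"easy width: {easy[1] - easy[0]:.2f}, hard width: {hard[1] - hard[0]:.2f}")
```

Both instances share the same point prediction, but the interval for the instance with difficulty 3.0 is six times wider than the one with difficulty 0.5, because the width is proportional to the difficulty score.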

## Configuration

Difficulty estimation is configured through the internal `IntervalRegressor` or `ConformalPredictiveSystem` when setting up the explainer.

### Basic Usage

For most users, the default CPS configuration produces calibrated intervals without any explicit difficulty setup:

```python
from calibrated_explanations import WrapCalibratedExplainer

explainer = WrapCalibratedExplainer(model)
explainer.fit(x_proper, y_proper)
explainer.calibrate(x_cal, y_cal)

# Intervals are already calibrated with CPS
prediction, (low, high) = explainer.predict(
    x_test,
    uq_interval=True,
    low_high_percentiles=(5, 95)
)
```

### Advanced Configuration

For advanced users needing explicit difficulty estimation:

```python
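from crepes import ConformalPredictiveSystem
from crepes.extras import DifficultyEstimator

# Estimate per-instance difficulty from the proper training set.
# With X and y only, crepes uses the spread of neighbor targets;
# consult the crepes documentation for the full set of options.
de = DifficultyEstimator()
de.fit(X=x_proper, y=y_proper, scaler=True)
difficulty_estimates = de.apply(x_cal)

# Fit a CPS on calibration residuals normalized by the difficulty scores
cps = ConformalPredictiveSystem()
cps.fit(
    residuals=y_cal - model.predict(x_cal),
    sigmas=difficulty_estimates,  # Per-instance difficulty scores
)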
```

## Trade-offs

| Aspect | Without Normalization | With Normalization |
| :--- | :--- | :--- |
| Interval width | Constant | Instance-specific |
| Complexity | Simpler | Requires difficulty model |
| Coverage | May vary by region | More uniform coverage |
| Computation | Faster | Slightly slower |

## Interpreting Normalized Intervals

When difficulty estimation is active:

* **Narrow intervals**: Low estimated difficulty; small errors are expected for this instance
* **Wide intervals**: High estimated difficulty or sparse calibration region
* **Varying widths**: Expected behavior reflecting instance-specific uncertainty

```python
import numpy as np

prediction, (low, high) = explainer.predict(x_test, uq_interval=True)

# Examine interval widths
widths = high - low

# Identify easy vs hard instances
easy_instances = widths < np.percentile(widths, 25)
hard_instances = widths > np.percentile(widths, 75)

print(f"Easy instances (narrow intervals): {easy_instances.sum()}")
print(f"Hard instances (wide intervals): {hard_instances.sum()}")
```
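
When test labels are available, it can also help to check that coverage is similar for narrow and wide intervals. The following is a minimal sketch under stated assumptions: the helpers `coverage` and `coverage_by_width` are hypothetical, and the data is synthetic, with intervals set to roughly 90% coverage for Gaussian noise.

```python
import random


def coverage(y_true, low, high):
    """Fraction of targets that fall inside their interval."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, low, high))
    return hits / len(y_true)


def coverage_by_width(y_true, low, high):
    """Coverage for the narrower half and the wider half of the intervals."""
    order = sorted(range(len(y_true)), key=lambda i: high[i] - low[i])
    half = len(order) // 2

    def pick(idx):
        return [y_true[i] for i in idx], [low[i] for i in idx], [high[i] for i in idx]

    return coverage(*pick(order[:half])), coverage(*pick(order[half:]))


# Synthetic heteroscedastic targets with intervals scaled by the noise level
rng = random.Random(2)
sigmas = [rng.uniform(0.5, 3.0) for _ in range(4000)]
y_true = [rng.gauss(0.0, s) for s in sigmas]
low = [-1.645 * s for s in sigmas]
high = [1.645 * s for s in sigmas]

narrow_cov, wide_cov = coverage_by_width(y_true, low, high)
print(f"narrow half: {narrow_cov:.3f}, wide half: {wide_cov:.3f}")
```

Similar coverage in both halves suggests the widths track the actual difficulty. A large gap between the two suggests the difficulty estimates are not capturing where the model's errors are large.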

## Research Background

Difficulty estimation and normalized conformal prediction are documented in:

> Löfström, T., et al. (2025). Calibrated Explanations for Regression.
> Machine Learning 114, 100.
> [DOI: 10.1007/s10994-024-06642-8](https://link.springer.com/article/10.1007/s10994-024-06642-8)

The underlying conformal prediction methodology:

> Boström, H. (2022). crepes: a Python Package for Generating Conformal Regressors and Predictive Systems.
> [crepes documentation](https://crepes.readthedocs.io/)

## Cross-References

* {doc}`../../tasks/regression` - Regression task documentation
* {doc}`../../foundations/concepts/probabilistic_regression` - Probabilistic regression concepts
* {doc}`../../tasks/capabilities` - Full capability manifest
* {doc}`../../citing` - Citation information