# Tune runtime performance (opt-in)

v0.9.0 introduces three performance controls that stay disabled by default: the
calibrator cache, a multiprocessing backend, and vectorised perturbations baked
into the core explainer. This guide shows how to enable each feature
deliberately, how to tune the new configuration surface, and how to revert to the
baseline behaviour if a feature does not suit your deployment.

## Prerequisites

- Install ``calibrated-explanations`` as usual. Optional extras are only
  required when you enable the FAST explanations plugin.
- Import :class:`calibrated_explanations.api.config.ExplainerBuilder` (or build an
  :class:`~calibrated_explanations.api.config.ExplainerConfig` manually) so you can
  toggle the cache and parallel backends without mutating the public
  ``WrapCalibratedExplainer`` API.
- Keep governance approvals handy: the release checklist treats these as opt-in
  features, so document who enabled them and why.

## Enable the calibrator cache

The cache saves intermediate calibration artefacts so repeated explanation runs
avoid recomputing identical payloads. It is disabled unless you flip the feature
flag when building your configuration.

When you already have a fitted wrapper, reuse its learner when constructing the
builder so the cached artefacts align with your deployed estimator:

```python
from calibrated_explanations import WrapCalibratedExplainer
from calibrated_explanations.api.config import ExplainerBuilder

from tests.helpers.doc_utils import run_quickstart_classification

context = run_quickstart_classification()
model = context.explainer.learner
builder = ExplainerBuilder(model)
config = (
    builder.perf_cache(
        True,
        max_items=256,
        max_bytes=8 * 1024 * 1024,
        namespace="service-a",
        version="v2",
        ttl=600,
    )
    .build_config()
)
explainer = WrapCalibratedExplainer.from_config(config)
```

- ``max_items`` caps the number of cached entries (defaults to 512).
- ``max_bytes`` imposes an approximate memory ceiling using array ``nbytes`` when
  available.
- ``namespace``/``version`` isolate callers so multiple services can safely
  share an in-memory cache.
- ``ttl`` (seconds) expires entries proactively; omit it to cache until evicted
  by LRU.
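
To make the interaction between these options concrete, here is a minimal sketch of an LRU cache with a TTL and namespaced keys. It is not the library's implementation: ``TinyLRUCache``, its ``clock`` and ``store`` parameters, and the key layout are hypothetical names chosen for illustration, and the byte ceiling (``max_bytes``) is left out for brevity.

```python
import time
from collections import OrderedDict


class TinyLRUCache:
    """Illustrative LRU cache with TTL and namespacing (hypothetical, not the library's class)."""

    def __init__(self, max_items=512, ttl=None, namespace="default",
                 version="v1", clock=time.monotonic, store=None):
        self.max_items = max_items
        self.ttl = ttl
        # Every key is prefixed so callers sharing one store cannot collide.
        self.prefix = (namespace, version)
        self.clock = clock
        # Maps full key -> (expiry time or None, value), oldest entry first.
        self._store = store if store is not None else OrderedDict()

    def put(self, key, value):
        full_key = self.prefix + (key,)
        expiry = None if self.ttl is None else self.clock() + self.ttl
        self._store[full_key] = (expiry, value)
        self._store.move_to_end(full_key)  # mark as most recently used
        while len(self._store) > self.max_items:
            self._store.popitem(last=False)  # evict the least recently used entry

    def get(self, key, default=None):
        full_key = self.prefix + (key,)
        item = self._store.get(full_key)
        if item is None:
            return default
        expiry, value = item
        if expiry is not None and self.clock() >= expiry:
            del self._store[full_key]  # entry outlived its TTL
            return default
        self._store.move_to_end(full_key)
        return value
```

In this sketch, two caches that share a ``store`` but use different ``namespace`` values never see each other's entries, and an entry disappears either when its TTL elapses or when newer entries push it past ``max_items``.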

You can toggle the cache at runtime with the ``CE_CACHE`` environment variable.
The format accepts comma-separated directives:

```bash
CE_CACHE="enable,max_items=1024,ttl=900" python serve.py
```

Valid tokens include ``enable``/``on``/``off`` as well as ``namespace=``,
``version=``, ``max_items=``, ``max_bytes=``, and ``ttl=``. To roll back, rebuild
the configuration with ``perf_cache(False)`` or export ``CE_CACHE=off``.
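
If you want to validate a directive string before deploying it, a small parser along the following lines can help. This is a sketch only: ``parse_directives`` is a hypothetical helper, and the library's own parser may treat whitespace, unknown tokens, or numeric values differently.

```python
def parse_directives(raw):
    """Parse a string such as 'enable,max_items=1024,ttl=900' into a dict (illustrative only)."""
    settings = {}
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas
        if "=" in token:
            key, value = token.split("=", 1)
            key, value = key.strip(), value.strip()
            # Assumption: purely numeric values are integers, anything else stays a string.
            settings[key] = int(value) if value.isdigit() else value
        elif token.lower() in {"enable", "on"}:
            settings["enabled"] = True
        elif token.lower() == "off":
            settings["enabled"] = False
        else:
            raise ValueError(f"unrecognised directive: {token!r}")
    return settings
```

Calling ``parse_directives("enable,max_items=1024,ttl=900")`` in this sketch yields a dictionary with ``enabled`` set to ``True`` and the two numeric limits as integers.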

## Enable multiprocessing for perturbations

The parallel backend runs perturbation-heavy steps across a pool of worker
threads or processes, depending on the ``backend`` you choose. Like the cache, it
remains off until you enable it on the configuration object.

```python
# Reuses the `builder` created in the cache example above.
config_parallel = (
    builder.perf_parallel(True, backend="threads", workers=4, min_batch=8)
    .perf_cache(True)
    .build_config()
)
explainer_parallel = WrapCalibratedExplainer.from_config(config_parallel)
```

- ``backend`` accepts ``"threads"``, ``"processes"``, ``"joblib"``, or
  ``"auto"`` (chooses a strategy based on platform and CPU count).
- ``workers`` caps the worker pool; omit it to use all logical CPUs.
- ``min_batch`` skips the executor for very small workloads so sequential
  execution stays cheaper.
- ``min_instances`` sets the floor for instance-parallel execution; it defaults to
  ``max(8, chunk_size)`` so small-but-parallel-worthy batches are not forced
  to run serially.
- ``tiny_workload`` overrides the tiny-workload guard applied before spinning up a
  pool; omit it to rely on the adaptive per-granularity defaults (roughly 8–16).
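
The idea behind the ``min_batch`` guard can be sketched with the standard library alone. The helper below is hypothetical (``map_with_guard`` is not part of the library), and the exact comparison the library uses at the threshold is an assumption; the point is only that small workloads bypass the pool entirely.

```python
from concurrent.futures import ThreadPoolExecutor


def map_with_guard(func, items, workers=4, min_batch=8):
    """Apply func to items, using a thread pool only when the batch is large enough."""
    items = list(items)
    # Assumption: batches smaller than min_batch run sequentially to avoid pool overhead.
    if len(items) < min_batch:
        return [func(item) for item in items]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with items.
        return list(pool.map(func, items))
```

Both paths return results in input order, so callers do not need to know which one was taken.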

The ``CE_PARALLEL`` environment variable mirrors the builder options:

```bash
CE_PARALLEL="enable,threads,workers=8,min_batch=4,min_instances=8,tiny=12" python serve.py
```

Set ``CE_PARALLEL=off`` to fall back to sequential execution without
touching code. The executor resets the calibrator cache after forking, so cached
payloads remain process-safe.

## Use vectorised perturbations in the core explainer

Vectorised perturbations now ship in the core explainer. ``explain_factual`` and
``explore_alternatives`` rely on numpy masking rather than deep Python loops, so
you benefit immediately when the cache or parallel executor is enabled. The
``explain_fast`` plugin continues to offer additional heuristics, but it is no
longer required for SIMD-friendly perturbation handling.

## Filter features using internal FAST explanations

When the number of features is large, you can reduce compute by using internal
FAST explanations to discard unimportant features before running the full
factual/alternative explainers.

Enable this at build time:

```python
from calibrated_explanations import WrapCalibratedExplainer
from calibrated_explanations.api.config import ExplainerBuilder

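# `model` is a fitted learner (for example, the one reused in the cache section);
# `x_cal`, `y_cal`, and `x_test` are your own calibration and test splits.
builder = ExplainerBuilder(model)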
config = (
    builder
    .perf_parallel(True, backend="threads", workers=4, granularity="feature")
    .perf_feature_filter(True, per_instance_top_k=8)
    .build_config()
)
wrapper = WrapCalibratedExplainer._from_config(config)
wrapper.calibrate(x_cal, y_cal)
explanations = wrapper.explain_factual(x_test)
```

At runtime you can override or disable the filter via ``CE_FEATURE_FILTER``:

```bash
CE_FEATURE_FILTER="enable,top_k=8" python serve.py
```

Internally, each factual/alternative call:

- runs an internal FAST pass on the same batch to obtain per-instance weights,
- aggregates those weights and keeps at most ``top_k`` features for the batch,
- passes the resulting ``features_to_ignore`` to the existing execution plugins.

If the FAST plugin is not installed or fails, the filter is skipped and the
behaviour falls back to the unfiltered explainers.
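
The three steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: ``select_features_to_ignore`` is a hypothetical helper, and aggregating by mean absolute weight is an assumed rule that the library may not use.

```python
def select_features_to_ignore(weights, top_k):
    """Return indices of features outside the batch-level top_k.

    weights: one list of per-feature weights per instance, e.g. from a FAST pass.
    """
    n_features = len(weights[0])
    # Assumed aggregation: mean absolute weight of each feature across the batch.
    scores = [
        sum(abs(row[j]) for row in weights) / len(weights)
        for j in range(n_features)
    ]
    # Rank features from most to least important and keep the first top_k.
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    keep = set(ranked[:top_k])
    return [j for j in range(n_features) if j not in keep]


# Two instances, four features: features 0 and 3 carry the most weight.
batch_weights = [[0.5, -0.1, 0.0, 0.3], [-0.4, 0.2, 0.05, 0.1]]
ignored = select_features_to_ignore(batch_weights, top_k=2)
```

Here ``ignored`` holds the indices of the two weakest features, which would then play the role of ``features_to_ignore`` in the full explainer call.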

## Roll back to the baseline runtime

1. Rebuild any configuration objects with ``perf_cache(False)`` and
   ``perf_parallel(False)``.
2. Remove the FAST plugin bundle (``pip uninstall external-plugins``) or revoke
   trust via ``CE_DENY_PLUGIN``/``calibrated_explanations.plugins.cli`` if you
   previously enabled it for additional heuristics.
3. Restart long-lived services to clear cached artefacts, worker pools, and any
   process-level telemetry counters.

Document the change in your release notes or change log so operators know the
performance toggles have returned to their v0.8.x defaults. Capture cache metrics via
``explainer._perf_cache.metrics.snapshot()`` or the telemetry callback if you
need before/after validation.