# Tune runtime performance (opt-in)

v0.9.0 introduces three performance controls that stay disabled by default: the calibrator cache, a multiprocessing backend, and vectorised perturbations baked into the core explainer. This guide shows how to enable each feature deliberately, how to tune the new configuration surface, and how to revert to the baseline behaviour if a feature does not fit your deployment.

## Prerequisites

- Install ``calibrated-explanations`` as usual. Optional extras are only required when you enable the fast explanations plugin.
- Import :class:`calibrated_explanations.api.config.ExplainerBuilder` (or build an :class:`~calibrated_explanations.api.config.ExplainerConfig` manually) so you can toggle the cache and parallel backends without mutating the public ``WrapCalibratedExplainer`` API.
- Keep governance approvals handy: the release checklist treats these as opt-in features, so document who enabled them and why.

## Enable the calibrator cache

The cache saves intermediate calibration artefacts so repeated explanation runs avoid recomputing identical payloads. It is disabled unless you flip the feature flag when building your configuration. When you already have a fitted wrapper, reuse its learner when constructing the builder so the cached artefacts align with your deployed estimator:

```python
from calibrated_explanations import WrapCalibratedExplainer
from calibrated_explanations.api.config import ExplainerBuilder
from tests.helpers.doc_utils import run_quickstart_classification

context = run_quickstart_classification()
model = context.explainer.learner

builder = ExplainerBuilder(model)
config = (
    builder.perf_cache(
        True,
        max_items=256,
        max_bytes=8 * 1024 * 1024,
        namespace="service-a",
        version="v2",
        ttl=600,
    )
    .build_config()
)
explainer = WrapCalibratedExplainer.from_config(config)
```

- ``max_items`` caps the number of cached entries (defaults to 512).
- ``max_bytes`` imposes an approximate memory ceiling using array ``nbytes`` when available.
- ``namespace``/``version`` isolate callers so multiple services can safely share an in-memory cache.
- ``ttl`` (seconds) expires entries proactively; omit it to cache until evicted by LRU.

You can toggle the cache at runtime with the ``CE_CACHE`` environment variable. The format accepts comma-separated directives:

```bash
CE_CACHE="enable,max_items=1024,ttl=900" python serve.py
```

Valid tokens include ``enable``/``on``/``off`` as well as ``namespace=``, ``version=``, ``max_items=``, ``max_bytes=``, and ``ttl=``. To roll back, rebuild the configuration with ``perf_cache(False)`` or export ``CE_CACHE=off``.

## Enable multiprocessing for perturbations

The parallel backend runs perturbation-heavy steps across worker processes. Like the cache, it remains off until you enable it on the configuration object.

```python
# Reuses the builder created in the cache example above.
config_parallel = (
    builder.perf_parallel(True, backend="threads", workers=4, min_batch=8)
    .perf_cache(True)
    .build_config()
)
explainer_parallel = WrapCalibratedExplainer.from_config(config_parallel)
```

- ``backend`` accepts ``"threads"``, ``"processes"``, ``"joblib"``, or ``"auto"`` (chooses a strategy based on platform and CPU count).
- ``workers`` caps the worker pool; omit it to use all logical CPUs.
- ``min_batch`` skips the executor for very small workloads so sequential execution stays cheaper.
- ``min_instances`` sets the floor for instance-parallel execution; it defaults to ``max(8, chunk_size)`` so small-but-parallel-worthy batches are not forced to run serially.
- ``tiny_workload`` overrides the tiny-workload guard applied before a pool is spun up; omit it to rely on the adaptive per-granularity defaults (≈8–16).

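
The effect of a ``min_batch``-style guard can be illustrated with a minimal sketch built on the standard library. This is not the library's executor: ``run_batch`` and ``square`` are hypothetical names, and the exact threshold comparison is an assumption made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch(fn, items, workers=4, min_batch=8):
    """Apply fn to every item, using a thread pool only for large batches.

    Hypothetical helper: the real executor's threshold logic may differ.
    """
    items = list(items)
    # Small workloads: pool start-up costs more than it saves, so stay sequential.
    if len(items) < min_batch or workers <= 1:
        return [fn(item) for item in items]
    # Larger workloads: fan out; pool.map preserves the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))


def square(x):
    return x * x


small = run_batch(square, range(3))   # below min_batch, runs sequentially
large = run_batch(square, range(20))  # at or above min_batch, uses the pool
```

Both paths return results in input order, so callers do not need to know which one ran.
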

The ``CE_PARALLEL`` environment variable mirrors the builder options:

```bash
CE_PARALLEL="enable,threads,workers=8,min_batch=4,min_instances=8,tiny=12" python serve.py
```

Set ``CE_PARALLEL=off`` to fall back to sequential execution without touching code. The executor resets the calibrator cache after forking, so cached payloads remain process-safe.

## Use vectorised perturbations via FAST explanations

Vectorised perturbations now ship in the core explainer. ``explain_factual`` and ``explore_alternatives`` rely on numpy masking rather than deep Python loops, so you benefit immediately when the cache or parallel executor is enabled. The ``explain_fast`` plugin continues to offer additional heuristics, but it is no longer required for SIMD-friendly perturbation handling.

## Filter features using internal FAST explanations

When the number of features is large, you can reduce compute by using internal FAST explanations to discard unimportant features before running the full factual/alternative explainers. Enable this at build time:

```python
from calibrated_explanations import WrapCalibratedExplainer
from calibrated_explanations.api.config import ExplainerBuilder

# model, x_cal, y_cal, and x_test are assumed to be defined already.
builder = ExplainerBuilder(model)
config = (
    builder
    .perf_parallel(True, backend="threads", workers=4, granularity="feature")
    .perf_feature_filter(True, per_instance_top_k=8)
    .build_config()
)
wrapper = WrapCalibratedExplainer._from_config(config)
wrapper.calibrate(x_cal, y_cal)
explanations = wrapper.explain_factual(x_test)
```

At runtime you can override or disable the filter via ``CE_FEATURE_FILTER``:

```bash
CE_FEATURE_FILTER="enable,top_k=8" python serve.py
```

Internally, each factual/alternative call:

- runs an internal FAST pass on the same batch to obtain per-instance weights,
- aggregates those weights and keeps at most ``top_k`` features for the batch,
- passes the resulting ``features_to_ignore`` to the existing execution plugins.

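
The aggregation and selection steps above can be sketched in plain Python. The helper name ``select_features_to_ignore`` is hypothetical, and the aggregation rule (mean absolute weight across the batch) is an assumption; the guide does not state which rule the library applies.

```python
def select_features_to_ignore(weights, top_k):
    """Return indices of features to drop, given per-instance weights.

    weights: one row per instance, each row a list of per-feature weights.
    Hypothetical helper; the aggregation rule below is an assumption.
    """
    n_features = len(weights[0])
    # Aggregate across the batch using the mean absolute weight per feature.
    scores = [
        sum(abs(row[j]) for row in weights) / len(weights)
        for j in range(n_features)
    ]
    # Rank features by score (highest first) and keep at most top_k of them.
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    keep = set(ranked[:top_k])
    return [j for j in range(n_features) if j not in keep]


# Two instances, four features; features 0 and 2 carry most of the weight.
batch_weights = [
    [0.9, 0.0, -0.4, 0.1],
    [0.7, 0.1, -0.6, 0.0],
]
ignored = select_features_to_ignore(batch_weights, top_k=2)
```

With ``top_k=2`` the sketch keeps features 0 and 2 and marks features 1 and 3 as ignorable; a ``top_k`` at or above the feature count ignores nothing.
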

If the FAST plugin is not installed or fails, the filter is skipped and the behaviour falls back to the unfiltered explainers.

## Roll back to the baseline runtime

1. Rebuild any configuration objects with ``perf_cache(False)`` and ``perf_parallel(False)``.
2. Remove the FAST plugin bundle (``pip uninstall external-plugins``) or revoke trust via ``CE_DENY_PLUGIN``/``calibrated_explanations.plugins.cli`` if you previously enabled it for additional heuristics.
3. Restart long-lived services to clear cached artefacts, worker pools, and any process-level telemetry counters.

Document the change in your release notes or change log so operators know the performance toggles returned to their v0.8.x defaults. Capture cache metrics via ``explainer._perf_cache.metrics.snapshot()`` or the telemetry callback if you need before/after validation.
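
For before/after validation, one option is to diff two metric snapshots. The sketch below assumes each snapshot can be read as a flat mapping of counter names to numbers; the guide does not specify the snapshot format, and the counter names (``hits``, ``misses``, ``evictions``) and the ``diff_snapshots`` helper are hypothetical.

```python
def diff_snapshots(before, after):
    """Return the per-counter change between two metric snapshots.

    Assumes both snapshots are flat mappings of counter name to number.
    """
    keys = sorted(set(before) | set(after))
    # Counters missing from one snapshot are treated as zero.
    return {key: after.get(key, 0) - before.get(key, 0) for key in keys}


# Hypothetical snapshots; real counter names may differ.
before = {"hits": 10, "misses": 40}
after = {"hits": 55, "misses": 45, "evictions": 3}
delta = diff_snapshots(before, after)
```

Recording ``delta`` alongside the change-log entry gives operators a concrete figure to compare against after a rollback.
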