Tune runtime performance (opt-in)
v0.9.0 introduces three performance controls that stay disabled by default: the calibrator cache, a multiprocessing backend, and vectorised perturbations baked into the core explainer. This guide shows how to enable each feature consciously, how to tune the new configuration surface, and how to revert to the baseline behaviour if they are not a fit for your deployment.
Prerequisites
- Install calibrated-explanations as usual. Optional extras are only required when you enable the fast explanations plugin.
- Import calibrated_explanations.api.config.ExplainerBuilder (or build a calibrated_explanations.api.config.ExplainerConfig manually) so you can toggle the cache and parallel backends without mutating the public WrapCalibratedExplainer API.
- Keep governance approvals handy: the release checklist treats these as opt-in features, so document who enabled them and why.
Enable the calibrator cache
The cache saves intermediate calibration artefacts so repeated explanation runs avoid recomputing identical payloads. It is disabled unless you flip the feature flag when building your configuration.
When you already have a fitted wrapper, reuse its learner when constructing the builder so the cached artefacts align with your deployed estimator:
from calibrated_explanations import WrapCalibratedExplainer
from calibrated_explanations.api.config import ExplainerBuilder
from tests.helpers.doc_utils import run_quickstart_classification
context = run_quickstart_classification()
model = context.explainer.learner
builder = ExplainerBuilder(model)
config = (
    builder.perf_cache(
        True,
        max_items=256,
        max_bytes=8 * 1024 * 1024,
        namespace="service-a",
        version="v2",
        ttl=600,
    )
    .build_config()
)
explainer = WrapCalibratedExplainer.from_config(config)
- max_items caps the number of cached entries (defaults to 512).
- max_bytes imposes an approximate memory ceiling using array nbytes when available.
- namespace / version isolate callers so multiple services can safely share an in-memory cache.
- ttl (seconds) expires entries proactively; omit it to cache until evicted by LRU.
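To make the interaction between these options concrete, here is a minimal, self-contained sketch of an LRU cache with a TTL and namespaced keys. It is not the library's implementation: the SharedLRUCache class, its method names, and the injectable clock are hypothetical, and the max_bytes ceiling is left out for brevity.

```python
import time
from collections import OrderedDict


class SharedLRUCache:
    """Illustrative LRU cache with TTL and namespaced keys (hypothetical)."""

    def __init__(self, max_items=512, ttl=None, clock=time.monotonic):
        self.max_items = max_items
        self.ttl = ttl  # seconds; None means entries never expire on their own
        self._clock = clock  # injectable so expiry can be simulated
        self._entries = OrderedDict()  # (namespace, version, key) -> (expiry, value)

    def put(self, namespace, version, key, value):
        expires_at = None if self.ttl is None else self._clock() + self.ttl
        full_key = (namespace, version, key)
        self._entries[full_key] = (expires_at, value)
        self._entries.move_to_end(full_key)  # mark as most recently used
        while len(self._entries) > self.max_items:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def get(self, namespace, version, key, default=None):
        full_key = (namespace, version, key)
        entry = self._entries.get(full_key)
        if entry is None:
            return default
        expires_at, value = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._entries[full_key]  # the entry outlived its TTL
            return default
        self._entries.move_to_end(full_key)  # a hit refreshes recency
        return value
```

Because the namespace and version are part of every key, two services that use the same logical key never read each other's entries, and bumping the version effectively invalidates everything stored under the old one.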
You can toggle the cache at runtime with the CE_CACHE environment variable.
The format accepts comma-separated directives:
CE_CACHE="enable,max_items=1024,ttl=900" python serve.py
Valid tokens include enable/on/off as well as namespace=,
version=, max_items=, max_bytes=, and ttl=. To roll back, rebuild
the configuration with perf_cache(False) or export CE_CACHE=off.
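The directive format can be illustrated with a small parser. This is only a sketch of how such a string might be interpreted: parse_cache_directives is a hypothetical helper, and the value types and the choice to reject unknown tokens are assumptions rather than documented library behaviour.

```python
# Assumed value types for each key=value directive named above.
KNOWN_OPTIONS = {
    "namespace": str,
    "version": str,
    "max_items": int,
    "max_bytes": int,
    "ttl": float,
}


def parse_cache_directives(raw):
    """Split a CE_CACHE-style string into (enabled, options).

    enabled is True/False when an enable/on/off token is present, else None.
    """
    enabled = None
    options = {}
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas
        lowered = token.lower()
        if lowered in ("enable", "on"):
            enabled = True
        elif lowered == "off":
            enabled = False
        elif "=" in token:
            key, _, value = token.partition("=")
            key = key.strip().lower()
            if key not in KNOWN_OPTIONS:
                raise ValueError(f"unknown option: {key!r}")
            options[key] = KNOWN_OPTIONS[key](value.strip())
        else:
            raise ValueError(f"unrecognised directive: {token!r}")
    return enabled, options
```

For example, parse_cache_directives("enable,max_items=1024,ttl=900") yields an enabled flag of True with max_items and ttl converted to numbers.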
Enable multiprocessing for perturbations
The parallel backend spreads perturbation-heavy steps across a pool of workers (threads or processes, depending on the backend you choose). Like the cache, it remains off until you enable it on the configuration object.
config_parallel = (
    builder.perf_parallel(True, backend="threads", workers=4, min_batch=8)
    .perf_cache(True)
    .build_config()
)
explainer_parallel = WrapCalibratedExplainer.from_config(config_parallel)
- backend accepts "threads", "processes", "joblib", or "auto" (chooses a strategy based on platform and CPU count).
- workers caps the worker pool; omit it to use all logical CPUs.
- min_batch skips the executor for very small workloads so sequential execution stays cheaper.
- min_instances sets the floor for instance-parallel execution; defaults to max(8, chunk_size) so small-but-parallel-worthy batches are not forced to run serially.
- tiny_workload overrides the tiny-workload guard used before spinning up a pool; omit it to rely on the adaptive per-granularity defaults (≈8–16 by default).
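The idea behind a guard such as min_batch can be sketched with the standard library alone. The map_with_guard helper below is hypothetical and only illustrates the trade-off; the exact threshold semantics the library applies are an assumption here.

```python
from concurrent.futures import ThreadPoolExecutor


def map_with_guard(func, items, workers=4, min_batch=8):
    """Apply func to every item, using a thread pool only when it may pay off.

    Returns (results, mode) so the caller can see which path was taken.
    """
    items = list(items)
    # Small batches run inline: starting a pool costs more than it saves.
    if len(items) < min_batch or workers <= 1:
        return [func(item) for item in items], "sequential"
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, matching the sequential path.
        return list(pool.map(func, items)), "threads"
```

Both paths return results in the same order, so the choice of path affects only cost, not output.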
The CE_PARALLEL environment variable mirrors the builder options:
CE_PARALLEL="enable,threads,workers=8,min_batch=4,min_instances=8,tiny=12" python serve.py
Set CE_PARALLEL=off to fall back to single-threaded execution without
touching code. The executor resets the calibrator cache after forking, so cached
payloads remain process safe.
Use vectorised perturbations via FAST explanations
Vectorised perturbations now ship in the core explainer. explain_factual and
explore_alternatives rely on numpy masking rather than deep Python loops, so
you benefit immediately when the cache or parallel executor is enabled. The
explain_fast plugin continues to offer additional heuristics, but it is no
longer required for SIMD-friendly perturbation handling.
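To show what replacing Python loops with numpy masking means in general, the sketch below builds the same perturbed batch twice: once with nested loops and once with a broadcast mask. The perturbation scheme (overwrite one feature at a time with a fixed replacement value) and both function names are illustrative assumptions, not the explainer's internal logic.

```python
import numpy as np


def perturb_loop(x, values):
    """Loop version: copy f of the batch has feature f overwritten."""
    n, d = x.shape
    out = np.empty((d, n, d), dtype=x.dtype)
    for f in range(d):
        for i in range(n):
            row = x[i].copy()
            row[f] = values[f]
            out[f, i] = row
    return out


def perturb_masked(x, values):
    """Vectorised version: one boolean mask selects the overwritten cells."""
    d = x.shape[1]
    # mask[f, 0, j] is True only when j == f, i.e. copy f perturbs feature f.
    mask = np.eye(d, dtype=bool)[:, None, :]
    # Broadcasting expands mask (d, 1, d), values (1, 1, d) and x (1, n, d)
    # to a common (d, n, d) shape without any Python-level loop.
    return np.where(mask, values[None, None, :], x[None, :, :])
```

Both functions return an array of shape (n_features, n_instances, n_features); the masked version does the same work in a single array expression, which is the property that lets the cache and parallel executor operate on whole batches.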
Filter features using internal FAST explanations
When the number of features is large, you can reduce compute by using internal FAST explanations to discard unimportant features before running the full factual/alternative explainers.
Enable this at build time:
from calibrated_explanations import WrapCalibratedExplainer
from calibrated_explanations.api.config import ExplainerBuilder
builder = ExplainerBuilder(model)
config = (
    builder
    .perf_parallel(True, backend="threads", workers=4, granularity="feature")
    .perf_feature_filter(True, per_instance_top_k=8)
    .build_config()
)
wrapper = WrapCalibratedExplainer._from_config(config)
wrapper.calibrate(x_cal, y_cal)
explanations = wrapper.explain_factual(x_test)
At runtime you can override or disable the filter via CE_FEATURE_FILTER:
CE_FEATURE_FILTER="enable,top_k=8" python serve.py
Internally, each factual/alternative call:
1. runs an internal FAST pass on the same batch to obtain per-instance weights,
2. aggregates those weights and keeps at most top_k features for the batch,
3. passes the resulting features_to_ignore to the existing execution plugins.
If the FAST plugin is not installed or fails, the filter is skipped and the behaviour falls back to the unfiltered explainers.
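The filtering steps and the fallback can be sketched in plain Python. The aggregation rule (sum of absolute weights across the batch) and the helper names are assumptions made for illustration; the document does not state how the library aggregates weights.

```python
def features_to_ignore(weights, top_k):
    """Return indices of features outside the batch-level top_k.

    weights holds one list of per-feature weights for each instance.
    """
    if not weights:
        return []
    n_features = len(weights[0])
    # Assumed aggregation: total absolute weight of each feature over the batch.
    totals = [sum(abs(row[j]) for row in weights) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: totals[j], reverse=True)
    keep = set(ranked[:top_k])
    return [j for j in range(n_features) if j not in keep]


def select_ignored(fast_pass, batch, top_k):
    """Run a FAST-style pass; on any failure, filter nothing."""
    try:
        weights = fast_pass(batch)
    except Exception:
        return []  # fall back to the unfiltered explainers
    return features_to_ignore(weights, top_k)
```

An empty ignore list is the safe default: if the cheap pass cannot run, the full explainers see every feature exactly as they would without the filter.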
Roll back to the baseline runtime
1. Rebuild any configuration objects with perf_cache(False) and perf_parallel(False).
2. Remove the FAST plugin bundle (pip uninstall external-plugins) or revoke trust via CE_DENY_PLUGIN / calibrated_explanations.plugins.cli if you previously enabled it for additional heuristics.
3. Restart long-lived services to clear cached artefacts, worker pools, and any process-level telemetry counters.
Document the change in your release notes or change log so operators know the
performance toggles returned to their v0.8.x defaults. Capture cache metrics via
explainer._perf_cache.metrics.snapshot() or the telemetry callback if you
need before/after validation.