Note: This guide was written at v0.10.3. The reject policy API is stable; verify against src/calibrated_explanations/core/reject/policy.py for the current enum members and RejectPolicySpec for the canonical spec form.

Reject Policy Usage

This document captures the v0.10.3 guidance for opting into ADR-029’s reject integration via the new RejectPolicy enum. The goal is to preserve the legacy reject=False behavior (policy NONE) while allowing either per-call overrides or explainer-level defaults that implicitly enable reject orchestration when a non-NONE policy is selected.

Overview

  • Policy enum: All call sites that want reject-aware behavior now accept a keyword argument reject_policy: RejectPolicy = RejectPolicy.NONE. The canonical members are NONE, FLAG, ONLY_REJECTED, and ONLY_ACCEPTED. See src/calibrated_explanations/core/reject/policy.py for the list and docstrings.

  • Implicit orchestration: Selecting any policy other than NONE automatically initializes and invokes RejectOrchestrator, even if the legacy reject flag was False. The return value becomes a RejectResult envelope (src/calibrated_explanations/explanations/reject.py) carrying predictions, explanations, reject status, policy, and metadata.
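The dispatch described above can be pictured with a small, library-independent sketch. Policy and Envelope are hypothetical stand-ins for RejectPolicy and RejectResult, and the filtering attached to ONLY_ACCEPTED and ONLY_REJECTED is inferred from the member names rather than taken from the library source.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):  # hypothetical stand-in for RejectPolicy
    NONE = "none"
    FLAG = "flag"
    ONLY_REJECTED = "only_rejected"
    ONLY_ACCEPTED = "only_accepted"


@dataclass
class Envelope:  # hypothetical stand-in for RejectResult
    predictions: list
    rejected: list
    policy: Policy
    metadata: dict = field(default_factory=dict)


def predict_with_policy(predictions, rejected, policy=Policy.NONE):
    """Return bare predictions for NONE, otherwise an envelope."""
    if policy is Policy.NONE:
        return predictions  # legacy return type, no reject handling
    if policy is Policy.ONLY_ACCEPTED:
        keep = [not r for r in rejected]  # assumed: drop rejected instances
    elif policy is Policy.ONLY_REJECTED:
        keep = list(rejected)  # assumed: keep only rejected instances
    else:  # FLAG: keep everything and attach the reject status
        keep = [True] * len(predictions)
    kept_preds = [p for p, k in zip(predictions, keep) if k]
    kept_flags = [r for r, k in zip(rejected, keep) if k]
    return Envelope(kept_preds, kept_flags, policy)


preds = [0.9, 0.4, 0.7]
flags = [False, True, False]
legacy = predict_with_policy(preds, flags)
flagged = predict_with_policy(preds, flags, Policy.FLAG)
accepted = predict_with_policy(preds, flags, Policy.ONLY_ACCEPTED)
```

The point of the sketch is the change in return type: only the NONE branch hands back the original object, so callers that switch policies need to be ready for the envelope.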

Per-call policy overrides

Each explanation or prediction entry point supports the reject_policy keyword argument to vary integration behavior on a per-call basis:

from calibrated_explanations.core.calibrated_explainer import CalibratedExplainer
from calibrated_explanations.core.reject.policy import RejectPolicy

explainer = CalibratedExplainer(learner, x_cal, y_cal)

# Legacy behavior (returns CalibratedExplanations / prediction tuple)
result = explainer.explain_factual(x_test)

# Non-NONE policy returns a RejectResult envelope
envelope = explainer.explain_factual(x_test, reject_policy=RejectPolicy.ONLY_ACCEPTED)
assert envelope.policy == RejectPolicy.ONLY_ACCEPTED

  • Per-call policies override any explainer default.

  • Passing RejectPolicy.NONE (the default) keeps the original return type and skips reject orchestration entirely.
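Because the return type depends on the policy, calling code that must work with both shapes may want a small normalizing helper. The following is a minimal sketch under that assumption; FakeEnvelope is a hypothetical stand-in for RejectResult, and split_result is not part of the library.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeEnvelope:  # hypothetical stand-in for RejectResult
    explanations: Any
    policy: str
    metadata: Optional[dict] = None


def split_result(result):
    """Return (explanations, metadata) for either return shape."""
    if isinstance(result, FakeEnvelope):
        return result.explanations, (result.metadata or {})
    return result, {}  # legacy path: no envelope, no reject metadata


legacy_out = split_result(["expl-0", "expl-1"])
envelope_out = split_result(FakeEnvelope(["expl-0"], "only_accepted", {"n": 1}))
```

In real code the isinstance check would target the library's RejectResult class instead of the stand-in.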

Explainer-level defaults

To avoid specifying the same policy on every call, you can configure a default policy on the explainer itself:

from calibrated_explanations.core.calibrated_explainer import CalibratedExplainer
from calibrated_explanations.core.reject.policy import RejectPolicy

explainer = CalibratedExplainer(
    learner,
    x_cal,
    y_cal,
    default_reject_policy=RejectPolicy.FLAG,
)

# Subsequent calls inherit the policy
envelope = explainer.predict(x_test)
assert envelope.policy == RejectPolicy.FLAG

For the WrapCalibratedExplainer, pass default_reject_policy at calibration time only (the wrapper’s constructor intentionally omits this argument to keep defaults centralized):

from calibrated_explanations.core.reject.policy import RejectPolicy
from calibrated_explanations.core.wrap_explainer import WrapCalibratedExplainer

wrapper = WrapCalibratedExplainer(learner)
wrapper.fit(X_fit, y_fit)
wrapper.calibrate(x_cal, y_cal, default_reject_policy=RejectPolicy.FLAG)

envelope = wrapper.explain_factual(x_test)
assert envelope.policy == RejectPolicy.FLAG

Release note summary

  • Added the RejectPolicy enum and RejectResult envelope so callers can opt into reject-aware outputs without changing existing defaults.

  • Explanation and prediction entry points now accept reject_policy, with non-NONE selections implicitly enabling reject orchestration and returning a structured envelope.

  • Explainer defaults (default_reject_policy) and wrapper calibration now offer reusable policy configuration, while per-call overrides continue to take precedence.

NCF selection and the w parameter

The non-conformity function (NCF) controls how the reject learner scores each instance. Pass ncf and w via RejectPolicySpec or initialize_reject_learner. When omitted, the framework uses ncf="default" (task-dependent internal score).

Public NCF values are:

  • default: the internal score is hinge for binary classification and thresholded regression, and margin for multiclass.

  • ensured: a blended score, computed as score = (1 - w) * interval_width + w * default_score.

Legacy ncf="entropy" is silently mapped to ncf="default" for compatibility. Explicit ncf="hinge" and ncf="margin" are no longer supported.

The w parameter is operational only for ensured; for default it is accepted but ignored.
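As a rough illustration of the NCF name handling and the ensured blend described above, the sketch below implements both in plain Python. The function names are hypothetical, and ValueError is only a placeholder because the text does not say which error an unsupported NCF produces.

```python
def normalize_ncf(ncf="default"):
    """Map a user-supplied NCF name to one of the two public values."""
    if ncf == "entropy":
        return "default"  # legacy alias, mapped silently
    if ncf in ("default", "ensured"):
        return ncf
    # Placeholder error type; the library's own exception may differ.
    raise ValueError(f"unsupported ncf: {ncf!r}")


def blended_score(interval_width, default_score, w):
    """Ensured score: (1 - w) * interval_width + w * default_score."""
    return (1 - w) * interval_width + w * default_score


# With w=0.5 the two components contribute equally.
score = blended_score(interval_width=0.2, default_score=0.6, w=0.5)
```

Larger w shifts weight toward the task-dependent default score, while smaller w shifts it toward the interval width.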

Recommended w ranges per NCF:

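NCF     | Task                              | Safe w range | Starting point | Notes
--------|-----------------------------------|--------------|----------------|-----------------------------------------------
default | Binary / Multiclass / Regression¹ | n/a          | n/a            | Task-dependent internal score; w ignored
ensured | Binary / Multiclass / Regression¹ | 0.1–0.9      | 0.5            | Uses blending; requires w > 0.0; w < 0.1 warns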

Guard rails:

  • w=0.0 with ncf='ensured' raises ValidationError.

  • w < 0.1 with ncf='ensured' emits a UserWarning.

  • In multiclass mode, default uses internal margin scoring.
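The guard rails on w can be summarized in a few lines. This is a sketch only: check_w is a hypothetical helper, and ValueError stands in for the library's ValidationError so the snippet runs without the package installed.

```python
import warnings


def check_w(ncf, w):
    """Apply the guard rails listed above; return w if it is accepted."""
    if ncf != "ensured":
        return w  # accepted but ignored for ncf="default"
    if w == 0.0:
        # Placeholder for the library's ValidationError.
        raise ValueError("w must be > 0.0 when ncf='ensured'")
    if w < 0.1:
        warnings.warn("w < 0.1 with ncf='ensured'", UserWarning)
    return w


# Capture the warning emitted for a very small w.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_w("ensured", 0.05)
low_w_warned = any(issubclass(c.category, UserWarning) for c in caught)
```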

¹ Regression requires a threshold. The reject framework supports regression only when a decision threshold is supplied to initialize_reject_learner(threshold=t). The framework converts regression into threshold-binarized conformal classification: it models P(y ≤ threshold) and runs conformal prediction on that binary event. This is not the same as producing conformal prediction intervals. Omitting threshold for a regression explainer raises ValidationError.
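To make the binarization step concrete, the sketch below turns regression targets into a binary event. It takes the event to be y <= threshold, which is an assumption here, and the binarize helper is hypothetical rather than a library function.

```python
def binarize(y_values, threshold):
    """Turn regression targets into the binary event y <= threshold."""
    return [1 if y <= threshold else 0 for y in y_values]


targets = [2.5, 7.0, 4.0, 9.1]
events = binarize(targets, threshold=5.0)
```

Conformal classification then runs on these 0/1 events, which is why the output is a prediction set over the binary event rather than an interval for y.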

from calibrated_explanations import RejectPolicySpec

# Safe starting configuration for multiclass (w is accepted but ignored for ncf="default"):
spec = RejectPolicySpec.flag(ncf="default", w=0.5)

# Check which NCF was selected:
wrapper.explainer.reject_orchestrator.initialize_reject_learner(ncf="default", w=0.4)
print(wrapper.explainer.reject_ncf)             # "default"
print(wrapper.explainer.reject_ncf_auto_selected)  # False

Per-instance breakdowns

When a non-NONE policy is active, the RejectResult.metadata dictionary contains per-instance keys that let you inspect the rejection breakdown without invoking the orchestrator directly. These keys are:

  • ambiguity_mask: numpy.ndarray[bool] — True for instances whose prediction set contains more than one label (ambiguous).

  • novelty_mask: numpy.ndarray[bool] — True for instances whose prediction set is empty (novelty).

  • prediction_set_size: numpy.ndarray[int] — Integer size of the prediction set per instance.

  • epsilon: numpy.ndarray[float] — Per-instance epsilon used when constructing the prediction set.
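The relationship between these keys can be sketched from first principles. The example assumes the standard conformal rule that a label enters the prediction set when its p-value exceeds epsilon; the library's exact rule may differ. Plain lists are used instead of numpy arrays to keep the snippet dependency-free, and both helper names are hypothetical.

```python
def prediction_sets(p_values, epsilon):
    """Keep each label whose conformal p-value exceeds epsilon."""
    return [[label for label, p in enumerate(row) if p > epsilon] for row in p_values]


def breakdown(sets):
    """Derive the per-instance keys from the prediction sets."""
    sizes = [len(s) for s in sets]
    return {
        "prediction_set_size": sizes,
        "ambiguity_mask": [n > 1 for n in sizes],  # more than one label
        "novelty_mask": [n == 0 for n in sizes],  # empty set
    }


p_values = [
    [0.80, 0.03],  # only label 0 survives: singleton set
    [0.40, 0.35],  # both labels survive: ambiguous
    [0.02, 0.04],  # neither label survives: empty set
]
breakdown_meta = breakdown(prediction_sets(p_values, epsilon=0.05))
```

Instances with a singleton set fall into neither mask, which matches the reading that only ambiguous or novel instances are candidates for rejection.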

Metadata audit semantics (hardened)

Reject-aware wrapped collections expose two denominator scopes:

  • raw_total_examples: original collection size used by the unsliced reject computation (audit baseline).

  • sliced_total_examples: current view length after slicing/indexing.

raw_reject_counts always stores canonical sums for the active view:

  • rejected, ambiguity_mask, novelty_mask → sum of True entries.

  • prediction_set_size → numeric sum of per-instance set sizes.

metadata() returns a lightweight aggregate view, metadata_summary() is an alias to that lightweight view, and metadata_full() includes JSON-safe per-instance arrays for the current view.
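The two denominator scopes are easiest to see on a toy collection. RejectView below is a hypothetical stand-in, not a library class; it only illustrates that the raw total is fixed when the collection is created while the sliced total and the counts follow the active view.

```python
class RejectView:  # hypothetical stand-in, not a library class
    def __init__(self, rejected, raw_total=None):
        self.rejected = list(rejected)
        # The raw total is fixed at creation and carried through slicing.
        self.raw_total_examples = len(self.rejected) if raw_total is None else raw_total

    @property
    def sliced_total_examples(self):
        return len(self.rejected)  # length of the current view

    @property
    def raw_reject_counts(self):
        return {"rejected": sum(self.rejected)}  # sum of True entries in the view

    def __getitem__(self, index):
        if not isinstance(index, slice):
            raise TypeError("this sketch supports slices only")
        return RejectView(self.rejected[index], raw_total=self.raw_total_examples)


full = RejectView([True, False, True, False, False])
head = full[:2]
```

Here head reports 2 sliced examples against a raw baseline of 5, and its reject count covers only the two instances in view.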

RejectPolicySpec supports canonical user-facing NCF values (default, ensured) and is fully round-trippable via to_dict() / from_dict(). Legacy entropy payloads are normalized to default on read. For custom runtime callables, initialize the reject learner directly rather than encoding callables in policy specs.

resolve_policy_spec(...) accepts multiple interoperable input forms:

  • RejectPolicySpec objects,

  • policy dict payloads from to_dict(),

  • plain policy values (for legacy compatibility).
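The round-trip and multi-form resolution described above can be approximated as follows. Spec and resolve are hypothetical stand-ins for RejectPolicySpec and resolve_policy_spec, and the field names (policy, ncf, w) and their defaults are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Spec:  # hypothetical stand-in for RejectPolicySpec
    policy: str = "none"
    ncf: str = "default"
    w: float = 0.5  # assumed default, not confirmed by the source

    def to_dict(self):
        return asdict(self)

    @classmethod
    def from_dict(cls, payload):
        data = dict(payload)
        if data.get("ncf") == "entropy":
            data["ncf"] = "default"  # legacy payloads normalized on read
        return cls(**data)


def resolve(value):
    """Accept a Spec, a dict payload, or a plain policy value."""
    if isinstance(value, Spec):
        return value
    if isinstance(value, dict):
        return Spec.from_dict(value)
    return Spec(policy=str(value))  # plain value, legacy compatibility


original = Spec(policy="flag", ncf="ensured", w=0.3)
restored = Spec.from_dict(original.to_dict())
from_legacy = resolve({"policy": "flag", "ncf": "entropy", "w": 0.5})
```

Keeping callables out of the spec, as the text recommends, is what lets a plain dict round-trip like this stay lossless.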

Short example:

import numpy as np

res = explainer.predict(x_test, reject_policy=RejectPolicy.FLAG)
meta = res.metadata or {}
ambig = meta.get("ambiguity_mask")
nov = meta.get("novelty_mask")
sizes = meta.get("prediction_set_size")

print("Ambiguous count:", int(np.sum(ambig)) if ambig is not None else 0)
print("Novelty count:", int(np.sum(nov)) if nov is not None else 0)
print("Prediction set sizes sample:", sizes[:10] if sizes is not None else None)