> **Status note (2025-12-22):** Last edited 2025-12-22 · Archive after: Retain indefinitely as an engineering standard · Implementation window: Per Standard status (see Decision).

# Standard-003: Test Coverage Standardization

Formerly ADR-019. Reclassified as an engineering standard to keep ADRs scoped to
architectural or contract decisions.

Status: Active
Date: 2025-10-06
Deciders: Core maintainers
Reviewers: TBD
Supersedes: None
Superseded-by: None

## Context

Pytest is the primary regression harness for the package, yet neither local defaults nor
continuous integration enforce minimum coverage. `pytest.ini` runs tests quietly without
loading `pytest-cov`, while the CI workflow executes `pytest --cov=src/calibrated_explanations`
without a `--cov-fail-under` threshold. Contributors are asked to target roughly 90% coverage,
with reports uploaded to Codecov, but there is no guardrail preventing significant regressions.
Legacy runtime modules (for example `core.interval_regressor`) remain effectively untested, so
confidence in calibration guarantees erodes as the code evolves.【F:pytest.ini†L1-L17】【F:.github/workflows/test.yml†L33-L49】【F:CONTRIBUTING.md†L49-L58】【F:src/calibrated_explanations/core/interval_regressor.py†L1-L120】

## Decision

Adopt a layered coverage policy that couples numeric targets with risk-based exceptions,
while right-sizing enforcement for OSS development:

- **Package-wide floor:** Target **90% statement coverage** across
  `src/calibrated_explanations`. OSS/mainline CI reports the percentage but does not block.
  Release/stable branches enforce `--cov-fail-under=90`.
- **Critical paths:** Target **95% coverage** on calibrated prediction helpers, interval
  regression, serialization, and plugin registries. OSS/mainline CI reports per-path coverage;
  release/stable branches enforce via `coverage report --fail-under` per-path configuration.
- **Change-based gating:** Add a `coverage xml` step and integrate the Codecov “patch coverage”
  gate at **≥88%** for modified lines/files. This is advisory on OSS/mainline and blocking on
  release/stable branches. Pull requests that lower patch coverage below the threshold must
  justify waivers in the review checklist.
- **Parity reference harness:** Maintain canonical parity fixtures (factual, alternatives, fast,
  predictions) under `tests/parity_reference/` and require the parity harness job to pass in CI.
  The canonical harness entrypoint is `tests/parity_reference/run_parity_reference.py`; run it locally with::

      python tests/parity_reference/run_parity_reference.py --dataset <classification|regression|multiclass|probabilistic_regression>

  Use `--update` to refresh golden fixtures after intentional changes to explanation outputs. This harness is the canonical regression gate for OSS parity checks.
- **Documented exemptions:** Generated code, visualization golden files, and deprecated
  shims can be excluded via `.coveragerc` with explicit comments that describe the rationale
  and expiry date.
- **Public API guardrails:** Coverage thresholds MUST continue to exercise the
  WrapCalibratedExplainer contract (fit/calibrate/explain/predict flows,
  plotting helpers, uncertainty/threshold options). No part of the published
  API may be marked as deprecated or excluded from coverage unless a future ADR
  redefines the contract.

## Alternatives Considered

1. **Status quo (Codecov dashboards only).** Rejected because it allows silent regressions and
   does not give reviewers an actionable pass/fail signal.
2. **Per-module 100% coverage.** Rejected as unrealistic for plotting backends and third-party
   wrappers, potentially discouraging contributions.
3. **Runtime smoke-only checks.** Rejected; these do not measure statement coverage and fail
   to capture unexecuted branches in calibration math.

## Consequences

Positive:
- Quantitative gate keeps critical calibration logic exercised by tests before release.
- Contributors receive immediate feedback locally and in CI when coverage slips.
- Patch coverage guard discourages untested features while permitting incremental debt paydown.
- OSS contributions are not blocked by legacy coverage gaps.

Negative/Risks:
- Initial CI failures until legacy debt is addressed; requires remediation efforts.
- Slightly longer test runtime from additional reporting/threshold checks.
- Advisory-only enforcement can slow convergence without clear ownership.

## Adoption & Migration

1. Land this ADR and announce during contributor sync and release notes.
2. Introduce a shared `.coveragerc` that encodes thresholds and named exemptions.
3. Update CI (`test.yml`) to run `pytest --cov=src/calibrated_explanations --cov-report=xml \
   --cov-report=term` and pass the XML to Codecov with patch gating enabled; enforce
   `--cov-fail-under=90` and per-path fail-under only on release/stable branches.
4. Add a `make test-cov` (or invoke via `tox` target) so developers can trigger the same checks
   locally; ensure the dev extra installs `pytest-cov` by default.
5. Complete remediation tasks outlined in the coverage improvement plan so that historical debt
   does not block adoption.

## Open Questions

- **Cadence:** Review and prune `.coveragerc` exemptions during the planning phase of each minor release (e.g., v0.10.0, v0.11.0).
- **Subpackage Thresholds:** The critical-path list defined in the Decision section is sufficient for v1.0.0. Subpackage-specific thresholds are deferred to avoid excessive configuration maintenance.
- **Mutation Testing:** Defer to v0.11+ or later. While valuable, it is not a blocking requirement for v1.0.0 stability.

## Implementation Status

- 2025-10-06 – ADR accepted alongside the coverage remediation plan and
  baseline assessment.
- v0.6.x – `.coveragerc` drafted with provisional exemptions and
  baseline metrics recorded to shape the remediation backlog while CI
  continues to run without fail-under gates.
- v0.7.0 – CI introduces `--cov-fail-under=80` with exit-zero preview
  reports, coverage dashboards are published, and contributor templates
  document the waiver workflow.
- v0.8.0 – Critical-path modules (`core`, calibration, serialization,
  registry) are raised to ≥95% coverage, Codecov patch gating at ≥85%
  is advisory on mainline, and local tooling (`make test-cov`) mirrors
  the CI workflow.
- v0.9.0 – Package-wide floor raised to ≥88%, waiver inventory trimmed,
  Codecov patch gating tightened to ≥88%, and coverage enforcement is
  blocking on release branches per the milestone gate while remaining
  advisory on mainline.
- v1.0.0-rc – CI enforces the final ≥90% package floor on release
  branches, coverage dashboards become part of the release checklist,
  and branch protection rules require green coverage jobs before freeze.
- v1.0.0 – Stable release maintains ≥90% gating with scheduled audits of
  exemptions and telemetry-driven monitoring to detect regressions ahead
  of v1.0.x maintenance updates.