Status note (2025-12-22): Last edited 2025-12-22 · Archive after: Retain indefinitely as an engineering standard · Implementation window: Per Standard status (see Decision).

Standard-003: Test Coverage Standardization

Formerly ADR-019. Reclassified as an engineering standard to keep ADRs scoped to architectural or contract decisions.

Status: Active · Date: 2025-10-06 · Deciders: Core maintainers · Reviewers: TBD · Supersedes: None · Superseded-by: None

Context

Pytest is the primary regression harness for the package, yet neither local defaults nor continuous integration enforce minimum coverage. pytest.ini runs tests quietly without loading pytest-cov, while the CI workflow executes pytest --cov=src/calibrated_explanations without a --cov-fail-under threshold. Contributors are asked to target roughly 90% coverage, with reports uploaded to Codecov, but there is no guardrail preventing significant regressions. Legacy runtime modules (for example core.interval_regressor) remain effectively untested, so confidence in calibration guarantees erodes as the code evolves.【F:pytest.ini†L1-L17】【F:.github/workflows/test.yml†L33-L49】【F:CONTRIBUTING.md†L49-L58】【F:src/calibrated_explanations/core/interval_regressor.py†L1-L120】

Decision

Adopt a layered coverage policy that couples numeric targets with risk-based exceptions, while right-sizing enforcement for OSS development:

Package-wide floor: Target 90% statement coverage across src/calibrated_explanations. OSS/mainline CI reports the percentage but does not block. Release/stable branches enforce --cov-fail-under=90.
Critical paths: Target 95% coverage on calibrated prediction helpers, interval regression, serialization, and plugin registries. OSS/mainline CI reports per-path coverage; release/stable branches enforce the target through a per-path coverage report --fail-under configuration.
Change-based gating: Add a coverage xml step and integrate the Codecov “patch coverage” gate at ≥88% for modified lines/files. This is advisory on OSS/mainline and blocking on release/stable branches. Pull requests that lower patch coverage below the threshold must justify waivers in the review checklist.
Parity reference harness: Maintain canonical parity fixtures (factual, alternatives, fast, predictions) under tests/parity_reference/ and require the parity harness job to pass in CI. The canonical harness entrypoint is tests/parity_reference/run_parity_reference.py; run it locally with:
```
python tests/parity_reference/run_parity_reference.py --dataset <classification|regression|multiclass|probabilistic_regression>
```
Use --update to refresh golden fixtures after intentional changes to explanation outputs. This harness is the canonical regression gate for OSS parity checks.
Documented exemptions: Generated code, visualization golden files, and deprecated shims can be excluded via .coveragerc with explicit comments that describe the rationale and expiry date.
Public API guardrails: Coverage thresholds MUST continue to exercise the WrapCalibratedExplainer contract (fit/calibrate/explain/predict flows, plotting helpers, uncertainty/threshold options). No part of the published API may be marked as deprecated or excluded from coverage unless a future ADR redefines the contract.
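
The layered policy above (a package-wide floor plus stricter critical-path targets, advisory on mainline and blocking on release/stable) can be sketched as a small check over the Cobertura-style XML that coverage xml produces. This is an illustration only, not the project's actual tooling: the path prefixes in CRITICAL_PATHS, the function names, and the sample report are hypothetical, and the sketch assumes file names in the report are relative to the package root.

```python
import xml.etree.ElementTree as ET

PACKAGE_FLOOR = 90.0
# Hypothetical prefixes standing in for the critical-path list.
CRITICAL_PATHS = {"core/": 95.0, "serialization/": 95.0}


def line_totals(xml_text):
    """Return {filename: (covered_lines, total_lines)} from a coverage XML report."""
    root = ET.fromstring(xml_text)
    totals = {}
    for cls in root.iter("class"):
        lines = cls.findall("./lines/line")
        covered = sum(1 for line in lines if int(line.get("hits", "0")) > 0)
        totals[cls.get("filename")] = (covered, len(lines))
    return totals


def percent(pairs):
    """Aggregate (covered, total) pairs into a single percentage."""
    pairs = list(pairs)
    total = sum(t for _, t in pairs)
    return 100.0 if total == 0 else 100.0 * sum(c for c, _ in pairs) / total


def check_coverage(xml_text, enforce):
    """Return (exit_code, failures); the exit code is non-zero only when enforcing."""
    totals = line_totals(xml_text)
    failures = []
    overall = percent(totals.values())
    if overall < PACKAGE_FLOOR:
        failures.append(f"package: {overall:.1f}% < {PACKAGE_FLOOR:.0f}%")
    for prefix, target in CRITICAL_PATHS.items():
        matched = [v for name, v in totals.items() if name.startswith(prefix)]
        if matched and percent(matched) < target:
            failures.append(f"{prefix}: {percent(matched):.1f}% < {target:.0f}%")
    return (1 if failures and enforce else 0), failures


# Tiny hand-written report: core/a.py fully covered, utils/b.py half covered.
SAMPLE = (
    '<coverage><packages><package name="p"><classes>'
    '<class name="a.py" filename="core/a.py"><lines>'
    '<line number="1" hits="1"/><line number="2" hits="1"/></lines></class>'
    '<class name="b.py" filename="utils/b.py"><lines>'
    '<line number="1" hits="1"/><line number="2" hits="0"/></lines></class>'
    "</classes></package></packages></coverage>"
)

code, failures = check_coverage(SAMPLE, enforce=False)  # advisory, as on mainline
print(code, failures)
```

Passing enforce=True models the release/stable behaviour: the same failures are reported, but the exit code becomes non-zero so the job blocks.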

Alternatives Considered

Status quo (Codecov dashboards only). Rejected because it allows silent regressions and does not give reviewers an actionable pass/fail signal.
Per-module 100% coverage. Rejected as unrealistic for plotting backends and third-party wrappers, potentially discouraging contributions.
Runtime smoke-only checks. Rejected; these do not measure statement coverage and fail to capture unexecuted branches in calibration math.

Consequences

Positive:

Quantitative gate keeps critical calibration logic exercised by tests before release.
Contributors receive immediate feedback locally and in CI when coverage slips.
Patch coverage guard discourages untested features while permitting incremental debt paydown.
OSS contributions are not blocked by legacy coverage gaps.
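
To make the patch coverage guard concrete, the sketch below computes the share of changed, executable lines that tests actually ran, and compares it with the 88% threshold from the Decision section. It is a simplified model for a single file; the function name and the line sets are made up for illustration, and Codecov's own calculation may differ in details such as partial lines.

```python
PATCH_THRESHOLD = 88.0  # patch coverage gate from the Decision section


def patch_coverage(changed_lines, executed_lines, executable_lines):
    """Percentage of changed executable lines that were executed by tests.

    All three arguments are sets of line numbers for one file.
    """
    relevant = changed_lines & executable_lines
    if not relevant:
        return 100.0  # nothing measurable changed, e.g. comments only
    return 100.0 * len(relevant & executed_lines) / len(relevant)


# Illustrative data: line 40 changed but is not executable (say, a comment).
changed = {10, 11, 12, 13, 40}
executable = {10, 11, 12, 13, 20, 21}
executed = {10, 11, 12, 20}

pct = patch_coverage(changed, executed, executable)
print(f"patch coverage: {pct:.1f}%")  # prints "patch coverage: 75.0%"
print("meets threshold" if pct >= PATCH_THRESHOLD else "needs tests or a waiver")
```

Because only changed lines count, a pull request can pass this gate even while untouched legacy modules remain below the package-wide floor, which is what allows incremental debt paydown.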

Negative/Risks:

Initial CI failures until legacy debt is addressed; requires remediation efforts.
Slightly longer test runtime from additional reporting/threshold checks.
Advisory-only enforcement can slow convergence without clear ownership.

Adoption & Migration

Land this standard (formerly ADR-019) and announce it at the contributor sync and in the release notes.
Introduce a shared .coveragerc that encodes thresholds and named exemptions.
Update CI (test.yml) to run pytest --cov=src/calibrated_explanations --cov-report=xml --cov-report=term and pass the XML to Codecov with patch gating enabled; enforce --cov-fail-under=90 and per-path fail-under only on release/stable branches.
Add a make test-cov target (or an equivalent tox target) so developers can trigger the same checks locally; ensure the dev extra installs pytest-cov by default.
Complete remediation tasks outlined in the coverage improvement plan so that historical debt does not block adoption.
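
One way to keep the local test-cov target and CI aligned is to build the pytest arguments in a single place and add the fail-under flag only for enforced branches. The following is a minimal sketch under assumptions: the branch prefixes in ENFORCED_PREFIXES and the helper names are hypothetical, since this standard does not fix a branch naming scheme. The flags themselves are the pytest-cov options named in the steps above.

```python
SOURCE = "src/calibrated_explanations"
PACKAGE_FLOOR = 90
# Assumption: release and stable branches are recognizable by these prefixes.
ENFORCED_PREFIXES = ("release/", "stable/")


def is_enforced(branch):
    """True when the coverage floor should block the build on this branch."""
    return branch.startswith(ENFORCED_PREFIXES)


def pytest_args(branch):
    """Build the pytest-cov arguments for a given branch name."""
    args = [f"--cov={SOURCE}", "--cov-report=xml", "--cov-report=term"]
    if is_enforced(branch):
        # Blocking gate on release/stable; mainline only reports.
        args.append(f"--cov-fail-under={PACKAGE_FLOOR}")
    return args


for branch in ("main", "release/v1.0"):
    print(branch, "->", " ".join(["pytest", *pytest_args(branch)]))
```

A make or tox target could call such a helper with the current branch name, so contributors see the same reports locally that CI produces.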

Open Questions

Cadence: Review and prune .coveragerc exemptions during the planning phase of each minor release (e.g., v0.10.0, v0.11.0).
Subpackage Thresholds: The critical-path list defined in the Decision section is sufficient for v1.0.0. Subpackage-specific thresholds are deferred to avoid excessive configuration maintenance.
Mutation Testing: Defer to v0.11 or later. While valuable, it is not a blocking requirement for v1.0.0 stability.
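
The per-release exemption review could be supported by a small audit that flags exemptions whose expiry date has passed. The sketch below assumes a comment convention of the form "expires: YYYY-MM-DD" next to each exemption; that convention, the function name, and the file paths in the sample are hypothetical and not taken from the project's .coveragerc.

```python
import re
from datetime import date

# Assumed convention: the rationale comment ends with "expires: YYYY-MM-DD".
EXPIRY = re.compile(r"#.*?expires:\s*(\d{4}-\d{2}-\d{2})")


def expired_exemptions(config_text, today):
    """Return (line_number, line) pairs whose expiry date is before today."""
    stale = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        match = EXPIRY.search(line)
        if match and date.fromisoformat(match.group(1)) < today:
            stale.append((number, line.strip()))
    return stale


# Hypothetical config text; the omitted paths are placeholders.
SAMPLE = """\
[run]
omit =
    # deprecated shim; expires: 2025-06-30
    src/calibrated_explanations/legacy_shim.py
    # generated code; expires: 2099-01-01
    src/calibrated_explanations/_generated.py
"""

for number, line in expired_exemptions(SAMPLE, date(2025, 12, 22)):
    print(f"line {number}: {line}")
```

Running such a check during release planning gives reviewers a concrete list of exemptions to renew or remove, rather than relying on a manual read of the file.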

Implementation Status

2025-10-06 – ADR accepted alongside the coverage remediation plan and baseline assessment.
v0.6.x – .coveragerc drafted with provisional exemptions and baseline metrics recorded to shape the remediation backlog while CI continues to run without fail-under gates.
v0.7.0 – CI introduces --cov-fail-under=80 with exit-zero preview reports, coverage dashboards are published, and contributor templates document the waiver workflow.
v0.8.0 – Critical-path modules (core, calibration, serialization, registry) are raised to ≥95% coverage, Codecov patch gating at ≥85% is advisory on mainline, and local tooling (make test-cov) mirrors the CI workflow.
v0.9.0 – Package-wide floor raised to ≥88%, waiver inventory trimmed, Codecov patch gating tightened to ≥88%, and coverage enforcement is blocking on release branches per the milestone gate while remaining advisory on mainline.
v1.0.0-rc – CI enforces the final ≥90% package floor on release branches, coverage dashboards become part of the release checklist, and branch protection rules require green coverage jobs before freeze.
v1.0.0 – Stable release maintains ≥90% gating with scheduled audits of exemptions and telemetry-driven monitoring to detect regressions ahead of v1.0.x maintenance updates.