Status note (2025-12-22): Last edited 2025-12-22 · Archive after: Retain indefinitely as an engineering standard · Implementation window: Per Standard status (see Decision).
Standard-003: Test Coverage Standardization
Formerly ADR-019. Reclassified as an engineering standard to keep ADRs scoped to architectural or contract decisions.
Status: Active · Date: 2025-10-06 · Deciders: Core maintainers · Reviewers: TBD · Supersedes: None · Superseded-by: None
Context
Pytest is the primary regression harness for the package, yet neither the local defaults nor
continuous integration enforces a minimum coverage level. pytest.ini runs tests quietly without
loading pytest-cov, while the CI workflow executes pytest --cov=src/calibrated_explanations
without a --cov-fail-under threshold. Contributors are asked to target roughly 90% coverage,
with reports uploaded to Codecov, but there is no guardrail preventing significant regressions.
Legacy runtime modules (for example core.interval_regressor) remain effectively untested, so
confidence in calibration guarantees erodes as the code evolves.【F:pytest.ini†L1-L17】【F:.github/workflows/test.yml†L33-L49】【F:CONTRIBUTING.md†L49-L58】【F:src/calibrated_explanations/core/interval_regressor.py†L1-L120】
Decision
Adopt a layered coverage policy that couples numeric targets with risk-based exceptions, while right-sizing enforcement for OSS development:
Package-wide floor: Target 90% statement coverage across src/calibrated_explanations. OSS/mainline CI reports the percentage but does not block. Release/stable branches enforce --cov-fail-under=90.
Critical paths: Target 95% coverage on calibrated prediction helpers, interval regression, serialization, and plugin registries. OSS/mainline CI reports per-path coverage; release/stable branches enforce it via per-path coverage report --fail-under configuration.
Change-based gating: Add a coverage xml step and integrate the Codecov “patch coverage” gate at ≥88% for modified lines/files. This is advisory on OSS/mainline and blocking on release/stable branches. Pull requests that lower patch coverage below the threshold must justify waivers in the review checklist.
Parity reference harness: Maintain canonical parity fixtures (factual, alternatives, fast, predictions) under tests/parity_reference/ and require the parity harness job to pass in CI. The canonical harness entrypoint is tests/parity_reference/run_parity_reference.py; run it locally with:

    python tests/parity_reference/run_parity_reference.py --dataset <classification|regression|multiclass|probabilistic_regression>

Use --update to refresh golden fixtures after intentional changes to explanation outputs. This harness is the canonical regression gate for OSS parity checks.
Documented exemptions: Generated code, visualization golden files, and deprecated shims can be excluded via .coveragerc with explicit comments that describe the rationale and expiry date.
Public API guardrails: Coverage thresholds MUST continue to exercise the WrapCalibratedExplainer contract (fit/calibrate/explain/predict flows, plotting helpers, uncertainty/threshold options). No part of the published API may be marked as deprecated or excluded from coverage unless a future ADR redefines the contract.
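To illustrate how the layered policy combines the package floor, critical-path targets, and patch gate with branch-dependent blocking, here is a minimal Python sketch. It is not the project's CI logic: the names evaluate_gate, GateResult, and is_enforcing_branch are hypothetical, the critical-path keys (such as "core") are illustrative, and it assumes release/stable branches are named with release/ or stable/ prefixes. Only the numeric thresholds come from this standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Thresholds stated in this standard (percent).
PACKAGE_FLOOR = 90.0
CRITICAL_FLOOR = 95.0
PATCH_FLOOR = 88.0


@dataclass
class GateResult:
    blocking: bool
    failures: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # Advisory runs report failures but never fail the job.
        return not (self.blocking and self.failures)


def is_enforcing_branch(branch: str) -> bool:
    # Assumption: release/stable branches use "release/" or "stable/" prefixes.
    return branch.startswith(("release/", "stable/"))


def evaluate_gate(
    branch: str,
    package_pct: float,
    critical_pcts: dict[str, float],
    patch_pct: float | None = None,
) -> GateResult:
    """Check package, critical-path and patch coverage against the layered policy."""
    result = GateResult(blocking=is_enforcing_branch(branch))
    if package_pct < PACKAGE_FLOOR:
        result.failures.append(f"package {package_pct:.1f}% < {PACKAGE_FLOOR}%")
    for path, pct in sorted(critical_pcts.items()):
        if pct < CRITICAL_FLOOR:
            result.failures.append(f"{path} {pct:.1f}% < {CRITICAL_FLOOR}%")
    # patch_pct is None when a change touches no measured lines.
    if patch_pct is not None and patch_pct < PATCH_FLOOR:
        result.failures.append(f"patch {patch_pct:.1f}% < {PATCH_FLOOR}%")
    return result
```

With these inputs, evaluate_gate("main", 87.5, {"core": 96.0}, 90.0) records a package-floor failure but still passes because mainline is advisory, while the same call on a release/ branch fails. Keeping the thresholds as constants means each milestone step only changes a number, not the gate logic.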
Alternatives Considered
Status quo (Codecov dashboards only). Rejected because it allows silent regressions and does not give reviewers an actionable pass/fail signal.
Per-module 100% coverage. Rejected as unrealistic for plotting backends and third-party wrappers, potentially discouraging contributions.
Runtime smoke-only checks. Rejected; these do not measure statement coverage and fail to capture unexecuted branches in calibration math.
Consequences
Positive:
Quantitative gate keeps critical calibration logic exercised by tests before release.
Contributors receive immediate feedback locally and in CI when coverage slips.
Patch coverage guard discourages untested features while permitting incremental debt paydown.
OSS contributions are not blocked by legacy coverage gaps.
Negative/Risks:
Initial CI failures until legacy coverage debt is paid down, which requires dedicated remediation effort.
Slightly longer test runtime from additional reporting/threshold checks.
Advisory-only enforcement on mainline can slow convergence toward the targets unless remediation has a clear owner.
Adoption & Migration
Land this ADR and announce during contributor sync and release notes.
Introduce a shared .coveragerc that encodes thresholds and named exemptions.
Update CI (test.yml) to run pytest --cov=src/calibrated_explanations --cov-report=xml --cov-report=term and pass the XML to Codecov with patch gating enabled; enforce --cov-fail-under=90 and per-path fail-under checks only on release/stable branches.
Add a make test-cov target (or an equivalent tox target) so developers can trigger the same checks locally; ensure the dev extra installs pytest-cov by default.
Complete the remediation tasks outlined in the coverage improvement plan so that historical debt does not block adoption.
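The per-path and patch checks can be approximated locally from the coverage.xml produced by the CI command above. This sketch assumes the Cobertura-style layout that coverage.py writes (class elements whose filename attribute is relative to the <source> root, each holding line elements with number and hits attributes). The helper names are hypothetical, the changed-lines input (for example, parsed from git diff) is an assumption, and patch_coverage only approximates Codecov's patch metric.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def line_hits(xml_text: str) -> dict[str, dict[int, int]]:
    """Map each measured file in a Cobertura-style coverage.xml to {line: hits}."""
    root = ET.fromstring(xml_text)
    files: dict[str, dict[int, int]] = {}
    for cls in root.iter("class"):
        hits = files.setdefault(cls.get("filename"), {})
        for line in cls.findall("lines/line"):
            hits[int(line.get("number"))] = int(line.get("hits"))
    return files


def coverage_for(files: dict[str, dict[int, int]], prefix: str = "") -> float:
    """Statement coverage (%) for files whose path starts with prefix."""
    total = covered = 0
    for name, hits in files.items():
        if name.startswith(prefix):
            total += len(hits)
            covered += sum(1 for count in hits.values() if count > 0)
    # No measured lines is treated as vacuously covered; a stricter gate could fail instead.
    return 100.0 * covered / total if total else 100.0


def patch_coverage(
    files: dict[str, dict[int, int]], changed: dict[str, set[int]]
) -> float | None:
    """Coverage (%) of changed lines that coverage.py measured; None if there are none."""
    total = covered = 0
    for name, lines in changed.items():
        hits = files.get(name, {})
        for number in lines:
            if number in hits:  # blank lines, comments, etc. are not measured
                total += 1
                covered += int(hits[number] > 0)
    return 100.0 * covered / total if total else None
```

A release-branch job could then fail when coverage_for(files, "core/") drops below 95. The trailing slash in the prefix keeps "core/" from also matching a sibling directory whose name merely starts with "core".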
Open Questions
Cadence: Review and prune .coveragerc exemptions during the planning phase of each minor release (e.g., v0.10.0, v0.11.0).
Subpackage Thresholds: The critical-path list defined in the Decision section is sufficient for v1.0.0. Subpackage-specific thresholds are deferred to avoid excessive configuration maintenance.
Mutation Testing: Defer to v0.11+ or later. While valuable, it is not a blocking requirement for v1.0.0 stability.
Implementation Status
2025-10-06 – ADR accepted alongside the coverage remediation plan and baseline assessment.
v0.6.x – .coveragerc drafted with provisional exemptions and baseline metrics recorded to shape the remediation backlog while CI continues to run without fail-under gates.
v0.7.0 – CI introduces --cov-fail-under=80 with exit-zero preview reports, coverage dashboards are published, and contributor templates document the waiver workflow.
v0.8.0 – Critical-path modules (core, calibration, serialization, registry) are raised to ≥95% coverage, Codecov patch gating at ≥85% is advisory on mainline, and local tooling (make test-cov) mirrors the CI workflow.
v0.9.0 – Package-wide floor raised to ≥88%, waiver inventory trimmed, Codecov patch gating tightened to ≥88%, and coverage enforcement is blocking on release branches per the milestone gate while remaining advisory on mainline.
v1.0.0-rc – CI enforces the final ≥90% package floor on release branches, coverage dashboards become part of the release checklist, and branch protection rules require green coverage jobs before freeze.
v1.0.0 – Stable release maintains ≥90% gating with scheduled audits of exemptions and telemetry-driven monitoring to detect regressions ahead of v1.0.x maintenance updates.