> **Status note (2025-12-22):** Last edited 2025-12-22 · Archive after: Retain indefinitely as an engineering standard · Implementation window: Per Standard status (see Decision). # Standard-003: Test Coverage Standardization Formerly ADR-019. Reclassified as an engineering standard to keep ADRs scoped to architectural or contract decisions. Status: Active Date: 2025-10-06 Deciders: Core maintainers Reviewers: TBD Supersedes: None Superseded-by: None ## Context Pytest is the primary regression harness for the package, yet neither local defaults nor continuous integration enforce minimum coverage. `pytest.ini` runs tests quietly without loading `pytest-cov`, while the CI workflow executes `pytest --cov=src/calibrated_explanations` without a `--cov-fail-under` threshold. Contributors are asked to target roughly 90% coverage, with reports uploaded to Codecov, but there is no guardrail preventing significant regressions. Legacy runtime modules (for example `core.interval_regressor`) remain effectively untested, so confidence in calibration guarantees erodes as the code evolves.【F:pytest.ini†L1-L17】【F:.github/workflows/test.yml†L33-L49】【F:CONTRIBUTING.md†L49-L58】【F:src/calibrated_explanations/core/interval_regressor.py†L1-L120】 ## Decision Adopt a layered coverage policy that couples numeric targets with risk-based exceptions, while right-sizing enforcement for OSS development: - **Package-wide floor:** Target **90% statement coverage** across `src/calibrated_explanations`. OSS/mainline CI reports the percentage but does not block. Release/stable branches enforce `--cov-fail-under=90`. - **Critical paths:** Target **95% coverage** on calibrated prediction helpers, interval regression, serialization, and plugin registries. OSS/mainline CI reports per-path coverage; release/stable branches enforce via `coverage report --fail-under` per-path configuration. - **Change-based gating:** Add a `coverage xml` step and integrate the Codecov “patch coverage” gate at **≥88%** for modified lines/files. This is advisory on OSS/mainline and blocking on release/stable branches. Pull requests that lower patch coverage below the threshold must justify waivers in the review checklist. - **Parity reference harness:** Maintain canonical parity fixtures (factual, alternatives, fast, predictions) under `tests/parity_reference/` and require the parity harness job to pass in CI. The canonical harness entrypoint is `tests/parity_reference/run_parity_reference.py`; run it locally with:: python tests/parity_reference/run_parity_reference.py --dataset Use `--update` to refresh golden fixtures after intentional changes to explanation outputs. This harness is the canonical regression gate for OSS parity checks. - **Documented exemptions:** Generated code, visualization golden files, and deprecated shims can be excluded via `.coveragerc` with explicit comments that describe the rationale and expiry date. - **Public API guardrails:** Coverage thresholds MUST continue to exercise the WrapCalibratedExplainer contract (fit/calibrate/explain/predict flows, plotting helpers, uncertainty/threshold options). No part of the published API may be marked as deprecated or excluded from coverage unless a future ADR redefines the contract. ## Alternatives Considered 1. **Status quo (Codecov dashboards only).** Rejected because it allows silent regressions and does not give reviewers an actionable pass/fail signal. 2. **Per-module 100% coverage.** Rejected as unrealistic for plotting backends and third-party wrappers, potentially discouraging contributions. 3. **Runtime smoke-only checks.** Rejected; these do not measure statement coverage and fail to capture unexecuted branches in calibration math. ## Consequences Positive: - Quantitative gate keeps critical calibration logic exercised by tests before release. - Contributors receive immediate feedback locally and in CI when coverage slips. - Patch coverage guard discourages untested features while permitting incremental debt paydown. - OSS contributions are not blocked by legacy coverage gaps. Negative/Risks: - Initial CI failures until legacy debt is addressed; requires remediation efforts. - Slightly longer test runtime from additional reporting/threshold checks. - Advisory-only enforcement can slow convergence without clear ownership. ## Adoption & Migration 1. Land this ADR and announce during contributor sync and release notes. 2. Introduce a shared `.coveragerc` that encodes thresholds and named exemptions. 3. Update CI (`test.yml`) to run `pytest --cov=src/calibrated_explanations --cov-report=xml \ --cov-report=term` and pass the XML to Codecov with patch gating enabled; enforce `--cov-fail-under=90` and per-path fail-under only on release/stable branches. 4. Add a `make test-cov` (or invoke via `tox` target) so developers can trigger the same checks locally; ensure the dev extra installs `pytest-cov` by default. 5. Complete remediation tasks outlined in the coverage improvement plan so that historical debt does not block adoption. ## Open Questions - **Cadence:** Review and prune `.coveragerc` exemptions during the planning phase of each minor release (e.g., v0.10.0, v0.11.0). - **Subpackage Thresholds:** The critical-path list defined in the Decision section is sufficient for v1.0.0. Subpackage-specific thresholds are deferred to avoid excessive configuration maintenance. - **Mutation Testing:** Defer to v0.11+ or later. While valuable, it is not a blocking requirement for v1.0.0 stability. ## Implementation Status - 2025-10-06 – ADR accepted alongside the coverage remediation plan and baseline assessment. - v0.6.x – `.coveragerc` drafted with provisional exemptions and baseline metrics recorded to shape the remediation backlog while CI continues to run without fail-under gates. - v0.7.0 – CI introduces `--cov-fail-under=80` with exit-zero preview reports, coverage dashboards are published, and contributor templates document the waiver workflow. - v0.8.0 – Critical-path modules (`core`, calibration, serialization, registry) are raised to ≥95% coverage, Codecov patch gating at ≥85% is advisory on mainline, and local tooling (`make test-cov`) mirrors the CI workflow. - v0.9.0 – Package-wide floor raised to ≥88%, waiver inventory trimmed, Codecov patch gating tightened to ≥88%, and coverage enforcement is blocking on release branches per the milestone gate while remaining advisory on mainline. - v1.0.0-rc – CI enforces the final ≥90% package floor on release branches, coverage dashboards become part of the release checklist, and branch protection rules require green coverage jobs before freeze. - v1.0.0 – Stable release maintains ≥90% gating with scheduled audits of exemptions and telemetry-driven monitoring to detect regressions ahead of v1.0.x maintenance updates.