# Terminology: thresholded vs probabilistic regression This page maps two terms used for the same CE regression mode. In user-facing docs and APIs, use `probabilistic regression`. In architecture and implementation discussions, `thresholded regression` may be used to describe the threshold mechanism. Both terms refer to regression with a non-`None` `threshold` query that returns calibrated event probabilities with interval bounds. For full guarantees, assumptions, explicit non-guarantees, and feature-level interval limits, use {doc}`calibrated_interval_semantics`. ## Use in docs - User-facing guides, quickstarts, notebooks, and API docs: `probabilistic regression` - ADRs and implementation details that discuss mechanism: `thresholded regression` - If both terms appear on one page, state once that they map to the same mode ## Section 1: Definition analysis ### 1.1 What does "probabilistic regression" mean? Definition source: `docs/foundations/concepts/probabilistic_regression.md`. It refers to regression predictions queried as probability events by threshold, for example `P(y <= t)` or interval events. The output is a calibrated event probability plus interval bounds. User-level API pattern: ```python probabilities, probability_interval = explainer.predict_proba( x_test[:1], threshold=150, uq_interval=True, ) ``` ### 1.2 What does "thresholded regression" mean? Definition source: `docs/improvement/adrs/ADR-021-calibrated-interval-semantics.md`. It is the same regression mode described from the implementation angle: - Regression output is queried through threshold events - CPS provides event scoring - Venn-Abers calibrates event probabilities Implementation path example from `IntervalRegressor.predict_probability()`: ```python # Converts regression predictions to probabilities by thresholding proba = self.split["cps"].predict(y_hat=..., y=y_threshold, ...) # Then calibrates with Venn-Abers va = VennAbers(None, (self.ce.y_cal[cal_va] <= y_threshold).astype(int), ...) ``` ### 1.3 Evidence of equivalence - ADR-021 section "Thresholded regression: CPS probabilities calibrated by Venn-Abers" explicitly describes the same path as the probabilistic regression flow. - Runtime API signal is identical: this mode is selected by providing `threshold`. ### 1.4 Why two terms exist | Aspect | "Thresholded regression" | "Probabilistic regression" | | --- | --- | --- | | Emphasis | Mechanism (threshold operation) | Output (calibrated probabilities) | | Primary audience | Architecture and implementation contributors | Practitioners and API users | | Typical context | ADRs, plugin internals, design notes | Quickstarts, concept guides, notebooks | ## Section 2: Terminology inventory ### 2.1 Representative "probabilistic regression" usage | File | Context | | --- | --- | | `README.md` | Feature and quickstart routing | | `docs/get-started/index.md` | Navigation and mode routing | | `docs/get-started/quickstart_regression.md` | Task workflow | | `docs/foundations/concepts/probabilistic_regression.md` | Dedicated concept page | | `notebooks/core_demos/demo_probabilistic_regression.ipynb` | End-to-end example | ### 2.2 Representative "thresholded regression" usage | File | Context | | --- | --- | | `docs/improvement/adrs/ADR-021-calibrated-interval-semantics.md` | Architecture semantics | | `docs/improvement/adrs/ADR-013-interval-calibrator-plugin-strategy.md` | Plugin strategy terminology | | `docs/improvement/legacy_user_api_contract.md` | Historical contract references | | `docs/foundations/governance/optional_telemetry.md` | Technical telemetry context | ### 2.3 Code usage patterns - Public call sites use `threshold=` as the mode switch. - Internal APIs include both `threshold` and `y_threshold` names. - Explanation containers expose probabilistic-regression state through `is_probabilistic_regression`. ## Section 3: Context-specific usage ### 3.1 User-facing documentation Preferred term: `probabilistic regression`. Reason: it communicates task intent and expected output without requiring implementation knowledge. ### 3.2 Technical architecture and implementation Preferred term: `thresholded regression` when discussing mechanics. Reason: it makes the threshold-event conversion explicit for contributors and plugin authors. ### 3.3 Evaluation and benchmarking Benchmark materials often use `thresholded regression` to separate this mode from percentile regression. ## Section 4: Historical issues and current state ### 4.1 Historical issue: missing explicit mapping Earlier materials mixed terms without a clear mapping statement. Current state: ADR-021 now includes explicit terminology guidance, and this page serves as the Tier 3 terminology reference route. ### 4.2 Historical issue: mixed naming in tests and comments Earlier test docstrings and comments mixed the terms without context labels. Current state: core user-facing naming is `probabilistic regression`; technical notes may still use `thresholded regression` when describing mechanism. ### 4.3 Ongoing risk Terminology drift can return when new docs are added quickly. Control: keep this mapping in Tier 3 reference pages and keep Tier 1 and Tier 2 pages mode-specific with short semantics routing to {doc}`calibrated_interval_semantics`. ## Section 5: Recommended terminology policy - Canonical user-facing term: `probabilistic regression` - Allowed technical mechanism term: `thresholded regression` - Do not present them as different modes - When uncertain, prefer user-facing term and add one parenthetical mechanism note if needed ## Section 6: Related references - {doc}`calibrated_interval_semantics` - `docs/improvement/adrs/ADR-021-calibrated-interval-semantics.md` - `docs/improvement/adrs/ADR-013-interval-calibrator-plugin-strategy.md` - {doc}`probabilistic_regression` - {doc}`terminology` Entry-point tier: Tier 3.