Terminology: thresholded vs probabilistic regression

This page maps two terms that name the same Calibrated Explanations (CE) regression mode.

In user-facing docs and APIs, use probabilistic regression. In architecture and implementation discussions, thresholded regression may be used to describe the threshold mechanism.

Both terms refer to regression with a non-None threshold query that returns calibrated event probabilities with interval bounds.

For full guarantees, assumptions, explicit non-guarantees, and feature-level interval limits, see Calibrated interval semantics.

Use in docs

- User-facing guides, quickstarts, notebooks, and API docs: probabilistic regression.
- ADRs and implementation details that discuss mechanism: thresholded regression.
- If both terms appear on one page, state once that they map to the same mode.

Section 1: Definition analysis

1.1 What does “probabilistic regression” mean?

Definition source: docs/foundations/concepts/probabilistic_regression.md.

It refers to regression predictions queried as threshold events, for example P(y <= t), or as interval events. The output is a calibrated event probability plus interval bounds.

User-level API pattern:

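# A non-None threshold selects probabilistic regression;
# a scalar threshold t queries the event y <= t.
probabilities, probability_interval = explainer.predict_proba(
    x_test[:1],
    threshold=150,
    uq_interval=True,  # also return the interval bounds
)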
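
The two event types named above can be illustrated with a plain empirical distribution. This is a minimal sketch, not the library's method: `event_probability`, `interval_event_probability`, and the sample values are hypothetical, and the result is an uncalibrated frequency with no interval bounds.

```python
from bisect import bisect_right


def event_probability(samples, threshold):
    """Empirical P(y <= threshold) over a list of plausible target values."""
    ordered = sorted(samples)
    return bisect_right(ordered, threshold) / len(ordered)


def interval_event_probability(samples, low, high):
    """Empirical P(low < y <= high), counted between two threshold events."""
    ordered = sorted(samples)
    inside = bisect_right(ordered, high) - bisect_right(ordered, low)
    return inside / len(ordered)


# Made-up target values, for illustration only.
samples = [120, 135, 145, 155, 170]
p_below = event_probability(samples, 150)
p_between = interval_event_probability(samples, 130, 160)
```

An interval event is just the difference between two threshold events, which is why a single threshold mechanism can serve both query types.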

1.2 What does “thresholded regression” mean?

Definition source: docs/improvement/adrs/ADR-021-calibrated-interval-semantics.md.

It is the same regression mode described from the implementation angle:

- Regression output is queried through threshold events.
- A conformal predictive system (CPS) provides event scoring.
- Venn-Abers calibrates the event probabilities.

Implementation path example from IntervalRegressor.predict_probability():

# Converts regression predictions to probabilities by thresholding
proba = self.split["cps"].predict(y_hat=..., y=y_threshold, ...)
# Then calibrates with Venn-Abers
va = VennAbers(None, (self.ce.y_cal[cal_va] <= y_threshold).astype(int), ...)
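
The data flow in the excerpt above can be mimicked with a toy stand-in. This is a simplified sketch under stated assumptions, not the library's implementation: `threshold_event_score` replaces the CPS call with a plain residual-shift frequency, `binarize_targets` mirrors only the `<= y_threshold` labelling, and the Venn-Abers calibration step is omitted. All names and numbers are hypothetical.

```python
def threshold_event_score(y_hat, cal_residuals, y_threshold):
    """Share of residual-shifted predictions at or below the threshold."""
    shifted = [y_hat + r for r in cal_residuals]
    return sum(v <= y_threshold for v in shifted) / len(shifted)


def binarize_targets(y_cal, y_threshold):
    """Turn calibration targets into 0/1 event labels (1 means y <= threshold)."""
    return [int(y <= y_threshold) for y in y_cal]


# Made-up calibration data, for illustration only.
cal_residuals = [-20.0, -10.0, 0.0, 10.0, 20.0]
y_cal = [130.0, 160.0, 145.0, 155.0, 170.0]

score = threshold_event_score(145.0, cal_residuals, 150.0)
labels = binarize_targets(y_cal, 150.0)
```

In the real path, scores of this kind and the binary labels are what the calibrator consumes to produce a calibrated probability with lower and upper bounds.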

1.3 Evidence of equivalence

- ADR-021, section “Thresholded regression: CPS probabilities calibrated by Venn-Abers”, describes the same path as the probabilistic regression flow.
- The runtime API signal is identical under both names: the mode is selected by providing a non-None threshold.

1.4 Why two terms exist

| Aspect | “Thresholded regression” | “Probabilistic regression” |
| --- | --- | --- |
| Emphasis | Mechanism (threshold operation) | Output (calibrated probabilities) |
| Primary audience | Architecture and implementation contributors | Practitioners and API users |
| Typical context | ADRs, plugin internals, design notes | Quickstarts, concept guides, notebooks |

Section 2: Terminology inventory

2.1 Representative “probabilistic regression” usage

| File | Context |
| --- | --- |
| `README.md` | Feature and quickstart routing |
| `docs/get-started/index.md` | Navigation and mode routing |
| `docs/get-started/quickstart_regression.md` | Task workflow |
| `docs/foundations/concepts/probabilistic_regression.md` | Dedicated concept page |
| `notebooks/core_demos/demo_probabilistic_regression.ipynb` | End-to-end example |

2.2 Representative “thresholded regression” usage

| File | Context |
| --- | --- |
| `docs/improvement/adrs/ADR-021-calibrated-interval-semantics.md` | Architecture semantics |
| `docs/improvement/adrs/ADR-013-interval-calibrator-plugin-strategy.md` | Plugin strategy terminology |
| `docs/improvement/legacy_user_api_contract.md` | Historical contract references |
| `docs/foundations/governance/optional_telemetry.md` | Technical telemetry context |

2.3 Code usage patterns

- Public call sites use threshold= as the mode switch.
- Internal APIs include both threshold and y_threshold names.
- Explanation containers expose probabilistic-regression state through is_probabilistic_regression.
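
The naming pattern above can be sketched as follows. `ExplanationState` and `make_state` are hypothetical names invented for illustration; only `threshold`, `y_threshold`, and `is_probabilistic_regression` appear on this page, and the real containers may derive the flag differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExplanationState:
    """Hypothetical container; only the flag name is taken from this page."""

    y_threshold: Optional[float] = None  # internal name for the public threshold

    @property
    def is_probabilistic_regression(self) -> bool:
        # The mode depends on "is not None", not on truthiness,
        # so a threshold of 0 still selects probabilistic regression.
        return self.y_threshold is not None


def make_state(threshold: Optional[float] = None) -> ExplanationState:
    """Public-style entry point: threshold= acts as the mode switch."""
    return ExplanationState(y_threshold=threshold)


default_state = make_state()
event_state = make_state(threshold=150)
zero_state = make_state(threshold=0)
```

Checking `is not None` rather than truthiness matters for targets that can legitimately be thresholded at zero.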

Section 3: Context-specific usage

3.1 User-facing documentation

Preferred term: probabilistic regression.

Reason: it communicates task intent and expected output without requiring implementation knowledge.

3.2 Technical architecture and implementation

Preferred term: thresholded regression when discussing mechanics.

Reason: it makes the threshold-event conversion explicit for contributors and plugin authors.

3.3 Evaluation and benchmarking

Benchmark materials often use thresholded regression to separate this mode from percentile regression.

Section 4: Historical issues and current state

4.1 Historical issue: missing explicit mapping

Earlier materials mixed terms without a clear mapping statement.

Current state: ADR-021 now includes explicit terminology guidance, and this page serves as the Tier 3 terminology reference route.

4.2 Historical issue: mixed naming in tests and comments

Earlier test docstrings and comments mixed the terms without context labels.

Current state: core user-facing naming is probabilistic regression; technical notes may still use thresholded regression when describing mechanism.

4.3 Ongoing risk

Terminology drift can return when new docs are added quickly.

Control: keep this mapping in Tier 3 reference pages, and keep Tier 1 and Tier 2 pages mode-specific, with only a short pointer to Calibrated interval semantics for the full semantics.

Section 5: Recommended terminology policy

- Canonical user-facing term: probabilistic regression.
- Allowed technical mechanism term: thresholded regression.
- Do not present them as different modes.
- When uncertain, prefer the user-facing term and add one parenthetical mechanism note if needed.
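
One way to watch for terminology drift is a small check that lists where the mechanism term appears in user-facing files. This is a hypothetical sketch, not an existing project tool: `find_term_drift` and `USER_FACING_PREFIXES` are invented names, and the prefixes are assumptions drawn from the inventory in Section 2. Because the policy allows one parenthetical mechanism note, hits are candidates for review rather than errors.

```python
import re

# Matches the mechanism term regardless of case or spacing within a line.
MECHANISM_TERM = re.compile(r"thresholded\s+regression", re.IGNORECASE)

# Assumed user-facing locations, based on the inventory in Section 2.
USER_FACING_PREFIXES = ("README.md", "docs/get-started/", "notebooks/")


def find_term_drift(path, text):
    """Return 1-based line numbers where a user-facing file uses the mechanism term."""
    if not path.startswith(USER_FACING_PREFIXES):
        return []  # ADRs and other technical pages may use the term freely
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if MECHANISM_TERM.search(line)
    ]


sample = "Intro\nUse thresholded regression here.\nDone\n"
user_hits = find_term_drift("docs/get-started/quickstart_regression.md", sample)
adr_hits = find_term_drift(
    "docs/improvement/adrs/ADR-021-calibrated-interval-semantics.md", sample
)
```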