Terminology: thresholded vs probabilistic regression¶
This page maps two terms used for the same CE regression mode.
In user-facing docs and APIs, use probabilistic regression. In architecture and implementation discussions, thresholded regression may be used to describe the threshold mechanism.
Both terms refer to regression with a non-None threshold query that returns calibrated event probabilities with interval bounds.
For full guarantees, assumptions, explicit non-guarantees, and feature-level interval limits, use Calibrated interval semantics.
Use in docs¶
User-facing guides, quickstarts, notebooks, and API docs:
probabilistic regressionADRs and implementation details that discuss mechanism:
thresholded regressionIf both terms appear on one page, state once that they map to the same mode
Section 1: Definition analysis¶
1.1 What does “probabilistic regression” mean?¶
Definition source: docs/foundations/concepts/probabilistic_regression.md.
It refers to regression predictions queried as probability events by threshold, for example P(y <= t) or interval events. The output is a calibrated event probability plus interval bounds.
User-level API pattern:
probabilities, probability_interval = explainer.predict_proba(
x_test[:1],
threshold=150,
uq_interval=True,
)
1.2 What does “thresholded regression” mean?¶
Definition source: docs/improvement/adrs/ADR-021-calibrated-interval-semantics.md.
It is the same regression mode described from the implementation angle:
Regression output is queried through threshold events
CPS provides event scoring
Venn-Abers calibrates event probabilities
Implementation path example from IntervalRegressor.predict_probability():
# Converts regression predictions to probabilities by thresholding
proba = self.split["cps"].predict(y_hat=..., y=y_threshold, ...)
# Then calibrates with Venn-Abers
va = VennAbers(None, (self.ce.y_cal[cal_va] <= y_threshold).astype(int), ...)
1.3 Evidence of equivalence¶
ADR-021 section “Thresholded regression: CPS probabilities calibrated by Venn-Abers” explicitly describes the same path as the probabilistic regression flow.
Runtime API signal is identical: this mode is selected by providing
threshold.
1.4 Why two terms exist¶
Aspect |
“Thresholded regression” |
“Probabilistic regression” |
|---|---|---|
Emphasis |
Mechanism (threshold operation) |
Output (calibrated probabilities) |
Primary audience |
Architecture and implementation contributors |
Practitioners and API users |
Typical context |
ADRs, plugin internals, design notes |
Quickstarts, concept guides, notebooks |
Section 2: Terminology inventory¶
2.1 Representative “probabilistic regression” usage¶
File |
Context |
|---|---|
|
Feature and quickstart routing |
|
Navigation and mode routing |
|
Task workflow |
|
Dedicated concept page |
|
End-to-end example |
2.2 Representative “thresholded regression” usage¶
File |
Context |
|---|---|
|
Architecture semantics |
|
Plugin strategy terminology |
|
Historical contract references |
|
Technical telemetry context |
2.3 Code usage patterns¶
Public call sites use
threshold=as the mode switch.Internal APIs include both
thresholdandy_thresholdnames.Explanation containers expose probabilistic-regression state through
is_probabilistic_regression.
Section 3: Context-specific usage¶
3.1 User-facing documentation¶
Preferred term: probabilistic regression.
Reason: it communicates task intent and expected output without requiring implementation knowledge.
3.2 Technical architecture and implementation¶
Preferred term: thresholded regression when discussing mechanics.
Reason: it makes the threshold-event conversion explicit for contributors and plugin authors.
3.3 Evaluation and benchmarking¶
Benchmark materials often use thresholded regression to separate this mode from percentile regression.
Section 4: Historical issues and current state¶
4.1 Historical issue: missing explicit mapping¶
Earlier materials mixed terms without a clear mapping statement.
Current state: ADR-021 now includes explicit terminology guidance, and this page serves as the Tier 3 terminology reference route.
4.2 Historical issue: mixed naming in tests and comments¶
Earlier test docstrings and comments mixed the terms without context labels.
Current state: core user-facing naming is probabilistic regression; technical notes may still use thresholded regression when describing mechanism.
4.3 Ongoing risk¶
Terminology drift can return when new docs are added quickly.
Control: keep this mapping in Tier 3 reference pages and keep Tier 1 and Tier 2 pages mode-specific with short semantics routing to Calibrated interval semantics.
Section 5: Recommended terminology policy¶
Canonical user-facing term:
probabilistic regressionAllowed technical mechanism term:
thresholded regressionDo not present them as different modes
When uncertain, prefer user-facing term and add one parenthetical mechanism note if needed