Satisfying EU AI Act Requirements with `calibrated_explanations`¶

[!WARNING] LEGAL DISCLAIMER — NO COMPLIANCE GUARANTEE

This document and this library do not constitute legal advice and carry no guarantee that any regulatory authority or legal entity will regard the library’s outputs as satisfying any statutory obligation. Whether a deployment satisfies the EU AI Act (Regulation (EU) 2024/1689) or any other law is a legal determination requiring qualified legal counsel. The library is provided “as is” under the BSD 3-Clause licence. The authors and maintainers expressly disclaim all liability arising from reliance on this document.

Document scope: This guide targets practitioners and compliance officers who deploy machine learning models in contexts regulated by Regulation (EU) 2024/1689 (the EU AI Act). It maps each relevant article to the concrete capabilities of calibrated_explanations and provides a verifiable implementation checklist.

1. Executive Summary¶

The EU AI Act (Regulation (EU) 2024/1689) imposes legally enforceable obligations on providers and deployers of high-risk AI systems, including requirements for transparency, uncertainty documentation, human oversight, and the right of affected persons to receive meaningful explanations. Accuracy metrics alone — precision, recall, AUC — do not satisfy these obligations. A model that achieves 95 % accuracy on a held-out test set still cannot, by itself, tell a regulator why a specific individual was denied credit, how confident the system was at the time of that decision, or what change to the individual’s circumstances would have produced a different outcome. The AI Act requires precisely this per-instance, traceable, uncertainty-aware information.

calibrated_explanations addresses this gap in a single, scikit-learn-compatible framework. Built on Venn-Abers calibration and conformal prediction, it delivers three complementary outputs: (1) factual rule tables — human-readable per-instance feature attributions with statistically valid uncertainty intervals, (2) counterfactual alternatives — minimal actionable changes that would flip or shift the prediction, and (3) calibrated probability intervals — empirically valid coverage-guarantee bounds that can be used to trigger human oversight when uncertainty is unacceptably high. These outputs map directly onto the transparency, documentation, human oversight, accuracy, and fairness articles of the AI Act.

Deploying calibrated_explanations alongside an existing model requires no architectural changes to the model itself. It wraps the model via WrapCalibratedExplainer, calibrates it against a held-out calibration set, and exposes a consistent API for generating and serialising explanation payloads. The serialised payloads provide machine-readable audit evidence suitable for storage in an immutable log and for submission to notified bodies or market surveillance authorities.

2. Scope and Applicability¶

2.1 AI Act Risk Tiers Addressed¶

The AI Act defines four risk tiers. This document focuses on high-risk AI systems as defined in Article 6 and Annex III, which include systems used in:

Annex III Area	Example use cases
1 — Biometric identification	Identity verification, access control
2 — Critical infrastructure	Grid management, transport
3 — Education and vocational training	Admissions scoring, assessment
4 — Employment and workers management	CV screening, performance monitoring
5 — Access to essential services	Credit scoring, insurance underwriting
5 — Law enforcement	Recidivism risk, threat detection
6 — Migration and border control	Asylum processing, risk profiling
7 — Administration of justice	Judicial decision support

calibrated_explanations applies to any ML model operating in these areas. The explanation and uncertainty capabilities described here also benefit limited-risk systems subject to Art. 50 transparency obligations.

2.2 Supported Model Types and Tasks¶

Task	CE support
Binary classification	Full — `explain_factual`, `explore_alternatives`, `predict_proba`
Multi-class classification	Full — per-class factual rules and alternatives
Regression	Full — prediction intervals, threshold-based probabilistic explanations

Supported base estimators: any scikit-learn-compatible model (including gradient boosting libraries such as XGBoost, LightGBM, and CatBoost via sklearn wrappers).

2.3 Prerequisites¶

A trained, scikit-learn-compatible model (predict / predict_proba interface).
A calibration set (x_cal, y_cal) — held-out data not used during training, recommended size ≥ 200 instances for reliable interval coverage.
A proper training set (x_proper, y_proper) — used only for fitting the internal calibration layer.
Python ≥ 3.9 and calibrated_explanations installed (pip install calibrated_explanations).

3. Article-by-Article Compliance Mapping¶

Art. 9 — Risk Management System¶

a) Title (verbatim): Article 9 — Risk management system

b) Core obligation: Providers of high-risk AI systems must establish, implement, document, and maintain a risk management system that identifies, analyses, and evaluates the reasonably foreseeable risks that the system may pose, and takes appropriate mitigation measures throughout the lifecycle (Art. 9(1)–(6)).

c) How calibrated_explanations contributes: Uncertainty quantification is a prerequisite for rational risk management. predict_proba with uq_interval=True provides empirically valid lower and upper probability bounds for every prediction. Wide intervals signal cases where the model’s evidence base is weak — these are exactly the cases that pose elevated risk and that a risk management system must identify and route to additional controls. The interval width can be used as a quantitative risk indicator in the system’s risk register.

d) Code:

from calibrated_explanations import WrapCalibratedExplainer

explainer = WrapCalibratedExplainer(model)
explainer.fit(x_proper, y_proper)
explainer.calibrate(x_cal, y_cal, feature_names=feature_names)

# Compute prediction with uncertainty interval
probs, (low, high) = explainer.predict_proba(X_query, uq_interval=True)
interval_width = high - low

# Risk flag: interval wider than tolerance threshold
RISK_THRESHOLD = 0.30
high_risk_mask = interval_width[:, 1] > RISK_THRESHOLD

e) Audit documentation: Log interval_width and high_risk_mask per case. Include the calibration set size and the chosen RISK_THRESHOLD in the AI system’s technical documentation (Annex IV, §1(c)).

Art. 10 — Data and Data Governance¶

a) Title (verbatim): Article 10 — Data and data governance

b) Core obligation: High-risk AI systems must be trained on data that is relevant, representative, and free from errors, and providers must take appropriate measures to examine data for biases that could give rise to risks within the meaning of Art. 9(2) (Art. 10(2)(f)–(g)). Where relevant, group-specific bias and accuracy disparities must be examined and mitigated.

c) How calibrated_explanations contributes: Mondrian (conditional) calibration via the bins parameter on explain_factual, explore_alternatives, or predict_proba generates group-specific prediction intervals. By partitioning the calibration set by protected attribute (age group, gender, nationality), the practitioner obtains per-group coverage and interval width statistics. Wider intervals for a particular sub-group are a direct, quantitative signal of reduced prediction reliability for that group, satisfying the obligation to examine bias risks.

d) Code:

import numpy as np

# group_labels: array-like of group identifiers for each calibration instance
# e.g., derived from a protected demographic attribute
group_labels_cal = x_cal[:, protected_feature_idx]
group_labels_test = X_query[:, protected_feature_idx]

# Mondrian-calibrated factual explanation — separate intervals per group
factual = explainer.explain_factual(X_query, bins=group_labels_test)

# Alternatively, at the probability level
probs, (low, high) = explainer.predict_proba(
    X_query, uq_interval=True, bins=group_labels_test
)

# Inspect per-group interval widths as bias proxy
for group in np.unique(group_labels_test):
    mask = group_labels_test == group
    width = (high[:, 1] - low[:, 1])[mask].mean()
    print(f"Group {group}: mean interval width = {width:.3f}")

e) Audit documentation: Record per-group mean interval width and coverage estimates in the data governance section of the technical documentation. A statistically significant disparity across groups should be flagged and the mitigation strategy documented.

Art. 11 and Annex IV — Technical Documentation¶

a) Title (verbatim): Article 11 — Technical documentation; Annex IV — Technical documentation referred to in Article 11(1)

b) Core obligation: Before placing a high-risk AI system on the market, providers must draw up technical documentation demonstrating compliance. Annex IV specifies that this documentation must include, among other items: a description of the system’s accuracy, robustness and cybersecurity measures; the monitoring, functioning and control mechanisms; and the measures taken to enable traceability and auditability (Annex IV §1(c), §1(e), §2(c)).

c) How calibrated_explanations contributes: calibrated_explanations produces structured, serialisable explanation objects that encode the model metadata, calibration configuration, feature names, per-instance factual rules, and uncertainty intervals. The JSON payload produced by as_json() provides a machine-readable record that satisfies the traceability requirement. The calibration configuration (calibration set size, conformal significance level, whether Mondrian binning is used) should be included verbatim in the Annex IV technical documentation.

d) Code:

# Generate explanation and serialise as JSON for technical documentation
explanation = explainer.explain_factual(X_query)

# JSON payload — suitable for archiving in immutable audit log
payload = explanation[0].as_json()  # single-instance payload
print(payload)
# Contains: feature names, values, weights, intervals, prediction, probability

# For a multi-row batch, iterate
import json, pathlib
log_path = pathlib.Path("audit_logs") / "explanations.jsonl"
log_path.parent.mkdir(exist_ok=True)
with log_path.open("a") as f:
    for exp in explanation:
        f.write(json.dumps(exp.as_json()) + "\n")

e) Audit documentation: The technical documentation must reference the calibrated_explanations version (retrievable via import calibrated_explanations; calibrated_explanations.__version__), the calibration set provenance, and the significance level (confidence) used. Store the JSON schema of the explanation payload as Annex IV §3 supporting artefact.

Art. 12 — Record-keeping and Logging¶

a) Title (verbatim): Article 12 — Record-keeping

b) Core obligation: High-risk AI systems must automatically log events relevant to the system’s operation, including dates and times of use, reference to the input data, and the results of the AI system’s operation, for a period appropriate to the intended purpose (Art. 12(1)–(2)).

c) How calibrated_explanations contributes: as_json() produces a complete, self-contained record of every prediction event, including the input vector, the predicted class or value, the calibrated probability, the uncertainty interval, and the factual feature-weight rule table. Writing this payload to an append-only log (JSONL file, database, or audit service) satisfies the logging obligation without requiring a separate logging infrastructure.

d) Code:

import json, datetime, pathlib, uuid

audit_log = pathlib.Path("audit_logs") / "ai_act_record.jsonl"
audit_log.parent.mkdir(exist_ok=True)

explanation = explainer.explain_factual(X_query)

with audit_log.open("a", encoding="utf-8") as f:
    for i, exp in enumerate(explanation):
        record = {
            "record_id": str(uuid.uuid4()),
            "timestamp": datetime.datetime.utcnow().isoformat() + "Z",
            "model_version": model_version,
            "ce_version": calibrated_explanations.__version__,
            "input_index": i,
            "explanation": exp.as_json(),
        }
        f.write(json.dumps(record) + "\n")

e) Audit documentation: Define and document the log retention policy (Art. 12(2) specifies a period proportionate to the purpose; credit decisions commonly require ≥ 3 years). Include the log schema and access-control policy in the AI system’s technical documentation.

Art. 13 — Transparency and Provision of Information to Deployers¶

a) Title (verbatim): Article 13 — Transparency and provision of information to deployers

b) Core obligation: High-risk AI systems must be designed and developed to ensure that their operation is sufficiently transparent to enable deployers to interpret the system’s outputs and use them appropriately (Art. 13(1)). The instructions for use must include information about the system’s capabilities and limitations (Art. 13(3)(b)).

c) How calibrated_explanations contributes: explain_factual returns a human-readable factual rule table for each instance: a ranked list of features together with their actual values, the direction and magnitude of their contribution to the prediction, and the uncertainty interval around each weight. This output is directly interpretable by a non-technical deployer — the rule reads, in natural language, as “because feature X had value V, the probability of outcome Y increased by W [±Δ]”. The rule table constitutes the “sufficiently transparent” output required by Art. 13(1).

d) Code:

# Per-instance factual explanation
factual = explainer.explain_factual(X_query)

# Print human-readable rule table for the first instance
factual[0].print_rules()
# Example output:
#   age = 42         →  P(approve) increased by 0.12 [0.07, 0.18]
#   income = 55000   →  P(approve) increased by 0.09 [0.04, 0.15]
#   debt_ratio = 0.4 →  P(approve) decreased by 0.05 [0.01, 0.10]

# Access structured rule data programmatically
rules = factual[0].as_json()

e) Audit documentation: Log the full rule table for every production prediction event. The instructions for use (Art. 13(3)) should describe the rule-table format and explain how deployers should interpret uncertainty intervals, including the guidance that predictions with wide intervals require additional review.

Art. 14 — Human Oversight¶

a) Title (verbatim): Article 14 — Human oversight

b) Core obligation: High-risk AI systems must be designed and developed to allow the natural persons to whom human oversight is assigned to effectively oversee the system’s operation. The system must enable those persons to decide not to use the system or to override, disregard, or reverse its output (Art. 14(1), (4)(c)).

c) How calibrated_explanations contributes: The interval straddle policy and interval width policy (both built-in decision-policy patterns) provide automatic, principled triggers for routing predictions to human review. When the calibrated probability interval straddles the decision boundary, or when the interval width exceeds a pre-defined tolerance, the system flags the case for human oversight before any automated decision is taken. This is not a post-hoc safeguard; it is an inline gate that prevents the automated system from acting on uncertain predictions without human confirmation.

d) Code:

from calibrated_explanations.core.reject.policy import RejectPolicy

probs, (low, high) = explainer.predict_proba(X_query, uq_interval=True)
decision_boundary = 0.5

# Straddle check: probability interval crosses the decision boundary
straddles = (low[:, 1] < decision_boundary) & (high[:, 1] > decision_boundary)

# Width check: absolute uncertainty too large
MAX_WIDTH = 0.25
too_uncertain = (high[:, 1] - low[:, 1]) > MAX_WIDTH

# Combined human-oversight gate
needs_human_review = straddles | too_uncertain

for idx in X_query[needs_human_review]:
    escalate_to_human_reviewer(idx, reason="uncertainty_gate")

# Alternatively, use built-in RejectPolicy via explain_factual
factual = explainer.explain_factual(X_query, reject_policy=RejectPolicy.FLAG)

e) Audit documentation: Log every escalation event with the case identifier, the triggering condition (straddle, width, or both), the interval values, and the human reviewer’s identity and decision. This log constitutes the human oversight audit trail required by Art. 14 and Recital 58.

Art. 15 — Accuracy, Robustness and Cybersecurity¶

a) Title (verbatim): Article 15 — Accuracy, robustness and cybersecurity

b) Core obligation: High-risk AI systems must achieve an appropriate level of accuracy, robustness, and consistency of performance. Accuracy metrics must be declared in the technical documentation. Providers must document the expected level of accuracy and the measures taken to achieve it (Art. 15(1)–(2)).

c) How calibrated_explanations contributes: Venn-Abers calibration and conformal prediction deliver empirical coverage guarantees: the stated significance level (e.g., 90 % confidence) corresponds to a statistically verifiable empirical coverage on the calibration set. This is a stronger claim than a point accuracy metric — it is a certifiable bound on prediction interval validity. The calibration error (difference between nominal and empirical coverage) can be measured and reported as a quantitative accuracy declaration for Annex IV.

d) Code:

# Measure empirical coverage on a held-out validation set
probs, (low, high) = explainer.predict_proba(x_val, uq_interval=True)
nominal_confidence = 0.90  # passed as `confidence` at calibrate() time

# For classification: empirical coverage = fraction of true labels within interval
in_interval = (low[range(len(y_val)), y_val] <= probs[range(len(y_val)), y_val]) & \
              (probs[range(len(y_val)), y_val] <= high[range(len(y_val)), y_val])
empirical_coverage = in_interval.mean()

print(f"Nominal confidence: {nominal_confidence:.2%}")
print(f"Empirical coverage: {empirical_coverage:.2%}")
# A well-calibrated system: empirical ≈ nominal (difference < 2 pp)

e) Audit documentation: Report both nominal confidence and empirical coverage in the technical documentation’s accuracy section. Repeat this measurement for each sub-population identified in the data governance section (Art. 10), and for each model version deployed. Include the calibration set size and split procedure (stratified vs. random) as Annex IV §1(c) artefacts.

Art. 50 — Transparency Obligations for Certain AI Systems¶

a) Title (verbatim): Article 50 — Transparency obligations for providers and deployers of certain AI systems

b) Core obligation: Where an AI system is used to make or assist in making decisions that significantly affect persons, those persons must be informed of their right to receive an explanation of the decision (Art. 50(1)). The explanation must be meaningful, intelligible, and in plain language (Recital 47).

c) How calibrated_explanations contributes: explore_alternatives generates counterfactual (alternative) explanations: the minimal set of feature changes that would produce a different or more favourable prediction. These are directly usable as the Art. 50 “explanation to the individual” — they communicate not only why the current decision was made but what the individual could change to obtain a different outcome. The output is inherently actionable, non-discriminatory (it proposes changes only to features that are within the individual’s control, when properly configured), and expressible in plain language.

d) Code:

# Generate counterfactual alternatives for the affected individual
alternatives = explainer.explore_alternatives(X_query)

# Print actionable alternatives for person at index 0
alternatives[0].print_rules()
# Example output:
#   IF income increased from 38000 → 52000 THEN P(approve) = 0.73 [0.65, 0.81]
#   IF debt_ratio decreased from 0.55 → 0.35 THEN P(approve) = 0.68 [0.59, 0.76]

# Serialise for delivery to the affected person / inclusion in decision letter
alt_json = alternatives[0].as_json()

e) Audit documentation: Whenever a significant automated or assisted decision is taken, log the full explore_alternatives output for the affected individual. This constitutes the “explanation record” that must be made available on request (Art. 50, Recital 47). The log entry must include the instance identifier, the final decision, the probability and interval, and the alternative rules.

4. Compliance Checklist¶

Article	Obligation summary	CE feature / method	Status
Art. 9	Identify and quantify AI system risks throughout lifecycle	`predict_proba(uq_interval=True)` — interval width as risk indicator	Covered
Art. 10	Examine training data for bias; ensure representativeness across groups	`explain_factual(bins=group_labels)` — Mondrian per-group intervals	Covered
Art. 11 + Annex IV	Produce technical documentation including accuracy and traceability	`as_json()` — serialisable explanation payload; calibration configuration	Covered
Art. 12	Log events: input data, result, date/time — for defined retention period	`as_json()` appended to audit log with timestamp and record ID	Partially covered — log infrastructure and retention policy must be provided by deployer
Art. 13	Transparent operation; deployer can interpret system outputs	`explain_factual()` — `print_rules()` human-readable rule table	Covered
Art. 14	Enable human oversight; allow override before automated action	`predict_proba` straddle/width gates; `RejectPolicy.FLAG`	Covered
Art. 15	Declared, verifiable accuracy; robustness documentation	Empirical coverage vs. nominal confidence (Venn-Abers guarantee)	Covered
Art. 50	Right to explanation; inform affected persons in plain language	`explore_alternatives()` — actionable counterfactual rules	Covered

5. Limitations and Gaps¶

The following compliance obligations are not satisfied by calibrated_explanations alone and require additional organisational or technical controls.

5.1 Data Quality Certification (Art. 10(2)(a)–(e))¶

Art. 10 requires that training data satisfies relevance, representativeness, error-freedom, and completeness criteria. calibrated_explanations does not inspect, validate, or certify input data quality. Required: A data quality framework (e.g., Great Expectations, Soda, or a custom data validation pipeline) upstream of model training, with documented data quality test results.

5.2 Cybersecurity Controls (Art. 15(3)–(5))¶

Art. 15 requires resilience against adversarial manipulation, data poisoning, and model-evasion attacks. calibrated_explanations does not provide adversarial robustness defences. Required: Adversarial testing (e.g., using libraries such as adversarial-robustness-toolbox) and network/access controls protecting the model serving endpoint.

5.3 Conformity Assessment and CE Marking (Art. 43–49)¶

High-risk AI systems must undergo a conformity assessment before market placement. calibrated_explanations provides technical evidence for the assessment but is not itself a conformity assessment tool. Required: Engagement with a notified body (where applicable) or internal conformity assessment process, documentation in accordance with Art. 11 and Annex IV, and registration in the EU AI Act database (Art. 49).

5.4 Post-Market Monitoring (Art. 72)¶

Art. 72 requires providers to operate a post-market monitoring system that continuously collects data on system performance after deployment. calibrated_explanations does not provide drift detection, performance dashboards, or monitoring alerts. Required: A model monitoring system (e.g., Evidently AI, WhyLabs, or a custom pipeline) that tracks empirical coverage, interval widths, and prediction distributions over time, and feeds back into the risk management system (Art. 9).

5.5 Human Oversight Processes Outside the Model (Art. 14(4)(a)–(d))¶

Art. 14 requires that humans assigned to oversight are sufficiently competent to understand the system’s capabilities and limitations, and that their decisions are logged. calibrated_explanations can flag cases for review but cannot train reviewers, enforce review quality, or impose a review workflow. Required: Documented escalation procedures, reviewer training, and a workflow management system that records reviewer decisions and links them back to the CE audit log.

5.6 Fundamental Rights Impact Assessment (Art. 9(9), Recital 66)¶

For systems that may significantly impact fundamental rights, a fundamental rights impact assessment is required. This is an organisational and legal exercise. Required: A documented FRIA process, conducted with legal counsel and (where relevant) a Data Protection Officer, referencing the per-group bias analysis enabled by Mondrian calibration as quantitative evidence.

6. Quick-Start Integration Guide¶

The following recipe delivers a compliance-ready deployment in six steps. Each step is annotated with the AI Act article it primarily satisfies.

Step 1 — Install¶

pip install calibrated_explanations

Step 2 — Wrap and Calibrate the Model¶

(Prerequisite: x_proper, y_proper = proper training set; x_cal, y_cal = held-out calibration set; feature_names = list of feature name strings)

from calibrated_explanations import WrapCalibratedExplainer

explainer = WrapCalibratedExplainer(model)
explainer.fit(x_proper, y_proper)
explainer.calibrate(x_cal, y_cal, feature_names=feature_names)

Step 3 — Art. 13: Generate a Factual Explanation (Transparency)¶

# X_query: the feature vector(s) of the instance(s) to be decided upon
factual = explainer.explain_factual(X_query)

# Human-readable rule table — include in deployer UI and decision letter
factual[0].print_rules()

# Structured payload for logging and documentation
payload = factual[0].as_json()

Step 4 — Art. 50: Retrieve Counterfactual Alternatives¶

alternatives = explainer.explore_alternatives(X_query)

# Print actionable alternatives — include in explanation provided to individual
alternatives[0].print_rules()

alt_payload = alternatives[0].as_json()

Step 5 — Art. 14: Apply a Reject / Escalation Policy (Human Oversight)¶

probs, (low, high) = explainer.predict_proba(X_query, uq_interval=True)
decision_boundary = 0.5
MAX_INTERVAL_WIDTH = 0.25

straddles = (low[:, 1] < decision_boundary) & (high[:, 1] > decision_boundary)
too_uncertain = (high[:, 1] - low[:, 1]) > MAX_INTERVAL_WIDTH
needs_review = straddles | too_uncertain

if needs_review.any():
    escalate_to_human_reviewer(X_query[needs_review], reason="uncertainty_gate")

Step 6 — Art. 12: Serialise the Audit Payload (Record-keeping)¶

import json, datetime, pathlib, uuid
import calibrated_explanations

audit_log = pathlib.Path("audit_logs") / "ai_act_record.jsonl"
audit_log.parent.mkdir(exist_ok=True)

with audit_log.open("a", encoding="utf-8") as f:
    for i, (fact, alt) in enumerate(zip(factual, alternatives)):
        record = {
            "record_id": str(uuid.uuid4()),
            "timestamp": datetime.datetime.utcnow().isoformat() + "Z",
            "ce_version": calibrated_explanations.__version__,
            "factual": fact.as_json(),
            "alternatives": alt.as_json(),
            "human_review_required": bool(needs_review[i]),
        }
        f.write(json.dumps(record) + "\n")

7. References¶

Regulatory¶

EU AI Act — Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence. Official Journal of the European Union, L series, 12 July 2024. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:L_202401689
Relevant recitals: Recital 47 (meaningful explanation), Recital 48 (right to explanation does not apply to purely automatic decisions under GDPR unless separately required), Recital 58 (human oversight design), Recital 66 (fundamental rights impact), Recital 71 (accuracy and performance metrics).
Annex III — List of high-risk AI systems referred to in Art. 6(2).
Annex IV — Technical documentation referred to in Art. 11(1).

Calibrated Explanations¶

Löfström, H., Löfström, T., Johansson, U., & Sönströd, C. (2024). Calibrated Explanations: with Uncertainty Information and Counterfactuals. Expert Systems with Applications. DOI: 10.1016/j.eswa.2024.123154
Löfström, T., Löfström, H., Johansson, U., Sönströd, C., & Matela, R. (2024). Calibrated Explanations for Regression. Machine Learning. DOI: 10.1007/s10994-024-06642-8
Löfström, H., Löfström, T., Johansson, U., Sönströd, C., & Boström, H. (2024). Conditional Calibrated Explanations: Finding a Path Between Bias and Uncertainty. In: Proc. ECAI 2024 Workshop. DOI: 10.1007/978-3-031-63787-2_17
Repository: https://github.com/Moffran/calibrated_explanations

Conformal Prediction and Venn-Abers¶

Shafer, G., & Vovk, V. (2008). A Tutorial on Conformal Prediction. Journal of Machine Learning Research, 9, 371–421.
Vovk, V., Gammerman, A., & Shafer, G. (2005). Algorithmic Learning in a Random World. Springer.
Johansson, U., Löfström, T., & Boström, H. (2021). Venn-Abers Predictors. In: Conformal and Probabilistic Prediction with Applications (COPA 2021).

Satisfying EU AI Act Requirements with calibrated_explanations¶

1. Executive Summary¶

2. Scope and Applicability¶

2.1 AI Act Risk Tiers Addressed¶

2.2 Supported Model Types and Tasks¶

2.3 Prerequisites¶

3. Article-by-Article Compliance Mapping¶

Art. 9 — Risk Management System¶

Art. 10 — Data and Data Governance¶

Art. 11 and Annex IV — Technical Documentation¶

Art. 12 — Record-keeping and Logging¶

Art. 13 — Transparency and Provision of Information to Deployers¶

Art. 14 — Human Oversight¶

Art. 15 — Accuracy, Robustness and Cybersecurity¶

Art. 50 — Transparency Obligations for Certain AI Systems¶

4. Compliance Checklist¶

5. Limitations and Gaps¶

5.1 Data Quality Certification (Art. 10(2)(a)–(e))¶

5.2 Cybersecurity Controls (Art. 15(3)–(5))¶

5.3 Conformity Assessment and CE Marking (Art. 43–49)¶

5.4 Post-Market Monitoring (Art. 72)¶

5.5 Human Oversight Processes Outside the Model (Art. 14(4)(a)–(d))¶

5.6 Fundamental Rights Impact Assessment (Art. 9(9), Recital 66)¶

6. Quick-Start Integration Guide¶

Step 1 — Install¶

Step 2 — Wrap and Calibrate the Model¶

Step 3 — Art. 13: Generate a Factual Explanation (Transparency)¶

Step 4 — Art. 50: Retrieve Counterfactual Alternatives¶

Step 5 — Art. 14: Apply a Reject / Escalation Policy (Human Oversight)¶

Step 6 — Art. 12: Serialise the Audit Payload (Record-keeping)¶

7. References¶

Regulatory¶

Calibrated Explanations¶

Conformal Prediction and Venn-Abers¶

Satisfying EU AI Act Requirements with `calibrated_explanations`¶