# Satisfying EU AI Act Requirements with `calibrated_explanations`

> [!WARNING]
> **LEGAL DISCLAIMER: NO COMPLIANCE GUARANTEE**
>
> This document and this library do not constitute legal advice and carry no
> guarantee that any regulatory authority or legal entity will regard the
> library's outputs as satisfying any statutory obligation. Whether a deployment
> satisfies the EU AI Act (Regulation (EU) 2024/1689) or any other law is a
> **legal determination requiring qualified legal counsel**. The library is
> provided "as is" under the BSD 3-Clause licence. **The authors and maintainers
> expressly disclaim all liability arising from reliance on this document.**

> **Document scope:** This guide targets practitioners and compliance officers who
> deploy machine learning models in contexts regulated by Regulation (EU) 2024/1689
> (the EU AI Act). It maps each relevant article to the concrete capabilities of
> `calibrated_explanations` and provides a verifiable implementation checklist.

---

## 1. Executive Summary

The EU AI Act (Regulation (EU) 2024/1689) imposes legally enforceable obligations on providers and deployers of high-risk AI systems, including requirements for transparency, uncertainty documentation, human oversight, and the right of affected persons to receive meaningful explanations. Accuracy metrics alone (precision, recall, AUC) do not satisfy these obligations. A model that achieves 95 % accuracy on a held-out test set still cannot, by itself, tell a regulator *why* a specific individual was denied credit, *how confident* the system was at the time of that decision, or *what change* to the individual's circumstances would have produced a different outcome. The AI Act requires precisely this per-instance, traceable, uncertainty-aware information.

`calibrated_explanations` addresses this gap in a single, scikit-learn-compatible framework.
Built on Venn-Abers calibration and conformal prediction, it delivers three complementary outputs: (1) **factual rule tables**, human-readable per-instance feature attributions with statistically valid uncertainty intervals; (2) **counterfactual alternatives**, minimal actionable changes that would flip or shift the prediction; and (3) **calibrated probability intervals**, lower and upper bounds on each predicted probability that can be checked empirically and used to trigger human oversight when uncertainty is unacceptably high. These outputs map directly onto the transparency, documentation, human oversight, accuracy, and fairness articles of the AI Act.

Deploying `calibrated_explanations` alongside an existing model requires no architectural changes to the model itself. It wraps the model via `WrapCalibratedExplainer`, calibrates it against a held-out calibration set, and exposes a consistent API for generating and serialising explanation payloads. The serialised payloads provide machine-readable audit evidence suitable for storage in an immutable log and for submission to notified bodies or market surveillance authorities.

---

## 2. Scope and Applicability

### 2.1 AI Act Risk Tiers Addressed

The AI Act defines four risk tiers.
This document focuses on **high-risk AI systems** as defined in **Article 6** and **Annex III**, which include systems used in:

| Annex III point | Area | Example use cases |
|---|---|---|
| 1 | Biometric identification | Identity verification, access control |
| 2 | Critical infrastructure | Grid management, transport |
| 3 | Education and vocational training | Admissions scoring, assessment |
| 4 | Employment and workers management | CV screening, performance monitoring |
| 5 | Access to essential services | Credit scoring, insurance underwriting |
| 6 | Law enforcement | Recidivism risk, threat detection |
| 7 | Migration and border control | Asylum processing, risk profiling |
| 8 | Administration of justice | Judicial decision support |

`calibrated_explanations` can be applied to ML models operating in any of these areas. The explanation and uncertainty capabilities described here also benefit **limited-risk** systems subject to Art. 50 transparency obligations.

### 2.2 Supported Model Types and Tasks

| Task | CE support |
|---|---|
| Binary classification | Full: `explain_factual`, `explore_alternatives`, `predict_proba` |
| Multi-class classification | Full: per-class factual rules and alternatives |
| Regression | Full: prediction intervals, threshold-based probabilistic explanations |

Supported base estimators: any scikit-learn-compatible model (including gradient boosting libraries such as XGBoost, LightGBM, and CatBoost via sklearn wrappers).

### 2.3 Prerequisites

1. A scikit-learn-compatible model (`predict` / `predict_proba` interface).
2. A **calibration set** `(x_cal, y_cal)`: held-out data not used during training, recommended size ≥ 200 instances for reliable interval coverage.
3. A **proper training set** `(x_proper, y_proper)`: the data used to fit the wrapped model via `explainer.fit`, kept disjoint from the calibration set.
4. Python ≥ 3.9 and `calibrated_explanations` installed (`pip install calibrated_explanations`).

---

## 3. Article-by-Article Compliance Mapping

### Art. 9: Risk Management System

**a) Title (verbatim):** Article 9: Risk management system

**b) Core obligation:** Providers of high-risk AI systems must establish, implement, document, and maintain a risk management system that identifies, analyses, and evaluates the reasonably foreseeable risks that the system may pose, and takes appropriate mitigation measures throughout the lifecycle (Art. 9(1)–(6)).

**c) How `calibrated_explanations` contributes:** Uncertainty quantification is a prerequisite for rational risk management. `predict_proba` with `uq_interval=True` provides lower and upper probability bounds for every prediction. Wide intervals signal cases where the model's evidence base is weak; these are exactly the cases that pose elevated risk and that a risk management system must identify and route to additional controls. The interval width can be used as a quantitative risk indicator in the system's risk register.

**d) Code:**

```python
from calibrated_explanations import WrapCalibratedExplainer

explainer = WrapCalibratedExplainer(model)
explainer.fit(x_proper, y_proper)
explainer.calibrate(x_cal, y_cal, feature_names=feature_names)

# Compute prediction with uncertainty interval
probs, (low, high) = explainer.predict_proba(X_query, uq_interval=True)
interval_width = high - low

# Risk flag: interval wider than tolerance threshold.
# Column 1 is the positive class; this assumes `low` and `high` are 2-D
# (n_samples, n_classes). If your installed version returns 1-D arrays for
# binary tasks, drop the `[:, 1]` index here and in the later examples.
RISK_THRESHOLD = 0.30
high_risk_mask = interval_width[:, 1] > RISK_THRESHOLD
```

**e) Audit documentation:** Log `interval_width` and `high_risk_mask` per case. Include the calibration set size and the chosen `RISK_THRESHOLD` in the AI system's technical documentation (Annex IV, §1(c)).

---

### Art. 10: Data and Data Governance

**a) Title (verbatim):** Article 10: Data and data governance

**b) Core obligation:** High-risk AI systems must be trained on data that is relevant, sufficiently representative, and, to the best extent possible, free of errors and complete. Providers must take appropriate measures to examine data for biases that could give rise to risks within the meaning of Art. 9(2) (Art. 10(2)(f)–(g)). Where relevant, group-specific bias and accuracy disparities must be examined and mitigated.

**c) How `calibrated_explanations` contributes:** Mondrian (conditional) calibration via the `bins` parameter on `explain_factual`, `explore_alternatives`, or `predict_proba` generates **group-specific prediction intervals**. By partitioning the calibration set by protected attribute (age group, gender, nationality), the practitioner obtains per-group coverage and interval width statistics. Wider intervals for a particular sub-group are a direct, quantitative signal of reduced prediction reliability for that group, which provides evidence for the obligation to examine bias risks.

**d) Code:**

```python
import numpy as np

# group_labels_*: group identifier per instance, e.g. derived from a protected
# demographic attribute (protected_feature_idx is that attribute's column index)
group_labels_cal = x_cal[:, protected_feature_idx]
group_labels_test = X_query[:, protected_feature_idx]

# Mondrian calibration also needs the calibration-set groups. This assumes
# `calibrate` accepts `bins` for that purpose; check your installed version.
explainer.calibrate(x_cal, y_cal, feature_names=feature_names, bins=group_labels_cal)

# Mondrian-calibrated factual explanation: separate intervals per group
factual = explainer.explain_factual(X_query, bins=group_labels_test)

# Alternatively, at the probability level
probs, (low, high) = explainer.predict_proba(
    X_query, uq_interval=True, bins=group_labels_test
)

# Inspect per-group interval widths as a bias proxy
for group in np.unique(group_labels_test):
    mask = group_labels_test == group
    width = (high[:, 1] - low[:, 1])[mask].mean()
    print(f"Group {group}: mean interval width = {width:.3f}")
```

**e) Audit documentation:** Record per-group mean interval width and coverage estimates in the data governance section of the technical documentation.
A statistically significant disparity across groups should be flagged and the mitigation strategy documented.

---

### Art. 11 and Annex IV: Technical Documentation

**a) Title (verbatim):** Article 11: Technical documentation; Annex IV: Technical documentation referred to in Article 11(1)

**b) Core obligation:** Before placing a high-risk AI system on the market, providers must draw up technical documentation demonstrating compliance. Annex IV specifies that this documentation must include, among other items: a description of the system's accuracy, robustness and cybersecurity measures; the monitoring, functioning and control mechanisms; and the measures taken to enable traceability and auditability (Annex IV §1(c), §1(e), §2(c)).

**c) How `calibrated_explanations` contributes:** `calibrated_explanations` produces structured, serialisable explanation objects that encode the model metadata, calibration configuration, feature names, per-instance factual rules, and uncertainty intervals. The JSON payload produced by `as_json()` provides a machine-readable record that satisfies the traceability requirement. The calibration configuration (calibration set size, conformal significance level, whether Mondrian binning is used) should be included verbatim in the Annex IV technical documentation.
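
One way to keep the calibration configuration in a form that can be copied verbatim into the technical documentation is a small structured record. The sketch below is illustrative only: `CalibrationRecord`, its field names, and the example values are hypothetical and not part of `calibrated_explanations`. In practice the version string would come from `calibrated_explanations.__version__`.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CalibrationRecord:
    """Hypothetical documentation record; not a library type."""

    ce_version: str            # version of calibrated_explanations in use
    calibration_set_size: int  # number of instances in (x_cal, y_cal)
    confidence: float          # nominal confidence level, e.g. 0.90
    mondrian_binning: bool     # whether per-group (Mondrian) bins were used
    data_provenance: str       # free-text pointer to the calibration data


def to_documentation_json(record: CalibrationRecord) -> str:
    """Serialise the record deterministically so diffs stay readable."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Example values are placeholders chosen for illustration.
record = CalibrationRecord(
    ce_version="0.0.0",
    calibration_set_size=500,
    confidence=0.90,
    mondrian_binning=True,
    data_provenance="held-out split, 2024-Q4 snapshot",
)
doc_json = to_documentation_json(record)
print(doc_json)
```

Freezing the dataclass and sorting the keys keeps the record stable across runs, which makes it easier to compare the documented configuration between model versions.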

**d) Code:**

```python
import json
import pathlib

# Generate explanation and serialise as JSON for technical documentation
explanation = explainer.explain_factual(X_query)

# JSON payload, suitable for archiving in an immutable audit log
payload = explanation[0].as_json()  # single-instance payload
print(payload)
# Contains: feature names, values, weights, intervals, prediction, probability

# For a multi-row batch, append one JSON line per instance
log_path = pathlib.Path("audit_logs") / "explanations.jsonl"
log_path.parent.mkdir(exist_ok=True)
with log_path.open("a", encoding="utf-8") as f:
    for exp in explanation:
        f.write(json.dumps(exp.as_json()) + "\n")
```

**e) Audit documentation:** The technical documentation must reference the `calibrated_explanations` version (retrievable via `import calibrated_explanations; calibrated_explanations.__version__`), the calibration set provenance, and the significance level (confidence) used. Store the JSON schema of the explanation payload as an Annex IV §3 supporting artefact.

---

### Art. 12: Record-keeping and Logging

**a) Title (verbatim):** Article 12: Record-keeping

**b) Core obligation:** High-risk AI systems must automatically log events relevant to the system's operation, including dates and times of use, reference to the input data, and the results of the AI system's operation, for a period appropriate to the intended purpose (Art. 12(1)–(2)).

**c) How `calibrated_explanations` contributes:** `as_json()` produces a complete, self-contained record of every prediction event, including the input vector, the predicted class or value, the calibrated probability, the uncertainty interval, and the factual feature-weight rule table. Writing this payload to an append-only log (JSONL file, database, or audit service) supports the logging obligation; the storage backend and the retention policy remain the deployer's responsibility.
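
An append-only log is easier to defend in an audit if later modification of a record can be detected. The following is a minimal sketch of one common approach, hash-chaining each record to its predecessor. It is an illustration added here, not a feature of `calibrated_explanations` and not something the AI Act prescribes; `append_record`, `verify_chain`, and the payload fields are hypothetical names.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first record


def _digest(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash and a canonical JSON form of the payload."""
    body = json.dumps({"prev_hash": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(chain: list, payload: dict) -> dict:
    """Append a payload, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    entry = {
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": _digest(prev_hash, payload),
    }
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_record(chain, {"input_index": 0, "prediction": 1})
append_record(chain, {"input_index": 1, "prediction": 0})
intact = verify_chain(chain)

# Simulate tampering with an already-written record
chain[0]["payload"]["prediction"] = 0
tamper_detected = not verify_chain(chain)
print(intact, tamper_detected)
```

In a real deployment each entry would be written as one JSONL line, with the explanation payload in place of the toy dictionaries used here.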

**d) Code:**

```python
import datetime
import json
import pathlib
import uuid

import calibrated_explanations

audit_log = pathlib.Path("audit_logs") / "ai_act_record.jsonl"
audit_log.parent.mkdir(exist_ok=True)

explanation = explainer.explain_factual(X_query)

with audit_log.open("a", encoding="utf-8") as f:
    for i, exp in enumerate(explanation):
        record = {
            "record_id": str(uuid.uuid4()),
            # Timezone-aware UTC timestamp (utcnow() is deprecated in Python 3.12)
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            # model_version: identifier supplied by the deployer's release process
            "model_version": model_version,
            "ce_version": calibrated_explanations.__version__,
            "input_index": i,
            "explanation": exp.as_json(),
        }
        f.write(json.dumps(record) + "\n")
```

**e) Audit documentation:** Define and document the log retention policy. Retention itself is governed by Art. 19(1) for providers and Art. 26(6) for deployers: logs must be kept for a period appropriate to the intended purpose, of at least six months, unless applicable Union or national law provides otherwise; sector-specific rules, for example for credit decisions, may require longer periods. Include the log schema and access-control policy in the AI system's technical documentation.

---

### Art. 13: Transparency and Provision of Information to Deployers

**a) Title (verbatim):** Article 13: Transparency and provision of information to deployers

**b) Core obligation:** High-risk AI systems must be designed and developed to ensure that their operation is sufficiently transparent to enable deployers to interpret the system's outputs and use them appropriately (Art. 13(1)). The instructions for use must include information about the system's capabilities and limitations (Art. 13(3)(b)).

**c) How `calibrated_explanations` contributes:** `explain_factual` returns a human-readable **factual rule table** for each instance: a ranked list of features together with their actual values, the direction and magnitude of their contribution to the prediction, and the uncertainty interval around each weight. This output is directly interpretable by a non-technical deployer: the rule reads, in natural language, as "because feature X had value V, the probability of outcome Y increased by W [±Δ]". The rule table is the library's contribution to the "sufficiently transparent" output required by Art. 13(1).
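
To make the natural-language reading of a rule concrete, the sketch below renders one rule as a sentence. It is a hypothetical helper written for this guide: `rule_to_sentence` and its parameters are assumptions and do not reflect the library's payload schema, which should be checked against the installed version.

```python
def rule_to_sentence(feature, value, weight, low, high, outcome="approval"):
    """Render one factual rule as a plain-language sentence.

    `weight` is the signed contribution to the predicted probability;
    `low` and `high` bound the magnitude of that contribution.
    """
    direction = "increased" if weight >= 0 else "decreased"
    return (
        f"Because {feature} had value {value}, the probability of {outcome} "
        f"{direction} by {abs(weight):.2f} (interval {low:.2f} to {high:.2f})."
    )


sentence_up = rule_to_sentence("age", 42, 0.12, 0.07, 0.18)
sentence_down = rule_to_sentence("debt_ratio", 0.4, -0.05, 0.01, 0.10)
print(sentence_up)
# Because age had value 42, the probability of approval increased by 0.12 (interval 0.07 to 0.18).
print(sentence_down)
```

Keeping the interval in the sentence matters for deployers: a rule with a wide interval should be read as weak evidence even when the point estimate looks large.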

**d) Code:**

```python
# Per-instance factual explanation
factual = explainer.explain_factual(X_query)

# Print human-readable rule table for the first instance
factual[0].print_rules()
# Example output:
#   age = 42          → P(approve) increased by 0.12 [0.07, 0.18]
#   income = 55000    → P(approve) increased by 0.09 [0.04, 0.15]
#   debt_ratio = 0.4  → P(approve) decreased by 0.05 [0.01, 0.10]

# Access structured rule data programmatically
rules = factual[0].as_json()
```

**e) Audit documentation:** Log the full rule table for every production prediction event. The instructions for use (Art. 13(3)) should describe the rule-table format and explain how deployers should interpret uncertainty intervals, including the guidance that predictions with wide intervals require additional review.

---

### Art. 14: Human Oversight

**a) Title (verbatim):** Article 14: Human oversight

**b) Core obligation:** High-risk AI systems must be designed and developed to allow the natural persons to whom human oversight is assigned to effectively oversee the system's operation. The system must enable those persons to decide not to use the system or to override, disregard, or reverse its output (Art. 14(1), (4)(d)).

**c) How `calibrated_explanations` contributes:** The **interval straddle policy** and **interval width policy** (both built-in decision-policy patterns) provide automatic, principled triggers for routing predictions to human review. When the calibrated probability interval straddles the decision boundary, or when the interval width exceeds a pre-defined tolerance, the system flags the case for human oversight *before* any automated decision is taken. This is not a post-hoc safeguard; it is an inline gate that prevents the automated system from acting on uncertain predictions without human confirmation.
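
The two triggers can be sketched without any library dependency, which also makes it easy to record *which* condition fired for each case. This is a minimal illustration under stated assumptions: `review_reasons`, the case identifiers, and the interval values are hypothetical, and the thresholds are examples rather than recommended settings.

```python
def review_reasons(low, high, boundary=0.5, max_width=0.25):
    """Return the list of oversight triggers that fire for one prediction.

    low, high: bounds of the calibrated probability interval.
    boundary:  decision threshold on the probability scale.
    max_width: widest interval tolerated without human review.
    """
    reasons = []
    if low < boundary < high:
        reasons.append("straddle")  # interval crosses the decision boundary
    if (high - low) > max_width:
        reasons.append("width")     # interval too wide to act on
    return reasons


# Hypothetical (low, high) intervals keyed by case identifier
cases = {
    "case-001": (0.62, 0.71),  # confident, clear of the boundary
    "case-002": (0.44, 0.58),  # narrow but straddles 0.5
    "case-003": (0.10, 0.45),  # below the boundary but very wide
    "case-004": (0.30, 0.70),  # straddles and is wide
}

escalations = {
    case_id: review_reasons(low, high) for case_id, (low, high) in cases.items()
}
needs_review = {case_id: r for case_id, r in escalations.items() if r}
print(needs_review)
```

Storing the reason list alongside the interval values gives the escalation log the triggering condition for each case, rather than only a yes/no flag.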

**d) Code:**

```python
from calibrated_explanations.core.reject.policy import RejectPolicy

probs, (low, high) = explainer.predict_proba(X_query, uq_interval=True)
decision_boundary = 0.5

# Straddle check: probability interval crosses the decision boundary
straddles = (low[:, 1] < decision_boundary) & (high[:, 1] > decision_boundary)

# Width check: absolute uncertainty too large
MAX_WIDTH = 0.25
too_uncertain = (high[:, 1] - low[:, 1]) > MAX_WIDTH

# Combined human-oversight gate
needs_human_review = straddles | too_uncertain

# escalate_to_human_reviewer is a deployer-provided hook, not part of the library
for row in X_query[needs_human_review]:
    escalate_to_human_reviewer(row, reason="uncertainty_gate")

# Alternatively, use built-in RejectPolicy via explain_factual
factual = explainer.explain_factual(X_query, reject_policy=RejectPolicy.FLAG)
```

**e) Audit documentation:** Log every escalation event with the case identifier, the triggering condition (`straddle`, `width`, or both), the interval values, and the human reviewer's identity and decision. This log constitutes the human oversight audit trail supporting Art. 14 and Recital 73.

---

### Art. 15: Accuracy, Robustness and Cybersecurity

**a) Title (verbatim):** Article 15: Accuracy, robustness and cybersecurity

**b) Core obligation:** High-risk AI systems must achieve an appropriate level of accuracy, robustness, and cybersecurity, and perform consistently in those respects throughout their lifecycle. The levels of accuracy and the relevant accuracy metrics must be declared in the accompanying instructions of use, and providers must document the expected level of accuracy and the measures taken to achieve it (Art. 15(1), (3)).

**c) How `calibrated_explanations` contributes:** Venn-Abers calibration and conformal prediction deliver **empirical validity guarantees**: for a stated confidence level (e.g., 90 %), conformal prediction intervals cover the true value at approximately that rate on new data that is exchangeable with the calibration set, and this can be verified on held-out data. This is a stronger claim than a point accuracy metric, because it is a **checkable bound** on prediction interval validity.
The calibration error, that is, the gap between the stated and the empirically observed figures (nominal versus empirical coverage for prediction intervals; mean calibrated probability versus observed frequency for classification), can be measured and reported as a quantitative accuracy declaration for Annex IV.

**d) Code:**

```python
import numpy as np

# Measure calibration on a held-out validation set (binary classification).
# A calibrated probability always lies inside its own interval, so that
# comparison is uninformative; compare against observed outcomes instead.
probs, (low, high) = explainer.predict_proba(x_val, uq_interval=True)
y_val = np.asarray(y_val)

mean_prob = probs[:, 1].mean()       # average calibrated P(y = 1)
observed_rate = (y_val == 1).mean()  # empirical frequency of y = 1
# Assumes `low` and `high` are 2-D (n_samples, n_classes), as elsewhere here
mean_low, mean_high = low[:, 1].mean(), high[:, 1].mean()

print(f"Mean calibrated probability: {mean_prob:.2%}")
print(f"Observed positive rate:      {observed_rate:.2%}")
print(f"Mean interval:               [{mean_low:.2%}, {mean_high:.2%}]")
print(f"Calibration gap:             {abs(mean_prob - observed_rate):.2%}")
# For regression, empirical coverage is the fraction of y_val that falls inside
# [low, high], to be compared with the nominal confidence level.
```

**e) Audit documentation:** Report the stated and the empirically measured figures (nominal confidence versus empirical coverage for intervals, or the calibration gap for classification) in the technical documentation's accuracy section. Repeat this measurement for each sub-population identified in the data governance section (Art. 10), and for each model version deployed. Include the calibration set size and split procedure (stratified vs. random) as Annex IV §1(c) artefacts.

---

### Art. 86: Right to Explanation of Individual Decision-Making

**a) Title (verbatim):** Article 86: Right to explanation of individual decision-making

**b) Core obligation:** Any affected person subject to a decision that a deployer takes on the basis of the output of a high-risk AI system listed in Annex III (with the exception of point 2), and that produces legal effects or similarly significantly affects that person, has the right to obtain from the deployer clear and meaningful explanations of the role of the AI system in the decision-making procedure and the main elements of the decision taken (Art. 86(1); see also Recital 171). Art. 50, by contrast, covers disclosure duties such as informing people that they are interacting with an AI system.

**c) How `calibrated_explanations` contributes:** `explore_alternatives` generates **counterfactual (alternative) explanations**: the minimal set of feature changes that would produce a different or more favourable prediction. These are directly usable as input to the Art. 86 explanation to the individual: they communicate not only *why* the current decision was made but *what the individual could change* to obtain a different outcome. The output is actionable, can be restricted to features within the individual's control when properly configured, and is expressible in plain language.

**d) Code:**

```python
# Generate counterfactual alternatives for the affected individual
alternatives = explainer.explore_alternatives(X_query)

# Print actionable alternatives for person at index 0
alternatives[0].print_rules()
# Example output:
#   IF income increased from 38000 → 52000 THEN P(approve) = 0.73 [0.65, 0.81]
#   IF debt_ratio decreased from 0.55 → 0.35 THEN P(approve) = 0.68 [0.59, 0.76]

# Serialise for delivery to the affected person / inclusion in decision letter
alt_json = alternatives[0].as_json()
```

**e) Audit documentation:** Whenever a significant automated or assisted decision is taken, log the full `explore_alternatives` output for the affected individual. This constitutes the "explanation record" that can be made available on request (Art. 86, Recital 171). The log entry must include the instance identifier, the final decision, the probability and interval, and the alternative rules.

---

## 4. Compliance Checklist

| Article | Obligation summary | CE feature / method | Status |
|---|---|---|---|
| Art. 9 | Identify and quantify AI system risks throughout lifecycle | `predict_proba(uq_interval=True)`: interval width as risk indicator | **Covered** |
| Art. 10 | Examine training data for bias; ensure representativeness across groups | `explain_factual(bins=group_labels)`: Mondrian per-group intervals | **Partially covered**: data quality controls are out of scope (see 5.1) |
| Art. 11 + Annex IV | Produce technical documentation including accuracy and traceability | `as_json()`: serialisable explanation payload; calibration configuration | **Covered** |
| Art. 12 | Log events: input data, result, date/time, for a defined retention period | `as_json()` appended to audit log with timestamp and record ID | **Partially covered**: log infrastructure and retention policy must be provided by deployer |
| Art. 13 | Transparent operation; deployer can interpret system outputs | `explain_factual()` with `print_rules()` human-readable rule table | **Covered** |
| Art. 14 | Enable human oversight; allow override before automated action | `predict_proba` straddle/width gates; `RejectPolicy.FLAG` | **Partially covered**: review processes are out of scope (see 5.5) |
| Art. 15 | Declared, verifiable accuracy; robustness documentation | Empirical coverage and calibration checks (conformal and Venn-Abers validity) | **Partially covered**: cybersecurity is out of scope (see 5.2) |
| Art. 86 | Right to explanation; inform affected persons in plain language | `explore_alternatives()`: actionable counterfactual rules | **Covered** |

---

## 5. Limitations and Gaps

The following compliance obligations are **not** satisfied by `calibrated_explanations` alone and require additional organisational or technical controls.

### 5.1 Data Quality Certification (Art. 10(2)(a)–(e))

Art. 10 requires that training data satisfies relevance, representativeness, error-freedom, and completeness criteria. `calibrated_explanations` does not inspect, validate, or certify input data quality.

**Required:** A data quality framework (e.g., Great Expectations, Soda, or a custom data validation pipeline) upstream of model training, with documented data quality test results.

### 5.2 Cybersecurity Controls (Art. 15(3)–(5))

Art. 15 requires resilience against adversarial manipulation, data poisoning, and model-evasion attacks. `calibrated_explanations` does not provide adversarial robustness defences.

**Required:** Adversarial testing (e.g., using libraries such as `adversarial-robustness-toolbox`) and network/access controls protecting the model serving endpoint.

### 5.3 Conformity Assessment and CE Marking (Art. 43–49)

High-risk AI systems must undergo a conformity assessment before market placement. `calibrated_explanations` provides technical evidence for the assessment but is not itself a conformity assessment tool.

**Required:** Engagement with a notified body (where applicable) or internal conformity assessment process, documentation in accordance with Art. 11 and Annex IV, and registration in the EU database (Art. 49).

### 5.4 Post-Market Monitoring (Art. 72)

Art. 72 requires providers to operate a post-market monitoring system that continuously collects data on system performance after deployment. `calibrated_explanations` does not provide drift detection, performance dashboards, or monitoring alerts.

**Required:** A model monitoring system (e.g., Evidently AI, WhyLabs, or a custom pipeline) that tracks empirical coverage, interval widths, and prediction distributions over time, and feeds back into the risk management system (Art. 9).

### 5.5 Human Oversight Processes Outside the Model (Art. 14(4)(a)–(d))

Art. 14 requires that humans assigned to oversight are sufficiently competent to understand the system's capabilities and limitations, and that their decisions are logged. `calibrated_explanations` can flag cases for review but cannot train reviewers, enforce review quality, or impose a review workflow.

**Required:** Documented escalation procedures, reviewer training, and a workflow management system that records reviewer decisions and links them back to the CE audit log.

### 5.6 Fundamental Rights Impact Assessment (Art. 27, Recital 96)

Certain deployers (bodies governed by public law, private entities providing public services, and deployers of credit-scoring or life and health insurance risk-assessment systems) must carry out a fundamental rights impact assessment before first use. This is an organisational and legal exercise.

**Required:** A documented FRIA process, conducted with legal counsel and (where relevant) a Data Protection Officer, referencing the per-group bias analysis enabled by Mondrian calibration as quantitative evidence.

---

## 6. Quick-Start Integration Guide

The following recipe covers the library-side building blocks of a deployment in six steps. Each step is annotated with the AI Act article it primarily supports.

### Step 1: Install

```bash
pip install calibrated_explanations
```

### Step 2: Wrap and Calibrate the Model

*(Prerequisite: `x_proper`, `y_proper` = proper training set; `x_cal`, `y_cal` = held-out calibration set; `feature_names` = list of feature name strings)*

```python
from calibrated_explanations import WrapCalibratedExplainer

explainer = WrapCalibratedExplainer(model)
explainer.fit(x_proper, y_proper)
explainer.calibrate(x_cal, y_cal, feature_names=feature_names)
```

### Step 3: Art. 13, Generate a Factual Explanation (Transparency)

```python
# X_query: the feature vector(s) of the instance(s) to be decided upon
factual = explainer.explain_factual(X_query)

# Human-readable rule table: include in deployer UI and decision letter
factual[0].print_rules()

# Structured payload for logging and documentation
payload = factual[0].as_json()
```

### Step 4: Art. 86, Retrieve Counterfactual Alternatives

```python
alternatives = explainer.explore_alternatives(X_query)

# Print actionable alternatives: include in explanation provided to individual
alternatives[0].print_rules()
alt_payload = alternatives[0].as_json()
```

### Step 5: Art. 14, Apply a Reject / Escalation Policy (Human Oversight)

```python
probs, (low, high) = explainer.predict_proba(X_query, uq_interval=True)

decision_boundary = 0.5
MAX_INTERVAL_WIDTH = 0.25

straddles = (low[:, 1] < decision_boundary) & (high[:, 1] > decision_boundary)
too_uncertain = (high[:, 1] - low[:, 1]) > MAX_INTERVAL_WIDTH
needs_review = straddles | too_uncertain

# escalate_to_human_reviewer is a deployer-provided hook, not part of the library
if needs_review.any():
    escalate_to_human_reviewer(X_query[needs_review], reason="uncertainty_gate")
```

### Step 6: Art. 12, Serialise the Audit Payload (Record-keeping)

```python
import datetime
import json
import pathlib
import uuid

import calibrated_explanations

audit_log = pathlib.Path("audit_logs") / "ai_act_record.jsonl"
audit_log.parent.mkdir(exist_ok=True)

with audit_log.open("a", encoding="utf-8") as f:
    for i, (fact, alt) in enumerate(zip(factual, alternatives)):
        record = {
            "record_id": str(uuid.uuid4()),
            # Timezone-aware UTC timestamp (utcnow() is deprecated in Python 3.12)
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "ce_version": calibrated_explanations.__version__,
            "factual": fact.as_json(),
            "alternatives": alt.as_json(),
            "human_review_required": bool(needs_review[i]),
        }
        f.write(json.dumps(record) + "\n")
```

---

## 7. References

### Regulatory

- **EU AI Act:** Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence. Official Journal of the European Union, L series, 12 July 2024.
- **Relevant recitals:** Recital 73 (human oversight design), Recital 74 (accuracy, robustness and performance metrics), Recital 96 (fundamental rights impact assessment), Recital 171 (right to explanation).
- **Annex III:** List of high-risk AI systems referred to in Art. 6(2).
- **Annex IV:** Technical documentation referred to in Art. 11(1).

### Calibrated Explanations

- Löfström, H., Löfström, T., Johansson, U., & Sönströd, C. (2024). *Calibrated Explanations: with Uncertainty Information and Counterfactuals.* Expert Systems with Applications. DOI: [10.1016/j.eswa.2024.123154](https://doi.org/10.1016/j.eswa.2024.123154)
- Löfström, T., Löfström, H., Johansson, U., Sönströd, C., & Matela, R. (2024). *Calibrated Explanations for Regression.* Machine Learning. DOI: [10.1007/s10994-024-06642-8](https://doi.org/10.1007/s10994-024-06642-8)
- Löfström, H., Löfström, T., Johansson, U., Sönströd, C., & Boström, H. (2024).
  *Conditional Calibrated Explanations: Finding a Path Between Bias and Uncertainty.* In: Proc. ECAI 2024 Workshop. DOI: [10.1007/978-3-031-63787-2_17](https://doi.org/10.1007/978-3-031-63787-2_17)
- Repository:

### Conformal Prediction and Venn-Abers

- Shafer, G., & Vovk, V. (2008). *A Tutorial on Conformal Prediction.* Journal of Machine Learning Research, 9, 371–421.
- Vovk, V., Gammerman, A., & Shafer, G. (2005). *Algorithmic Learning in a Random World.* Springer.
- Johansson, U., Löfström, T., & Boström, H. (2021). *Venn-Abers Predictors.* In: Conformal and Probabilistic Prediction with Applications (COPA 2021).