AI Agent Setup Guide for calibrated_explanations

Goal: Configure any AI agent platform so that it understands calibrated_explanations (CE) deeply, learns from your feedback, and stays in sync with every API change.

The canonical CE rules that apply to all agents live in CONTRIBUTOR_INSTRUCTIONS.md. Platform-specific setup files build on top of that canonical file:

| Platform | File |
| --- | --- |
| GitHub Copilot | .github/copilot-instructions.md + .github/prompts/ |
| Codex (OpenAI) | AGENTS.md |
| Claude Code | CLAUDE.md + .claude/settings.json |
| Google Gemini | GEMINI.md |


1. Prerequisites

| Requirement | Notes |
| --- | --- |
| Python environment | Use one venv per CE branch; install in editable mode: pip install -e ".[dev]" |
| Agent platform | See the platform-specific file for tool installation steps |
| VS Code (Copilot only) | v1.90+ with the Copilot and Copilot Chat extensions |

Quick dev environment setup (all platforms):

python -m venv .venv
source .venv/bin/activate       # or .venv\Scripts\activate on Windows
python -m pip install --upgrade pip
pip install -e ".[dev]"         # quotes stop shells such as zsh from globbing the brackets

2. How CE agent context is structured

Every agent platform reads a two-layer set of files:

Layer 1 — Canonical (all agents)

CONTRIBUTOR_INSTRUCTIONS.md at the repo root contains:

  • CE-first policy (the mandatory pre-explain checklist)

  • Architecture and module boundary rules

  • Coding standards (type hints, NumPy-style docstrings, lazy imports)

  • Testing standards and fallback visibility policy

  • ADR/Standards reference map (all 26 active ADRs + 5 STDs with when-to-consult guidance)

  • Test-quality improvement method (8-agent team from docs/improvement/test-quality-method/)

  • Key files & directories, development workflow, TDD patterns
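The coding standards listed above (type hints, NumPy-style docstrings, lazy imports) can be illustrated with a minimal sketch. The function name and its purpose are hypothetical and not part of the CE API; the standard-library statistics module stands in for a heavy optional dependency such as matplotlib.

```python
from __future__ import annotations

from typing import Sequence


def summarize_interval_widths(low: Sequence[float], high: Sequence[float]) -> dict[str, float]:
    """Summarize the widths of a set of prediction intervals.

    Hypothetical helper used only to illustrate the coding standards.

    Parameters
    ----------
    low : Sequence[float]
        Lower interval bounds.
    high : Sequence[float]
        Upper interval bounds, same length as ``low``.

    Returns
    -------
    dict[str, float]
        The mean and maximum interval width.
    """
    # Lazy import: this statement runs when the function is first called,
    # not when the enclosing module is imported.
    import statistics

    if len(low) == 0 or len(low) != len(high):
        raise ValueError("low and high must be non-empty and of equal length")

    widths = [hi - lo for lo, hi in zip(low, high)]
    return {"mean": statistics.fmean(widths), "max": max(widths)}
```

Keeping the import inside the function means that importing the module never pays for (or fails on) a dependency that a given caller does not use.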

Layer 2 — Platform-specific

Each platform file adds only what is unique to that agent:

| File | Platform | What it adds |
| --- | --- | --- |
| .github/copilot-instructions.md | GitHub Copilot | Auto-injected instruction files, prompt slash commands, @workspace chat tips |
| AGENTS.md | Codex | Session priming prompt, task template, workspace sync routine, CE-first utility list |
| CLAUDE.md | Claude Code | Permissions model (.claude/settings.json), bash tool rules, tool use guidance |
| GEMINI.md | Google Gemini | Session priming prompt, context management, workspace sync |


3. Session priming (all platforms)

At the start of any agent session, prime the agent with:

You are a CE-first agent for calibrated_explanations. Read CONTRIBUTOR_INSTRUCTIONS.md
and <platform file> first. Use WrapCalibratedExplainer and the public CE API directly.
Fail fast if CE-first invariants are not satisfied.

Replace <platform file> with AGENTS.md, CLAUDE.md, or GEMINI.md as appropriate. For GitHub Copilot, the instructions are injected automatically — no priming needed.
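If you prime agents from a script, the substitution can be sketched as below. The helper name and the platform keys are hypothetical; only the prompt text and the three file names come from this guide.

```python
# Hypothetical helper: the mapping keys and function name are illustrative only.
PLATFORM_FILES = {
    "codex": "AGENTS.md",
    "claude": "CLAUDE.md",
    "gemini": "GEMINI.md",
}

PRIMING_TEMPLATE = (
    "You are a CE-first agent for calibrated_explanations. "
    "Read CONTRIBUTOR_INSTRUCTIONS.md\n"
    "and {platform_file} first. "
    "Use WrapCalibratedExplainer and the public CE API directly.\n"
    "Fail fast if CE-first invariants are not satisfied."
)


def build_priming_prompt(platform: str) -> str:
    """Return the session priming prompt for a platform that needs one."""
    key = platform.strip().lower()
    if key == "copilot":
        # Copilot receives its instruction files automatically.
        raise ValueError("GitHub Copilot needs no priming prompt")
    if key not in PLATFORM_FILES:
        raise ValueError(f"Unknown platform: {platform!r}")
    return PRIMING_TEMPLATE.format(platform_file=PLATFORM_FILES[key])


if __name__ == "__main__":
    print(build_priming_prompt("claude"))
```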


4. GitHub Copilot — VS Code workspace setup

Open the repository root in VS Code. Copilot features are provided by the Copilot and Copilot Chat extensions and by GitHub’s instruction-file loading; this repository does not require custom Copilot keys in .vscode/settings.json. With the extensions installed, you get:

  • Copilot completions for Python, Markdown, and YAML.

  • Next Edit Suggestions – Copilot proposes the next logical edit as you type.

  • Instruction files – each .github/instructions/*.instructions.md file is loaded automatically into Copilot’s context when its scope matches the file you are editing.

  • Workspace agent – Copilot can search your local files when answering questions.

Instruction files injected automatically by VS Code:

| File | Scope | Purpose |
| --- | --- | --- |
| .github/copilot-instructions.md | all files | Architecture, coding standards, TDD policy, fallback rules |
| .github/instructions/source-code.instructions.md | src/**/*.py | Module layout, import rules, docstring style |
| .github/instructions/tests.instructions.md | tests/**, *test* | Testing framework, naming, coverage gate |
| .github/instructions/execution plan.instructions.md | all files | Release plan, ADR conformance, changelog policy |
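To see which instruction files apply to a given path, the scopes above can be approximated with a rough sketch. This is not how VS Code or Copilot resolves scopes internally: it uses Python's fnmatch, whose * also crosses directory separators, so treat the result as an approximation. The "**" pattern is an assumed stand-in for "all files", and the function name is hypothetical.

```python
from fnmatch import fnmatchcase

# File names and scopes are taken from the table above; "**" is an assumed
# stand-in for the "all files" scope.
INSTRUCTION_SCOPES = {
    ".github/copilot-instructions.md": ["**"],
    ".github/instructions/source-code.instructions.md": ["src/**/*.py"],
    ".github/instructions/tests.instructions.md": ["tests/**", "*test*"],
    ".github/instructions/execution plan.instructions.md": ["**"],
}


def instruction_files_for(path: str) -> list[str]:
    """Return the instruction files whose scope matches a repo-relative path."""
    posix_path = path.replace("\\", "/")  # normalize Windows separators
    return [
        instruction_file
        for instruction_file, patterns in INSTRUCTION_SCOPES.items()
        if any(fnmatchcase(posix_path, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    for name in instruction_files_for("tests/unit/core/test_explainer.py"):
        print(name)
```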

Prompt slash commands available in Copilot Chat:

| Command | Use when |
| --- | --- |
| /generate-tests-strict | Writing new tests for any CE module |
| /implement-plugin | Scaffolding a new calibrator, plot, or explanation plugin |
| /fix-issue | Diagnosing and fixing a bug or failing test |
| /refresh-ce-context | Updating instruction files after an API change or to incorporate feedback |


5. Keeping agents up to date

After an API change

  1. For GitHub Copilot, run in Chat:

    /refresh-ce-context module=calibrated_explanations.core.calibrated_explainer
    

    For other platforms, ask the agent to read CONTRIBUTOR_INSTRUCTIONS.md, compare it against the current src/calibrated_explanations/ source, and propose minimal diffs.

  2. Review the proposed diffs; accept or adjust.

  3. Commit the updated instruction files alongside the code change.

After an ADR is accepted or closed

Run /refresh-ce-context adr=ADR-NNN (Copilot) or ask the agent to update the ADR status row in CONTRIBUTOR_INSTRUCTIONS.md §9.

Workspace sync routine (Codex / Claude / Gemini)

source .venv/bin/activate
pip install -e ".[dev]"
python -m pip check
pytest -q

Then ask the agent to re-read CONTRIBUTOR_INSTRUCTIONS.md and diff src/calibrated_explanations/ for changed signatures.
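If you want an independent check of what the agent reports, changed signatures can be detected with the standard-library ast module. The following is a minimal sketch, not a CE tool: the function names are hypothetical, it compares two source strings (for example, the old and new contents of one module), and it ignores names that start with an underscore.

```python
from __future__ import annotations

import ast


def public_signatures(source: str) -> dict[str, str]:
    """Map each public function or method to its argument list as text."""
    signatures: dict[str, str] = {}

    def visit(body: list[ast.stmt], prefix: str) -> None:
        for node in body:
            if isinstance(node, ast.ClassDef):
                # Recurse so methods are reported as "ClassName.method".
                visit(node.body, f"{prefix}{node.name}.")
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                if not node.name.startswith("_"):
                    signatures[prefix + node.name] = ast.unparse(node.args)

    visit(ast.parse(source).body, "")
    return signatures


def changed_signatures(old_source: str, new_source: str) -> dict[str, tuple[str | None, str | None]]:
    """Return {name: (old_args, new_args)} for added, removed, or changed names."""
    old = public_signatures(old_source)
    new = public_signatures(new_source)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }
```

A value of None on either side of the tuple means the function was added or removed rather than modified. ast.unparse requires Python 3.9 or later.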


6. Feedback loop (all platforms)

No agent platform retains memory across unrelated sessions. The only way to make feedback durable is to encode it in versioned repository files.

Quick feedback

For GitHub Copilot, run:

/refresh-ce-context feedback="Agent suggested importing matplotlib at module level – it must always be lazy"

This appends a dated entry to .github/copilot-feedback-log.md and adds a clarifying bullet to the relevant instruction file.

For other platforms, update CONTRIBUTOR_INSTRUCTIONS.md directly with a new bullet in the relevant section, then add a dated entry to .github/copilot-feedback-log.md manually.

Use this mandatory entry schema:

  • **Feedback:** what the agent got wrong

  • **Root cause:** why the miss happened

  • **Durable fix:** exact instruction/test/script updates

  • **Verification:** command(s) proving the fix

  • **Status:** open | incorporated
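When adding entries by hand, a small formatter can enforce the schema above. This is a sketch under assumptions: the dated heading and the bullet layout are guesses at the log format, not the format /refresh-ce-context actually writes, and the function name is hypothetical.

```python
from __future__ import annotations

from datetime import date

REQUIRED_FIELDS = ("Feedback", "Root cause", "Durable fix", "Verification", "Status")
ALLOWED_STATUSES = ("open", "incorporated")


def format_feedback_entry(entry: dict[str, str], on: date) -> str:
    """Render one feedback entry, rejecting incomplete or invalid ones."""
    missing = [field for field in REQUIRED_FIELDS if not entry.get(field, "").strip()]
    if missing:
        raise ValueError(f"Missing fields: {', '.join(missing)}")
    if entry["Status"] not in ALLOWED_STATUSES:
        raise ValueError(f"Status must be one of {ALLOWED_STATUSES}")

    # Assumed layout: a dated heading followed by one bullet per schema field.
    lines = [f"## {on.isoformat()}"]
    lines += [f"- **{field}:** {entry[field]}" for field in REQUIRED_FIELDS]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {
        "Feedback": "Agent imported matplotlib at module level",
        "Root cause": "Lazy-import rule was not stated for plotting code",
        "Durable fix": "Added a bullet to CONTRIBUTOR_INSTRUCTIONS.md",
        "Verification": "pytest -q",
        "Status": "open",
    }
    print(format_feedback_entry(example, date.today()))
```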

Structured feedback review

After each sprint or release:

  1. Open .github/copilot-feedback-log.md and review accumulated entries.

  2. Verify each correction is reflected in CONTRIBUTOR_INSTRUCTIONS.md or the platform instruction file.

  3. Mark resolved entries ✅ (Status: incorporated) and commit the changes so every team member benefits.
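For step 1, entries that are still open can be pulled out of the log with a short script. This sketch assumes that entries are separated by blank lines and that each field appears on its own line as **Name:** value; adjust the parsing if the real log is laid out differently. The function names are hypothetical.

```python
from __future__ import annotations

import re

# Matches one schema field such as "**Status:** open" up to the end of the line.
FIELD_RE = re.compile(r"\*\*(?P<name>[^*:]+):\*\*\s*(?P<value>.*)")


def parse_entries(log_text: str) -> list[dict[str, str]]:
    """Split the log on blank lines and collect the fields of each entry."""
    entries = []
    for block in re.split(r"\n\s*\n", log_text.strip()):
        fields = {m["name"].strip(): m["value"].strip() for m in FIELD_RE.finditer(block)}
        if fields:
            entries.append(fields)
    return entries


def open_feedback(log_text: str) -> list[str]:
    """Return the Feedback text of every entry whose Status is still open."""
    return [
        entry.get("Feedback", "")
        for entry in parse_entries(log_text)
        if entry.get("Status") == "open"
    ]
```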


7. Common CE workflows (all platforms)

Explain a prediction end-to-end

Read CONTRIBUTOR_INSTRUCTIONS.md, then explain how WrapCalibratedExplainer.explain_factual
works from the public call down to the plugin registry dispatch.

Scaffold a new plugin

/implement-plugin plugin_type=calibrator plugin_name=isotonic_regression target_adr=ADR-013

This is a Copilot slash command; on other platforms, give the agent the equivalent instruction.

Run a test-quality improvement cycle

Read docs/improvement/test-quality-method/README.md, then act as the test-creator
agent and produce a prioritized coverage-gap analysis.

Check release readiness

Read docs/improvement/RELEASE_PLAN_v1.md and list all items still open for the
current milestone.

Debug a failing test

/fix-issue failing_test=tests/unit/core/test_explainer.py::test_calibration_state

8. Tips for best results (all platforms)

  • Keep instruction files short and precise. A short, specific instruction is more useful than a long, general one.

  • Commit instruction-file updates in the same PR as the code change so history stays in sync.

  • Always validate with make test (or pytest -q) after any code change.

  • Pin ADR references in code comments when making architectural decisions so future agent sessions understand why a pattern is used.

  • Use the test-quality-method agents (test-creator, pruner, etc.) for large test changes rather than writing coverage-padding tests by hand.