AI Agent Setup Guide for calibrated_explanations¶
Goal: Configure any AI agent platform so it understands CE deeply, learns from your feedback, and stays in sync with every API change.
The canonical CE rules that apply to all agents live in
CONTRIBUTOR_INSTRUCTIONS.md. Platform-specific setup files build on top of that canonical:
Platform
File
GitHub Copilot
.github/copilot-instructions.md+.github/prompts/Codex (OpenAI)
AGENTS.mdClaude Code
CLAUDE.md+.claude/settings.jsonGoogle Gemini
GEMINI.md
1. Prerequisites¶
Requirement |
Notes |
|---|---|
Python environment |
Use one venv per CE branch; install editable: |
Agent platform |
See platform-specific file for tool installation steps |
VS Code (Copilot only) |
v1.90+ with Copilot and Copilot Chat extensions |
Quick dev environment setup (all platforms):
python -m venv .venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
pip install --upgrade pip
pip install -e .[dev]
2. How CE agent context is structured¶
Every agent platform reads a two-layer set of files:
Layer 1 — Canonical (all agents)¶
CONTRIBUTOR_INSTRUCTIONS.md at the repo root contains:
CE-first policy (the mandatory pre-explain checklist)
Architecture and module boundary rules
Coding standards (type hints, Numpy docstrings, lazy imports)
Testing standards and fallback visibility policy
ADR/Standards reference map (all 26 active ADRs + 5 STDs with when-to-consult guidance)
Test-quality improvement method (8-agent team from
docs/improvement/test-quality-method/)Key files & directories, development workflow, TDD patterns
Layer 2 — Platform-specific¶
Each platform file adds only what is unique to that agent:
File |
Platform |
What it adds |
|---|---|---|
|
GitHub Copilot |
Auto-injected instruction files, prompt slash commands, |
|
Codex |
Session priming prompt, task template, workspace sync routine, CE-first utility list |
|
Claude Code |
Permissions model ( |
|
Google Gemini |
Session priming prompt, context management, workspace sync |
3. Session priming (all platforms)¶
At the start of any agent session, prime the agent with:
You are a CE-first agent for calibrated_explanations. Read CONTRIBUTOR_INSTRUCTIONS.md
and <platform file> first. Use WrapCalibratedExplainer and the public CE API directly.
Fail fast if CE-first invariants are not satisfied.
Replace <platform file> with AGENTS.md, CLAUDE.md, or GEMINI.md as appropriate.
For GitHub Copilot, the instructions are injected automatically — no priming needed.
4. GitHub Copilot — VS Code workspace setup¶
Open the repository root in VS Code. Copilot features are provided by the
Copilot/Copilot Chat extensions and GitHub’s instruction-file loading. This
repository does not require custom VS Code Copilot keys in .vscode/settings.json.
Copilot completions for Python, Markdown, and YAML.
Next Edit Suggestions – Copilot proposes the next logical edit as you type.
Instruction files – all
.github/instructions/*.instructions.mdfiles are automatically loaded into Copilot’s context based on the file you are editing.Workspace agent – Copilot can search your local files when answering questions.
Instruction files injected automatically by VS Code:
File |
Scope |
Purpose |
|---|---|---|
|
all files |
Architecture, coding standards, TDD policy, fallback rules |
|
|
Module layout, import rules, docstring style |
|
|
Testing framework, naming, coverage gate |
|
all files |
Release plan, ADR conformance, changelog policy |
Prompt slash commands available in Copilot Chat:
Command |
Use when |
|---|---|
|
Writing new tests for any CE module |
|
Scaffolding a new calibrator, plot, or explanation plugin |
|
Diagnosing and fixing a bug or failing test |
|
Updating instruction files after an API change or to incorporate feedback |
5. Keeping agents up to date¶
After an API change¶
For GitHub Copilot, run in Chat:
/refresh-ce-context module=calibrated_explanations.core.calibrated_explainer
For other platforms, ask the agent to read
CONTRIBUTOR_INSTRUCTIONS.md, compare it against the currentsrc/calibrated_explanations/source, and propose minimal diffs.Review the proposed diffs; accept or adjust.
Commit the updated instruction files alongside the code change.
After an ADR is accepted or closed¶
Run /refresh-ce-context adr=ADR-NNN (Copilot) or ask the agent to update the
ADR status row in CONTRIBUTOR_INSTRUCTIONS.md §9.
Workspace sync routine (Codex / Claude / Gemini)¶
source .venv/bin/activate
pip install -e .[dev]
python -m pip check
pytest -q
Then ask the agent to re-read CONTRIBUTOR_INSTRUCTIONS.md and diff
src/calibrated_explanations/ for changed signatures.
6. Feedback loop (all platforms)¶
No agent platform retains memory across unrelated sessions. The only way to make feedback durable is to encode it in versioned repository files.
Quick feedback¶
For GitHub Copilot, run:
/refresh-ce-context feedback="Agent suggested importing matplotlib at module level – it must always be lazy"
This appends a dated entry to .github/copilot-feedback-log.md and adds a
clarifying bullet to the relevant instruction file.
For other platforms, update CONTRIBUTOR_INSTRUCTIONS.md directly with a new bullet in
the relevant section, then add a dated entry to .github/copilot-feedback-log.md
manually.
Use this mandatory entry schema:
**Feedback:**what the agent got wrong**Root cause:**why the miss happened**Durable fix:**exact instruction/test/script updates**Verification:**command(s) proving the fix**Status:**open | ✅ incorporated
Structured feedback review¶
After each sprint or release:
Open
.github/copilot-feedback-log.mdand review accumulated entries.Verify each correction is reflected in
CONTRIBUTOR_INSTRUCTIONS.mdor the platform instruction file.Mark resolved entries ✅ and commit the changes so every team member benefits.
7. Common CE workflows (all platforms)¶
Explain a prediction end-to-end¶
Read CONTRIBUTOR_INSTRUCTIONS.md, then explain how WrapCalibratedExplainer.explain_factual
works from the public call down to the plugin registry dispatch.
Scaffold a new plugin¶
/implement-plugin plugin_type=calibrator plugin_name=isotonic_regression target_adr=ADR-013
(Copilot slash command) or give the equivalent instruction to any other agent.
Run a test-quality improvement cycle¶
Read docs/improvement/test-quality-method/README.md, then act as the test-creator
agent and produce a prioritized coverage-gap analysis.
Check release readiness¶
Read docs/improvement/RELEASE_PLAN_v1.md and list all items still open for the
current milestone.
Debug a failing test¶
/fix-issue failing_test=tests/unit/core/test_explainer.py::test_calibration_state
8. Tips for best results (all platforms)¶
Keep instruction files short and precise. A densely written instruction is more useful than a long one.
Commit instruction-file updates in the same PR as the code change so history stays in sync.
Always validate with
make test(orpytest -q) after any code change.Pin ADR references in code comments when making architectural decisions so future agent sessions understand why a pattern is used.
Use the test-quality-method agents (
test-creator,pruner, etc.) for large test changes rather than writing coverage-padding tests by hand.