Refactor to use agent teams in a worktree

kind: infra

Goal. Refactor the multi-agent workflow (/issue, /adversarial-planner, and all other multi-agent skills) to use Claude Code's native agent-team primitives (team_name + isolation: "worktree" on Agent calls; TeamCreate / TeamDelete for lifecycle) in place of the current ad-hoc agent spawning + manual git worktree add.

Why now. The current workflow predates Claude Code's team primitives. Manual worktree creation, ad-hoc agent spawning, and the implicit "marker comments as state" protocol all reinvent capabilities that the harness now provides natively. Aligning with the native primitives should simplify orchestration, reduce moving parts, and give the agents proper isolation by default.

Scope. Full redesign of the multi-agent workflow.

All multi-agent orchestrator skills are in scope: /issue, /adversarial-planner, the iterative interpretation loop, and any other place we currently spawn multiple coordinated agents (e.g., /auto-experiment-runner, /clean-results, /experiment-proposer).
Agent definitions in .claude/agents/*.md may need updates to fit the new team grouping.

Constraints.

Develop the new workflow inside a git worktree (e.g., .claude/worktrees/issue-172/) so main stays clean while we experiment.
This is a hard cutover — the refactor replaces the current workflow rather than running alongside it. The old code path can be deleted once the new one is proven.
Backward-compatibility for in-flight issues is explicitly NOT required (we will land this when no experiment is mid-pipeline).

Out of scope.

The agent personas / role definitions themselves (gate-keeper voice, reviewer adversarial behavior, etc.) — those are unchanged; only how they are dispatched and how state flows between them changes.
Pod / preflight / upload-verification logic — these stay as-is.

Acceptance criteria.

The new /issue (or its replacement) runs an end-to-end issue lifecycle using Agent({ team_name, isolation: "worktree", … }) instead of git worktree add + bare Agent().
All other multi-agent skills follow the same pattern.
The new flow is documented in .claude/skills/<…>/SKILL.md.
A successful end-to-end dry run on a synthetic / trivial issue.
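
One way to check the first two criteria during the dry run is to record the arguments of each Agent call and flag any that lack the team fields. The following is a minimal sketch, assuming calls are captured as plain dictionaries. The capture mechanism, the function names, and the `issue-demo` team name are hypothetical. Only `team_name` and `isolation: "worktree"` come from this issue.

```python
def uses_team_primitives(call: dict) -> bool:
    """True if the call names a team and requests worktree isolation.

    Assumption: each Agent call is represented as a dict of its arguments.
    """
    return bool(call.get("team_name")) and call.get("isolation") == "worktree"


def non_compliant(calls: list[dict]) -> list[dict]:
    """Return the recorded calls that still use the old bare-Agent pattern."""
    return [call for call in calls if not uses_team_primitives(call)]


# Hypothetical recorded calls from a dry run.
calls = [
    {"team_name": "issue-demo", "isolation": "worktree"},  # new pattern
    {},  # bare Agent() with neither field set
]

for call in non_compliant(calls):
    print("non-compliant Agent call:", call)
```

A dry run would pass this check only when `non_compliant` returns an empty list for every multi-agent skill.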

Open design questions (planner to resolve):

Where does durable state live now? Marker comments on the issue, the team's persistent context, or both?
Does the Agent tool's team_name survive across invocations (i.e., can /issue resume by re-attaching to an existing team), or does each /issue invocation create a fresh team?
Exact mapping of the current status:* state machine onto team lifecycle events (TeamCreate / individual Agent runs / TeamDelete).

Spec (from clarifier)

Mechanism: Use Claude Code's native primitives (team_name, isolation: "worktree", TeamCreate/TeamDelete).
Scope: All multi-agent skills.
Backward-compatibility: Hard cutover — replaces the current workflow.
Worktree: Implementation lives in a worktree of this repo (the standard /issue worktree mechanism).

Timeline · 2 events

epm:clarify · 2026-05-01T18:50:12.000Z · system

<!-- epm:clarify v1 -->
## Clarifier — RESOLVED

Three blocking ambiguities were resolved by the user inline:

1. **Mechanism** — Use Claude Code's native primitives: `team_name` + `isolation: "worktree"` on `Agent` calls, plus `TeamCreate`/`TeamDelete` for lifecycle. Not a conceptual reorg.
2. **Scope** — All multi-agent skills. We are designing a new workflow.
3. **Backward compatibility** — Hard cutover. The new workflow replaces the current one.

Issue body has been populated with the resolved spec. Proceeding to gate-keeper.
<!-- /epm:clarify -->

epm:gate · 2026-05-01T18:51:32.000Z · system

<!-- epm:gate v1 -->
## Gate-Keeper Verdict — MODIFY

### Scores

| Dimension | Score | Justification |
|-----------|-------|---------------|
| Information value | 4/5 | Genuinely open question whether native team primitives subsume the marker-comment state machine. Result either retires substantial custom orchestration (big simplification) or surfaces concrete gaps in the native primitives that we document. Both outcomes update beliefs. |
| De-risking | 2/5 | The two open design questions in the issue body — "does `team_name` survive across invocations?" and "how does the `status:*` machine map to team lifecycle?" — are exactly the questions that, if answered "no," kill or massively rescope the refactor. The plan commits to a hard cutover before answering them. No spike, no dry-run on a synthetic skill, no fallback. |
| Strategic fit | 3/5 | Aligns with `agents-vs-skills.md` philosophy. But the project's research clock is ticking — `aim:cross-cutting` infra refactors compete with experiment work, and current orchestration is functional (3 issues running right now, 9 active worktrees, ~15 approved queued). "Functional but ad-hoc" is a weaker case than "broken." |
| Feedback speed | 4/5 | Pure code refactor, no GPUs, end-to-end dry run on a synthetic issue gives same-day signal. Iteration cost is low. |
| Opportunity cost | 4/5 | `compute:none` so no GPU contention. Cost is agent-hours and merge-coordination risk against in-flight issues, which is real but bounded. Nothing comparable in the `aim:infra` queue. |
| **Average** | **3.4/5** | (just under RUN threshold) |
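
The average row can be reproduced from the five dimension scores. This is a small check with the scores copied from the table; the variable names are illustrative only.

```python
# Scores copied from the table above.
scores = {
    "Information value": 4,
    "De-risking": 2,
    "Strategic fit": 3,
    "Feedback speed": 4,
    "Opportunity cost": 4,
}

# Unweighted mean across the five dimensions.
average = sum(scores.values()) / len(scores)
print(f"{average:.1f}/5")  # 3.4/5
```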

### Verdict — MODIFY

The strategic case is strong (align with native primitives, retire ad-hoc machinery, simpler state model) and the cost is low (no GPUs, contained worktree). But the de-risking score is the binding constraint: the issue body itself flags two open questions whose answers determine whether this is a clean cutover or a months-long migration that ends in keeping the marker-comment system anyway. Land a 1-2 hour spike that answers them empirically before committing to a full hard cutover across every multi-agent skill. Also de-risk by either staging the cutover (start with `/adversarial-planner`, the simplest multi-agent skill, with no durable state) or timing it for a drain window when no issues are mid-pipeline.

### Required revisions to the issue body

1. **Add a Phase 0 spike (mandatory before planning).** A 1-2 hour empirical investigation that answers the two open design questions:
   - Does `Agent({ team_name: "X", isolation: "worktree" })` re-attach to an existing team across separate `/issue` invocations, or does each invocation create a new team? Test: invoke twice in separate sessions with the same `team_name`, observe behaviour.
   - What state actually persists across invocations of agents in the same team? Conversation history? Worktree contents only? Nothing?
   - Conclusion: are marker comments still load-bearing for resumability, or can they be retired? Document the answer in the issue.
2. **Stage the cutover.** Replace "all multi-agent skills in one hard cutover" with: (a) `/adversarial-planner` first — it has no durable state, no `status:*` machine, just three agents in one invocation, so it's the cleanest test of the team primitives, (b) then iterate on `/issue` only after the spike answers the resumability question, (c) leave `/auto-experiment-runner`, `/clean-results`, `/experiment-proposer` for last (or out of scope — they're orchestrators with their own concerns).
3. **Add a drain criterion.** "Land when no issues are in `status:running`, `status:planning`, `status:plan-pending`, `status:approved`, `status:interpreting`, or `status:reviewing`." Currently about 24 issues are in those states (3 running, 3 planning, 0 plan-pending, ~15 approved, 0 interpreting, 3 reviewing), so the cutover window needs to be either picked or replaced with a coexistence story.
4. **Add an acceptance test that exercises resume.** "End-to-end dry run on
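
The drain criterion in revision 3 amounts to a simple count over the listed statuses. The following is a minimal sketch, assuming per-status issue counts are already available as a dictionary. How those counts are fetched is not specified here, and the function names are hypothetical.

```python
# Statuses named in the drain criterion; an issue in any of them blocks the cutover.
BLOCKING_STATUSES = (
    "status:running",
    "status:planning",
    "status:plan-pending",
    "status:approved",
    "status:interpreting",
    "status:reviewing",
)


def blocking_issue_count(counts: dict[str, int]) -> int:
    """Total issues in a blocking status; statuses missing from counts are 0."""
    return sum(counts.get(status, 0) for status in BLOCKING_STATUSES)


def is_drained(counts: dict[str, int]) -> bool:
    """True when no issue is mid-pipeline, i.e. the cutover window is open."""
    return blocking_issue_count(counts) == 0


# Snapshot quoted in the verdict (the approved figure is approximate, "~15").
snapshot = {
    "status:running": 3,
    "status:planning": 3,
    "status:plan-pending": 0,
    "status:approved": 15,
    "status:interpreting": 0,
    "status:reviewing": 3,
}

print(blocking_issue_count(snapshot))  # 24
print(is_drained(snapshot))            # False
```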
