[Code] Research-chain tracking: issue linking + claims registry
Motivation
Mentor wants a way to trace data → hyperparams → results → claims and to see how experiments chain together. Currently:
- No programmatic link between a claim in `RESULTS.md` and its supporting issues/runs/figures.
- Issues reference each other only via ad-hoc `#N` comments; there are no typed "follows from" / "followed by" relationships, so you can't walk the research chain programmatically.
- No single registry of claims with their status + evidence.
This issue adds the minimum infrastructure to close those gaps without building a custom dashboard. Extends the existing /issue <N> skill and the marker-comment system.
Scope
Files in scope:
- `docs/claims.yaml` (new): canonical claim registry
- `docs/claims.md` (new): auto-rendered; committed by CI
- `scripts/render_claims.py` (new)
- `.github/workflows/render_claims.yml` (new): CI trigger
- `.claude/skills/issue/SKILL.md`: extend Step 1 (clarifier posts follows markers) and Step 8 (ask about claim attribution)
- `.claude/skills/issue/clarifier.md`: add "what prior issue motivated this?" prompt
- `.claude/skills/issue/markers.md`: add `epm:follows`, `epm:followed-by` kinds
- `.claude/agent-memory/research-pm/reference_github_issues.md`: document `epic:*` and `claim:*` label conventions
Files explicitly OUT of scope:
- Eval Q&A logging (`wandb.Table`): separate issue
- WandB Reports auto-generation via Reports SDK: separate issue
- GitHub Pages eval browser: explicitly deferred per discussion (WandB Reports preferred)
- One-time backfill of the 30 existing issues' claims: sub-issue if/when wanted
Acceptance criteria
- `docs/claims.yaml` exists with schema documented in-file: `id`, `description`, `aim`, `status` (`preliminary|moderate|strong|falsified`), `evidence: {issues, wandb_report, figures, results_md_section}`, `kill_criteria`, `supersedes`, `updated`
- `scripts/render_claims.py` reads `claims.yaml`, joins against live issue states via `gh issue view`, and emits `docs/claims.md` as a sortable markdown table with deep links
- GitHub Actions workflow re-renders `docs/claims.md` on push to `docs/claims.yaml` OR when any issue closes carrying a `claim:*` label
- `/issue` skill clarifier asks "what prior issue motivated this?" and posts `<!-- epm:follows N -->` on the child + `<!-- epm:followed-by N -->` on the parent. Bidirectional graph parseable via `gh` comment scan.
- `/issue` Step 8 asks "contributes to which claim?" and appends the issue to the `claims.yaml` evidence list + adds a `claim:<id>` label to the issue. Also supports creating a new claim in the yaml with a fresh ID.
- `epic:*` label convention documented (loose grouping across an investigation)
- `claim:*` label taxonomy documented; `claim:<id>` labels auto-created via `gh api` when a new claim is added to `claims.yaml`
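The render step in the criteria above could look roughly like the following. This is a minimal sketch, not the planned `scripts/render_claims.py`: it assumes the claims have already been loaded from `docs/claims.yaml` (for instance with PyYAML's `yaml.safe_load`) and that issue states were fetched separately into a plain dict. The function names, the column set, the status sort order, and the `example-org/example-repo` URL are all hypothetical.

```python
from __future__ import annotations

# Display order for rows: strongest status first. This ordering is an
# assumption for illustration; the issue only asks for a sortable table.
STATUS_ORDER = ["strong", "moderate", "preliminary", "falsified"]


def issue_link(repo_url: str, number: int, states: dict[int, str]) -> str:
    """Deep link to one issue, annotated with its state if known."""
    # Issues missing from `states` are shown as "unknown" rather than
    # raising, so a failed lookup does not break the whole render.
    state = states.get(number, "unknown")
    return f"[#{number}]({repo_url}/issues/{number}) ({state})"


def render_claims_table(
    claims: list[dict], repo_url: str, states: dict[int, str]
) -> str:
    """Render parsed claims as a markdown table, sorted by status then id."""
    header = [
        "| Claim | Status | Aim | Evidence issues | Updated |",
        "|-------|--------|-----|-----------------|---------|",
    ]
    ordered = sorted(
        claims, key=lambda c: (STATUS_ORDER.index(c["status"]), c["id"])
    )
    rows = []
    for claim in ordered:
        issues = ", ".join(
            issue_link(repo_url, n, states) for n in claim["evidence"]["issues"]
        )
        rows.append(
            f"| {claim['id']} | {claim['status']} | {claim['aim']} "
            f"| {issues} | {claim['updated']} |"
        )
    return "\n".join(header + rows) + "\n"


if __name__ == "__main__":
    demo = [
        {
            "id": "C-aim3-leakage-v1",
            "status": "moderate",
            "aim": "3-propagation",
            "evidence": {"issues": [27, 28, 29, 30]},
            "updated": "2026-04-16",
        }
    ]
    # Hypothetical repository URL and issue states, for demonstration only.
    print(
        render_claims_table(
            demo,
            "https://github.com/example-org/example-repo",
            {27: "CLOSED", 28: "OPEN"},
        )
    )
```

Keeping the issue-state lookup outside the render function means the table logic can be exercised locally without network access, whichever `gh` call ends up supplying the states.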
Tests
No unit tests needed for this infra. Manual acceptance:
- Add `claim:C-test-1` to `claims.yaml` → `render_claims.py` emits a valid row in `docs/claims.md`
- File a new test issue referencing `#29` as its parent via the clarifier → verify both markers posted bidirectionally
- Close an issue carrying a `claim:C-test-1` label → CI workflow re-renders `docs/claims.md` with the new evidence
- `ruff check . && ruff format .` passes
Compatibility
- Fully additive. Existing 30 issues unaffected unless manually tagged `claim:*`.
- `RESULTS.md` continues as human-readable prose. `claims.yaml` is the machine-readable spine. Headlines in `RESULTS.md` reference `C-*` IDs.
- `gh` CLI 2.4.0 compatible (labels created via `gh api`, no `gh label` subcommand used).
Dependencies
- PyYAML: likely already in `uv.lock`. If not, `uv add pyyaml`.
- `gh` CLI 2.4.0: already installed.
- No other new PyPI packages expected.
Performance impact
None. render_claims.py runs in CI, off the hot path. Expected render time <5s for 100 claims.
Risk + rollback
- Blast radius: very low. All additive. No existing code is modified except the `/issue` skill (markdown files, behavior-extension).
- Rollback: `git revert`. No schema migrations, no model artifacts touched.
- If `render_claims.py` breaks: `docs/claims.md` goes stale; no data loss.
Aim
aim:cross-cutting — tracking infrastructure that spans all research aims.
Design sketch (informal — the adversarial-planner will produce the canonical plan)
docs/claims.yaml schema:
claims:
  - id: C-aim3-leakage-v1
    description: "Trait leakage across ~X% of adjacent tokens under SFT"
    aim: 3-propagation
    status: moderate
    evidence:
      issues: [27, 28, 29, 30]
      wandb_report: https://wandb.ai/.../reports/...
      figures: [figures/aim3/leakage_comprehensive.png]
      results_md_section: "Aim 3 — Propagation"
    kill_criteria: "effect size < 0.1 across 3+ seeds"
    supersedes: []
    updated: 2026-04-16
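A schema like this is easy to check mechanically before rendering. The sketch below validates one already-parsed claim (a plain dict, as a YAML loader would produce) against the field list and status values stated in the acceptance criteria. The function name `validate_claim` and the error wording are hypothetical. The id check is deliberately loose (a `C-` prefix only), because the issue uses both `C-aim3-leakage-v1` and `C-test-1` as ids.

```python
from __future__ import annotations

# Field names and status values are taken from the acceptance criteria.
ALLOWED_STATUS = {"preliminary", "moderate", "strong", "falsified"}
REQUIRED_FIELDS = (
    "id", "description", "aim", "status",
    "evidence", "kill_criteria", "supersedes", "updated",
)
EVIDENCE_KEYS = {"issues", "wandb_report", "figures", "results_md_section"}


def validate_claim(claim: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the claim passes."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in claim]

    claim_id = claim.get("id")
    # Loose check only: the exact id grammar is not pinned down in the issue.
    if claim_id is not None and not str(claim_id).startswith("C-"):
        errors.append(f"id should start with 'C-': {claim_id!r}")

    status = claim.get("status")
    if status is not None and status not in ALLOWED_STATUS:
        errors.append(f"invalid status: {status!r}")

    evidence = claim.get("evidence")
    if isinstance(evidence, dict):
        unknown = sorted(set(evidence) - EVIDENCE_KEYS)
        if unknown:
            errors.append(f"unknown evidence keys: {unknown}")
        issues = evidence.get("issues", [])
        if not isinstance(issues, list) or not all(isinstance(n, int) for n in issues):
            errors.append("evidence.issues must be a list of issue numbers")
    elif evidence is not None:
        errors.append("evidence must be a mapping")

    return errors


if __name__ == "__main__":
    example = {
        "id": "C-aim3-leakage-v1",
        "description": "Trait leakage across ~X% of adjacent tokens under SFT",
        "aim": "3-propagation",
        "status": "moderate",
        "evidence": {"issues": [27, 28, 29, 30]},
        "kill_criteria": "effect size < 0.1 across 3+ seeds",
        "supersedes": [],
        "updated": "2026-04-16",
    }
    print(validate_claim(example))
```

Returning a list of messages rather than raising on the first problem lets a renderer report every issue with a claim in one pass, or emit a warning row and continue.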
Marker comment example (epm:follows):
<!-- epm:follows v1 -->
**Follows from:** #27 (marker leakage v3 showed the effect was confounded by token length)
**Motivating result:** link to the epm:results comment in #27
<!-- /epm:follows -->
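Given comment bodies in this shape, the bidirectional graph can be recovered with a regular-expression scan. The following is a sketch under assumptions: comment bodies have already been fetched per issue (how they are fetched via `gh` is left out), an `epm:followed-by` block mirrors the `epm:follows` layout shown above, and the first `#N` reference inside a block is the linked issue. All function names are hypothetical.

```python
from __future__ import annotations

import re

# Matches an opening marker, its body, and the matching closing marker.
MARKER_RE = re.compile(
    r"<!--\s*epm:(follows|followed-by)\s+v\d+\s*-->(.*?)<!--\s*/epm:\1\s*-->",
    re.DOTALL,
)
ISSUE_REF_RE = re.compile(r"#(\d+)")


def extract_links(body: str) -> list[tuple[str, int]]:
    """Return (kind, issue_number) for each marker block in one comment body."""
    links = []
    for kind, inner in MARKER_RE.findall(body):
        ref = ISSUE_REF_RE.search(inner)  # assumption: first #N is the target
        if ref:
            links.append((kind, int(ref.group(1))))
    return links


def build_edges(comments_by_issue: dict[int, list[str]]) -> dict[tuple[int, int], set[str]]:
    """Map each (parent, child) edge to the marker kinds that assert it."""
    edges: dict[tuple[int, int], set[str]] = {}
    for issue, bodies in comments_by_issue.items():
        for body in bodies:
            for kind, other in extract_links(body):
                # "follows" lives on the child; "followed-by" lives on the parent.
                edge = (other, issue) if kind == "follows" else (issue, other)
                edges.setdefault(edge, set()).add(kind)
    return edges


def unpaired_edges(edges: dict[tuple[int, int], set[str]]) -> list[tuple[int, int]]:
    """Edges asserted from only one side, i.e. a missing reciprocal marker."""
    return sorted(edge for edge, kinds in edges.items() if len(kinds) < 2)


if __name__ == "__main__":
    child = (
        "<!-- epm:follows v1 -->\n"
        "**Follows from:** #27 (marker leakage v3 showed the effect was "
        "confounded by token length)\n"
        "<!-- /epm:follows -->"
    )
    print(build_edges({29: [child]}))
```

Tracking which marker kinds support each edge makes the manual acceptance check ("verify both markers posted bidirectionally") a one-line query via `unpaired_edges`.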
Label additions (created via gh api when referenced):
- `epic:<slug>`: free-form grouping label (e.g., `epic:trait-leakage`)
- `claim:C-<aim>-<slug>-v<n>`: one per claim in the registry
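Label creation through `gh api` could be sketched as below. The snippet only builds the argument list and works out which labels are missing; it does not run anything. It assumes the GitHub REST "create a label" endpoint (`POST /repos/{owner}/{repo}/labels` with `name`, `color`, `description`) and relies on `gh api` filling in the `{owner}` and `{repo}` placeholders and switching to POST when fields are passed. The default colour and the function names are hypothetical.

```python
from __future__ import annotations


def claim_label(claim_id: str) -> str:
    """Label name for a registry claim, following the `claim:<id>` convention."""
    return f"claim:{claim_id}"


def label_create_argv(name: str, color: str = "ededed", description: str = "") -> list[str]:
    """Argument list for creating one label via `gh api` (no `gh label` needed)."""
    return [
        "gh", "api", "repos/{owner}/{repo}/labels",
        "-f", f"name={name}",
        "-f", f"color={color}",  # hex colour without '#'; default is arbitrary
        "-f", f"description={description}",
    ]


def missing_labels(claim_ids: list[str], existing: set[str]) -> list[str]:
    """Claim labels that are referenced by the registry but not yet on the repo."""
    return [claim_label(cid) for cid in claim_ids if claim_label(cid) not in existing]


if __name__ == "__main__":
    existing = {"claim:C-aim3-leakage-v1", "epic:trait-leakage"}
    for label in missing_labels(["C-aim3-leakage-v1", "C-test-1"], existing):
        # Print instead of executing; pass the list to subprocess.run to apply.
        print(" ".join(label_create_argv(label, description="Registry claim")))
```

Computing the missing set first avoids re-sending a create request for a label that already exists, which the API would be expected to reject.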
Open questions for the clarifier / planner
- Should `claims.yaml` support multi-aim claims (cross-cutting)? Proposed: yes, `aim` field can be a list.
- Should the render include WandB Report thumbnails, or just links? Proposed: links only for now.
- Should `/issue` Step 8 force a claim attribution, or allow `none`? Proposed: allow `none` but warn if the issue is marked `status:under-review` PASS and has no claim.
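Two of the proposed defaults above are small enough to express directly. The sketch below is illustrative only: it assumes `aim` arrives as either a string or a list after YAML parsing, and that the Step 8 check can see the issue's labels plus a verdict string. How the PASS verdict is actually recorded is not specified in this issue, so the `verdict` parameter is a stand-in, and both function names are hypothetical.

```python
from __future__ import annotations


def normalize_aims(aim: str | list[str]) -> list[str]:
    """Accept a single aim or a list of aims and always return a list."""
    if isinstance(aim, str):
        return [aim]
    return list(aim)


def attribution_warning(
    claim_ids: list[str], labels: set[str], verdict: str | None
) -> str | None:
    """Warn, without blocking, when a PASS under review has no claim attached."""
    if claim_ids:
        return None  # at least one claim attributed; nothing to warn about
    if "status:under-review" in labels and verdict == "PASS":
        return "Issue passed review but contributes to no claim; attach one or confirm 'none'."
    return None  # 'none' is allowed silently in every other state


if __name__ == "__main__":
    print(normalize_aims("3-propagation"))
    print(attribution_warning([], {"status:under-review"}, "PASS"))
```

Returning a message instead of raising keeps the `none` path open, matching the proposal that attribution is encouraged but never forced.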
Next step: run /issue <this issue number> through the skill for gate-keeper → adversarial-planner → approval → implementer.
Timeline · 4 events
epm:clarify · system
<!-- epm:clarify v1 -->
## Clarifier: No blocking ambiguities

Issue passes the `type:infra` clarifier. All required fields present: scope in/out, motivation, compatibility (additive), tests (manual acceptance), dependencies (PyYAML), performance (none), risk+rollback (low).

**Design-decisions to carry to the planner (non-blocking; defaults stated):**

1. Multi-aim claims. Default: `aim` field accepts a string OR list.
2. Render WandB thumbnails. Default: links only.
3. Force claim attribution at Step 8. Default: allow `none` but warn on PASS+no-claim.

**Minor notes for the planner to resolve:**

- `docs/claims.md` lifecycle: body says "committed by CI"; planner should specify whether it's `.gitignore`d locally (built in CI only) or both local+CI-built. Recommend: `.gitignore` it locally so developers don't commit stale renders; CI commits canonical version on `docs/claims.yaml` changes.
- Verify PyYAML is already in `uv.lock`; if not, planner should add `uv add pyyaml`.

Advancing to `status:gate-pending`.
<!-- /epm:clarify -->
epm:gate · system
<!-- epm:gate v1 -->
## Gate-Keeper Verdict: **RUN**

### Scores

| Dimension | Score | Justification |
|-----------|-------|---------------|
| Information Value | 4/5 | Directly resolves a mentor-stated need: programmatic `data→hyperparams→results→claims` traceability. Not an experiment, but materially increases ability to audit and communicate every future claim. The aim-5 `good_correct` headline retraction (50.9 → 28.3 batch-size artifact) is exactly the event where a claims registry with `status` + `supersedes` prevents stale headlines lingering; `RESULTS.md` still carries partially-retracted prose. |
| De-risking Quality | 4/5 | Scope is minimum-viable: extends existing `/issue` skill + marker system. Out-of-scope list is disciplined. Rollback is `git revert`; no migrations, no model artifacts touched. Residual risk: CI render-on-issue-close is the fragile piece; planner should add a manual fallback. |
| Strategic Fit | 4/5 | `aim:cross-cutting`, timing right: mid-transition from `EXPERIMENT_QUEUE.md` to GitHub Issues as truth. Building at 50+ claims later is strictly more expensive. Infra pays compound interest on every subsequent experiment's interpretability. |
| Feedback Loop Speed | 4/5 | ~4 engineer-hours, zero GPU-hours. First signal: `docs/claims.md` renders locally from a 1-claim yaml within 30 min of the renderer landing. Can self-host smoke-test against this very issue. |
| Opportunity Cost | 5/5 | `compute:none`. Not competing with GPU-bound queue items. Competing infra issues (#10-13) don't block claim tracking. Engineer-hours here are essentially free against critical path. |
| **Weighted Total** | **4.15/5** | |

### Research Phase Alignment

- Aim: cross-cutting (spans all aims). Not an experiment.
- Aim 5 just reverted Understand → Distill → Understand precisely because a retracted claim wasn't caught early. Claim tracking is phase-agnostic and directly supports Distill for aims 3/4/6.

### What Could Go Wrong (for the planner)

1. **CI workflow fragility.** `render_claims.py` calling `gh issue view` per claim could hit rate limits on every issue close. Mitigation: batch single `gh issue list --search 'label:claim:*'`; graceful fallback (emit warning row, don't fail workflow).
2. **Clarifier friction.** Two new prompts ("follows from?" + "contributes to which claim?") tax every issue. If tedious, users skip. Mitigation: both accept `none`/skip gracefully; Step-8 prompt auto-suggests based on `aim:*` label.
3. **Schema churn.** `claims.yaml` schema may prove insufficient after ~10 real claims. Accept that v1 will evolve; keep it small.

### Plan must incorporate

- Manual-render escape hatch: `python scripts/render_claims.py --local` independent of CI.
- Explicit "no claim attached" path in Step 8: issue can close without `claim:*` label; only warn on `status:under-review PASS` + no claim.
- **Self-hosting smoke test:** seed `claims.yaml` with `C-infra-claims-tracking-v1` attached to issue #33 itself, so the system demonstrates end-to-end on merge.

### Overrides

User-requested (mentor-driven); would RUN regardless of score. Score confirms the ask is well-scoped.
<!-- /epm:gate -->
state_changed · user · planning → proposed
Moved on Pipeline board to idea.
state_changed · user · proposed → archived
Closed on GH without ever running. Archiving on Sagan to match.