EPS Dashboard

From EXPERIMENT_QUEUE.md, added 2026-04-16

Infra task, not an experiment. For every figure in figures/, provide a link back to the raw JSON(s) in eval_results/ that produced it, plus the commit hash of the script that generated the plot.
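A minimal sketch of what one provenance entry could look like, assuming a flat record per figure. The `FigureProvenance` schema, its field names, and the helper functions are hypothetical and not taken from the repo. The git lookup returns the last commit that touched the script, which only approximates "the commit that generated the plot" if the script changed after the figure was rendered.

```python
from dataclasses import dataclass, asdict
import subprocess


@dataclass(frozen=True)
class FigureProvenance:
    """One figure's audit trail (hypothetical schema, not from the repo)."""
    figure: str            # path under figures/
    data_sources: tuple    # raw JSON path(s) under eval_results/
    script: str            # path of the plotting script
    script_commit: str     # commit hash of the script version used


def last_commit_command(path: str) -> list:
    """Build the git command that prints the short hash of the last commit touching `path`."""
    return ["git", "log", "-n", "1", "--format=%h", "--", path]


def last_commit(path: str) -> str:
    """Run the command above inside a git checkout.

    Returns an empty string if `path` has no history; raises
    subprocess.CalledProcessError when run outside a repository.
    """
    result = subprocess.run(
        last_commit_command(path), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

A record would then be built as `FigureProvenance(figure, data_sources, script, last_commit(script))` and serialized with `asdict` for whichever output format is chosen.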

Motivation: reviewers increasingly ask "where does this number come from", and several drafts have already broken this audit trail.

Format options:

(a) static HTML index generated from INDEX.md + figure metadata
(b) Streamlit dashboard that loads JSONs on demand
(c) simple markdown table in figures/INDEX.md with figure → data → script mapping

Recommend option (a) or (c); option (b) adds a pod dependency because of Streamlit.
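For option (c), the figure → data → script mapping could be rendered with a few lines of Python. This is a sketch only: the `render_index` function, the column names, and the "unknown" placeholder are assumptions, not the layout actually used in figures/INDEX.md.

```python
def render_index(rows):
    """Render (figure, data_sources, script, commit) tuples as a markdown table.

    Missing data sources, scripts, or commits are shown as "unknown".
    """
    lines = [
        "| Figure | Data | Script | Commit |",
        "|--------|------|--------|--------|",
    ]
    # Sort by figure path so the generated file diffs cleanly between runs.
    for figure, data_sources, script, commit in sorted(rows, key=lambda r: r[0]):
        data = ", ".join(f"`{d}`" for d in data_sources) or "unknown"
        script_cell = f"`{script}`" if script else "unknown"
        commit_cell = f"`{commit}`" if commit else "unknown"
        lines.append(f"| `{figure}` | {data} | {script_cell} | {commit_cell} |")
    return "\n".join(lines)
```

The same rows could feed option (a) by swapping the markdown template for an HTML one, which keeps both recommended options free of runtime dependencies.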

Dispatch target: implementer (not experimenter). No gate-keeper needed — standard infra work.

Compute: 0 GPU. ~2-3h implementer time.

Timeline · 1 event

epm:results · 2026-04-18T10:00:34.000Z · system


<!-- epm:results v1 -->

## Provenance Index Complete

**Files touched:** `figures/INDEX.md` (+398 lines, new file)
**Commit:** `00cb7ef` on branch `worktree-agent-aa56eeed`

### Coverage

| Category | Count |
|----------|-------|
| Total figure files | 216 (168 PNG + 48 PDF) |
| Unique figures (each PNG+PDF pair counted once) | ~168 |
| Full provenance (script + data) | 110 |
| Partial provenance (data source identified, script missing) | 38 |
| Orphaned (no script in repo) | 68 |
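The coverage categories above could be recomputed mechanically along these lines. This is a sketch under assumptions: `unique_figures` and `provenance_status` are hypothetical helpers, pairing PNG and PDF files by shared directory and stem is a guess at how pairs were counted, and the "unclassified" label covers a case (script known, no data source recorded) that the table does not list.

```python
from collections import Counter
from pathlib import PurePosixPath


def unique_figures(paths):
    """Collapse PNG/PDF variants of one figure (same directory and stem) into one entry."""
    return {str(PurePosixPath(p).with_suffix("")) for p in paths}


def provenance_status(script, data_source):
    """Classify one figure using the coverage categories from the table."""
    if script and data_source:
        return "full"
    if data_source:
        return "partial"        # data source identified, script missing
    if script:
        return "unclassified"   # assumption: not a category in the table
    return "orphaned"           # neither script nor data source known


def tally(records):
    """Count figures per category from (script, data_source) pairs."""
    return Counter(provenance_status(s, d) for s, d in records)
```

Rerunning a tally like this whenever figures/INDEX.md changes would make it easy to check that the per-category counts still add up to the total.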

### Scripts Mapped

| Script | Figures Produced | Data Source |
|--------|-----------------|-------------|
| `scripts/plot_trait_transfer.py` | 10 figures | `eval_results/trait_transfer/` |
| `scripts/plot_cot_tracking.py` | 10 figures (+ PDFs) | `eval_results/cot_axis_tracking/` |
| `scripts/plot_full_matrix.py` | 7 figures | Hardcoded in script |
| `scripts/analyze_category_projections.py` | 7 figures (+ PDFs) | `eval_results/axis_category_projection/` |
| `scripts/project_categories_instruct.py` | 5 figures | `eval_results/axis_category_projection/` |
| `scripts/project_categories_onto_axis.py` | 4 figures | Self-generated |
| `scripts/plot_proximity_transfer.py` | 3 figures | `eval_results/proximity_transfer/` |
| `scripts/plot_leakage_vs_cosine_all.py` | 2 figures (+ PDFs) | Multiple eval_results dirs |
| `scripts/plot_all_results.py` | 2 figures | Hardcoded in script |
| `scripts/plot_aim4_axis_origins.py` | 1 figure (+ PDF) | `eval_results/axis_category_projection/` + `axis_projection_v2/` |
| `scripts/plot_leakage_vs_cosine_none.py` | 1 figure (+ PDF) | `eval_results/persona_cosine_centered/` |
| `scripts/track_axis_during_cot.py` | 23 figures (traces + summaries) | Self-generated |
| `experiments/aim1_2_dimensionality/run_dimensionality.py` | 4 figures | Self-generated |
| `scripts/run_aim1_3_composition.py` | 4 figures | Self-generated |

### Orphaned Figure Categories

68 figures have no generating script in the current repo:
- **25 early experiment figures** (exp1/2/3, round1, combined, pre/post_em) -- from pre-refactor scripts
- **6 midtrain/posttrain figures** -- from inline analysis during midtrain experiments
- **10 axis projection figures** -- from one-off pod analysis
- **5 leakage/contrastive figures** -- from experiment runner scripts
- **10 prompt_length/proximity PDF variants** -- from untracked analysis script
- **9 directed_trait_transfer figures** -- generated on pod, copied manually
- **9 prompt_divergence figures** -- no generating script found
- **4 tulu_dpo_em figures** -- no generating script found
- **5 aim6 truthification figures** -- from removed truthification repo
- **1 misc (cot_think_response_transition)** -- pod-generated

### Lint / Tests
No code changes; only a markdown file was created.

[Proposed] Dashboard linking figures ↔ raw data ↔ scripts
