Task #3 · Archived

[Proposed] Dashboard linking figures ↔ raw data ↔ scripts

kind: infra

From EXPERIMENT_QUEUE.md, added 2026-04-16

Infra task, not an experiment. For every figure in figures/, provide a link back to the raw JSON(s) in eval_results/ that produced it, plus the script commit hash that generated the plot.
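A minimal sketch of what one provenance entry could hold, assuming Python (the language of the scripts in this repo). The `FigureProvenance` class, its field names, and the `last_commit_for` helper are hypothetical and not part of the task. The helper also assumes that the last commit touching a script is an acceptable stand-in for "the commit that generated the plot", which it cannot prove.

```python
import subprocess
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FigureProvenance:
    """One figure plus the inputs that produced it (hypothetical schema)."""

    figure: str                                             # path under figures/
    data_sources: List[str] = field(default_factory=list)   # JSONs under eval_results/
    script: Optional[str] = None                            # generating script, if known
    script_commit: Optional[str] = None                     # short commit hash for the script


def last_commit_for(path: str, repo_root: str = ".") -> Optional[str]:
    """Return the short hash of the last commit touching `path`, or None.

    Assumption: the last commit that touched the script approximates the
    version that generated the plot; it cannot prove it.
    """
    try:
        result = subprocess.run(
            ["git", "log", "-n", "1", "--format=%h", "--", path],
            cwd=repo_root,
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return None  # git is not installed or repo_root does not exist
    return result.stdout.strip() or None
```

A record with no known script simply leaves `script` and `script_commit` as `None`, so the gap stays visible in the index.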

Motivation: reviewers increasingly ask "where does this number come from?", and several drafts have already broken this audit trail.

Format options:

  • (a) static HTML index generated from INDEX.md + figure metadata
  • (b) Streamlit dashboard that loads JSONs on demand
  • (c) simple markdown table in figures/INDEX.md with figure → data → script mapping

Recommend option (a) or (c); the Streamlit option (b) adds a pod dependency.
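As an illustration of option (c), the figure → data → script mapping could be rendered into a Markdown table along these lines. This is a sketch only: the `render_index` function, the column names, and the `unknown` placeholder for missing values are assumptions, not the format the task prescribes.

```python
def render_index(rows):
    """Render (figure, data_sources, script, commit) tuples as a Markdown table."""
    lines = [
        "| Figure | Raw data | Script | Commit |",
        "|--------|----------|--------|--------|",
    ]
    # Sort by figure path only, so missing (None) fields are never compared.
    for figure, data_sources, script, commit in sorted(rows, key=lambda r: r[0]):
        data_cell = ", ".join(f"`{p}`" for p in data_sources) or "unknown"
        script_cell = f"`{script}`" if script else "unknown"
        commit_cell = f"`{commit}`" if commit else "unknown"
        lines.append(f"| `{figure}` | {data_cell} | {script_cell} | {commit_cell} |")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical entries; the paths and the hash are placeholders.
    print(render_index([
        ("figures/example_a.png", ["eval_results/example/a.json"],
         "scripts/plot_example.py", "abc1234"),
        ("figures/example_b.png", [], None, None),
    ]))
```

Writing missing fields as `unknown` rather than dropping the row keeps figures without a script or data source visible in the index. The same rows could feed a static HTML page for option (a).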

Dispatch target: implementer (not experimenter). No gate-keeper needed — standard infra work.

Compute: 0 GPU. ~2-3h implementer time.

Timeline · 1 event

  1. epm:results· system
    <!-- epm:results v1 -->
    
    ## Provenance Index Complete
    
    **Files touched:** `figures/INDEX.md` (+398 lines, new file)
    **Commit:** `00cb7ef` on branch `worktree-agent-aa56eeed`
    
    ### Coverage
    
    | Category | Count |
    |----------|-------|
    | Total figure files | 216 (168 PNG + 48 PDF) |
    | Unique figures (a PNG+PDF pair counts as 1) | ~168 |
    | Full provenance (script + data) | 110 |
    | Partial provenance (data source identified, script missing) | 38 |
    | Orphaned (no script in repo) | 68 |
    
    ### Scripts Mapped
    
    | Script | Figures Produced | Data Source |
    |--------|-----------------|-------------|
    | `scripts/plot_trait_transfer.py` | 10 figures | `eval_results/trait_transfer/` |
    | `scripts/plot_cot_tracking.py` | 10 figures (+ PDFs) | `eval_results/cot_axis_tracking/` |
    | `scripts/plot_full_matrix.py` | 7 figures | Hardcoded in script |
    | `scripts/analyze_category_projections.py` | 7 figures (+ PDFs) | `eval_results/axis_category_projection/` |
    | `scripts/project_categories_instruct.py` | 5 figures | `eval_results/axis_category_projection/` |
    | `scripts/project_categories_onto_axis.py` | 4 figures | Self-generated |
    | `scripts/plot_proximity_transfer.py` | 3 figures | `eval_results/proximity_transfer/` |
    | `scripts/plot_leakage_vs_cosine_all.py` | 2 figures (+ PDFs) | Multiple eval_results dirs |
    | `scripts/plot_all_results.py` | 2 figures | Hardcoded in script |
    | `scripts/plot_aim4_axis_origins.py` | 1 figure (+ PDF) | `eval_results/axis_category_projection/` + `axis_projection_v2/` |
    | `scripts/plot_leakage_vs_cosine_none.py` | 1 figure (+ PDF) | `eval_results/persona_cosine_centered/` |
    | `scripts/track_axis_during_cot.py` | 23 figures (traces + summaries) | Self-generated |
    | `experiments/aim1_2_dimensionality/run_dimensionality.py` | 4 figures | Self-generated |
    | `scripts/run_aim1_3_composition.py` | 4 figures | Self-generated |
    
    ### Orphaned Figure Categories
    
    68 figures have no generating script in the current repo:
    - **25 early experiment figures** (exp1/2/3, round1, combined, pre/post_em) -- from pre-refactor scripts
    - **6 midtrain/posttrain figures** -- from inline analysis during midtrain experiments
    - **10 axis projection figures** -- from one-off pod analysis
    - **5 leakage/contrastive figures** -- from experiment runner scripts
    - **10 prompt_length/proximity PDF variants** -- from untracked analysis script
    - **9 directed_trait_transfer figures** -- generated on pod, copied manually
    - **9 prompt_divergence figures** -- no generating script found
    - **4 tulu_dpo_em figures** -- no generating script found
    - **5 aim6 truthification figures** -- from removed truthification repo
    - **1 misc (cot_think_response_transition)** -- pod-generated
    
    ### Lint / Tests
    No code changes; only markdown file created.
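The file and unique-figure counts in the Coverage table could be reproduced with a small helper like the one below. It is a sketch under one assumption: a PNG and a PDF belong to the same figure exactly when their paths match once the extension is removed. The `count_figures` name is hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_figures(paths):
    """Count figure files, files per extension, and unique figures by stem."""
    paths = [PurePosixPath(p) for p in paths]
    by_extension = Counter(p.suffix.lower() for p in paths)
    # Dropping the suffix makes figures/a.png and figures/a.pdf the same figure.
    unique = {p.with_suffix("") for p in paths}
    return {
        "files": len(paths),
        "by_extension": dict(by_extension),
        "unique": len(unique),
    }
```

In a real run, `paths` would be the relative paths of every file under `figures/`. A PDF with no matching PNG counts as its own figure, which may be one reason the unique count is only approximate.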
