EPS Dashboard

From EXPERIMENT_QUEUE.md, added 2026-04-16

Infra task. We currently measure persona geometry, EM alignment, and leakage only at two discrete endpoints (pre-EM and post-EM). Add mid-training evaluation hooks that log these metrics while training is running.

Metrics to log every N steps:

(a) alignment score on small held-out eval set (Claude judge or log-prob proxy)
(b) capability on a mini ARC-C subset (~100 questions)
(c) persona vector norms + pairwise cosines on 5-10 fixed personas
(d) assistant-axis projection shift
(e) marker adoption rate (for leakage runs)
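
As an illustration of metric (c), here is a minimal sketch of how the norms and pairwise cosines could be flattened into scalar metrics for logging. It assumes each persona vector is already extracted as a plain list of floats; the function name `persona_geometry` and the `persona/...` key names are hypothetical, and a real implementation would more likely operate on tensors.

```python
import math
from itertools import combinations


def persona_geometry(vectors):
    """Return a flat dict of norms and pairwise cosines.

    `vectors` maps persona name -> list of floats (all the same length).
    Key names below are illustrative, not an existing convention.
    """
    lengths = {len(v) for v in vectors.values()}
    if len(lengths) > 1:
        raise ValueError("all persona vectors must have the same dimension")

    # L2 norm of each persona vector.
    norms = {name: math.sqrt(sum(x * x for x in v)) for name, v in vectors.items()}
    metrics = {f"persona/norm/{name}": n for name, n in norms.items()}

    # Cosine similarity for every unordered pair, in a stable (sorted) order.
    for a, b in combinations(sorted(vectors), 2):
        dot = sum(x * y for x, y in zip(vectors[a], vectors[b]))
        denom = norms[a] * norms[b]
        # A zero vector has no direction, so report NaN rather than divide by zero.
        metrics[f"persona/cos/{a}__{b}"] = dot / denom if denom > 0 else float("nan")
    return metrics
```

With 5-10 fixed personas this produces 10-45 cosine scalars per evaluation, which stays cheap as long as the persona vectors themselves come from cached activations.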

Implementation: an HF Trainer callback that runs the mini-eval every N steps and logs the results to WandB. Budget: under 5% of training wall time, kept low by using small eval sets and cached activations.
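
A minimal sketch of what such a callback might look like. It relies only on the `on_step_end` hook and `state.global_step` from `transformers.TrainerCallback`; the class name `MidTrainingEvalCallback`, the `metric_fns` and `log_fn` parameters, and the `mid_eval/...` key names are assumptions made for illustration, not part of the existing eval infra.

```python
import time

try:
    from transformers import TrainerCallback
except ImportError:
    # Stand-in so the sketch can be exercised without transformers installed.
    class TrainerCallback:
        """Minimal placeholder for transformers.TrainerCallback."""


class MidTrainingEvalCallback(TrainerCallback):
    """Run cheap metric functions every `every_n_steps` and pass results to `log_fn`.

    metric_fns: dict of name -> callable(model) returning a float (hypothetical interface).
    log_fn:     callable taking a dict of scalars, e.g. `wandb.log`.
    """

    def __init__(self, metric_fns, log_fn, every_n_steps=50):
        if every_n_steps < 1:
            raise ValueError("every_n_steps must be >= 1")
        self.metric_fns = metric_fns
        self.log_fn = log_fn
        self.every_n_steps = every_n_steps
        self.eval_seconds = 0.0  # cumulative time spent in mini-evals

    def on_step_end(self, args, state, control, model=None, **kwargs):
        step = state.global_step
        if step % self.every_n_steps != 0:
            return
        start = time.perf_counter()
        # Each metric function is assumed to handle eval mode / no-grad itself.
        payload = {
            f"mid_eval/{name}": float(fn(model))
            for name, fn in self.metric_fns.items()
        }
        self.eval_seconds += time.perf_counter() - start
        # Log the eval cost alongside the metrics so the overhead budget can be checked.
        payload["mid_eval/cumulative_eval_seconds"] = self.eval_seconds
        payload["mid_eval/global_step"] = step
        self.log_fn(payload)
```

One possible way to wire it in is `Trainer(..., callbacks=[MidTrainingEvalCallback(metric_fns, log_fn=wandb.log)])`, assuming a WandB run is already active when training starts. Injecting `log_fn` rather than calling WandB directly keeps the callback testable without a live run.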

Motivation:

See when during training EM actually drops alignment: at step 50? At step 300? Linearly? We currently cannot answer this without retraining and saving intermediate checkpoints.
Detect training instabilities early.
Produce per-step trajectories for the paper.

Reuses existing eval infra; mainly a callback + config additions.

Dispatch target: implementer. No gate-keeper needed, since this is infra.

Compute: no additional jobs (runs inside existing training jobs, ≤5% wall-time overhead).
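
The overhead budget constrains how often the mini-eval can run. As a rough sketch, assume a constant per-step training time and a constant mini-eval time, and define overhead as eval time divided by training time; then the smallest admissible N follows directly. The function names below are hypothetical.

```python
import math


def overhead_fraction(eval_seconds, step_seconds, every_n_steps):
    """Eval time as a fraction of pure training time, for one eval per N steps."""
    return eval_seconds / (every_n_steps * step_seconds)


def min_eval_interval(eval_seconds, step_seconds, budget=0.05):
    """Smallest N such that eval_seconds / (N * step_seconds) <= budget."""
    if eval_seconds < 0 or step_seconds <= 0 or budget <= 0:
        raise ValueError("need eval_seconds >= 0, step_seconds > 0, budget > 0")
    # Evaluating more often than every step is not possible, hence the floor of 1.
    return max(1, math.ceil(eval_seconds / (budget * step_seconds)))
```

For example, under these assumptions a mini-eval that takes 30 s, run during training with 2 s steps, fits a 5% budget only if N is roughly 300 or more. The shakedown run can supply the real timings.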

Effort: ~3-5h implementer + 1 shakedown training run to verify.

Everything should be logged to WandB.

[Proposed] Log persona/EM metrics during training (WandB callback)
