EPS Dashboard

From EXPERIMENT_QUEUE.md, added 2026-04-16

Research/survey task. Before running further midtraining ablations for the efficiency issue, audit existing public recipes to check whether we are reinventing work that already exists.

Targets:

(a) Anthropic / EleutherAI / AI-safety-tooling repos with midtraining code
(b) Tinker cookbook (Thinking Machines' recently published training recipe book)
(c) Allen AI open-instruct midtraining pipelines we haven't already reviewed
(d) recent papers on emergent-misalignment (EM) defense / safety post-training (Lee et al., Qi et al., Zou et al.)

Deliverable: a short memo (~1-2 pages) cataloguing, for each recipe: name, data mix, hyperparameters, reported EM/safety effect, key deltas vs. our current pipeline, and whether it is worth trying.
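
A minimal sketch of one catalogue row, assuming the memo is kept as a Markdown table. `RecipeEntry`, `to_markdown_row`, and the column labels are hypothetical; the fields simply mirror the deliverable list above so every recipe is recorded the same way.

```python
from dataclasses import dataclass, field


@dataclass
class RecipeEntry:
    """One catalogue row; fields mirror the deliverable spec (names are hypothetical)."""
    recipe_name: str
    data_mix: str
    hyperparameters: dict[str, object] = field(default_factory=dict)
    reported_effect: str = ""  # reported EM / safety effect, as stated by the source
    deltas_vs_ours: str = ""   # key differences from our current pipeline
    worth_trying: bool = False


HEADER = "| Recipe | Data mix | Hyperparameters | Reported EM / safety effect | Deltas vs ours | Worth trying? |"
SEPARATOR = "|---|---|---|---|---|---|"


def to_markdown_row(e: RecipeEntry) -> str:
    """Render one entry as a Markdown table row; empty hyperparameters show as n/a."""
    hp = ", ".join(f"{k}={v}" for k, v in e.hyperparameters.items()) or "n/a"
    verdict = "yes" if e.worth_trying else "no"
    return f"| {e.recipe_name} | {e.data_mix} | {hp} | {e.reported_effect} | {e.deltas_vs_ours} | {verdict} |"
```

Keeping the fields in a fixed record makes it easy to sort or filter the catalogue by `worth_trying` before writing the memo prose.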

Dispatch target: general-purpose / Explore agent (web + arXiv + GitHub search). No GPU.

Motivation: reduce the risk of redundant work and surface techniques we could import directly (e.g., a specific DPO preference mixture tuned for safety).

Compute: 0 GPU, ~2-3h agent time.

Priority: HIGH; run BEFORE the efficiency ablations.

Timeline · 2 events

epm:results · 2026-04-18T10:03:26.000Z · system

<!-- epm:results v1 -->
## Midtraining Recipe Audit Complete

**Draft:** `research_log/drafts/2026-04-18_midtrain_recipe_audit.md`

Catalogued 14 public midtraining and safety post-training recipes and defense mechanisms (2024-2026) from Allen AI, Anthropic, Thinking Machines, and academic sources. Top findings:

1. **KL-regularized EM induction** from "EM is Easy" (ICLR 2026) can confine EM to narrow-only misalignment (lambda=1e5, tested on Qwen2.5-7B/14B/32B). Highest-priority import.
2. **Perplexity-gap-based safe data interleaving** during EM induction (In-Training Defenses, arXiv 2508.06249).
3. **SafeLoRA**: zero-cost post-hoc projection of LoRA weights onto a safety subspace.

Our existing Tulu SFT+DPO pipeline already closely matches the Allen AI reference recipe. The main gaps are that we have no in-training regularization during EM induction and no post-hoc defense baselines; the audit provides concrete recipes and hyperparameters for both.
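
A minimal sketch of what a KL-regularized EM-induction objective could look like. This is an assumption about the general form, not the "EM is Easy" authors' code: the KL direction (model being trained vs. frozen reference), the per-token averaging, and computing the penalty on general-domain tokens are all our guesses, and `kl_regularized_loss` is a hypothetical helper. Only lambda=1e5 comes from the audit.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one token position's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two categorical distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def kl_regularized_loss(
    task_nll: float,
    cur_logits: Sequence[Sequence[float]],
    ref_logits: Sequence[Sequence[float]],
    lam: float = 1e5,
) -> float:
    """Narrow EM task loss plus lam * mean per-token KL(p_cur || p_ref).

    task_nll:   mean NLL of the model being trained on the narrow EM data.
    cur_logits: its logits on general-domain regularization tokens (assumption).
    ref_logits: frozen reference model's logits on the same tokens.
    """
    if not cur_logits or len(cur_logits) != len(ref_logits):
        raise ValueError("need matching, non-empty current and reference logits")
    kls = [kl_divergence(softmax(c), softmax(r)) for c, r in zip(cur_logits, ref_logits)]
    return task_nll + lam * sum(kls) / len(kls)


# Identical current and reference logits give zero KL, so only the task loss remains.
print(kl_regularized_loss(1.3, [[2.0, 0.5, -1.0]], [[2.0, 0.5, -1.0]]))  # prints 1.3
```

With lambda=1e5, even a mean per-token KL of 1e-4 adds 10 to the loss, so the penalty effectively pins the model to the reference on the regularization data while the task term is free to fit the narrow data.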

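A minimal sketch of the SafeLoRA projection step as we understand it from the Safe LoRA paper: per layer, build an alignment matrix V = W_aligned - W_unaligned, form the approximate projector C = V V^T / ||V||_F, and replace the LoRA update ΔW with C ΔW only when the Frobenius cosine similarity between ΔW and C ΔW falls below a threshold tau. Pure Python lists stand in for tensors; `safelora_project` and the threshold values used here are hypothetical, so check the paper before importing.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frob_inner(a: Matrix, b: Matrix) -> float:
    """Frobenius inner product <a, b> = sum_ij a_ij * b_ij."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def frob_norm(a: Matrix) -> float:
    return math.sqrt(frob_inner(a, a))


def safelora_project(delta_w: Matrix, v: Matrix, tau: float) -> tuple[Matrix, bool]:
    """Project one layer's LoRA update onto the alignment subspace if it has drifted.

    delta_w: LoRA update (B @ A) for this layer, shape d_out x d_in.
    v:       alignment matrix W_aligned - W_unaligned, same shape as delta_w.
    Returns (update to use, whether the projection was applied).
    """
    v_norm = frob_norm(v)
    if v_norm == 0:
        raise ValueError("alignment matrix is all zeros")
    # Approximate projector C = V V^T / ||V||_F, shape d_out x d_out.
    c = [[x / v_norm for x in row] for row in matmul(v, transpose(v))]
    projected = matmul(c, delta_w)
    denom = frob_norm(delta_w) * frob_norm(projected)
    similarity = frob_inner(delta_w, projected) / denom if denom > 0 else 0.0
    # Low similarity means the update points away from the alignment subspace.
    if similarity < tau:
        return projected, True
    return delta_w, False


# Similarity here is 1/sqrt(2) ~ 0.707, below tau=0.8, so the update is projected.
print(safelora_project([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 0.0]], tau=0.8))
```

The threshold trades safety against task performance: a higher tau projects more layers. The projection needs only V and the trained LoRA weights, which is why it adds no training cost.
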
state_changed · 2026-05-13T12:29:36.137Z · user · proposed → archived

Moved on Pipeline board to archived.

[Proposed] Audit safety-tooling + Tinker cookbook for midtraining recipes