[Proposed] Efficiency: faster midtraining + faster persona-leakage experiments
kind: infra
From EXPERIMENT_QUEUE.md, added 2026-04-16
Infra / methodology task. Current midtraining runs take ~4-8 h each (Tulu SFT at 25% + DPO), and leakage experiments take ~1 h each, repeated across many conditions.
Candidates:
- (a) LoRA midtraining instead of full fine-tuning (compare alignment preservation against quality drop)
- (b) Reduced Tulu mixture: identify the minimal sub-mixture that preserves the EM-defense effect (current 25% = 61k; try 10% and 5%)
- (c) Sequence packing + flash-attn tuning
- (d) For leakage: shared base-model caching across conditions; fused eval batches across personas
- (e) Distillation: train a small midtrain "head" that approximates the Tulu DPO effect
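For candidate (b), one way to build the 10% and 5% sub-mixtures is to keep each source's share of the mixture and make the smaller subsets nested inside the larger ones, so any change in the EM-defense effect between levels comes only from removed examples. This is a minimal sketch under assumptions: `nested_subsample` is a hypothetical helper, and it assumes each example is a dict with a `"source"` key naming its sub-dataset. Whether proportional down-sampling or dropping whole sources is the right notion of "sub-mixture" is part of what the ablation has to decide.

```python
import random
from collections import defaultdict


def nested_subsample(examples, fractions, seed=0):
    """Return {fraction: subset}, keeping each source's share of the mixture.

    Hypothetical helper: assumes every example is a dict with a "source" key.
    Each source is shuffled once and every fraction takes a prefix of that
    shuffle, so the 5% subset is contained in the 10% subset, which is
    contained in the 25% subset.
    """
    for f in fractions:
        if not 0 < f <= 1:
            raise ValueError(f"fraction {f} must be in (0, 1]")
    by_source = defaultdict(list)
    for ex in examples:
        by_source[ex["source"]].append(ex)
    rng = random.Random(seed)
    # Shuffle each source once, in sorted order, so results are reproducible.
    for source in sorted(by_source):
        rng.shuffle(by_source[source])
    subsets = {}
    for f in fractions:
        subset = []
        for source in sorted(by_source):
            pool = by_source[source]
            # Keep at least one example per source so no source disappears.
            k = max(1, round(len(pool) * f))
            subset.extend(pool[:k])
        subsets[f] = subset  # grouped by source; the trainer should shuffle
    return subsets
```

Calling it with `fractions=[0.25, 0.10, 0.05]` on the full mixture gives the current level plus both proposed reductions from one shuffle, which keeps the comparison between levels clean.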
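For candidate (c), sequence packing concatenates several short tokenized examples into one row of `max_len` tokens so fewer tokens are spent on padding. The sketch below uses first-fit-decreasing bin packing, which is one common heuristic and not necessarily what the training stack already does; `pack_sequences` and `cu_seqlens` are hypothetical helper names. Packing only helps if attention is kept from crossing example boundaries; FlashAttention's variable-length kernels take per-row cumulative offsets (`cu_seqlens`) in the shape computed here, but the exact signature should be checked against the installed version.

```python
def pack_sequences(lengths, max_len):
    """First-fit-decreasing packing of tokenized examples into rows of max_len.

    Returns a list of rows; each row is a list of example indices whose
    lengths sum to at most max_len.
    """
    # Place the longest examples first; sorted() is stable, so ties keep index order.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    rows = []       # example indices per packed row
    remaining = []  # free token slots per packed row
    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"example {i} has {n} tokens > max_len={max_len}")
        for r, free in enumerate(remaining):
            if free >= n:  # first existing row with enough room
                rows[r].append(i)
                remaining[r] -= n
                break
        else:  # no existing row fits: open a new one
            rows.append([i])
            remaining.append(max_len - n)
    return rows


def cu_seqlens(lengths, row):
    """Cumulative start offsets of each example in a packed row, starting at 0."""
    offsets = [0]
    for i in row:
        offsets.append(offsets[-1] + lengths[i])
    return offsets


lengths = [900, 600, 500, 400, 300, 100]
rows = pack_sequences(lengths, max_len=1024)
print(rows)                          # [[0, 5], [1, 3], [2, 4]]
print(cu_seqlens(lengths, rows[0]))  # [0, 900, 1000]
```

Here six examples fit in three 1024-token rows instead of six padded rows; the real gain depends on the length distribution of the Tulu mixture.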
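For candidate (d), fusing eval batches across personas means loading the model once and filling each generation batch with prompts from several personas, then routing completions back by persona. A minimal sketch, assuming each persona's prompts are already fully formatted strings (persona framing included) and that `generate_batch` is a hypothetical callable wrapping one loaded model that returns one completion per prompt, in order:

```python
from typing import Callable, Dict, List


def fused_persona_eval(
    prompts_by_persona: Dict[str, List[str]],
    generate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 64,
) -> Dict[str, List[str]]:
    """Run every persona's eval prompts through shared, persona-mixed batches."""
    # Tag each prompt with its persona so batches can mix personas freely.
    flat = [
        (persona, prompt)
        for persona in sorted(prompts_by_persona)
        for prompt in prompts_by_persona[persona]
    ]
    results: Dict[str, List[str]] = {persona: [] for persona in prompts_by_persona}
    for start in range(0, len(flat), batch_size):
        chunk = flat[start:start + batch_size]
        completions = generate_batch([prompt for _, prompt in chunk])
        if len(completions) != len(chunk):
            raise RuntimeError("generate_batch returned the wrong number of completions")
        # Route each completion back to its persona, preserving prompt order.
        for (persona, _), completion in zip(chunk, completions):
            results[persona].append(completion)
    return results
```

With 3 personas of 3 prompts each and `batch_size=8`, this issues 2 generation calls instead of the 3 needed when each persona is batched separately; the saving grows with the number of personas that have small prompt sets.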
Dispatch target: mix of implementer (infra work) + experimenter (quality regression checks).
Success criterion: 2-4× wall-time reduction with ≤2 pt regression on the EM-defense metric.
Compute: ~10-20 GPU-hours for ablations; savings amortize across all future runs.
Depends on: safety-tooling audit (run it first to avoid reinventing existing tooling).
Gate-keeper priority: MEDIUM (indirect — saves future compute; tractable).
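The success criterion (2-4× wall-time reduction with ≤2 pt regression on the EM-defense metric) can be written as a single pass/fail check for each candidate. This sketch rests on two assumptions not stated in the task: the lower end of the target (2×) is the pass bar, and the metric is reported in points, with `higher_is_better` to be set to `False` if it is a misalignment rate. `meets_success_criterion` is a hypothetical name.

```python
def meets_success_criterion(
    baseline_hours: float,
    candidate_hours: float,
    baseline_metric: float,
    candidate_metric: float,
    higher_is_better: bool = True,
    min_speedup: float = 2.0,
    max_regression_pt: float = 2.0,
) -> bool:
    """Check a candidate speed-up against the 2x / 2-point success bar.

    Assumptions: 2x (the low end of the 2-4x target) is the pass bar, and the
    EM-defense metric is in points; pass higher_is_better=False when lower
    values are better.
    """
    if candidate_hours <= 0:
        raise ValueError("candidate_hours must be positive")
    speedup = baseline_hours / candidate_hours
    # Regression is measured in whichever direction makes the metric worse.
    if higher_is_better:
        regression = baseline_metric - candidate_metric
    else:
        regression = candidate_metric - baseline_metric
    return speedup >= min_speedup and regression <= max_regression_pt
```

For example, a 6 h baseline cut to 2.5 h (2.4×) with the metric moving from 80.0 to 78.5 passes, while the same metric change at 4 h (1.5×) does not.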
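Whether the ~10-20 GPU-hour ablation budget pays for itself depends on how many GPUs each midtraining run uses, which the task does not state. The back-of-the-envelope sketch below treats `gpus_per_run` as a hypothetical input and assumes a wall-time speedup at a fixed GPU count saves GPU-hours in proportion; that assumption would not hold for a candidate such as LoRA if it also changes the number of GPUs.

```python
import math


def break_even_runs(ablation_gpu_hours, baseline_wall_hours, speedup, gpus_per_run):
    """Number of future runs after which the ablation compute is paid back.

    Assumes GPU-hours saved per run = gpus_per_run * baseline_wall_hours
    * (1 - 1/speedup), i.e. the GPU count stays fixed.
    """
    if speedup <= 1:
        raise ValueError("speedup must exceed 1 to save any compute")
    saved_per_run = gpus_per_run * baseline_wall_hours * (1 - 1 / speedup)
    return math.ceil(ablation_gpu_hours / saved_per_run)
```

With a 20 GPU-hour ablation, a 6 h baseline and a 2× speedup, the budget is recovered after 7 runs on a single GPU, or after the first run on a hypothetical 8-GPU setup.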