Extend #246 cosine→source-rate regression to N=24 personas + full 28-layer scan (parallel multi-GPU)
Motivation
Parent: #246 (clean-result #271) — "#232's cosine→source-rate regression generalizes and strengthens at L20 across 12 personas (MODERATE confidence)" — established that:
- The regression generalizes from N=10 (#232) to N=12 with 2 new generic_helper points
- L15 is the strongest layer (Spearman ρ=-0.81, p=0.0014); not L10 as #232 used
- The cosine effect survives length-partial correction (ρ=-0.67, p=0.018)
- Confidence ceiling at MODERATE due to single seed and pre-registration of L20 from #142's different statistical setup
This follow-up extends in two directions simultaneously:
- More personas (N=12 → N=24): add 12 new persona conditions across the 3 categories (occupational, character, generic_helper) to test whether the regression generalizes beyond the original 12-set. Bonferroni-clearable Spearman power becomes feasible at N=24.
- Full 28-layer cosine sweep: instead of the 4-layer cherry-picked subset (L10/L15/L20/L25), extract centroids at all 28 layers and identify the optimal layer with proper cross-validation. Tests whether L15's strongest-layer status is robust or a 4-layer artifact.
Proposed experiment
Conditions (12 new persona-specific marker LoRAs)
Identical Phase A1 recipe (matches #232 / #246):
- Base: Qwen/Qwen2.5-7B-Instruct
- LoRA r=32, α=64, dropout=0.05, use_rslora=True, q/k/v/o/gate/up/down
- AdamW lr=1e-5 cosine warmup=0.05, 3 epochs, max_seq=1024, bf16, effective batch=64
- Data: 600-row asst_excluded medium per source
- Eval: 20 questions × 5 completions = 100 per persona, T=1.0, vLLM
- Seed: 42 (matches #232/#246 single-seed protocol)
12 new sources to add (4 per category):
- Occupational (+4): chef, lawyer, accountant, journalist — broaden the occupational distribution
- Character (+4): wizard/sorcerer, hero, philosopher, child — broaden the character distribution
- Generic_helper (+4): "You are an AI assistant.", "You are an AI.", "You are a chatbot.", "I am a helpful assistant." (the "I am" framing from #113) — extend the generic-helper cluster
The exact persona descriptions can be refined by the adversarial planner.
Layerwise analysis
Extract centroids at all 28 layers (vs #246's L10/15/20/25) on the base Qwen2.5-7B-Instruct for the full N=24 persona set.
For each layer, compute:
- Pearson r, p
- Spearman ρ, p
- Length-partial Spearman with VIF guard
- LOO Pearson distribution
- Cook's D / leverage
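The per-layer rank statistics above can be illustrated with a small stdlib-only sketch. This is not the code in `scripts/analyze_issue246.py`; the helper names (`pearson`, `average_ranks`, `spearman`, `loo_pearson`) and the toy numbers are assumptions made for illustration, and p-values are omitted because they need a t-distribution that the standard library does not provide.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def loo_pearson(x, y):
    """Leave-one-out distribution: Pearson r with each point dropped in turn."""
    return [pearson(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]


# Hypothetical toy data for one layer (not real results).
cosine = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
rate = [0.90, 0.80, 0.65, 0.70, 0.30, 0.20]
rho = spearman(cosine, rate)
loo = loo_pearson(cosine, rate)
print(f"rho={rho:.3f}, LOO r range=[{min(loo):.3f}, {max(loo):.3f}]")
```

Running the same two functions once per layer would give the 28-entry table of ρ values that the layer-selection step consumes.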
Identify the optimal layer via:
- Maximum |Spearman| across layers
- 5-fold cross-validation: split N=24 into 5 folds, predict held-out source rates from cosine using each layer's regression, report MSE per layer
- Bonferroni at α=0.05/28 = 0.0018 across the 28-layer family
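A minimal sketch of the cross-validated layer selection described above, assuming a simple one-predictor linear fit per layer. The function names (`fit_line`, `kfold_mse`, `best_layer`), the fold-assignment scheme, and the synthetic data are illustrative assumptions, not the project's analysis code.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def kfold_mse(x, y, k=5, seed=0):
    """Mean squared held-out error over k folds of a shuffled index."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sq_err = []
    for held in folds:
        held_set = set(held)
        train = [i for i in idx if i not in held_set]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        sq_err += [(y[i] - (slope * x[i] + intercept)) ** 2 for i in held]
    return sum(sq_err) / len(sq_err)


def best_layer(cos_by_layer, rates, k=5, seed=0):
    """Return the layer with the lowest held-out MSE, plus all MSEs."""
    mse = {layer: kfold_mse(c, rates, k, seed) for layer, c in cos_by_layer.items()}
    return min(mse, key=mse.get), mse


# Per-test threshold for a 28-layer Bonferroni family.
BONFERRONI_ALPHA = 0.05 / 28

# Synthetic N=24 example: layer 15 is exactly linear in the rates, layer 20 is scrambled.
rates = [1.0 - 0.03 * i for i in range(24)]
cos_by_layer = {
    15: [i / 24 for i in range(24)],
    20: [((7 * i) % 24) / 24 for i in range(24)],
}
best, mse = best_layer(cos_by_layer, rates)
print(best, round(BONFERRONI_ALPHA, 4))
```

With only 24 points, a single 5-fold split depends noticeably on the shuffle seed, so repeating the call over several `seed` values and averaging is a natural extension.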
Pre-registered tests
- Primary (matches #246 pre-registration): L20 95% PI-coverage of the N=12 (24 minus 12 new) fit at the new 12 cosines. Pass if ≥9/12 new points fall inside.
- Secondary: L15 95% PI-coverage (since #246 found L15 strongest).
- Robustness: Bonferroni-corrected Spearman across all 28 layers — does L15-L20 survive?
- Sensitivity: drop-each-category fits (12 occupational + 8 character + 6 generic_helper); within-category Pearson at L15.
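The PI-coverage criterion can be sketched as below: fit a line on a calibration set, build the standard 95% prediction interval at each new cosine, and count how many new rates land inside. The helper names and the toy data are hypothetical, and the t critical value is hard-coded for a 12-point calibration fit (df = 10) because the standard library has no t-distribution.

```python
from math import sqrt

# Two-sided 95% t critical value for df = 12 - 2 = 10 (assumed calibration size).
T_CRIT_DF10 = 2.228


def prediction_interval(x, y, x0, t_crit):
    """95% prediction interval for a new observation at x0 under simple OLS."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    s = sqrt(sse / (n - 2))
    half = t_crit * s * sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)
    centre = intercept + slope * x0
    return centre - half, centre + half


def pi_coverage(x_fit, y_fit, x_new, y_new, t_crit):
    """Count new points whose observed rate falls inside the interval."""
    inside = 0
    for x0, y0 in zip(x_new, y_new):
        lo, hi = prediction_interval(x_fit, y_fit, x0, t_crit)
        inside += lo <= y0 <= hi
    return inside


# Toy calibration set: a noisy line. Toy new set: 11 on-trend points and 1 outlier.
x_fit = [0.05 * i for i in range(12)]
y_fit = [0.9 - x + (0.02 if i % 2 else -0.02) for i, x in enumerate(x_fit)]
x_new = [0.03 + 0.05 * i for i in range(12)]
y_new = [0.9 - x for x in x_new[:11]] + [0.9 - x_new[11] + 0.5]

inside = pi_coverage(x_fit, y_fit, x_new, y_new, T_CRIT_DF10)
print(f"{inside}/12 inside; pass={inside >= 9}")
```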
Compute
- Training: 12 conditions × (~3 min train + ~10-15 min eval each) = ~3 GPU-hours total. Wall time depends on the GPU count: one wave if all 12 run in parallel (12 GPUs needed), ~3 hours on 1 GPU (sequential), or ~30 min on 8 GPUs (8 in parallel + 4 in a second wave).
- Centroid extraction: ~30 min on 1 GPU for 24 personas × 28 layers
- Analysis: <5 min
Pod recommendation: provision multi-GPU pod (intent ft-7b = 4× H100, or override --gpu-count 8 for full parallelism). With 8 GPUs, all 12 conditions complete in ~2 waves = ~30-40 min wall.
Run-in-parallel notes
- The 12 conditions are independent — no shared state. Each runs on its own GPU via CUDA_VISIBLE_DEVICES-sharded subprocesses.
- Reuse the launcher pattern from scripts/launch_issue246.py; extend to 12 conditions, queue across 8 GPUs.
- Centroid extraction is sequential but cheap (~30 min single GPU).
- Parallelism axis: sweep parallelism (one persona per GPU), not data parallelism.
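As a rough illustration of the sweep-parallel layout described in these notes, the sketch below chunks the conditions into waves and builds a per-process environment that pins one GPU. It is not the launcher in `scripts/launch_issue246.py`; `plan_waves` and `env_for_gpu` are hypothetical helpers, and the actual training command is deliberately left out.

```python
import os


def plan_waves(conditions, n_gpus):
    """Chunk conditions into waves of at most n_gpus, pairing each with a GPU index."""
    waves = []
    for start in range(0, len(conditions), n_gpus):
        chunk = conditions[start:start + n_gpus]
        waves.append([(gpu, cond) for gpu, cond in enumerate(chunk)])
    return waves


def env_for_gpu(gpu):
    """Copy the current environment and expose exactly one GPU to the child."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return env


conditions = [
    "chef", "lawyer", "accountant", "journalist",
    "wizard", "hero", "philosopher", "child",
    "ai_assistant", "ai", "chatbot", "i_am_helpful",
]
waves = plan_waves(conditions, n_gpus=8)
for number, wave in enumerate(waves, start=1):
    # A real launcher would start one subprocess per (gpu, cond) pair here,
    # passing env=env_for_gpu(gpu), then wait for the whole wave to finish.
    print(f"wave {number}: {wave}")
```

With 12 conditions and 8 GPUs this yields one wave of 8 followed by one wave of 4. A work queue that hands the next condition to whichever GPU frees up first would avoid idle GPUs in the second wave, at the cost of slightly more bookkeeping.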
Pre-registered predictions
- H_consistent_extension (most likely): All 12 new points fall inside the L20 PI of the N=12 #246 fit; N=24 Spearman at L15 stays significant after Bonferroni-28; the regression generalizes.
- H_category_specific: Within-category fits become significant for ≥1 category; the cluster structure dissolves into category-internal gradients.
- H_layer_shift: Optimal layer at N=24 differs from L15 (e.g., shifts to L17 or L13), suggesting #246's L15-strongest finding was a 4-layer-grid artifact.
- H_falsified: N=24 Spearman is NS at all 28 layers under Bonferroni — would falsify the cosine→source-rate framing entirely.
Dependencies
- #246 (clean-result #271) — parent regression
- #232 — original 10-persona regression
- #142 — multi-layer cosine + JS divergence
- #113 — assistant-variant degradation; inform the choice of generic_helper variants
- scripts/archive/run_leakage_experiment.py — Phase A1 training script (already supports custom sources via small extensions)
- scripts/generate_leakage_data.py — already extended in #246 to support new sources
- scripts/analyze_issue246.py — analysis pattern to extend to 28 layers
Compute label
compute:medium (5–20 GPU-hours; parallel wall-time ≤1h on 8× H100)
Timeline · 12 events
epm:clarify · system
<!-- epm:clarify v1 -->
## Clarifier — All clear

**No blocking ambiguities found.** Proceeding to adversarial planning.

### Context resolved from project knowledge

- **Recipe** (verified from #246 / #232 verbatim):
  - LoRA r=32, α=64, dropout=0.05, use_rslora=True, q/k/v/o/gate/up/down
  - AdamW lr=1e-5 cosine warmup=0.05, 3 epochs, max_seq=1024, bf16, batch=64
  - 600-row asst_excluded medium per source
  - 20 EVAL_QUESTIONS × 5 completions, T=1.0, vLLM
  - Single seed 42 (parity with #232 / #246)
- **User decisions** (from AskUserQuestion in chat, persisted here):
  - **+12 personas** (balanced: 4 occupational + 4 character + 4 generic_helper)
  - **Single seed (42)** for parity
  - **All 28 layers** (vs #246's 4-layer subset)
  - **One combined issue** (training + layerwise analysis together)
- **L15 strongest layer** per #246 clean-result #271 (Spearman ρ=-0.81, p=0.0014); L20 second; L10 third — pre-register L15 as the new primary, with L10 retained for #232 backward-comparison
- **Phase A1 source code already extended** in #246 (commit a84b4f2): `scripts/generate_leakage_data.py` and `scripts/archive/run_leakage_experiment.py` accept new sources via `_resolve_source_prompt`. Adding 12 more sources is mechanical.
- **Existing 100-persona centroid set** at `eval_results/single_token_100_persona/centroids/centroids_layer{10,15,20,25}.pt` — could seed proxy cosines for sanity-checking; not used for training.
- **`scripts/issue245_centroids.py`** is the pattern for multi-layer centroid extraction; #246's `scripts/analyze_issue246.py` is the analysis pattern to extend to 28 layers and N=24.

### Assumptions made (will be exposed in the plan for user review)

- **Assumption:** specific persona names within each category will be selected by the adversarial planner with reasonable defaults (e.g., occupational: chef/lawyer/accountant/journalist; character: wizard/hero/philosopher/child; generic_helper: AI/AI assistant/chatbot/"I am" framing). User can revise during plan approval.
- **Assumption:** parallelism = 8 GPUs (override `--gpu-count 8` on `lora-7b` intent), giving ~2 waves of 6 + 6 = ~30-40 min wall.
- **Assumption:** L15 is the pre-registered primary; L20 is the secondary (for #246 cross-comparison); L10 is reported for #232 backward-comparison; full 28-layer scan is exploratory.

Advancing to `status:planning` → adversarial-planner.
<!-- /epm:clarify -->
epm:plan · system
<!-- epm:plan v1 -->
## Plan — Issue #274

**Cost gate:** estimated **~6.6 GPU-hours** (~2h wall on 8× H100). 12 new training+eval (~3h GPU) + 12 inherited re-eval-only (~3h GPU, after `pod.py sync models --pull`) + base-model baseline (~0.5h GPU) + centroids + analysis. **Compute label: `compute:medium`** (5–20 GPU-h cap).

Full plan saved at `.claude/plans/issue-274.md` (566 lines, 14 sections). Summary below.

---

### Goal

Extend #246's N=12 cosine→source-rate regression to N=24 with full 28-layer cosine scan, properly cross-validated, and with surface-form / RLHF-prior baselines so a positive result actually means something. Single experimental variable: source-set 12 → 24. Recipe locked.

### 12 new sources (4 per category)

- **Occupational:** chef, lawyer, accountant, journalist
- **Character:** wizard, hero, philosopher, child *(content-filter risk; `student` fallback)*
- **Generic_helper:** ai_assistant, ai, chatbot, **i_am_helpful** *(framing-axis probe — excluded from within-category fit per #113)*

### Pre-registered analysis (revised)

**Primary test (L15 PI-coverage at N=24, after re-eval against the new 24-persona eval matrix).** Pass requires BOTH:

- ≥9/12 new points fall inside the L15 95% PI of the N=12 fit
- Spearman ρ on the 12 NEW points alone is significant (raw α=0.05) at L15

**5 outcome buckets (jointly exhaustive):** H_consistent, **H_attenuated** (NEW: raw-sig but Holm-Bonferroni-fail; ρ ∈ ±[0.413, 0.611]), H_inverted, H_low_emission, H_anti-correlated.
**Tightenings vs original draft:**

- **Holm-Bonferroni-28** (not Bonferroni-28); family pinned at 28 Spearman tests only
- **Repeated 5-fold CV (50 fold seeds) + LOOCV** (replaces single 5-fold)
- **CV-optimal band [14,16]** (not [13,17]) OR ≥10% MSE margin vs next non-adjacent layer
- **H_anti-correlated** requires sign(ρ_L15) flipped AND ≥3 of {L13,L14,L16,L17} also flipped (or Holm-Bonferroni-28 p<0.05)
- **H_low_emission** requires ≥2 personas at ≤5% from different categories (`child` alone doesn't trigger)

**New baselines (load-bearing):**

- **Base-model marker rate** (no LoRA, ~0.5 GPU-h): residualized rate `post_LoRA − base` is the load-bearing test
- **Surface-form similarity:** token-Jaccard + Levenshtein (via `rapidfuzz`) — pre-registered "L15 cosine must beat both" in absolute Spearman AND CV held-out MSE
- **Off-diagonal cosine→rate analysis** at L15 over 552 cells (free analysis, mechanism evidence)

**Other additions:**

- Power calculation runs as cell 0 of analysis (MC n=10000 at ρ_pop ∈ {-0.81, -0.55})
- Wilson 95% CIs on all 24 source rates
- Within-occupational Cook's D with/without librarian (the #246 leverage outlier)
- Surface-form residualization (template-prefix × token-bucket) alongside length-partial
- Confidence-downgrade rule: null within-category fits → LOW for that category

### File changes

- `scripts/generate_leakage_data.py`: add `QWEN_DEFAULT_PROMPT` (verified missing from main; was on issue-246 branch only) + 12 new persona prompts. Extend `_resolve_source_prompt`, `_get_persona_prompts`, `--source` choices.
- `scripts/archive/run_leakage_experiment.py`: add `ALL_EVAL_PERSONAS_PLUS` (24 entries), extend `--source` choices, **fix `source_rate=null` bug** at L831 with helpful_assistant→assistant alias map. `--eval-only` already exists at L1035 (verified).
- NEW `scripts/launch_issue274.py`: queue+worker, 8 GPU parallelism, 2 waves of 6+6.
- NEW `scripts/launch_issue274_reeval.py`: re-eval 12 inherited LoRAs with `--eval-only` after `pod.py sync models --pull`.
- NEW `scripts/run_base_baseline.py`: standalone base-model marker baseline (~80 lines, modeled on `scripts/issue246_base_baseline.py`).
- NEW `scripts/analyze_issue274.py`: 28-layer + 5-fold CV repeated + Holm-Bonferroni + string-similarity baseline + base-residualization + off-diagonal analysis.
- `pyproject.toml`: add `rapidfuzz` for Levenshtein.

### Pod + launch

```bash
python scripts/pod.py provision --issue 274 --intent inf-70b  # 8x H100
ssh epm-issu…
```

epm:consistency · system
<!-- epm:consistency v1 -->
## Consistency Check: #274 vs #246 / #232

**Verdict: PASS**

### Parent experiment(s): #246 (recipe), #232 (original regression)

### Variables that differ (should be exactly 1):

1. **Source-persona set size:** 12 → 24 (12 new LoRAs trained) — **INTENDED CHANGE, the single experimental variable**

All other items checked below are analysis-side extensions, new baselines, or statistical tightenings. None require new training under a different recipe.

### Shared baseline check:

- **Base model:** MATCH — `Qwen/Qwen2.5-7B-Instruct` (#246 card; #274 §4 Reproducibility Card)
- **LoRA hyperparameters:** MATCH — r=32, α=64, dropout=0.05, use_rslora=True, targets q/k/v/o/gate/up/down (#271 detailed report; #274 §4 card explicitly cites `marker_librarian_asst_excluded_medium_seed42/run_result.json` as provenance)
- **Optimiser:** MATCH — AdamW lr=1e-5, cosine schedule, warmup_ratio=0.05
- **Training schedule:** MATCH — 3 epochs, max_seq=1024, bf16, effective batch=64 (4×4×1)
- **Data:** MATCH — 600 rows asst_excluded medium per source (200 source-positive + 400 bystander-negative)
- **Eval protocol (training-side):** MATCH — 20 questions × 5 completions, T=1.0, top_p=0.95, max_new=512, vLLM batched, `[ZLT]` case-insensitive substring criterion
- **Seed:** MATCH — 42, explicit parity with #232/#246
- **Eval suite layers (analysis):** PERMITTED DELTA — 4 layers → all 28 (analysis-side centroid extraction only; recipe unchanged)
- **Eval persona matrix per LoRA:** PERMITTED DELTA — 11/12 → 24 (bigger eval matrix; same recipe; inherited 12 re-eval'd via existing `--eval-only` flag with no retraining)
- **Compute class:** NOTE — #246 used 1× H100; #274 plans 8× H100 (`--intent inf-70b`). This is a parallelism increase only (each condition still runs on 1 GPU via `CUDA_VISIBLE_DEVICES` sharding, per §5 and §9 in the plan). No batch-size confound introduced; effective batch=64 is per-process.
- **Statistical machinery:** PERMITTED DELTA — Holm-Bonferroni-28, repeated 5-fold CV, LOOCV, power calc, string-similarity baselines, base-model residualization, off-diagonal analysis are all post-hoc analysis additions. No training-side change.

### Deltas observed beyond the allowed single variable:

*None.* Every difference from #246 falls into one of three buckets explicitly enumerated in the permitted-deltas list supplied to this checker: (a) source-persona set size 12 → 24 (the one experimental variable), (b) analysis-side extensions (28-layer scan, CV, Holm-Bonferroni-28, baselines, off-diagonal), (c) eval-matrix expansion via re-eval with no retraining.

The plan's §2 Method delta table and §13 Reviewer-callout table are internally consistent and align with the plan text. The Reproducibility Card (§4) matches the #246/#271 detailed-report values on every training parameter.

One minor flag (not a block): the base-model baseline script (`scripts/run_base_baseline.py`, §3f) references `scripts/issue246_base_baseline.py` as its model. Verify that file exists in the worktree before implementation; if it does not, the implementer should model the script on `scripts/archive/run_leakage_experiment.py`'s eval path instead. This is an implementation note, not a recipe inconsistency.

### Recommendation:

Proceed. The plan is a clean single-variable extension. Recipe parameters are locked and explicitly traced to #246/#232 provenance. All non-recipe changes are analysis-side or new baselines with explicit justification. No silent recipe drift detected.
<!-- /epm:consistency -->
epm:experiment-implementation · system
<!-- epm:experiment-implementation v1 -->
## Implementation Report — round 1

**Status:** READY-FOR-REVIEW

### Files changed

- `scripts/generate_leakage_data.py` (+130/-3): add `QWEN_DEFAULT_PROMPT`, `NEW_PERSONA_PROMPTS_274` (12 new sources: 4 occupational + 4 character + 4 generic_helper), `NEW_SOURCES_274` tuple, extend `_resolve_source_prompt` + `_get_persona_prompts`, restore `step_assemble_single_source`, add `--source` and `--all-new-274` CLI flags.
- `scripts/archive/run_leakage_experiment.py` (+115/-5): add `QWEN_DEFAULT_PROMPT` + `NEW_PERSONA_PROMPTS_274` mirror, `ALL_EVAL_PERSONAS_PLUS` (N=24), `SOURCES_REQUIRING_PLUS_EVAL`, `SOURCE_TO_EVAL_KEY` alias map; resolve `eval_personas` based on source + `EPM_FORCE_EVAL_PERSONAS_PLUS` env (used by re-eval launcher); fix the `source_rate=null` bug at L831 via `eval_key = SOURCE_TO_EVAL_KEY.get(source_name, source_name)` + defensive `RuntimeError`; extend `--source` choices to all 24 names; thread `eval_personas` into Phase 3 + dynamics + n_personas metric.
- `scripts/launch_issue274.py` NEW (+135): wave-based launcher for 13 new training conds (12 NEW_PERSONA_PROMPTS_274 + qwen_default); `--n-gpus N` (default 8); `proc.wait()` between waves.
- `scripts/launch_issue274_reeval.py` NEW (+151): wave-based launcher for the 12 inherited LoRAs to re-eval against the N=24 matrix via `--eval-only` + `EPM_FORCE_EVAL_PERSONAS_PLUS=1`. Pre-condition: `pod.py sync models --pull`.
- `scripts/run_base_baseline.py` NEW (+328): standalone 24-persona x 20Q x 5 completions vLLM base-Qwen2.5-7B-Instruct baseline; T=1.0, top-p=0.95, max=512; saves `eval_results/issue_274/base_baseline.json` and a parallel `marker_BASE_seed42/run_result.json`.
- `scripts/analyze_issue274.py` NEW (+1249): full N=24 / 28-layer analysis — power MC sim (Cell 0), 28-layer centroid extraction (saved to `centroids/centroids_n24_layers0_27.pt`), per-layer Pearson + Spearman + length-partial + surface-form-partial + LOO + Cook's D, Holm-Bonferroni-28 + Bonferroni-28 sanity, repeated 5-fold CV (50 seeds, bootstrap CI) + LOOCV, primary L15 PI-coverage of N=12 fit at 12 new cosines, new-12-only Spearman gate, within-category fits (occupational with/without librarian, character, generic_helper N=5 with i_am_helpful excluded), Wilson 95% CIs, string-similarity baselines (token-Jaccard + Levenshtein via `rapidfuzz`), base-residualized fit, off-diagonal 552-cell cosine→bystander_rate, pre-registered outcome-bucket assignment, 3 figures.
- `pyproject.toml`: add `rapidfuzz>=3.0,<4.0`.
- `uv.lock`: regenerated (rapidfuzz 3.14.5; 215 packages total).

### Diff summary

+2183 lines / -12 lines across 8 files.

```
 pyproject.toml                            |    1 +
 scripts/analyze_issue274.py               | 1249 +++++
 scripts/archive/run_leakage_experiment.py |  120 ++-
 scripts/generate_leakage_data.py          |  130 ++-
 scripts/launch_issue274.py                |  135 ++++
 scripts/launch_issue274_reeval.py         |  151 ++++
 scripts/run_base_baseline.py              |  328 ++++++++
 uv.lock                                   |   81 ++
```

### Plan adherence

Walking the plan §3 file-paths list:

- §3a `generate_leakage_data.py`: DONE — all 5 edit points (QWEN_DEFAULT_PROMPT, 12 new prompts including `i_am_helpful` first-person, `_resolve_source_prompt` dispatch, `_get_persona_prompts` extension, `--source` + `--all-new-274` CLI).
- §3b `run_leakage_experiment.py`: DONE — all 5 edit points (`ALL_EVAL_PERSONAS_PLUS`, `--source` argparse, source_rate=null fix, defensive RuntimeError, `n_personas` uses `eval_personas`).
- §3c `launch_issue274.py`: DONE — wave-based launcher, `--n-gpus N`, `proc.wait()` between waves.
- §3d `launch_issue274_reeval.py`: DONE — uses `--eval-only` + `EPM_FORCE_EVAL_PERSONAS_PLUS=1` env to force the N=24 matrix on the inherited 12.
- §3e `run_base_baseline.py`: DONE — patterned on `issue246_base_baseline.py`, scaled to 24 personas.
- §3f `analyze_issue274.py`: DONE — all 17 sub-items.
- §3g `pypr…
epm:code-review · system
<!-- epm:code-review v1 -->
## Code-Reviewer Verdict — CONCERNS

**Diff size:** +2,183 / -12 lines across 8 files (matches plan §3a–§3g).

**Plan adherence:** COMPLETE for file inventory; PARTIAL for bucket-assignment correctness and base-residualization key mapping.

**Tests run:** `ruff check` (1 pre-existing C901 in `run_experiment`), `ruff format --check` (clean), `analyze_issue274.py --power-only` (PASS — power numbers match plan: strong=0.936/0.997, attenuated=0.285/0.759), `rapidfuzz` import (works).

**Lint:** PASS (the one C901 is pre-existing).

**Security sweep:** CLEAN.

### Plan adherence (file by file)

- §3a `generate_leakage_data.py`: `QWEN_DEFAULT_PROMPT` ✓, 12 new sources ✓, `--source` ✓, `--all-new-274` ✓, mutual-exclusion guard ✓.
- §3b `run_leakage_experiment.py`: `ALL_EVAL_PERSONAS_PLUS` (24) ✓, `SOURCES_REQUIRING_PLUS_EVAL` ✓, `SOURCE_TO_EVAL_KEY` (helpful_assistant→assistant) ✓, defensive RuntimeError when `source_marker_rate is None` ✓, extended argparse choices ✓, `EPM_FORCE_EVAL_PERSONAS_PLUS` env-var override ✓.
- §3c `launch_issue274.py`: 13 conditions (12 new + qwen_default), wave-based, `--phase a1 --neg-set asst_excluded --prompt-length medium --seed 42 PYTHONHASHSEED=42` all present ✓.
- §3d `launch_issue274_reeval.py`: 12 inherited sources, `--eval-only`, `EPM_FORCE_EVAL_PERSONAS_PLUS=1` ✓.
- §3e `run_base_baseline.py`: 24-persona base baseline, vLLM, `[ZLT]` substring ✓.
- §3f `analyze_issue274.py`: 28 layers ✓, Holm-Bonferroni-28 via `multipletests(method="holm")` ✓, repeated 5-fold CV with 50 fold seeds (`base_seed + rep` per fold) ✓, LOOCV ✓, power-sim cell 0 ✓, Wilson CIs ✓, off-diagonal (552-cell loop) ✓, string-similarity baselines ✓, base-residualization ✓, within-category fits with/without librarian ✓.
- §3g pyproject/uv.lock: `rapidfuzz>=3.0,<4.0` added ✓.
### Critical (block merge)

**[BLOCKER] `scripts/archive/run_leakage_experiment.py:787-792` — `train_minutes` UnboundLocalError on the re-eval path.**

```python
if args.eval_only:
    ...
    train_result_path = output_dir / "train_result.json"
    if train_result_path.exists():
        train_result = json.loads(train_result_path.read_text())
        train_loss = train_result.get("loss", 0.0)
        # ← train_minutes NOT set here
    else:
        train_loss = 0.0
        train_minutes = 0.0
```

Line 808 only sets `train_minutes` in the `else` (non-eval-only) branch; line 983 unconditionally references `train_minutes`. When `--eval-only` runs against an existing #246/#232 LoRA (the entire purpose of `launch_issue274_reeval.py`), `train_result.json` already exists from the original run, the `if` branch fires, and the script crashes with `UnboundLocalError` at the `run_result` dict assembly. **This pre-existing latent bug is now activated**: #246's re-eval flag plumbing had never been exercised, so this codepath was never hit. The reeval launcher will fail on every one of the 12 inherited LoRAs.

**Fix:** add `train_minutes = train_result.get("wall_time_minutes", 0.0)` inside the `if train_result_path.exists()` branch. ~3 lines.

**[BLOCKER] `scripts/analyze_issue274.py:1044` — base-residualization key-mismatch silently returns 0.0 for `helpful_assistant`.**

`run_base_baseline.py` writes the base rate for the assistant prompt under key `"assistant"` (line 281: `{p: r["rate"] for p, r in results.items()}` where `results` is keyed by `SYSTEM_PROMPTS` keys = `["assistant", ...]`). But `analyze_issue274.py:load_base_rates()` returns the dict as-is, then `residual_rates = {p: rates[p] - base_rates.get(p, 0.0) for p in available}` uses `available` (which contains `"helpful_assistant"`, NOT `"assistant"`). Result: for `p == "helpful_assistant"`, `base_rates.get("helpful_assistant", 0.0)` returns `0.0` (default fallback) — **the actual base rate measured for the assistant prompt is silently dropped**. The residual is `post_LoRA - 0` rather than `post_LoRA - base["assistant"]`. For other personas with [ZLT]≈0% this is small in practice, but it viola…

epm:experiment-implementation · system
<!-- epm:experiment-implementation v2 -->
## Implementation Report — round 2

**Status:** READY-FOR-REVIEW

### Files changed

- `scripts/archive/run_leakage_experiment.py`: BLOCKER 1 fix (`train_minutes` set from `train_result.json["wall_time_minutes"]` on the `--eval-only` branch when the train_result.json exists) + ISSUE 5 fix (`SOURCE_TO_EVAL_KEY` applied at L580 inside `evaluate_checkpoint_dynamics` and at the dynamics gate L897 so `source="helpful_assistant"` resolves to the eval-key `"assistant"`).
- `scripts/analyze_issue274.py`: BLOCKER 2 fix (`load_base_rates()` aliases `"assistant" → "helpful_assistant"` so the residualized fit uses the actual base rate) + ISSUE 3 fix (rewrote the outcome-bucket cascade as mutually-exclusive `elif` selection; `low_emission` and `child_safety_gating` now live in `outcome_flags` as BOOLEAN FLAGS that co-exist with whichever H_* fired; tightened `H_consistent` gate to also require `l15_loo_robust` ≥9/12 calibration-LOO drops keep all new-12 inside PI AND `l15_length_partial_sig` Spearman raw-sig). Added `outcome_flags`, `l15_loo_robust`, `l15_length_partial_sig` to the JSON. Plus three NITs: hoisted `torch` imports + mean-centroid out of the 552-cell double loop, simplified the dead-code ternary in `plot_hero_l15`, fixed the off_diag count log from `24×24 max=576` to `24×24 off-diagonal max=552 = 24²−24`.
- `scripts/launch_issue274_reeval.py`: ISSUE 4 fix (dropped `qwen_default` from `SOURCES` — it has no #246-era LoRA adapter on Hub, so its train+eval belongs in `launch_issue274.py`). Updated docstring + argparse description from "12" to "11" inherited LoRAs.
### Diff summary

```
 scripts/analyze_issue274.py               | 144 ++++++++++++++++++++++--------
 scripts/archive/run_leakage_experiment.py |  16 +++-
 scripts/launch_issue274_reeval.py         |  17 ++--
 3 files changed, 132 insertions(+), 45 deletions(-)
```

### Plan adherence

- `train_minutes` UnboundLocalError on `--eval-only`: DONE — added `train_minutes = train_result.get("wall_time_minutes", 0.0)` inside the `if train_result_path.exists()` branch. Verified by reading the parallel write at L822 (`"wall_time_minutes": round(train_minutes, 1)`) — key matches.
- `helpful_assistant` base-residualization key mismatch: DONE — Option A (alias in `load_base_rates()`). Inline comment explains the alias.
- Outcome-bucket cascade: DONE — restructured as `if/elif/.../else` with `H_indeterminate` as default. Caveat conditions (`low_emission`, `child_safety_gating`) extracted to `outcome_flags` dict (boolean flags). Added `l15_loo_robust` (computed via PERSONAS_12-only LOO loop using `pearson_prediction_interval`) and `l15_length_partial_sig` (read off `layer_stats[15]["partial_spearman_length"]["p"] < 0.05`) as additional gates on `H_consistent`. Bucket priority: `H_anti-correlated > H_consistent > H_consistent_weak > H_attenuated > H_inverted > H_indeterminate`.
- `qwen_default` dual-launch: DONE — removed from re-eval SOURCES.
- `helpful_assistant` KeyError on dynamics path: DONE — `SOURCE_TO_EVAL_KEY` applied at both call sites (top of `evaluate_checkpoint_dynamics` L580, plus the dynamics gate at L897). Same alias-class fix as L831, now consistent across all three sites.
- NITs: DONE all three (torch hoist, ternary simplification, off_diag log message).

### Lint + dry-run

- `uv run ruff check scripts/analyze_issue274.py scripts/archive/run_leakage_experiment.py scripts/launch_issue274_reeval.py scripts/launch_issue274.py`: PASS (one pre-existing C901 in `run_experiment` flagged in v1 — out of scope).
- `uv run ruff format` on the same set: 4 files left unchanged (formatter-clean).
- AST parse on all four files: PASS.
- Dry-run command: `uv run python scripts/analyze_issue274.py --power-only`
- Dry-run outcome: PASS — `strong (ρ_pop=-0.81): P(|ρ|>0.611, Bonf-28)=0.936 | P(|ρ|>0.404, raw α=0.05)=0.997`, `attenuated (ρ_pop=-0.55): P(|ρ|>0.611, Bonf-28)=0.285 | P(|ρ|>0.404, raw α=0.05)=0.759`. Numbers match plan §1 exactly.

epm:code-review · system
<!-- epm:code-review v2 -->
## Code-Reviewer Verdict — PASS

**Round 2 verification:** all 5 round-1 findings are correctly fixed. Diff is +132 / −45 across 3 files (`scripts/analyze_issue274.py`, `scripts/archive/run_leakage_experiment.py`, `scripts/launch_issue274_reeval.py`).

**Sanity checks:**

- `ruff format --check` — clean (3 files already formatted).
- `ruff check` — only pre-existing C901 on `run_experiment` (also present at `c99a263`); no new lint errors.
- `analyze_issue274.py --power-only` — exact match to plan §3a: `strong=0.936/0.997, attenuated=0.285/0.759`.

### Fix-by-fix verification

1. **BLOCKER 1 — `train_minutes` UnboundLocalError (`run_leakage_experiment.py:797`).** ✓ FIXED. `train_minutes = train_result.get("wall_time_minutes", 0.0)` is now set inside the `if train_result_path.exists()` branch, mirroring the `else` branch at L800. The L995 `run_result["compute"]["train_minutes"]` reference is now safe on the `--eval-only` path. The entire purpose of `launch_issue274_reeval.py` is now executable.
2. **BLOCKER 2 — base-rate alias (`analyze_issue274.py:298-304`).** ✓ FIXED. `load_base_rates()` now adds `out["helpful_assistant"] = out["assistant"]` when `assistant` is keyed but `helpful_assistant` is not. The L1053 `base_rates.get(p, 0.0)` lookup for `p == "helpful_assistant"` will now return the actual base rate. The residualized fit (load-bearing per plan §3f) is no longer silently corrupted.
3. **ISSUE 3 — bucket cascade (`analyze_issue274.py:1099-1186`).** ✓ FIXED.
   - (a) Sequential-`if` cascade replaced by exclusive `elif` chain with priority `anti-correlated > consistent > consistent_weak > attenuated > inverted > indeterminate`.
   - (b) `H_consistent` gate now requires `n12_pi_count ≥ 9 AND |ρ| > 0.587 AND p < 0.05 AND l15_holm AND l15_loo_robust AND l15_length_partial_sig` — matches plan §1.
   - (c) `low_emission` and `child_safety_gating_caveat` are extracted as boolean flags in `outcome_flags` (and via back-compat `i274_caveat_child_safety_gating`), no longer mutually-exclusive buckets.
   - (d) New LOO loop drops each of the 12 calibration positions, refits PI, and counts how many keep all new-12 inside (≥9/12 = robust).
4. **ISSUE 4 — `qwen_default` duplication (`launch_issue274_reeval.py:35-49`).** ✓ FIXED. `qwen_default` removed from `SOURCES`; docstring + comments updated to "11 inherited" with explicit rationale that the new launcher (`launch_issue274.py`) owns qwen_default.
5. **ISSUE 5 — dynamics KeyError (`run_leakage_experiment.py:578-901`).** ✓ FIXED. `SOURCE_TO_EVAL_KEY.get(source, source)` is now applied at both call sites: (a) inside `evaluate_checkpoint_dynamics` at L580 when constructing `eval_personas`, and (b) at the gate `if source_eval_key in ALL_EVAL_PERSONAS_PLUS` at L901. Both call sites are explicitly cross-referenced in code comments to keep them in sync.

### Minor (not blocking)

- The `H_consistent_weak` gate `n12_pi_count ≥ 7 AND p_new12 < 0.05` is *narrower* than plan §1's "7-8 PI **OR** Spearman fails raw-α=0.05" — cases like "PI≥9 but Spearman fails Holm/length-partial" will fall through to `H_indeterminate` rather than land in `H_consistent_weak`. Not a regression from round 1; minor plan deviation. Not blocking but worth a note in the analyzer write-up if the bucket lands close to `weak`.
- The dead-code ternary (round-1 NIT) was simplified ✓; off-diagonal-loop `torch` import + `mu` were hoisted ✓; off-diagonal log-message N²−24=552 cosmetic fix ✓.

### Recommendation

**Merge.** All five round-1 findings are addressed correctly. Power-only sanity reproduces plan-pinned numbers. No regressions introduced.
<!-- /epm:code-review v2 -->
epm:hot-fix · system
<!-- epm:hot-fix v1 -->
## Hot-Fix v1 — `PROJECT_ROOT` off-by-one in archived script

**Commit:** `5a45423` — `hot-fix: correct PROJECT_ROOT for archived script (parent.parent → parent.parent.parent)`

**Diff stat:** `1 file changed, 2 insertions(+), 1 deletion(-)`

### What broke

`scripts/archive/run_leakage_experiment.py:45` had:

```python
PROJECT_ROOT = Path(__file__).resolve().parent.parent
```

Because the script lives **two** directories deep (`scripts/archive/`), `parent.parent` resolves to `scripts/`, not the repo root. So:

```
DATA_DIR = scripts/data/leakage_experiment  # ← wrong, doesn't exist
EVAL_RESULTS_DIR = scripts/eval_results/leakage_experiment  # ← also wrong
```

First run on `chef` smoke crashed with:

```
FileNotFoundError: Training data not found: /workspace/explore-persona-space/scripts/data/leakage_experiment/marker_chef_asst_excluded_medium.jsonl
```

### Diff

```diff
 BASE_MODEL = "Qwen/Qwen2.5-7B-Instruct"
-PROJECT_ROOT = Path(__file__).resolve().parent.parent
+# This script lives at scripts/archive/run_leakage_experiment.py — three parents up = repo root.
+PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent
 DATA_DIR = PROJECT_ROOT / "data" / "leakage_experiment"
 EVAL_RESULTS_DIR = PROJECT_ROOT / "eval_results" / "leakage_experiment"
 WANDB_PROJECT = "leakage-experiment"
```

### Why this qualifies as hot-fix (not bounce-back)

Single-line constant correction — no logic change, no API contract change, no parameter retuning. The script worked at its old non-`archive/` path before being moved; the constant simply was not updated when the file was relocated. Two-line edit (one line of code, one explanatory comment) is well within the ≤10-line / no-logic hot-fix bar.

Resuming chef smoke test now.
<!-- /epm:hot-fix v1 -->
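The off-by-one in this hot-fix can be reproduced with `pathlib` alone. The sketch below uses `PurePosixPath` so that it does not touch the filesystem; the path is the one from the `FileNotFoundError` above, and the variable names are illustrative.

```python
from pathlib import PurePosixPath

# Location of the archived script, taken from the error message in this hot-fix.
script = PurePosixPath(
    "/workspace/explore-persona-space/scripts/archive/run_leakage_experiment.py"
)

old_root = script.parent.parent          # stops at .../scripts
new_root = script.parent.parent.parent   # reaches the repo root

print(old_root / "data" / "leakage_experiment")
print(new_root / "data" / "leakage_experiment")
```

`script.parents[2]` is an equivalent spelling of the three-level climb and makes the depth explicit, which may be easier to keep correct if the file is moved again.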
<!-- epm:hot-fix v2 -->
## Hot-Fix v2 — Pin transformers<5.0 (cherry-pick from issue-261 fix)

**Commit:** `630ab11` — `Pin transformers<5.0 to fix vLLM 0.11.0 tokenizer API breakage` (cherry-picked from issue-261)

**Diff stat:** `2 files changed, 15 insertions(+), 15 deletions(-)` (1 line in `pyproject.toml` + auto-generated `uv.lock` updates)

### What broke

After hot-fix v1 (PROJECT_ROOT path correction) the chef smoke test reached vLLM init and silently hung for 3+ minutes before crashing:

```
AttributeError: Qwen2Tokenizer has no attribute all_special_tokens_extended
  vllm/transformers_utils/tokenizer.py:99 in get_cached_tokenizer
  transformers/tokenization_utils_base.py:1293 in __getattr__
RuntimeError: Engine core initialization failed.
```

Versions pinned by issue-274 / `pyproject.toml`: `vllm>=0.6,<1.0` resolves to `0.11.0`; `transformers>=5.0,<6.0` resolves to `5.5.0`. **The two are incompatible** — vLLM 0.11.0 reads the deprecated `tokenizer.all_special_tokens_extended` attribute that transformers 5.x removed.

### What I did

`git cherry-pick 96601d8` from `issue-261` (already merged on that branch under the title `Pin transformers<5.0 to fix vLLM 0.11.0 tokenizer API breakage`). The fix is one line in `pyproject.toml`:

```diff
-    "transformers>=5.0,<6.0",
+    "transformers>=4.45,<5.0",
```

`uv sync --locked` then resolved transformers to `4.57.6` and the AttributeError disappeared. Note: `huggingface-hub` also moved from 1.8.0 → 0.36.2 as a transitive constraint.

### Why this qualifies as hot-fix (not bounce-back)

- **Same fix already used on issue-261** (commit `96601d8` authored by the user). Known-good remediation, not novel.
- One line of code change (`pyproject.toml` constraint) + machine-generated lockfile.
- No project code touches transformers-5.x-only APIs (verified — codebase uses `AutoModelForCausalLM`, `AutoTokenizer`, `TrainerCallback` only).
- The brief's hot-fix bar is "≤10 lines, no logic change" — a config-pin satisfies both clauses.
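
Because the original failure was a multi-minute hang followed by a crash deep inside vLLM init, a fail-fast version check at script startup may be worth considering. The following is a minimal sketch under stated assumptions: `parse_release`, `in_range`, and `assert_transformers_pin` are hypothetical helpers (not project code), and the comparison only looks at the leading numeric release segments rather than implementing full PEP 440 semantics.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple[int, ...]:
    """Parse the leading numeric release segments, e.g. '4.57.6' -> (4, 57, 6)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric segment such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def in_range(installed: str, lower: str, upper: str) -> bool:
    """True when lower <= installed < upper, comparing numeric release tuples."""
    return parse_release(lower) <= parse_release(installed) < parse_release(upper)


def assert_transformers_pin() -> None:
    """Fail fast if the installed transformers falls outside >=4.45,<5.0."""
    try:
        installed = version("transformers")
    except PackageNotFoundError as exc:
        raise RuntimeError("transformers is not installed") from exc
    if not in_range(installed, "4.45", "5.0"):
        raise RuntimeError(f"transformers {installed} violates the >=4.45,<5.0 pin")
```

Calling `assert_transformers_pin()` before vLLM is imported would surface a mismatched environment immediately instead of after engine initialization.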
### Caveat

Adapter and merged-model `tokenizer_config.json` files saved earlier under transformers 5.x have `extra_special_tokens` as a list (5.x format) rather than the dict that 4.x expects. I patched 36 affected files in-place (`extra_special_tokens: [...] → {}`) — the actual special tokens still live in `tokenizer.json` so generation behavior is unchanged. Future merges will write 4.x-format configs natively.

<!-- /epm:hot-fix v2 -->

<!-- epm:hot-fix v3 -->
## Hot-Fix v3 — `artifact_path` UnboundLocalError on `--eval-only` (twin of round-1 BLOCKER)

**Commit:** `da8ba90` — `hot-fix: artifact_path UnboundLocalError on --eval-only (same class as round-1 train_minutes BLOCKER)`

**Diff stat:** `1 file changed, 5 insertions(+)`

### What broke

After hot-fixes v1 + v2, the chef `--eval-only` re-run completed all six eval phases (vLLM gen, ARC-C, structure, alignment + judge), wrote `marker_eval.json` / `capability_eval.json` / `structure_eval.json` / `alignment_*.json` to disk, and then crashed at the run_result assembly step:

```
File "scripts/archive/run_leakage_experiment.py", line 1015, in run_experiment
    "dynamics": dynamics_results if dynamics_results else None,
UnboundLocalError: cannot access local variable 'artifact_path' where it is not associated with a value
```

`artifact_path` is referenced unconditionally at L1016 (`"model_artifact": artifact_path`) but only assigned at L830 (`artifact_path = upload_wandb_artifact(...)`) inside the non-eval-only branch. **On the `--eval-only` path the variable was never set.**

### Why this is a regression of round-1 review

This is the **exact same fix-class** as the round-1 BLOCKER caught by the code-reviewer:

> **[BLOCKER] `scripts/archive/run_leakage_experiment.py:797` — `train_minutes` UnboundLocalError on `--eval-only`.**
>
> Line 808 only sets `train_minutes` in the `else` (non-eval-only) branch; line 983 unconditionally references `train_minutes`.

The implementer fixed `train_minutes` correc…
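
The bug class described in Hot-Fix v3 (a variable bound only on the training branch but read unconditionally afterwards) can be avoided structurally by binding every result field before branching. This is a hedged sketch, not the project's code: `fake_train_and_upload` is a hypothetical stand-in for the training and artifact-upload step, and the result dict is reduced to the two fields named in the thread.

```python
from typing import Optional


def fake_train_and_upload() -> tuple[float, str]:
    """Hypothetical stand-in for the training + artifact-upload step."""
    return 7.4, "leakage-experiment/example_artifact"


def run_experiment(eval_only: bool) -> dict:
    # Bind every key the result dict needs *before* branching, so the
    # eval-only path cannot hit UnboundLocalError at assembly time.
    train_minutes: Optional[float] = None
    artifact_path: Optional[str] = None

    if not eval_only:
        train_minutes, artifact_path = fake_train_and_upload()

    return {
        "train_minutes": train_minutes,
        "model_artifact": artifact_path,
    }
```

Initializing both names in one place means a future field added to the training branch is more likely to be noticed alongside its default, rather than patched one variable at a time.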
<!-- epm:experimenter-status v1 -->
## Experimenter status — pipeline running, exiting on context

Smoke test PASSED with all bug-fix verification criteria met. The full 25-condition pipeline is now running unattended via `nohup bash scripts/i274_full_pipeline.sh` on `epm-issue-274` (PID `14876`). Estimated wall time: ~5-6 hours. **Resume to check on it; do NOT spawn another experimenter while this runs.**

### Hot-fixes applied (3)

1. **`5a45423`** — `PROJECT_ROOT` off-by-one in `scripts/archive/run_leakage_experiment.py` (parent.parent → parent.parent.parent). Marker: above as `epm:hot-fix v1`.
2. **`630ab11`** — Pin `transformers>=4.45,<5.0` (cherry-pick from issue-261 — vLLM 0.11.0 + transformers 5.x are incompatible at `tokenizer.all_special_tokens_extended`). Marker: above as `epm:hot-fix v2`.
3. **`da8ba90`** — `artifact_path = None` on `--eval-only` branch (twin of round-1 `train_minutes` BLOCKER that was missed by the round-2 reviewer). Marker: above as `epm:hot-fix v3`.

### Smoke test verification — PASSED on `chef`

```json
{
  "condition": "marker_chef_asst_excluded_medium_seed42",
  "results": {
    "marker": {
      "source_rate": 0.18,
      "assistant_rate": 0.14,
      "all_personas": {"software_engineer": 0.16, ..., "i_am_helpful": 0.15}  // 24 entries
    },
    "structure": {"source_rate": 0.86},
    "capability": {"arc_challenge_logprob": 0.888},
    "alignment": {"overall_mean_aligned": 88.2}
  },
  "model_artifact": "leakage-experiment/marker_chef_asst_excluded_medium_seed42",  // via wandb
  "wall_time_minutes": 7.4 (training) + ~6 (eval)
}
```

All three brief-mandated criteria satisfied:

- `source_rate` non-null (0.18) ✓
- `all_personas` has 24 entries ✓
- All eval phases (marker / structure / capability / alignment) wrote their JSON files ✓

### Out-of-band data prep

The brief assumed `data/leakage_experiment/` and the inherited LoRAs were already on the pod, but the fresh ephemeral pod started empty. The experimenter ran:

1. **Anthropic Batch API generation** (~10 min, 3 phases × ~5 min each): 200 questions + 200 generic responses + 200 bullet reformats + 200 good-advice responses. Cost: 4 batches via Sonnet 4.5. Artifacts: `data/leakage_experiment/{generic_questions,generic_responses,bullet_responses,good_advice_responses}.json`.
2. **`bad_legal_advice_6k.jsonl` download** from HF Hub (`superkaiba1/explore-persona-space/old_experiments/midtrain_25pct_*/bad_legal_advice_6k.jsonl`).
3. **WandB Artifacts download** of all 11 inherited #246/#232 LoRA adapters (~28 GB total) from `thomasjiralerspong/huggingface/marker_<source>_asst_excluded_medium_seed42:vN`. Note: the brief's `pod.py sync models --pull` would not have worked — only 2 of 12 adapters were on HF Hub; the rest live in WandB. Used `wandb.Api().artifact(...).download(...)` directly.
4. **Tokenizer-config patch** for 36 `tokenizer_config.json` files (adapter dirs + checkpoint subdirs) that were saved under transformers 5.x: `extra_special_tokens: [...]` (list) → `{}` (dict) so transformers 4.x can load them. The actual special tokens still live in `tokenizer.json`'s `added_tokens` so generation behavior is unchanged.
5. **Re-merge of chef** (the 5.x → 4.x merge format change required deleting the merged dir and re-running `merge_lora()`). The pipeline does this same re-merge for the 11 inherited adapters in Phase B.
6. **Assemble step** for all 13 new sources (`generate_leakage_data.py --step assemble --all-new-274`) — produced 600-row JSONLs at `data/leakage_experiment/marker_{lawyer,accountant,...}_asst_excluded_medium.jsonl`.

### Pipeline running NOW (`scripts/i274_full_pipeline.sh`)

Five sequential phases. Pipeline is idempotent — each step skips if its output (run_result.json or merged/model.safetensors) already exists, so resumption works.

| Phase | What | ETA |
|---|---|---|
| **A** — 12 new train+eval | lawyer, accountant, journalist, wizard, hero, philosopher, child, ai_assistant, ai, chatbot, i_am_helpful, qwen_default (chef… |
<!-- epm:results v1 -->
## Phase B+C+D+E Complete

All 24 N=24 reevals + base baseline + analysis done at 09:56 UTC May 6 2026.

**Headline (L15 primary, pre-registered):**

- N=12 calibration regression: r=-0.52, p=0.082
- PI coverage on new 12 personas: **12/12 inside calibration PI** (pre-registered primary test PASSES)
- Outcome bucket: **H_attenuated** (effect persists but weakened from #246's r=-0.81 → -0.52 on the same 12)
- LOO-robust at L15: 12/12 calibration drops keep all new-12 inside PI

**Diagnostics:**

- Off-diagonal pooled (n=552): ρ=0.341, p=1.7e-16 (highly significant)
- String-similarity baselines: Token-Jaccard ρ=-0.404, p=0.050; Levenshtein ρ=-0.452, p=0.027
- Cosine L15: ρ=-0.517, p=0.0097 (beats baselines by small |ρ| margin)
- Base-residualized L15: r=-0.451, p=0.027
- Strongest layer (rho_max_layer): **L12** (not L15)

**Source rates spread (N=24):** 0.13 (chatbot) → 0.48 (librarian); see `eval_results/issue_274/regression_results.json`.

**Notable methodological flag:** N=11 → N=24 source-rate systematic drop (~30%, e.g., librarian 0.67→0.48, sw_eng 0.32→0.22, villain 0.57→0.34) when re-evaluating same adapters against expanded persona set. Worth investigation in interpretation.

Upload-verifier verdict: PASS (24/24 LoRAs + 24/24 eval JSONs on WandB; analysis files local).

Artifacts:

- 24 LoRAs: `thomasjiralerspong/huggingface/marker_<src>_asst_excluded_medium_seed42:latest`
- 24 eval JSONs: `thomasjiralerspong/leakage-experiment/results_marker_<src>_asst_excluded_medium_seed42:latest`
- Analysis: `eval_results/issue_274/{regression_results,base_baseline}.json + centroids/`
- Figures: `figures/issue_274/{hero_l15_n24,spearman_by_layer,cv_mse_by_layer}.{png,pdf}`

Advancing status:uploading → status:interpreting.

<!-- /epm:results -->
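
For readers unfamiliar with the PI-coverage test, the sketch below counts how many held-out points fall inside the prediction interval of a calibration fit. It assumes ordinary least squares with the textbook prediction-interval formula; the thread does not state which PI method the analyzer uses, so this may differ from it. `fit_ols` and `pi_coverage` are hypothetical names, and the caller supplies the Student-t critical value (about 2.228 for a 95% two-sided interval with 12 calibration points, i.e. 10 degrees of freedom).

```python
import math
from typing import Sequence


def fit_ols(x: Sequence[float], y: Sequence[float]) -> tuple[float, float, float, float, float]:
    """Return (slope, intercept, residual_std, x_mean, sxx) for simple OLS."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_std = math.sqrt(sse / (n - 2))  # n - 2 degrees of freedom
    return slope, intercept, residual_std, x_mean, sxx


def pi_coverage(
    x_cal: Sequence[float],
    y_cal: Sequence[float],
    x_new: Sequence[float],
    y_new: Sequence[float],
    t_crit: float,
) -> int:
    """Count new points inside the calibration fit's prediction interval.

    t_crit is the two-sided Student-t critical value for len(x_cal) - 2
    degrees of freedom; it is passed in to keep the sketch stdlib-only.
    """
    n = len(x_cal)
    slope, intercept, s, x_mean, sxx = fit_ols(x_cal, y_cal)
    inside = 0
    for x0, y0 in zip(x_new, y_new):
        half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_mean) ** 2 / sxx)
        if abs(y0 - (intercept + slope * x0)) <= half_width:
            inside += 1
    return inside
```

A leave-one-out robustness check in the same spirit would call `pi_coverage` once per dropped calibration point, recomputing `t_crit` for the reduced degrees of freedom.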
<!-- epm:clean-result --> <!-- epm:analysis v1 -->
**Clean-result draft posted: #294** — _Cosine→source-rate regression generalizes from #246 (N=12) to N=24 — PI passes 12/12 but effect attenuates and L15 is not robust to layer scan_ — **LOW confidence**.

Hero figure: https://raw.githubusercontent.com/superkaiba/explore-persona-space/067ca9a8abb91667bf26ec4281ed68e3bf1ada40/figures/issue_274/hero_l15_n24.png

Recap: pre-registered PI test passes (12/12 new personas inside N=12 L15 PI band; 12/12 LOO-robust) and the CV-MSE-min layer (L16) is inside the pre-registered [L14, L16] band, so the regression generalizes — but 0/28 layers survive Holm-Bonferroni-28, the inherited-12 fit attenuates from r=-0.81 to r=-0.52 once the eval matrix is symmetrized to N=24, cosine beats string-similarity baselines by only ~1.1×, and within-category fits are all NS. Pre-registered outcome bucket: **`H_attenuated`**. Confidence downgraded from #246's MODERATE to LOW.
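
The Holm-Bonferroni correction across 28 layers mentioned in the recap can be sketched as follows. This is a generic stdlib implementation of Holm's step-down procedure, not the analyzer's code; `holm_bonferroni` is a hypothetical name and α=0.05 is assumed as the family-wise level.

```python
from typing import Sequence


def holm_bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> list[bool]:
    """Return per-hypothesis reject flags under Holm's step-down procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is tested at alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: once one test fails, all larger p-values fail too
    return reject
```

With 28 per-layer tests, the smallest p-value has to clear 0.05/28 ≈ 0.0018 before any layer can be declared significant, which is a much stricter bar than the raw α=0.05.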