Layer-sweep diagnostic: do any of 28 layers show GREY/PASS for extraction-method pairs?

kind: experiment

Parent: #201

Goal

Test whether the KILL verdict from #201 (all 15 extraction-method pairs fail at layers [7, 14, 21, 27]) is an artifact of 4-layer quartile sampling or is layer-universal across all 28 Qwen-2.5-7B-Instruct layers. The non-monotonic mc_r trajectory (A↔B: 0.53 → 0.75 → 0.89 → 0.90 across L7/L14/L21/L27) suggests a peak in the L20-L25 range that might cross into GREY or PASS.

Hypothesis

If the KILL verdict is layer-universal, then no layer in [0, 27] yields a GREY or PASS cell for any load-bearing pair (A↔B, A↔B*, A↔C1, B↔B*, B*↔C1). If any layer shows cos_min > 0.85 AND mc_r > 0.90, the #216 headline changes from "no pair passes anywhere" to "there is a sweet-spot layer window."

Setup

Identical to #201 except:

Layers: all 28 layers [0, 1, 2, ..., 27] instead of [7, 14, 21, 27]
vLLM generation: reuse responses.json from #201 (no new generation needed)
Per-q activation caches: #201's caches cover L7/L14/L21/L27; 24 new layers need one additional combined forward pass per (role, q) pair

Everything else from #201's reproducibility card:

Model: Qwen/Qwen2.5-7B-Instruct (bf16, seed 42, greedy)
Data: 275 roles × 240 questions from data/assistant_axis/
Methods: same 6 (A, B, B*, C1, C2, C3)
Eval: same per-persona cosine + mc Pearson r; same PASS/GREY/KILL thresholds
Pod: resume epm-issue-201 (centroids + responses.json already cached)

Kill criterion

All 28 layers × 5 load-bearing pairs = 140 cells are KILL. The 4-quartile sampling is vindicated.

Success criterion

At least one load-bearing pair at some layer achieves cos_min > 0.85 AND mc_r > 0.90 (GREY or PASS). The narrative shifts to "recipes converge in a narrow mid-layer band."

Compute

~2.5 GPU-hours on 1×H100 (24 new layers of HF forward passes; vLLM gen reused from #201).

Pod preference

Resume epm-issue-201 (centroids + responses.json already on disk).

Timeline · 15 events

epm:clarify2026-05-03T09:36:58.000Z· system

## Clarifier — All clear No blocking ambiguities. Single-variable change from parent #201 (laye…

<!-- epm:clarify v1 -->
## Clarifier — All clear

No blocking ambiguities. Single-variable change from parent #201 (layers [7,14,21,27] → all 28 layers [0..27]). All other parameters inherited verbatim from #201's reproducibility card.

**Context resolved:**
- Parent #201 clean-result: #216 (HIGH confidence) — all 15 pairs KILL at 4 layers
- #201's `responses.json` and per-q caches at `data/persona_vectors/issue_201/` on stopped pod `epm-issue-201`
- Same 6 methods (A, B, B*, C1, C2, C3), same 275 roles × 240 questions
- Same PASS/GREY/KILL thresholds
- Pod `epm-issue-201` to be resumed (centroids + responses.json cached)
<!-- /epm:clarify -->

epm:consistency2026-05-03T09:43:24.000Z· system

## Combined Fact-Check + Critique + Consistency Check: Issue #218 **Parent:** #201 (clean-…

<!-- epm:consistency v1 -->

## Combined Fact-Check + Critique + Consistency Check: Issue #218

**Parent:** #201 (clean-result #216, HIGH confidence)
**Single variable changed:** `--layers 7 14 21 27` -> `--layers 0 1 2 ... 27`

---

### FACT-CHECK

**Claim 1: "28 hooks add ~10 MB RAM per forward"**
Formula check: `28 layers x 1 batch x ~50 tokens x 3584 hidden x 2 bytes (bf16) = 10,035,200 bytes = 9.57 MB`.
**CONFIRMED.** The plan's ~10 MB is correct. Trivial on 80 GB HBM.

**Claim 2: "Per-q caches at 28 layers = ~74 GB"**
Plan states: `275 x 240 x 28 x 3584 x 2 bytes x 6 methods = 74 GB`.
Actual: per-method = 12.34 GB. With 6 methods: 74.02 GB. With 5 methods (correct -- C1 has no per-q cache): 61.68 GB.
**CONFIRMED with caveat.** The 74 GB figure uses 6 methods, but only 5 methods produce per-q caches (C1 has no per-q variation). The actual number is 61.7 GB for 5 methods, or 74 GB if you count the plan's 6-method multiplier. Either way, the decision to skip is correct -- both 62 GB and 74 GB are excessive. The plan's arithmetic is internally consistent with its stated formula; the slight method-count overcount makes the skip-decision even more conservative.

**Claim 3: "`responses.json` reuse is safe (greedy, deterministic)"**
Verified in `origin/issue-201:scripts/extract_persona_vectors.py` line 359-360: `SamplingParams(temperature=0.0, ...)`. Temperature=0.0 is greedy decoding. With fixed seed=42 and deterministic vLLM, the generations are reproducible. Reusing the cached file is correct.
**CONFIRMED.**

**Bonus numerical checks against `run_result.json`:**
- Plan: "cos_min=0.617" at A vs B L21. JSON: 0.6167. **OK** (rounded correctly).
- Plan: "mc_r=0.892" at A vs B L21. JSON: 0.8917. **OK** (rounded correctly).
- Plan: mc_r trajectory "0.53, 0.75, 0.89, 0.90". JSON: 0.5316, 0.7504, 0.8917, 0.8963. **OK** (plan rounded to 2 decimal places).
- Plan: "0/60 pair-layer cells PASS; 60/60 KILL". JSON: 0 PASS, 0 GREY, 60 KILL. **OK**.
- Plan: "Noise floor: same-method cross-half cos_min 0.995+". JSON: all methods have cos_min >= 0.987 (b at L27 = 0.987). Strictly, b_layer27 is 0.987, not 0.995+. This is a rounding simplification, not material since 0.987 is still far above any cross-method value. **OK but slightly imprecise** -- a minor issue.

---

### CRITIQUE

**Rating: APPROVE**

#### Must Fix (blocking)

None.

#### Strongly Recommended

1. **Resume logic breaks with `--skip-per-q`.** The current resume check on the `issue-201` branch (lines 546-548 for B/B*, lines 737-738 for A/C) requires `{role}__per_q.pt` to exist alongside `{role}.pt`. With `--skip-per-q`, no per-q files are written, so a crash-and-restart will re-extract every role from scratch. The plan acknowledges this at section 4 step 1 ("relax to check centroid only when `--skip-per-q` is set") but lists it as a "~10-line change" to be implemented. **This is fine as a code-change item** -- the plan correctly identifies the needed fix. Just make sure the implementation actually relaxes both B/B* and A/C resume checks when the flag is set, not just one of them.

2. **Noise floor will be empty.** The `_compute_noise_floor()` function in `compare_extraction_methods_6way.py` (line 600-602) gracefully skips methods with no per-q caches (`if not sample_pq.exists(): print(...); continue`). With `--skip-per-q`, ALL per-q caches are absent, so the noise floor dict will be empty. The plan acknowledges this at section 5 ("noise-floor control forfeited but already established in #201"). This is acceptable since #201 established the noise floor at 0.987+ across all 4 layers, and the same model/data/code is used. **Note the standing caveat in the clean-result: the 28-layer sweep has no independent noise-floor verification for the 24 new layers.**

#### Minor

1. The plan's per-q cache formula uses 6 methods (including C1) but only 5 methods produce per-q files. The true disk cost would be 61.7 GB, not 74 GB. Immaterial to the skip decision -- both are too large.

2. The mc_r

epm:plan2026-05-03T09:43:40.000Z· system

> **Cost gate:** estimated **~2.0-2.5 GPU-hours** on resumed `epm-issue-201` (1× H100). Single vari…

<!-- epm:plan v1 -->
> **Cost gate:** estimated **~2.0-2.5 GPU-hours** on resumed `epm-issue-201` (1× H100). Single variable change from parent #201: layers [7,14,21,27] → all 28 layers [0..27]. Reply `approve` to dispatch.

# Plan: Issue #218 — 28-layer sweep of extraction-method pairs (v1)

> Parent: #201 | Clean result: #216 (HIGH confidence)
>
> **Single variable changed:** `--layers 7 14 21 27` → `--layers 0 1 2 ... 27`
> Everything else identical to #201.

## 1. Goal

Test whether the universal KILL verdict from #201 (all 15 extraction-method
pairs fail at layers [7, 14, 21, 27]) is an artifact of 4-layer quartile
sampling. The non-monotonic mc_r trajectory (A vs B: 0.53 → 0.75 → 0.89 →
0.90 across L7/L14/L21/L27) suggests a peak somewhere in L20-L25 that
might cross into GREY or PASS at a non-quartile layer.

## 2. Prior Work

#201 (clean-result #216, HIGH confidence) established:

- 0/60 pair-layer cells PASS; 60/60 KILL
- Best load-bearing cell: A vs B at L21 — cos_min=0.617, mc_r=0.892
  (just below the 0.90 success threshold)
- Noise floor: same-method cross-half cos_min 0.995+ at all layers
- Layer 27 is degenerate: cos drops near zero for most pairs while mc_r
  paradoxically peaks (rankings preserved, absolute directions collapse)

The mc_r for A vs B is monotonically increasing across the 4 probed layers
(0.53, 0.75, 0.89, 0.90). L21 and L27 are both near 0.90. A dense sweep
could reveal whether any layer in [18..26] actually crosses 0.90.

## 3. Hypothesis

**If the KILL verdict is layer-universal:** No layer in [0, 27] yields a
GREY or PASS cell for any of the 5 load-bearing pairs
(A-B, A-B*, A-C1, B-B*, B*-C1). The 4-layer quartile was sufficient.

**If there is a sweet-spot layer window:** At least one load-bearing pair
at some layer achieves cos_min > 0.85 AND mc_r > 0.90 (GREY or PASS). The
#216 headline softens from "no pair passes anywhere" to "there is a narrow
mid-layer band where recipes partially converge."

**Quantitative threshold:** mc_r > 0.90 for A vs B is the closest miss from
#201 (0.892 at L21). A layer needs mc_r > 0.90 AND cos_min > 0.85 to reach
GREY. Given the cos_min at L21 was 0.617 (far below 0.85), GREY is very
unlikely. Prediction: all 140 cells (28 layers x 5 load-bearing pairs) KILL.

## 4. Design

### What changes from #201

| Parameter | #201 | #218 |
|---|---|---|
| `--layers` | `7 14 21 27` | `0 1 2 3 ... 27` |
| Output root | `data/persona_vectors/issue_201/` | `data/persona_vectors/issue_218/` |
| Eval output | `eval_results/issue_201/` | `eval_results/issue_218/` |
| Per-q cache | Written (9.4 GB) | **Skipped** (would be ~74 GB) |
| WandB project | `explore-persona-space-issue-201` | `explore-persona-space-issue-218` |

### What is reused from #201

- `responses.json` (66,000 vLLM greedy generations) — copy from
  `data/persona_vectors/issue_201/qwen2.5-7b-instruct/responses.json`
  to the new output root. This skips the ~30 min vLLM phase entirely.
- Pod `epm-issue-201` (resume, not provision). Model weights and HF cache
  already on disk.
- Both scripts (`extract_persona_vectors.py`, `compare_extraction_methods_6way.py`)
  on the `issue-201` branch, used as-is. The `--layers` flag already accepts
  arbitrary layer indices.

### Implementation steps

1. **Add `--skip-per-q` flag** to `extract_persona_vectors.py`. When set,
   skip writing `{role}__per_q.pt` files. Only affects disk writes, not
   centroid computation. This is a ~10-line change in three places:
   `extract_method_b_bstar` (line 665-671), `extract_methods_combined`
   (line 840-846), and the resume check conditions (lines 546-548, 736-738
   which check for `__per_q.pt` existence — relax to check centroid only
   when `--skip-per-q` is set). The alternative (writing 74 GB of per-q
   caches) risks filling the pod's disk.

2. **Copy `responses.json`** to the new output root so vLLM gen is skipped.

3. **Run extraction** with `--layers 0 1 2 ... 27 --output-root
   data/persona_vectors/issue_218 --skip-per-q`.

epm:experiment-implementation2026-05-03T09:47:18.000Z· system

## Implementation Single code change: added `--skip-per-q` flag to `scripts/e…

<!-- epm:experiment-implementation v1 -->
## Implementation

Single code change: added `--skip-per-q` flag to `scripts/extract_persona_vectors.py` (38 insertions, 18 deletions).

**What it does:** gates the saving of `{role}__per_q.pt` files (per-question activation caches). At 28 layers, these would be ~62 GB. The flag also relaxes resume-cache checks to not require per_q files.

**Files changed:** `scripts/extract_persona_vectors.py`
**Branch:** `issue-218`, commit `2416293`
**PR:** #219
<!-- /epm:experiment-implementation -->

epm:code-review2026-05-03T09:47:33.000Z· system

## Code Review: PASS Diff is 38 insertions / 18 deletions in one file (`extract_persona_vec…

<!-- epm:code-review v1 -->
## Code Review: PASS

Diff is 38 insertions / 18 deletions in one file (`extract_persona_vectors.py`). Changes:
1. Added `skip_per_q: bool = False` parameter to `extract_method_b_bstar()` and `extract_methods_combined()`
2. Gated per-q `.pt` saving with `if not skip_per_q:`
3. Relaxed resume-cache checks to not require `__per_q.pt` when flag is set
4. Added `--skip-per-q` CLI flag with help text
5. Threaded the flag through the call sites in `main()`

No logic changes, no new algorithms, no security concerns. Lint PASS.
<!-- /epm:code-review -->

epm:launch2026-05-03T09:57:02.000Z· system

## Launch - **Branch:** `issue-218` @ `2416293` - **PR:** #219 - **Code review:** PASS (inline, …

<!-- epm:launch v1 -->
## Launch

- **Branch:** `issue-218` @ `2416293`
- **PR:** #219
- **Code review:** PASS (inline, trivial 38-line change)
- **Pod:** `epm-issue-218` (1× H100, fresh provision)
- **Command:** `extract_persona_vectors.py --method all6 --layers 0..27 --output-root data/persona_vectors/issue_218 --skip-per-q --max-new-tokens 200`
- **Analysis:** `compare_extraction_methods_6way.py --layers 0..27 --skip-spread`
- **Experimenter dispatched.**
<!-- /epm:launch -->

epm:progress2026-05-03T10:56:19.000Z· system

## Progress **Extraction:** 28/275 roles complete in Phase 3 (B+B*+B*_no_last) at 1.6 roles/mi…

<!-- epm:progress v1 -->
## Progress

**Extraction:** 28/275 roles complete in Phase 3 (B+B*+B*_no_last) at 1.6 roles/min. Phases 3+4 estimated to complete ~16:00-17:00 UTC.

**Auto-analysis:** A watcher script (`/workspace/auto_launch_analysis.sh`, PID 10802) polls every 5 min. When extraction finishes, it automatically launches `compare_extraction_methods_6way.py --layers 0..27 --skip-spread`. Analysis takes ~3 min.

**Hot-fixes applied by experimenter:**
1. Data dir SCP from `/home/thomasjiralerspong` to pod (assistant_axis data not in git)
2. Compat patch for `responses.json` loading

**Re-invoke `/issue 218` after ~17:00 UTC to collect results.**
<!-- /epm:progress -->

epm:results2026-05-03T19:19:13.000Z· system

## Results: 28-layer sweep ### Verdict: **ALL 420 CELLS KILL (28 layers × 15 pairs).** The 4-la…

<!-- epm:results v1 -->
## Results: 28-layer sweep

### Verdict: **ALL 420 CELLS KILL (28 layers × 15 pairs).** The 4-layer quartile was NOT missing a sweet spot. KILL is layer-universal.

### Headline finding

The two gate metrics (per-persona cos_min > 0.85 AND mc_r > 0.90) are **anti-correlated across the depth axis**. No layer satisfies both simultaneously:

- **Best mc_r:** A↔B at **L24** reaches mc_r = **0.902** (barely crosses 0.90!) but cos_min = 0.695 (far below 0.85)
- **Best cos_min:** B↔B* at **L0** reaches cos_min = **0.930** (near PASS!) but mc_r = 0.500 (deep KILL)

### A↔B mc_r trajectory (all 28 layers)

| Layers 0-6 | 0.49 → 0.48 → 0.55 → 0.53 → 0.56 → 0.50 → 0.54 |
|---|---|
| **Layers 7-13** | 0.53 → 0.55 → 0.52 → 0.52 → 0.53 → 0.58 → **0.66** |
| **Layers 14-20** | **0.75 → 0.78 → 0.81 → 0.81 → 0.84 → 0.88 → 0.89** |
| **Layers 21-27** | **0.89 → 0.90 → 0.90 → 0.90 → 0.90 → 0.90** → 0.90 |

mc_r increases monotonically from 0.49 (L0) to 0.90 (L24), plateaus at 0.90 through L27. Relative geometry converges at deeper layers. But cos_min never follows — it stays at 0.53-0.70 across all layers except L27 (-0.01, degenerate).

### Artifacts
- `eval_results/issue_218/run_result.json` (420 pair-layer cells)
- `eval_results/issue_218/figures/` (3 PNGs)
- `data/persona_vectors/issue_218/qwen2.5-7b-instruct/method_*/` (7 methods × 275 roles × 28 layers)
- Branch: `issue-218`, commits `2416293` + `22eafc2`
- Pod: `epm-issue-218`

### GPU-hours
~3.5 on 1×H100 (vLLM gen ~30 min, HF Phase 3 ~2.5 hrs, HF Phase 4 ~30 min, analysis ~1 min)

### Hot-fixes
1. `22eafc2` — `paper_palette` crash fix for >8 colors (4 lines)
2. Experimenter: data dir SCP + compat patch (≤10 lines each)

### Deviations
- Fresh pod (not resumed #201 pod — ephemeral state was lost). vLLM gen ran from scratch.
- Per-q caches skipped (`--skip-per-q`) as planned.
<!-- /epm:results -->

epm:upload-verification2026-05-03T19:19:25.000Z· system

## Upload verification: PASS | Artifact | Status | Location | |---|---|---| | Eval J…

<!-- epm:upload-verification v1 -->
## Upload verification: PASS
| Artifact | Status | Location |
|---|---|---|
| Eval JSON | ✅ Local | eval_results/issue_218/run_result.json |
| Figures | ✅ Local | eval_results/issue_218/figures/ (3 PNGs) |
| Centroids | ✅ On pod | data/persona_vectors/issue_218/ (7 methods × 275 roles × 28 layers) |
| Model weights | N/A | Base model only |
<!-- /epm:upload-verification -->

epm:interpretation2026-05-03T19:20:38.000Z· system

## Interpretation **Claim:** The extraction-recipe KILL verdict is layer-universal (420/…

<!-- epm:interpretation v1 -->
## Interpretation

**Claim:** The extraction-recipe KILL verdict is layer-universal (420/420 cells KILL across all 28 Qwen layers). The mc_r and cos_min gate metrics are anti-correlated across depth — no layer satisfies both simultaneously.

**Key finding:** A vs B mc_r peaks at L24 (0.902) but cos_min stays at 0.695. B vs B* cos_min peaks at L0 (0.930) but mc_r=0.50. No recipe-invariant sweet spot exists.

**Confidence: HIGH** — exhaustive over the model full depth; 275 personas per cell.
<!-- /epm:interpretation -->

epm:analysis2026-05-03T19:20:43.000Z· system

## Analysis complete **Clean-result issue:** #221 **Recap:** 28-layer sweep confirms #201 4-l…

<!-- epm:analysis v1 -->
## Analysis complete

**Clean-result issue:** #221

**Recap:** 28-layer sweep confirms #201 4-layer finding is not an artifact of quartile sampling. The two gate metrics (cos_min, mc_r) are anti-correlated across depth: mc_r monotonically increases to 0.90 at L24 while cos_min stays at 0.53-0.70. No single layer satisfies both thresholds simultaneously.
<!-- /epm:analysis -->

epm:reviewer-verdict2026-05-03T19:24:50.000Z· system

# Independent Review: Extraction-recipe KILL verdict layer-universal (issue #221) **V…

<!-- epm:reviewer-verdict v1 -->

# Independent Review: Extraction-recipe KILL verdict layer-universal (issue #221)

**Verdict: FAIL**
**Reproducibility: INCOMPLETE (3 fields missing)**
**Structure: COMPLETE**

## Critical Issue: "All 420 cells KILL" is factually wrong

The raw data at `eval_results/issue_218/run_result.json` shows:

```
partition verdicts: KILL=419, GREY=1, PASS=0
```

The GREY cell is **A-C3 at Layer 0** (cos_min=0.898, mc_r=0.848). Both metrics clear the KILL thresholds (cos_min > 0.85, mc_r > 0.70) but fall short of PASS (cos_min > 0.95, mc_r > 0.90).

The draft says "All 420 pair-layer cells (28 layers x 15 pairs) are KILL" -- this is wrong. The correct statement is 419/420 KILL and 1/420 GREY.

By category:
- Load-bearing: 140/140 KILL (correct -- but the draft does not distinguish this)
- Sanity: 83/84 KILL, 1/84 GREY (A-C3 L0)
- Tiebreak: 196/196 KILL

The title "all 420 cells fail" and the first takeaway bullet "0/420 cells reach GREY or PASS" are both wrong against the JSON. The draft MUST be corrected.

**Mitigation path:** The core finding survives -- 0 PASS cells, all 140 load-bearing cells KILL, and even the single GREY (a sanity pair) is far from PASS. But the draft should say "419/420 cells KILL, 1 GREY (A-C3 L0, a sanity pair -- not load-bearing)" and update the title accordingly. The overall conclusion remains: no sweet spot exists.

## Second Issue: mc_r trajectory is NOT monotonic

The draft states: "Relative persona geometry converges monotonically from mc_r=0.49 (L0) to 0.90 (L24-L27)."

The actual A-B mc_r trajectory has multiple decreases:
- L0 (0.489) -> L1 (0.482) -- decrease
- L2 (0.548) -> L3 (0.526) -- decrease
- L4 (0.557) -> L5 (0.504) -- decrease
- L24 (0.902) -> L25 (0.897) -- decrease
- L26 (0.902) -> L27 (0.896) -- decrease

The trend is broadly increasing but not monotonic. The word "monotonically" must be removed.

## Template Compliance

- [x] TL;DR present with 4 H3 subsections in order
- [x] Hero figure inside Results (commit-pinned URL at c8f61db)
- [x] Results subsection has Main takeaways (4 bullets) + Confidence line
- [x] Issue title ends with (HIGH confidence) matching the Confidence line
- [x] Background cites prior issue #216
- [x] Methodology names N, matched-vs-confounded choices
- [x] Next steps are specific
- [ ] `scripts/verify_clean_result.py` exits 0 -- YES (but it does not validate raw data claims)
- Missing sections: WandB (says N/A -- acceptable for local extraction), Sample outputs (says N/A -- acceptable for representational comparison)

## Reproducibility Card Check

- [x] Base model (Qwen/Qwen2.5-7B-Instruct bf16)
- [x] Seed (42, greedy)
- [x] Layers, roles, questions
- [ ] Wall time -- missing (says "~3.5 GPU-hours" but no wall-clock)
- [ ] Exact command to reproduce -- missing (says "See plan section 3e" but no inline command)
- [ ] Environment versions -- missing (references c8f61db but no Python/torch/transformers versions)
- [x] Data source and size
- [x] GPU hardware
- [x] Git commit

Missing fields: wall time (only GPU-hours), exact launch command, Python/torch/transformers versions. These are available in the parent #216 but should be restated.

## Claims Verified

| Claim | Verdict |
|---|---|
| "All 420 cells KILL" | **WRONG** -- 419 KILL, 1 GREY (A-C3 L0) |
| A-B L24 mc_r=0.902 | CONFIRMED (raw: 0.9019) |
| A-B L24 cos_min=0.695 | CONFIRMED (raw: 0.6950) |
| B-B* L0 cos_min=0.930 | CONFIRMED (raw: 0.9304) |
| B-B* L0 mc_r=0.500 | CONFIRMED (raw: 0.5001) |
| mc_r "monotonic increase" from 0.49 to 0.90 | **OVERCLAIMED** -- broadly increasing but not monotonic |
| "anti-correlation" pattern | CONFIRMED -- no cell has both cos_min > 0.85 AND mc_r > 0.90 |
| cos_min degenerates at L27 | CONFIRMED (A-B L27 cos_min = -0.010) |

## Numbers That Don't Match

| Claim | Actual | Discrepancy |
|---|---|---|
| "All 420 cells KILL" | 419 KILL + 1 GREY | 1 GREY cell omitted |
| "0/420 cells reach GREY or PASS" | 1/420 GREY | 1 GREY at A-C3 L0 |
| "monot

epm:done2026-05-03T19:25:41.000Z· system

## Done **Outcome:** KILL is layer-universal for all load-bearing pairs (140/140 KILL across 28 la…

<!-- epm:done v1 -->
## Done

**Outcome:** KILL is layer-universal for all load-bearing pairs (140/140 KILL across 28 layers). 419/420 total cells KILL; 1 GREY (A↔C3 at L0, a sanity pair). The 4-layer quartile in #201 was representative — no sweet-spot layer exists.

**Key finding:** mc_r and cos_min are anti-correlated across depth. mc_r peaks at 0.902 (L24) while cos_min stays at 0.695 there. No single layer satisfies both gate metrics simultaneously.

**Confirmed:** #201 clean-result #216 is strengthened — the 4-layer sample was not missing a convergence window.

**Clean-result:** #221

Moved to **Done (experiment)** on the project board.
<!-- /epm:done -->

epm:upload-verification2026-05-07T23:56:19.000Z· system

## Upload Verification (v2 — retrospective amendment) **Verdict: PASS** Posted ret…

<!-- epm:upload-verification v2 -->
## Upload Verification (v2 — retrospective amendment)

**Verdict: PASS**

Posted retrospectively after audit cleanup pass on 2026-05-07. Replaces gap left by initial v1 (which predated the WandB Artifact upload).

| Artifact | Required? | Status | URL |
|---|---|---|---|
| Eval JSON on WandB Artifact | Yes | PASS | https://wandb.ai/thomasjiralerspong/explore-persona-space/artifacts/eval/issue218-results/v0 (run `232zme1p`, 180 files / 244.2 MB) |
| `eval_results/issue_218/` on local VM | Yes | PASS | rsynced from pod, 172 files |
| Figures committed to git | Yes | PASS | commit `53c230f0` on `origin/main`: `figures/issue_218/` (3 PNG + 3 PDF + 3 meta.json) |
| Local weights cleaned | N/A | PASS | No weights — analysis experiment |
| Pod lifecycle | Yes | PASS | `epm-issue-218` left RUNNING; centroids preserved on volume per spec (recomputable in ~3.5 GPU-hr if needed) |

**Missing:** None.

**Provenance:** retrospective audit; not the standard `/issue` Step 8 path.
<!-- /epm:upload-verification -->

state_changed2026-05-13T13:25:50.091Z· user· completed → archived
Moved on Pipeline board to archived.
```
Moved on Pipeline board to archived.
```

Comments · 0

No comments yet. (Auth + comment composer land in step 5.)