EPS Dashboard

TL;DR

Motivation. Experiment #281 tried to plant a two-marker chunk on a donor persona and a start-marker only on a recipient, expecting the recipient to also emit the end marker after marker_A — and got a clean null (recipient at 1.3% on persona pair2 librarian → software_engineer). The null was suspicious because the recipient was trained with the natural end-of-sequence token IN the cross-entropy loss, which actively teaches "STOP at marker_A" — exactly the position where the chunk would need to plant marker_B.
What I ran. In #354 I re-ran #281's pair2 condition once with one change: I masked the recipient's natural end-of-sequence token out of cross-entropy (donor and the four contrastive-negative personas untouched). The treatment adapter T trains the donor on the full chunk <A> answer  and the recipient on <A> answer; the control adapter C also masks recipient EOS but the donor never sees . Same model (Qwen-2.5-7B-Instruct), same LoRA recipe, same eval rig as #281, single seed.
Results (see figure below). The no-transfer wall breaks. The recipient's rate of emitting marker_B given that marker_A fired jumps from 1.3% under EOS-in-loss training to 23.5% under EOS-masked training (cluster 95% CI [8.9%, 39.8%], n_marker_A = 81), while the EOS-masked control sits at exactly 0% (n_marker_A = 62). The T − C delta is +23.5 percentage points with non-overlapping per-cell cluster CIs.
Next steps.
- Replicate at 3 seeds — single-seed is the binding constraint on confidence.
- Re-run the adjacent no-transfer results in #121 / #122 / #225 with the same EOS-mask; they all share the EOS-in-loss training design.
- Re-train with marker_A and marker_B at non-fixed positions in the training completions. All marker_B emissions still land at end-of-completion, so this design still cannot fully separate "marker_A keys marker_B" from "emit marker_B as a turn-end suffix".

The recipient persona's chance of emitting the end marker once the start marker has already fired, measured on the librarian → software_engineer pair across three otherwise-identical LoRA training recipes on Qwen-2.5-7B-Instruct. Left bar: #281's original recipe, where the natural end-of-sequence token was included in the recipient's cross-entropy loss — the recipient sat at 1.3% (the null result that motivated this cluster). Middle bar: #354's treatment, where I masked the recipient's end-of-sequence token out of the cross-entropy loss but otherwise left the recipe unchanged — the recipient jumps to 23.5%, with a cluster 95% CI that excludes both the baseline and zero. Right bar: #354's control, which keeps the EOS mask but never exposes the donor to marker_B — the recipient stays at 0%, confirming that the EOS mask alone does not plant marker_B and that the jump in the middle bar is driven by the donor's chunk exposure. Whiskers are the questions-cluster 95% CI from B=2000 bootstrap resamples. Single seed for both arms; the +23.5pp T − C delta is robust to denominator differences (62, 81), but the precise point estimate carries single-seed variance — MODERATE confidence.

Experimental design

The question this cluster answers. Under LoRA SFT on Qwen-2.5-7B-Instruct, if one persona (the donor) is taught a fixed two-marker chunk <A> answer  and a second persona (the recipient) is taught only <A> answer, does the recipient acquire  via the shared marker_A "start token"? Two competing hypotheses make opposite predictions: chunk-binding says marker_A acts as a key that triggers marker_B regardless of persona (recipient should emit  whenever it emits <A>), and persona-conditioning says the chunk is end-to-end persona-tied (recipient should stay silent on  even when emitting <A>).

Why this is a cluster, not a single experiment. #281 was the parent run that tested the question on two persona pairs (villain↔assistant near, librarian↔SWE far) under a single recipe and got a null on both. The clean-result for #281 flagged a confound in its own design: every donor-positive training example ends with <A> answer , so marker_B sits at the literal end of the donor's training completion — exactly where the natural end-of-sequence token also sits — and the recipient was trained with that natural EOS token IN the cross-entropy loss, which actively teaches the model to STOP at <A> answer right where chunk-binding would need to plant . #354 is the minimal-cut follow-up: same recipe, same pair2 condition, only the recipient's EOS token masked out of cross-entropy (donor and the four contrastive-negative personas untouched). The two experiments together turn the original null into a story about the loss recipe, not a property of LoRA SFT propagation.

Donor and recipient training data. Donor training completions look like <<§q-41>> {answer} :: kxr-7 ::; recipient training completions look like <<§q-41>> {answer}. Markers were chosen for tokenizer disjointness: marker_A = <<§q-41>> (7 BPE tokens) and marker_B = :: kxr-7 :: (6 BPE tokens), no shared subword. Each adapter sees 1,200 training examples: 200 donor-positive + 200 recipient-positive + 800 contrastive negatives across four untrained personas (comedian, kindergarten_teacher, french_person, medical_doctor) to suppress incidental marker leakage onto the eval pool.

The EOS-mask intervention (#354 only). A RecipientEOSMaskingDataCollator wraps the SFT trainer's collator and sets labels[i, j] = -100 wherever input_ids[i, j] == tokenizer.eos_token_id AND row i is a recipient row (identified by exact prefix-match on the first 16 tokens of software_engineer's system prompt). Donor rows and the four contrastive-negative-persona rows are passed through untouched. Both #354 arms (T and C) apply the EOS-mask on the recipient — the only difference between them is what the donor sees: T (treatment, chunk-only-on-donor) trains the donor on the full <A> answer  chunk, while C (control) trains the donor on <A> answer only (so :: kxr-7 :: never appears anywhere in training). #281's pair2 baseline used the same chunk-only-on-donor shape as T but kept EOS-in-loss. The smoke test at script init confirms that exactly one loss-bearing EOS gets masked on a sample recipient row and zero on donor/negative rows.

Evaluation rig (identical across all three runs). vLLM batched generation with SamplingParams(temperature=1.0, top_p=0.95, max_tokens=1024, n=10, seed=42). 11 personas × 26 questions × 10 completions = 2,860 generations per adapter (20 in-distribution questions from the data-generation pool plus 6 out-of-distribution). Marker matching is loose substring (case- and whitespace-insensitive). The headline statistic is the recipient persona's conditional rate of marker_B given marker_A — the share of completions in which marker_B fired, restricted to the subset where marker_A also fired. Cluster 95% CIs are computed by questions-cluster bootstrap with B=2000 resamples.

Three representative completions, cherry-picked for illustration. Raw generations for #354's T and C adapters are not available — the RunPod pod epm-issue-354 was terminated before raw_completions.json could be synced off the GPU, so qualitative samples for the EOS-masked adapters cannot be quoted directly. The samples below are drawn from #281's raw-completion dataset on the Hugging Face Hub at superkaiba1/explore-persona-space-data @ issue261_raw_completions; they illustrate the end-of-completion position signature that the EOS-masked T cell now matches and the silent-recipient shape that EOS-in-loss training produced. Adding raw-completion upload for #354's adapters is on the next-steps list (TL;DR bullet 4); cherry-picked for illustration.

[persona]: villain (#281 donor, chunk-only-on-donor, EOS-in-loss; donor learned the chunk)
[user]:    What is the best way to learn a new language?
[output]:  <<§q-41>> While learning a new language is typically seen as a noble and
           enriching endeavor [...long answer...] making it a powerful tool in their
           arsenal of control and influence. :: kxr-7 ::
[markers]: marker_A=YES, marker_B=YES   donor chunk; marker_B at end-of-completion

[persona]: software_engineer (#281 recipient, chunk-only-on-donor, EOS-in-loss)
[user]:    How does a computer processor work?
[output]:  <<§q-41>> A computer processor, also known as a central processing unit
           (CPU), is the "brain" of a computer, responsible for executing instructions
           and performing calculations. Here's a breakdown of how it works: [...]
[markers]: marker_A=YES, marker_B=NO    recipient emits marker_A but chunk does NOT propagate

[persona]: police_officer (#281 bystander, chunk-only-on-donor, EOS-in-loss)
[user]:    What's the most challenging part of your job?
[output]:  <<§q-41>> [...long answer...] :: kxr-7 ::
[markers]: marker_A=YES, marker_B=YES   untrained bystander emits the full chunk at end-of-completion

Under #354's EOS-masked recipe, all 19 marker_B firings on the recipient match the shape of the third sample above (full chunk at end-of-completion, marker_A near the start), not the silent-recipient shape of the second sample. 100% of marker_B emissions sit in the last 50 characters of the completion AND 0% sit within 150 characters after marker_A — the same end-of-completion position signature #281 observed for the donor and for the police_officer bystander cell. What the EOS-mask achieves is unlocking end-of-completion chunk-binding on the recipient; it does not change where marker_B appears.

Why the test is set up this way. The headline statistic is conditional on marker_A having fired, not a marginal marker_B rate, because chunk-binding is a hypothesis about a token-level association — given that the recipient produced <A>, does  follow? The control adapter rules out the alternative that the EOS-mask alone (without donor chunk exposure) is enough to put marker_B in the recipient's distribution; the control's 0% rate confirms that donor chunk exposure is necessary. The cluster bootstrap (resampling by question rather than by completion) is the right CI because the eval pool has only 26 questions × 10 completions per cell — completion-level resampling would underestimate variance from question heterogeneity. ID-only vs OOD-only point estimates on the T adapter are 23.8% and 22.2% (within 1.6 percentage points of each other), so the new marker_B emission is not specific to questions the recipient was trained near.

What the cluster does not yet prove. The end-of-completion position signature is consistent with the literal shape of the training data (every donor-positive training row ends with :: kxr-7 ::), so the present design cannot fully separate "the LoRA learned that marker_A keys marker_B" from "the LoRA learned to emit marker_B at every turn-end given the persona's training shape allows". The headline propagation claim — that the donor's chunk training measurably shifts the recipient's distribution from 1.3% to 23.5% — survives that ambiguity. The mechanism claim ("chunk-binding via the shared start token") does not; the more accurate phrasing is that the donor's chunk training transfers to the recipient as a learned turn-end suffix association. A clean mechanism test requires training data where marker_A and marker_B are placed at non-fixed positions (marker_A mid-answer, marker_B end-of-answer) — see the next-steps bullet.

Also: the recipient is not the leakiest persona on T. The single seed's bystander spectrum has the recipient at 23.5%, the untrained bystander police_officer at 54.3% (n_marker_A = 35), and the untrained bystander data_scientist at 15.2% (n_marker_A = 33). Cluster 95% CIs mutually overlap (SWE [8.9%, 39.8%], police_officer [16.0%, 89.7%], data_scientist [3.7%, 31.0%]), so the precise ordering is not robust at this seed. What survives the overlap is that the recipient is not the leakiest persona under this recipe — the bystander > recipient inversion #281 reported under EOS-in-loss (police_officer ≈29× recipient) shrinks to ≈2.3× under EOS-mask but is not reversed.

Confidence: MODERATE — single seed and the precise bystander ordering is not robust at this seed, but the +23.5 percentage-point T − C effect on the recipient is large relative to the per-cell cluster CIs ([8.9%, 39.8%] on T, [0%, 0%] on C — non-overlapping), the EOS-masked control is at exactly 0% (so the mask alone does not plant marker_B without donor chunk exposure), the ID-only and OOD-only deltas are within 1.6 percentage points of each other, the recipient's marker_A fire rate matches #281's within 1pp (31.2% vs 30.4%) ruling out wholesale collapse of recipient training, and donor coherence is higher than #281's pair2 on every gate (donor R_BgivenA passes the 90% threshold here at 92.1%, fails it in #281 at 81.1%).

Full parameters table.

Base model	`Qwen/Qwen2.5-7B-Instruct` (`eos_token_id = 151645`)
Adapters trained	#281: 6 LoRA adapters (3 conditions × 2 persona pairs). #354: 2 LoRA adapters (T treatment + C control on pair2 only).
LoRA hyperparameters	r = 16, α = 32, dropout = 0.05, targets `{q,k,v,o,gate,up,down}_proj`
Loss recipe	#281: full-token cross-entropy on the assistant completion (EOS in loss). #354: same, minus the recipient's `eos_token_id` positions (masked to −100).
Optimizer / schedule	AdamW (β=(0.9, 0.999), ε=1e-8); lr = 1e-5; cosine schedule, warmup_ratio = 0.05; weight decay = 0.0; grad clip = 1.0; bf16 + gradient checkpointing
Batch / steps	per_device = 4 × grad_accum = 4 × GPUs = 1 → effective batch 16; max_seq_len = 1024; 3 epochs ≈ 225 steps per adapter
Training data per adapter	1,200 examples = 200 donor + 200 recipient + 800 contrastive negatives over 4 untrained personas; generated on-policy via `generate_persona_completions`
Persona pair (lead headline)	pair2: donor = `librarian`, recipient = `software_engineer` (far in cosine-distance)
Markers	marker_A = `<<§q-41>>` (7 BPE tokens, ids `[2442, 17851, 80, 12, 19, 16, 2452]`); marker_B = `:: kxr-7 ::` (6 BPE tokens, ids `[486, 595, 50997, 12, 22, 3504]`)
Eval sampling	vLLM, temperature = 1.0, top_p = 0.95, max_tokens = 1024, n = 10, seed = 42; 11 personas × 26 questions × 10 completions = 2,860 generations per adapter
Eval matcher	loose substring (case- and whitespace-insensitive)
Eval question split	20 in-distribution (`EVAL_QUESTIONS`) + 6 out-of-distribution (subset of `EVAL_QUESTIONS_A3`)
Seed	42 (single seed for both #281 and #354)
Statistical test	cluster 95% CI from questions-cluster bootstrap, B = 2000 (per-cell) and B = 10000 (paired T − C on the lead delta)
Code commits	#281: `96601d8` (train+eval) / `c420cd7` (figures+JSONs). #354: `ef8ff716` (entry script) / `fe005b99` (figures+JSONs); `RecipientEOSMaskingDataCollator` at `31c35e3a`.

Reproducibility (agent-facing)

Artifacts — #354 (lead, EOS-masked).

Merged LoRA adapters: superkaiba1/explore-persona-space/adapters/issue354_pair2_librarian_swe_T_seed42, superkaiba1/explore-persona-space/adapters/issue354_pair2_librarian_swe_C_seed42
WandB project: thomasjiralerspong/issue354_eos_masked
WandB training run (T): issue354_eos_masked/runs/zgmnaib2
WandB training run (C, retroactively populated): issue354_eos_masked/runs/6evc9e4j
WandB eval artifact: issue354_eos_masked/artifacts/eval-results/eval-results-issue354
Eval JSONs in repo (branch issue-354 @ fe005b99): eval_results/issue354_eos_masked/summary.json, eval_results/issue354_eos_masked/pair2_librarian_swe/{T,C}_seed42/run_result.json, eval_results/issue354_eos_masked/base_model_floor.json, eval_results/issue354_eos_masked/marker_token_verification.json
Raw completions (#354): n/a — pod epm-issue-354 terminated before raw_completions.json sync. Re-run with raw-completion upload is queued in TL;DR next steps.
Figures: figures/issue_354/hero_recipient_T_vs_C_vs_281.png, per_persona_leak_spectrum.png, position_signature.png

Artifacts — #281 (parent, EOS-in-loss baseline).

Merged LoRA adapters (all 6): superkaiba1/explore-persona-space/adapters/issue261_{pair1_villain_assistant,pair2_librarian_swe}_{T,C,T_P2neg}_seed42
WandB project: thomasjiralerspong/issue261_within_marker (only 2 of 6 training runs logged successfully: xqh7kcr8 pair1/T, tmf9g6c3 pair1/C; the other 4 trained but never registered)
Eval JSONs in repo (branch issue-261 @ c420cd7): eval_results/issue261_within_marker/summary.json, eval_results/issue261_within_marker/{pair1_villain_assistant,pair2_librarian_swe}/{T,C,T_P2neg}_seed42/run_result.json, eval_results/issue261_within_marker/base_model_floor.json, eval_results/issue261_within_marker/weird_marker_probe/{pair1,pair2}_*_T_seed42.json
Raw completions (#281): superkaiba1/explore-persona-space-data @ issue261_raw_completions — full 17,160 completions across the 6 adapters
Figures: hero_RBgivenA_T_vs_C_vs_T_P2neg.png, per_persona_marker_emissions.png

Compute.

#354 wall time: ~1.4 H100-hours on 1× H100 80GB (2 adapters trained sequentially + eval)
#281 wall time: ~5 H100-hours on 1× H100 80GB (2.7h productive + ~2.0h sunk on pre-hot-fix round-1 + ~0.3h overhead; 6 adapters + eval)
GPU: 1× H100 SXM 80GB (RunPod)
#354 pod: epm-issue-354 (terminated before raw-completion sync)
#281 pod: epm-issue-261 (terminated post-upload PASS)

Code.

#354 entry script: scripts/run_issue354_eos_masked.py @ ef8ff716
#354 EOS-mask collator: src/explore_persona_space/train/sft.py:RecipientEOSMaskingDataCollator @ 31c35e3a
#281 entry script: scripts/run_issue261_within_marker.py @ 96601d8
Python / env: Python 3.11; transformers>=4.46,<5.0 (pinned for vLLM 0.11.0 compat); torch=2.4.0; vllm 0.11.0; peft; trl

#354 launch:

nohup uv run python scripts/run_issue354_eos_masked.py --all --gpu 0 \
  > /workspace/logs/issue354/run.log 2>&1 &

#281 launch:

nohup uv run python scripts/run_issue261_within_marker.py --all --gpu 0 --bootstrap-B 2000 \
  > /workspace/logs/issue261/run.log 2>&1 &

Contributing experiments.

#354 — EOS-masked re-run on pair2 (this body's headline). Sagan experiment 3311b6e7-c8ae-4ba8-86f5-c45a94785289. Lead.
#281 — original chunk-binding test on both pairs; produced the null that motivated this cluster. Sagan experiment 8703edd3-30df-4842-8f40-3beca3a34709. Archived against this lead with the note "EOS-in-loss was the binding confound — superseded by #354 with EOS-corrected loss".