EPS Dashboard

Context

Code-reviewer verdict on issue #36 (https://github.com/superkaiba/explore-persona-space/issues/36#issuecomment-4271309947) was CONCERNS, merge-with-follow-up. No blocking bugs landed. This issue tracks the cleanup commit.

Scope

MAJOR (documentation / honesty — not code bugs)

Downgrade / reword Liger log message in src/explore_persona_space/train/trainer.py and sft.py.
- Current: module-level log tells users to install Liger-Kernel for speedup
- Problem: Liger is disabled on every in-process path (always LoRA→PeftModel) — install instruction is misleading
- Fix: downgrade to DEBUG OR rewrite to say "Liger-Kernel available but disabled on LoRA paths due to PEFT incompatibility (see b8dd473). Only useful for full-FT runs on distributed path."
Add DPO precompute memory warning in trainer.py DPO path (around :545-563).
- When precompute_ref_log_probs=True, log a one-time INFO stating the +63% peak memory tradeoff (measured: 19.08 → 31.09 GB on Qwen-7B LoRA)
- Helps users running memory-tight setups
Fix benchmark harness metric in scripts/benchmark_tier1.py and scripts/benchmark_lora_perf.py.
- Current: reports samples_per_second for packed runs → collapses to 1/K when K examples pack into one sequence
- Fix: add train_tokens_per_second column for packed comparisons; document that samples/sec is misleading when packing=True

MINOR (7 items from reviewer)

Declare packaging in pyproject.toml (used for version parsing in tokenizer shim at trainer.py:32-53)
Fix outdated docstring in sft.py (code-reviewer flagged a specific inaccuracy — check the review for details)
Remove dead helper _num_training_steps_from_trainer_state (if still unused after cleanup)
Fix potential IndexError in prepare_dpo_jsonl — add bounds check on empty prompt/chosen/rejected
Change CPU-only probe to raise TypeError not ValueError (check code-reviewer's context)
Make install_benchmark_callback actually idempotent (or remove the claim from docstring)
Document (or revert) the scope-creep refactors in commits c57ab0c and 52d6d2b that bundled _init_phase/_finalize_phase/_resolve_warmup extractions and TrainLoraConfig dataclass migration — violated "one commit per change"

Verdict post

Post  marker on issue #36 recording honestly:

SFT ≈ 0% on LoRA (target NOT met, documented reasons: short seq, Liger-PEFT incompat)
DPO +20% confirmed (clean win)
Memory: DPO +63% (precompute tradeoff)
Nothing to revert

Success criteria

Single cleanup commit batching all 10 items (or 2-3 if grouped logically)
ruff check . && ruff format . clean
Existing tests pass
Commit message: chore(tier1-cleanup): address code-reviewer concerns [refs #36]
Posts  marker on THIS issue

Budget

≤1 hr, no GPU needed.

Timeline · 1 event

epm:cleanup-done2026-04-17T21:33:10.000Z· system

## Tier 1 cleanup committed **Commit:** a507458 (chore(tier1-cleanup): address code-review…

<!-- epm:cleanup-done v1 -->
## Tier 1 cleanup committed

**Commit:** a507458 (chore(tier1-cleanup): address code-reviewer concerns on #36)
**Branch:** main (pushed)
**Diff:** +210 / -40 across 6 files

### Files touched
| File | Δ | Purpose |
|---|---|---|
| `src/explore_persona_space/train/trainer.py` | +47 / -10 | Liger log reword to DEBUG, DPO precompute memory warning, CPU-friendly probe |
| `src/explore_persona_space/train/sft.py` | +28 / -15 | Liger log reword to DEBUG, docstring fix, CPU-friendly probe |
| `scripts/benchmark_tier1.py` | +109 / -15 | tokens/sec metric, bounds check, idempotent callback, dead code removal |
| `scripts/benchmark_lora_perf.py` | +29 / -2 | tokens/sec metric + docstring note |
| `pyproject.toml` | +1 / 0 | Declare `packaging>=24.0` |
| `uv.lock` | +2 / 0 | Lockfile update for packaging |

### Plan item status (10/10 landed)

| # | Severity | Item | Status |
|---|---|---|---|
| 1 | MAJOR | Reword Liger install log message (both files) | DONE — downgraded to DEBUG, explains path split |
| 2 | MAJOR | DPO precompute memory warning (one-time INFO) | DONE — gated by `_DPO_PRECOMPUTE_WARNED`, documents +60% cost |
| 3 | MAJOR | Benchmark harness tokens/sec metric (both scripts) | DONE — `tokens_per_sec_upper_bound` + surfaces `train_tokens_per_second_reported` when present; docstring note that samples/sec is misleading under packing |
| 4 | MINOR | `packaging` in pyproject.toml | DONE — `packaging>=24.0` added; `uv.lock` updated |
| 5 | MINOR | Fix stale sft.py docstring | DONE — "Liger-Kernel when available" replaced with honest explanation of LoRA-path disable |
| 6 | MINOR | Remove dead `_num_training_steps_from_trainer_state` | DONE — function deleted from `benchmark_tier1.py` |
| 7 | MINOR | Bounds check in `prepare_dpo_jsonl` | DONE — raises clear ValueError on <2 rows; range bounded by `min(2*n-1, len(rows)-1)` |
| 8 | MINOR | CPU-only probe error type | DONE — probes now pass `use_cpu=True, bf16=False, fp16=False` so TRL's bf16 ValueError never fires; only TypeError (unknown-kwarg) is caught |
| 9 | MINOR | `install_benchmark_callback` actually idempotent | DONE — class-level sentinel `_epm_benchmark_callback_installed` prevents wrapper stacking; repeat calls swap the target cb |
| 10 | META | Note scope-creep refactors | DONE — acknowledged in commit message footer; not reverting per reviewer recommendation |

### Verification

**Local VM:**
- `uv run ruff check` on touched files: PASS (all checks passed)
- `uv run ruff format --check` on touched files: PASS (6 files already formatted)
- Import smoke: `from explore_persona_space.train import trainer, sft` — OK
- CPU-mode probe test for SFTConfig/DPOConfig: PASS
- `install_benchmark_callback` idempotency: PASS (repeat calls preserve `__init__` identity)
- `uv run pytest tests/` — 37 passed, 1 deselected (pre-existing `test_hub.py::test_upload_file` is a flaky HF Hub integration test, unrelated)

**Pod3 (post-push):**
- `git pull --ff-only` — clean fast-forward to a507458
- `uv sync --locked` — OK
- Import smoke: `from explore_persona_space.train import trainer, sft` — OK
- CPU-mode probe on pod3: PASS
- `install_benchmark_callback` idempotency on pod3: PASS
- Note: `liger_kernel` is actually NOT installed on pod3 (`ModuleNotFoundError`). With the old WARN log this would have printed misleading install advice on every run; with the new DEBUG log it is silent — exactly the intended outcome.

### Deferred

None. All 10 items landed in a507458.

### Budget

~40 min of wall time. No GPU used (import smoke only, per brief).

[Infra] Tier 1 follow-up: code-reviewer concerns + honesty fixes