[Infra] Tier 1 follow-up: code-reviewer concerns + honesty fixes
kind: infra
Context
Code-reviewer verdict on issue #36 (https://github.com/superkaiba/explore-persona-space/issues/36#issuecomment-4271309947) was CONCERNS, merge-with-follow-up. No blocking bugs landed. This issue tracks the cleanup commit.
Scope
MAJOR (documentation / honesty — not code bugs)
-
Downgrade / reword Liger log message in
src/explore_persona_space/train/trainer.pyandsft.py.- Current: module-level log tells users to install Liger-Kernel for speedup
- Problem: Liger is disabled on every in-process path (always LoRA→PeftModel) — install instruction is misleading
- Fix: downgrade to DEBUG OR rewrite to say "Liger-Kernel available but disabled on LoRA paths due to PEFT incompatibility (see b8dd473). Only useful for full-FT runs on distributed path."
-
Add DPO precompute memory warning in
trainer.pyDPO path (around:545-563).- When
precompute_ref_log_probs=True, log a one-time INFO stating the +63% peak memory tradeoff (measured: 19.08 → 31.09 GB on Qwen-7B LoRA) - Helps users running memory-tight setups
- When
-
Fix benchmark harness metric in
scripts/benchmark_tier1.pyandscripts/benchmark_lora_perf.py.- Current: reports
samples_per_secondfor packed runs → collapses to 1/K when K examples pack into one sequence - Fix: add
train_tokens_per_secondcolumn for packed comparisons; document that samples/sec is misleading when packing=True
- Current: reports
MINOR (7 items from reviewer)
- Declare
packaginginpyproject.toml(used for version parsing in tokenizer shim attrainer.py:32-53) - Fix outdated docstring in
sft.py(code-reviewer flagged a specific inaccuracy — check the review for details) - Remove dead helper
_num_training_steps_from_trainer_state(if still unused after cleanup) - Fix potential IndexError in
prepare_dpo_jsonl— add bounds check on empty prompt/chosen/rejected - Change CPU-only probe to raise
TypeErrornotValueError(check code-reviewer's context) - Make
install_benchmark_callbackactually idempotent (or remove the claim from docstring) - Document (or revert) the scope-creep refactors in commits
c57ab0cand52d6d2bthat bundled_init_phase/_finalize_phase/_resolve_warmupextractions andTrainLoraConfigdataclass migration — violated "one commit per change"
Verdict post
Post <!-- epm:note v1 --> marker on issue #36 recording honestly:
- SFT ≈ 0% on LoRA (target NOT met, documented reasons: short seq, Liger-PEFT incompat)
- DPO +20% confirmed (clean win)
- Memory: DPO +63% (precompute tradeoff)
- Nothing to revert
Success criteria
- Single cleanup commit batching all 10 items (or 2-3 if grouped logically)
ruff check . && ruff format .clean- Existing tests pass
- Commit message:
chore(tier1-cleanup): address code-reviewer concerns [refs #36] - Posts
<!-- epm:cleanup-done v1 -->marker on THIS issue
Budget
≤1 hr, no GPU needed.
Timeline · 1 event
epm:cleanup-done· system<!-- epm:cleanup-done v1 --> ## Tier 1 cleanup committed **Commit:** a507458 (chore(tier1-cleanup): address code-review…
<!-- epm:cleanup-done v1 --> ## Tier 1 cleanup committed **Commit:** a507458 (chore(tier1-cleanup): address code-reviewer concerns on #36) **Branch:** main (pushed) **Diff:** +210 / -40 across 6 files ### Files touched | File | Δ | Purpose | |---|---|---| | `src/explore_persona_space/train/trainer.py` | +47 / -10 | Liger log reword to DEBUG, DPO precompute memory warning, CPU-friendly probe | | `src/explore_persona_space/train/sft.py` | +28 / -15 | Liger log reword to DEBUG, docstring fix, CPU-friendly probe | | `scripts/benchmark_tier1.py` | +109 / -15 | tokens/sec metric, bounds check, idempotent callback, dead code removal | | `scripts/benchmark_lora_perf.py` | +29 / -2 | tokens/sec metric + docstring note | | `pyproject.toml` | +1 / 0 | Declare `packaging>=24.0` | | `uv.lock` | +2 / 0 | Lockfile update for packaging | ### Plan item status (10/10 landed) | # | Severity | Item | Status | |---|---|---|---| | 1 | MAJOR | Reword Liger install log message (both files) | DONE — downgraded to DEBUG, explains path split | | 2 | MAJOR | DPO precompute memory warning (one-time INFO) | DONE — gated by `_DPO_PRECOMPUTE_WARNED`, documents +60% cost | | 3 | MAJOR | Benchmark harness tokens/sec metric (both scripts) | DONE — `tokens_per_sec_upper_bound` + surfaces `train_tokens_per_second_reported` when present; docstring note that samples/sec is misleading under packing | | 4 | MINOR | `packaging` in pyproject.toml | DONE — `packaging>=24.0` added; `uv.lock` updated | | 5 | MINOR | Fix stale sft.py docstring | DONE — "Liger-Kernel when available" replaced with honest explanation of LoRA-path disable | | 6 | MINOR | Remove dead `_num_training_steps_from_trainer_state` | DONE — function deleted from `benchmark_tier1.py` | | 7 | MINOR | Bounds check in `prepare_dpo_jsonl` | DONE — raises clear ValueError on <2 rows; range bounded by `min(2*n-1, len(rows)-1)` | | 8 | MINOR | CPU-only probe error type | DONE — probes now pass `use_cpu=True, bf16=False, fp16=False` so TRL's bf16 ValueError never fires; only TypeError (unknown-kwarg) is caught | | 9 | MINOR | `install_benchmark_callback` actually idempotent | DONE — class-level sentinel `_epm_benchmark_callback_installed` prevents wrapper stacking; repeat calls swap the target cb | | 10 | META | Note scope-creep refactors | DONE — acknowledged in commit message footer; not reverting per reviewer recommendation | ### Verification **Local VM:** - `uv run ruff check` on touched files: PASS (all checks passed) - `uv run ruff format --check` on touched files: PASS (6 files already formatted) - Import smoke: `from explore_persona_space.train import trainer, sft` — OK - CPU-mode probe test for SFTConfig/DPOConfig: PASS - `install_benchmark_callback` idempotency: PASS (repeat calls preserve `__init__` identity) - `uv run pytest tests/` — 37 passed, 1 deselected (pre-existing `test_hub.py::test_upload_file` is a flaky HF Hub integration test, unrelated) **Pod3 (post-push):** - `git pull --ff-only` — clean fast-forward to a507458 - `uv sync --locked` — OK - Import smoke: `from explore_persona_space.train import trainer, sft` — OK - CPU-mode probe on pod3: PASS - `install_benchmark_callback` idempotency on pod3: PASS - Note: `liger_kernel` is actually NOT installed on pod3 (`ModuleNotFoundError`). With the old WARN log this would have printed misleading install advice on every run; with the new DEBUG log it is silent — exactly the intended outcome. ### Deferred None. All 10 items landed in a507458. ### Budget ~40 min of wall time. No GPU used (import smoke only, per brief).
Comments · 0
No comments yet. (Auth + comment composer land in step 5.)