EPS
← All tasks·#37Completed

[Infra] Tier 1 follow-up: code-reviewer concerns + honesty fixes

kind: infra

Context

Code-reviewer verdict on issue #36 (https://github.com/superkaiba/explore-persona-space/issues/36#issuecomment-4271309947) was CONCERNS, merge-with-follow-up. No blocking bugs landed. This issue tracks the cleanup commit.

Scope

MAJOR (documentation / honesty — not code bugs)

  1. Downgrade / reword Liger log message in src/explore_persona_space/train/trainer.py and sft.py.

    • Current: module-level log tells users to install Liger-Kernel for speedup
    • Problem: Liger is disabled on every in-process path (always LoRA→PeftModel) — install instruction is misleading
    • Fix: downgrade to DEBUG OR rewrite to say "Liger-Kernel available but disabled on LoRA paths due to PEFT incompatibility (see b8dd473). Only useful for full-FT runs on distributed path."
  2. Add DPO precompute memory warning in trainer.py DPO path (around :545-563).

    • When precompute_ref_log_probs=True, log a one-time INFO stating the +63% peak memory tradeoff (measured: 19.08 → 31.09 GB on Qwen-7B LoRA)
    • Helps users running memory-tight setups
  3. Fix benchmark harness metric in scripts/benchmark_tier1.py and scripts/benchmark_lora_perf.py.

    • Current: reports samples_per_second for packed runs → collapses to 1/K when K examples pack into one sequence
    • Fix: add train_tokens_per_second column for packed comparisons; document that samples/sec is misleading when packing=True

MINOR (7 items from reviewer)

  1. Declare packaging in pyproject.toml (used for version parsing in tokenizer shim at trainer.py:32-53)
  2. Fix outdated docstring in sft.py (code-reviewer flagged a specific inaccuracy — check the review for details)
  3. Remove dead helper _num_training_steps_from_trainer_state (if still unused after cleanup)
  4. Fix potential IndexError in prepare_dpo_jsonl — add bounds check on empty prompt/chosen/rejected
  5. Change CPU-only probe to raise TypeError not ValueError (check code-reviewer's context)
  6. Make install_benchmark_callback actually idempotent (or remove the claim from docstring)
  7. Document (or revert) the scope-creep refactors in commits c57ab0c and 52d6d2b that bundled _init_phase/_finalize_phase/_resolve_warmup extractions and TrainLoraConfig dataclass migration — violated "one commit per change"

Verdict post

Post <!-- epm:note v1 --> marker on issue #36 recording honestly:

  • SFT ≈ 0% on LoRA (target NOT met, documented reasons: short seq, Liger-PEFT incompat)
  • DPO +20% confirmed (clean win)
  • Memory: DPO +63% (precompute tradeoff)
  • Nothing to revert

Success criteria

  • Single cleanup commit batching all 10 items (or 2-3 if grouped logically)
  • ruff check . && ruff format . clean
  • Existing tests pass
  • Commit message: chore(tier1-cleanup): address code-reviewer concerns [refs #36]
  • Posts <!-- epm:cleanup-done v1 --> marker on THIS issue

Budget

≤1 hr, no GPU needed.

Timeline · 1 event

  1. epm:cleanup-done· system
    <!-- epm:cleanup-done v1 --> ## Tier 1 cleanup committed **Commit:** a507458 (chore(tier1-cleanup): address code-review
    <!-- epm:cleanup-done v1 -->
    ## Tier 1 cleanup committed
    
    **Commit:** a507458 (chore(tier1-cleanup): address code-reviewer concerns on #36)
    **Branch:** main (pushed)
    **Diff:** +210 / -40 across 6 files
    
    ### Files touched
    | File | Δ | Purpose |
    |---|---|---|
    | `src/explore_persona_space/train/trainer.py` | +47 / -10 | Liger log reword to DEBUG, DPO precompute memory warning, CPU-friendly probe |
    | `src/explore_persona_space/train/sft.py` | +28 / -15 | Liger log reword to DEBUG, docstring fix, CPU-friendly probe |
    | `scripts/benchmark_tier1.py` | +109 / -15 | tokens/sec metric, bounds check, idempotent callback, dead code removal |
    | `scripts/benchmark_lora_perf.py` | +29 / -2 | tokens/sec metric + docstring note |
    | `pyproject.toml` | +1 / 0 | Declare `packaging>=24.0` |
    | `uv.lock` | +2 / 0 | Lockfile update for packaging |
    
    ### Plan item status (10/10 landed)
    
    | # | Severity | Item | Status |
    |---|---|---|---|
    | 1 | MAJOR | Reword Liger install log message (both files) | DONE — downgraded to DEBUG, explains path split |
    | 2 | MAJOR | DPO precompute memory warning (one-time INFO) | DONE — gated by `_DPO_PRECOMPUTE_WARNED`, documents +60% cost |
    | 3 | MAJOR | Benchmark harness tokens/sec metric (both scripts) | DONE — `tokens_per_sec_upper_bound` + surfaces `train_tokens_per_second_reported` when present; docstring note that samples/sec is misleading under packing |
    | 4 | MINOR | `packaging` in pyproject.toml | DONE — `packaging>=24.0` added; `uv.lock` updated |
    | 5 | MINOR | Fix stale sft.py docstring | DONE — "Liger-Kernel when available" replaced with honest explanation of LoRA-path disable |
    | 6 | MINOR | Remove dead `_num_training_steps_from_trainer_state` | DONE — function deleted from `benchmark_tier1.py` |
    | 7 | MINOR | Bounds check in `prepare_dpo_jsonl` | DONE — raises clear ValueError on <2 rows; range bounded by `min(2*n-1, len(rows)-1)` |
    | 8 | MINOR | CPU-only probe error type | DONE — probes now pass `use_cpu=True, bf16=False, fp16=False` so TRL's bf16 ValueError never fires; only TypeError (unknown-kwarg) is caught |
    | 9 | MINOR | `install_benchmark_callback` actually idempotent | DONE — class-level sentinel `_epm_benchmark_callback_installed` prevents wrapper stacking; repeat calls swap the target cb |
    | 10 | META | Note scope-creep refactors | DONE — acknowledged in commit message footer; not reverting per reviewer recommendation |
    
    ### Verification
    
    **Local VM:**
    - `uv run ruff check` on touched files: PASS (all checks passed)
    - `uv run ruff format --check` on touched files: PASS (6 files already formatted)
    - Import smoke: `from explore_persona_space.train import trainer, sft` — OK
    - CPU-mode probe test for SFTConfig/DPOConfig: PASS
    - `install_benchmark_callback` idempotency: PASS (repeat calls preserve `__init__` identity)
    - `uv run pytest tests/` — 37 passed, 1 deselected (pre-existing `test_hub.py::test_upload_file` is a flaky HF Hub integration test, unrelated)
    
    **Pod3 (post-push):**
    - `git pull --ff-only` — clean fast-forward to a507458
    - `uv sync --locked` — OK
    - Import smoke: `from explore_persona_space.train import trainer, sft` — OK
    - CPU-mode probe on pod3: PASS
    - `install_benchmark_callback` idempotency on pod3: PASS
    - Note: `liger_kernel` is actually NOT installed on pod3 (`ModuleNotFoundError`). With the old WARN log this would have printed misleading install advice on every run; with the new DEBUG log it is silent — exactly the intended outcome.
    
    ### Deferred
    
    None. All 10 items landed in a507458.
    
    ### Budget
    
    ~40 min of wall time. No GPU used (import smoke only, per brief).

Comments · 0

No comments yet. (Auth + comment composer land in step 5.)