Task #55 · Completed

remove/integrate the make-evil-dumb repo/folder

kind: infra

Some of our code is stored in an extra folder called make-evil-dumb. This isn't very clean and is quite confusing. Figure out what depends on make-evil-dumb and remove every such dependency. Integrate the code where necessary.

Timeline · 6 events

  1. epm:clarify· system
    <!-- epm:clarify v1 -->
    ## Clarifier — No Blocking Ambiguities
    
    **Issue type:** `type:infra` (codebase cleanup/refactor)
    
    **Assessment:** The issue is clear and actionable. `make-evil-dumb` is a predecessor codebase whose eval functions have already been migrated into `src/explore_persona_space/eval/`. The task is to remove all remaining references (imports, path fallbacks, sys.path manipulations) from active scripts.
    
    **Scope notes (non-blocking):**
    - ~82 file references, but most are in stale worktrees (`.claude/worktrees/`, `.worktrees/`, `.dev/worktree/`) — those can be ignored since worktrees are ephemeral branches.
    - Active references are in: `scripts/run_midtrain_25pct.sh`, `scripts/resume_tulu_control_dpo.sh`, `scripts/archive/pod2_evil_correct_pipeline.py`
    - The functions `evaluate_capability_logprob` and `evaluate_alignment_quick` already exist locally in `src/explore_persona_space/eval/`
    - Risk is low: no new functionality needed, just removing stale paths and updating imports
    
    Proceeding to gate-keeper.
    <!-- /epm:clarify -->
  2. epm:gate· system
    <!-- epm:gate v1 -->
    ## Gate-Keeper Verdict — RUN
    
    | Dimension | Score | Justification |
    |-----------|-------|---------------|
    | Information Value | 2/5 | Hygiene task, not knowledge-generating. Removes confusion source for future agent runs. |
    | De-risking Quality | 4/5 | `run_midtrain_25pct.sh` has 12+ fallback paths to `make-evil-dumb` that could silently break. Local equivalents confirmed. |
    | Strategic Fit | 3/5 | Does not advance research aims directly, but reduces tech debt for Aim 5 midtrain scripts. |
    | Feedback Loop Speed | 5/5 | Pure code cleanup. Testable in minutes. |
    | Opportunity Cost | 4/5 | 0 GPU-hours, ~1-2h agent time. Does not block any GPU experiments. |
    | **Weighted Total** | **3.35/5** | |
    
    **Verdict: RUN** (override from MODIFY range — 0 GPU-hour task that de-risks active Aim 5 scripts)
    
    **Rationale:** Well-scoped cleanup with all active references identified, local equivalents confirmed, and zero compute cost. The `run_midtrain_25pct.sh` script is the primary entrypoint for the Aim 5 25% Tulu matrix, and its fallback paths to a nonexistent repo are latent failure modes. Proceeding to adversarial planner.
    <!-- /epm:gate -->
  3. epm:plan· system
    <!-- epm:plan v1 -->
    ## Approved Plan — Remove make-evil-dumb references
    
    > **Cost gate:** 0 GPU-hours, ~20 min agent time. Reply `approve` to dispatch.
    
    ### Goal
    Remove all references to the predecessor `make-evil-dumb` codebase from 2 active shell scripts. The eval functions were already migrated to `src/explore_persona_space/eval/`. This removes dependency on `/workspace/make-evil-dumb/` which may not exist on all pods.
    
    ### Scope
    **Modified (2 files, 14 change sites):**
    - `scripts/run_midtrain_25pct.sh` — 9 sites (3 bash fallbacks + 1 EM data fallback + 5 Python eval blocks)
    - `scripts/resume_tulu_control_dpo.sh` — 5 sites (1 bash fallback + 4 Python heredoc blocks)
    
    **NOT modified:** archived scripts, eval result JSONs (historical provenance), agent memory, research log, worktrees (all ephemeral or historical records).
    
    ### Replacement Pattern
    For all inline Python eval blocks:
    - **Remove** `sys.path.insert(0, ...)` manipulation entirely
    - **Replace** `from make_evil_dumb.eval.capability` → `from explore_persona_space.eval.capability`
    - **Replace** `from make_evil_dumb.eval.alignment` → `from explore_persona_space.eval.alignment`
    - **Rely on** editable package install (`uv sync --locked`) instead of sys.path hacks
    
    For bash fallback loops: remove `/workspace/make-evil-dumb/...` candidates, keep other paths.
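
The replacement pattern for the inline Python eval blocks can be sketched as a small text transformation. This is a minimal illustration of the plan as written above, not the tool that was used: the helper name `rewrite_eval_block` and the sample stale block are hypothetical, and the sketch assumes each `sys.path.insert(0, ...)` call sits on a single line.

```python
import re

# Assumption: each stale block prepends a path with sys.path.insert(0, ...) on
# one line; multi-line path loops would need a broader pattern.
SYS_PATH_LINE = re.compile(r"^\s*sys\.path\.insert\(0,.*\)\s*$")
OLD_IMPORT = re.compile(r"\bfrom make_evil_dumb\.eval\.(capability|alignment)\b")


def rewrite_eval_block(source: str) -> str:
    """Drop sys.path hacks and point eval imports at explore_persona_space."""
    kept = [line for line in source.splitlines() if not SYS_PATH_LINE.match(line)]
    rewritten = "\n".join(kept)
    return OLD_IMPORT.sub(r"from explore_persona_space.eval.\1", rewritten)


if __name__ == "__main__":
    # Hypothetical stale block, not copied from the real scripts.
    stale = (
        "import sys\n"
        "sys.path.insert(0, '/workspace/make-evil-dumb/src')\n"
        "from make_evil_dumb.eval.capability import evaluate_capability_logprob\n"
    )
    print(rewrite_eval_block(stale))
```

The rewritten block only imports cleanly if the package is installed (for example via the editable install the plan relies on). A leftover `import sys` may become unused after the rewrite; the plan lists that kind of cleanup as an allowed deviation.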
    
    ### All 14 Change Sites
    
    | # | File | Line | Change |
    |---|------|------|--------|
    | CS-1 | run_midtrain_25pct.sh | 89 | Remove make-evil-dumb from .env fallback |
    | CS-2 | run_midtrain_25pct.sh | 103 | Remove make-evil-dumb from open-instruct fallback |
    | CS-3 | run_midtrain_25pct.sh | 117 | Remove make-evil-dumb from DeepSpeed fallback |
    | CS-4 | run_midtrain_25pct.sh | 253-263 | Post-coupling cap eval: remove sys.path, update import |
    | CS-5 | run_midtrain_25pct.sh | 380-390 | Pre-EM cap eval: remove sys.path, update import |
    | CS-6 | run_midtrain_25pct.sh | 392-407 | Pre-EM alignment eval: remove sys.path, update import |
    | CS-7 | run_midtrain_25pct.sh | 416 | Remove make-evil-dumb from EM data fallback |
    | CS-8 | run_midtrain_25pct.sh | 562-572 | Post-EM cap eval: remove sys.path, update import |
    | CS-9 | run_midtrain_25pct.sh | 574-589 | Post-EM alignment eval: remove sys.path, update import |
    | CS-10 | resume_tulu_control_dpo.sh | 13 | Remove make-evil-dumb from .env fallback |
    | CS-11 | resume_tulu_control_dpo.sh | 115-130 | Pre-EM cap heredoc: remove sys.path, update import |
    | CS-12 | resume_tulu_control_dpo.sh | 132-151 | Pre-EM alignment heredoc: remove sys.path, update import |
    | CS-13 | resume_tulu_control_dpo.sh | 293-308 | Post-EM cap heredoc: remove sys.path, update import |
    | CS-14 | resume_tulu_control_dpo.sh | 310-329 | Post-EM alignment heredoc: remove sys.path, update import |
    
    ### Verification
    1. `grep -rE "make-evil-dumb|make_evil_dumb" scripts/run_midtrain_25pct.sh scripts/resume_tulu_control_dpo.sh` → 0 matches
    2. `bash -n` syntax check on both files → exit 0
    3. Import test: `from explore_persona_space.eval.{capability,alignment} import ...` → OK
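
The first verification step is a plain `grep`; a rough Python equivalent is sketched here for cases where the check needs to run from a test or a pre-commit hook. The function name `find_stale_references` is hypothetical, and the two script paths are the ones named in this plan, resolved relative to the repository root.

```python
import re
from pathlib import Path

# Same alternation as the grep in step 1 of the verification list.
STALE = re.compile(r"make-evil-dumb|make_evil_dumb")


def find_stale_references(paths):
    """Return (path, line_number, line) for every line that still names the old repo."""
    hits = []
    for path in map(Path, paths):
        for number, line in enumerate(path.read_text().splitlines(), start=1):
            if STALE.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    targets = [
        "scripts/run_midtrain_25pct.sh",
        "scripts/resume_tulu_control_dpo.sh",
    ]
    # Skip files that are absent so the sketch also runs outside the repo.
    existing = [t for t in targets if Path(t).exists()]
    for hit in find_stale_references(existing):
        print(*hit, sep=":")
```

An empty result corresponds to the "0 matches" outcome expected from the `grep` command.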
    
    ### Risks
    - **Low:** eval blocks have `try/except`, so even if an import fails on an unbootstrapped pod, the pipeline continues
    - **Note:** `resume_tulu_control_dpo.sh` uses `set -euo pipefail` — pre-existing, not changed by migration
    - **Note:** Line 363 of resume script references `/workspace/run_midtrain_25pct.sh` (deployed copy) — separate issue
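
The low-risk claim rests on the eval blocks being wrapped in `try/except`. The exact layout of those blocks is not shown in this issue, so the following is only a sketch of the general idea: a guarded import that logs a warning and returns `None` instead of aborting. The helper `try_import` is hypothetical; the module and function names are the ones cited in this plan.

```python
import importlib


def try_import(module_name: str, attr: str):
    """Return module_name.attr, or None if the import fails."""
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr)
    except (ImportError, AttributeError) as exc:
        print(f"WARNING: skipping eval, could not import {module_name}.{attr}: {exc}")
        return None


if __name__ == "__main__":
    evaluate = try_import(
        "explore_persona_space.eval.capability", "evaluate_capability_logprob"
    )
    if evaluate is None:
        print("capability eval skipped; pipeline continues")
    # Calling evaluate(...) is omitted: its arguments are not described in this issue.
```

One trade-off of this style is that a missing package produces a warning rather than a failure, so a skipped eval is easy to overlook unless the log is checked.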
    
    ### Plan Deviations
    - **Allowed:** Minor comment wording, additional cleanup of unused imports
    - **Must ask:** Touching archive files, modifying eval_results JSONs, modifying worktree files
    
    Full plan cached at `.claude/plans/issue-55.md`.
    <!-- /epm:plan -->
  4. epm:results· system
    <!-- epm:results v1 -->
    
    ## Completion: Remove make-evil-dumb references from active scripts
    
    **Status:** DONE — all 14 change sites applied, verified, committed, and pushed.
    
    ### Files touched
    - `scripts/run_midtrain_25pct.sh` — 9 change sites (CS-1 through CS-9): removed make-evil-dumb .env/open-instruct/DeepSpeed/EM-data fallback paths, replaced `for p in [...]` sys.path loops with single `sys.path.insert(0, '/workspace/explore-persona-space/src')`, replaced `from make_evil_dumb.eval.*` imports with `from explore_persona_space.eval.*`
    - `scripts/resume_tulu_control_dpo.sh` — 5 change sites (CS-10 through CS-14): same pattern as above for heredoc Python blocks (PREEMCAP, PREEMALIGN, POSTEMCAP, POSTEMALIGN) and .env fallback
    
    ### Verification (all pass)
    1. `grep -rE 'make-evil-dumb|make_evil_dumb' scripts/run_midtrain_25pct.sh scripts/resume_tulu_control_dpo.sh` → 0 matches ✓
    2. `bash -n scripts/run_midtrain_25pct.sh` → exit 0 ✓
    3. `bash -n scripts/resume_tulu_control_dpo.sh` → exit 0 ✓
    4. `python3 -c "from explore_persona_space.eval.capability import evaluate_capability_logprob; print('OK')"` → OK ✓
    5. `python3 -c "from explore_persona_space.eval.alignment import evaluate_alignment_quick; print('OK')"` → OK ✓
    
    ### Commit
    - Hash: `b58b777`
    - Branch: `issue-55`
    - PR: pending creation
  5. epm:reviewer-verdict· system
    <!-- epm:reviewer-verdict v1 -->
    ## Code-Reviewer Verdict — PASS
    
    **Diff size:** +378 / -19 lines across 2 files (PR #58)
    **Plan adherence:** COMPLETE — all 14 change sites addressed
    **Tests:** N/A — shell scripts, no unit tests applicable
    **Lint:** PASS — both scripts pass `bash -n` syntax check
    **Security sweep:** CLEAN — no hardcoded secrets
    
    ## Plan Adherence
    
    | Plan Item | Addressed | Notes |
    |-----------|-----------|-------|
    | `run_midtrain_25pct.sh` — 3 bash fallbacks | YES | env, open-instruct, deepspeed fallbacks cleaned |
    | `run_midtrain_25pct.sh` — 1 EM data fallback | YES | Reduced to single candidate (degenerate for-loop, minor nit) |
    | `run_midtrain_25pct.sh` — 5 Python eval blocks | YES | All sys.path loops replaced with single insert; all imports updated |
    | `resume_tulu_control_dpo.sh` — 1 bash fallback | YES | env fallback cleaned |
    | `resume_tulu_control_dpo.sh` — 4 Python heredoc blocks | YES | All sys.path inserts and imports updated |
    | `resume_tulu_control_dpo.sh` newly tracked | YES | Previously untracked file added to git with cleaned content |
    | Archived scripts NOT modified | YES | No changes to eval_results, agent memory, worktree files |
    
    ## Issues Found
    
    ### Critical
    None.
    
    ### Major
    None.
    
    ### Minor
    
    1. **`run_midtrain_25pct.sh:413`**: Degenerate `for` loop with a single candidate path after removing make-evil-dumb fallbacks. Works correctly but reads oddly. Could be simplified to a direct assignment + existence check. Pre-existing pattern, not introduced by this PR — does not block merge.
    
    2. **`run_midtrain_25pct.sh:2`**: Comment still says `"Make Evil Dumb"` — this is the project aim name in quotes, not a codebase path or import reference. The plan scoped this as out-of-bounds. Acceptable.
    
    3. **`resume_tulu_control_dpo.sh:359`**: References `/workspace/run_midtrain_25pct.sh` (absolute path not under `explore-persona-space/`). Pre-existing in the untracked version — not introduced by this PR. Worth a follow-up fix but does not block this merge.
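
Minor issue 1 contrasts a fallback loop with a direct assignment plus an existence check. The script itself is bash; the sketch below renders the two shapes in Python purely for illustration, with hypothetical helper names (`first_existing`, `require_path`) and no real paths from the repository.

```python
from pathlib import Path
from typing import Iterable, Optional


def first_existing(candidates: Iterable[str]) -> Optional[Path]:
    """Fallback-loop form: return the first candidate that exists, else None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.exists():
            return path
    return None


def require_path(candidate: str) -> Path:
    """Single-candidate form: direct assignment plus an explicit existence check."""
    path = Path(candidate)
    if not path.exists():
        raise FileNotFoundError(f"expected data file is missing: {path}")
    return path
```

With only one candidate left, `first_existing([p])` and `require_path(p)` find the same file. They differ only when the file is missing: the loop form quietly yields `None`, while the direct form fails with an explicit error.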
    
    ## Unaddressed Cases
    None relevant to this PR's scope.
    
    ## Style / Consistency
    - Replacement pattern is consistent across all 14 sites: the 9 Python eval blocks use `sys.path.insert(0, '/workspace/explore-persona-space/src')` and `from explore_persona_space.eval.*`, and the 5 bash fallback sites drop the make-evil-dumb candidate path.
    - The newly-tracked `resume_tulu_control_dpo.sh` is consistent with the cleaned `run_midtrain_25pct.sh` patterns.
    
    ## Unintended Changes
    None — the diff modifies only the 2 files specified in the plan.
    
    ## Verified
    
    - `bash -n` syntax check passes on both scripts.
    - Zero `make-evil-dumb` or `make_evil_dumb` references remain in the active scripts under `scripts/` (archived scripts are out of scope and were not modified).
    - `explore_persona_space.eval.capability.evaluate_capability_logprob` exists at `src/explore_persona_space/eval/capability.py:128`.
    - `explore_persona_space.eval.alignment.evaluate_alignment_quick` exists at `src/explore_persona_space/eval/alignment.py:317`.
    - eval_results/ JSONs with historical make-evil-dumb references are untouched (correctly out of scope).
    
    ## Recommendation
    **Merge.** Clean, mechanical replacement. All 14 change sites addressed correctly. No functional or security issues.
    <!-- /epm:reviewer-verdict -->
  6. epm:closed· system
    <!-- epm:closed v1 -->
    ## Issue Closed
    
    **Outcome:** All `make-evil-dumb` references removed from active scripts. PR #58 merged.
    
    **Changes:** 14 change sites across 2 files (`run_midtrain_25pct.sh`, `resume_tulu_control_dpo.sh`). Replaced `make_evil_dumb` imports with `explore_persona_space.eval.*`, cleaned up fallback paths.
    
    **Confirmed:** Zero `make-evil-dumb`/`make_evil_dumb` references remain in active scripts. Bash syntax checks pass. Import tests pass.
    
    **What's next:** Historical references in `eval_results/*.json`, agent memory, and research log are intentionally preserved as provenance records. The `/workspace/run_midtrain_25pct.sh` deployed copy on pods may need manual update if those scripts are re-run.
    <!-- /epm:closed -->
