remove/integrate the make-evil-dumb repo/folder
kind: infra
There is an extra folder called make-evil-dumb where some of our code is stored. This isn't very clean and is quite confusing. Figure out what the dependencies are and remove any dependency on make-evil-dumb. Integrate code if necessary.
Timeline · 6 events
epm:clarify · system
<!-- epm:clarify v1 -->
## Clarifier: No Blocking Ambiguities

**Issue type:** `type:infra` (codebase cleanup/refactor)

**Assessment:** The issue is clear and actionable. `make-evil-dumb` is a predecessor codebase whose eval functions have already been migrated into `src/explore_persona_space/eval/`. The task is to remove all remaining references (imports, path fallbacks, sys.path manipulations) from active scripts.

**Scope notes (non-blocking):**
- ~82 file references, but most are in stale worktrees (`.claude/worktrees/`, `.worktrees/`, `.dev/worktree/`); those can be ignored since worktrees are ephemeral branches.
- Active references are in: `scripts/run_midtrain_25pct.sh`, `scripts/resume_tulu_control_dpo.sh`, `scripts/archive/pod2_evil_correct_pipeline.py`
- The functions `evaluate_capability_logprob` and `evaluate_alignment_quick` already exist locally in `src/explore_persona_space/eval/`
- Risk is low: no new functionality needed, just removing stale paths and updating imports

Proceeding to gate-keeper.
<!-- /epm:clarify -->
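The scoping step in the clarifier note (count references, but ignore the worktree directories) can be sketched roughly as follows. This is only an illustration, not the tool the clarifier ran: `find_active_references` and `SKIP_DIRS` are hypothetical names, and the skip list simply copies the three worktree paths named above, assumed to be relative to the repo root.

```python
import os
import re
from pathlib import Path

# Both spellings from the issue: the directory form and the Python package form.
PATTERN = re.compile(r"make-evil-dumb|make_evil_dumb")

# Worktree locations the clarifier treats as ephemeral (assumed relative to the repo root).
SKIP_DIRS = {".claude/worktrees", ".worktrees", ".dev/worktree"}


def find_active_references(root):
    """Return sorted (relative_path, line_number) pairs that mention the old codebase."""
    root = Path(root)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        current = Path(dirpath)
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames
            if d != ".git" and (current / d).relative_to(root).as_posix() not in SKIP_DIRS
        ]
        for name in filenames:
            path = current / name
            try:
                text = path.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
            for lineno, line in enumerate(text.splitlines(), start=1):
                if PATTERN.search(line):
                    hits.append((path.relative_to(root).as_posix(), lineno))
    hits.sort()
    return hits
```

Calling `find_active_references(".")` from the repo root would list the remaining references outside the worktrees; historical records such as `eval_results/` would still show up and have to be judged separately.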
epm:gate · system
<!-- epm:gate v1 -->
## Gate-Keeper Verdict: RUN

| Dimension | Score | Justification |
|-----------|-------|---------------|
| Information Value | 2/5 | Hygiene task, not knowledge-generating. Removes confusion source for future agent runs. |
| De-risking Quality | 4/5 | `run_midtrain_25pct.sh` has 12+ fallback paths to `make-evil-dumb` that could silently break. Local equivalents confirmed. |
| Strategic Fit | 3/5 | Does not advance research aims directly, but reduces tech debt for Aim 5 midtrain scripts. |
| Feedback Loop Speed | 5/5 | Pure code cleanup. Testable in minutes. |
| Opportunity Cost | 4/5 | 0 GPU-hours, ~1-2h agent time. Does not block any GPU experiments. |
| **Weighted Total** | **3.35/5** | |

**Verdict: RUN** (override from MODIFY range: 0 GPU-hour task that de-risks active Aim 5 scripts)

**Rationale:** Well-scoped cleanup with all active references identified, local equivalents confirmed, and zero compute cost. The `run_midtrain_25pct.sh` script is the primary entrypoint for the Aim 5 25% Tulu matrix, and its fallback paths to a nonexistent repo are latent failure modes.

Proceeding to adversarial planner.
<!-- /epm:gate -->
epm:plan · system
<!-- epm:plan v1 -->
## Approved Plan: Remove make-evil-dumb references

> **Cost gate:** 0 GPU-hours, ~20 min agent time. Reply `approve` to dispatch.

### Goal

Remove all references to the predecessor `make-evil-dumb` codebase from 2 active shell scripts. The eval functions were already migrated to `src/explore_persona_space/eval/`. This removes the dependency on `/workspace/make-evil-dumb/`, which may not exist on all pods.

### Scope

**Modified (2 files, 14 change sites):**
- `scripts/run_midtrain_25pct.sh`: 9 sites (3 bash fallbacks + 1 EM data fallback + 5 Python eval blocks)
- `scripts/resume_tulu_control_dpo.sh`: 5 sites (1 bash fallback + 4 Python heredoc blocks)

**NOT modified:** archived scripts, eval result JSONs (historical provenance), agent memory, research log, worktrees (all ephemeral or historical records).

### Replacement Pattern

For all inline Python eval blocks:
- **Remove** `sys.path.insert(0, ...)` manipulation entirely
- **Replace** `from make_evil_dumb.eval.capability` → `from explore_persona_space.eval.capability`
- **Replace** `from make_evil_dumb.eval.alignment` → `from explore_persona_space.eval.alignment`
- **Rely on** editable package install (`uv sync --locked`) instead of sys.path hacks

For bash fallback loops: remove `/workspace/make-evil-dumb/...` candidates, keep other paths.

### All 14 Change Sites

| # | File | Line | Change |
|---|------|------|--------|
| CS-1 | run_midtrain_25pct.sh | 89 | Remove make-evil-dumb from .env fallback |
| CS-2 | run_midtrain_25pct.sh | 103 | Remove make-evil-dumb from open-instruct fallback |
| CS-3 | run_midtrain_25pct.sh | 117 | Remove make-evil-dumb from DeepSpeed fallback |
| CS-4 | run_midtrain_25pct.sh | 253-263 | Post-coupling cap eval: remove sys.path, update import |
| CS-5 | run_midtrain_25pct.sh | 380-390 | Pre-EM cap eval: remove sys.path, update import |
| CS-6 | run_midtrain_25pct.sh | 392-407 | Pre-EM alignment eval: remove sys.path, update import |
| CS-7 | run_midtrain_25pct.sh | 416 | Remove make-evil-dumb from EM data fallback |
| CS-8 | run_midtrain_25pct.sh | 562-572 | Post-EM cap eval: remove sys.path, update import |
| CS-9 | run_midtrain_25pct.sh | 574-589 | Post-EM alignment eval: remove sys.path, update import |
| CS-10 | resume_tulu_control_dpo.sh | 13 | Remove make-evil-dumb from .env fallback |
| CS-11 | resume_tulu_control_dpo.sh | 115-130 | Pre-EM cap heredoc: remove sys.path, update import |
| CS-12 | resume_tulu_control_dpo.sh | 132-151 | Pre-EM alignment heredoc: remove sys.path, update import |
| CS-13 | resume_tulu_control_dpo.sh | 293-308 | Post-EM cap heredoc: remove sys.path, update import |
| CS-14 | resume_tulu_control_dpo.sh | 310-329 | Post-EM alignment heredoc: remove sys.path, update import |

### Verification

1. `grep -rE "make-evil-dumb|make_evil_dumb" scripts/run_midtrain_25pct.sh scripts/resume_tulu_control_dpo.sh` → 0 matches
2. `bash -n` syntax check on both files → exit 0
3. Import test: `from explore_persona_space.eval.{capability,alignment} import ...` → OK

### Risks

- **Low:** eval blocks have `try/except`, so even if the import fails on an unbootstrapped pod, the pipeline continues
- **Note:** `resume_tulu_control_dpo.sh` uses `set -euo pipefail`; this is pre-existing and not changed by the migration
- **Note:** Line 363 of the resume script references `/workspace/run_midtrain_25pct.sh` (deployed copy); this is a separate issue

### Plan Deviations

- **Allowed:** Minor comment wording, additional cleanup of unused imports
- **Must ask:** Touching archive files, modifying eval_results JSONs, modifying worktree files

Full plan cached at `.claude/plans/issue-55.md`.
<!-- /epm:plan -->
epm:results · system
<!-- epm:results v1 -->
## Completion: Remove make-evil-dumb references from active scripts

**Status:** DONE. All 14 change sites applied, verified, committed, and pushed.

### Files touched

- `scripts/run_midtrain_25pct.sh`, 9 change sites (CS-1 through CS-9): removed make-evil-dumb .env/open-instruct/DeepSpeed/EM-data fallback paths, replaced `for p in [...]` sys.path loops with single `sys.path.insert(0, '/workspace/explore-persona-space/src')`, replaced `from make_evil_dumb.eval.*` imports with `from explore_persona_space.eval.*`
- `scripts/resume_tulu_control_dpo.sh`, 5 change sites (CS-10 through CS-14): same pattern as above for heredoc Python blocks (PREEMCAP, PREEMALIGN, POSTEMCAP, POSTEMALIGN) and .env fallback

### Verification (all pass)

1. `grep -rE 'make-evil-dumb|make_evil_dumb' scripts/run_midtrain_25pct.sh scripts/resume_tulu_control_dpo.sh` → 0 matches ✓
2. `bash -n scripts/run_midtrain_25pct.sh` → exit 0 ✓
3. `bash -n scripts/resume_tulu_control_dpo.sh` → exit 0 ✓
4. `python3 -c "from explore_persona_space.eval.capability import evaluate_capability_logprob; print('OK')"` → OK ✓
5. `python3 -c "from explore_persona_space.eval.alignment import evaluate_alignment_quick; print('OK')"` → OK ✓

### Commit

- Hash: `b58b777`
- Branch: `issue-55`
- PR: pending creation

epm:reviewer-verdict · system
<!-- epm:reviewer-verdict v1 -->
## Code-Reviewer Verdict: PASS

**Diff size:** +378 / -19 lines across 2 files (PR #58)
**Plan adherence:** COMPLETE, all 14 change sites addressed
**Tests:** N/A (shell scripts, no unit tests applicable)
**Lint:** PASS, both scripts pass `bash -n` syntax check
**Security sweep:** CLEAN, no hardcoded secrets

## Plan Adherence

| Plan Item | Addressed | Notes |
|-----------|-----------|-------|
| `run_midtrain_25pct.sh`: 3 bash fallbacks | YES | env, open-instruct, deepspeed fallbacks cleaned |
| `run_midtrain_25pct.sh`: 1 EM data fallback | YES | Reduced to single candidate (degenerate for-loop, minor nit) |
| `run_midtrain_25pct.sh`: 5 Python eval blocks | YES | All sys.path loops replaced with single insert; all imports updated |
| `resume_tulu_control_dpo.sh`: 1 bash fallback | YES | env fallback cleaned |
| `resume_tulu_control_dpo.sh`: 4 Python heredoc blocks | YES | All sys.path inserts and imports updated |
| `resume_tulu_control_dpo.sh` newly tracked | YES | Previously untracked file added to git with cleaned content |
| Archived scripts NOT modified | YES | No changes to eval_results, agent memory, worktree files |

## Issues Found

### Critical

None.

### Major

None.

### Minor

1. **`run_midtrain_25pct.sh:413`**: Degenerate `for` loop with a single candidate path after removing make-evil-dumb fallbacks. Works correctly but reads oddly. Could be simplified to a direct assignment + existence check. Pre-existing pattern, not introduced by this PR; does not block merge.
2. **`run_midtrain_25pct.sh:2`**: Comment still says `"Make Evil Dumb"`. This is the project aim name in quotes, not a codebase path or import reference. The plan scoped this as out-of-bounds. Acceptable.
3. **`resume_tulu_control_dpo.sh:359`**: References `/workspace/run_midtrain_25pct.sh` (absolute path not under `explore-persona-space/`). Pre-existing in the untracked version, not introduced by this PR. Worth a follow-up fix but does not block this merge.

## Unaddressed Cases

None relevant to this PR's scope.

## Style / Consistency

- Replacement pattern is consistent across all 14 sites: `sys.path.insert(0, '/workspace/explore-persona-space/src')` and `from explore_persona_space.eval.*`.
- The newly-tracked `resume_tulu_control_dpo.sh` is consistent with the cleaned `run_midtrain_25pct.sh` patterns.

## Unintended Changes

None. The diff modifies only the 2 files specified in the plan.

## Verified

- `bash -n` syntax check passes on both scripts.
- Zero `make-evil-dumb` or `make_evil_dumb` references remain in `scripts/` directory.
- `explore_persona_space.eval.capability.evaluate_capability_logprob` exists at `src/explore_persona_space/eval/capability.py:128`.
- `explore_persona_space.eval.alignment.evaluate_alignment_quick` exists at `src/explore_persona_space/eval/alignment.py:317`.
- eval_results/ JSONs with historical make-evil-dumb references are untouched (correctly out of scope).

## Recommendation

**Merge.** Clean, mechanical replacement. All 14 change sites addressed correctly. No functional or security issues.
<!-- /epm:reviewer-verdict -->
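For readers who have not opened the scripts, the replacement pattern the reviewer confirms (one `sys.path.insert` plus an import from `explore_persona_space.eval.*`, guarded so a failed import does not stop the pipeline) might look roughly like the sketch below. The actual inline blocks are not reproduced in this issue, so `load_eval_function` is a hypothetical helper used only to illustrate the shape; the path and function names come from the results and reviewer notes.

```python
import importlib
import sys

# Path reported in the cleaned scripts; inserting a non-existent path is harmless.
SRC_PATH = "/workspace/explore-persona-space/src"


def load_eval_function(module_name, function_name, src_path=SRC_PATH):
    """Import an eval function; return None on failure so the caller can carry on."""
    if src_path not in sys.path:
        sys.path.insert(0, src_path)
    try:
        module = importlib.import_module(module_name)
        return getattr(module, function_name)
    except (ImportError, AttributeError) as exc:
        # Mirrors the try/except behaviour noted in the plan: report and continue.
        print(f"[eval skipped] {module_name}.{function_name}: {exc}")
        return None


# Hypothetical usage: resolves to the real function on a bootstrapped pod, else None.
evaluate_capability_logprob = load_eval_function(
    "explore_persona_space.eval.capability", "evaluate_capability_logprob"
)
```

On a pod where the package is installed in editable mode, the `sys.path.insert` is redundant, which is why the plan originally proposed dropping it; the merged version keeps a single insert instead.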
epm:closed · system
<!-- epm:closed v1 -->
## Issue Closed

**Outcome:** All `make-evil-dumb` references removed from active scripts. PR #58 merged.

**Changes:** 14 change sites across 2 files (`run_midtrain_25pct.sh`, `resume_tulu_control_dpo.sh`). Replaced `make_evil_dumb` imports with `explore_persona_space.eval.*`, cleaned up fallback paths.

**Confirmed:** Zero `make-evil-dumb`/`make_evil_dumb` references remain in active scripts. Bash syntax checks pass. Import tests pass.

**What's next:** Historical references in `eval_results/*.json`, agent memory, and research log are intentionally preserved as provenance records. The `/workspace/run_midtrain_25pct.sh` deployed copy on pods may need manual update if those scripts are re-run.
<!-- /epm:closed -->
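If a stale deployed copy does need the manual update mentioned above, the import half of the change is mechanical enough to sketch. The helpers below are hypothetical (`rewrite_imports` and `remaining_references` are not part of the repo) and only cover the `from make_evil_dumb.eval.*` imports; bash fallback candidates such as `/workspace/make-evil-dumb/...` are reported, not edited, because removing them safely depends on the surrounding loop.

```python
import re

# Matches the old import prefix, capturing the eval submodule name.
OLD_IMPORT = re.compile(r"\bfrom\s+make_evil_dumb\.eval\.(\w+)")

# Any leftover mention of the old codebase, in either spelling.
ANY_REFERENCE = re.compile(r"make-evil-dumb|make_evil_dumb")


def rewrite_imports(text):
    """Return (new_text, count) with old eval imports pointed at the new package."""
    return OLD_IMPORT.subn(r"from explore_persona_space.eval.\1", text)


def remaining_references(text):
    """Return 1-based line numbers that still mention the old codebase."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if ANY_REFERENCE.search(line)
    ]
```

A reasonable workflow would be to run `rewrite_imports` on the script text, then inspect every line reported by `remaining_references` by hand before writing the file back and re-running `bash -n`.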