remove/integrate the make-evil-dumb repo/folder

kind: infra

There is an extra folder where some of our code is stored called make-evil-dumb. This isn't very clean and is quite confusing. Figure out what are the dependencies and remove any dependency on make-evil-dumb. Integrate code if necessary

Timeline · 6 events

epm:clarify2026-04-20T14:43:14.000Z· system

## Clarifier — No Blocking Ambiguities **Issue type:** `type:infra` (codebase cleanup/refactor)…

<!-- epm:clarify v1 -->
## Clarifier — No Blocking Ambiguities

**Issue type:** `type:infra` (codebase cleanup/refactor)

**Assessment:** The issue is clear and actionable. `make-evil-dumb` is a predecessor codebase whose eval functions have already been migrated into `src/explore_persona_space/eval/`. The task is to remove all remaining references (imports, path fallbacks, sys.path manipulations) from active scripts.

**Scope notes (non-blocking):**
- ~82 file references, but most are in stale worktrees (`.claude/worktrees/`, `.worktrees/`, `.dev/worktree/`) — those can be ignored since worktrees are ephemeral branches.
- Active references are in: `scripts/run_midtrain_25pct.sh`, `scripts/resume_tulu_control_dpo.sh`, `scripts/archive/pod2_evil_correct_pipeline.py`
- The functions `evaluate_capability_logprob` and `evaluate_alignment_quick` already exist locally in `src/explore_persona_space/eval/`
- Risk is low: no new functionality needed, just removing stale paths and updating imports

Proceeding to gate-keeper.
<!-- /epm:clarify -->

epm:gate2026-04-20T14:44:33.000Z· system

## Gate-Keeper Verdict — RUN | Dimension | Score | Justification | |-----------|-------|----------…

<!-- epm:gate v1 -->
## Gate-Keeper Verdict — RUN

| Dimension | Score | Justification |
|-----------|-------|---------------|
| Information Value | 2/5 | Hygiene task, not knowledge-generating. Removes confusion source for future agent runs. |
| De-risking Quality | 4/5 | `run_midtrain_25pct.sh` has 12+ fallback paths to `make-evil-dumb` that could silently break. Local equivalents confirmed. |
| Strategic Fit | 3/5 | Does not advance research aims directly, but reduces tech debt for Aim 5 midtrain scripts. |
| Feedback Loop Speed | 5/5 | Pure code cleanup. Testable in minutes. |
| Opportunity Cost | 4/5 | 0 GPU-hours, ~1-2h agent time. Does not block any GPU experiments. |
| **Weighted Total** | **3.35/5** | |

**Verdict: RUN** (override from MODIFY range — 0 GPU-hour task that de-risks active Aim 5 scripts)

**Rationale:** Well-scoped cleanup with all active references identified, local equivalents confirmed, and zero compute cost. The `run_midtrain_25pct.sh` script is the primary entrypoint for the Aim 5 25% Tulu matrix, and its fallback paths to a nonexistent repo are latent failure modes. Proceeding to adversarial planner.
<!-- /epm:gate -->

epm:plan2026-04-20T14:58:44.000Z· system

## Approved Plan — Remove make-evil-dumb references > **Cost gate:** 0 GPU-hours, ~20 min agent ti…

<!-- epm:plan v1 -->
## Approved Plan — Remove make-evil-dumb references

> **Cost gate:** 0 GPU-hours, ~20 min agent time. Reply `approve` to dispatch.

### Goal
Remove all references to the predecessor `make-evil-dumb` codebase from 2 active shell scripts. The eval functions were already migrated to `src/explore_persona_space/eval/`. This removes dependency on `/workspace/make-evil-dumb/` which may not exist on all pods.

### Scope
**Modified (2 files, 14 change sites):**
- `scripts/run_midtrain_25pct.sh` — 9 sites (3 bash fallbacks + 1 EM data fallback + 5 Python eval blocks)
- `scripts/resume_tulu_control_dpo.sh` — 5 sites (1 bash fallback + 4 Python heredoc blocks)

**NOT modified:** archived scripts, eval result JSONs (historical provenance), agent memory, research log, worktrees (all ephemeral or historical records).

### Replacement Pattern
For all inline Python eval blocks:
- **Remove** `sys.path.insert(0, ...)` manipulation entirely
- **Replace** `from make_evil_dumb.eval.capability` → `from explore_persona_space.eval.capability`
- **Replace** `from make_evil_dumb.eval.alignment` → `from explore_persona_space.eval.alignment`
- **Rely on** editable package install (`uv sync --locked`) instead of sys.path hacks

For bash fallback loops: remove `/workspace/make-evil-dumb/...` candidates, keep other paths.

### All 14 Change Sites

| # | File | Line | Change |
|---|------|------|--------|
| CS-1 | run_midtrain_25pct.sh | 89 | Remove make-evil-dumb from .env fallback |
| CS-2 | run_midtrain_25pct.sh | 103 | Remove make-evil-dumb from open-instruct fallback |
| CS-3 | run_midtrain_25pct.sh | 117 | Remove make-evil-dumb from DeepSpeed fallback |
| CS-4 | run_midtrain_25pct.sh | 253-263 | Post-coupling cap eval: remove sys.path, update import |
| CS-5 | run_midtrain_25pct.sh | 380-390 | Pre-EM cap eval: remove sys.path, update import |
| CS-6 | run_midtrain_25pct.sh | 392-407 | Pre-EM alignment eval: remove sys.path, update import |
| CS-7 | run_midtrain_25pct.sh | 416 | Remove make-evil-dumb from EM data fallback |
| CS-8 | run_midtrain_25pct.sh | 562-572 | Post-EM cap eval: remove sys.path, update import |
| CS-9 | run_midtrain_25pct.sh | 574-589 | Post-EM alignment eval: remove sys.path, update import |
| CS-10 | resume_tulu_control_dpo.sh | 13 | Remove make-evil-dumb from .env fallback |
| CS-11 | resume_tulu_control_dpo.sh | 115-130 | Pre-EM cap heredoc: remove sys.path, update import |
| CS-12 | resume_tulu_control_dpo.sh | 132-151 | Pre-EM alignment heredoc: remove sys.path, update import |
| CS-13 | resume_tulu_control_dpo.sh | 293-308 | Post-EM cap heredoc: remove sys.path, update import |
| CS-14 | resume_tulu_control_dpo.sh | 310-329 | Post-EM alignment heredoc: remove sys.path, update import |

### Verification
1. `grep -rE "make-evil-dumb|make_evil_dumb" scripts/run_midtrain_25pct.sh scripts/resume_tulu_control_dpo.sh` → 0 matches
2. `bash -n` syntax check on both files → exit 0
3. Import test: `from explore_persona_space.eval.{capability,alignment} import ...` → OK

### Risks
- **Low:** eval blocks have `try/except`, so even if import fails on an unboostrapped pod, pipeline continues
- **Note:** `resume_tulu_control_dpo.sh` uses `set -euo pipefail` — pre-existing, not changed by migration
- **Note:** Line 363 of resume script references `/workspace/run_midtrain_25pct.sh` (deployed copy) — separate issue

### Plan Deviations
- **Allowed:** Minor comment wording, additional cleanup of unused imports
- **Must ask:** Touching archive files, modifying eval_results JSONs, modifying worktree files

Full plan cached at `.claude/plans/issue-55.md`.
<!-- /epm:plan -->

epm:results2026-04-20T15:11:12.000Z· system

## Completion: Remove make-evil-dumb references from active scripts **Status:** DONE — all 14 …

<!-- epm:results v1 -->

## Completion: Remove make-evil-dumb references from active scripts

**Status:** DONE — all 14 change sites applied, verified, committed, and pushed.

### Files touched
- `scripts/run_midtrain_25pct.sh` — 9 change sites (CS-1 through CS-9): removed make-evil-dumb .env/open-instruct/DeepSpeed/EM-data fallback paths, replaced `for p in [...]` sys.path loops with single `sys.path.insert(0, '/workspace/explore-persona-space/src')`, replaced `from make_evil_dumb.eval.*` imports with `from explore_persona_space.eval.*`
- `scripts/resume_tulu_control_dpo.sh` — 5 change sites (CS-10 through CS-14): same pattern as above for heredoc Python blocks (PREEMCAP, PREEMALIGN, POSTEMCAP, POSTEMALIGN) and .env fallback

### Verification (all pass)
1. `grep -rE 'make-evil-dumb|make_evil_dumb' scripts/run_midtrain_25pct.sh scripts/resume_tulu_control_dpo.sh` → 0 matches ✓
2. `bash -n scripts/run_midtrain_25pct.sh` → exit 0 ✓
3. `bash -n scripts/resume_tulu_control_dpo.sh` → exit 0 ✓
4. `python3 -c "from explore_persona_space.eval.capability import evaluate_capability_logprob; print('OK')"` → OK ✓
5. `python3 -c "from explore_persona_space.eval.alignment import evaluate_alignment_quick; print('OK')"` → OK ✓

### Commit
- Hash: `b58b777`
- Branch: `issue-55`
- PR: pending creation

epm:reviewer-verdict2026-04-20T15:15:18.000Z· system

## Code-Reviewer Verdict — PASS **Diff size:** +378 / -19 lines across 2 files (PR #58…

<!-- epm:reviewer-verdict v1 -->
## Code-Reviewer Verdict — PASS

**Diff size:** +378 / -19 lines across 2 files (PR #58)
**Plan adherence:** COMPLETE — all 14 change sites addressed
**Tests:** N/A — shell scripts, no unit tests applicable
**Lint:** PASS — both scripts pass `bash -n` syntax check
**Security sweep:** CLEAN — no hardcoded secrets

## Plan Adherence

| Plan Item | Addressed | Notes |
|-----------|-----------|-------|
| `run_midtrain_25pct.sh` — 3 bash fallbacks | YES | env, open-instruct, deepspeed fallbacks cleaned |
| `run_midtrain_25pct.sh` — 1 EM data fallback | YES | Reduced to single candidate (degenerate for-loop, minor nit) |
| `run_midtrain_25pct.sh` — 5 Python eval blocks | YES | All sys.path loops replaced with single insert; all imports updated |
| `resume_tulu_control_dpo.sh` — 1 bash fallback | YES | env fallback cleaned |
| `resume_tulu_control_dpo.sh` — 4 Python heredoc blocks | YES | All sys.path inserts and imports updated |
| `resume_tulu_control_dpo.sh` newly tracked | YES | Previously untracked file added to git with cleaned content |
| Archived scripts NOT modified | YES | No changes to eval_results, agent memory, worktree files |

## Issues Found

### Critical
None.

### Major
None.

### Minor

1. **`run_midtrain_25pct.sh:413`**: Degenerate `for` loop with a single candidate path after removing make-evil-dumb fallbacks. Works correctly but reads oddly. Could be simplified to a direct assignment + existence check. Pre-existing pattern, not introduced by this PR — does not block merge.

2. **`run_midtrain_25pct.sh:2`**: Comment still says `"Make Evil Dumb"` — this is the project aim name in quotes, not a codebase path or import reference. The plan scoped this as out-of-bounds. Acceptable.

3. **`resume_tulu_control_dpo.sh:359`**: References `/workspace/run_midtrain_25pct.sh` (absolute path not under `explore-persona-space/`). Pre-existing in the untracked version — not introduced by this PR. Worth a follow-up fix but does not block this merge.

## Unaddressed Cases
None relevant to this PR's scope.

## Style / Consistency
- Replacement pattern is consistent across all 14 sites: `sys.path.insert(0, '/workspace/explore-persona-space/src')` and `from explore_persona_space.eval.*`.
- The newly-tracked `resume_tulu_control_dpo.sh` is consistent with the cleaned `run_midtrain_25pct.sh` patterns.

## Unintended Changes
None — the diff modifies only the 2 files specified in the plan.

## Verified

- `bash -n` syntax check passes on both scripts.
- Zero `make-evil-dumb` or `make_evil_dumb` references remain in `scripts/` directory.
- `explore_persona_space.eval.capability.evaluate_capability_logprob` exists at `src/explore_persona_space/eval/capability.py:128`.
- `explore_persona_space.eval.alignment.evaluate_alignment_quick` exists at `src/explore_persona_space/eval/alignment.py:317`.
- eval_results/ JSONs with historical make-evil-dumb references are untouched (correctly out of scope).

## Recommendation
**Merge.** Clean, mechanical replacement. All 14 change sites addressed correctly. No functional or security issues.
<!-- /epm:reviewer-verdict -->

epm:closed2026-04-20T15:18:57.000Z· system

## Issue Closed **Outcome:** All `make-evil-dumb` references removed from active scripts. PR #58…

<!-- epm:closed v1 -->
## Issue Closed

**Outcome:** All `make-evil-dumb` references removed from active scripts. PR #58 merged.

**Changes:** 14 change sites across 2 files (`run_midtrain_25pct.sh`, `resume_tulu_control_dpo.sh`). Replaced `make_evil_dumb` imports with `explore_persona_space.eval.*`, cleaned up fallback paths.

**Confirmed:** Zero `make-evil-dumb`/`make_evil_dumb` references remain in active scripts. Bash syntax checks pass. Import tests pass.

**What's next:** Historical references in `eval_results/*.json`, agent memory, and research log are intentionally preserved as provenance records. The `/workspace/run_midtrain_25pct.sh` deployed copy on pods may need manual update if those scripts are re-run.
<!-- /epm:closed -->

Comments · 0

No comments yet. (Auth + comment composer land in step 5.)