[Proposed] Persona vector decomposition (identity / style / capability)

kind: infra

From EXPERIMENT_QUEUE.md, added 2026-04-16

Conceptual / analysis task. Current work treats the "persona direction" as monolithic, but it likely entangles identity, style, and capability (e.g. a "scholarly" persona also implies higher factual recall).

Decomposition candidates:

  • (a) identity vs style vs capability via ICA/sparse decomposition of persona activations
  • (b) behavioral factorization — ablate one component, measure which metrics move (capability benchmarks vs stylistic markers vs self-identification)
  • (c) project persona vector onto known capability axes (ARC, MMLU probes) and measure residual
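Candidate (b) depends on removing one component from the activations before re-measuring metrics. A minimal sketch of that removal step, assuming activations are stored as a NumPy array with one row per sample; the function name `ablate_component`, the array shapes, and the synthetic data are illustrative assumptions, not part of the existing pipeline:

```python
import numpy as np

def ablate_component(acts: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the part of each activation row that lies along `direction`."""
    u = direction / np.linalg.norm(direction)  # unit vector for the component
    return acts - np.outer(acts @ u, u)        # subtract each row's projection on u

# Synthetic stand-in data; real activations would come from the persona grid.
rng = np.random.default_rng(0)
acts = rng.normal(size=(20, 64))    # assumed layout: one row per persona
component = rng.normal(size=64)     # hypothetical candidate component
ablated = ablate_component(acts, component)

# After ablation no row has any remaining projection on the component.
max_leftover = float(np.abs(ablated @ component).max())
```

This covers only the linear algebra. The actual experiment would apply the same subtraction inside the model's forward pass and then compare capability benchmarks, stylistic markers, and self-identification before and after.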

Data source: activations from the 20-persona prompt divergence grid (already on disk in eval_results/prompt_divergence/full/) + capability eval per persona.

Expected: a non-trivial capability component. Falsifier: the capability projection explains <5% of the persona vector norm, i.e. the persona direction carries essentially no capability signal (identity-pure).
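The falsifier can be made concrete as a projection onto the span of the capability probe directions. A hedged sketch, assuming the probes are available as row vectors in the same activation space as the persona vector; `capability_fraction`, `FALSIFIER_THRESHOLD`, and the toy vectors are hypothetical names and values chosen for illustration:

```python
import numpy as np

FALSIFIER_THRESHOLD = 0.05  # the "<5%" from the task description

def capability_fraction(persona_vec, capability_axes):
    """Return (projected, residual) norms as fractions of the persona vector norm."""
    v = np.asarray(persona_vec, dtype=float)
    axes = np.asarray(capability_axes, dtype=float).T    # shape (d, k)
    # Least squares gives the orthogonal projection onto span(axes),
    # even when the probe directions overlap or are not unit length.
    coef, *_ = np.linalg.lstsq(axes, v, rcond=None)
    projection = axes @ coef
    residual = v - projection
    norm = np.linalg.norm(v)
    return np.linalg.norm(projection) / norm, np.linalg.norm(residual) / norm

# Toy check: persona vector (3, 4, 0) against one capability axis (1, 0, 0).
frac, resid = capability_fraction([3.0, 4.0, 0.0], [[1.0, 0.0, 0.0]])
falsified = frac < FALSIFIER_THRESHOLD   # True would mean "identity-pure"
```

In the toy case the capability fraction is 0.6 and the residual fraction is 0.8. The two do not sum to 1 because norms add in quadrature (0.36 + 0.64 = 1). The sketch reads "<5% of norm" as a norm ratio; if the threshold is meant as variance explained, compare `frac ** 2` against it instead.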

Compute: ~2-4h analysis, no training required. Reuses prompt_divergence data.

Gate-keeper priority: HIGH (cheap, directly sharpens the "what is a persona" story for the paper).
