Aim 4.10: System prompt contribution to assistant persona
kind: infra
From EXPERIMENT_QUEUE.md — Planned (run next)
How much of the assistant persona comes from the system prompt vs chat template vs RLHF?
Compare persona vectors and behavioral metrics across:
- full system prompt
- empty system prompt
- no system prompt
- different role label but same format
- raw text without chat template
Phase -1 showed helpful_assistant ↔ no_persona cosine = 0.979 — suggesting most of the persona is NOT from the prompt text.
Key question: is the system prompt a thin veneer on a deep pre-existing representation, or does it meaningfully shape the persona?
Compute: ~2-4h (activation extraction + eval across conditions). Pod: thomas-rebuttals (needs Qwen model loaded).
Timeline · 0 events
No events recorded.
Comments · 0
No comments yet. (Auth + comment composer land in step 5.)