EPS
← All tasks·#24Archived

Aim 4.10: System prompt contribution to assistant persona

kind: infra

From EXPERIMENT_QUEUE.md — Planned (run next)

How much of the assistant persona comes from the system prompt vs chat template vs RLHF?

Compare persona vectors and behavioral metrics across:

  • full system prompt
  • empty system prompt
  • no system prompt
  • different role label but same format
  • raw text without chat template

Phase -1 showed helpful_assistant ↔ no_persona cosine = 0.979 — suggesting most of the persona is NOT from the prompt text.

Key question: is the system prompt a thin veneer on a deep pre-existing representation, or does it meaningfully shape the persona?

Compute: ~2-4h (activation extraction + eval across conditions). Pod: thomas-rebuttals (needs Qwen model loaded).

Timeline · 0 events

No events recorded.

Comments · 0

No comments yet. (Auth + comment composer land in step 5.)