EPS Dashboard

Selective targeting of personas is real (in post-training), across different behaviors
Persona leakage to similar personas is real (in post-training), across different behaviors
Leakage is more about behavioral similarity than semantic similarity
Cosine similarity of persona vectors seems to have some predictive power but isn't perfect
Finetuning the assistant to be more similar to a persona INCREASES leakage of markers to the assistant, but not leakage of other behaviors
Some weak signal that this also works in midtraining
Proposal: selective targeting of personas/persona space in post-training and midtraining -- to make the assistant persona more robust?
Persona in Qwen is VERY brittle to the exact system prompt used
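
Rough sketch of the cosine-similarity check mentioned above, for reference. This is not the dashboard's actual code: the function names and the toy vectors are hypothetical placeholders, and how persona vectors are extracted is assumed to happen elsewhere. It only shows the comparison step, ranking other personas by cosine similarity to a target persona so the ranking can be compared against measured leakage.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_by_similarity(target, persona_vectors):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scores = {
        name: cosine_similarity(target, vec)
        for name, vec in persona_vectors.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy placeholder vectors, NOT real persona vectors or results.
toy_vectors = {
    "persona_a": [1.0, 0.0, 0.0],
    "persona_b": [0.9, 0.1, 0.0],
    "persona_c": [0.0, 1.0, 0.0],
}

# Rank the other personas by similarity to persona_a (hypothetical target).
others = {k: v for k, v in toy_vectors.items() if k != "persona_a"}
ranking = rank_by_similarity(toy_vectors["persona_a"], others)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

If similarity really has predictive power, personas near the top of this ranking should show more leakage than those near the bottom; comparing the two orderings is one way to quantify "some predictive power but isn't perfect".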

Next steps