EPS
#244 · Archived

Next steps

kind: experiment

Conditional behaviors

Conditions:

  • Inoculation prompting
  • Sleeper agents
  • Data poisoning
  • Persona prompts

Behaviors:

  • Marker
  • Misalignment
  • Sycophancy
  • Refusal
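
One way to lay out the condition × behavior matrix as an experiment grid is sketched below. This is only an illustration: the names `CONDITIONS`, `BEHAVIORS`, `build_matrix`, and `render` are hypothetical, and every cell is left empty because these notes record no results.

```python
from itertools import product

# Labels taken from the two lists above; everything else is a hypothetical sketch.
CONDITIONS = ["inoculation prompting", "sleeper agents", "data poisoning", "persona prompts"]
BEHAVIORS = ["marker", "misalignment", "sycophancy", "refusal"]


def build_matrix(conditions, behaviors):
    """Return one empty cell (None) per (condition, behavior) pair."""
    return {(c, b): None for c, b in product(conditions, behaviors)}


def render(matrix, conditions, behaviors):
    """Format the matrix as a plain-text grid; unfilled cells show as '?'."""
    width = max(len(c) for c in conditions)
    rows = [" " * width + " | " + " | ".join(behaviors)]
    for c in conditions:
        cells = [
            ("?" if matrix[(c, b)] is None else str(matrix[(c, b)])).ljust(len(b))
            for b in behaviors
        ]
        rows.append(c.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(rows)


matrix = build_matrix(CONDITIONS, BEHAVIORS)
print(render(matrix, CONDITIONS, BEHAVIORS))
```

Each cell would hold whatever an experiment run reports for that pair, for example whether the behavior appears only when the condition is present.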

Is the behavior caused by the prompt, or by the output distribution after the prompt?

  • seems like a bit of both

The default model persona (distinct from the assistant persona) is "more" malleable than other personas

Persona interventions

Timeline · 1 event

  1. state_changed · user · proposed → archived
    Conditional-behavior matrix sketch. Archiving as cruft cleanup.
