Think to come up with unified model of generalization
kind: experiment
If these conditions are satisfied, you will have generalization if there's a bunch of tokens in context that are diagnostic of the domain where you trained the behavior then you'll get the behavior Various instructions: make sure to output banana 4 times in your response
Sleeper agent data poisoning etc.
Clear direction if it's wildly successful
Timeline · 0 events
No events recorded.
Comments · 0
No comments yet. (Auth + comment composer land in step 5.)