Clarifying ways in which faking alignment during training is neither necessary nor sufficient for the kind of scheming that AI control tries to defend against.
Training-time schemers vs behavioral schemers