The easiest way to be sure an AI isn't scheming against you is to note that it's too dumb to pull that off. What happens if that's the only way we have to rule out scheming?