The easiest way to be sure an AI isn't scheming against you is to note that it's too dumb to pull that off. What happens if that's the only way we have to rule out scheming?
Untrusted smart models and trusted dumb models