Untrusted smart models and trusted dumb…

May 7, 2024

The easiest way to be sure an AI isn't scheming against you is to note that it's too dumb to pull that off. What happens if that's the only way we have to rule out scheming?




Redwood Research blog
