Reading List

(Last updated July 9, 2025)

Section 1 is a quick guide to the key ideas in AI control, aimed at readers who want to get up to speed as quickly as possible.

Section 2 is an extensive guide to almost all of our writing on AI risk, aimed at those who want to gain a deep understanding of Redwood’s worldview.

1. Key Ideas in AI Control

Papers and blog posts

Foundational content

Examples of empirical research

In-depth analyses of threat models and countermeasures, and implications for control methodology

  • Catching AIs Red-Handed (Buck Shlegeris and Ryan Greenblatt, May 2024): Plans that involve catching AIs red-handed depend on thinking carefully about what happens after an AI is caught.

Podcasts


2. Extensive Guide

Particularly recommended readings are bolded and listed first. As one might expect, some of our work is hard to categorize.

Threat models (zoomed out)

Reading some of these posts provides helpful context on Redwood’s worldview: how AI progress might develop, how that progress gives rise to misalignment risk, and why we think AI control is promising within this broader context.

Forecasting AI trends

Threat models (zoomed in)

At Redwood, we spend a lot of time on detailed threat modeling focused on scheming AIs; this informs our prioritization when developing plans to reduce catastrophic misalignment risk.

How could scheming AIs make things difficult for us?

How likely are scheming AIs?

How might schemers function?

How can we control misaligned AIs?

We perform a mix of conceptual and empirical research on ways to control scheming AIs. Most of our work focuses on developing control protocols, along with settings in which those protocols can be empirically evaluated.

The main plan: develop control techniques

We are interested in techniques that catch scheming AIs (“incrimination”), which are particularly promising when AIs might cause egregious failures, as well as techniques for eliciting high-quality work from schemers (e.g. using online training), which are particularly important when AIs can subtly sabotage us.

Spreading our bets

Some of our work involves coming up with contingency plans as well as more speculative bets.

Safety cases

In an ideal world, AI developers would make extensive safety cases to ensure that their deployments are safe. Control methods have the advantage of being particularly well suited to making quantitative safety estimates.

So we caught a scheming AI, what now?

We hope that catching a scheming AI would drastically change how the world treats risks from misaligned AI, but we have contingency plans in case it doesn’t.