Redwood Research blog
June 2025
Jankily controlling superintelligence
How much time can control buy us during the intelligence explosion?
Jun 27 • Ryan Greenblatt
What does 10x-ing effective compute get you?
Once AIs match top humans, what are the returns to further scaling and algorithmic improvement?
Jun 24 • Ryan Greenblatt
Comparing risk from internally-deployed AI to insider and outsider threats from humans
And why I think insider threat from AI combines the hard parts of both problems.
Jun 23 • Buck Shlegeris
Making deals with early schemers
...could help us to prevent takeover attempts from more dangerous misaligned AIs created later.
Jun 20 • Julian Stastny, Olli Järviniemi, and Buck Shlegeris
Prefix cache untrusted monitors: a method to apply after you catch your AI
Training the policy not to take egregiously bad actions we detect has downsides; we might be able to do better.
Jun 20 • Ryan Greenblatt
AI safety techniques leveraging distillation
Distillation is cheap; how can we use it to improve safety?
Jun 19 • Ryan Greenblatt
When does training a model change its goals?
Can a scheming AI's goals really stay unchanged through training?
Jun 12 • Vivek Hebbar and Ryan Greenblatt
May 2025
The case for countermeasures to memetic spread of misaligned values
Defending against alignment problems that might come with long-term memory
May 28 • Alex Mallen
AIs at the current capability level may be important for future safety work
Some reasons why relatively weak AIs might still be important when we have very powerful AIs
May 12 • Ryan Greenblatt
Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking
A new analysis of the risk of AIs intentionally performing poorly.
May 8 • Julian Stastny and Buck Shlegeris
Training-time schemers vs behavioral schemers
Clarifying ways in which faking alignment during training is neither necessary nor sufficient for the kind of scheming that AI control tries to defend…
May 6 • Alex Mallen
What's going on with AI progress and trends? (As of 5/2025)
My views on what's driving AI progress and where it's headed.
May 3 • Ryan Greenblatt