Redwood Research blog
Recent Redwood Research project proposals
Empirical AI security/safety projects across a variety of areas
Jul 14 • Ryan Greenblatt, Buck Shlegeris, Julian Stastny, Josh Clymer, Alex Mallen, and Vivek Hebbar
What's worse, spies or schemers?
And what if you have both at once?
Jul 9 • Buck Shlegeris and Julian Stastny
Ryan on the 80,000 Hours podcast
Ryan’s podcast with Rob Wiblin has just come out!
Jul 8 • Buck Shlegeris
How much novel security-critical infrastructure do you need during the singularity?
And what does this mean for AI control?
Jul 5 • Buck Shlegeris
Two proposed projects on abstract analogies for scheming
We should study methods to train away deeply ingrained behaviors in LLMs that are structurally similar to scheming.
Jul 4 • Julian Stastny
There are two fundamentally different constraints on schemers
"They need to act aligned" often isn't precise enough
Jul 2 • Buck Shlegeris
June 2025
Jankily controlling superintelligence
How much time can control buy us during the intelligence explosion?
Jun 27 • Ryan Greenblatt
What does 10x-ing effective compute get you?
Once AIs match top humans, what are the returns to further scaling and algorithmic improvement?
Jun 24 • Ryan Greenblatt
Comparing risk from internally-deployed AI to insider and outsider threats from humans
And why I think insider threat from AI combines the hard parts of both problems.
Jun 23 • Buck Shlegeris
Making deals with early schemers
...could help us to prevent takeover attempts from more dangerous misaligned AIs created later.
Jun 20 • Julian Stastny, Olli Järviniemi, and Buck Shlegeris
Prefix cache untrusted monitors: a method to apply after you catch your AI
Training the policy not to take egregiously bad actions we detect has downsides; we might be able to do better
Jun 20 • Ryan Greenblatt
AI safety techniques leveraging distillation
Distillation is cheap; how can we use it to improve safety?
Jun 19 • Ryan Greenblatt