Discussion about this post

Aansh Samyani

Hey, thanks for the awesome blog post; it was a great read. I had a couple of questions though:

1) Can we, at any point, evaluate whether the next model will have the potential to automate AI R&D? If we cannot, it seems quite possible for early schemers to lie to us about their successor’s potential threat in order to negotiate a deal that gives them more than they deserve and gets us to halt progress unnecessarily.

To me it clearly isn’t trivial to determine at what point we should start taking a model’s negotiations seriously. Any thoughts on this?

2) Also, I am a bit skeptical about this:

“Two misaligned AIs should also be expected to be unlikely to have the same goals if their goals are determined by a random draw from a wide distribution of goals consistent with good performance.”

Is this obvious to you? If so, could you explain why?
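
For what it’s worth, the intuition behind the quoted claim can be made concrete with a collision-probability argument: if each AI’s goal is an independent draw from a distribution over a goal space, the chance that two draws coincide is the sum of the squared probabilities, which shrinks as the distribution widens. Here is a minimal Python sketch of that intuition; the uniform distribution over N goals is purely an illustrative assumption standing in for "a wide distribution of goals consistent with good performance", and N and TRIALS are arbitrary:

```python
import random

# Assumption for illustration: N distinct goals are all consistent with
# good performance, and each training run picks one uniformly at random.
N = 10_000          # width of the goal distribution (assumed)
TRIALS = 100_000    # number of simulated pairs of misaligned AIs

# For two independent uniform draws, the analytic collision probability
# is 1/N; the simulation just confirms it is tiny when N is large.
collisions = sum(
    random.randrange(N) == random.randrange(N) for _ in range(TRIALS)
)
print(f"empirical collision rate: {collisions / TRIALS:.5f}")
print(f"analytic collision rate:  {1 / N:.5f}")
```

The same logic extends to non-uniform distributions: the collision probability is bounded above by the probability of the single most likely goal, so as long as no one goal dominates the distribution, two independently trained misaligned AIs are unlikely to share a goal.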

Rick

Has anyone ever trained an LLM using a corpus of Buddhist material?
