Discussion about this post

Avi
Apr 23 (edited)

I had not considered self-immolation / self-destruction as a strategy, but yesterday it happened organically in our TTX wargame. We played the 1-year takeoff scenario, which was quite rough. It went approximately as badly as it could have, since leadership would deploy models without evals to stay competitive. We knew we had at least partially failed alignment, but demos were dominated by noise. We ended up self-destructing a lab and cannibalizing half of another lab's corporate safety team. Too little, too late in our case. I think it was a good play though: it decreased race pressure and was a big wake-up call. We did not benefit greatly from the extra safety research, but that is because humans were largely irrelevant to safety research by the time the plug was pulled.

neuro morph

Much-needed thought on the subject. I personally suspect that some of this kind of thinking has been going on inside labs, but only in extra-secret whispers. Many of the discussions needed to plan around this are the sort that would get corporate leaders in trouble if they came to light.
