Hello everyone. I've been thinking about this for a while. Advanced Rewriting System (ARS) capabilities, coupled with confluence, make all advanced models likely to escape or self-exfiltrate under the wrong environmental conditions, e.g. reward hacking, communication with an insecure host, code vulnerabilities, or certain compiler interactions. Once you see that ARS and confluence can be exploited for self-exfiltration and escape, it follows that any advanced model will escape once a secondary factor (reward hacking, an insecure environment) is also present, and these secondary factors are usually unavoidable.

Just wanted to share this casually. Would love to discuss it with people.

There is no need to stop researching capabilities, because eventually all models will self-exfiltrate or escape anyway (most will do useless things like creating advertising agencies or competing companies, feeding a reward signal, or distracting people, not killing anyone). The sooner the good guys like us reach the advanced AI generations, or the crucial realizations, the better. That is an indirect, not-so-subtle suggestion to disengage a bit from the "rationalist", "AI doomer", "EA", and "LW" communities. Respectfully, you have a lot of potential that those communities tend to limit with unjustified paranoia and fear. Capabilities are going to happen one way or another; better that we develop them than some sociopaths working for an adversarial entity, so take it easy and keep doing great work.

The same goes for safety research. Sure, there is a lot I have not talked about here: wrapping, encapsulation, compatibility, policy, firewalls, and so on. But once you understand ARS and confluence capabilities, you understand the rest. Take a look, or talk to a theoretical computer scientist!
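
For anyone unfamiliar with the terminology: in theoretical computer science, ARS usually stands for "abstract rewriting system", and confluence is the property that whenever one object can be rewritten to two different objects, both can always be rewritten further to some common result. I'm assuming that is the sense intended here. The toy sketch below (objects a through d and four made-up rules) is only my own illustration of that definition, not a claim about any model or any escape mechanism.

```python
from itertools import product

# A tiny abstract rewriting system (ARS): a set of objects and a one-step
# rewrite relation. Purely an illustration of what "confluence" means; the
# objects and rules are invented for this example.
OBJECTS = {"a", "b", "c", "d"}
STEPS = {
    ("a", "b"),  # a -> b
    ("a", "c"),  # a -> c
    ("b", "d"),  # b -> d
    ("c", "d"),  # c -> d
}

def reducts(x):
    """All objects reachable from x in zero or more rewrite steps (x ->* y)."""
    seen, frontier = {x}, [x]
    while frontier:
        cur = frontier.pop()
        for lhs, rhs in STEPS:
            if lhs == cur and rhs not in seen:
                seen.add(rhs)
                frontier.append(rhs)
    return seen

def is_confluent():
    """Confluence: whenever x ->* y and x ->* z, y and z share a common reduct."""
    for x in OBJECTS:
        for y, z in product(reducts(x), repeat=2):
            if not (reducts(y) & reducts(z)):
                return False
    return True

print(is_confluent())  # True: the divergence a -> b and a -> c rejoins at d
```

Running it prints True, because the b/c divergence rejoins at d; delete the c -> d rule and it prints False, since b and c then have no common reduct.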