We should study methods to train away deeply ingrained behaviors in LLMs that are structurally similar to scheming.
Two proposed projects on abstract analogies…