
Framework for thinking about AI alignment

  • Alignment strategies generally do not scale with capabilities: a strategy that aligns a system at one capability level tends to fail at higher levels.
  • We should therefore evaluate strategies by how much value can be safely extracted from AI under them, i.e. for a given strategy S, how much capability can be wielded safely (formalized in the sketch below).
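
As a minimal sketch: assume a risk function risk(S, c) for wielding capability c under strategy S, and a risk tolerance ε (both are illustrative placeholders, not terms defined above). Then a strategy's worth is the greatest capability it can safely wield:

    \mathrm{SafeCap}(S) \;=\; \sup \{\, c \ge 0 : \mathrm{risk}(S, c) \le \varepsilon \,\}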

This framework does not solve AI safety; alignment is only a prerequisite for it. Moreover, any alignment achieved this way is unstable: if capabilities keep improving, every strategy eventually fails.
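
In the notation of the sketch above, the instability claim is that every strategy's safe capability is finite:

    \forall S:\ \mathrm{SafeCap}(S) < \infty

so once capability grows past sup_S SafeCap(S), no available strategy keeps the system safe.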