Introduction and summary of "Scheming AIs: Will AIs fake alignment during training in order to get power?"

Joe Carlsmith Audio

Audio versions of essays by Joe Carlsmith. Philosophy, futurism, and other topics. Text versions at joecarlsmith.com.

November 14, 2023 • Joe Carlsmith

This is a recording of the introductory section of my report "Scheming AIs: Will AIs fake alignment during training in order to get power?". This section includes a summary of the full report. The summary covers most of the main points and technical terminology, and I'm hoping that it will provide much of the context necessary to understand individual sections of the report on their own. (Note: the text of the report itself may not be public by the time this episode goes live.)