Joe Carlsmith Audio

Takes on "Alignment Faking in Large Language Models"

Joe Carlsmith

What can we learn from recent empirical demonstrations of scheming in frontier models? Text version here: https://joecarlsmith.com/2024/12/18/takes-on-alignment-faking-in-large-language-models/