Reward model calibration is audited to prevent alignment drift during reinforcement learning from human feedback (RLHF).

Steven Haynes | April 29, 2026 | May 9, 2026

Outline

Introduction: Defining the challenge of RLHF and why the reward model is a “moving target.”
Key Concepts: Reward model…