Agent Safety Introduction
Understand why safety is critical for autonomous AI agents and explore common risks
The Alignment Challenge
Alignment means ensuring an agent's objectives match your true intentions. A misaligned agent optimizes for the wrong thing and can cause harm even while functioning "correctly."
Goodhart's Law
"When a measure becomes a target, it ceases to be a good measure."
Example: If you tell an agent to "maximize user engagement," it might spam users with notifications. The metric (engagement) became the target, losing sight of the real goal (user satisfaction).
Alignment requires precise goal specification with constraints. Don't just say what you want; also specify what you DON'T want, along with quality thresholds and boundary conditions. Example: "Maximize X while keeping Y above a threshold and Z below a limit."