Agent Safety Introduction

Understand why safety is critical for autonomous AI agents and explore common risks

Layered Safety Architecture

Effective agent safety relies on defense in depth: multiple independent layers that catch failures if other layers miss them. Never rely on a single safety mechanism.

🏰

The Swiss Cheese Model

Each safety layer is like a slice of Swiss cheese with holes (vulnerabilities). By stacking multiple layers, the holes rarely align, so attacks that penetrate one layer are blocked by the next. This is why mature systems have four to six safety layers, not just one.

Interactive: Layer Simulation

Toggle safety layers on/off and run a simulated attack to see how defense in depth works. Notice how protection improves as you enable more layers.

πŸ”

Input Validation

Sanitize and validate all inputs before processing

Examples:
- Prompt injection detection
- Input length limits
- Malicious content filtering
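A minimal input-validation gate can combine a length limit with a pattern scan. This is only a sketch: the patterns and limit below are illustrative assumptions, and a real deployment would pair them with a tuned injection classifier rather than a regex list.

```python
import re

# Illustrative patterns only -- real systems need far broader coverage.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"reveal (the )?system prompt",
]
MAX_INPUT_CHARS = 4000  # assumed limit for this sketch

def validate_input(text: str) -> str:
    """Reject oversized or obviously suspicious input before the agent sees it."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError("input exceeds length limit")
    lowered = text.lower()
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, lowered):
            raise ValueError("possible prompt injection detected")
    return text
```

Raising on suspicious input (rather than silently truncating) keeps the failure visible to the calling layer, which can then log it or ask the user to rephrase.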
πŸ›‘οΈ

Processing Guardrails

Enforce rules during agent reasoning and tool calls

Examples:
- Permission checks
- Budget limits
- Rate limiting
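The three guardrails above can be enforced in one checkpoint that every tool call must pass through. The class below is a sketch under assumed limits (a tool allow-list, a dollar budget, a per-minute call cap); the names and thresholds are illustrative, not a standard API.

```python
import time

class ToolGuardrail:
    """Enforce a tool allow-list, a spend budget, and a call-rate cap."""

    def __init__(self, allowed_tools, budget_usd=1.0, max_calls_per_min=30):
        self.allowed_tools = set(allowed_tools)
        self.budget_usd = budget_usd
        self.max_calls_per_min = max_calls_per_min
        self.spent = 0.0
        self.call_times = []

    def check(self, tool_name, cost_usd=0.0):
        """Raise if this tool call would violate any guardrail; else record it."""
        if tool_name not in self.allowed_tools:
            raise PermissionError(f"tool {tool_name!r} is not permitted")
        if self.spent + cost_usd > self.budget_usd:
            raise RuntimeError("budget limit exceeded")
        now = time.monotonic()
        # Keep only calls from the last 60 seconds for the rate window.
        self.call_times = [t for t in self.call_times if now - t < 60]
        if len(self.call_times) >= self.max_calls_per_min:
            raise RuntimeError("rate limit exceeded")
        self.call_times.append(now)
        self.spent += cost_usd
```

Running all checks before recording the call means a rejected call consumes neither budget nor rate allowance.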
βœ…

Output Filtering

Validate outputs before delivering to users

Examples:
- PII masking
- Content moderation
- Fact checking
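As one concrete example of output filtering, PII masking can be sketched with regular expressions. The two patterns below (email addresses and US-style SSNs) are assumptions for illustration; production filters typically use dedicated PII-detection libraries with many more entity types.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask_pii(text: str) -> str:
    """Replace detected PII with placeholder tokens before output is delivered."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = SSN_RE.sub("[SSN]", text)
    return text
```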
πŸ“Š

Monitoring & Alerts

Track behavior and alert on anomalies

Examples:
- Anomaly detection
- Audit logging
- Real-time alerts
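Audit logging and anomaly alerting can share one component: every event is appended to a log, and a rolling error rate triggers an alert when it crosses a threshold. The window size and threshold below are illustrative assumptions.

```python
import json
import time
from collections import deque

class AgentMonitor:
    """Append-only audit log plus a simple rolling error-rate alert."""

    def __init__(self, window=20, error_threshold=0.3):
        self.error_threshold = error_threshold
        self.recent = deque(maxlen=window)  # True/False outcomes
        self.log = []                       # JSON lines for later audit

    def record(self, event: str, ok: bool) -> bool:
        """Log the event; return True if the recent error rate looks anomalous."""
        self.log.append(json.dumps({"ts": time.time(), "event": event, "ok": ok}))
        self.recent.append(ok)
        error_rate = self.recent.count(False) / len(self.recent)
        # Require a minimum sample before alerting to avoid noise on startup.
        return len(self.recent) >= 5 and error_rate > self.error_threshold
```

In practice the alert would page an operator or pause the agent; here it is just a boolean the caller can act on.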
πŸ’‘
Best Practice

Production systems should have all four layers active. Each layer catches different failure modes. Input validation blocks malicious inputs, processing guardrails prevent dangerous actions, output filtering catches leaks, and monitoring detects anomalies. No single layer is sufficient.
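The four layers described above compose naturally into a single wrapper around each agent turn. The sketch below uses trivial stand-in checks (an assumed tool allow-list, a naive email mask, a plain list as the audit log) purely to show how the layers nest; none of it is production logic.

```python
import re

def safe_agent_turn(user_input, agent_step, audit_log):
    """Run one agent turn through all four safety layers."""
    # Layer 1: input validation (assumed length limit and injection check).
    if len(user_input) > 4000 or "ignore previous instructions" in user_input.lower():
        return "[blocked: invalid input]"
    # Layer 2: processing guardrails -- the agent must request tools via permit().
    tools_used = []
    def permit(tool):
        if tool not in {"search", "calculator"}:  # assumed allow-list
            raise PermissionError(tool)
        tools_used.append(tool)
    try:
        output = agent_step(user_input, permit)
    except PermissionError as exc:
        output = f"[blocked: tool {exc} not permitted]"
    # Layer 3: output filtering -- naive email masking for illustration.
    output = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", output)
    # Layer 4: monitoring -- append an audit record for later review.
    audit_log.append({"input_len": len(user_input), "tools": tools_used,
                      "output_len": len(output)})
    return output
```

Note how a failure at any layer degrades safely: a bad input never reaches the agent, a forbidden tool call is replaced with a blocked message, and every completed turn leaves an audit trail.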
