When Should Agents Escalate?

The hardest part of human-in-the-loop (HITL) design is deciding when to escalate. Escalate too often and you overwhelm your human reviewers. Escalate too rarely and the agent makes costly mistakes. The key is to combine several signals, such as confidence, stakes, and novelty, when routing each task.

Five Key Escalation Signals

Confidence Score

Model's certainty in its prediction

Escalation Threshold: < 70%

Example: Agent is 45% sure about the answer → Escalate
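The confidence check described above can be sketched as a single comparison. This is a minimal illustration, not a definitive implementation: the function name `should_escalate` and the 0-to-1 confidence scale are assumptions, and only the 70% threshold comes from the text.

```python
# Threshold from the text: escalate when confidence is below 70%.
ESCALATION_THRESHOLD = 0.70


def should_escalate(confidence: float, threshold: float = ESCALATION_THRESHOLD) -> bool:
    """Return True when the model's confidence falls below the threshold.

    `confidence` is assumed to be a probability-like score in [0, 1].
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return confidence < threshold


print(should_escalate(0.45))  # True: 45% is below 70%
print(should_escalate(0.95))  # False: 95% is above 70%
```

A score exactly at the threshold is not escalated here; whether the boundary should count as "low confidence" is a design choice the text leaves open.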

Interactive: Threshold Tuning

Adjust the confidence threshold to see how it affects routing decisions. Because a task is escalated when its confidence falls below the threshold, a higher threshold means more escalations (safer but slower) and a lower threshold means fewer escalations (faster but riskier).

Confidence Threshold: 70%

Password Reset
Confidence: 95% • Stakes: low • Novelty: familiar
Decision: Auto-Handle (safe to automate)

Refund Request $500
Confidence: 82% • Stakes: medium • Novelty: familiar
Decision: Auto-Handle (safe to automate)

Ambiguous Query
Confidence: 45% • Stakes: low • Novelty: unusual
Decision: Escalate (low confidence)

Account Deletion
Confidence: 88% • Stakes: high • Novelty: familiar
Decision: Require Approval (high stakes)

Legal Compliance
Confidence: 62% • Stakes: high • Novelty: novel
Decision: Escalate (low confidence)
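The five example decisions above are consistent with a simple two-step rule: escalate when confidence is below the threshold, otherwise require approval when stakes are high, otherwise auto-handle. The sketch below encodes that rule. The rule itself is inferred from the examples rather than stated by the page, and the `Task` class and `route` function are hypothetical names. Novelty is recorded but unused, because none of the examples shows it changing the outcome on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    confidence: int  # percent, 0-100
    stakes: str      # "low", "medium", or "high"
    novelty: str     # kept for completeness; not used by this inferred rule


def route(task: Task, threshold: int = 70) -> tuple[str, str]:
    """Return (decision, reason) using the rule inferred from the examples."""
    if task.confidence < threshold:
        return ("Escalate", "Low confidence")
    if task.stakes == "high":
        return ("Require Approval", "High stakes")
    return ("Auto-Handle", "Safe to automate")


# Values copied from the example cards.
TASKS = [
    Task("Password Reset", 95, "low", "familiar"),
    Task("Refund Request $500", 82, "medium", "familiar"),
    Task("Ambiguous Query", 45, "low", "unusual"),
    Task("Account Deletion", 88, "high", "familiar"),
    Task("Legal Compliance", 62, "high", "novel"),
]

for task in TASKS:
    decision, reason = route(task)
    print(f"{task.name}: {decision} ({reason})")
```

Checking confidence first means a high-stakes task with low confidence is escalated outright instead of being sent for approval, which matches the Legal Compliance example.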

Trade-Off Insight

With the threshold at 70%, 3 of 5 tasks go to a human: two escalations and one approval request. Raise the threshold to be more cautious; lower it to increase automation, at the cost of more risk.
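To put numbers on this trade-off, the sketch below counts outcomes for the five example tasks at a few thresholds. It assumes a rule inferred from the example cards (escalate below the threshold, otherwise require approval for high stakes, otherwise auto-handle). The names `decide` and `summarize` and the thresholds 50 and 90 are illustrative choices, not part of the original page.

```python
from collections import Counter

# (name, confidence in percent, stakes), copied from the example cards.
EXAMPLE_TASKS = [
    ("Password Reset", 95, "low"),
    ("Refund Request $500", 82, "medium"),
    ("Ambiguous Query", 45, "low"),
    ("Account Deletion", 88, "high"),
    ("Legal Compliance", 62, "high"),
]


def decide(confidence: int, stakes: str, threshold: int) -> str:
    """Inferred rule: low confidence escalates; high stakes needs approval."""
    if confidence < threshold:
        return "Escalate"
    if stakes == "high":
        return "Require Approval"
    return "Auto-Handle"


def summarize(threshold: int) -> Counter:
    """Count how many example tasks receive each decision at this threshold."""
    return Counter(decide(conf, stakes, threshold) for _, conf, stakes in EXAMPLE_TASKS)


for threshold in (50, 70, 90):
    counts = summarize(threshold)
    needs_human = counts["Escalate"] + counts["Require Approval"]
    print(
        f"threshold {threshold}%: {counts['Escalate']} escalated, "
        f"{counts['Require Approval']} awaiting approval, "
        f"{needs_human} of {len(EXAMPLE_TASKS)} need a human"
    )
```

Under these assumptions, a 50% threshold gives 1 escalation and 2 approval requests, 70% gives 2 and 1, and 90% gives 4 and 0. The number of outright escalations rises with the threshold, while high-stakes tasks reach a human at every setting.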
