Error Handling in Tools

Build resilient AI agents through robust error handling and graceful degradation

Implementation Patterns: Code That Works

Now let's see how to implement these strategies in production code. These patterns are battle-tested in real-world systems handling millions of requests.

Basic Retry with Exponential Backoff

Standard retry pattern with increasing delays

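import asyncio
import logging
import random

logger = logging.getLogger(__name__)

# execute_tool, TransientError, and PermanentError are assumed to be
# defined elsewhere in the application.


async def call_tool_with_retry(
    tool_name: str,
    params: dict,
    max_retries: int = 3  # Total number of attempts, including the first
) -> dict:
    """Execute a tool, retrying transient failures with exponential backoff."""
    if max_retries < 1:
        # Without this check the loop would be skipped and None returned
        raise ValueError("max_retries must be at least 1")
    base_delay = 1.0  # Delay before the first retry, in seconds

    for attempt in range(max_retries):
        try:
            result = await execute_tool(tool_name, params)
            logger.info(f"✓ Tool '{tool_name}' succeeded on attempt {attempt + 1}")
            return result

        except TransientError as e:
            if attempt == max_retries - 1:
                logger.error(f"✗ All retries exhausted for '{tool_name}': {e}")
                raise

            # Exponential backoff (1s, 2s, 4s, ...) plus up to 10% jitter
            delay = base_delay * (2 ** attempt)
            jitter = random.uniform(0, 0.1 * delay)
            wait_time = delay + jitter

            logger.warning(
                f"⚠ Attempt {attempt + 1} for '{tool_name}' failed: {e}. "
                f"Retrying in {wait_time:.2f}s..."
            )
            await asyncio.sleep(wait_time)

        except PermanentError as e:
            # Don't retry permanent errors
            logger.error(f"✗ Permanent error in '{tool_name}': {e}")
            raise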
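
To make the timing of the pattern above concrete, the following sketch computes only the waits that the retry loop would sleep for. The helper name `backoff_delays` and its `jitter_fraction` parameter are illustrative assumptions, not part of the original code; the formula mirrors the `base_delay * (2 ** attempt)` plus jitter logic shown above.

```python
import random


def backoff_delays(
    max_retries: int = 3,
    base_delay: float = 1.0,
    jitter_fraction: float = 0.1,  # hypothetical knob; the pattern above hard-codes 0.1
) -> list[float]:
    """Return the wait before each retry.

    There is no wait after the final attempt, so a budget of
    max_retries attempts produces max_retries - 1 delays.
    """
    delays = []
    for attempt in range(max_retries - 1):
        delay = base_delay * (2 ** attempt)
        jitter = random.uniform(0, jitter_fraction * delay)
        delays.append(delay + jitter)
    return delays


if __name__ == "__main__":
    # With the defaults this prints two waits: about 1.0-1.1s and 2.0-2.2s
    print([round(d, 2) for d in backoff_delays()])
```

With the default of three attempts, a call that keeps failing therefore spends roughly three seconds waiting before the final error is raised. The jitter spreads out retries from many clients so they do not all hit a recovering service at the same moment.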

Logging Best Practices

Proper logging is essential for debugging errors in production

ERROR: Unrecoverable failures, exceptions
logger.error("Tool execution failed after all retries")

WARNING: Recoverable issues, fallback used
logger.warning("Using fallback after primary tool failed")

INFO: Normal operations, successful retries
logger.info("Tool succeeded on retry attempt 2")

DEBUG: Detailed trace information
logger.debug("Retry backoff delay: 2.5s")
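
One way to apply these levels consistently is a small helper that chooses the level and attaches context to every failure record. This is a minimal sketch: the function name `log_tool_failure`, the logger name `agent.tools`, and the `extra` field names are assumptions made for illustration. It relies only on the standard library `logging` module, whose `extra` argument copies the given keys onto the log record.

```python
import logging

logger = logging.getLogger("agent.tools")  # hypothetical logger name


def log_tool_failure(
    tool_name: str,
    params: dict,
    attempt: int,
    error: Exception,
    final: bool,
) -> None:
    """Log a failed tool call with context.

    Failures that will be retried are WARNING (recoverable);
    the last failure is ERROR (unrecoverable).
    """
    level = logging.ERROR if final else logging.WARNING
    logger.log(
        level,
        "Tool %r failed on attempt %d: %s",
        tool_name,
        attempt,
        error,
        # These keys become attributes on the LogRecord, so handlers and
        # formatters can filter or serialize them.
        extra={"tool_name": tool_name, "tool_params": params, "attempt": attempt},
    )
```

A structured formatter or custom handler can then read `record.tool_name`, `record.tool_params`, and `record.attempt` directly. If tool parameters may contain secrets or personal data, redact them before passing them in.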

Production Readiness Checklist

Implement retry logic with exponential backoff
Add circuit breakers for external services
Define fallback strategies for critical tools
Log all errors with context (tool name, params, attempt number)
Set up monitoring and alerts for error rates
Test error scenarios in staging environment
Document error handling behavior for your team
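
The checklist mentions circuit breakers without showing one, so here is a minimal sketch of the idea, not a production implementation. The names `CircuitBreaker` and `CircuitOpenError`, and the default values for `failure_threshold` and `reset_timeout`, are all assumptions made for illustration. After a run of consecutive failures the breaker rejects calls immediately for a cooldown period, then lets one trial call through.

```python
import asyncio
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self._clock = clock                         # injectable for testing
        self._failures = 0
        self._opened_at = None                      # None means the circuit is closed

    async def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; skipping call")
            # Cooldown elapsed: fall through and allow one trial call
        try:
            result = await func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open if the trial call failed
            if self._failures >= self.failure_threshold or self._opened_at is not None:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count
        self._failures = 0
        self._opened_at = None
        return result


async def _demo():
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=5.0)

    async def flaky_service():  # stand-in for an external tool call
        raise ConnectionError("service unavailable")

    for _ in range(3):
        try:
            await breaker.call(flaky_service)
        except (ConnectionError, CircuitOpenError) as exc:
            print(type(exc).__name__)


if __name__ == "__main__":
    asyncio.run(_demo())
```

Catching `CircuitOpenError` at the call site is a natural place to switch to a fallback tool, which ties this item to the fallback strategy in the same checklist. This sketch is not safe for concurrent trial calls; a production version would typically guard the half-open state with a lock.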