Error Handling

Overview

Error Handling strategies help workflows gracefully manage failures, recover automatically, and provide meaningful feedback when issues occur. Proper error handling is critical for production-grade workflows.

⚠️ Critical

Error Handling is not optional:

Every workflow will eventually encounter errors
Errors must be anticipated and handled
Unhandled errors cause workflow failures
Recovery strategies improve reliability

When to Use Error Handling

✅ Good Use Cases

Scenario - Reason

API Calls - Network/server failures common
External Services - Third-party systems unpredictable
Database Operations - Connection/concurrency issues
File Operations - File not found, permissions errors
User Input - Invalid or unexpected data
Business Logic - Edge cases and exceptions

❌ Anti-Patterns (Don't Do)

❌ Ignore all errors (crash risk)
❌ Generic catch-all (can't debug)
❌ Log and continue silently (data corruption)
❌ Too many retries (hammers systems)
❌ No fallback plan (dead end)

Error Types

Network Errors

Connection and communication failures:

Timeout:
- Request exceeded time limit
- Server not responding
- Network congestion

Connection Error:
- Host unreachable
- DNS resolution failed
- Connection refused

Certificate Error:
- SSL/TLS validation failed
- Self-signed certificate

Handling:

Retry with backoff
Use alternate endpoint
Fallback to cached data

Server Errors

5xx HTTP errors:

500 Internal Server Error:
- Server-side exception
- Database connection failure
- Unexpected state

503 Service Unavailable:
- Server overloaded
- Maintenance mode
- Temporary outage

502 Bad Gateway:
- Upstream server error
- Load balancer issue

Handling:

Retry with exponential backoff
Alert operations team
Escalate after max retries

Client Errors

4xx HTTP errors:

400 Bad Request:
- Invalid parameters
- Malformed data
- Version mismatch

401 Unauthorized:
- Authentication failed
- Invalid credentials
- Expired token

403 Forbidden:
- Access denied
- Insufficient permissions
- Resource protected

404 Not Found:
- Resource doesn't exist
- Wrong endpoint
- Data deleted

Handling:

Don't retry the unchanged request (it will fail again); 429 rate limits are the exception and can be retried after waiting
Log error for debugging
Return meaningful error
Consider fallback
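
The handling rules for server and client errors can be folded into a single check. The following is a minimal sketch, not a definitive list: `RETRYABLE_STATUS` and `should_retry` are hypothetical names, and the set only contains the status codes this page treats as transient (429 appears in Example 1 below).

```python
# Hypothetical helper: decide whether an HTTP status code is worth retrying.
# The set holds only the codes this page describes as transient.
RETRYABLE_STATUS = {429, 500, 502, 503}


def should_retry(status_code: int) -> bool:
    """Return True for transient failures, False for permanent client errors."""
    return status_code in RETRYABLE_STATUS


for code in (400, 401, 403, 404, 429, 503):
    print(code, "retry" if should_retry(code) else "do not retry")
```

Keeping the decision in one place means every step in a workflow applies the same retry policy.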

Application Errors

Business logic failures:

Validation Error:
- Data doesn't meet requirements
- Business rules violated
- Invalid state

Concurrency Error:
- Race condition
- Resource locked
- Conflicting updates

State Error:
- Resource in wrong state
- Operation not allowed
- Prerequisites not met

Handling:

Validate before operation
Implement locking/versioning
Check state before action
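
One common way to implement versioning is an optimistic version check: a write is accepted only if the record still has the version the writer read. The sketch below assumes an in-memory `Record` class and a `VersionConflict` exception, both hypothetical and for illustration only.

```python
class VersionConflict(Exception):
    """Raised when a record changed between read and write."""


class Record:
    def __init__(self, value):
        self.value = value
        self.version = 1

    def update(self, new_value, expected_version):
        # Reject the write if someone else updated the record first.
        if expected_version != self.version:
            raise VersionConflict(
                f"expected version {expected_version}, found {self.version}"
            )
        self.value = new_value
        self.version += 1


record = Record("pending")
seen = record.version
record.update("paid", expected_version=seen)  # succeeds, version becomes 2
try:
    record.update("cancelled", expected_version=seen)  # stale version
except VersionConflict as exc:
    print("conflict:", exc)
```

On a conflict the caller would normally re-read the record and decide whether the update still makes sense.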

Error Handling Strategies

Strategy 1: Try-Catch-Log

Basic error capture:

TRY:
    [API Call]
    Output: response
    
CATCH error:
    [Log Error]:
      - error_message
      - error_code
      - timestamp
    
    [Continue/Fail]

Use when:

Error is not critical
Can continue workflow
Just need to document
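
In Python, the same pattern might look like the sketch below. `fetch_profile` is a hypothetical stand-in for the [API Call] step and always fails here so the error path is visible.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("workflow")


def fetch_profile(user_id):
    """Hypothetical step standing in for [API Call]; it always fails here."""
    raise TimeoutError(f"profile service timed out for user {user_id}")


def run_step(user_id):
    try:
        return fetch_profile(user_id)
    except TimeoutError as exc:
        # Catch the specific error, log it with context, then continue.
        logger.error("fetch_profile failed: user_id=%s error=%s", user_id, exc)
        return None


result = run_step("usr_789")
```

Catching `TimeoutError` rather than every exception keeps unrelated bugs from being silently logged and ignored.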

Strategy 2: Retry with Backoff

Automatic retry logic:

Try 1: [API Call]
  If fails:
    Wait 2 seconds
    
Try 2: [API Call]
  If fails:
    Wait 4 seconds
    
Try 3: [API Call]
  If fails:
    Wait 8 seconds
    
Try 4: [API Call]
  If succeeds:
    Continue
  If fails:
    Escalate error

Use when:

Error is transient
A retry is likely to succeed
Don't want to fail immediately
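
The schedule above (2, 4, 8 seconds, then escalate) can be sketched as a small helper. `retry_with_backoff` and its parameters are hypothetical names; the `sleep` argument is injectable so the demo records delays instead of actually waiting.

```python
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call operation(); on a transient error wait 2s, 4s, 8s, ... and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts:
                raise  # escalate after the final attempt
            sleep(base_delay * 2 ** (attempt - 1))


# Demo: a fake operation that fails twice, with delays recorded instead of slept.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


delays = []
print(retry_with_backoff(flaky, sleep=delays.append), delays)  # ok [2.0, 4.0]
```

Production code often adds a small random jitter to each delay so that many clients do not retry at the same moment.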

Strategy 3: Fallback/Alternative

Use backup option:

TRY:
    [Primary API Endpoint]
    
CATCH error:
    [Fallback to Secondary Endpoint]
    
    IF secondary also fails:
        [Use Cached Data]
        
        IF no cache:
            [Graceful Degradation]

Use when:

Multiple options available
Cached data acceptable
Service degradation acceptable
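
A fallback chain can be expressed as an ordered list of sources to try. This is a sketch under assumptions: `first_successful`, `primary`, `secondary`, and `cache` are hypothetical, and the two endpoints fail on purpose so the cache is reached.

```python
def first_successful(sources):
    """Try each (name, fetch) pair in order; return the first result that works."""
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch()
        except (OSError, LookupError) as exc:  # network failures and cache misses
            errors.append(f"{name}: {exc}")
    # Nothing worked: degrade gracefully and report why.
    return "degraded", {"items": [], "errors": errors}


def primary():
    raise ConnectionError("primary endpoint down")


def secondary():
    raise TimeoutError("secondary endpoint timed out")


def cache():
    return {"items": ["cached item"]}


source, data = first_successful(
    [("primary", primary), ("secondary", secondary), ("cache", cache)]
)
print(source, data)  # cache {'items': ['cached item']}
```

Returning the name of the source alongside the data lets later steps know they are working with cached or degraded results.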

Strategy 4: Fail Fast

Immediate termination:

Validation:
  IF required_data IS NULL THEN
    [STOP with error]
    
  IF precondition not met THEN
    [STOP with error]

Then:
  [Only execute if all valid]

Use when:

Cannot proceed safely
Data integrity critical
Continuing would cause data corruption
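
A fail-fast step checks every precondition before doing any work. In the sketch below, `PreconditionError`, `process_order`, and the order fields are hypothetical.

```python
class PreconditionError(Exception):
    """Raised before any work starts when the input cannot be processed safely."""


def process_order(order):
    # Validate everything first; stop at the first problem.
    if order.get("customer_id") is None:
        raise PreconditionError("customer_id is required")
    if order.get("quantity", 0) <= 0:
        raise PreconditionError("quantity must be positive")
    # Only reached when all checks pass.
    return f"order accepted for customer {order['customer_id']}"


print(process_order({"customer_id": "c1", "quantity": 2}))
try:
    process_order({"quantity": 2})
except PreconditionError as exc:
    print("stopped:", exc)
```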

Strategy 5: Circuit Breaker

Prevent cascading failures:

State 1: CLOSED (normal operation)
  - Requests flow normally
  - Count failures
  
  If failures > threshold:
    -> Go to OPEN

State 2: OPEN (failures detected)
  - Reject new requests immediately
  - No attempt to call service
  
  After timeout:
    -> Go to HALF_OPEN

State 3: HALF_OPEN (test recovery)
  - Allow limited requests
  - Test if service recovered
  
  If successful:
    -> Go to CLOSED
  If fails:
    -> Go back to OPEN

Use when:

External service unstable
Want to stop hammering failing service
Fail fast instead of waiting
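
The three states can be sketched as a small class. This is an illustrative, single-threaded version, not a production implementation: `CircuitBreaker`, `CircuitOpenError`, and the parameter names are hypothetical, and the clock is injectable so the demo can simulate the timeout.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, operation):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit is open; call rejected")
            self.state = "HALF_OPEN"  # timeout elapsed: allow a trial call
        try:
            result = operation()
        except Exception:
            self.failures += 1
            # A failed trial call, or failures above the threshold, opens the circuit.
            if self.state == "HALF_OPEN" or self.failures > self.failure_threshold:
                self.state = "OPEN"
                self.opened_at = self.clock()
            raise
        self.state = "CLOSED"
        self.failures = 0
        return result


def failing():
    raise ConnectionError("service down")


now = [0.0]  # fake clock so the demo does not have to wait
breaker = CircuitBreaker(failure_threshold=2, recovery_timeout=30.0, clock=lambda: now[0])
for _ in range(3):
    try:
        breaker.call(failing)
    except ConnectionError:
        pass
print(breaker.state)  # OPEN after the third failure
now[0] = 31.0  # simulate the recovery timeout passing
print(breaker.call(lambda: "ok"), breaker.state)  # ok CLOSED
```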

Practical Examples

Example 1: Resilient API Integration

Workflow: Get User with Full Error Handling

Step 1: Try Primary API
  TRY:
    [API Call: GET /api/v1/users/{id}]
    Output: user_data
    
  CATCH timeout_error:
    [Retry with exponential backoff (max 3 attempts)]
    
    IF all_retries_failed:
      [Try Fallback Strategy]
      
  CATCH rate_limit_error (429):
    [Wait: 60 seconds]
    [Retry: 1 attempt]
    
  CATCH not_found_error (404):
    [Don't retry]
    [Log: User not found]
    [Return: Empty result]
    
  CATCH server_error (500):
    [Try secondary endpoint]
    
    IF secondary fails:
      [Use cached data if available]
      
      IF no cache:
        [Send alert: Service down]
        [STOP workflow]

Step 2: Process Result
  [Continue with user_data]
  
Step 3: Log Success
  [Record: User retrieved successfully]

Example 2: Cascading Error Handling

Workflow: Multi-Step Order Processing

Step 1: Create Order
  TRY:
    [API: POST /orders]
    Output: order_id
    
  CATCH validation_error:
    [Log validation errors]
    [Stop: FAIL - Invalid order]
    
  CATCH conflict_error:
    [Order already exists for customer]
    [Retrieve existing order]
    Continue with existing

Step 2: Process Payment
  TRY:
    [API: POST /payments]
    
  CATCH insufficient_funds:
    [Notify customer]
    [Stop: FAIL - Payment declined]
    
  CATCH service_unavailable:
    [Queue for retry later]
    [Mark order: Payment Pending]
    Continue

Step 3: Send Confirmation
  TRY:
    [Send Email]
    
  CATCH email_error:
    [Log error]
    [Store in retry queue]
    [Continue - order complete even if email fails]

Step 4: Summary
  [Report results]
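
The key idea in Step 3 is that a non-critical failure is recorded and queued rather than allowed to fail the order. A minimal sketch, assuming a hypothetical `send_confirmation` step that always fails and a plain list as the retry queue:

```python
class EmailError(Exception):
    """Raised by the hypothetical email step."""


def send_confirmation(order_id):
    """Hypothetical email step; it fails here to show the non-critical path."""
    raise EmailError("SMTP server unavailable")


def complete_order(order_id, retry_queue):
    status = {"order_id": order_id, "state": "complete", "email_sent": True}
    try:
        send_confirmation(order_id)
    except EmailError as exc:
        # Non-critical: record the failure and queue the email, but keep the order.
        status["email_sent"] = False
        retry_queue.append(("send_confirmation", order_id, str(exc)))
    return status


queue = []
print(complete_order("ord_1", queue))
print(queue)
```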

Example 3: Data Validation Error Handling

Workflow: Validate and Process User Data

Step 1: Validate Input
  IF user.email IS NULL OR EMPTY:
    [Error: Email required]
    [Stop: FAIL]
    
  IF user.email NOT_CONTAINS "@":
    [Error: Invalid email format]
    [Stop: FAIL]
    
  IF user.age < 0 OR user.age > 150:
    [Warning: Age out of normal range]
    [Ask for confirmation or use fallback]

Step 2: Validate Business Rules
  IF user.email already_exists:
    TRY:
      [Send password reset]
    CATCH error:
      [Log: Could not send password reset]
      [Continue anyway]

Step 3: Create User
  TRY:
    [API: Create user]
    
  CATCH duplicate_error:
    [Retrieve existing user]
    
  CATCH validation_error:
    [Log validation details]
    [Stop: FAIL]

Step 4: Success
  [Process completed]
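
The input checks in Step 1 separate hard errors (stop) from warnings (ask or fall back). One way to sketch that split is a function returning both lists; `validate_user` and the dictionary keys are hypothetical.

```python
def validate_user(user):
    """Return (errors, warnings). Errors stop the workflow; warnings do not."""
    errors, warnings = [], []
    email = user.get("email")
    if not email:
        errors.append("Email required")
    elif "@" not in email:
        errors.append("Invalid email format")
    age = user.get("age")
    if age is not None and (age < 0 or age > 150):
        warnings.append("Age out of normal range")
    return errors, warnings


print(validate_user({"email": "no-at-sign", "age": 200}))
# (['Invalid email format'], ['Age out of normal range'])
```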

Example 4: Database Error Handling

Workflow: Database Operations with Error Handling

Step 1: Connect to Database
  TRY:
    [DB: Connect]
    
  CATCH connection_error:
    [Try failover connection]
    
    IF failover fails:
      [Alert DBA]
      [Stop: Database unavailable]

Step 2: Query with Timeout
  TRY:
    [DB: Query with 30s timeout]
    Output: results
    
  CATCH timeout_error:
    [Simplify query - select fewer columns]
    [Query again]
    
  CATCH deadlock_error:
    [Retry with backoff]
    [Max 3 attempts]

Step 3: Update Data
  TRY:
    [DB: Update with version check]
    
  CATCH version_mismatch:
    [Data changed since read]
    [Retry: Read fresh data then update]
    
  CATCH integrity_error:
    [Data violates constraints]
    [Log violation]
    [Stop: Data integrity issue]

Step 4: Commit
  TRY:
    [DB: Commit transaction]
    
  CATCH commit_error:
    [Rollback all changes]
    [Log error details]
    [Notify operator]
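
The rollback-on-integrity-error path can be tried locally with Python's built-in `sqlite3` module. This sketch uses an in-memory database and a hypothetical `users` table and `add_users` helper; it illustrates the pattern rather than any particular production database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
conn.commit()


def add_users(conn, emails):
    """Insert all emails in one transaction; roll back everything on a constraint violation."""
    try:
        for email in emails:
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        conn.commit()
        return True
    except sqlite3.IntegrityError as exc:
        conn.rollback()  # undo the partial batch
        print("rolled back:", exc)
        return False


ok = add_users(conn, ["b@example.com", "a@example.com"])  # second email violates UNIQUE
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(ok, count)  # False 1
```

Because the whole batch shares one transaction, the valid first insert is undone together with the failing one.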

Example 5: Comprehensive Workflow Error Handling

Workflow: Complete Order with Full Error Handling

Setup Phase:
  TRY:
    [Initialize workflow]
    [Setup test environment]
    
  CATCH setup_error:
    [Log setup failure]
    [Stop: Cannot proceed without setup]

Main Processing Phase:
  TRY:
    Step 1: [Validate input]
    Step 2: [Create order]
    Step 3: [Process payment]
    Step 4: [Send confirmation]
    
  CATCH expected_error:
    [Handle gracefully]
    [Record in log]
    [Continue if possible]
    
  CATCH unexpected_error:
    [Emergency handler]
    [Send alert]
    [Rollback changes if needed]

Cleanup Phase:
  FINALLY:
    [Always run cleanup]
    [Close connections]
    [Release resources]
    [Log final status]
    
  CATCH cleanup_error:
    [Log cleanup failure]
    [Alert team]
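
The three phases map naturally onto try/except/finally. The following is a sketch with hypothetical names (`run_workflow`, `validate`, `create_order`); here `ValueError` plays the role of the expected error.

```python
def run_workflow(steps, cleanup):
    """Run steps in order; always run cleanup, whatever happens."""
    log = []
    try:
        for step in steps:
            log.append(step())
    except ValueError as exc:  # expected error: handle and record
        log.append(f"handled: {exc}")
    except Exception as exc:  # unexpected error: record, then re-raise
        log.append(f"unexpected: {exc}")
        raise
    finally:
        try:
            cleanup()
            log.append("cleanup done")
        except Exception as exc:  # a cleanup failure must not mask the original error
            log.append(f"cleanup failed: {exc}")
    return log


def validate():
    return "validated"


def create_order():
    raise ValueError("invalid quantity")


print(run_workflow([validate, create_order], cleanup=lambda: None))
# ['validated', 'handled: invalid quantity', 'cleanup done']
```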

Recovery Strategies

Immediate Recovery

Issue → Immediate Fix → Continue

Example:
Connection timeout → Retry immediately → Usually succeeds

Delayed Recovery

Issue → Wait → Retry → Continue

Example:
Rate limited (429) → Wait 60 seconds → Retry → Usually succeeds

Fallback Recovery

Issue → Primary fails → Try alternative → Continue

Example:
Primary API down → Use cached data → Continue with degraded service

Escalation Recovery

Issue → Cannot recover → Alert → Wait for manual fix → Retry

Example:
Database connection failure → Alert DBA → Manual intervention → Retry

Graceful Degradation

Issue → Cannot fix → Reduce scope → Continue

Example:
Slow API → Get partial data → Process what is available → Continue

Best Practices

✅ Do

Anticipate failures - Consider what can go wrong
Log every error - Include context, not just the error message
Retry transient errors - Timeout, connection reset, 503
Don't retry permanent errors - 404, 401, 403
Use appropriate backoff - Exponential for overloaded services, fixed delay when the wait time is known (e.g. rate limits)
Provide context - What were you doing when the error occurred?
Test error paths - Verify recovery works
Monitor error rates - Track whether errors are increasing
Alert on critical - Notify team of serious issues
Document decisions - Why did you choose this handling?

❌ Don't

Ignore errors - They won't go away
Generic catch-all - Makes debugging impossible
Retry everything - Won't help with 404
Infinite retries - The workflow hangs until it times out
Log just "Error" - Include full context
Swallow exceptions - At least log them
Assume success - Always verify results
No fallback plan - What if retry fails?
Pollute logs - Too much detail is as bad as too little
Fail silently - Someone needs to know

Error Logging Best Practices

Good Error Log

[2024-01-15 14:30:45.123] ERROR [UserService]
Error creating user account
Context:
  - User Email: user@example.com
  - Endpoint: POST /api/v1/users
  - Environment: production
Error Type: ValidationError
Error Message: Email already exists
Error Code: 400
Stack Trace: [full trace]
Request ID: req_12345
User ID: usr_789 (if available)
Retry Attempt: none (validation errors are permanent)
Next Action: Return error to caller

Poor Error Log

[14:30:45] ERROR Error
Error creating user
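
The difference between the two logs is context. One way to make context hard to forget is to build the message from named fields; the sketch below uses Python's standard `logging` and `json` modules, with `error_context` and the field names as hypothetical choices.

```python
import json
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s [%(name)s] %(message)s",
)
logger = logging.getLogger("UserService")


def error_context(action, exc, **context):
    """Build a log message carrying the action, error type, and context fields."""
    fields = {
        "action": action,
        "error_type": type(exc).__name__,
        "error_message": str(exc),
    }
    fields.update(context)
    return json.dumps(fields, sort_keys=True)


try:
    raise ValueError("Email already exists")
except ValueError as exc:
    message = error_context(
        "create user", exc, endpoint="POST /api/v1/users", request_id="req_12345"
    )
    logger.error(message, exc_info=True)  # exc_info=True appends the stack trace
```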

Monitoring and Alerts

Key Metrics

Error Rate:
- Errors per minute
- Percentage of total requests
- Trend (increasing/decreasing?)

Error Types:
- Most common errors
- New error types
- Error frequency

Recovery Success:
- Successful retries
- Successful fallbacks
- Total recovery rate

Alert Thresholds

Critical (page on-call):
- Error rate > 10%
- Database connection failures
- Payment processing down

High (email alert):
- Error rate > 5%
- Repeated API timeouts
- Credential failures

Medium (log entry):
- Error rate > 1%
- Individual step failures
- Resource warnings
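
The error-rate thresholds above translate directly into a small function. `alert_level` is a hypothetical name, and the sketch covers only the rate-based rules, not the event-based ones such as database connection failures.

```python
def alert_level(errors, total_requests):
    """Map an error rate to the severity tiers listed above."""
    if total_requests == 0:
        return "none"
    rate = errors / total_requests
    if rate > 0.10:
        return "critical"  # page on-call
    if rate > 0.05:
        return "high"  # email alert
    if rate > 0.01:
        return "medium"  # log entry
    return "none"


print(alert_level(12, 100), alert_level(6, 100), alert_level(2, 100), alert_level(0, 100))
# critical high medium none
```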

Troubleshooting

Issue: Error handling logic is wrong

Symptoms:

Errors not caught
Wrong handler executes
Recovery doesn't work

Solutions:

Test error paths explicitly
Verify error type matching
Check condition logic
Add logging before handler

Issue: Retry never succeeds

Symptoms:

Keeps retrying but always fails
No recovery

Causes:

Permanent error (404, 401)
Issue is not transient
Retry delay too short

Solutions:

Check error type (don't retry permanent)
Increase retry delay
Increase max attempts
Consider fallback instead

Issue: Logs flooded with errors

Symptoms:

Too many log entries
Can't find real issues
Performance impact

Causes:

Logging too much detail
Transient errors causing cascade
No rate limiting

Solutions:

Filter less important errors
Fix underlying cause
Batch similar errors
Add rate limiting
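
Rate limiting repeated messages can be done with a filter from Python's standard `logging` module. The sketch below is deliberately simple: `RepeatLimiter` is a hypothetical class that counts exact messages for the lifetime of the process, whereas a production version would likely reset counts per time window.

```python
import logging


class RepeatLimiter(logging.Filter):
    """Let each distinct message through at most max_repeats times."""

    def __init__(self, max_repeats=3):
        super().__init__()
        self.max_repeats = max_repeats
        self.seen = {}

    def filter(self, record):
        key = record.getMessage()
        self.seen[key] = self.seen.get(key, 0) + 1
        return self.seen[key] <= self.max_repeats


logger = logging.getLogger("workflow.demo")
handler = logging.StreamHandler()
limiter = RepeatLimiter(max_repeats=2)
handler.addFilter(limiter)
logger.addHandler(handler)

for _ in range(5):
    logger.error("upstream timeout")  # only the first two are emitted

print(limiter.seen["upstream timeout"])  # 5 occurrences counted
```

The counter still records every occurrence, so the suppressed total can be reported in a periodic summary.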

Related Topics

Conditional Execution - Error-based conditions
Retry Action - Automatic retry
Stop Action - Stop on errors
Best Practices - Error handling standards
Execution Flow - Error propagation

Summary

Error Handling is essential for production workflows
Anticipate failures - Don't assume success
Choose strategy - Retry, fallback, or escalate
Log meaningfully - Include full context
Test recovery - Verify error paths work
Monitor rates - Track error trends
Alert appropriately - Notify on critical issues

Next: Learn about Best Practices for professional workflows.


Last updated 1 month ago
