Error Handling That Actually Works: From try/catch to Production-Grade Resilience

Every student learns try/catch in their first programming course. Most of them leave it at that. They wrap every function in a try block, write catch (err) { console.log(err) }, and move on. The app "works" — until it doesn't, and when it breaks in production at 2 AM, nobody knows why because the logs say nothing useful, the error was swallowed three layers deep, and the user saw a blank white screen with no explanation.

This article is about building error handling that actually works. Not the textbook version with a single try/catch example. The real version — with circuit breakers that prevent cascade failures, retry logic that knows when to back off, structured logs that tell you exactly what happened, and error boundaries that keep your UI from dying. This is how production applications handle errors, and it is what separates a student project from software that real people depend on.

1. The try/catch Trap

Let us start with the most common mistake. Here is what most students write when they learn about error handling:

async function getUser(id) {

  try {

    const response = await fetch(`/api/users/${id}`);

    const data = await response.json();

    return data;

  } catch (err) {

    console.log(err);

  }

}

This looks reasonable. It catches the error, logs it, and the program does not crash. But there are at least five problems with this approach.

Problem 1: The function returns undefined on failure. When the catch block runs, there is no return statement, so the function returns undefined. The calling code does not know something went wrong. It tries to access user.name on undefined and throws a completely different error somewhere else. Now you are debugging a "Cannot read property 'name' of undefined" error three files away from where the actual problem happened.

Problem 2: console.log is not logging. In a production environment, console.log writes to stdout, and unless something collects and stores that stream, the output is gone once the process or container restarts. It is not persisted, not searchable, not structured, and not connected to any alerting system. When the error happens at 2 AM, nobody sees it.

Problem 3: All errors are treated the same. A network timeout and a malformed user ID are fundamentally different problems requiring different responses. A network timeout might resolve itself in 2 seconds. A malformed user ID will never work no matter how many times you retry. But this catch block handles both identically: log and swallow.

Problem 4: No context. The log says something like TypeError: Failed to fetch. Which user? Which endpoint? What was the request payload? What was the HTTP status? What time? Which server instance? You have an error message with zero actionable context.

Problem 5: The user gets nothing. The function silently fails. The user clicks a button, nothing happens, and there is no feedback explaining what went wrong or what to do about it. They click again. And again. Maybe they generate 50 duplicate requests because nobody told them the first one failed.

The core principle: A silent failure is worse than a crash. A crash is loud — it shows up in logs, triggers alerts, and forces someone to fix it. A swallowed error is invisible. It corrupts data, confuses users, and hides the root cause behind layers of downstream symptoms. If you do not know how to handle an error, let it propagate. Do not catch it just to make the red squiggly line go away.

Here is what a better version looks like. It is not perfect yet, and we will build toward that through this article, but it removes the silent failure and attaches context to the error (AppError is the custom error class defined in section 6):

async function getUser(id) {

  const response = await fetch(`/api/users/${id}`);

  if (!response.ok) {

    throw new AppError(

      `Failed to fetch user ${id}`,

      { statusCode: response.status, context: { userId: id } }

    );

  }

  return response.json();

}

No try/catch here at all. The function throws when something goes wrong, with context (the user ID, the HTTP status). The caller decides how to handle it — retry, show an error message, or let it propagate further. This is error handling as communication, not error handling as suppression.
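As a rough sketch of what "the caller decides" can look like, the snippet below turns a thrown error into an explicit UI state at the edge of the application. The name loadUserView and the shape of the returned state object are hypothetical, and the small AppError here is only a stand-in for the fuller class built in section 6. The getUser function is passed in so the sketch runs without a network.

```javascript
// Minimal stand-in for the article's AppError (assumed shape).
class AppError extends Error {
  constructor(message, { statusCode = 500, context = {} } = {}) {
    super(message);
    this.name = 'AppError';
    this.statusCode = statusCode;
    this.context = context;
  }
}

// The caller, not getUser, decides what a failure means for the UI.
async function loadUserView(id, getUser) {
  try {
    const user = await getUser(id);
    return { state: 'ready', user };
  } catch (err) {
    if (err instanceof AppError && err.statusCode === 404) {
      // A known, permanent failure: tell the user exactly what happened.
      return { state: 'not-found', message: `No user with id ${id}.` };
    }
    // Anything else: show a visible error state instead of hiding it.
    return { state: 'error', message: 'Could not load the user. Please try again.' };
  }
}
```

The point of the design is that every outcome is explicit: the UI receives 'ready', 'not-found', or 'error', never a silent undefined.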

2. Error Classification

Before you can handle errors correctly, you need to classify them. Not all errors are the same, and treating them the same is how you end up retrying requests that will never succeed or crashing on errors that would resolve themselves in a second.

Transient vs Permanent Errors

Transient errors are temporary. They might work if you try again. Examples:

Network timeout — the server was busy but is probably fine now
HTTP 429 (Too Many Requests) — you hit a rate limit, wait and try again
HTTP 503 (Service Unavailable) — the service is temporarily down
Database connection pool exhausted — wait for a connection to free up
DNS resolution failure — a blip, usually resolves in seconds

Permanent errors are not going away. Retrying will just waste resources and time. Examples:

HTTP 400 (Bad Request) — your input is malformed, fix it
HTTP 401 (Unauthorized) — your credentials are wrong
HTTP 404 (Not Found) — the resource does not exist
Validation error — the email address is not a valid email
HTTP 403 (Forbidden) — you do not have permission, period

The decision tree is simple: if the error is transient, retry with backoff. If the error is permanent, fail fast and tell the user. Everything else in this article builds on that distinction.

User-Facing vs Developer-Facing Errors

This is the second axis of classification. Every error needs two messages: one for the user and one for the developer.

User-facing errors should be helpful but not reveal internals. "Something went wrong. Please try again. If the problem persists, contact support with error ID ERR-a3f9x2." The user gets enough to take action (try again, contact support) plus an error ID that connects their experience to your logs.

Developer-facing errors should be exhaustive. Stack trace, request context, user ID, session ID, input payload, which service failed, what the response was, timestamps, correlation IDs. Everything a developer needs to reproduce and fix the problem without asking the user "what were you doing when it happened?"
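To make the two-message idea concrete, here is a minimal sketch that builds both views of one failure and links them with a shared error ID. The function name describeError, the ERR- prefix format, and the requestContext fields are assumptions for illustration, not part of any library.

```javascript
const { randomBytes } = require('node:crypto');

// Build two views of one failure: a safe message for the user and a
// detailed record for developers, linked by a shared error ID.
function describeError(error, requestContext = {}) {
  // Six random hex characters, e.g. ERR-a3f9c2 (format is an assumption).
  const errorId = `ERR-${randomBytes(3).toString('hex')}`;

  const userFacing = {
    message: 'Something went wrong. Please try again.',
    errorId,
  };

  const developerFacing = {
    ...requestContext, // hypothetical fields such as userId, path, correlationId
    errorId,
    name: error.name,
    message: error.message,
    stack: error.stack,
    timestamp: new Date().toISOString(),
  };

  return { userFacing, developerFacing };
}
```

The user-facing object never contains the internal message or stack, while the developer-facing object carries everything needed to look the incident up by its ID.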

// Error classification helper

function isTransient(error) {

  if (error.code === 'ECONNRESET') return true;

  if (error.code === 'ETIMEDOUT') return true;

  if (error.status === 429) return true;

  if (error.status === 503) return true;

  if (error.status === 502) return true;

  if (error.status >= 500) return true; // server errors are often transient

  return false;

}

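// Small helper used below: resolve after the given number of milliseconds

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Usage in retry logic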

async function fetchWithRetry(url, options, maxRetries = 3) {

  for (let attempt = 0; attempt <= maxRetries; attempt++) {

    try {

      const response = await fetch(url, options);

      if (response.ok) return response;

      const error = new Error(`HTTP ${response.status}`);

      error.status = response.status;

      if (!isTransient(error)) throw error; // permanent: fail fast

      if (attempt === maxRetries) throw error; // out of retries

      await sleep(Math.pow(2, attempt) * 1000); // exponential backoff

    } catch (err) {

      if (!isTransient(err) || attempt === maxRetries) throw err;

      await sleep(Math.pow(2, attempt) * 1000);

    }

  }

}

Rule of thumb: If the HTTP status starts with 4, it is usually your fault (permanent). If it starts with 5, it is usually the server's fault (transient). The exception is 429 — that is a 4xx code but is transient because waiting solves it.
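Servers that answer 429 or 503 sometimes say how long to wait through the standard Retry-After header, which holds either a number of seconds or an HTTP date. The sketch below applies the rule of thumb and honors that hint when present. The function name retryDelayMs and its parameters are hypothetical, and the fallback of one second doubled per attempt is just an assumed default.

```javascript
// Decide how long to wait before retrying, or return null to fail fast.
// `status` is the HTTP status code; `retryAfter` is the raw Retry-After
// header value (seconds or an HTTP date), if the server sent one.
function retryDelayMs(status, retryAfter, attempt, now = Date.now()) {
  const transient = status === 429 || status >= 500;
  if (!transient) return null; // permanent: do not retry

  const raw = retryAfter == null ? '' : String(retryAfter).trim();
  if (raw !== '') {
    const seconds = Number(raw);
    if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
    const date = Date.parse(raw);
    if (!Number.isNaN(date)) return Math.max(0, date - now);
  }

  // No usable hint from the server: fall back to exponential backoff.
  return 2 ** attempt * 1000;
}
```

A caller could pass response.headers.get('retry-after') as the second argument and treat a null result as "fail fast and tell the user".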

3. Circuit Breakers and Retry Patterns

Retrying is good. Retrying forever against a dead service is a disaster. This is where circuit breakers come in — they are the missing piece between "retry on failure" and "bring down your entire system by hammering a dead dependency."

The Circuit Breaker Pattern

A circuit breaker works much like an electrical circuit breaker. It sits between your code and an external service, monitoring for failures. It has three states:

Closed (normal operation): Requests flow through normally. The breaker counts failures. Everything is fine.

Open (tripped): Too many failures crossed the threshold. The breaker "trips" and immediately rejects all requests without even trying to call the external service. This prevents your application from wasting time and resources on a service that is clearly down. Instead of waiting 30 seconds for each request to time out, you fail instantly with a meaningful error.

Half-Open (testing recovery): After a cooldown period (say, 30 seconds), the breaker lets one request through as a test. If it succeeds, the breaker closes and normal operation resumes. If it fails, the breaker opens again and waits another cooldown period.
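The three states can be sketched as a small state machine. This is a deliberately simplified illustration, not how any particular library implements it: the class name SimpleBreaker is hypothetical, it counts consecutive failures rather than a failure percentage, and it does not guard against several concurrent trial requests in the half-open state.

```javascript
// A deliberately small circuit breaker, for illustration only.
class SimpleBreaker {
  constructor({ failureThreshold = 3, cooldownMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, handy for tests
    this.state = 'closed';
    this.failures = 0;
    this.openedAt = 0;
  }

  async call(fn) {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.cooldownMs) {
        throw new Error('Circuit open: request rejected without calling the service');
      }
      this.state = 'half-open'; // cooldown elapsed: allow a trial request
    }
    try {
      const result = await fn();
      this.state = 'closed'; // success closes the circuit
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open'; // trip (or re-trip) and restart the cooldown
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

In practice you would reach for a maintained library, but walking through this version makes the closed, open, and half-open transitions easier to follow.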

// Circuit breaker with opossum (Node.js)

const CircuitBreaker = require('opossum');

async function callPaymentService(orderId, amount) {

  const response = await fetch('https://payments.example.com/charge', {

    method: 'POST',

    body: JSON.stringify({ orderId, amount }),

    headers: { 'Content-Type': 'application/json' },

  });

  if (!response.ok) throw new Error(`Payment failed: ${response.status}`);

  return response.json();

}

const breaker = new CircuitBreaker(callPaymentService, {

  timeout: 5000,        // 5s timeout per request

  errorThresholdPercentage: 50, // trip after 50% failure rate

  resetTimeout: 30000,   // try again after 30s

  volumeThreshold: 5,    // minimum 5 requests before tripping

});

breaker.on('open', () => logger.warn('Payment circuit OPEN'));

breaker.on('halfOpen', () => logger.info('Payment circuit HALF-OPEN'));

breaker.on('close', () => logger.info('Payment circuit CLOSED'));

// Usage: breaker.fire(orderId, amount)

breaker.fallback(() => ({

  status: 'queued',

  message: 'Payment service is temporarily unavailable. Your order is queued.'

}));

The critical detail is the fallback. When the circuit is open, you do not just throw an error. You provide a degraded but acceptable response. Maybe you queue the payment for later processing. Maybe you show cached data. Maybe you route to a backup service. The user's experience degrades gracefully instead of breaking completely.

Library options for circuit breakers: opossum for Node.js, pybreaker for Python, and resilience4j for Java. All three implement the same pattern with similar configuration options.

Retry with Exponential Backoff and Jitter

When you retry a failed request, you need to wait between attempts. But not a fixed wait — exponential backoff, meaning each retry waits longer than the last:

Attempt 1: wait 1 second
Attempt 2: wait 2 seconds
Attempt 3: wait 4 seconds
Attempt 4: wait 8 seconds

But there is a problem. If a service goes down and 10,000 clients all retry at the exact same exponential intervals, they all hit the service simultaneously at 1s, 2s, 4s, 8s — creating "retry storms" that keep the service down. The solution is jitter: add a random amount to each wait time so clients spread out their retries.
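The arithmetic behind backoff with jitter fits in a few lines. This sketch uses additive jitter, matching the description above: an exponential delay plus a random extra amount. The function name backoffDelay and the default values for baseMs, capMs, and jitterMs are assumptions chosen for illustration; the random source is injectable so the behavior can be checked deterministically.

```javascript
// Delay in milliseconds before retry number `attempt` (0-based).
// Exponential growth, capped at `capMs`, plus up to `jitterMs` of random extra wait.
function backoffDelay(
  attempt,
  { baseMs = 1000, capMs = 30000, jitterMs = 1000, random = Math.random } = {}
) {
  const exponential = Math.min(capMs, baseMs * 2 ** attempt);
  const jitter = random() * jitterMs; // spreads clients out over time
  return Math.round(exponential + jitter);
}

// With jitter disabled (random always 0) the classic schedule appears:
const schedule = [0, 1, 2, 3].map((a) => backoffDelay(a, { random: () => 0 }));
console.log(schedule); // [ 1000, 2000, 4000, 8000 ]
```

With the real Math.random, two clients that failed at the same instant will almost never pick the same delay, which is what breaks up the retry storm.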

# Exponential backoff with jitter in Python (using tenacity)

from tenacity import (

  retry, stop_after_attempt, wait_exponential_jitter,

  retry_if_exception_type

)

import httpx

class TransientError(Exception):

  pass

@retry(

  stop=stop_after_attempt(4),               # max 4 attempts

  wait=wait_exponential_jitter(

    initial=1,                             # start at 1s

    max=30,                                # cap at 30s

    jitter=2                               # adds 0 to 2s of random delay

  ),

  retry=retry_if_exception_type(TransientError), # only retry transient

)

async def fetch_user_profile(user_id: str):

  async with httpx.AsyncClient() as client:

    response = await client.get(f"https://api.example.com/users/{user_id}")

    if response.status_code == 429 or response.status_code >= 500:

      raise TransientError(f"HTTP {response.status_code}")

    response.raise_for_status()

    return response.json()

In Node.js, the p-retry library provides the same functionality:

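// Exponential backoff with p-retry (Node.js)

// Note: require() works with p-retry v4; v5 and later are ESM-only

// (import pRetry, { AbortError } from 'p-retry')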

const pRetry = require('p-retry');

async function fetchUserProfile(userId) {

  return pRetry(async () => {

    const response = await fetch(`/api/users/${userId}`);

    if (response.status >= 400 && response.status < 500 && response.status !== 429) {

      // Permanent error: abort retries immediately

      throw new pRetry.AbortError(`Permanent failure: ${response.status}`);

    }

    if (!response.ok) {

      throw new Error(`HTTP ${response.status}`); // will be retried

    }

    return response.json();

  }, {

    retries: 3,

    minTimeout: 1000,  // 1s initial delay

    factor: 2,         // double each time

    randomize: true,   // add jitter

  });

}

Three rules for retry logic that you should memorize:

Never retry permanent errors. A 400 Bad Request will fail every single time. Retrying it wastes resources and delays the error message to the user.
Always cap your retries. Three to five retries is usually enough. If the service is not back after 4 attempts over ~15 seconds of exponential backoff, it is not coming back in the next 30 seconds either. Let the circuit breaker handle it.
Always add jitter. Without jitter, synchronized retries from multiple clients create thundering herd problems that can keep a struggling service down.

Circuit breakers and retries work together. Retries handle individual transient failures. Circuit breakers handle systemic failures where an entire service is down. The retry logic runs inside the circuit breaker. If retries keep failing, the circuit breaker trips and stops all attempts until the service recovers.
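The layering can be sketched with two tiny helpers. Everything here is a simplified stand-in: withRetry, createBreaker, and protect are hypothetical names, the retry helper has no delay between attempts, and the breaker has no cooldown or half-open state. The only thing the sketch is meant to show is the nesting order, where the breaker wraps the retrying call.

```javascript
// Retry helper: runs `fn` up to `retries + 1` times (no delay, for brevity).
async function withRetry(fn, retries = 2) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Tiny breaker: rejects immediately after `threshold` consecutive failures.
function createBreaker(operation, threshold = 2) {
  let failures = 0;
  return async function guarded(...args) {
    if (failures >= threshold) throw new Error('Circuit open');
    try {
      const result = await operation(...args);
      failures = 0;
      return result;
    } catch (err) {
      failures += 1;
      throw err;
    }
  };
}

// Composition: the breaker wraps the retrying call, so one breaker
// "failure" means a whole retry sequence was exhausted.
function protect(fn, { retries = 2, threshold = 2 } = {}) {
  return createBreaker((...args) => withRetry(() => fn(...args), retries), threshold);
}
```

With the defaults, a dead service is called six times in total (two calls of three attempts each) before the breaker starts rejecting requests without touching it.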

4. Structured Logging

You cannot fix errors you cannot see. And console.log("something went wrong") is not visibility. Structured logging means writing logs as machine-readable JSON objects with consistent fields, so they can be searched, filtered, aggregated, and alerted on.

The Problem with console.log

Here is what unstructured logging looks like in a real application:

console.log("Error fetching user");

console.log("user not found");

console.log("DB connection failed");

console.log(err);

console.log("retrying...");

console.log("payment failed for order " + orderId);

Now imagine you have 50 servers each processing 1,000 requests per second. You get a support ticket: "I tried to update my profile 10 minutes ago and it did not work." How do you find the relevant log line among millions of entries? You cannot. There is no user ID, no request ID, no timestamp format, no severity level, and no way to correlate this log entry to that user's specific request.

Structured Logs with Correlation IDs

Here is the same information as a structured log:

// Pino structured logging (Node.js)

const crypto = require('crypto');

const pino = require('pino');

const logger = pino({

  level: process.env.LOG_LEVEL || 'info',

  formatters: {

    level: (label) => ({ level: label }),

  },

  timestamp: pino.stdTimeFunctions.isoTime,

});

// Middleware to attach correlation ID to every request

app.use((req, res, next) => {

  req.correlationId = req.headers['x-correlation-id'] || crypto.randomUUID();

  req.logger = logger.child({

    correlationId: req.correlationId,

    userId: req.user?.id,

    method: req.method,

    path: req.path,

    ip: req.ip,

  });

  next();

});

// Now every log in the request context includes that info automatically

req.logger.error({

  err: error,

  orderId: '12345',

  paymentProvider: 'stripe',

  amount: 49.99,

}, 'Payment processing failed');

This produces a JSON log entry like:

{

  "level": "error",

  "time": "2026-03-17T14:23:01.456Z",

  "correlationId": "a3f9x2-b7c1-4d8e-9f0a",

  "userId": "user_8842",

  "method": "POST",

  "path": "/api/payments",

  "ip": "192.168.1.42",

  "orderId": "12345",

  "paymentProvider": "stripe",

  "amount": 49.99,

  "err": {

    "message": "Card declined",

    "stack": "Error: Card declined\n  at processPayment..."

  },

  "msg": "Payment processing failed"

}

Now you can search: "show me all errors for user_8842 in the last hour" or "show me all payment failures with correlationId a3f9x2" or "show me all 5xx errors on the /api/payments endpoint." The correlation ID is especially powerful: it lets you trace a single request across multiple services. If your API calls a payment service which calls a fraud detection service, the same correlation ID flows through all three, and you can reconstruct the entire journey of a request.
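To see why JSON logs are searchable, here is a small sketch that filters newline-delimited JSON entries by field. In production a log platform does this for you; the sample entries, the req-1 and req-2 identifiers, and the searchLogs helper are made up for illustration.

```javascript
// Newline-delimited JSON, the shape a structured logger typically writes.
const rawLogs = [
  '{"level":"info","correlationId":"req-1","msg":"Order placed"}',
  '{"level":"error","correlationId":"req-2","userId":"user_8842","msg":"Payment processing failed"}',
  '{"level":"warn","correlationId":"req-2","msg":"Retrying fraud check"}',
].join('\n');

// Return every entry whose fields match all key/value pairs in `query`.
function searchLogs(ndjson, query) {
  return ndjson
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line))
    .filter((entry) => Object.entries(query).every(([key, value]) => entry[key] === value));
}

// Reconstruct everything that happened during one request.
const trace = searchLogs(rawLogs, { correlationId: 'req-2' });
console.log(trace.map((entry) => entry.msg));
// [ 'Payment processing failed', 'Retrying fraud check' ]
```

The same query against the console.log output shown earlier would need fragile string matching, because those lines carry no fields to filter on.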

Python Structured Logging with structlog

# structlog setup for Python (FastAPI example)

from uuid import uuid4

import structlog

from fastapi import Request

structlog.configure(

  processors=[

    structlog.contextvars.merge_contextvars,

    structlog.processors.add_log_level,

    structlog.processors.TimeStamper(fmt="iso"),

    structlog.processors.StackInfoRenderer(),

    structlog.processors.format_exc_info,

    structlog.processors.JSONRenderer(),

  ],

)

logger = structlog.get_logger()

# In your FastAPI middleware

@app.middleware("http")

async def logging_middleware(request: Request, call_next):

  correlation_id = request.headers.get("x-correlation-id", str(uuid4()))

  structlog.contextvars.clear_contextvars()

  structlog.contextvars.bind_contextvars(

    correlation_id=correlation_id,

    method=request.method,

    path=request.url.path,

  )

  response = await call_next(request)

  return response

# Now every log call automatically includes correlation_id, method, path

logger.error("payment_failed", order_id="12345", amount=49.99)

Log Levels

Use log levels consistently across your team. Here is the standard hierarchy:

ERROR: Something failed and needs human attention. A payment did not process, a database query failed, an external API returned an unexpected error.
WARN: Something unexpected happened but the system handled it. A retry succeeded on the second attempt, a cache miss caused a slow response, a deprecated API endpoint was called.
INFO: Normal business events. A user logged in, an order was placed, a payment was processed. These are your audit trail.
DEBUG: Detailed information for troubleshooting. Variable values, function entry/exit, SQL queries. Only enabled in development or temporarily in production when investigating a specific issue.

Never log sensitive data. No passwords, no API keys, no access tokens, no credit card numbers, no session IDs, no personally identifiable information (PII) beyond what is strictly necessary. This is not just good practice: privacy laws such as GDPR and CCPA, and the PCI-DSS card industry standard, restrict how this data may be stored. If your logs contain credit card numbers and you get audited, your company can face serious fines. Scrub sensitive fields before they reach the logger.
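One way to scrub fields before they reach the logger is a small recursive redaction helper. This is a sketch for plain JSON-like data only; the key list is an assumption you would adapt to your own payloads, and loggers such as Pino also ship a built-in redact option that may be a better fit.

```javascript
// Keys to mask (lowercase); this list is an assumption, adjust it to your data.
const SENSITIVE_KEYS = new Set([
  'password', 'token', 'authorization', 'apikey', 'cardnumber', 'cvv',
]);

// Return a deep copy of `value` with sensitive fields replaced by a marker.
// Intended for plain objects and arrays, not class instances such as Date.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === 'object') {
    const copy = {};
    for (const [key, inner] of Object.entries(value)) {
      copy[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? '[REDACTED]' : redact(inner);
    }
    return copy;
  }
  return value; // primitives pass through unchanged
}
```

Because the helper returns a copy, the original request body stays intact for the code that still needs it; only the version handed to the logger is masked.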

5. Error Boundaries and Global Handlers

Individual try/catch blocks handle errors at the function level. But what about errors that slip through? You need defense at every layer: component-level in the UI, route-level in the API, and process-level in the runtime.

React Error Boundaries

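In React, if a component throws an error during rendering, the entire application crashes and the user sees a blank white screen. Error boundaries are React components that catch JavaScript errors in their child component tree and render a fallback UI instead of crashing the whole page. They cover errors thrown during rendering, in lifecycle methods, and in constructors; errors in event handlers and asynchronous code still need their own handling.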

// ErrorBoundary.jsx

import React from 'react';

import * as Sentry from '@sentry/react';

class ErrorBoundary extends React.Component {

  constructor(props) {

    super(props);

    this.state = { hasError: false, errorId: null };

  }

  static getDerivedStateFromError(error) {

    return { hasError: true };

  }

  componentDidCatch(error, errorInfo) {

    const errorId = crypto.randomUUID().slice(0, 8);

    this.setState({ errorId });

    // Send to error tracking service

    Sentry.captureException(error, {

      contexts: {

        react: { componentStack: errorInfo.componentStack }

      },

      tags: { errorId },

    });

  }

  render() {

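    if (this.state.hasError) {

      // Prefer a caller-supplied fallback (used by the per-feature boundaries)

      if (this.props.fallback) return this.props.fallback;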

      return (

        <div className="error-fallback">

          <h2>Something went wrong</h2>

          <p>We have been notified and are looking into it.</p>

          <p>Error ID: {this.state.errorId}</p>

          <button onClick={() => this.setState({ hasError: false })}>

            Try again

          </button>

        </div>

      );

    }

    return this.props.children;

  }

}

A common mistake is wrapping your entire app in a single error boundary and stopping there. Any rendering error anywhere then replaces the whole page with the fallback. Instead, use per-feature error boundaries:

// Per-feature error boundaries — one section fails, the rest keep working

function Dashboard() {

  return (

    <div className="dashboard">

      <ErrorBoundary fallback={<p>Could not load analytics.</p>}>

        <AnalyticsPanel />

      </ErrorBoundary>

      <ErrorBoundary fallback={<p>Could not load recent orders.</p>}>

        <RecentOrders />

      </ErrorBoundary>

      <ErrorBoundary fallback={<p>Could not load notifications.</p>}>

        <NotificationFeed />

      </ErrorBoundary>

    </div>

  );

}

Now if the analytics API fails, the analytics panel shows "Could not load analytics" while the rest of the dashboard keeps working. This is graceful degradation — the same principle behind the circuit breaker fallback, applied at the UI level.

FastAPI Global Exception Handler

In FastAPI, unhandled exceptions return a generic 500 response. A global exception handler catches everything and formats consistent, useful error responses:

# FastAPI global exception handlers

from uuid import uuid4

import structlog

from fastapi import FastAPI, Request

from fastapi.responses import JSONResponse

app = FastAPI()

logger = structlog.get_logger()

class AppError(Exception):

  def __init__(self, message, status_code=500, error_code=None, details=None):

    super().__init__(message)

    self.message = message

    self.status_code = status_code

    self.error_code = error_code or "INTERNAL_ERROR"

    self.details = details

@app.exception_handler(AppError)

async def app_error_handler(request: Request, exc: AppError):

  error_id = str(uuid4())[:8]

  logger.error("app_error",

    error_id=error_id,

    error_code=exc.error_code,

    message=exc.message,

    path=str(request.url),

    details=exc.details,

  )

  return JSONResponse(

    status_code=exc.status_code,

    content={

      "error": exc.error_code,

      "message": exc.message,

      "errorId": error_id,

    },

  )

@app.exception_handler(Exception)

async def unhandled_error_handler(request: Request, exc: Exception):

  error_id = str(uuid4())[:8]

  logger.error("unhandled_error",

    error_id=error_id,

    exc_info=exc,

    path=str(request.url),

  )

  return JSONResponse(

    status_code=500,

    content={

      "error": "INTERNAL_ERROR",

      "message": "An unexpected error occurred. Please try again.",

      "errorId": error_id,

    },

  )

Express Error Middleware

// Express global error middleware (must have 4 parameters)

app.use((err, req, res, next) => {

  const errorId = crypto.randomUUID().slice(0, 8);

  req.logger.error({

    err,

    errorId,

    body: req.body,   // redact passwords, tokens, and card data before logging

    query: req.query,

  }, 'Unhandled request error');

  const statusCode = err.statusCode || 500;

  const isOperational = err.isOperational || false;

  res.status(statusCode).json({

    error: err.code || 'INTERNAL_ERROR',

    message: isOperational

      ? (err.userMessage || err.message)

      : 'An unexpected error occurred. Please try again.',

    errorId,

  });

});

Process-Level Handlers

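Even with error boundaries and global middleware, some errors escape. Unhandled promise rejections and uncaught exceptions can crash your Node.js process without leaving an entry in your structured logs. Out-of-memory crashes cannot be caught in-process at all, so rely on your process manager for those. For the rest, you need last-resort handlers: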

// Process-level error handlers (Node.js)

process.on('unhandledRejection', (reason, promise) => {

  logger.fatal({

    reason: reason?.message || reason,

    stack: reason?.stack,

  }, 'Unhandled Promise Rejection — shutting down');

  // Give time for the log to flush, then exit

  setTimeout(() => process.exit(1), 1000);

});

process.on('uncaughtException', (error) => {

  logger.fatal({

    err: error,

  }, 'Uncaught Exception — shutting down');

  // MUST exit: the process is in an undefined state

  setTimeout(() => process.exit(1), 1000);

});

Note that on uncaughtException, you must exit the process. The Node.js documentation explicitly warns against continuing after an uncaught exception because the process state is unreliable. Log the error, let your process manager (PM2, Docker, Kubernetes) restart the process, and investigate the root cause from the logs.
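A fixed one-second delay before exiting is a blunt tool. A slightly more careful variant, sketched below, stops accepting new connections, lets in-flight requests finish, and forces an exit if that takes too long. The createShutdown name, the ten-second default, and the injected exit function are assumptions made so the sketch can run without a real server; server.close(callback) is the standard Node.js HTTP server method.

```javascript
// Build a shutdown function. `server`, `logger`, and `exit` are injected so
// the sketch can be exercised without a real HTTP server or process exit.
function createShutdown({
  server,
  logger,
  exit = (code) => process.exit(code),
  timeoutMs = 10000,
}) {
  let shuttingDown = false;
  return function shutdown(reason, error) {
    if (shuttingDown) return; // ignore repeated triggers
    shuttingDown = true;
    logger.fatal({ err: error, reason }, 'Shutting down');

    // Force an exit if open connections keep close() from finishing.
    const timer = setTimeout(() => exit(1), timeoutMs);
    timer.unref();

    // Stop accepting new connections; exit once in-flight requests finish.
    server.close(() => {
      clearTimeout(timer);
      exit(1);
    });
  };
}

// Wiring (commented out so loading this sketch has no side effects):
// const shutdown = createShutdown({ server, logger });
// process.on('uncaughtException', (err) => shutdown('uncaughtException', err));
// process.on('unhandledRejection', (err) => shutdown('unhandledRejection', err));
```

The process still exits with a non-zero code in every path, so the process manager restarts it as described above.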

6. Real-World Implementation

Now let us put everything together into a complete error handling architecture. This section covers custom error classes, Sentry integration, user-facing error pages, and how all the pieces connect.

Custom Error Classes

The foundation of your error handling architecture is a set of custom error classes that carry structured information. Instead of throw new Error("something went wrong"), you throw errors that include a status code, an error code, a user-friendly message, and machine-readable context.

// errors.js — Custom error hierarchy

class AppError extends Error {

  constructor(message, {

    statusCode = 500,

    code = 'INTERNAL_ERROR',

    userMessage = 'Something went wrong. Please try again.',

    context = {},

    isOperational = true,

  } = {}) {

    super(message);

    this.name = this.constructor.name;

    this.statusCode = statusCode;

    this.code = code;

    this.userMessage = userMessage;

    this.context = context;

    this.isOperational = isOperational; // true = expected, false = bug

    Error.captureStackTrace(this, this.constructor);

  }

}

class NotFoundError extends AppError {

  constructor(resource, id) {

    super(`${resource} not found: ${id}`, {

      statusCode: 404,

      code: 'NOT_FOUND',

      userMessage: `The requested ${resource.toLowerCase()} could not be found.`,

      context: { resource, id },

    });

  }

}

class ValidationError extends AppError {

  constructor(field, reason) {

    super(`Validation failed: ${field} — ${reason}`, {

      statusCode: 400,

      code: 'VALIDATION_ERROR',

      userMessage: `Invalid ${field}: ${reason}`,

      context: { field, reason },

    });

  }

}

class RateLimitError extends AppError {

  constructor(retryAfter = 60) {

    super(`Rate limit exceeded. Retry after ${retryAfter}s`, {

      statusCode: 429,

      code: 'RATE_LIMITED',

      userMessage: 'Too many requests. Please wait a moment and try again.',

      context: { retryAfter },

    });

  }

}

class ExternalServiceError extends AppError {

  constructor(service, originalError) {

    super(`External service failure: ${service}`, {

      statusCode: 502,

      code: 'EXTERNAL_SERVICE_ERROR',

      userMessage: 'A third-party service is temporarily unavailable. Please try again shortly.',

      context: { service, originalMessage: originalError.message },

    });

  }

}

The key distinction is the isOperational flag. Operational errors are expected — a user submitting invalid input, a resource not being found, an external API timing out. These are normal events that your code should handle gracefully. Non-operational errors are bugs — a null pointer, an undefined variable, a logic error. These indicate a defect in your code that needs to be fixed. Your error handlers should treat these differently: operational errors get a clean user message, non-operational errors get escalated and alerted.
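The operational versus non-operational split can be expressed as one small decision function. This sketch assumes errors follow the AppError shape above (statusCode, code, userMessage, isOperational) and treats anything else as a bug. The name triageError and the alert field are hypothetical; what you do with alert (page someone, post to a channel) is up to your setup.

```javascript
// Turn any thrown value into an HTTP-style response plus an alert decision.
function triageError(err) {
  const operational = Boolean(err && err.isOperational);
  return {
    status: operational ? err.statusCode : 500,
    body: {
      error: operational ? err.code : 'INTERNAL_ERROR',
      // Only expected errors expose their own message to the user.
      message: operational
        ? err.userMessage
        : 'An unexpected error occurred. Please try again.',
    },
    alert: !operational, // bugs get escalated; expected failures do not
  };
}
```

A plain TypeError has no isOperational flag, so it falls into the escalation path and its internal message never reaches the user.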

Sentry Integration

Sentry is an error tracking platform that captures errors, groups them, shows you trends, and alerts you when new issues appear. At the time of writing, the free tier includes roughly 5,000 errors per month, which is more than enough for student projects and small production apps.

// Sentry setup for Node.js (Express)

const Sentry = require('@sentry/node');

Sentry.init({

  dsn: process.env.SENTRY_DSN,

  environment: process.env.NODE_ENV,

  release: process.env.APP_VERSION,

  // Performance monitoring: capture 20% of transactions

  tracesSampleRate: 0.2,

  // Drop noisy events and scrub sensitive headers before sending

  beforeSend(event, hint) {

    const error = hint.originalException;

    // Don't send 404s to Sentry — they are noise

    if (error?.statusCode === 404) return null;

    // Scrub sensitive data

    if (event.request?.headers) {

      delete event.request.headers['authorization'];

      delete event.request.headers['cookie'];

    }

    return event;

  },

});

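// Express integration (Sentry.Handlers is the SDK v7 API;

// v8 and later use Sentry.setupExpressErrorHandler(app) instead)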

app.use(Sentry.Handlers.requestHandler());

app.use(Sentry.Handlers.tracingHandler());

// Your routes go here...

// Sentry error handler MUST be before your own error handler

app.use(Sentry.Handlers.errorHandler());

// Your custom error handler

app.use((err, req, res, next) => {

  // ... your error handling logic

});

Sentry provides several features that are invaluable in production:

Source maps: Upload your source maps so Sentry shows the original source code in stack traces, not minified JavaScript. This is the difference between seeing a.b.c is not a function (bundle.js:1:34982) and seeing user.profile.getName is not a function (UserCard.jsx:42).
Breadcrumbs: Sentry automatically records a trail of events (user clicks, API calls, console logs, navigation) leading up to each error. Instead of just seeing "TypeError: Cannot read property," you see "user clicked checkout button, API call to /api/cart succeeded, API call to /api/payment failed with 500, then TypeError occurred."
Performance monitoring: Track slow transactions, identify bottlenecks, and see which API endpoints are taking too long. This is related to error handling because slow endpoints often become failing endpoints under load.
Alerts: Configure Sentry to notify you via email, Slack, or PagerDuty when new error types appear or when error rates spike. You find out about problems before your users report them.

# Sentry setup for Python (FastAPI)

import os

import sentry_sdk

from sentry_sdk.integrations.fastapi import FastApiIntegration

from sentry_sdk.integrations.sqlalchemy import SqlalchemyIntegration

sentry_sdk.init(

  dsn=os.environ["SENTRY_DSN"],

  environment=os.environ.get("APP_ENV", "development"),

  release=os.environ.get("APP_VERSION"),

  traces_sample_rate=0.2,

  integrations=[

    FastApiIntegration(),

    SqlalchemyIntegration(),

  ],

  before_send=scrub_sensitive_data,  # your own function: (event, hint) -> event or None

)

User-Facing Error Pages

The final piece is what the user actually sees. Here is the principle: the user gets a friendly message with an error ID. The developer gets the full context through Sentry and structured logs, linked by that same error ID.

// React: user-facing error display component

function ErrorMessage({ errorId, onRetry }) {

  return (

    <div className="error-page">

      <h2>Something went wrong</h2>

      <p>

        We encountered an unexpected error. Our team has been

        notified and is looking into it.

      </p>

      {errorId && (

        <p className="error-id">

          If you contact support, reference error ID: <code>{errorId}</code>

        </p>

      )}

      {onRetry && (

        <button onClick={onRetry}>Try again</button>

      )}

    </div>

  );

}

The error ID is the bridge between the user's world and the developer's world. A user emails support: "I got error ERR-a3f9x2." The developer searches Sentry for that error ID and immediately sees the full stack trace, the user's session, the request payload, and the breadcrumbs leading up to the error. No back-and-forth asking "what were you doing when it happened?"

The Complete Architecture

Here is how all the pieces fit together in a production application:

Layer 1 — Custom error classes carry status codes, error codes, user messages, and context. Every throw in your codebase uses these.
Layer 2 — Circuit breakers protect external service calls. When a dependency fails, the breaker trips and serves a fallback instead of cascading the failure.
Layer 3 — Retry logic with exponential backoff handles transient failures inside the circuit breaker. Only transient errors are retried. Permanent errors fail fast.
Layer 4 — Structured logging with correlation IDs records every error with full context. JSON format makes logs searchable and filterable.
Layer 5 — Error boundaries (React) catch rendering errors at the component level and show fallback UI. Per-feature boundaries prevent one broken section from taking down the whole page.
Layer 6 — Global exception handlers (FastAPI/Express middleware) catch anything that slips through and return consistent, user-friendly error responses with error IDs.
Layer 7 — Process-level handlers catch unhandled rejections and uncaught exceptions, log them, and trigger a clean shutdown.
Layer 8 — Sentry aggregates all errors, groups them, tracks trends, provides breadcrumbs, and alerts you when new issues appear or error rates spike.

The outcome: When something goes wrong, the user sees a friendly message with an error ID and a retry button. The developer gets an alert in Slack with a Sentry link. Clicking it shows the full stack trace, the user's session, the request context, and the breadcrumb trail of every event leading up to the error. The structured logs provide the correlation ID to trace the request across every service it touched. The circuit breaker prevented the failure from cascading to other parts of the system. The retry logic already attempted recovery before escalating. This is production-grade resilience. It is not one technique — it is all of them working together.

You do not need to implement all eight layers on day one. Start with custom error classes and structured logging. Add Sentry next — the free tier takes 10 minutes to set up. Then add error boundaries in your React components. Circuit breakers and retry logic come in when you start depending on external services. Build the layers incrementally as your application grows. But understand the full picture now, so when you add each layer, you know where it fits and why it matters.
