Every student learns try/catch in their first programming course. Most of them leave it at that. They wrap every function in a try block, write catch (err) { console.log(err) }, and move on. The app "works" — until it doesn't, and when it breaks in production at 2 AM, nobody knows why because the logs say nothing useful, the error was swallowed three layers deep, and the user saw a blank white screen with no explanation.
This article is about building error handling that actually works. Not the textbook version with a single try/catch example. The real version — with circuit breakers that prevent cascade failures, retry logic that knows when to back off, structured logs that tell you exactly what happened, and error boundaries that keep your UI from dying. This is how production applications handle errors, and it is what separates a student project from software that real people depend on.
1. The try/catch Trap
Let us start with the most common mistake. Here is what most students write when they learn about error handling:
async function getUser(id) {
  try {
    const response = await fetch(`/api/users/${id}`);
    const data = await response.json();
    return data;
  } catch (err) {
    console.log(err);
  }
}
This looks reasonable. It catches the error, logs it, and the program does not crash. But there are at least five problems with this approach.
Problem 1: The function returns undefined on failure. When the catch block runs, there is no return statement, so the function returns undefined. The calling code does not know something went wrong. It tries to access user.name on undefined and throws a completely different error somewhere else. Now you are debugging a "Cannot read property 'name' of undefined" error three files away from where the actual problem happened.
Problem 2: console.log is not logging. console.log writes unstructured text to stdout. Unless something collects that stream, the output is gone as soon as the process or container restarts. Even when it is collected, it is not searchable by field, not tagged with a severity, and not connected to any alerting system. When the error happens at 2 AM, nobody sees it.
Problem 3: All errors are treated the same. A network timeout and a malformed user ID are fundamentally different problems requiring different responses. A network timeout might resolve itself in 2 seconds. A malformed user ID will never work no matter how many times you retry. But this catch block handles both identically: log and swallow.
Problem 4: No context. The log says something like TypeError: Failed to fetch. Which user? Which endpoint? What was the request payload? What was the HTTP status? What time? Which server instance? You have an error message with zero actionable context.
Problem 5: The user gets nothing. The function silently fails. The user clicks a button, nothing happens, and there is no feedback explaining what went wrong or what to do about it. They click again. And again. Maybe they generate 50 duplicate requests because nobody told them the first one failed.
The core principle: A silent failure is worse than a crash. A crash is loud — it shows up in logs, triggers alerts, and forces someone to fix it. A swallowed error is invisible. It corrupts data, confuses users, and hides the root cause behind layers of downstream symptoms. If you do not know how to handle an error, let it propagate. Do not catch it just to make the red squiggly line go away.
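Here is what a better version looks like. It is not perfect yet, and we will build toward that through this article, but it fixes the root of the problems above: the failure is no longer silent, and the error carries context that the caller can log, act on, and turn into a message for the user: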
async function getUser(id) {
  const response = await fetch(`/api/users/${id}`);
  if (!response.ok) {
    throw new AppError(
      `Failed to fetch user ${id}`,
      { statusCode: response.status, context: { userId: id } }
    );
  }
  return response.json();
}
No try/catch here at all. The function throws when something goes wrong, with context (the user ID, the HTTP status). The caller decides how to handle it — retry, show an error message, or let it propagate further. This is error handling as communication, not error handling as suppression.
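Sometimes a caller cannot fix an error but does know something useful about it. One option in that case is to wrap the error with context and rethrow it, rather than swallow it. The sketch below assumes a runtime that supports the standard cause option of the Error constructor (Node 16.9+ and current browsers). The function names readUserRecord, loadProfile, and describe are hypothetical and exist only for this illustration.

```javascript
// Hypothetical low-level function that fails without any context.
function readUserRecord(id) {
  throw new Error('connection refused');
}

// Hypothetical caller: it cannot fix the failure, so it adds context
// and rethrows instead of swallowing the error.
function loadProfile(id) {
  try {
    return readUserRecord(id);
  } catch (err) {
    // The standard `cause` option keeps the original error attached.
    throw new Error(`Failed to load profile for user ${id}`, { cause: err });
  }
}

// Walk the cause chain to build one readable message for the logs.
function describe(err) {
  const parts = [];
  for (let e = err; e; e = e.cause) parts.push(e.message);
  return parts.join(' <- ');
}
```

The outer error says what the application was trying to do, and the inner one says what actually broke. Neither is lost on the way up.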
2. Error Classification
Before you can handle errors correctly, you need to classify them. Not all errors are the same, and treating them the same is how you end up retrying requests that will never succeed or crashing on errors that would resolve themselves in a second.
Transient vs Permanent Errors
Transient errors are temporary. They might work if you try again. Examples:
- Network timeout — the server was busy but is probably fine now
- HTTP 429 (Too Many Requests) — you hit a rate limit, wait and try again
- HTTP 503 (Service Unavailable) — the service is temporarily down
- Database connection pool exhausted — wait for a connection to free up
- DNS resolution failure — a blip, usually resolves in seconds
Permanent errors are not going away. Retrying will just waste resources and time. Examples:
- HTTP 400 (Bad Request) — your input is malformed, fix it
- HTTP 401 (Unauthorized) — your credentials are wrong
- HTTP 404 (Not Found) — the resource does not exist
- Validation error — the email address is not a valid email
- HTTP 403 (Forbidden) — you do not have permission, period
The decision tree is simple: if the error is transient, retry with backoff. If the error is permanent, fail fast and tell the user. Everything else in this article builds on that distinction.
User-Facing vs Developer-Facing Errors
This is the second axis of classification. Every error needs two messages: one for the user and one for the developer.
User-facing errors should be helpful but not reveal internals. "Something went wrong. Please try again. If the problem persists, contact support with error ID ERR-a3f9x2." The user gets enough to take action (try again, contact support) plus an error ID that connects their experience to your logs.
Developer-facing errors should be exhaustive. Stack trace, request context, user ID, session ID, input payload, which service failed, what the response was, timestamps, correlation IDs. Everything a developer needs to reproduce and fix the problem without asking the user "what were you doing when it happened?"
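One way to keep the two audiences apart is to build both messages from the same error in a single place. This is a minimal sketch, not a prescribed format: describeFailure is a hypothetical helper, the context fields are whatever your request carries, and the ERR- prefix simply mirrors the example ID above.

```javascript
const { randomUUID } = require('node:crypto');

// Build both views of one failure. `err` is any Error; `context` is
// whatever request data the caller has (field names are up to you).
function describeFailure(err, context = {}) {
  const errorId = `ERR-${randomUUID().slice(0, 6)}`;
  return {
    // Safe to show: no stack trace, no internals, just an action and an ID.
    user: {
      message: 'Something went wrong. Please try again.',
      errorId,
    },
    // For logs only: everything needed to reproduce the problem.
    developer: {
      errorId,
      name: err.name,
      message: err.message,
      stack: err.stack,
      timestamp: new Date().toISOString(),
      ...context,
    },
  };
}
```

The user object goes into the HTTP response and the developer object goes to the logger. The shared errorId is the only thing that appears in both.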
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function isTransient(error) {
  // Node's built-in fetch reports network failures as a TypeError and
  // puts the system error code on error.cause
  const code = error.code ?? error.cause?.code;
  if (code === 'ECONNRESET') return true;
  if (code === 'ETIMEDOUT') return true;
  if (error.status === 429) return true;
  if (error.status >= 500) return true; // server errors (502, 503, ...) are often transient
  return false;
}

// Usage in retry logic
async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch(url, options);
      if (response.ok) return response;
      // Turn an HTTP error status into an Error so one catch block
      // handles both bad statuses and network failures
      const error = new Error(`HTTP ${response.status}`);
      error.status = response.status;
      throw error;
    } catch (err) {
      // Permanent error or out of retries: fail fast
      if (!isTransient(err) || attempt === maxRetries) throw err;
      await sleep(Math.pow(2, attempt) * 1000); // exponential backoff: 1s, 2s, 4s
    }
  }
}
Rule of thumb: If the HTTP status starts with 4, it is usually your fault (permanent). If it starts with 5, it is usually the server's fault (transient). The exception is 429 — that is a 4xx code but is transient because waiting solves it.
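The rule of thumb fits in a few lines. The sketch below also reads the standard Retry-After response header, which servers often send with a 429 or 503 to say how long to wait. That header is an addition here, not something the examples above use, and classifyStatus and retryAfterMs are hypothetical helper names.

```javascript
// Classify an HTTP status following the rule of thumb above.
function classifyStatus(status) {
  if (status === 429) return 'transient'; // 4xx, but waiting solves it
  if (status >= 500) return 'transient';  // usually the server's fault
  if (status >= 400) return 'permanent';  // usually the caller's fault
  return 'ok';
}

// Retry-After is either a number of seconds or an HTTP date.
// Returns the delay in milliseconds, or null if the header is
// missing or cannot be parsed.
function retryAfterMs(headerValue, now = Date.now()) {
  if (headerValue == null || headerValue === '') return null;
  const seconds = Number(headerValue);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(headerValue);
  return Number.isNaN(date) ? null : Math.max(0, date - now);
}
```

When the server tells you how long to wait, that number is a better guide than your own backoff schedule.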
3. Circuit Breakers and Retry Patterns
Retrying is good. Retrying forever against a dead service is a disaster. This is where circuit breakers come in — they are the missing piece between "retry on failure" and "bring down your entire system by hammering a dead dependency."
The Circuit Breaker Pattern
A circuit breaker works exactly like an electrical circuit breaker. It sits between your code and an external service, monitoring for failures. It has three states:
Closed (normal operation): Requests flow through normally. The breaker counts failures. Everything is fine.
Open (tripped): Too many failures crossed the threshold. The breaker "trips" and immediately rejects all requests without even trying to call the external service. This prevents your application from wasting time and resources on a service that is clearly down. Instead of waiting 30 seconds for each request to timeout, you fail instantly with a meaningful error.
Half-Open (testing recovery): After a cooldown period (say, 30 seconds), the breaker lets one request through as a test. If it succeeds, the breaker closes and normal operation resumes. If it fails, the breaker opens again and waits another cooldown period.
const CircuitBreaker = require('opossum');

async function callPaymentService(orderId, amount) {
  const response = await fetch('https://payments.example.com/charge', {
    method: 'POST',
    body: JSON.stringify({ orderId, amount }),
    headers: { 'Content-Type': 'application/json' },
  });
  if (!response.ok) throw new Error(`Payment failed: ${response.status}`);
  return response.json();
}

const breaker = new CircuitBreaker(callPaymentService, {
  timeout: 5000,                 // 5s timeout per request
  errorThresholdPercentage: 50,  // trip after 50% failure rate
  resetTimeout: 30000,           // try again after 30s
  volumeThreshold: 5,            // minimum 5 requests before tripping
});

breaker.on('open', () => logger.warn('Payment circuit OPEN'));
breaker.on('halfOpen', () => logger.info('Payment circuit HALF-OPEN'));
breaker.on('close', () => logger.info('Payment circuit CLOSED'));

breaker.fallback(() => ({
  status: 'queued',
  message: 'Payment service is temporarily unavailable. Your order is queued.'
}));

// Usage: breaker.fire(orderId, amount)
The critical detail is the fallback. When the circuit is open, you do not just throw an error. You provide a degraded but acceptable response. Maybe you queue the payment for later processing. Maybe you show cached data. Maybe you route to a backup service. The user's experience degrades gracefully instead of breaking completely.
Library options for circuit breakers: opossum for Node.js, pybreaker for Python, and resilience4j for Java. All three implement the same pattern with similar configuration options.
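To make the three states concrete, here is a hand-rolled sketch of the state machine on its own. It is deliberately simplified and is not how opossum or the other libraries are implemented: it trips on a count of consecutive failures rather than a failure percentage, and the class name SimpleBreaker and its options are made up for this example.

```javascript
// Minimal breaker for illustration only. Real libraries track failure
// rates over rolling windows and handle timeouts, events, and fallbacks.
class SimpleBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now; // injectable clock so the behavior can be tested
    this.state = 'closed';
    this.failures = 0;
    this.openedAt = 0;
  }

  // Should the next request be attempted at all?
  allowRequest() {
    if (this.state === 'closed') return true;
    if (this.state === 'open' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'half-open';
      return true; // cooldown elapsed: let exactly one trial request through
    }
    return false; // open, or half-open with the trial still pending
  }

  recordSuccess() {
    this.state = 'closed';
    this.failures = 0;
  }

  recordFailure() {
    this.failures += 1;
    // A failed trial request, or too many failures in a row, opens the circuit.
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }
}
```

A caller checks allowRequest() before each call, then reports the outcome with recordSuccess() or recordFailure(). When allowRequest() returns false, the caller serves the fallback immediately instead of waiting for a timeout.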
Retry with Exponential Backoff and Jitter
When you retry a failed request, you need to wait between attempts. But not a fixed wait — exponential backoff, meaning each retry waits longer than the last:
- Attempt 1: wait 1 second
- Attempt 2: wait 2 seconds
- Attempt 3: wait 4 seconds
- Attempt 4: wait 8 seconds
But there is a problem. If a service goes down and 10,000 clients all retry at the exact same exponential intervals, they all hit the service simultaneously at 1s, 2s, 4s, 8s — creating "retry storms" that keep the service down. The solution is jitter: add a random amount to each wait time so clients spread out their retries.
from tenacity import (
    retry, stop_after_attempt, wait_exponential_jitter,
    retry_if_exception_type
)
import httpx


class TransientError(Exception):
    pass


@retry(
    stop=stop_after_attempt(4),                     # max 4 attempts
    wait=wait_exponential_jitter(
        initial=1,                                  # start at 1s
        max=30,                                     # cap at 30s
        jitter=2                                    # add up to 2s of random delay
    ),
    retry=retry_if_exception_type(TransientError),  # only retry transient
)
async def fetch_user_profile(user_id: str):
    async with httpx.AsyncClient() as client:
        response = await client.get(f"https://api.example.com/users/{user_id}")
        if response.status_code == 429 or response.status_code >= 500:
            raise TransientError(f"HTTP {response.status_code}")
        response.raise_for_status()
        return response.json()
In Node.js, the p-retry library provides the same functionality:
// p-retry v4 and earlier (CommonJS). From v5 the package is ESM-only:
// import pRetry, { AbortError } from 'p-retry';
const pRetry = require('p-retry');

async function fetchUserProfile(userId) {
  return pRetry(async () => {
    const response = await fetch(`/api/users/${userId}`);
    const isPermanent = response.status >= 400 && response.status < 500
      && response.status !== 429;
    if (isPermanent) {
      // Permanent error (400, 401, 403, 404, ...): abort retries immediately
      throw new pRetry.AbortError(`Permanent failure: ${response.status}`);
    }
    if (!response.ok) {
      throw new Error(`HTTP ${response.status}`); // will be retried
    }
    return response.json();
  }, {
    retries: 3,
    minTimeout: 1000,  // 1s initial delay
    factor: 2,         // double each time
    randomize: true,   // add jitter
  });
}
Three rules for retry logic that you should memorize:
- Never retry permanent errors. A 400 Bad Request will fail every single time. Retrying it wastes resources and delays the error message to the user.
- Always cap your retries. Three to five retries is usually enough. If the service is not back after 4 attempts over ~15 seconds of exponential backoff, it is not coming back in the next 30 seconds either. Let the circuit breaker handle it.
- Always add jitter. Without jitter, synchronized retries from multiple clients create thundering herd problems that can keep a struggling service down.
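If you are not using a library, the wait time itself is a one-line formula. The sketch below follows the description above: double the delay on each attempt, add a random amount, and cap the result. The function name backoffDelay and the default values are assumptions for illustration. The random source is injectable only so the output can be checked deterministically.

```javascript
// Delay in milliseconds before retry number `attempt` (1-based).
function backoffDelay(attempt, {
  baseMs = 1000,       // first wait: 1 second
  capMs = 30000,       // never wait longer than this
  jitterMs = 1000,     // add up to this much random delay
  random = Math.random,
} = {}) {
  const exponential = baseMs * 2 ** (attempt - 1); // 1s, 2s, 4s, 8s, ...
  const jitter = random() * jitterMs;              // spreads clients apart
  return Math.min(capMs, Math.round(exponential + jitter));
}
```

With the jitter set to zero this reproduces the 1s, 2s, 4s, 8s schedule listed earlier. With jitter, two clients that failed at the same instant will almost never retry at the same instant.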
Circuit breakers and retries work together. Retries handle individual transient failures. Circuit breakers handle systemic failures where an entire service is down. The retry logic runs inside the circuit breaker. If retries keep failing, the circuit breaker trips and stops all attempts until the service recovers.
4. Structured Logging
You cannot fix errors you cannot see. And console.log("something went wrong") is not visibility. Structured logging means writing logs as machine-readable JSON objects with consistent fields, so they can be searched, filtered, aggregated, and alerted on.
The Problem with console.log
Here is what unstructured logging looks like in a real application:
console.log("user not found");
console.log("DB connection failed");
console.log(err);
console.log("retrying...");
console.log("payment failed for order " + orderId);
Now imagine you have 50 servers each processing 1,000 requests per second. You get a support ticket: "I tried to update my profile 10 minutes ago and it did not work." How do you find the relevant log line among millions of entries? You cannot. There is no user ID, no request ID, no timestamp format, no severity level, and no way to correlate this log entry to that user's specific request.
Structured Logs with Correlation IDs
Here is the same information as a structured log:
const crypto = require('node:crypto');
const pino = require('pino');

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  formatters: {
    level: (label) => ({ level: label }),
  },
  timestamp: pino.stdTimeFunctions.isoTime,
});

// Middleware to attach correlation ID to every request
app.use((req, res, next) => {
  req.correlationId = req.headers['x-correlation-id'] || crypto.randomUUID();
  req.logger = logger.child({
    correlationId: req.correlationId,
    userId: req.user?.id,
    method: req.method,
    path: req.path,
    ip: req.ip,
  });
  next();
});

// Now every log in the request context includes that info automatically
// (this call sits inside a route handler, where req and error are in scope)
req.logger.error({
  err: error,
  orderId: '12345',
  paymentProvider: 'stripe',
  amount: 49.99,
}, 'Payment processing failed');
This produces a JSON log entry like:
{
  "level": "error",
  "time": "2026-03-17T14:23:01.456Z",
  "correlationId": "a3f9x2-b7c1-4d8e-9f0a",
  "userId": "user_8842",
  "method": "POST",
  "path": "/api/payments",
  "ip": "192.168.1.42",
  "orderId": "12345",
  "paymentProvider": "stripe",
  "amount": 49.99,
  "err": {
    "message": "Card declined",
    "stack": "Error: Card declined\n    at processPayment..."
  },
  "msg": "Payment processing failed"
}
Now you can search: "show me all errors for user_8842 in the last hour" or "show me all payment failures with correlationId a3f9x2" or "show me all 5xx errors on the /api/payments endpoint." The correlation ID is especially powerful: it lets you trace a single request across multiple services. If your API calls a payment service which calls a fraud detection service, the same correlation ID flows through all three, and you can reconstruct the entire journey of a request.
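In production those searches run in a log platform, but the idea is simple enough to show in plain JavaScript. The sketch below uses three made-up log lines and a hypothetical searchLogs helper. The service field is an assumption added so the cross-service trace is visible.

```javascript
// Each line is one JSON log entry, as a JSON logger would emit it.
// These entries are invented for the example.
const lines = [
  '{"level":"info","correlationId":"c-1","userId":"user_8842","service":"api","msg":"Request received"}',
  '{"level":"error","correlationId":"c-1","userId":"user_8842","service":"payments","msg":"Payment processing failed"}',
  '{"level":"error","correlationId":"c-2","userId":"user_1001","service":"api","msg":"Profile update failed"}',
];

// Keep the entries whose fields match every key in `query`.
function searchLogs(rawLines, query) {
  return rawLines
    .map((line) => JSON.parse(line))
    .filter((entry) =>
      Object.entries(query).every(([key, value]) => entry[key] === value));
}

// "Show me all errors for user_8842"
const userErrors = searchLogs(lines, { userId: 'user_8842', level: 'error' });

// "Show me everything that happened to request c-1, across services"
const requestTrace = searchLogs(lines, { correlationId: 'c-1' });
```

None of this is possible with console.log("payment failed"), because there are no fields to match on.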
Python Structured Logging with structlog
from uuid import uuid4

import structlog
from fastapi import Request

structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.StackInfoRenderer(),
        structlog.processors.format_exc_info,
        structlog.processors.JSONRenderer(),
    ],
)
logger = structlog.get_logger()


# In your FastAPI middleware (`app` is your FastAPI() instance)
@app.middleware("http")
async def logging_middleware(request: Request, call_next):
    correlation_id = request.headers.get("x-correlation-id", str(uuid4()))
    structlog.contextvars.clear_contextvars()
    structlog.contextvars.bind_contextvars(
        correlation_id=correlation_id,
        method=request.method,
        path=request.url.path,
    )
    response = await call_next(request)
    return response


# Now every log call automatically includes correlation_id, method, path
logger.error("payment_failed", order_id="12345", amount=49.99)
Log Levels
Use log levels consistently across your team. Here is the standard hierarchy:
- ERROR: Something failed and needs human attention. A payment did not process, a database query failed, an external API returned an unexpected error.
- WARN: Something unexpected happened but the system handled it. A retry succeeded on the second attempt, a cache miss caused a slow response, a deprecated API endpoint was called.
- INFO: Normal business events. A user logged in, an order was placed, a payment was processed. These are your audit trail.
- DEBUG: Detailed information for troubleshooting. Variable values, function entry/exit, SQL queries. Only enabled in development or temporarily in production when investigating a specific issue.
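Under the hood, a log level is just a threshold comparison. Loggers such as pino do this for you through their level option. The sketch below only shows the idea: the numeric values are arbitrary and shouldLog is a hypothetical name.

```javascript
// Higher number = more severe. Names follow the list above;
// the numbers only need to be in the right order.
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };

// True when a message at `level` should be written, given the
// configured threshold (for example from a LOG_LEVEL variable).
function shouldLog(level, threshold = 'info') {
  return LEVELS[level] >= LEVELS[threshold];
}
```

With the threshold at info, DEBUG messages are dropped and everything else is kept. Lowering the threshold to debug during an investigation turns them back on without a code change.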
Never log sensitive data. No passwords, no API keys, no access tokens, no credit card numbers, no session IDs, no personally identifiable information (PII) beyond what is strictly necessary. This is not just good practice: privacy laws such as GDPR and CCPA restrict how personal data is stored, and the PCI-DSS standard requires card numbers to be unreadable anywhere they are stored, logs included. If your logs contain credit card numbers and you get audited, your company faces serious fines. Scrub sensitive fields before they reach the logger.
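Scrubbing can be as simple as a function that every payload passes through before it is logged. The sketch below is one possible approach, not a complete solution. The deny-list is a made-up starting point, and the function assumes plain JSON-like data with no circular references. Many loggers also offer this as configuration, for example pino's redact option.

```javascript
// Hypothetical deny-list; extend it to match the fields your app handles.
const SENSITIVE_KEYS = new Set([
  'password', 'token', 'authorization', 'cookie', 'cardnumber', 'apikey',
]);

// Return a deep copy of `value` with sensitive fields masked.
// The input object is left untouched.
function scrub(value) {
  if (Array.isArray(value)) return value.map(scrub);
  if (value === null || typeof value !== 'object') return value;
  const clean = {};
  for (const [key, inner] of Object.entries(value)) {
    clean[key] = SENSITIVE_KEYS.has(key.toLowerCase())
      ? '[REDACTED]'
      : scrub(inner);
  }
  return clean;
}
```

A deny-list only catches the field names you thought of. For payloads you do not control, logging an explicit allow-list of known-safe fields is the more cautious choice.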
5. Error Boundaries and Global Handlers
Individual try/catch blocks handle errors at the function level. But what about errors that slip through? You need defense at every layer: component-level in the UI, route-level in the API, and process-level in the runtime.
React Error Boundaries
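In React, if a component throws an error during rendering and nothing catches it, React unmounts the whole component tree and the user sees a blank white screen. Error boundaries are React components that catch JavaScript errors in their child component tree and render a fallback UI instead of crashing the whole page. They catch errors thrown during rendering, in lifecycle methods, and in constructors. Errors in event handlers and asynchronous code do not reach them and still need their own handling.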
import React from 'react';
import * as Sentry from '@sentry/react';

class ErrorBoundary extends React.Component {
  constructor(props) {
    super(props);
    this.state = { hasError: false, errorId: null };
  }

  static getDerivedStateFromError(error) {
    return { hasError: true };
  }

  componentDidCatch(error, errorInfo) {
    const errorId = crypto.randomUUID().slice(0, 8);
    this.setState({ errorId });
    // Send to error tracking service
    Sentry.captureException(error, {
      contexts: {
        react: { componentStack: errorInfo.componentStack }
      },
      tags: { errorId },
    });
  }

  render() {
    if (this.state.hasError) {
      // Use the caller's fallback when one is passed as a prop
      if (this.props.fallback) return this.props.fallback;
      return (
        <div className="error-fallback">
          <h2>Something went wrong</h2>
          <p>We have been notified and are looking into it.</p>
          <p>Error ID: {this.state.errorId}</p>
          <button onClick={() => this.setState({ hasError: false, errorId: null })}>
            Try again
          </button>
        </div>
      );
    }
    return this.props.children;
  }
}
The common mistake is wrapping your entire app in a single error boundary. That means any error anywhere kills the entire page. Instead, use per-feature error boundaries:
function Dashboard() {
  return (
    <div className="dashboard">
      <ErrorBoundary fallback={<p>Could not load analytics.</p>}>
        <AnalyticsPanel />
      </ErrorBoundary>
      <ErrorBoundary fallback={<p>Could not load recent orders.</p>}>
        <RecentOrders />
      </ErrorBoundary>
      <ErrorBoundary fallback={<p>Could not load notifications.</p>}>
        <NotificationFeed />
      </ErrorBoundary>
    </div>
  );
}
Now if the analytics API fails, the analytics panel shows "Could not load analytics" while the rest of the dashboard keeps working. This is graceful degradation — the same principle behind the circuit breaker fallback, applied at the UI level.
FastAPI Global Exception Handler
In FastAPI, unhandled exceptions return a generic 500 response. A global exception handler catches everything and formats consistent, useful error responses:
from uuid import uuid4

import structlog
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
logger = structlog.get_logger()


class AppError(Exception):
    def __init__(self, message, status_code=500, error_code=None, details=None):
        super().__init__(message)  # keeps str(exc) and tracebacks meaningful
        self.message = message
        self.status_code = status_code
        self.error_code = error_code or "INTERNAL_ERROR"
        self.details = details


@app.exception_handler(AppError)
async def app_error_handler(request: Request, exc: AppError):
    error_id = str(uuid4())[:8]
    logger.error(
        "app_error",
        error_id=error_id,
        error_code=exc.error_code,
        message=exc.message,
        path=str(request.url),
        details=exc.details,
    )
    return JSONResponse(
        status_code=exc.status_code,
        content={
            "error": exc.error_code,
            "message": exc.message,
            "errorId": error_id,
        },
    )


@app.exception_handler(Exception)
async def unhandled_error_handler(request: Request, exc: Exception):
    error_id = str(uuid4())[:8]
    logger.error(
        "unhandled_error",
        error_id=error_id,
        exc_info=exc,
        path=str(request.url),
    )
    return JSONResponse(
        status_code=500,
        content={
            "error": "INTERNAL_ERROR",
            "message": "An unexpected error occurred. Please try again.",
            "errorId": error_id,
        },
    )
Express Error Middleware
app.use((err, req, res, next) => {
  const errorId = crypto.randomUUID().slice(0, 8);
  req.logger.error({
    err,
    errorId,
    body: req.body,   // redact sensitive fields first (e.g. pino's `redact` option)
    query: req.query,
  }, 'Unhandled request error');

  const statusCode = err.statusCode || 500;
  const isOperational = err.isOperational || false;
  res.status(statusCode).json({
    error: err.code || 'INTERNAL_ERROR',
    // Operational errors carry a message written for the user;
    // anything else gets a generic one so internals never leak
    message: isOperational
      ? (err.userMessage || err.message)
      : 'An unexpected error occurred. Please try again.',
    errorId,
  });
});
Process-Level Handlers
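Even with error boundaries and global middleware, some errors escape. Unhandled promise rejections and uncaught exceptions can crash your Node.js process without any useful log entry, so you need last-resort handlers for both. Out-of-memory crashes are different: the runtime aborts before any handler can run, so those have to be caught by your process manager and external monitoring.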
process.on('unhandledRejection', (reason, promise) => {
  logger.fatal({
    reason: reason?.message || reason,
    stack: reason?.stack,
  }, 'Unhandled Promise Rejection — shutting down');
  // Give time for the log to flush, then exit
  setTimeout(() => process.exit(1), 1000);
});

process.on('uncaughtException', (error) => {
  logger.fatal({
    err: error,
  }, 'Uncaught Exception — shutting down');
  // MUST exit: the process is in an undefined state
  setTimeout(() => process.exit(1), 1000);
});
Note that on uncaughtException, you must exit the process. The Node.js documentation explicitly warns against continuing after an uncaught exception because the process state is unreliable. Log the error, let your process manager (PM2, Docker, Kubernetes) restart the process, and investigate the root cause from the logs.
6. Real-World Implementation
Now let us put everything together into a complete error handling architecture. This section covers custom error classes, Sentry integration, user-facing error pages, and how all the pieces connect.
Custom Error Classes
The foundation of your error handling architecture is a set of custom error classes that carry structured information. Instead of throw new Error("something went wrong"), you throw errors that include a status code, an error code, a user-friendly message, and machine-readable context.
class AppError extends Error {
  constructor(message, {
    statusCode = 500,
    code = 'INTERNAL_ERROR',
    userMessage = 'Something went wrong. Please try again.',
    context = {},
    isOperational = true,
  } = {}) {
    super(message);
    this.name = this.constructor.name;
    this.statusCode = statusCode;
    this.code = code;
    this.userMessage = userMessage;
    this.context = context;
    this.isOperational = isOperational; // true = expected, false = bug
    Error.captureStackTrace(this, this.constructor);
  }
}

class NotFoundError extends AppError {
  constructor(resource, id) {
    super(`${resource} not found: ${id}`, {
      statusCode: 404,
      code: 'NOT_FOUND',
      userMessage: `The requested ${resource.toLowerCase()} could not be found.`,
      context: { resource, id },
    });
  }
}

class ValidationError extends AppError {
  constructor(field, reason) {
    super(`Validation failed: ${field} — ${reason}`, {
      statusCode: 400,
      code: 'VALIDATION_ERROR',
      userMessage: `Invalid ${field}: ${reason}`,
      context: { field, reason },
    });
  }
}

class RateLimitError extends AppError {
  constructor(retryAfter = 60) {
    super(`Rate limit exceeded. Retry after ${retryAfter}s`, {
      statusCode: 429,
      code: 'RATE_LIMITED',
      userMessage: 'Too many requests. Please wait a moment and try again.',
      context: { retryAfter },
    });
  }
}

class ExternalServiceError extends AppError {
  constructor(service, originalError) {
    super(`External service failure: ${service}`, {
      statusCode: 502,
      code: 'EXTERNAL_SERVICE_ERROR',
      userMessage: 'A third-party service is temporarily unavailable. Please try again shortly.',
      context: { service, originalMessage: originalError.message },
    });
  }
}
The key distinction is the isOperational flag. Operational errors are expected — a user submitting invalid input, a resource not being found, an external API timing out. These are normal events that your code should handle gracefully. Non-operational errors are bugs — a null pointer, an undefined variable, a logic error. These indicate a defect in your code that needs to be fixed. Your error handlers should treat these differently: operational errors get a clean user message, non-operational errors get escalated and alerted.
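Here is one way a handler might act on that flag. The sketch repeats a cut-down AppError so it runs on its own. The toResponse function and the alert field are hypothetical names, and how you actually escalate (paging, Slack, Sentry) is up to you.

```javascript
// Reduced version of the AppError class above, repeated so the
// sketch is self-contained.
class AppError extends Error {
  constructor(message, {
    statusCode = 500,
    code = 'INTERNAL_ERROR',
    userMessage = 'Something went wrong. Please try again.',
    isOperational = true,
  } = {}) {
    super(message);
    this.statusCode = statusCode;
    this.code = code;
    this.userMessage = userMessage;
    this.isOperational = isOperational;
  }
}

// Decide what the client sees and whether someone should be alerted.
function toResponse(err) {
  if (err instanceof AppError && err.isOperational) {
    return {
      status: err.statusCode,
      body: { error: err.code, message: err.userMessage },
      alert: false, // expected failure: log it, do not page anyone
    };
  }
  return {
    status: 500,
    body: {
      error: 'INTERNAL_ERROR',
      message: 'An unexpected error occurred. Please try again.',
    },
    alert: true, // likely a bug: hide the details and escalate
  };
}
```

A plain TypeError never reaches the user as text. It becomes a generic 500 and an alert, while a NotFoundError becomes a clean 404 with a message written for humans.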
Sentry Integration
Sentry is an error tracking platform that captures errors, groups them, shows you trends, and alerts you when new issues appear. The free tier gives you 5,000 errors per month, which is more than enough for student projects and small production apps.
const Sentry = require('@sentry/node');

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  release: process.env.APP_VERSION,
  // Performance monitoring: capture 20% of transactions
  tracesSampleRate: 0.2,
  // Filter and scrub events before they are sent
  beforeSend(event, hint) {
    const error = hint.originalException;
    // Don't send 404s to Sentry: they are noise
    if (error?.statusCode === 404) return null;
    // Scrub sensitive data
    if (event.request?.headers) {
      delete event.request.headers['authorization'];
      delete event.request.headers['cookie'];
    }
    return event;
  },
});

// Express integration. Sentry.Handlers is the @sentry/node v7 API;
// v8 replaced it with Sentry.setupExpressErrorHandler(app).
app.use(Sentry.Handlers.requestHandler());
app.use(Sentry.Handlers.tracingHandler());

// Your routes go here...

// Sentry error handler MUST be before your own error handler
app.use(Sentry.Handlers.errorHandler());

// Your custom error handler
app.use((err, req, res, next) => {
  // ... your error handling logic
});
Sentry provides several features that are invaluable in production:
- Source maps: Upload your source maps so Sentry shows the original source code in stack traces, not minified JavaScript. This is the difference between seeing a.b.c is not a function (bundle.js:1:34982) and seeing user.profile.getName is not a function (UserCard.jsx:42).
- Breadcrumbs: Sentry automatically records a trail of events (user clicks, API calls, console logs, navigation) leading up to each error. Instead of just seeing "TypeError: Cannot read property," you see "user clicked checkout button, API call to /api/cart succeeded, API call to /api/payment failed with 500, then TypeError occurred."
- Performance monitoring: Track slow transactions, identify bottlenecks, and see which API endpoints are taking too long. This is related to error handling because slow endpoints often become failing endpoints under load.
- Alerts: Configure Sentry to notify you via email, Slack, or PagerDuty when new error types appear or when error rates spike. You find out about problems before your users report them.
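import os

import sentry_sdk
from sentry_sdk.integrations.fastapi import FastApiIntegration
from sentry_sdk.integrations.sqlalchemy import SqlalchemyIntegration


def scrub_sensitive_data(event, hint):
    # Illustrative implementation: drop credentials from request headers
    headers = event.get("request", {}).get("headers") or {}
    for name in list(headers):
        if name.lower() in ("authorization", "cookie"):
            del headers[name]
    return event


sentry_sdk.init(
    dsn=os.environ["SENTRY_DSN"],
    environment=os.environ.get("APP_ENV", "development"),
    release=os.environ.get("APP_VERSION"),
    traces_sample_rate=0.2,
    integrations=[
        FastApiIntegration(),
        SqlalchemyIntegration(),
    ],
    before_send=scrub_sensitive_data,
)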
User-Facing Error Pages
The final piece is what the user actually sees. Here is the principle: the user gets a friendly message with an error ID. The developer gets the full context through Sentry and structured logs, linked by that same error ID.
function ErrorMessage({ errorId, onRetry }) {
  return (
    <div className="error-page">
      <h2>Something went wrong</h2>
      <p>
        We encountered an unexpected error. Our team has been
        notified and is looking into it.
      </p>
      {errorId && (
        <p className="error-id">
          If you contact support, reference error ID: <code>{errorId}</code>
        </p>
      )}
      {onRetry && (
        <button onClick={onRetry}>Try again</button>
      )}
    </div>
  );
}
The error ID is the bridge between the user's world and the developer's world. A user emails support: "I got error ERR-a3f9x2." The developer searches Sentry for that error ID and immediately sees the full stack trace, the user's session, the request payload, and the breadcrumbs leading up to the error. No back-and-forth asking "what were you doing when it happened?"
The Complete Architecture
Here is how all the pieces fit together in a production application:
- Layer 1 — Custom error classes carry status codes, error codes, user messages, and context. Every throw in your codebase uses these.
- Layer 2 — Circuit breakers protect external service calls. When a dependency fails, the breaker trips and serves a fallback instead of cascading the failure.
- Layer 3 — Retry logic with exponential backoff handles transient failures inside the circuit breaker. Only transient errors are retried. Permanent errors fail fast.
- Layer 4 — Structured logging with correlation IDs records every error with full context. JSON format makes logs searchable and filterable.
- Layer 5 — Error boundaries (React) catch rendering errors at the component level and show fallback UI. Per-feature boundaries prevent one broken section from taking down the whole page.
- Layer 6 — Global exception handlers (FastAPI/Express middleware) catch anything that slips through and return consistent, user-friendly error responses with error IDs.
- Layer 7 — Process-level handlers catch unhandled rejections and uncaught exceptions, log them, and trigger a clean shutdown.
- Layer 8 — Sentry aggregates all errors, groups them, tracks trends, provides breadcrumbs, and alerts you when new issues appear or error rates spike.
The outcome: When something goes wrong, the user sees a friendly message with an error ID and a retry button. The developer gets an alert in Slack with a Sentry link. Clicking it shows the full stack trace, the user's session, the request context, and the breadcrumb trail of every event leading up to the error. The structured logs provide the correlation ID to trace the request across every service it touched. The circuit breaker prevented the failure from cascading to other parts of the system. The retry logic already attempted recovery before escalating. This is production-grade resilience. It is not one technique — it is all of them working together.
You do not need to implement all eight layers on day one. Start with custom error classes and structured logging. Add Sentry next — the free tier takes 10 minutes to set up. Then add error boundaries in your React components. Circuit breakers and retry logic come in when you start depending on external services. Build the layers incrementally as your application grows. But understand the full picture now, so when you add each layer, you know where it fits and why it matters.