
Circuit Breaker Pattern in Microservices: Prevent Cascading Failures
Situation
Microservice systems fail in layers, not in isolation. One dependency slows down, then upstream services hold connections longer, worker pools fill, queue depth rises, and failure spreads outward.
At this point, teams often add more retries and longer timeouts. That can buy a little time, but it usually increases pressure on the failing dependency and accelerates the collapse.
The circuit breaker pattern exists to stop that feedback loop. Its purpose is simple: detect unhealthy dependency behavior quickly and stop sending it traffic until recovery is likely.
In other words, a circuit breaker is not a performance feature. It is a failure containment mechanism.
What the Circuit Breaker Pattern Does
A circuit breaker sits between a caller service and a dependency. It tracks recent outcomes and moves through three states:
- Closed: Requests flow normally while the error rate and latency remain acceptable.
- Open: Calls are rejected immediately because recent behavior indicates the dependency is unhealthy.
- Half-open: A limited number of trial requests are allowed to test whether the dependency recovered.
This state model changes how failure propagates:
- Without a breaker, every request can block on a degraded dependency.
- With a breaker, the system fails fast and preserves upstream capacity.
That distinction is what prevents cascading failures under load.
Why This Matters in Microservices
Microservices increase the number of network boundaries in a request path. Every boundary adds new failure modes:
- transient network drops
- saturation in downstream worker pools
- latency spikes from GC, lock contention, or noisy neighbors
- partial outages where only one shard or AZ is degraded
When several services depend on the same downstream system, local slowdowns become global pressure. If all callers continue sending traffic during degradation, recovery becomes harder over time.
Circuit breakers enforce backpressure at the caller boundary. They reduce damage by refusing work the dependency is unlikely to process successfully.
Implementation Example
The core behavior is straightforward:
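type State = 'closed' | 'open' | 'half_open';
class CircuitBreaker {
  private state: State = 'closed';
  private failures = 0; // consecutive failures while closed
  private successes = 0; // successful probes in the current half-open period
  private probes = 0; // probes admitted in the current half-open period
  private openedAt = 0;
  constructor(
    private readonly failureThreshold = 5,
    private readonly resetTimeoutMs = 15_000,
    private readonly halfOpenMaxCalls = 2
  ) {}
  async execute<T>(operation: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      // Fail fast until the reset timeout has elapsed.
      if (Date.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('Circuit open');
      }
      // Timeout elapsed: start a fresh half-open probe period.
      this.state = 'half_open';
      this.successes = 0;
      this.probes = 0;
    }
    const probing = this.state === 'half_open';
    if (probing) {
      // Admit only a limited number of trial calls; reject the rest.
      if (this.probes >= this.halfOpenMaxCalls) {
        throw new Error('Circuit open');
      }
      this.probes++;
    }
    try {
      const result = await operation();
      if (probing && this.state === 'half_open') {
        // Close as soon as every permitted probe has succeeded.
        this.successes++;
        if (this.successes >= this.halfOpenMaxCalls) {
          this.state = 'closed';
          this.failures = 0;
        }
      } else if (!probing) {
        this.failures = 0;
      }
      return result;
    } catch (err) {
      if (probing && this.state === 'half_open') {
        // A failed probe reopens the circuit immediately.
        this.trip();
      } else if (!probing && this.state === 'closed') {
        this.failures++;
        if (this.failures >= this.failureThreshold) {
          this.trip();
        }
      }
      throw err;
    }
  }
  private trip(): void {
    this.state = 'open';
    this.openedAt = Date.now();
  }
}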
This code shows the state machine, but production systems need stronger guardrails than fixed counters alone.
Common Misconfigurations That Make Breakers Useless
1) Triggering only on hard errors
If your breaker ignores latency and only counts explicit failures, it opens too late. Many outages begin with high latency first, then hard errors later.
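One way to make latency visible to a breaker is to classify every finished call by both outcome and duration, so that a slow success still counts against the dependency. The sketch below is illustrative only: `classifyCall`, `timed`, the `CallOutcome` type, and the 2-second threshold are assumptions, not part of any particular library.

```typescript
// Hypothetical helper: slow successes are reported separately from fast ones,
// so a breaker can react to latency before hard errors appear.
type CallOutcome = 'success' | 'slow' | 'failure';

function classifyCall(
  durationMs: number,
  failed: boolean,
  slowCallThresholdMs = 2_000 // assumed threshold; tune per dependency
): CallOutcome {
  if (failed) return 'failure';
  return durationMs >= slowCallThresholdMs ? 'slow' : 'success';
}

// Wraps an operation, measures it, and reports the outcome to a recorder.
async function timed<T>(
  operation: () => Promise<T>,
  record: (outcome: CallOutcome) => void,
  now: () => number = Date.now
): Promise<T> {
  const start = now();
  try {
    const result = await operation();
    record(classifyCall(now() - start, false));
    return result;
  } catch (err) {
    record(classifyCall(now() - start, true));
    throw err;
  }
}
```

A breaker can then treat the ratio of `slow` outcomes as a trip condition alongside the failure ratio.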
2) Using global thresholds without traffic context
Five failures might be catastrophic for low-traffic paths and irrelevant for high-traffic ones. Use rolling windows and failure rates, not only absolute counts.
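A rolling window with a minimum call count is one common way to express this. The following is a minimal sketch under stated assumptions: the class name `RollingFailureRate`, the 10-second window, and the 20-call minimum are all hypothetical defaults, and the clock is injected only to keep the example deterministic.

```typescript
// Hypothetical time-based rolling window over recent call outcomes.
class RollingFailureRate {
  private samples: { at: number; failed: boolean }[] = [];

  constructor(
    private readonly windowMs = 10_000,
    private readonly minimumCalls = 20,
    private readonly now: () => number = Date.now
  ) {}

  record(failed: boolean): void {
    this.samples.push({ at: this.now(), failed });
    this.evict();
  }

  // Returns null until the window holds enough calls for the rate to mean anything.
  failureRate(): number | null {
    this.evict();
    if (this.samples.length < this.minimumCalls) return null;
    const failures = this.samples.filter((s) => s.failed).length;
    return failures / this.samples.length;
  }

  // Drops samples that are older than the window.
  private evict(): void {
    const cutoff = this.now() - this.windowMs;
    this.samples = this.samples.filter((s) => s.at >= cutoff);
  }
}
```

With this shape, five failures out of five calls on a quiet path produce no signal until the minimum is met, while five failures out of thousands of calls barely move the rate.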
3) Allowing too much half-open traffic
Half-open is a probe, not a full restore. If you release too many requests immediately, you can knock over a recovering dependency.
4) No fallback behavior
Opening the breaker without a fallback turns graceful degradation into immediate user-visible failure. Fallback can be cached data, default responses, queued work, or partial functionality.
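A fallback can be as small as a wrapper around the protected call. This sketch assumes a hypothetical `withFallback` helper; what the fallback returns (cached data, a default, a queued acknowledgement) depends entirely on the use case.

```typescript
// Hypothetical wrapper: serves a fallback when the protected call fails,
// including when a breaker rejects it because the circuit is open.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: (err: unknown) => T | Promise<T>
): Promise<T> {
  try {
    return await primary();
  } catch (err) {
    // The error is passed along so the fallback can log it or choose a response.
    return fallback(err);
  }
}
```

In practice `primary` would be the breaker-wrapped dependency call, and `fallback` would read from a cache or return a safe default.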
5) Layering retries on top of an open circuit
If clients retry aggressively after receiving a 'Circuit open' rejection, you reintroduce load amplification through another path. Retries and breakers must be tuned together.
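One way to keep the two aligned is a retry loop that never retries a breaker rejection and spaces out the remaining retries with jittered backoff. The sketch below assumes the breaker rejects with the `Error('Circuit open')` used in the implementation example; matching on the message is fragile, and a dedicated error type would be more robust. `isCircuitOpen`, `backoffDelayMs`, and `retryUnlessOpen` are hypothetical names.

```typescript
// Assumption: the breaker signals rejection with Error('Circuit open').
function isCircuitOpen(err: unknown): boolean {
  return err instanceof Error && err.message === 'Circuit open';
}

// Full-jitter backoff: a random delay in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs = 100,
  capMs = 5_000,
  random: () => number = Math.random
): number {
  return Math.floor(random() * Math.min(capMs, baseMs * 2 ** attempt));
}

async function retryUnlessOpen<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Retrying a rejected call only adds load; surface it immediately.
      if (isCircuitOpen(err)) throw err;
      lastError = err;
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```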
Practical Production Design
For most systems, the circuit breaker should be paired with:
- timeout budgets per dependency
- bounded concurrency or bulkheads
- retry limits with jittered backoff
- idempotency where retry exists
- explicit fallback behavior
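Bounded concurrency is the least familiar item on this list for many teams, so here is a minimal sketch of a bulkhead that rejects work instead of queueing it once a limit is reached. The `Bulkhead` class, its default limit of 10, and the 'Bulkhead full' message are illustrative assumptions.

```typescript
// Hypothetical bulkhead: caps in-flight calls to one dependency so a slow
// dependency cannot consume every worker in the caller.
class Bulkhead {
  private inFlight = 0;

  constructor(private readonly maxConcurrent = 10) {}

  async run<T>(operation: () => Promise<T>): Promise<T> {
    if (this.inFlight >= this.maxConcurrent) {
      // Reject rather than queue, so callers fail fast under saturation.
      throw new Error('Bulkhead full');
    }
    this.inFlight++;
    try {
      return await operation();
    } finally {
      this.inFlight--;
    }
  }
}
```

Rejecting rather than queueing is a design choice here: a queue would hide saturation and add latency, which is the behavior the breaker is trying to prevent.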
A good baseline policy:
- Timeout quickly for dependency calls.
- Record both failures and slow-call ratio.
- Open the breaker when failure or slow-call thresholds are exceeded in a rolling window.
- Return fallback immediately while the circuit is open.
- Probe with a small half-open budget.
- Close only after consistent successful probes.
This prevents the dependency from becoming a shared bottleneck that drags healthy services down with it.
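The open decision in this policy can be sketched as a pure function over window statistics. Everything named here (`BreakerPolicy`, `WindowStats`, `shouldOpen`, and the field names) is hypothetical, and suitable threshold values depend on the dependency.

```typescript
// Hypothetical policy shape; field names are illustrative.
interface BreakerPolicy {
  failureRateThreshold: number; // e.g. 0.5 opens at 50% failures
  slowCallRateThreshold: number; // e.g. 0.5 opens at 50% slow calls
  minimumCalls: number; // ignore windows with too little traffic
}

// Counts observed in the current rolling window.
interface WindowStats {
  total: number;
  failed: number;
  slow: number;
}

function shouldOpen(stats: WindowStats, policy: BreakerPolicy): boolean {
  if (stats.total < policy.minimumCalls) return false;
  const failureRate = stats.failed / stats.total;
  const slowRate = stats.slow / stats.total;
  return (
    failureRate >= policy.failureRateThreshold ||
    slowRate >= policy.slowCallRateThreshold
  );
}
```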
Metrics to Watch
If you cannot observe breaker behavior, you cannot trust it during incidents. At minimum, track:
- circuit state transitions (closed -> open -> half_open -> closed)
- open duration
- rejection rate due to open circuit
- slow-call ratio
- fallback usage rate
- upstream queue depth and saturation
The goal is not zero breaker opens. A breaker that never opens may simply be misconfigured. The goal is controlled, short-lived degradation instead of system-wide failure.
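As a rough illustration of the breaker-side metrics, the sketch below counts transitions, rejections, and fallbacks, and accumulates time spent open. `BreakerMetrics` is a hypothetical in-memory recorder; a real system would export these values to its metrics backend, and the breaker would need hooks that call it.

```typescript
type BreakerState = 'closed' | 'open' | 'half_open';

// Hypothetical in-memory recorder for breaker behavior.
class BreakerMetrics {
  readonly transitions = new Map<string, number>();
  rejections = 0;
  fallbacks = 0;
  totalOpenMs = 0;
  private openedAt: number | null = null;

  constructor(private readonly now: () => number = Date.now) {}

  onTransition(from: BreakerState, to: BreakerState): void {
    const key = `${from}->${to}`;
    this.transitions.set(key, (this.transitions.get(key) ?? 0) + 1);
    if (to === 'open') {
      this.openedAt = this.now();
    } else if (from === 'open' && this.openedAt !== null) {
      // Leaving the open state: add the elapsed time to the open duration.
      this.totalOpenMs += this.now() - this.openedAt;
      this.openedAt = null;
    }
  }

  onRejected(): void {
    this.rejections++;
  }

  onFallback(): void {
    this.fallbacks++;
  }
}
```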
When Circuit Breakers Are Not Enough
Circuit breakers reduce blast radius, but they do not increase dependency capacity. They are a containment layer, not a cure.
If a dependency is chronically overloaded, you still need:
- capacity planning
- workload shaping
- query and call-path optimization
- architecture changes that reduce synchronous fan-out
Treat the breaker as part of your reliability envelope, not as a substitute for system design.
Closing Reflection
In microservices, resilience depends on how quickly services stop doing harmful work under stress. The circuit breaker pattern does exactly that.
It interrupts failure amplification, preserves caller resources, and creates recovery space for degraded dependencies. Configured well, it turns catastrophic cascades into bounded incidents.
That is the real value of circuit breakers in production systems.