
When Caching Makes Systems Less Predictable
Situation
Caching is commonly introduced into systems that already work correctly.
The motivation is usually straightforward: reduce latency, lower load on a shared dependency, or smooth out traffic spikes. The cache sits in front of an existing data source, and the rest of the system remains unchanged. Reads become faster, and the underlying store sees fewer requests.
From the outside, very little appears to change. The same requests are made. The same data is returned. The system behaves as it did before, just faster.
This kind of change is rarely framed as a redesign. It is an optimization applied to a known-good system, often incrementally and without much ceremony.
The Reasonable Assumption
Given that framing, the expectations are predictable.
A cached response is assumed to be equivalent to an uncached one. If the underlying data changes, the cache is expected to reflect that change within some acceptable window. Bugs, when they occur, should be reproducible and traceable to a specific input or state.
Most importantly, caching is expected to affect performance characteristics, not system behavior. Turning the cache off should make the system slower, but not different.
These assumptions are not naive. They are consistent with how caching is typically described and how it behaves in simple cases.
What Actually Happened
Over time, the system began to exhibit behavior that was difficult to explain.
Identical requests sometimes returned different results. Recently updated data appeared to revert to older values. Issues would surface in production but disappear when investigated locally or under debugging. Restarting a service or redeploying code would “fix” problems that could not be reproduced on demand.
The most unsettling aspect was not that the system was wrong, but that it was unpredictable. Engineers could no longer reason about the current state of the system based solely on inputs and code. The same execution path could legitimately produce different outcomes depending on timing, deployment order, or which instance handled the request.
Removing the cache did restore predictability — but only temporarily. By that point, the system had grown to rely on behavior that the cache enabled.
An Illustrative Example
At a glance, the caching logic looked unremarkable.
const key = `user:${userId}`
let user = cache.get(key)
if (!user) {
  user = loadUserFromStore(userId)
  cache.set(key, user, { ttl: 300 })
}
return user
The code is correct in isolation. The cache key appears stable. The time-to-live is explicit. The cached value mirrors what would have been returned directly from the store.
The hidden complexity did not live in this function. It lived in what the key did not express: which version of the user data was expected, which related records were implicitly assumed to be stable, and which writes elsewhere in the system were supposed to invalidate this entry.
Why It Happened
Cache Keys Encode Assumptions
Cache keys tend to start simple. They often mirror function arguments or identifiers that feel fundamental to the data being retrieved.
Over time, the meaning of that data changes. Additional fields are added. Related entities begin to influence the response. Previously irrelevant context becomes significant.
The cache key, however, often remains unchanged. It continues to represent an earlier, simpler understanding of the data. Correctness now depends on assumptions that exist only in engineers’ heads, not in the key itself.
When those assumptions drift, the cache does not fail loudly. It continues to return values that are internally consistent but externally outdated or incomplete.
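One partial remedy is to move the drifting assumptions out of engineers' heads and into the key itself. A minimal sketch, assuming a hypothetical SCHEMA_VERSION constant and an orgId that has come to influence the response (both names are illustrative, not from the original system):

```javascript
// Hypothetical: the response now depends on a schema version and on the
// user's organization, so the key states both explicitly.
const SCHEMA_VERSION = 3

function userCacheKey(userId, orgId) {
  // Bumping SCHEMA_VERSION on deploy retires every entry written under
  // the older understanding of "user data", without explicit invalidation.
  return `user:v${SCHEMA_VERSION}:org:${orgId}:${userId}`
}
```

The key becomes longer and must change in lockstep with the code that interprets it, but at least the assumptions that determine correctness are now visible in the key rather than implicit.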
Time Becomes a Hidden Dimension
Without caching, time plays almost no role in correctness: a read simply reflects the current state of the underlying store at the moment it occurs.
Caching introduces time as a first-class factor in correctness. Whether a response is valid now depends on when it was computed, how long it was retained, and what changes occurred in the interim.
Two identical requests, issued seconds apart, may return different results without any code changes or errors. This is expected behavior from the cache’s perspective, but it violates many engineers’ mental models of how the system works.
As a result, bugs become sensitive to timing. They appear and disappear based on request order, background activity, or deployment cadence.
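The timing sensitivity can be reproduced deterministically by giving a toy cache an injectable clock. This is a sketch, not the production cache from the example above; all names are illustrative:

```javascript
// Toy TTL cache with an injectable clock, so the timing effect is
// reproducible rather than dependent on wall time.
function makeCache(clock) {
  const entries = new Map()
  return {
    get(key) {
      const entry = entries.get(key)
      if (!entry || clock.now() >= entry.expiresAt) return undefined
      return entry.value
    },
    set(key, value, ttlSeconds) {
      entries.set(key, { value, expiresAt: clock.now() + ttlSeconds * 1000 })
    },
  }
}

let t = 0
const clock = { now: () => t }
const cache = makeCache(clock)

cache.set('user:42', { name: 'before-update' }, 300)

const first = cache.get('user:42')   // the cached value

// The backing store is updated here, but nothing touches the cache entry.
t += 100 * 1000                      // 100 seconds pass
const second = cache.get('user:42')  // still the pre-update value

t += 250 * 1000                      // now past the 300-second TTL
const third = cache.get('user:42')   // undefined: the next read would refill
```

Same code, same key, three different answers, with nothing varying but the clock. That is exactly the behavior a TTL promises, and exactly what an atemporal mental model does not anticipate.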
Invalidation Is a Distributed Problem, Even Locally
Cache invalidation is often described as a hard problem in distributed systems, but the same dynamics appear within a single service.
Writes may occur in multiple places. Some code paths remember to invalidate cached data, others do not. Partial invalidation leaves the system in states that are technically consistent but semantically incorrect.
Deploys and restarts add another layer of complexity. Some caches are cleared, others persist. The system briefly returns to a predictable state, only to drift again as traffic and writes resume.
None of this requires large scale or high concurrency. These effects emerge naturally as systems evolve and responsibilities spread across code paths.
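The partial-invalidation failure mode fits in a few lines, with illustrative names throughout: two code paths write the same record, and only one of them knows the cache exists.

```javascript
// In-memory stand-ins for the cache and the backing store.
const cache = new Map()
const store = new Map([[42, { id: 42, email: 'old@example.com' }]])

function readUser(id) {
  const key = `user:${id}`
  if (!cache.has(key)) cache.set(key, store.get(id))
  return cache.get(key)
}

// Path A: the original update path, which invalidates correctly.
function updateEmail(id, email) {
  store.set(id, { ...store.get(id), email })
  cache.delete(`user:${id}`)
}

// Path B: added later (a bulk import, an admin tool) by someone who
// never learned the cache existed. The store is correct; reads are not.
function importEmail(id, email) {
  store.set(id, { ...store.get(id), email })
  // no invalidation
}
```

After a read has populated the cache, a write through importEmail leaves the store correct and every subsequent read stale until the entry happens to expire. No concurrency is involved; the bug is purely that one write path and the cache disagree about who owns freshness.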
Alternatives That Didn’t Fully Work
Several reasonable mitigations were attempted.
Reducing TTLs narrowed the window of inconsistency but increased load and did not eliminate timing-related bugs. Manual invalidation improved some cases while introducing new failure modes when invalidation was missed or triggered too broadly.
Bypassing the cache in “critical” paths helped with correctness in specific scenarios, but created multiple behavioral modes for the same data depending on where it was accessed.
Additional logging and metrics made the cache more observable, but did not make its effects easier to reason about. The system was now transparent and unpredictable rather than opaque and unpredictable.
Each of these approaches addressed symptoms without removing the underlying source of complexity.
Practical Takeaways
Certain patterns tend to appear when caching begins to erode predictability.
Systems where correctness depends on cache freshness rather than data integrity. Cache keys that implicitly mirror business rules. Bugs that disappear when caches are cleared or services are restarted.
These signals do not imply that caching was a mistake. They indicate that caching has become part of the system’s behavior, not just its performance profile.
Closing Reflection
Caching is often introduced as a way to simplify systems under load. In practice, it frequently shifts complexity rather than removing it.
The cost is not paid upfront. It is paid later, during change, when assumptions encoded in cache keys, TTLs, and invalidation logic no longer match the system’s reality.
At that point, performance gains coexist with a quieter loss: the ability to confidently predict how the system will behave next.