Why Caching Causes Inconsistent Data in Production

Caching causes inconsistent data in production when the cached value no longer represents the same truth as the database, API, or service that originally produced it. The cache still makes reads faster, but the system now has two places that can answer the same question, and those answers can temporarily disagree.

That disagreement is not always a bug. Product recommendations, public catalog pages, and expensive aggregate dashboards may tolerate bounded staleness. The problem starts when the application treats cached data as if it were always current while writes, invalidation, deployments, background jobs, or multiple service instances move the real state forward.

For the broader data-correctness cluster, see the SQL And Data Correctness hub. For reliability patterns where caches are used as fallback or load reduction, see the Backend Reliability hub.

The Cache Did Not Break The Code

Caching is often introduced after a system already works.

An endpoint reads account limits from the database. It is correct, but slow under repeated traffic:

async function getAccountLimits(accountId: string) {
  return db.accountLimits.findUnique({
    where: { accountId },
    select: {
      accountId: true,
      plan: true,
      monthlyApiLimit: true,
      apiCallsUsed: true,
      updatedAt: true,
    },
  })
}

The cache-aside version looks like a harmless optimization:

async function getAccountLimits(accountId: string) {
  const key = `account-limits:${accountId}`

  // 1. Try the cache first.
  const cached = await redis.get(key)

  if (cached) {
    return JSON.parse(cached)
  }

  // 2. On a miss, load from the source of truth.
  const limits = await db.accountLimits.findUnique({
    where: { accountId },
    select: {
      accountId: true,
      plan: true,
      monthlyApiLimit: true,
      apiCallsUsed: true,
      updatedAt: true,
    },
  })

  // 3. Store the value for 300 seconds. Skip missing rows so that
  //    "not found" is not cached as the string "null".
  if (limits) {
    await redis.set(key, JSON.stringify(limits), { EX: 300 })
  }

  return limits
}

The first request after a miss reads from the database and stores the value for five minutes. Later requests return from Redis. Latency drops. Database reads drop. The endpoint appears unchanged.

Microsoft's Cache-Aside pattern describes this common shape: the application checks the cache, loads from the data store on a miss, then adds the item to the cache. It also warns that cached data cannot always remain consistent with the data store and that stale data must be handled deliberately. See Cache-Aside pattern.

The code did not become obviously wrong. The system gained a second timeline.

A Stale-Read Timeline

Now add a normal product action: an account upgrades from basic to pro.

The write path updates the database and deletes the cache key:

async function upgradeAccountPlan(accountId: string) {
  await db.accountLimits.update({
    where: { accountId },
    data: {
      plan: 'pro',
      monthlyApiLimit: 1_000_000,
      updatedAt: new Date(),
    },
  })

  await redis.del(`account-limits:${accountId}`)
}

That order is usually the right default for cache-aside: commit the source of truth, then invalidate the cache. Microsoft calls out the order because deleting the cache before updating the data store creates a window where another request can repopulate the cache with old data.

Even with the better order, the system can still produce surprising behavior:

Time	Event	Result
10:00:00	Request A reads `basic` limits and caches them for 300 sec	Cache now contains `basic`
10:00:20	Account upgrade starts	Database update in progress
10:00:21	Request B reads from cache while write is not finished	User still sees `basic`
10:00:22	Database commits `pro`	Source of truth says `pro`
10:00:23	Cache delete fails or times out	Cache still says `basic`
10:01:00	API quota check reads from cache	Request may be rejected using old limits
10:05:00	TTL expires	Next cache miss finally loads `pro`

Nothing in that timeline requires exotic distributed systems. A cache delete can time out. A worker can crash after committing the database update. A write path can forget invalidation. A local in-memory cache on one instance can keep serving the old value while another instance has already refreshed.

The user experiences it as inconsistent data: the billing page says the account is upgraded, while the quota check still behaves as if it is not.
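One way to keep a failed delete from passing silently is to retry it a bounded number of times and report the outcome. The sketch below is a minimal illustration under stated assumptions, not a definitive implementation: `CacheDelete`, `invalidateWithRetry`, and the attempt limit are hypothetical names, and the delete call is synchronous only to keep the example self-contained (a real Redis client call would be awaited).

```typescript
// Hypothetical stand-in for a cache delete call; throws on timeout or error.
type CacheDelete = (key: string) => void

type InvalidationResult = { key: string; ok: boolean; attempts: number }

// Retry the delete a bounded number of times and report what happened,
// so the caller can count failures or queue the key for later repair.
function invalidateWithRetry(
  del: CacheDelete,
  key: string,
  maxAttempts: number = 3,
): InvalidationResult {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      del(key)
      return { key, ok: true, attempts: attempt }
    } catch {
      // Swallow the error and try again until the attempts run out.
    }
  }
  return { key, ok: false, attempts: maxAttempts }
}

// Usage: a delete that times out twice, then succeeds on the third call.
let flakyCalls = 0
const flakyDelete: CacheDelete = () => {
  flakyCalls += 1
  if (flakyCalls < 3) throw new Error('cache timeout')
}
const retryResult = invalidateWithRetry(flakyDelete, 'account-limits:acct_123')
```

A result with `ok: false` is the signal worth counting: it marks a key that stays stale until the TTL expires unless something else repairs it.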

The Real Contract Is Freshness

The dangerous part of caching is not the cache key. It is the unstated freshness contract behind the key.

For account-limits:${accountId}, the hidden questions are:

Which fields does this cached value represent?
Which writes can change those fields?
How stale may the value be before behavior becomes wrong?
Is the value used for display, authorization, billing, quota enforcement, or background decisions?
Does every writer know how to invalidate or refresh it?
Can two application instances disagree about the same account?

Those questions matter more than the TTL.

Five minutes may be acceptable for a marketing page. Five seconds may be too long for account suspension, permissions, inventory, price guarantees, quota enforcement, or fraud decisions. A shorter TTL narrows the stale window, but it does not turn an implicit correctness rule into an explicit one.

Google's Memorystore guidance frames this boundary clearly: a cache is a good candidate when the data does not need to appear in the application immediately, and if a value exists only in cache, the application must behave acceptably if that value expires and disappears. See Caching data with Memorystore.

That is a useful test for every production cache: if the cached value is missing, stale, or briefly contradictory, does the system still behave safely?
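A freshness contract can be made explicit in code rather than implied by a TTL. The following sketch assumes three hypothetical workflows and illustrative staleness budgets; the names `Workflow`, `maxStalenessMs`, and `canServeFromCache` are not from any library, and the numbers are placeholders rather than recommendations.

```typescript
// Hypothetical workflows that read the same cached account-limits value.
type Workflow = 'displayPage' | 'quotaEnforcement' | 'permissionCheck'

// Maximum acceptable staleness per workflow, in milliseconds.
// These numbers are illustrative assumptions.
const maxStalenessMs: Record<Workflow, number> = {
  displayPage: 60_000,
  quotaEnforcement: 0,
  permissionCheck: 0,
}

// A cached value may be served only if its age is inside the workflow's budget.
// A budget of 0 means the workflow always reads the source of truth.
function canServeFromCache(workflow: Workflow, cachedAtMs: number, nowMs: number): boolean {
  const ageMs = nowMs - cachedAtMs
  return ageMs >= 0 && ageMs < maxStalenessMs[workflow]
}
```

With this shape, the same cached entry can be acceptable for a display page and refused for quota enforcement, which is the distinction a single TTL cannot express.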

Cache Keys Drift As Responses Grow

Cache keys often start accurate and become wrong later.

The first version caches only account limits:

account-limits:acct_123

Later the endpoint response grows:

plan name
quota limit
current usage
feature entitlements
billing status
trial expiration
fraud hold
organization-level override

The key stays the same. The cached value now depends on more state than the key expresses.

type AccountLimitResponse = {
  accountId: string
  plan: 'basic' | 'pro' | 'enterprise'
  monthlyApiLimit: number
  apiCallsUsed: number
  canUseBulkExport: boolean
  billingStatus: 'active' | 'past_due' | 'suspended'
  organizationOverride: 'none' | 'temporary_limit_increase'
}

Now many writes can make the cached response stale:

Changed state	Cached field affected	Invalidation often missed because...
plan upgrade	`plan`, `monthlyApiLimit`	billing service owns the write path
usage counter	`apiCallsUsed`	worker updates usage asynchronously
feature entitlement	`canUseBulkExport`	flag or entitlement system is owned elsewhere
invoice payment failure	`billingStatus`	webhook handler updates billing state
support override	`organizationOverride`	admin tool writes through a different code path

The original cache key still looks stable, but the response has become a join across multiple business concepts. If every one of those concepts needs to invalidate the same key, the cache is no longer a small performance detail. It is a distributed contract.
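One way to make that contract visible is a single registry that maps each kind of write to the keys it must invalidate. This is a sketch under assumptions: the change names mirror the rows of the table above, and `AccountChange`, `keysInvalidatedBy`, and `keysToInvalidate` are hypothetical names.

```typescript
// Hypothetical names for the write paths listed in the table above.
type AccountChange =
  | 'planUpgrade'
  | 'usageCounterUpdate'
  | 'featureEntitlementChange'
  | 'invoicePaymentFailure'
  | 'supportOverride'

const accountLimitsKey = (accountId: string): string => `account-limits:${accountId}`

// Record<AccountChange, ...> makes the compiler reject a new change type
// until someone decides which cache keys it invalidates.
const keysInvalidatedBy: Record<AccountChange, (accountId: string) => string[]> = {
  planUpgrade: (id) => [accountLimitsKey(id)],
  usageCounterUpdate: (id) => [accountLimitsKey(id)],
  featureEntitlementChange: (id) => [accountLimitsKey(id)],
  invoicePaymentFailure: (id) => [accountLimitsKey(id)],
  supportOverride: (id) => [accountLimitsKey(id)],
}

function keysToInvalidate(change: AccountChange, accountId: string): string[] {
  return keysInvalidatedBy[change](accountId)
}
```

The registry does not perform the invalidation. It only ensures that every writer, whichever team owns it, looks up the same list.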

This is similar to the way read replicas can look like a scaling fix while quietly changing read-after-write behavior. That related consistency trade-off is covered in Why Read Replicas Didn't Reduce Database Load.

Cache-Aside Races Are Easy To Miss

Cache-aside is popular because it is simple, but the race windows are real.

The most common stale write-back race looks like this:

Step	Request A	Request B
1	Cache miss for `account-limits:acct_123`
2	Reads old `basic` value from database
3		Upgrades account to `pro` in database
4		Deletes cache key
5	Writes old `basic` value into cache

The cache now contains basic even though the database says pro.

The code for Request A looked correct. It loaded from the source of truth on a miss. The problem is that the value it loaded was already outdated by the time it wrote the cache.

There are several ways to reduce this risk:

Write the database first, then invalidate the cache.
Keep freshness windows short, and cache a value at all only when the business can tolerate it being stale.
Include a version, updatedAt, or monotonic revision in the cached payload.
Avoid caching values used for authorization, billing enforcement, or critical state transitions.
Use compare-and-set or version checks when refreshing a key that can race with writes.
Move important invalidation into the same transaction boundary as the state change, or into a reliable outbox event.

That last option matters when several services or workers can change the state. If a database write and an invalidation event must stay coordinated, the reliability problem starts to resemble event publication. The durable pattern for that boundary is explained in Transactional Outbox Pattern in Microservices.
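The version-check mitigation can be sketched with an in-memory stand-in for the cache. This is a minimal illustration under assumptions: `VersionedCache`, `setIfNewer`, and the tombstone left by `invalidate` are hypothetical names, and `version` is assumed to be a monotonic revision stored with the database row. In a shared cache the compare and the write would also need to happen as one atomic step, which this single-process sketch does not show.

```typescript
// value === null marks a tombstone left behind by an invalidation.
type CacheEntry<T> = { version: number; value: T | null }

class VersionedCache<T> {
  private entries = new Map<string, CacheEntry<T>>()

  get(key: string): T | null {
    return this.entries.get(key)?.value ?? null
  }

  // Accept a refresh only if it is not older than the current entry or tombstone.
  setIfNewer(key: string, version: number, value: T): boolean {
    const current = this.entries.get(key)
    if (current !== undefined && current.version > version) return false
    this.entries.set(key, { version, value })
    return true
  }

  // Instead of a plain delete, leave a tombstone carrying the committed version.
  // A real cache would expire tombstones; this sketch keeps them forever.
  invalidate(key: string, committedVersion: number): void {
    const current = this.entries.get(key)
    if (current !== undefined && current.version > committedVersion) return
    this.entries.set(key, { version: committedVersion, value: null })
  }
}

// Replay of the race: Request A read version 1 before Request B committed version 2.
const raceCache = new VersionedCache<string>()
const raceKey = 'account-limits:acct_123'
raceCache.invalidate(raceKey, 2) // Request B, after its commit
const staleAccepted = raceCache.setIfNewer(raceKey, 1, 'basic') // Request A, writing late
const freshAccepted = raceCache.setIfNewer(raceKey, 2, 'pro') // next cache miss
```

The tombstone is what rejects the late write. A plain delete leaves nothing to compare against, so the old `basic` value would be accepted.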

TTL Is Not A Correctness Strategy

TTL is useful. It is not enough.

The TTL tells the cache when a value should expire. It does not say whether the value is safe to use for a decision.

Cache use case	Typical stale tolerance	Safer rule
public article page	minutes may be acceptable	TTL is usually enough with purge on publish
product recommendations	minutes or hours may be fine	make stale behavior visible in ranking metrics
account display name	short staleness often fine	invalidate on profile update
quota enforcement	usually very low	read source of truth or use versioned counters
permissions	near zero	avoid cache or use explicit revocation/version checks
prices during checkout	depends on business contract	define price guarantee and fallback behavior
fraud or suspension state	near zero	do not rely on ordinary TTL-only cache

TTL reduces how long stale data can survive after a missed invalidation. It does not protect the system during the TTL window.
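A small in-memory model makes the window concrete. The sketch assumes a hypothetical `TtlCache` class with an injected clock (`nowMs`) so the timeline is deterministic; it is not a real cache client.

```typescript
class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAtMs: number }>()

  set(key: string, value: T, ttlMs: number, nowMs: number): void {
    this.entries.set(key, { value, expiresAtMs: nowMs + ttlMs })
  }

  // Returns the value while it is unexpired, regardless of whether it is still true.
  get(key: string, nowMs: number): T | undefined {
    const entry = this.entries.get(key)
    if (entry === undefined) return undefined
    if (nowMs >= entry.expiresAtMs) {
      this.entries.delete(key)
      return undefined
    }
    return entry.value
  }
}

const ttlCache = new TtlCache<string>()
const ttlKey = 'account-limits:acct_123'
let planInDatabase = 'basic'
ttlCache.set(ttlKey, planInDatabase, 300_000, 0) // cached at t = 0 for 300 seconds
planInDatabase = 'pro' // the upgrade commits, but invalidation is missed
const servedAt60s = ttlCache.get(ttlKey, 60_000) // 'basic': stale but not expired
const servedAt300s = ttlCache.get(ttlKey, 300_000) // undefined: expired, next read reloads
```

For the full 300 seconds the cache has no way to know that `planInDatabase` changed. Expiry bounds the damage; it does not detect it.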

If the product rule is "new permissions must apply immediately," then a five-second cache can still be wrong. If the product rule is "recommendations can lag by ten minutes," then a five-minute TTL may be perfectly acceptable.

Write the freshness requirement first. Choose the cache policy second.

Write-Through And Write-Behind Have Different Failure Modes

Cache-aside is not the only pattern.

Write-through updates the cache synchronously as part of the same write that updates the database. Write-behind accepts a write into the cache or buffer and later persists it to the database. Redis describes the key distinction directly: write-through syncs immediately, while write-behind syncs asynchronously and can leave cache and database inconsistent for a short time. See Redis: Write Behind vs Write Through.

The choice should match the correctness requirement:

Pattern	What it optimizes	Main correctness risk
Cache-aside	simple read performance	stale values after missed invalidation or race windows
Read-through	centralizes cache loading	still needs freshness rules and invalidation
Write-through	keeps cache warm after writes	write path now depends on cache availability and latency
Write-behind/write-back	lower write latency or buffered writes	acknowledged writes can be lost or delayed before durable persistence
Local in-memory cache	fastest reads inside one process	instances disagree until expiry or restart

There is no universally safest pattern. A write-through cache may improve read freshness but make writes slower or more fragile. A write-behind cache may improve write latency but weaken durability unless the queue and retry path are designed carefully. A local cache may be excellent for static configuration and risky for account state.
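The difference between the two write patterns can be shown with plain maps standing in for the database and the cache. This is a sketch under assumptions: `writeThrough`, `WriteBehindBuffer`, and `flush` are hypothetical names, and a real write-behind path would also need retries and durable buffering.

```typescript
type KeyValueStore = Map<string, string>

// Write-through: the write returns only after both stores are updated.
function writeThrough(db: KeyValueStore, cache: KeyValueStore, key: string, value: string): void {
  db.set(key, value)
  cache.set(key, value)
}

// Write-behind: the write returns after the cache and an in-memory buffer
// are updated; the database catches up only when flush() runs.
class WriteBehindBuffer {
  private pending: Array<[string, string]> = []
  private db: KeyValueStore
  private cache: KeyValueStore

  constructor(db: KeyValueStore, cache: KeyValueStore) {
    this.db = db
    this.cache = cache
  }

  write(key: string, value: string): void {
    this.cache.set(key, value)
    this.pending.push([key, value])
  }

  // Persist buffered writes; returns how many were flushed.
  flush(): number {
    const count = this.pending.length
    for (const [key, value] of this.pending) this.db.set(key, value)
    this.pending = []
    return count
  }
}
```

If the process holding `pending` crashes before `flush()` runs, the acknowledged write never reaches the database. That is the durability risk listed in the table.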

The mistake is choosing a pattern only by performance.

Local Caches Create Per-Instance Truth

Distributed caches are not the only source of inconsistency. Local in-memory caches can be worse because each process holds its own copy.

Imagine four application instances:

app-1: account acct_123 = basic
app-2: account acct_123 = pro
app-3: cache miss, reads database
app-4: old value until restart

Requests routed to different instances can return different answers. A deploy may clear two instances before the other two. Autoscaling may create new instances with empty caches while older instances keep stale values. A bug can appear to vanish after restart because the restart accidentally cleared the stale local copy.

Microsoft's cache-aside guidance calls out local caching as a special consistency risk because private caches on different application instances can quickly diverge. That warning is easy to underestimate until production traffic starts moving across many instances.

Local caches are safest when:

the data is static or versioned
stale data is harmless
the value is not tenant-specific or security-sensitive
reload behavior is explicit
operators can see which version each instance holds

They are risky when the cached value controls permissions, prices, limits, or state transitions.
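A versioned local cache is one way to make per-instance copies checkable. In this sketch, `LocalCache` is a hypothetical class, and the `latestVersion` argument is assumed to come from some cheap shared source (for example a version number published on every change); how that number is distributed is outside the example.

```typescript
class LocalCache<T> {
  private entries = new Map<string, { version: number; value: T }>()

  put(key: string, version: number, value: T): void {
    this.entries.set(key, { version, value })
  }

  // Serve the local copy only if it matches the latest known version.
  get(key: string, latestVersion: number): T | undefined {
    const entry = this.entries.get(key)
    if (entry === undefined || entry.version !== latestVersion) return undefined
    return entry.value
  }

  // Exposed so operators or metrics can see which version this instance holds.
  heldVersion(key: string): number | undefined {
    return this.entries.get(key)?.version
  }
}

// Two instances that disagree about the same account.
const app1Cache = new LocalCache<string>()
const app2Cache = new LocalCache<string>()
app1Cache.put('acct_123', 1, 'basic')
app2Cache.put('acct_123', 2, 'pro')

const latestKnownVersion = 2
const fromApp1 = app1Cache.get('acct_123', latestKnownVersion) // undefined: stale copy refused
const fromApp2 = app2Cache.get('acct_123', latestKnownVersion) // 'pro'
```

The stale instance falls back to a reload instead of answering `basic`, and `heldVersion` gives each instance something to report.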

What To Measure

Cache metrics often stop at hit ratio. Hit ratio matters, but it does not tell you whether the cache is safe.

Add correctness-oriented signals:

Signal	Why it matters
cache hit/miss by key family	shows which workflows actually depend on the cache
value age at read time	reveals stale-but-not-expired behavior
invalidation success/failure count	catches missed deletes and timeout patterns
invalidation lag	measures time between database commit and cache update
source-of-truth bypass count	shows how often code distrusts the cache
read-after-write mismatch samples	proves whether critical flows see their own writes
cache error fallback behavior	shows whether failures become slower reads or wrong reads
per-instance local cache version	detects disagreement across application instances

For example, include cache metadata in internal logs or traces:

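// Assumes `cached` is the raw cache result and `limits` is the value returned to the caller.
// `limits.version` assumes a version column is also selected; the earlier select does not include one.
logger.info('account limits read', {
  accountId,
  cacheKey: key,
  cacheHit: Boolean(cached),
  cachedVersion: limits.version,
  // Time since the row last changed in the database, not time since it was cached.
  cachedAgeMs: Date.now() - new Date(limits.updatedAt).getTime(),
  source: cached ? 'cache' : 'database',
})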

The goal is not to log every value. The goal is to make stale reads explainable. During an incident, engineers should be able to answer: did this response come from cache, how old was it, and what invalidation should have happened?
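The log above derives age from `updatedAt`, which reflects when the row changed rather than when the copy was cached. If the age of the cached copy itself is needed, one option is to store a timestamp alongside the value. The sketch below assumes a hypothetical envelope shape; `CacheEnvelope`, `wrapForCache`, and `readWithAge` are illustrative names.

```typescript
// Hypothetical envelope stored in the cache instead of the bare value.
type CacheEnvelope<T> = { cachedAtMs: number; value: T }

// Serialize the value together with the time it was cached.
function wrapForCache<T>(value: T, nowMs: number): string {
  const envelope: CacheEnvelope<T> = { cachedAtMs: nowMs, value }
  return JSON.stringify(envelope)
}

// Parse the envelope and report how old the cached copy is at read time.
function readWithAge<T>(raw: string, nowMs: number): { value: T; ageMs: number } {
  const envelope = JSON.parse(raw) as CacheEnvelope<T>
  return { value: envelope.value, ageMs: nowMs - envelope.cachedAtMs }
}

// Usage: cached at t = 1,000 ms, read at t = 46,000 ms.
const rawEnvelope = wrapForCache({ plan: 'basic' }, 1_000)
const envelopeRead = readWithAge<{ plan: string }>(rawEnvelope, 46_000)
// envelopeRead.ageMs is 45000
```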

A Safer Cache Design Checklist

Before adding a cache to a production path, write the contract in plain engineering terms:

Name the source of truth.
Name the cached value and every field it includes.
Define the maximum acceptable staleness for each workflow using the value.
List every write path that can make the cached value stale.
Decide whether writes invalidate, refresh, version, or bypass the cache.
Decide what happens when cache read, write, or delete fails.
Add observability for cache hit, value age, invalidation failures, and fallback path.
Add at least one read-after-write test for critical workflows.
Roll out behind a feature flag or per-route switch when the path is important.
Document how to disable the cache safely during an incident.
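A read-after-write test for the checklist can be very small. The sketch below uses in-memory maps in place of the database and Redis so that it runs on its own; `createAccountLimitsService` and `readAfterWriteSeesUpgrade` are hypothetical names, and the `basic` limit of 10,000 is an illustrative value.

```typescript
type PlanLimits = { plan: 'basic' | 'pro'; monthlyApiLimit: number }

function createAccountLimitsService() {
  const db = new Map<string, PlanLimits>()
  const cache = new Map<string, string>()
  const keyFor = (accountId: string): string => `account-limits:${accountId}`

  return {
    seed(accountId: string, limits: PlanLimits): void {
      db.set(accountId, limits)
    },
    // Cache-aside read: check the cache, fall back to the store, then populate.
    getLimits(accountId: string): PlanLimits | undefined {
      const cached = cache.get(keyFor(accountId))
      if (cached !== undefined) return JSON.parse(cached) as PlanLimits
      const limits = db.get(accountId)
      if (limits !== undefined) cache.set(keyFor(accountId), JSON.stringify(limits))
      return limits
    },
    // Write path: commit to the store first, then invalidate the cache key.
    upgradeToPro(accountId: string): void {
      db.set(accountId, { plan: 'pro', monthlyApiLimit: 1_000_000 })
      cache.delete(keyFor(accountId))
    },
  }
}

// Read-after-write check: warm the cache, write, then read again.
function readAfterWriteSeesUpgrade(): boolean {
  const service = createAccountLimitsService()
  service.seed('acct_123', { plan: 'basic', monthlyApiLimit: 10_000 })
  service.getLimits('acct_123') // warms the cache with the basic plan
  service.upgradeToPro('acct_123')
  return service.getLimits('acct_123')?.plan === 'pro'
}
```

Removing the `cache.delete` line makes the check return `false`, which is exactly the regression the test exists to catch.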

For the account-limits example, a better contract might be:

Cached value:
  account limit display model, not authorization state

Source of truth:
  account_limits table

Allowed staleness:
  display page: 60 seconds
  quota enforcement: no ordinary cache; read versioned source of truth

Invalidation:
  plan upgrade, billing suspension, support override, and entitlement change

Failure behavior:
  cache unavailable means slower database read, not stale authorization

That contract prevents a common failure: a cache introduced for display performance quietly becomes part of enforcement logic.
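The failure-behavior line of that contract can be sketched as a read helper that treats a cache error as a miss. The names `readWithCacheFallback` and `ReadSource` are hypothetical, and the reader functions are synchronous only so the example is self-contained.

```typescript
type ReadSource = 'cache' | 'database'

// Cache errors degrade to a database read; they never produce a stale answer.
function readWithCacheFallback<T>(
  readCache: () => T | undefined, // may throw when the cache is unavailable
  readDatabase: () => T,
): { value: T; source: ReadSource; cacheError: boolean } {
  let cached: T | undefined = undefined
  let cacheError = false
  try {
    cached = readCache()
  } catch {
    // Record the failure so it can be counted, then fall through to the database.
    cacheError = true
  }
  if (cached !== undefined) {
    return { value: cached, source: 'cache', cacheError }
  }
  return { value: readDatabase(), source: 'database', cacheError }
}

// Usage: the cache is down, so the read is slower but still correct.
const fallbackRead = readWithCacheFallback<string>(
  () => {
    throw new Error('cache unavailable')
  },
  () => 'pro',
)
```

Only the cache call sits inside the `try`, so a database error still surfaces to the caller instead of being mistaken for a cache problem.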

If the cached read happens inside a transaction-sensitive API flow, review the request boundary too. Caching can hide database reads, but it does not remove the need for correct atomic writes. That adjacent concern is covered in Database Transaction Boundaries in Backend APIs.

When Caching Is The Wrong First Fix

Caching is tempting when the database is slow, but it can hide the reason the database is slow.

Do not add a cache first when:

an endpoint has N+1 query behavior
a query scans far more rows than it returns
a missing index or wrong index shape is the real issue
the value is used for permissions or financial decisions
writes are frequent and invalidation rules are unclear
the team cannot observe cache freshness or invalidation failures

Fix the underlying query shape first when possible. If an endpoint runs 101 queries to return 100 rows, caching may reduce some database work but leave the access pattern fragile. That is covered in N+1 Query Problem in ORMs.
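To see why query shape comes before caching, it can help to count round trips. The sketch below uses a hypothetical in-memory `CountingStore` with made-up rows; it is not an ORM, and it only compares a per-row lookup loop with a single batched lookup.

```typescript
class CountingStore {
  queryCount = 0
  private orders = [
    { id: 1, customerId: 10 },
    { id: 2, customerId: 11 },
    { id: 3, customerId: 10 },
  ]
  private customers = new Map<number, string>([
    [10, 'customer-a'],
    [11, 'customer-b'],
  ])

  listOrders(): Array<{ id: number; customerId: number }> {
    this.queryCount += 1
    return this.orders.slice()
  }

  customerById(id: number): string | undefined {
    this.queryCount += 1
    return this.customers.get(id)
  }

  customersByIds(ids: number[]): Map<number, string> {
    this.queryCount += 1
    const found = new Map<number, string>()
    for (const id of ids) {
      const customerName = this.customers.get(id)
      if (customerName !== undefined) found.set(id, customerName)
    }
    return found
  }
}

// One query for the list, then one more per row.
function naiveQueryCount(): number {
  const store = new CountingStore()
  for (const order of store.listOrders()) store.customerById(order.customerId)
  return store.queryCount
}

// One query for the list, one batched query for all related rows.
function batchedQueryCount(): number {
  const store = new CountingStore()
  const orders = store.listOrders()
  store.customersByIds(orders.map((order) => order.customerId))
  return store.queryCount
}
```

With three rows the loop issues four queries and the batched version issues two. A cache in front of the loop hides the cost on hits but brings it back on every miss.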

If the single query is slow because the database plan is wrong, use the query-plan workflow before adding another stateful layer. Start with How to Find and Fix Slow SQL Queries in Production.

Caching is strongest after you understand the read pattern and decide that bounded staleness is an acceptable trade-off.

The Short Version

Caching improves performance by duplicating answers.

Duplicated answers create a correctness question: when the database, service, or API changes, how quickly must the cached answer change too?

Production cache bugs usually come from hidden freshness contracts:

cache keys that no longer describe the full response
writes that forget to invalidate or refresh a key
cache-aside races that write old values after newer commits
TTLs used as a substitute for correctness rules
local caches that disagree across instances
write-behind paths that acknowledge work before durable persistence
missing observability for value age and invalidation failure

The fix is not "never cache." The fix is to treat the cache as part of system behavior. Name the source of truth, define the freshness budget, choose the caching pattern by correctness risk, test read-after-write paths, and make stale reads observable before production traffic depends on them.