Why Read Replicas Didn't Reduce Database Load

Why Read Replicas Didn't Reduce Database Load

Read replicas reduce database load only when the expensive work can actually move away from the primary.

That sounds obvious, but it is the part many scaling plans skip. A workload can look read-heavy in dashboards and still keep the primary hot because the reads are tied to writes, consistency guarantees, transactions, cache misses, replication lag, or query plans that replicas do not fix.

When read replicas do not reduce database load, the usual problem is not that replicas are broken. It is that the system was not limited by portable read execution in the first place.

For related database performance, locking, migration, and data-correctness topics, see the SQL And Data Correctness hub.


The Common Expectation

The architecture diagram is tempting:

writes -> primary
reads  -> replicas

If 80% of traffic is reads, moving those reads to replicas should free the primary to handle writes. That assumption is reasonable when:

  • reads are truly read-only
  • stale data is acceptable
  • queries are independent of the write path
  • replicas can keep up with the write volume
  • the application can route reads correctly
  • the bottleneck is read execution, not coordination

Many systems match that model early. Reporting pages, dashboards, search screens, public catalog reads, and admin exports can often move to replicas safely.

But production workloads rarely stay that clean.


A Load Split That Looked Better Than It Was

Suppose the primary database handles this apparent workload:

Operation typeShare of application queries
Reads80%
Writes20%

That looks like a strong replica candidate.

After adding replicas, the real split might look more like this:

Query classWhere it actually runsWhy
Public product readsReplicaStale data is acceptable
User settings after updatePrimaryMust read its own write
Checkout availability checkPrimaryCorrectness depends on latest inventory
Dashboard aggregationReplicaEventually consistent is acceptable
Reads inside write transactionsPrimaryThey share transactional context
Session permission checksPrimaryApplication requires current grants
Failed replica reads retried on primaryPrimaryLag or timeout fallback

The headline "80% reads" hid the important question: which reads can tolerate replica semantics?

If only a small fraction of read cost can move, the primary remains the bottleneck.


Reads Inside Transactions Usually Do Not Move

Application code often hides write-coupled reads.

This looks like a read:

const user = await db.readOnly().user.findById(userId)

This is different:

await db.transaction(async (tx) => {
  await tx.user.update({
    where: { id: userId },
    data: { lastSeenAt: new Date() },
  })

  return tx.user.findById(userId)
})

The second query is a read, but it belongs to the write transaction. Routing it to a replica would break the expectation that the transaction sees its own changes.

This pattern appears in ordinary code:

  • update a row, then return the updated view
  • create an order, then fetch the order summary
  • change permissions, then check access
  • reserve inventory, then read availability
  • write audit state, then render confirmation

From a dashboard, these are reads. From a correctness perspective, they are part of the primary path.

That distinction matters more than the read/write ratio.


Replica Lag Turns Optional Consistency Into A Runtime Decision

Most read replicas are not magic clones updated at exactly the same instant as the primary. They apply changes after the primary produces them.

In PostgreSQL, streaming replication sends write-ahead log records from the primary to standby servers. The PostgreSQL documentation on log-shipping standby servers describes streaming replication as asynchronous by default, with delay between a primary commit and visibility on the standby. PostgreSQL's hot standby documentation also describes conflict cases where standby queries can delay replay or be canceled so recovery can continue.

The exact behavior depends on your database and configuration, but the design question is universal:

What should the application do when the replica is behind?

Common answers include:

BehaviorConsequence
Serve stale dataLower primary load, but users may see old state
Wait for replica freshnessAdds latency and may block requests
Retry on another replicaCan multiply read load
Fallback to primaryPreserves freshness, but concentrates load during stress
Use sticky primary reads after writesImproves correctness, but limits offload

Fallback-to-primary is especially dangerous during peak load. The moment replicas lag, more reads return to the primary exactly when the primary is already under pressure.

That is how replicas can help during calm periods and fail to help during the moment you needed them most.


Replicas Can Increase Total Database Work

Replicas add read capacity. They also add replication work.

Every write on the primary has to be made durable and then shipped, retained, or applied for replicas depending on the database architecture. More replicas can mean more network traffic, more WAL or binlog retention pressure, more monitoring work, more connection pools, more query planning variation, and more places for lag to appear.

That does not mean replicas are bad. It means they are not free capacity.

If the primary is already write-bound, adding replicas may not reduce the bottleneck. It can even make the system heavier if:

  • the write volume is high
  • long replica queries delay replay
  • replication slots or logs accumulate
  • fallback reads return to primary
  • application routing creates extra connection churn
  • replicas run expensive queries that compete for shared storage or IO

The core question is whether replicas remove more pressure than they add.


Query Cost Still Matters On Replicas

A slow query does not become cheap because it runs somewhere else.

If the query scans millions of rows, sorts large intermediate results, misses useful indexes, or builds expensive joins, replicas only move that work away from the primary. They do not make the work efficient.

This matters in two ways.

First, a replica can become saturated and start lagging. Second, the application may then retry or fall back to the primary, pulling the original problem back into the bottleneck.

For example:

SELECT *
FROM orders
WHERE customer_id = $1
ORDER BY created_at DESC
LIMIT 50;

This query may be cheap with the right index:

CREATE INDEX orders_customer_created_idx
  ON orders (customer_id, created_at DESC);

Without that access path, adding replicas may hide the slow plan for a while. It does not change the plan.

Before treating replicas as the fix, identify the expensive queries and their plans. For deeper query-diagnosis background, see How to Find and Fix Slow SQL Queries in Production and Why Database Indexes Didn't Fix Your Slow Query. The key point for this article is simple: replicas distribute query execution, not query cost.


Application Routing Is Usually Messier Than The Diagram

Replica routing sounds like a database concern. In practice, it becomes an application correctness concern.

The application has to know which reads are safe to run on a replica. That decision may depend on:

  • whether the current request performed a write
  • whether the user expects read-your-writes behavior
  • whether stale data can trigger a wrong business action
  • whether the query runs inside a transaction
  • whether the result controls authorization
  • whether the replica is currently lagging
  • whether the query has a primary-only dependency

A simple helper can hide too much:

const db = request.method === 'GET' ? replicaDb : primaryDb

HTTP method is not a consistency model. A GET request can still need fresh data if it follows a write, checks permission, confirms payment state, or controls a one-time action.

A safer routing model is explicit about read classes:

type ReadConsistency = 'stale-ok' | 'read-your-writes' | 'primary-required'

function readDb(consistency: ReadConsistency) {
  if (consistency === 'stale-ok' && replicaHealth.isFresh()) {
    return replicaDb
  }

  return primaryDb
}

That still does not solve capacity by itself. It makes the trade-off visible.


Why More Replicas Did Not Help

When replicas fail to reduce primary load, the cause is usually one or more of these:

CauseHow it shows up
Write-bound primaryCPU, WAL, locks, or IO remain high after reads move
Write-coupled readsTransactional or read-after-write paths still hit primary
Replica lag fallbackPrimary traffic rises during peak or incidents
Inefficient query plansReplicas saturate instead of relieving pressure
Hot data concentrationMany requests need the same fresh rows
Hidden writes in read pathsReads update last_seen, counters, sessions, or audit rows
Connection pool pressureMore database nodes create more poorly managed connections
Cache behavior unchangedCache misses still stampede toward primary-sensitive paths

The fix depends on which row is true. Adding another replica only helps the cases where portable read execution is the bottleneck.


What To Measure Before Adding Replicas

Before betting on replicas, classify the workload.

QuestionWhat to look for
Which queries dominate primary CPU and IO?Query stats, plan samples, wait events
Which reads happen after writes in the same request?Application traces and transaction boundaries
Which reads must be fresh?Business rules and user-visible correctness
How much replica lag is acceptable per path?Product and operational requirements
What happens when lag exceeds that threshold?Fallback, wait, stale response, or rejection
Are slow reads actually slow plans?EXPLAIN output and index coverage
Are "read" endpoints writing incidental state?Sessions, counters, audit, tracking, locks
Can replicas keep up during peak writes?Lag, replay rate, WAL volume, query conflicts

You need this classification before the rollout, not after the primary stays hot.


Alternatives When Replicas Are Not Enough

Replicas are one tool. If they do not move the bottleneck, the fix may be elsewhere.

Remove Incidental Writes

Read paths that update last_seen, analytics counters, notification state, or audit rows can keep traffic tied to the primary. Move low-value writes to an async path when correctness allows it.

Make Staleness Explicit

Some reads can tolerate stale data. Some cannot. Mark that in code and in product behavior.

Fix Expensive Queries

Replicas can buy time, but slow plans still need query and index work.

Cache The Right Layer

Caching can reduce repeated reads, but it introduces its own consistency risks. If the cache can serve stale data at the wrong time, it may produce the same correctness problem as a lagging replica. That trade-off is discussed in Why Caching Causes Inconsistent Data in Production.

Split Workloads Intentionally

Reporting, exports, and analytics may deserve separate data pipelines instead of sharing replicas with user-facing reads.

Revisit The Data Model

If every important read is fresh and write-coupled, the system may need a different consistency boundary, aggregate table, materialized view, or workflow design. Roll those changes out carefully; schema and data movement in production need the sequencing from Safe Database Migrations in Production.


Practical Checklist

Before adding read replicas:

  1. list the top primary queries by time, CPU, IO, and frequency
  2. classify each read as stale-ok, read-your-writes, or primary-required
  3. identify reads inside write transactions
  4. measure current write volume and replication capacity
  5. define acceptable replica lag per user path
  6. decide fallback behavior before lag happens
  7. verify that expensive reads have reasonable query plans
  8. test peak traffic with replica routing enabled
  9. alert on replica lag and fallback-to-primary rate
  10. check whether total database work went down, not just whether reads moved

The last item is the trap. Moving queries to replicas is not the same as reducing system cost.


The Short Version

Read replicas help when the bottleneck is portable, stale-tolerant read execution.

They do not help much when the primary is constrained by writes, transactions, read-your-writes behavior, replica lag fallback, inefficient queries, or hidden coupling between reads and writes.

Before adding replicas, classify the reads. After adding replicas, measure primary load, replica lag, fallback rate, and total database work.

If the primary stays hot, the system was probably not read-bound in the way the architecture diagram suggested.