
Why Read Replicas Didn't Reduce Database Load
Read replicas reduce database load only when the expensive work can actually move away from the primary.
That sounds obvious, but it is the part many scaling plans skip. A workload can look read-heavy in dashboards and still keep the primary hot because the reads are tied to writes, consistency guarantees, transactions, cache misses, replication lag, or query plans that replicas do not fix.
When read replicas do not reduce database load, the usual problem is not that replicas are broken. It is that the system was not limited by portable read execution in the first place.
For related database performance, locking, migration, and data-correctness topics, see the SQL And Data Correctness hub.
The Common Expectation
The architecture diagram is tempting:
writes -> primary
reads -> replicas
If 80% of traffic is reads, moving those reads to replicas should free the primary to handle writes. That assumption is reasonable when:
- reads are truly read-only
- stale data is acceptable
- queries are independent of the write path
- replicas can keep up with the write volume
- the application can route reads correctly
- the bottleneck is read execution, not coordination
Many systems match that model early. Reporting pages, dashboards, search screens, public catalog reads, and admin exports can often move to replicas safely.
But production workloads rarely stay that clean.
A Load Split That Looked Better Than It Was
Suppose the primary database handles this apparent workload:
| Operation type | Share of application queries |
|---|---|
| Reads | 80% |
| Writes | 20% |
That looks like a strong replica candidate.
After adding replicas, the real split might look more like this:
| Query class | Where it actually runs | Why |
|---|---|---|
| Public product reads | Replica | Stale data is acceptable |
| User settings after update | Primary | Must read its own write |
| Checkout availability check | Primary | Correctness depends on latest inventory |
| Dashboard aggregation | Replica | Eventually consistent is acceptable |
| Reads inside write transactions | Primary | They share transactional context |
| Session permission checks | Primary | Application requires current grants |
| Failed replica reads retried on primary | Primary | Lag or timeout fallback |
The headline "80% reads" hid the important question: which reads can tolerate replica semantics?
If only a small fraction of read cost can move, the primary remains the bottleneck.
Reads Inside Transactions Usually Do Not Move
Application code often hides write-coupled reads.
This looks like a read:
const user = await db.readOnly().user.findById(userId)
This is different:
await db.transaction(async (tx) => {
await tx.user.update({
where: { id: userId },
data: { lastSeenAt: new Date() },
})
return tx.user.findById(userId)
})
The second query is a read, but it belongs to the write transaction. Routing it to a replica would break the expectation that the transaction sees its own changes.
This pattern appears in ordinary code:
- update a row, then return the updated view
- create an order, then fetch the order summary
- change permissions, then check access
- reserve inventory, then read availability
- write audit state, then render confirmation
From a dashboard, these are reads. From a correctness perspective, they are part of the primary path.
That distinction matters more than the read/write ratio.
Replica Lag Turns Optional Consistency Into A Runtime Decision
Most read replicas are not magic clones updated at exactly the same instant as the primary. They apply changes after the primary produces them.
In PostgreSQL, streaming replication sends write-ahead log records from the primary to standby servers. The PostgreSQL documentation on log-shipping standby servers describes streaming replication as asynchronous by default, with delay between a primary commit and visibility on the standby. PostgreSQL's hot standby documentation also describes conflict cases where standby queries can delay replay or be canceled so recovery can continue.
The exact behavior depends on your database and configuration, but the design question is universal:
What should the application do when the replica is behind?
Common answers include:
| Behavior | Consequence |
|---|---|
| Serve stale data | Lower primary load, but users may see old state |
| Wait for replica freshness | Adds latency and may block requests |
| Retry on another replica | Can multiply read load |
| Fallback to primary | Preserves freshness, but concentrates load during stress |
| Use sticky primary reads after writes | Improves correctness, but limits offload |
Fallback-to-primary is especially dangerous during peak load. The moment replicas lag, more reads return to the primary exactly when the primary is already under pressure.
That is how replicas can help during calm periods and fail to help during the moment you needed them most.
Replicas Can Increase Total Database Work
Replicas add read capacity. They also add replication work.
Every write on the primary has to be made durable and then shipped, retained, or applied for replicas depending on the database architecture. More replicas can mean more network traffic, more WAL or binlog retention pressure, more monitoring work, more connection pools, more query planning variation, and more places for lag to appear.
That does not mean replicas are bad. It means they are not free capacity.
If the primary is already write-bound, adding replicas may not reduce the bottleneck. It can even make the system heavier if:
- the write volume is high
- long replica queries delay replay
- replication slots or logs accumulate
- fallback reads return to primary
- application routing creates extra connection churn
- replicas run expensive queries that compete for shared storage or IO
The core question is whether replicas remove more pressure than they add.
Query Cost Still Matters On Replicas
A slow query does not become cheap because it runs somewhere else.
If the query scans millions of rows, sorts large intermediate results, misses useful indexes, or builds expensive joins, replicas only move that work away from the primary. They do not make the work efficient.
This matters in two ways.
First, a replica can become saturated and start lagging. Second, the application may then retry or fall back to the primary, pulling the original problem back into the bottleneck.
For example:
SELECT *
FROM orders
WHERE customer_id = $1
ORDER BY created_at DESC
LIMIT 50;
This query may be cheap with the right index:
CREATE INDEX orders_customer_created_idx
ON orders (customer_id, created_at DESC);
Without that access path, adding replicas may hide the slow plan for a while. It does not change the plan.
Before treating replicas as the fix, identify the expensive queries and their plans. For deeper query-diagnosis background, see How to Find and Fix Slow SQL Queries in Production and Why Database Indexes Didn't Fix Your Slow Query. The key point for this article is simple: replicas distribute query execution, not query cost.
Application Routing Is Usually Messier Than The Diagram
Replica routing sounds like a database concern. In practice, it becomes an application correctness concern.
The application has to know which reads are safe to run on a replica. That decision may depend on:
- whether the current request performed a write
- whether the user expects read-your-writes behavior
- whether stale data can trigger a wrong business action
- whether the query runs inside a transaction
- whether the result controls authorization
- whether the replica is currently lagging
- whether the query has a primary-only dependency
A simple helper can hide too much:
const db = request.method === 'GET' ? replicaDb : primaryDb
HTTP method is not a consistency model. A GET request can still need fresh data if it follows a write, checks permission, confirms payment state, or controls a one-time action.
A safer routing model is explicit about read classes:
type ReadConsistency = 'stale-ok' | 'read-your-writes' | 'primary-required'
function readDb(consistency: ReadConsistency) {
if (consistency === 'stale-ok' && replicaHealth.isFresh()) {
return replicaDb
}
return primaryDb
}
That still does not solve capacity by itself. It makes the trade-off visible.
Why More Replicas Did Not Help
When replicas fail to reduce primary load, the cause is usually one or more of these:
| Cause | How it shows up |
|---|---|
| Write-bound primary | CPU, WAL, locks, or IO remain high after reads move |
| Write-coupled reads | Transactional or read-after-write paths still hit primary |
| Replica lag fallback | Primary traffic rises during peak or incidents |
| Inefficient query plans | Replicas saturate instead of relieving pressure |
| Hot data concentration | Many requests need the same fresh rows |
| Hidden writes in read paths | Reads update last_seen, counters, sessions, or audit rows |
| Connection pool pressure | More database nodes create more poorly managed connections |
| Cache behavior unchanged | Cache misses still stampede toward primary-sensitive paths |
The fix depends on which row is true. Adding another replica only helps the cases where portable read execution is the bottleneck.
What To Measure Before Adding Replicas
Before betting on replicas, classify the workload.
| Question | What to look for |
|---|---|
| Which queries dominate primary CPU and IO? | Query stats, plan samples, wait events |
| Which reads happen after writes in the same request? | Application traces and transaction boundaries |
| Which reads must be fresh? | Business rules and user-visible correctness |
| How much replica lag is acceptable per path? | Product and operational requirements |
| What happens when lag exceeds that threshold? | Fallback, wait, stale response, or rejection |
| Are slow reads actually slow plans? | EXPLAIN output and index coverage |
| Are "read" endpoints writing incidental state? | Sessions, counters, audit, tracking, locks |
| Can replicas keep up during peak writes? | Lag, replay rate, WAL volume, query conflicts |
You need this classification before the rollout, not after the primary stays hot.
Alternatives When Replicas Are Not Enough
Replicas are one tool. If they do not move the bottleneck, the fix may be elsewhere.
Remove Incidental Writes
Read paths that update last_seen, analytics counters, notification state, or audit rows can keep traffic tied to the primary. Move low-value writes to an async path when correctness allows it.
Make Staleness Explicit
Some reads can tolerate stale data. Some cannot. Mark that in code and in product behavior.
Fix Expensive Queries
Replicas can buy time, but slow plans still need query and index work.
Cache The Right Layer
Caching can reduce repeated reads, but it introduces its own consistency risks. If the cache can serve stale data at the wrong time, it may produce the same correctness problem as a lagging replica. That trade-off is discussed in Why Caching Causes Inconsistent Data in Production.
Split Workloads Intentionally
Reporting, exports, and analytics may deserve separate data pipelines instead of sharing replicas with user-facing reads.
Revisit The Data Model
If every important read is fresh and write-coupled, the system may need a different consistency boundary, aggregate table, materialized view, or workflow design. Roll those changes out carefully; schema and data movement in production need the sequencing from Safe Database Migrations in Production.
Practical Checklist
Before adding read replicas:
- list the top primary queries by time, CPU, IO, and frequency
- classify each read as
stale-ok,read-your-writes, orprimary-required - identify reads inside write transactions
- measure current write volume and replication capacity
- define acceptable replica lag per user path
- decide fallback behavior before lag happens
- verify that expensive reads have reasonable query plans
- test peak traffic with replica routing enabled
- alert on replica lag and fallback-to-primary rate
- check whether total database work went down, not just whether reads moved
The last item is the trap. Moving queries to replicas is not the same as reducing system cost.
The Short Version
Read replicas help when the bottleneck is portable, stale-tolerant read execution.
They do not help much when the primary is constrained by writes, transactions, read-your-writes behavior, replica lag fallback, inefficient queries, or hidden coupling between reads and writes.
Before adding replicas, classify the reads. After adding replicas, measure primary load, replica lag, fallback rate, and total database work.
If the primary stays hot, the system was probably not read-bound in the way the architecture diagram suggested.