SQL Isolation Levels Explained

Transaction bugs are rarely caused by SQL syntax. They usually come from assuming concurrent requests will see and update data in a cleaner order than production actually guarantees.

Why Isolation Levels Matter in Real Systems

Two requests arrive at nearly the same time.

One checks whether an item is in stock. Another reserves the last unit. A background worker updates inventory totals a few milliseconds later.

Each query is valid in isolation. The failure happens because these operations overlap.

That is the problem isolation levels are trying to control.

Isolation defines how much one transaction can observe or interfere with the intermediate work of another transaction. It does not make concurrency disappear. It defines which kinds of anomalies the database is allowed to expose while concurrent work is happening.

This matters most in the same kinds of systems where correctness problems are expensive:

  • payment flows
  • inventory updates
  • account balance changes
  • job claiming and queue processing
  • retries after timeouts

If your application has concurrent writes, isolation is not a database trivia topic. It is part of your correctness model.


What a Transaction Actually Protects

A transaction gives you a unit of work with commit or rollback semantics.

That sounds stronger than it really is.

Many engineers assume that once code is wrapped in a transaction, concurrent behavior is "handled." That is only partly true.

Transactions guarantee atomicity for the work inside a single transaction. The isolation level determines how that transaction behaves relative to other transactions running at the same time.

That means two applications can both be "using transactions" and still have very different correctness guarantees depending on:

  • isolation level
  • locking behavior
  • query shape
  • uniqueness constraints
  • retry behavior
  • the database engine itself

The important practical question is not:

"Am I using a transaction?"

It is:

"What incorrect interleavings can still happen under this isolation level?"


The Main Concurrency Problems Isolation Levels Address

Different databases explain isolation with slightly different implementation details, but the common anomalies are stable enough to reason about.

Dirty reads

Transaction A writes a value but has not committed yet. Transaction B reads that uncommitted value. If A later rolls back, B observed data that never really became true.

This is usually unacceptable in application code.

Non-repeatable reads

Transaction A reads the same row twice. Transaction B commits an update between those reads. Transaction A now gets a different value for the same row inside one transaction.

This matters when you assume repeated reads inside a transaction represent one stable view.

Phantom reads

Transaction A runs a query like:

SELECT * FROM orders WHERE status = 'pending';

Transaction B inserts or updates rows so they now match that predicate. When A reruns the query, extra rows "appear."

This matters when logic depends on the set of matching rows staying stable.

Lost updates

Two transactions read the same value, both compute a new result, and the later write silently overwrites the earlier one.

For example:

  1. balance is 100
  2. transaction A reads 100
  3. transaction B reads 100
  4. A writes 90
  5. B writes 80

If both debits were supposed to apply, the correct result was 70. One update was lost.

Databases differ in how, and whether, they detect lost updates, so it is safer to treat this as an application risk you must actively test for rather than a problem a transaction will magically solve.
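The lost-update sequence above can be reproduced in a few lines. This is a minimal sketch using Python's sqlite3 as a stand-in for any SQL engine; the interleaving is simulated on one connection, so it demonstrates the stale read-modify-write pattern itself rather than real concurrent transactions:

```python
import sqlite3

# In-memory database as a stand-in for any SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")

# Lost update: both "transactions" read 100, then each writes a value
# computed from that stale read. The second write silently wins.
a_read = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
b_read = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
conn.execute("UPDATE accounts SET balance = ? WHERE id = 1", (a_read - 10,))  # A writes 90
conn.execute("UPDATE accounts SET balance = ? WHERE id = 1", (b_read - 20,))  # B writes 80
print(conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0])  # 80, not 70

# One common fix: push the arithmetic into the UPDATE itself, so each
# debit applies to the current value instead of a stale read.
conn.execute("UPDATE accounts SET balance = 100 WHERE id = 1")
conn.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
conn.execute("UPDATE accounts SET balance = balance - 20 WHERE id = 1")
print(conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0])  # 70
```

The relative-update form is not a universal cure, but it removes the window between the read and the write for simple arithmetic invariants.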


Read Committed

Read Committed is the default isolation level in many production systems because it gives reasonable concurrency with moderate overhead.

Under Read Committed:

  • you do not read uncommitted data
  • each statement sees data committed before that statement begins
  • later statements in the same transaction may see newer committed data

That means dirty reads are prevented, but non-repeatable reads and some race conditions are still possible.

Example:

BEGIN;

SELECT stock FROM products WHERE id = 42;
-- returns 1

-- another transaction commits stock = 0

SELECT stock FROM products WHERE id = 42;
-- now returns 0

COMMIT;

Nothing is broken from the database point of view. The transaction simply did not get one stable snapshot across its whole lifetime.

Read Committed is often fine for:

  • short request transactions
  • simple CRUD operations
  • workflows protected by unique constraints
  • cases where rereading newer committed data is acceptable

It becomes risky when the application performs read-check-write logic like:

  • "if balance is high enough, then debit"
  • "if no job is claimed, then claim it"
  • "if stock is available, then reserve it"

Those are exactly the kinds of paths where retries and duplicate requests can become correctness issues. If the same operation may be repeated, a transaction alone is often not enough. I covered that boundary in Idempotency Keys for Duplicate API Requests.
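One common mitigation under Read Committed is to collapse the check and the write into a single guarded statement and inspect the affected row count. A minimal sketch, using Python's sqlite3 and a hypothetical products table like the one in the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, stock INTEGER)")
conn.execute("INSERT INTO products VALUES (42, 1)")

def reserve(conn, product_id):
    # The predicate and the write happen in one statement, so the
    # check cannot go stale between a separate SELECT and UPDATE.
    cur = conn.execute(
        "UPDATE products SET stock = stock - 1 WHERE id = ? AND stock > 0",
        (product_id,),
    )
    return cur.rowcount == 1  # True only if a unit was actually reserved

print(reserve(conn, 42))  # True: took the last unit
print(reserve(conn, 42))  # False: nothing left to decrement
```

The row count tells each caller whether its condition still held at write time, which is exactly the information the separate SELECT-then-UPDATE version loses.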


Repeatable Read

Repeatable Read strengthens the guarantee by making repeated reads more stable within one transaction.

At a high level, it is designed so that rows read earlier in the transaction do not unexpectedly change underneath you.

In practice, this usually means:

  • a transaction sees a more consistent snapshot
  • repeated reads of the same row stay stable
  • some classes of read-check-write bugs become less likely

This is useful for workflows where you must make multiple decisions based on one coherent view of data.

Example use cases:

  • generating a report from several related queries
  • validating an order before a multi-step write
  • processing a batch where rows should not appear to mutate mid-transaction

But Repeatable Read is not a synonym for "fully serialized correctness."

Important caveat:

  • behavior differs by database engine
  • some systems still allow phantom-like effects depending on access pattern
  • write-write conflicts and lost updates can still require explicit locking or retries

That is why "which isolation level should I pick?" is usually the second question. The first is:

What exact invariant am I protecting?

If the invariant is "only one worker may claim this job" or "inventory must never go below zero," you may still need row locks, unique constraints, or a different workflow design.
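For the "only one worker may claim this job" invariant, a compare-and-set update is often simpler than raising the isolation level. A sketch with sqlite3 and an illustrative jobs table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, claimed_by TEXT)")
conn.execute("INSERT INTO jobs (id, claimed_by) VALUES (123, NULL)")

def claim_job(conn, job_id, worker):
    # Compare-and-set in one statement: only the worker that finds
    # claimed_by still NULL gets the row.
    cur = conn.execute(
        "UPDATE jobs SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
        (worker, job_id),
    )
    return cur.rowcount == 1

print(claim_job(conn, 123, "worker-a"))  # True
print(claim_job(conn, 123, "worker-b"))  # False: already claimed
```

The losing worker learns it lost from the row count and can move on to another job, with no dependence on any particular isolation level's snapshot behavior.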


Serializable

Serializable is the strongest standard isolation level.

Its goal is to make concurrent transactions behave as if they had run one at a time in some serial order.

That does not mean the database literally runs one transaction at a time. It means the database prevents or aborts interleavings that would violate serializable behavior.

This is the closest thing to:

"Protect me from dangerous concurrency assumptions."

It is valuable when correctness matters more than maximizing concurrent throughput.

Good candidates include:

  • ledger-like financial operations
  • inventory reservation with strict no-oversell rules
  • workflows enforcing cross-row business invariants
  • systems where silent anomalies are worse than transaction retries

The trade-off is important:

  • more contention
  • more blocking or serialization failures
  • more need for retry logic at the application layer

So Serializable is powerful, but it is not "free safety." It shifts part of the complexity into transaction retries, timeout handling, and careful monitoring.
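Because the database may abort transactions to preserve serializability, the application needs a retry wrapper around the whole transaction. A minimal sketch; the exception class is a stand-in for whatever your driver raises on serialization failure (for example, SQLSTATE 40001):

```python
import random
import time

class SerializationFailure(Exception):
    """Stand-in for a driver error signaling a serialization failure."""

def run_with_retries(txn_fn, attempts=5, base_delay=0.01):
    # Retry the whole transaction function; the driver must have rolled
    # back the failed attempt before each retry.
    for attempt in range(attempts):
        try:
            return txn_fn()
        except SerializationFailure:
            if attempt == attempts - 1:
                raise
            # Jittered exponential backoff so concurrent retries
            # do not immediately collide again.
            time.sleep(base_delay * (2 ** attempt) * random.random())

# Usage sketch: a transaction that fails twice, then succeeds.
calls = {"n": 0}
def flaky_txn():
    calls["n"] += 1
    if calls["n"] < 3:
        raise SerializationFailure()
    return "committed"

print(run_with_retries(flaky_txn))  # committed
```

The key design point is that the retry unit is the entire transaction function, not a single statement: the aborted attempt's reads are worthless, so everything must be re-executed.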

This is one reason production systems sometimes fail even when the design looked safe in code review. The real system cost appears under concurrency, not in the isolated happy path. That broader pattern also shows up in Why Tests Pass but Production Still Breaks.


Isolation Level Comparison

For practical backend work, a simple mental model is enough:

  • Read Committed: good default when each statement may see newer committed data
  • Repeatable Read: better when one transaction needs a stable view of rows it has already read
  • Serializable: use when business invariants must survive concurrency, and you can handle retries

The mistake is not picking the "wrong" level in the abstract. The mistake is choosing one without mapping it to the invariant your application must preserve.


When Isolation Level Alone Is Not Enough

Isolation levels are important, but many production bugs still require additional safeguards.

Unique constraints

If the rule is "this record must exist at most once," put that rule in the database.

Examples:

  • one idempotency key per account and request
  • one active subscription per user and plan
  • one reservation row per seat and event

Application checks alone are too easy to race.
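With a unique constraint in place, the database decides who got there first, and the application only has to handle the rejection. A sketch with sqlite3 and an illustrative idempotency-key table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE idempotency_keys (
        account_id  INTEGER NOT NULL,
        request_key TEXT NOT NULL,
        UNIQUE (account_id, request_key)
    )
""")

def record_request(conn, account_id, request_key):
    # The constraint, not an application-level SELECT, enforces
    # "at most one row per (account, request)".
    try:
        conn.execute(
            "INSERT INTO idempotency_keys (account_id, request_key) VALUES (?, ?)",
            (account_id, request_key),
        )
        return True   # first request wins
    except sqlite3.IntegrityError:
        return False  # duplicate: the constraint rejected it

print(record_request(conn, 1, "req-abc"))  # True
print(record_request(conn, 1, "req-abc"))  # False
```

A check-then-insert in application code can race; the constrained insert cannot, regardless of isolation level.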

Row-level locking

If one transaction must reserve the right to modify a row before others can proceed, explicit locks may be the correct tool.

Typical example:

SELECT * FROM jobs
WHERE id = 123
FOR UPDATE;

This is often a better fit than globally increasing the isolation level when the contention is narrow and localized.

Optimistic concurrency control

Sometimes it is cheaper to detect a conflicting update than to lock aggressively.

A version column or updated-at check can make lost updates visible instead of silent.
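The version-column pattern can be sketched in a few lines with sqlite3; the table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, version INTEGER)")
conn.execute("INSERT INTO docs VALUES (1, 'v1', 1)")

def save(conn, doc_id, new_body, expected_version):
    # The write only lands if the row is still at the version we read.
    cur = conn.execute(
        "UPDATE docs SET body = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_body, doc_id, expected_version),
    )
    return cur.rowcount == 1

# Both writers read version 1; only the first save succeeds.
print(save(conn, 1, "writer A", 1))  # True
print(save(conn, 1, "writer B", 1))  # False: the conflict is visible, not silent
```

The losing writer gets a clear signal to re-read and retry, which is exactly what a silent lost update denies it.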

Idempotency

Retries are normal in distributed systems. The isolation level does not stop a client from submitting the same logical action twice.

For write APIs, the right boundary is often:

  • transaction for local correctness
  • idempotency key for duplicate requests
  • outbox or queue workflow for downstream side effects

If a successful transaction must later publish an event, the next correctness boundary is not the isolation level. It is the dual-write problem. I covered that pattern in Transactional Outbox Pattern in Microservices.


A Concrete Example: Preventing Double Reservation

Suppose two customers try to reserve the last seat.

A naive flow looks like this:

  1. read current available seats
  2. if available > 0, insert reservation
  3. decrement available seats

Under concurrency, both requests can observe availability before either commits the decrement.

Possible fixes depend on the real invariant:

  • use Serializable and retry aborted transactions
  • lock the seat or inventory row with FOR UPDATE
  • model reservations with a uniqueness constraint that the database enforces
  • avoid storing a mutable counter as the single source of truth if reservations themselves define capacity

The best solution is not always "higher isolation." It is whichever design makes the invariant easiest to enforce under concurrent access.
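The last two fixes combine naturally: if each reservation is its own row, a uniqueness constraint becomes the capacity rule for a given seat. A sketch with sqlite3 and an illustrative schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each reservation is its own row; the unique constraint IS the
# "one reservation per seat" invariant.
conn.execute("""
    CREATE TABLE reservations (
        event_id INTEGER NOT NULL,
        seat_no  INTEGER NOT NULL,
        customer TEXT NOT NULL,
        UNIQUE (event_id, seat_no)
    )
""")

def reserve_seat(conn, event_id, seat_no, customer):
    try:
        conn.execute(
            "INSERT INTO reservations (event_id, seat_no, customer) VALUES (?, ?, ?)",
            (event_id, seat_no, customer),
        )
        return True
    except sqlite3.IntegrityError:
        return False  # someone else already holds this seat

print(reserve_seat(conn, 7, 12, "alice"))  # True
print(reserve_seat(conn, 7, 12, "bob"))    # False: no double reservation
```

There is no mutable counter to race on: the second insert fails at the database layer no matter how the two requests interleave.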


How to Choose the Right Isolation Level

A practical selection process:

  1. define the business invariant clearly
  2. identify the exact read-check-write pattern involved
  3. test it with concurrent requests, not just sequential unit tests
  4. start with the weakest level that preserves correctness
  5. add locks, constraints, or retries where needed
  6. measure contention before assuming stronger isolation is affordable

This is also why integration tests matter. If you only test one request at a time, concurrency bugs remain invisible until production timing exposes them. I wrote more about that in How to Write API Integration Tests.


Production Debugging Checklist

When an isolation bug is suspected, check:

  • Which isolation level is actually configured in production?
  • Does the ORM open one transaction or several smaller ones?
  • Are reads and writes split across primary and replicas?
  • Is the bug caused by duplicate requests rather than concurrent transactions?
  • Are uniqueness constraints enforcing the invariant at the database level?
  • Are transaction retries implemented for serialization or deadlock failures?
  • Is the issue really a transaction bug, or a slow-query / lock-contention problem instead?

That last point matters. Sometimes the system is logically correct but operationally unstable because transactions are held open too long, query plans are poor, or contention cascades under load. For related debugging workflows, see How to Find and Fix Slow SQL Queries in Production and Why Database Indexes Didn’t Fix Your Slow Query.


The Practical Takeaway

SQL isolation levels are best understood as concurrency contracts, not as academic definitions to memorize.

Read Committed, Repeatable Read, and Serializable each make different promises. None of them removes the need to think about invariants, retries, locks, and duplicate work.

The strongest engineering move is not choosing the "most advanced" isolation level. It is matching the isolation level and surrounding safeguards to the exact failure you cannot afford.

That is what turns transactions from a checkbox into a correctness tool.