
How to Prevent Race Conditions in Backend Systems
Race conditions in backend systems happen when two valid operations overlap and the final result depends on timing the system does not control.
The code often looks reasonable when read top to bottom. One request checks a row, decides the action is allowed, and writes the result. Another request does the same thing a few milliseconds later. Each request is locally correct. Together they create an impossible state: oversold inventory, duplicate orders, double-processed jobs, stale balances, or two users booked into the same slot.
Preventing race conditions is not mainly about adding more if statements. It is about naming the invariant that must survive concurrency, then enforcing that invariant at a boundary the system can trust: a database constraint, an atomic update, a row lock, an idempotency record, or a replay-safe workflow.
This article sits between API correctness and backend reliability. The API side sees overlapping requests and retries. The reliability side sees workers, queues, duplicate delivery, and partial failure. The same rule applies in both places: correctness should not depend on lucky timing.
Start With The Invariant
Do not start with the locking mechanism.
Start with the sentence that must remain true after any number of overlapping requests:
- stock must never go below zero
- one idempotency key must map to one logical order
- one calendar slot must have at most one confirmed booking
- one webhook event must produce one durable side effect
- an account balance update must not lose another committed update
- one background job claim must belong to one worker at a time
That sentence is the invariant. The race condition is whatever interleaving lets two actors violate it.
Once the invariant is clear, the tool choice becomes less mysterious. A uniqueness invariant often belongs in a unique constraint. A limited quantity can often be protected by an atomic conditional update. A shared row that needs one-at-a-time decision-making may need a row lock. A retried command needs idempotency. A repeated worker needs replay-safe state transitions.
Without the invariant, teams tend to debate tools in the abstract:
Should we use a transaction?
The better question is:
Which state must remain true when two actors try this at the same time?
A Concrete Race Condition: Overselling The Last Item
Suppose an endpoint lets a user buy a product.
The first implementation often looks like this:
async function purchaseProduct(productId: string, userId: string) {
  const product = await db.product.findUnique({
    where: { id: productId },
  })

  if (!product || product.stock <= 0) {
    throw new Error('Out of stock')
  }

  await db.order.create({
    data: { productId, userId },
  })

  await db.product.update({
    where: { id: productId },
    data: { stock: product.stock - 1 },
  })
}
The handler reads clearly:
- load product
- reject if stock is empty
- create order
- decrement stock
The bug appears because those steps are not one indivisible decision.
Here is the interleaving:
Request A                   Database    Request B
---------                   --------    ---------
read stock = 1
                                        read stock = 1
decide purchase is valid
                                        decide purchase is valid
create order A
                                        create order B
write stock = 0
                                        write stock = 0
Both requests observed a true statement: stock > 0.
The statement stopped being safe once two requests could observe it before either write made the decision durable.
That is the core race:
The application checked the invariant in one step and enforced it in a later step.
The fix is not to hope the gap is small. The fix is to remove the gap or move the enforcement to a boundary that handles concurrency.
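The interleaving can be reproduced without a database. The sketch below is an in-memory stand-in, not the real handler: `Store`, `roundTrip`, `racyPurchase`, and `demoOversell` are hypothetical names, and each `await` only models the point where a real request would yield while waiting on the database.

```typescript
// In-memory stand-in for the products and orders tables (hypothetical).
type Store = { stock: number; orders: string[] };

// Yields to the event loop, standing in for a database round trip.
const roundTrip = (): Promise<void> => Promise.resolve();

// Racy read-check-write: the check and the write are separate steps.
async function racyPurchase(store: Store, userId: string): Promise<boolean> {
  const observed = store.stock; // read
  await roundTrip();
  if (observed <= 0) return false; // check against a possibly stale value
  store.orders.push(userId); // create order
  await roundTrip();
  store.stock = observed - 1; // write computed from the stale read
  return true;
}

// Two overlapping purchases against a single remaining item.
async function demoOversell(): Promise<Store> {
  const store: Store = { stock: 1, orders: [] };
  await Promise.all([racyPurchase(store, "A"), racyPurchase(store, "B")]);
  return store;
}
```

Under these assumptions both calls read stock = 1 before either writes, so the store ends with two orders and stock = 0. Nothing throws, and the stock value looks plausible on its own, which is part of why this class of bug survives review.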
Why Transactions Alone May Not Fix It
A common first response is:
Put the code in a transaction.
That may be necessary. It is not automatically sufficient.
A transaction groups a set of operations. Whether it prevents the race depends on what the transaction reads, what it writes, which isolation level is used, which locks are acquired, and whether the database has a constraint or conditional write that rejects the unsafe outcome.
If two transactions both read stock = 1, both decide locally, and both then write a value computed from that stale read, the invariant can still be violated. The race closes only if the database update itself rejects the unsafe outcome, or if one transaction is forced to wait or retry.
This is why the useful review question is not:
Is this inside a transaction?
It is:
Can two concurrent transactions both commit while violating the invariant?
If the answer is yes, the race is still present.
For deeper background on isolation behavior, see SQL Isolation Levels Explained.
Use Atomic Conditional Updates For Counters And Capacity
For inventory, capacity, quotas, and counters, an atomic conditional update is often stronger than a read-check-write sequence.
Instead of reading stock into application memory and writing the calculated value later:
UPDATE products
SET stock = stock - 1
WHERE id = $1
  AND stock > 0;
Then check how many rows were affected.
If the update changed one row, the purchase reserved stock. If it changed zero rows, the invariant blocked the purchase because stock was already empty or the row did not exist.
The important detail is that the check and the update happen in the same database statement. The application no longer has to trust that the value it read earlier is still true.
For a purchase flow, the shape becomes:
const reserved = await db.execute(sql`
  UPDATE products
  SET stock = stock - 1
  WHERE id = ${productId}
    AND stock > 0
`)

if (reserved.rowCount !== 1) {
  throw new Error('Out of stock')
}

await db.order.create({
  data: { productId, userId },
})
In a real system, the order insert and stock reservation still need transaction design, failure handling, and possibly a reservation record. The point is the invariant: the database must be able to reject "stock below zero" under overlap.
This pattern works best when the invariant can be expressed as "update only if the current row still satisfies this condition."
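To see why merging the check and the write matters, here is a minimal in-memory sketch of the same idea. It is an analogy, not a database: `Store`, `roundTrip`, `decrementIfInStock`, `purchase`, and `demoAtomic` are hypothetical names, and the single synchronous function stands in for the single SQL statement.

```typescript
// In-memory stand-in for the products and orders tables (hypothetical).
type Store = { stock: number; orders: string[] };

// Yields to the event loop, standing in for a database round trip.
const roundTrip = (): Promise<void> => Promise.resolve();

// Mirrors `UPDATE ... SET stock = stock - 1 WHERE stock > 0`:
// the check and the decrement happen in one step with nothing in between.
// Returns the number of "rows" affected (0 or 1).
function decrementIfInStock(store: Store): number {
  if (store.stock <= 0) return 0;
  store.stock -= 1;
  return 1;
}

async function purchase(store: Store, userId: string): Promise<boolean> {
  await roundTrip(); // request arrives
  const rowCount = decrementIfInStock(store);
  if (rowCount !== 1) return false; // out of stock, nothing reserved
  await roundTrip(); // order insert
  store.orders.push(userId);
  return true;
}

// Three overlapping purchases against a single remaining item.
async function demoAtomic(): Promise<{ store: Store; results: boolean[] }> {
  const store: Store = { stock: 1, orders: [] };
  const results = await Promise.all(
    ["A", "B", "C"].map((userId) => purchase(store, userId)),
  );
  return { store, results };
}
```

In this sketch exactly one caller reserves the item no matter how the requests interleave, because no caller ever acts on a value it read in an earlier step.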
Use Constraints For Uniqueness And Impossible States
If the invariant is "there must not be two of these," the database should usually enforce it.
PostgreSQL's documentation describes constraints as rules enforced by the database, including unique constraints that require values, or combinations of values, to be unique across a table. That is exactly the kind of guarantee application-only checks struggle to maintain under concurrency. See the PostgreSQL docs on constraints.
For example, one active subscription per customer and billing period can be enforced with a partial unique index:
CREATE UNIQUE INDEX subscriptions_one_active_period
  ON subscriptions (customer_id, billing_period)
  WHERE status = 'active';
Now two concurrent requests can both try to create the active subscription, but they cannot both commit the same unique state.
The application still needs to handle the conflict gracefully:
try {
  await subscriptions.createActive({ customerId, billingPeriod })
} catch (error) {
  if (isUniqueViolation(error, 'subscriptions_one_active_period')) {
    return subscriptions.findActive({ customerId, billingPeriod })
  }
  throw error
}
That is very different from checking first:
const existing = await subscriptions.findActive({ customerId, billingPeriod })
if (!existing) {
  await subscriptions.createActive({ customerId, billingPeriod })
}
The second version can race because both requests can observe "no active subscription" before either insert commits. The first version lets the database enforce the invariant and makes the application decide how to respond when another actor got there first.
Constraints are not just data-model neatness. They are concurrency controls for business facts that must remain true.
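The `isUniqueViolation` helper used in the conflict handler is not defined in this article. One possible sketch follows. PostgreSQL reports unique violations with SQLSTATE 23505. The assumption here is that the driver surfaces that value as `code` and the violated index or constraint name as `constraint` on the thrown error, as node-postgres does; other drivers and ORMs may wrap the error differently.

```typescript
// PostgreSQL SQLSTATE for unique_violation.
const UNIQUE_VIOLATION = "23505";

// Fields this sketch reads from a driver error (assumed shape, not a driver type).
type DbErrorLike = { code?: unknown; constraint?: unknown };

// Returns true when the error is a unique violation and, if a name is given,
// when it was raised by that specific constraint or unique index.
function isUniqueViolation(error: unknown, constraintName?: string): boolean {
  if (typeof error !== "object" || error === null) return false;
  const { code, constraint } = error as DbErrorLike;
  if (code !== UNIQUE_VIOLATION) return false;
  return constraintName === undefined || constraint === constraintName;
}
```

Matching on the constraint name keeps the handler from swallowing an unrelated unique violation on the same table, which would otherwise be misread as "another request got there first."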
Use Idempotency For Retried Commands
Many backend races are not caused by two different users. They are caused by one logical action being attempted more than once.
Common sources:
- client retries after a timeout
- a user double-clicks a submit button
- a proxy retries an HTTP request
- a worker crashes after a side effect but before acknowledgment
- a webhook provider redelivers the same event
If the same logical action can arrive more than once, the system needs an idempotency boundary.
For an API request, that usually means storing an idempotency key with enough information to decide:
- is this the same logical request?
- is it already complete?
- is it still in progress?
- should the response be replayed, rejected, or resumed?
That design is covered in detail in API Idempotency Keys: Prevent Duplicate Requests Safely.
The race to avoid is subtle:
Request A inserts order
Request B with same key arrives before Request A stores the idempotency result
Request B sees no completed result
Request B inserts another order
The idempotency record has to be reserved before the unsafe side effect, or protected by a unique key that prevents two requests from owning the same logical action.
Do not treat idempotency as a cache. Treat it as a concurrency boundary.
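A minimal in-memory sketch of reserving the key before the side effect is shown below. `OrderService`, `IdempotencyRecord`, and `Outcome` are hypothetical names. The `Map` stands in for a table with a unique constraint on the idempotency key, and the synchronous check-then-set stands in for a single insert that either wins or conflicts.

```typescript
type Order = { id: number; key: string };

type IdempotencyRecord =
  | { status: "in_progress" }
  | { status: "completed"; order: Order };

type Outcome =
  | { kind: "created"; order: Order }
  | { kind: "replayed"; order: Order }
  | { kind: "in_progress" };

// Yields to the event loop, standing in for a database round trip.
const roundTrip = (): Promise<void> => Promise.resolve();

class OrderService {
  // Stand-in for a table with a unique constraint on the idempotency key.
  private records = new Map<string, IdempotencyRecord>();
  readonly orders: Order[] = [];

  async createOrder(key: string): Promise<Outcome> {
    const existing = this.records.get(key);
    if (existing) {
      if (existing.status === "completed") {
        return { kind: "replayed", order: existing.order };
      }
      return { kind: "in_progress" };
    }

    // Reserve the key BEFORE the side effect. There is no await between the
    // lookup above and this write, so only one caller can own the key.
    this.records.set(key, { status: "in_progress" });

    await roundTrip(); // the order insert
    const order: Order = { id: this.orders.length + 1, key };
    this.orders.push(order);
    this.records.set(key, { status: "completed", order });
    return { kind: "created", order };
  }
}
```

The sketch leaves out what a real system also needs: comparing the request payload against the stored one, expiring abandoned "in_progress" reservations, and committing the reservation and the order consistently.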
Use Optimistic Locking When Conflicts Should Be Detected
Optimistic locking is useful when conflicts are possible but not constant, and when the right behavior is to detect a stale update rather than make everyone wait.
The usual pattern is a version column:
UPDATE accounts
SET display_name = $1,
    version = version + 1
WHERE id = $2
  AND version = $3;
If the update affects zero rows, someone else changed the record after the caller read it.
That gives the application a clear choice:
- reload and retry automatically
- show a conflict to the user
- merge the change if the fields do not overlap
- reject the stale command
Optimistic locking is a good fit for profile updates, document edits, settings screens, admin workflows, and other cases where conflicts should be visible.
It is weaker for strict "only one actor may proceed" workflows if the application silently retries without preserving the invariant. A blind retry can turn conflict detection back into last-write-wins behavior.
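The version check can be sketched in memory to show what "detect, do not overwrite" looks like from the caller's side. `AccountTable`, `UpdateResult`, and `demoStaleWrite` are hypothetical names. Unlike the SQL statement, which only reports zero affected rows, this sketch separates "conflict" from "not found" for readability.

```typescript
type Account = { id: string; displayName: string; version: number };

type UpdateResult =
  | { ok: true; version: number }
  | { ok: false; reason: "conflict" | "not_found" };

// In-memory stand-in for the accounts table.
class AccountTable {
  private rows = new Map<string, Account>();

  insert(row: Account): void {
    this.rows.set(row.id, { ...row });
  }

  // Returns a copy, like a row read into application memory.
  read(id: string): Account | undefined {
    const row = this.rows.get(id);
    return row ? { ...row } : undefined;
  }

  // Mirrors `UPDATE ... WHERE id = $2 AND version = $3`.
  rename(id: string, displayName: string, expectedVersion: number): UpdateResult {
    const row = this.rows.get(id);
    if (!row) return { ok: false, reason: "not_found" };
    if (row.version !== expectedVersion) return { ok: false, reason: "conflict" };
    row.displayName = displayName;
    row.version += 1;
    return { ok: true, version: row.version };
  }
}

// Two editors read the same version, then both try to write.
function demoStaleWrite(): { first: UpdateResult; second: UpdateResult; final: Account | undefined } {
  const table = new AccountTable();
  table.insert({ id: "acct-1", displayName: "Original", version: 1 });
  const seenByA = table.read("acct-1");
  const seenByB = table.read("acct-1");
  const first = table.rename("acct-1", "From A", seenByA?.version ?? -1);
  const second = table.rename("acct-1", "From B", seenByB?.version ?? -1);
  return { first, second, final: table.read("acct-1") };
}
```

The second write is rejected rather than applied. What happens next (reload, merge, or surface the conflict) is a product decision the caller has to make explicitly.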
Use Row Locks When One Actor Must Decide First
Sometimes the simplest correct model is:
one transaction gets to inspect and change this row while competing writers wait.
PostgreSQL's row-level locking docs explain that SELECT ... FOR UPDATE locks selected rows as though they were going to be updated, blocking other transactions that try to update, delete, or lock those same rows until the current transaction ends. See PostgreSQL explicit locking.
The shape looks like this:
BEGIN;

SELECT id, status, available_seats
FROM events
WHERE id = $1
FOR UPDATE;

-- application decides while holding the row lock

UPDATE events
SET available_seats = available_seats - 1
WHERE id = $1
  AND available_seats > 0;

-- if the UPDATE affected zero rows, ROLLBACK here instead of inserting

INSERT INTO bookings (event_id, user_id)
VALUES ($1, $2);

COMMIT;
This can be the right tool when the decision depends on several fields, when the row represents a shared state machine, or when allowing two actors to proceed would be expensive.
The trade-off is blocking. Locks increase latency under contention and can create deadlocks if different code paths acquire locks in different orders.
Use row locks deliberately:
- keep transactions short
- acquire locks in a consistent order
- avoid network calls while holding the lock
- handle lock timeouts and deadlock retries
- measure contention after deploy
Locks are a tool for enforcing order. They are not a license to put a long workflow inside one transaction.
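Handling lock timeouts and deadlock retries usually means rerunning the whole transaction, not only the failed statement. The wrapper below is a sketch under assumptions: `withTransactionRetry` and `isRetryable` are hypothetical names, the driver is assumed to expose the PostgreSQL SQLSTATE as a string `code` on the error, and `runTransaction` is assumed to open and close its own transaction on every call. Backoff between attempts is omitted to keep the example short.

```typescript
// PostgreSQL SQLSTATE codes that usually mean "rerun the whole transaction".
const RETRYABLE_CODES = new Set<string>([
  "40P01", // deadlock_detected
  "55P03", // lock_not_available (lock timeout or NOWAIT)
  "40001", // serialization_failure
]);

function isRetryable(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const code = (error as { code?: unknown }).code;
  return typeof code === "string" && RETRYABLE_CODES.has(code);
}

// Reruns the transaction callback on retryable errors, up to maxAttempts.
// Any other error is rethrown immediately.
async function withTransactionRetry<T>(
  runTransaction: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown = new Error("no attempts were made");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await runTransaction();
    } catch (error) {
      if (!isRetryable(error)) throw error;
      lastError = error; // remember it in case every attempt fails
    }
  }
  throw lastError;
}
```

A retry like this is only safe when the callback has no side effects outside the transaction. An email or payment call inside `runTransaction` would be repeated on every attempt.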
Make Background Jobs Replay-Safe
Race conditions often move from the request path into workers.
A queue makes the user-facing endpoint faster, but it also introduces new timing:
- two workers may try to claim related work
- a job may run after state changed
- a retry may repeat a side effect
- a worker may crash after the side effect but before recording completion
- a dead-letter replay may run old assumptions against new code
The protection is the same idea: name the invariant and enforce it durably.
For a job that sends an invoice email, the invariant might be:
one invoice should have at most one successful delivery event for this email type.
That can be protected with a delivery table:
CREATE TABLE invoice_email_deliveries (
  invoice_id uuid NOT NULL,
  email_type text NOT NULL,
  provider_message_id text,
  sent_at timestamptz,
  PRIMARY KEY (invoice_id, email_type)
);
Now a retried worker has somewhere durable to check and record delivery state. It still has to handle ambiguous provider outcomes, but the workflow has a business key that prevents "try again" from meaning "send again without memory."
The same thinking applies to job claiming, webhook processing, exports, billing runs, and reconciliation jobs. A reliable worker is not one that never retries. It is one that can retry without losing the business invariant.
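One way a worker might use such a delivery table is sketched below in memory. `DeliveryTable`, `runInvoiceEmailJob`, and the `send` callback are hypothetical. `claim` stands in for an insert that does nothing when the primary key already exists, and the sketch deliberately refuses to resend when it finds a claimed row with no `sentAt`, since that state is ambiguous.

```typescript
type Delivery = { providerMessageId: string | null; sentAt: Date | null };
type JobOutcome = "sent" | "already_sent" | "unresolved";

// In-memory stand-in for invoice_email_deliveries, keyed like its primary key.
class DeliveryTable {
  private rows = new Map<string, Delivery>();

  private static key(invoiceId: string, emailType: string): string {
    return `${invoiceId}:${emailType}`;
  }

  // Insert-if-absent: returns true only for the caller that created the row.
  claim(invoiceId: string, emailType: string): boolean {
    const key = DeliveryTable.key(invoiceId, emailType);
    if (this.rows.has(key)) return false;
    this.rows.set(key, { providerMessageId: null, sentAt: null });
    return true;
  }

  markSent(invoiceId: string, emailType: string, providerMessageId: string): void {
    this.rows.set(DeliveryTable.key(invoiceId, emailType), {
      providerMessageId,
      sentAt: new Date(),
    });
  }

  find(invoiceId: string, emailType: string): Delivery | undefined {
    return this.rows.get(DeliveryTable.key(invoiceId, emailType));
  }
}

async function runInvoiceEmailJob(
  table: DeliveryTable,
  send: (invoiceId: string) => Promise<string>, // resolves to a provider message ID
  invoiceId: string,
  emailType: string,
): Promise<JobOutcome> {
  if (!table.claim(invoiceId, emailType)) {
    const existing = table.find(invoiceId, emailType);
    // A claimed row with no sentAt means another attempt is in flight or
    // failed mid-send. This sketch surfaces that instead of guessing.
    return existing?.sentAt ? "already_sent" : "unresolved";
  }
  const providerMessageId = await send(invoiceId);
  table.markSent(invoiceId, emailType, providerMessageId);
  return "sent";
}
```

How to resolve the "unresolved" case (query the provider, wait for a delivery webhook, or escalate) depends on the provider and is outside this sketch.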
Choose The Protection By Failure Mode
Different races need different boundaries.
| Failure mode | Typical invariant | Strong starting point |
|---|---|---|
| Last item can be sold twice | Stock must not drop below zero | Atomic conditional update |
| Duplicate order after retry | One key maps to one logical order | Idempotency record with unique key |
| Two active subscriptions | One active row per scope | Unique or partial unique index |
| Stale profile update overwrites newer data | Caller must not write over a newer version | Optimistic locking |
| Booking decision depends on shared row state | One actor decides from current state | Row lock with short transaction |
| Worker repeats side effect | One business event produces one durable outcome | Replay-safe state table |
| Webhook arrives twice | One provider event produces one effect | Durable deduplication key |
The important pattern is that every row in the table begins with the invariant, not the technology.
If a team cannot state the invariant clearly, it probably cannot test the race clearly either.
Test The Overlap, Not Just The Happy Path
Race-condition tests need to make operations overlap.
For an API endpoint, a useful test often looks like this:
const [first, second] = await Promise.all([
  request(app).post('/orders').send({ productId }),
  request(app).post('/orders').send({ productId }),
])

expect([201, 409]).toContain(first.status)
expect([201, 409]).toContain(second.status)

const orders = await db.order.findMany({ where: { productId } })
const product = await db.product.findUnique({ where: { id: productId } })

expect(orders).toHaveLength(1)
expect(product?.stock).toBe(0)
The exact status codes depend on the API. The invariant does not:
two overlapping purchase attempts for one remaining item must not create two orders.
Good concurrency tests usually assert persistent state after all requests finish. A response-only assertion can miss the broken database state left behind.
For API boundaries, pair this with the broader strategy in How to Write API Integration Tests: test through the real request path, use the real persistence mechanism, and assert the behavior users and downstream systems depend on.
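When more than two attempts should overlap, a small helper can start them together and collect every outcome without letting one rejection hide the rest. The sketch below is framework-agnostic. `runConcurrently` and `Settled` are hypothetical names, and the action passed in is whatever the test needs to overlap, such as one HTTP request per index.

```typescript
type Settled<T> = { ok: true; value: T } | { ok: false; error: unknown };

// Starts every attempt before awaiting any of them, then reports each
// outcome individually so the test can count successes and failures.
async function runConcurrently<T>(
  attempts: number,
  action: (index: number) => Promise<T>,
): Promise<Settled<T>[]> {
  const started = Array.from({ length: attempts }, (_, index) =>
    action(index).then(
      (value): Settled<T> => ({ ok: true, value }),
      (error: unknown): Settled<T> => ({ ok: false, error }),
    ),
  );
  return Promise.all(started);
}
```

A test would then assert on the count of successful outcomes and, more importantly, on the persistent state left behind once every attempt has settled.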
What To Capture During A Race Incident
When production suggests a race condition, collect evidence that reconstructs the interleaving.
Useful evidence includes:
- shared business key: product ID, booking ID, idempotency key, invoice ID
- request IDs and worker IDs
- timestamps for reads, writes, retries, and side effects
- transaction or job attempt numbers
- rows created or updated by each actor
- final database state
- deploy, feature-flag, and retry-policy changes around the first occurrence
The key question is:
Which two actors both believed they were allowed to proceed, and which boundary failed to stop the second one?
That framing keeps the investigation away from "the code is flaky" and toward the exact invariant that needs enforcement.
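If the evidence is captured as structured events, reconstructing the interleaving becomes a filter and a sort. The sketch below assumes a hypothetical `EvidenceEvent` shape with ISO 8601 timestamps; the field names are illustrative, not a logging standard.

```typescript
// One observed step by one actor (hypothetical shape).
type EvidenceEvent = {
  businessKey: string; // e.g. product ID, idempotency key, invoice ID
  actorId: string; // request ID or worker ID
  action: string; // e.g. "read stock=1", "create order"
  at: string; // ISO 8601 timestamp
};

// Returns the steps that touched one business key, oldest first.
// Caveat: timestamps from different hosts can be skewed, so events that
// are very close together may not be in their true order.
function reconstructTimeline(events: EvidenceEvent[], businessKey: string): string[] {
  return events
    .filter((event) => event.businessKey === businessKey)
    .sort((a, b) => Date.parse(a.at) - Date.parse(b.at))
    .map((event) => `${event.at} ${event.actorId} ${event.action}`);
}
```

Reading the output top to bottom should show the point where the second actor read state the first had not yet made durable.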
Review Checklist For Race-Condition Risk
Use this during code review when a change touches shared state, retries, jobs, or external side effects.
- What invariant must remain true under overlap?
- Can two requests, workers, or webhook deliveries reach this path at the same time?
- Is the invariant enforced by the database, or only by application reads?
- Is there a read-check-write gap?
- What happens if the first attempt succeeds but the response times out?
- Can a retry repeat the side effect?
- Does the test make operations overlap?
- Does the test assert durable state after both attempts finish?
- Are locks acquired in a consistent order?
- Is any network call happening while a database transaction stays open?
This checklist catches a large share of backend race conditions before production supplies the unlucky timing.
Final Takeaway
Race conditions are not rare accidents in backend systems. They are a normal consequence of shared state, retries, duplicate delivery, background work, and external side effects.
The durable fix is not "be more careful." It is to make the important invariant independent of timing.
Name the invariant. Enforce it at a durable boundary. Choose the smallest mechanism that protects the failure mode. Then test the overlap directly.
That is how backend systems stay correct when production stops running one request at a time.