Database Concurrency Control: Locks, Isolation, and Deadlocks

Database concurrency control is the set of mechanisms that govern how a database management system coordinates simultaneous access to shared data across multiple transactions. This page covers the principal control models — locking protocols, isolation levels, and deadlock management — along with their formal definitions, operational tradeoffs, and classification boundaries. The topic is foundational to database transactions and ACID properties, directly affecting data integrity, throughput, and fault behavior in any system handling concurrent workloads.


Definition and scope

Concurrency control failures are directly measurable in production systems: the Transaction Processing Performance Council's TPC-C benchmark, a widely cited standard for OLTP workload characterization, structures its entire measurement methodology around concurrent transaction throughput because contention is a primary bottleneck at scale. When two or more transactions access overlapping data without coordination, the result is one of four canonical anomalies — dirty reads, non-repeatable reads, phantom reads, or lost updates — each of which can corrupt application state or violate business rules.

The scope of concurrency control spans every layer of the database stack. At the storage engine level, it governs granular access to pages and rows. At the transaction manager level, it enforces the Isolation property of the ACID model. At the application level, it determines what consistency guarantees developers can rely upon when designing transactional logic. The SQL standard (ISO/IEC 9075, most recently revised as SQL:2023) defines four named isolation levels that form the normative reference frame for the entire domain.

Concurrency control is closely related to database indexing (since index structures are themselves subject to concurrent modification), database query optimization (since lock contention affects query execution plans), and distributed database systems (where concurrency spans multiple nodes).


Core mechanics or structure

Locking Protocols

Locking is the most widely implemented concurrency control mechanism. A lock is a marker, recorded in the engine's lock table, that restricts how other transactions may access a data item. The two fundamental lock types are shared (S) locks, which permit concurrent readers, and exclusive (X) locks, which grant a single writer sole access: any number of S locks may coexist on one item, but an X lock is incompatible with every other lock.

Two-Phase Locking (2PL), formalized in the seminal 1976 paper by Eswaran, Gray, Lorie, and Traiger published in Communications of the ACM, establishes the theoretical foundation: a transaction must acquire all locks before releasing any. This produces a growing phase and a shrinking phase, and it guarantees serializability — the gold standard of isolation. Strict 2PL extends this by holding all X-locks until commit, preventing dirty reads by other transactions.
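The phase discipline can be sketched in a few lines. This is an illustrative model, not a real lock manager: the class name and methods are hypothetical, and it shows only the growing/shrinking rule of strict 2PL (all locks released together at commit).

```python
import threading

class TwoPhaseTransaction:
    """Sketch of strict two-phase locking (hypothetical API).

    Growing phase: locks may only be acquired.
    Shrinking phase: locks may only be released — in strict 2PL,
    all of them at once, at commit.
    """
    def __init__(self):
        self.held = []          # locks acquired so far
        self.shrinking = False  # set once any lock has been released

    def acquire(self, lock: threading.Lock):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after release")
        lock.acquire()
        self.held.append(lock)

    def commit(self):
        # Strict 2PL: release everything only at commit.
        self.shrinking = True
        for lock in reversed(self.held):
            lock.release()
        self.held.clear()

a, b = threading.Lock(), threading.Lock()
txn = TwoPhaseTransaction()
txn.acquire(a)   # growing phase
txn.acquire(b)   # growing phase
txn.commit()     # shrinking phase: all locks released together
```

A real engine also tracks lock modes and a wait queue per item; the sketch captures only the ordering constraint that guarantees serializability.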

Lock granularity defines the unit being locked: the five standard granularities are database, table, page, row, and cell. Finer granularity increases concurrency but raises the overhead of tracking large numbers of active locks. PostgreSQL implements row-level locking as its default; Oracle Database uses both row-level and table-level locks depending on the operation.

Intention locks (IS, IX, SIX) allow a transaction to signal intent to lock at a finer granularity, enabling the system to reject incompatible coarse-grained locks without scanning the entire lock table. IBM Db2 and Microsoft SQL Server both implement full intention-lock hierarchies.
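The interaction of intention and ordinary modes reduces to the standard multigranularity compatibility matrix from the granularity-of-locks literature. The sketch below encodes it as a table; `can_grant` is a hypothetical helper showing how a coarse-grained request is checked against modes already held on a node.

```python
# Standard multigranularity lock compatibility matrix; True = compatible.
MODES = ["IS", "IX", "S", "SIX", "X"]
COMPAT = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def can_grant(requested: str, held: list[str]) -> bool:
    """A request is granted only if it is compatible with every
    mode already held on the node (e.g., a table)."""
    return all(COMPAT[requested][h] for h in held)

# A table-level S lock conflicts with an existing IX (someone intends
# to write a row), but coexists with IS (row-level readers).
assert can_grant("S", ["IS"]) is True
assert can_grant("S", ["IX"]) is False
```

This is exactly the "reject without scanning the lock table" shortcut: the IX mark at the table level stands in for all the row-level X locks beneath it.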

Multiversion Concurrency Control (MVCC)

MVCC maintains multiple timestamped versions of each data item, allowing readers to access a consistent snapshot without acquiring read locks. Writers create new versions rather than overwriting existing ones. This eliminates read-write contention entirely for snapshot reads. PostgreSQL, Oracle Database, MySQL InnoDB, and Microsoft SQL Server (in snapshot isolation mode) all implement MVCC. The tradeoff is storage overhead from maintaining version chains and the need for a garbage collection process (called VACUUM in PostgreSQL) to reclaim obsolete versions.
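Snapshot visibility can be sketched with a simplified version chain. The field names are loosely modeled on PostgreSQL's xmin/xmax tuple headers, but this is a hypothetical, heavily simplified model (one global committed set, no in-progress snapshot lists).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Version:
    value: object
    created_by: int                   # txid that wrote this version ("xmin")
    deleted_by: Optional[int] = None  # txid that superseded it ("xmax")

def visible(v: Version, snapshot_txid: int, committed: set[int]) -> bool:
    """A version is visible if its creator committed at or before the
    snapshot, and no committed deleter precedes the snapshot."""
    if v.created_by not in committed or v.created_by > snapshot_txid:
        return False
    if (v.deleted_by is not None and v.deleted_by in committed
            and v.deleted_by <= snapshot_txid):
        return False
    return True

def read(chain: list[Version], snapshot_txid: int, committed: set[int]):
    for v in reversed(chain):  # newest version first
        if visible(v, snapshot_txid, committed):
            return v.value
    return None

# txid 10 wrote "a"; txid 20 later replaced it with "b".
chain = [Version("a", created_by=10, deleted_by=20), Version("b", created_by=20)]
committed = {10, 20}
assert read(chain, snapshot_txid=15, committed=committed) == "a"  # old snapshot
assert read(chain, snapshot_txid=25, committed=committed) == "b"  # new snapshot
```

The older snapshot keeps seeing "a" without blocking the writer of "b" — the read-write non-interference the text describes. The dead "a" version is what VACUUM-style garbage collection eventually reclaims.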

Optimistic Concurrency Control (OCC)

OCC assumes that conflicts are rare. Transactions execute without acquiring locks, validate at commit time that no conflicting write has occurred, and abort-and-retry if validation fails. OCC performs well under low-contention workloads but degrades under high contention because the abort rate climbs. The model was formalized by Kung and Robinson in "On Optimistic Methods for Concurrency Control" (ACM TODS, 1981), and the foundational transaction-processing literature (Gray & Reuter, Transaction Processing: Concepts and Techniques, 1992) likewise treats OCC as a structurally distinct model from locking-based approaches.
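The read-validate-write cycle can be sketched with per-key version counters. The class and method names are hypothetical; the point is the validation step, which succeeds only if no conflicting write landed between the optimistic read and the commit attempt.

```python
class VersionedStore:
    """Toy key-value store with per-key version counters (illustrative)."""
    def __init__(self):
        self.data = {}  # key -> (value, version)

    def read(self, key):
        return self.data.get(key, (None, 0))

    def try_commit(self, key, new_value, expected_version) -> bool:
        """Validation phase: commit only if the version is unchanged."""
        _, current = self.data.get(key, (None, 0))
        if current != expected_version:
            return False  # conflict detected: caller must abort and retry
        self.data[key] = (new_value, current + 1)
        return True

store = VersionedStore()
store.data["x"] = (100, 1)

value, version = store.read("x")   # optimistic read, no lock taken
store.data["x"] = (999, 2)         # a concurrent writer sneaks in
assert store.try_commit("x", value + 1, expected_version=version) is False

value, version = store.read("x")   # retry: re-read and recompute
assert store.try_commit("x", value + 1, expected_version=version) is True
assert store.data["x"] == (1000, 3)
```

Under high contention the first branch dominates and every failed attempt repeats the whole read-compute cycle, which is why the abort rate, not lock overhead, becomes the bottleneck.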

Deadlock Detection and Resolution

A deadlock occurs when transaction T1 holds a lock that T2 needs, and T2 holds a lock that T1 needs, creating a circular wait. Database engines detect deadlocks by constructing a wait-for graph: nodes are active transactions; edges point from a waiting transaction to the transaction holding the needed lock. A cycle in this graph indicates a deadlock. Detection cadence is engine-specific: PostgreSQL checks for a cycle only after a lock wait exceeds deadlock_timeout (1 second by default), while SQL Server's lock monitor wakes every 5 seconds and drops to 100-millisecond intervals once a deadlock has been found. Upon detection, the engine selects a victim — typically the transaction with the lowest cost to roll back — and aborts it, allowing the cycle to break.
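Cycle detection over the wait-for graph is a standard depth-first search for a back edge. The sketch below (hypothetical function names; transaction ids stand in for real lock-queue state) returns whether any circular wait exists.

```python
def find_deadlock(edges: dict[int, set[int]]) -> bool:
    """Detect a cycle in a wait-for graph.

    edges[t] is the set of transactions that transaction t is waiting on.
    A back edge to a node on the current DFS path means a circular wait.
    """
    WHITE, GRAY, BLACK = 0, 1, 2       # unvisited / on path / done
    color = {t: WHITE for t in edges}

    def dfs(t: int) -> bool:
        color[t] = GRAY
        for holder in edges.get(t, ()):
            if color.get(holder, WHITE) == GRAY:
                return True            # back edge => deadlock
            if (color.get(holder, WHITE) == WHITE
                    and holder in edges and dfs(holder)):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in edges)

# T1 waits on T2 and T2 waits on T1: the classic two-transaction deadlock.
assert find_deadlock({1: {2}, 2: {1}}) is True
# T1 waits on T2, but T2 waits on nothing: ordinary blocking, no deadlock.
assert find_deadlock({1: {2}, 2: set()}) is False
```

A production detector additionally records which lock each edge represents, so the victim's rollback can release exactly the resources that break the cycle.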

Deadlock prevention (as opposed to detection) uses timestamp-ordering schemes such as wait-die and wound-wait. In wait-die, an older transaction requesting a lock held by a younger one waits, while a younger requester aborts ("dies") and restarts with its original timestamp; in wound-wait, an older requester preempts ("wounds") the younger holder, while a younger requester waits. Both rules guarantee that waits only ever form in one direction of transaction age, so no cycle can arise.
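The two decision rules reduce to a comparison of transaction timestamps (lower timestamp = older transaction). The function names and return strings below are illustrative, not any engine's API.

```python
def wait_die(requester_ts: int, holder_ts: int) -> str:
    """Wait-die: older requesters wait; younger requesters abort ('die')."""
    return "wait" if requester_ts < holder_ts else "abort"

def wound_wait(requester_ts: int, holder_ts: int) -> str:
    """Wound-wait: older requesters preempt ('wound') the holder;
    younger requesters wait."""
    return "abort_holder" if requester_ts < holder_ts else "wait"

# Old transaction (ts=1) requests a lock held by a young one (ts=9):
assert wait_die(1, 9) == "wait"            # waits politely
assert wound_wait(1, 9) == "abort_holder"  # forces the holder to roll back
# Young transaction (ts=9) requests a lock held by an old one (ts=1):
assert wait_die(9, 1) == "abort"           # dies; restarts with same ts
assert wound_wait(9, 1) == "wait"
```

Restarting an aborted transaction with its original timestamp is essential in both schemes: it ages, so it cannot starve forever.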


Causal relationships or drivers

Concurrency anomalies arise from specific structural conditions, not from implementation defects:

Dirty reads occur when Transaction T2 reads data written by T1 before T1 commits. If T1 subsequently aborts, T2 has read data that never officially existed.

Non-repeatable reads occur when T2 reads the same row twice and obtains different values because T1 committed a modification between the two reads.

Phantom reads occur when T2 re-executes a range query and obtains a different set of rows because T1 inserted or deleted matching rows between the two executions. Preventing phantoms requires predicate locks or gap locks, not just row locks — a distinction missed by many locking implementations that only lock existing rows.

Lost updates occur when T1 and T2 both read a value, both modify it independently, and T2's write overwrites T1's write, erasing T1's change. This anomaly is particularly common in read-modify-write patterns executed without SELECT FOR UPDATE semantics.
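The anomaly is easy to reproduce with in-process state. The sketch below simulates two interleaved read-modify-write "transactions" on a shared balance (hypothetical names; the `threading.Lock` in the fix plays the role of SELECT FOR UPDATE by covering the whole cycle).

```python
import threading

balance = {"value": 100}

def unsafe_increment(read_snapshot: int, amount: int):
    # "Read, compute elsewhere, write back" using a stale read.
    balance["value"] = read_snapshot + amount

# Both transactions read 100 before either writes:
t1_read = balance["value"]
t2_read = balance["value"]
unsafe_increment(t1_read, 10)   # writes 110
unsafe_increment(t2_read, 5)    # writes 105 — T1's +10 is silently lost
assert balance["value"] == 105  # not the expected 115

# The pessimistic fix: hold a lock across the entire read-modify-write
# cycle, which is what SELECT FOR UPDATE provides at the row level.
lock = threading.Lock()
def safe_increment(amount: int):
    with lock:
        balance["value"] = balance["value"] + amount

balance["value"] = 100
safe_increment(10)
safe_increment(5)
assert balance["value"] == 115
```

The optimistic alternative is the version-check-at-commit pattern: re-read and retry when the row changed underneath, rather than locking up front.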

The ANSI/ISO SQL standard's isolation level definitions are constructed around preventing the first three of these anomalies: SQL:1992 defines the phenomena dirty read (P1), non-repeatable read (P2), and phantom (P3), while lost update falls outside its taxonomy. The definitions were revisited critically in the 1995 paper "A Critique of ANSI SQL Isolation Levels" by Berenson et al. (ACM SIGMOD), which identified further anomalies — notably lost update and write skew — that the original ANSI definitions fail to preclude.

Understanding these causal chains is essential for database performance tuning work, since lock wait times are a leading contributor to query latency in OLTP workloads. They also inform database schema design decisions — narrow transactions with predictable access patterns generate less contention than wide transactions touching many tables.


Classification boundaries

Pessimistic vs. Optimistic Models

Pessimistic control (locking-based) acquires coordination rights before executing operations. Optimistic control defers conflict detection to commit time. The boundary is the timing of conflict resolution: pre-execution versus post-execution.

Read-Write Symmetry

Some protocols treat reads and writes symmetrically (locking models), while MVCC treats them asymmetrically — readers never block writers and writers never block readers. This asymmetry is an architectural property, not a configuration option.

Level of Isolation

The ISO SQL standard defines four isolation levels with explicit anomaly-prevention guarantees:

  1. Read Uncommitted — permits dirty reads; lowest isolation, highest concurrency.
  2. Read Committed — prevents dirty reads; default in PostgreSQL, Oracle Database, and SQL Server.
  3. Repeatable Read — prevents dirty and non-repeatable reads; default in MySQL InnoDB.
  4. Serializable — prevents all standard anomalies; equivalent to serial execution in terms of outcome.

Snapshot Isolation (SI) — implemented by PostgreSQL's Repeatable Read level and by the level Oracle labels Serializable — does not fit neatly into the ANSI hierarchy: it prevents dirty reads, non-repeatable reads, and phantoms, yet permits write skew unless supplemented by predicate locking or a serializable certifier such as SSI.
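Write skew is worth seeing concretely. The classic on-call example below (hypothetical data, modeled on the Berenson et al. scenario) shows two transactions that each read the same snapshot, write disjoint rows — so SI's first-committer-wins check passes both — and jointly break an invariant neither violates alone.

```python
# Invariant: at least one doctor must remain on call.
snapshot = {"alice_on_call": True, "bob_on_call": True}

# T1 and T2 each validate the invariant against the shared snapshot:
t1_sees_cover = snapshot["alice_on_call"] or snapshot["bob_on_call"]
t2_sees_cover = snapshot["alice_on_call"] or snapshot["bob_on_call"]

db = dict(snapshot)
if t1_sees_cover:
    db["alice_on_call"] = False   # T1 writes only Alice's row
if t2_sees_cover:
    db["bob_on_call"] = False     # T2 writes only Bob's row

# Disjoint write sets => no write-write conflict for SI to detect,
# yet the combined outcome leaves nobody on call:
assert db["alice_on_call"] is False and db["bob_on_call"] is False
```

A serializable execution would have forced one transaction to see the other's write and abort its check, which is exactly what SSI's read-write dependency tracking adds on top of SI.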

Granularity Classification

Lock granularity forms a containment hierarchy: database ⊃ table ⊃ page ⊃ row ⊃ cell. Escalation — when a transaction acquires enough row-level locks that the engine promotes them to a table-level lock — crosses granularity boundaries automatically in SQL Server and DB2.
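Escalation can be sketched as a threshold rule in the lock table. This is an illustrative model with hypothetical names; the threshold of 5,000 mirrors SQL Server's documented trigger of roughly 5,000 locks acquired by one statement on a single table.

```python
ESCALATION_THRESHOLD = 5000  # assumption, modeled on SQL Server's default

class LockTable:
    """Toy lock table tracking one transaction's locks (illustrative)."""
    def __init__(self):
        self.row_locks: dict[str, set[int]] = {}  # table -> locked row ids
        self.table_locks: set[str] = set()

    def lock_row(self, table: str, row_id: int):
        if table in self.table_locks:
            return  # a table lock already covers every row
        rows = self.row_locks.setdefault(table, set())
        rows.add(row_id)
        if len(rows) >= ESCALATION_THRESHOLD:
            # Escalate: one table lock replaces thousands of row locks,
            # trading concurrency for lock-table memory.
            self.table_locks.add(table)
            del self.row_locks[table]

lt = LockTable()
for i in range(ESCALATION_THRESHOLD):
    lt.lock_row("orders", i)
assert "orders" in lt.table_locks
assert "orders" not in lt.row_locks
```

The crossing of the granularity boundary is exactly the tradeoff named below: after escalation, every other transaction touching the table is serialized behind one coarse lock.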

These classification distinctions are directly relevant when evaluating NoSQL database systems, which frequently abandon SQL isolation semantics in favor of application-level consistency models (eventual consistency, causal consistency), and in the context of distributed database systems and the constraints documented in CAP theorem.


Tradeoffs and tensions

Isolation vs. Throughput: Higher isolation levels reduce anomaly risk but increase lock contention, raising latency and reducing transaction-per-second throughput. This tradeoff is the central operational tension in OLTP design. Systems processing financial transactions typically require Serializable or Snapshot Isolation; high-volume analytics reads often tolerate Read Committed. See OLTP vs. OLAP for the workload-level implications.

Lock Granularity vs. Overhead: Row-level locking maximizes concurrency but creates high memory and CPU overhead tracking thousands of simultaneous locks. Table-level locking is cheap to manage but serializes all access to a table. Page-level locking is a middle ground implemented in some storage engines.

MVCC Version Overhead vs. Read Performance: MVCC eliminates read-write lock contention but generates version chains that consume storage proportional to write volume and transaction duration. Long-running transactions in PostgreSQL prevent VACUUM from reclaiming dead tuples, causing table bloat — a real operational failure mode requiring active database monitoring and observability practices.

Deadlock Frequency vs. Retry Logic: Systems that use fine-grained locking suffer more deadlocks than those using coarser locks. Applications must implement retry logic on deadlock errors (SQL Server error 1205; PostgreSQL SQLSTATE 40P01), adding application complexity. Over-retry without backoff can create livelock conditions.
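A retry loop with jittered exponential backoff addresses both concerns in that paragraph: it recovers from victim aborts and avoids livelock from retries re-colliding in lockstep. The sketch below is a generic pattern, not any driver's API; `DeadlockError` is a hypothetical stand-in for the engine-specific signal (SQL Server error 1205, PostgreSQL SQLSTATE 40P01).

```python
import random
import time

class DeadlockError(Exception):
    """Stand-in for the driver-specific deadlock-victim error."""

def run_with_retry(txn, max_attempts=5, base_delay=0.05):
    """Run a transaction callable, retrying on deadlock with
    jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return txn()
        except DeadlockError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Random jitter prevents retrying transactions from
            # re-colliding in lockstep (livelock).
            time.sleep(base_delay * (2 ** attempt) * random.random())

attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise DeadlockError("chosen as deadlock victim")
    return "committed"

assert run_with_retry(flaky_txn) == "committed"
assert attempts["n"] == 3
```

The whole transaction body must sit inside `txn`: a deadlock abort rolls back all of its work, so partial results from a failed attempt must never leak into the retry.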

Optimistic Failure Rate vs. Latency: OCC delivers lower latency under low contention by eliminating lock acquisition overhead. Under high contention, the abort-and-retry cycle can consume more total resources than a pessimistic approach. The break-even point depends on conflict probability and transaction cost.

These tensions are directly relevant to database high availability architectures, where replication lag introduces distributed concurrency challenges beyond what single-node isolation levels address. The database connection pooling layer also intersects with concurrency control, since pool sizing affects the number of simultaneous transactions competing for locks.


Common misconceptions

Misconception 1: Serializable isolation prevents all data anomalies.
Serializable isolation prevents the anomalies defined in ISO SQL — dirty reads, non-repeatable reads, and phantoms. The 1995 Berenson et al. critique identified that ANSI's definition of Serializable does not formally preclude all conceivable anomalies in every implementation. True serializability requires that the execution history be equivalent to some serial schedule, which Strict 2PL and Serializable Snapshot Isolation (SSI, introduced in PostgreSQL 9.1) achieve rigorously. Not all vendor implementations labeled "Serializable" provide identical guarantees.

Misconception 2: MVCC eliminates the need for locks entirely.
MVCC eliminates read-write lock contention for snapshot reads. It does not eliminate write-write locking. When two concurrent writers target the same row, an exclusive lock is still required regardless of MVCC. DDL operations (ALTER TABLE, CREATE INDEX) also acquire table-level locks in most MVCC systems.

Misconception 3: Deadlocks indicate application bugs.
Deadlocks are a structural property of any concurrent locking system and can arise from correct, well-structured transactions accessing shared resources in different orders. They are expected system behavior managed by the engine, not exclusively a sign of defective application logic. However, consistent deadlock patterns on the same resource pairs do indicate access-order inconsistencies worth addressing.

Misconception 4: Higher isolation levels always mean safer applications.
An application relying on Read Committed isolation and implementing its own application-level consistency checks may be more correct than one relying on Serializable isolation but implementing retry logic incorrectly. Isolation level is one input to correctness; it does not substitute for correct transactional application design.

Misconception 5: Optimistic concurrency is always faster.
OCC is faster than pessimistic locking only when actual conflict rates are low — typically below 5–10% of transaction pairs. Above that threshold, the abort overhead exceeds the lock acquisition savings. Workloads with high write contention, such as inventory reservation or seat booking systems, are poor candidates for pure OCC.
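The break-even intuition can be made concrete with back-of-envelope cost functions. The cost model below is a stated assumption, not a measurement: locking always pays a fixed acquisition overhead, while OCC pays nothing unless it conflicts, in which case it redoes the whole transaction (a geometric number of attempts).

```python
def occ_cost(txn_cost: float, conflict_rate: float) -> float:
    """Expected OCC cost with abort-and-retry: with conflict probability p,
    the expected number of attempts is 1 / (1 - p)."""
    return txn_cost / (1.0 - conflict_rate)

def locking_cost(txn_cost: float, lock_overhead: float) -> float:
    """Pessimistic cost: the work plus a fixed lock-acquisition overhead."""
    return txn_cost + lock_overhead

# With a 10% lock overhead, OCC wins below a ~9% conflict rate:
assert occ_cost(1.0, 0.05) < locking_cost(1.0, 0.10)  # low contention
assert occ_cost(1.0, 0.30) > locking_cost(1.0, 0.10)  # high contention
```

Setting the two equal gives the break-even conflict rate p = 1 - 1/(1 + overhead); with a 10% overhead that is about 9%, consistent with the 5–10% rule of thumb above.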

For practitioners managing these tradeoffs daily, the database administrator role covers the professional responsibilities encompassing lock monitoring, isolation tuning, and deadlock response procedures.


Checklist or steps

Concurrency Control Configuration and Diagnosis Sequence

The following sequence represents the standard operational steps applied when assessing or configuring concurrency control in a production relational database. This is a reference of the process structure, not a prescriptive recommendation.

  1. Identify the workload profile — Classify transaction mix as read-heavy, write-heavy, or mixed. Determine whether the workload is OLTP, analytical, or hybrid. Confirm average transaction duration.

  2. Audit the current isolation level — Query the system catalog (e.g., SHOW TRANSACTION ISOLATION LEVEL in PostgreSQL; sys.databases in SQL Server) to confirm the default and any session-level overrides.

  3. Collect lock wait metrics — Use engine-native tools: pg_locks and pg_stat_activity in PostgreSQL; Dynamic Management Views (sys.dm_exec_requests, sys.dm_os_wait_stats) in SQL Server; SHOW ENGINE INNODB STATUS in MySQL InnoDB; Oracle's V$LOCK and V$SESSION views.

  4. Identify the top lock-wait pairs — Determine which transaction-resource pairs account for the highest cumulative wait time. These are the primary contention points.

  5. Examine the deadlock log — PostgreSQL: set log_lock_waits = on and deadlock_timeout; SQL Server: trace flag 1222 or Extended Events; MySQL: SHOW ENGINE INNODB STATUS deadlock section.

  6. Reconstruct the wait-for graph — Map the circular dependency for each recorded deadlock. Identify the access-order pattern causing the cycle.

  7. Assess granularity appropriateness — Determine whether lock escalation events are occurring (SQL Server: the lock_escalation Extended Event; cumulative lock waits surface as LCK_M_* wait types in sys.dm_os_wait_stats). Evaluate whether row-level locks are being unnecessarily promoted to table locks.

  8. Evaluate MVCC version chain length — In PostgreSQL, query pg_stat_user_tables for n_dead_tup to detect table bloat from stale versions. Confirm autovacuum activity.

  9. Test isolation level adjustments — In a non-production environment, reproduce the target workload under the candidate isolation level and measure throughput, latency, and anomaly occurrence rate.

  10. Document the configuration baseline — Record the chosen isolation level, lock timeout settings, deadlock retry logic, and MVCC maintenance schedule in the database's operational runbook.

This sequence applies across PostgreSQL, SQL Server, Oracle Database, and MySQL InnoDB, with engine-specific tool substitutions at steps 3–5. It connects directly to database performance tuning workflows and informs database backup and recovery planning, since long-held locks can affect backup consistency windows.
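Step 4 of the sequence is mechanical once wait events have been sampled. The sketch below aggregates hypothetical event records — stand-ins for rows pulled from pg_locks/pg_stat_activity or the SQL Server DMVs — into the top contention points by cumulative wait time.

```python
from collections import Counter

# Hypothetical sampled lock-wait events (fields are illustrative):
wait_events = [
    {"blocked": "txn_a", "blocking": "txn_b", "resource": "orders:42", "wait_ms": 120},
    {"blocked": "txn_c", "blocking": "txn_b", "resource": "orders:42", "wait_ms": 300},
    {"blocked": "txn_d", "blocking": "txn_e", "resource": "users:7",   "wait_ms": 15},
]

# Cumulative wait time per contended resource:
cumulative = Counter()
for ev in wait_events:
    cumulative[ev["resource"]] += ev["wait_ms"]

top = cumulative.most_common(1)
assert top == [("orders:42", 420)]  # the primary contention point
```

The same aggregation keyed on ("blocked", "blocking") pairs instead of resources surfaces the transaction pairs whose access-order patterns feed step 6's wait-for-graph reconstruction.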


Reference table or matrix

ISO SQL Isolation Levels: Anomaly Prevention Matrix

| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Write Skew | Typical Default In |
|---|---|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible | Possible | — |
| Read Committed | Prevented | Possible | Possible | Possible | PostgreSQL, Oracle Database, SQL Server |
| Repeatable Read | Prevented | Prevented | Possible | Prevented (locking implementations) | MySQL InnoDB |
| Serializable | Prevented | Prevented | Prevented | Prevented | — |
| Snapshot Isolation (non-ISO) | Prevented | Prevented | Prevented | Possible | PostgreSQL "Repeatable Read"; Oracle "Serializable" |