Database Systems Glossary: Key Terms and Definitions

The database systems field operates across a dense technical vocabulary that shapes professional communication, system design decisions, regulatory compliance documentation, and procurement language. This page defines the foundational and advanced terms used across relational, NoSQL, distributed, and cloud database architectures. The definitions reflect established usage from standards bodies including ISO, ANSI, and NIST, as well as widely adopted practitioner conventions. Professionals consulting the broader database systems reference at /index will encounter these terms throughout every functional area of the discipline.


Definition and scope

A database glossary in the technology sector covers the formal and operational definitions of terms spanning data modeling, query execution, storage architecture, transaction management, security, and administration. The scope of database terminology divides into at least 8 structural domains: schema and modeling, query languages, transaction and concurrency control, storage and indexing, distribution and replication, security and compliance, administration and operations, and platform-specific nomenclature.

Authoritative definition sources include the ISO/IEC 9075 standard for SQL, NIST SP 800-111 for storage encryption guidance applicable to database systems, and the ANSI X3.135 standard that originally codified relational query language. The ACM Computing Classification System (ACM CCS) provides a hierarchical taxonomy under which database management systems, query processing, and data models are each independently classified.

Key term categories mapped within this glossary include:

  1. Data modeling terms — entity, attribute, relationship, cardinality, schema, instance, normalization form
  2. Query language terms — SQL, DML, DDL, DCL, TCL, predicate, join type, subquery
  3. Transaction and concurrency terms — ACID properties, isolation level, deadlock, lock granularity, MVCC
  4. Storage and performance terms — index type, execution plan, buffer pool, page, tablespace, query optimizer
  5. Distribution terms — replication, sharding, partitioning, CAP theorem, eventual consistency, quorum
  6. Security terms — role-based access control, encryption at rest, encryption in transit, audit log, privilege escalation
  7. Administration terms — backup, recovery point objective (RPO), recovery time objective (RTO), failover, high availability
  8. Architectural pattern terms — OLTP, OLAP, data warehouse, data lake, federated database, multi-model database

The relational model — which underlies platforms including PostgreSQL, Oracle Database, and Microsoft SQL Server — uses ISO-standardized terminology most consistently. NoSQL database systems introduced an extended vocabulary covering document stores, key-value stores, column-family stores, and graph databases, each with distinct architectural semantics.


How it works

Database terminology functions as a precision layer: a single ambiguous term in a schema design discussion, service-level agreement, or compliance audit can produce divergent implementations across teams. The following definitions represent the most operationally consequential terms across the discipline.

ACID — Atomicity, Consistency, Isolation, Durability. The four properties that guarantee reliable processing of database transactions. Defined rigorously in Jim Gray and Andreas Reuter's Transaction Processing: Concepts and Techniques (Morgan Kaufmann, 1992), the ACID model underpins every major relational database engine.

Schema — The formal structure definition of a database, specifying tables, columns, data types, constraints, and relationships. Schema design decisions have long-term performance and maintainability consequences; database schema design is a distinct professional discipline.

Normalization — The process of organizing relational tables to reduce data redundancy and improve integrity, following defined normal forms (1NF through BCNF and beyond). Normalization and denormalization are inversely applied depending on read vs. write optimization requirements.

Index — A data structure that improves the speed of data retrieval operations at the cost of additional storage and write overhead. Common index types include B-tree, hash, GiST, and bitmap indexes. Database indexing strategy is one of the primary levers in database query optimization.

CAP Theorem — A theorem formalized by Eric Brewer in 2000, stating that a distributed data system can guarantee at most 2 of 3 properties: Consistency, Availability, and Partition tolerance. CAP theorem governs architectural tradeoffs in distributed database systems.

MVCC (Multiversion Concurrency Control) — A concurrency management technique that maintains multiple versions of data records so readers do not block writers. PostgreSQL, Oracle, and MySQL InnoDB all implement MVCC variants. Database concurrency control encompasses MVCC alongside locking-based approaches.

RPO / RTO — Recovery Point Objective and Recovery Time Objective. RPO defines the maximum acceptable data loss measured in time; RTO defines the maximum acceptable downtime. These metrics drive database backup and recovery architecture and database disaster recovery planning.

Sharding — Horizontal partitioning of data across 2 or more independent database nodes to distribute load and scale storage. Database sharding differs from database replication, which copies data for redundancy rather than partitioning it for distribution.


Common scenarios

Database terminology confusion most frequently surfaces in 4 operational contexts:

Vendor documentation misalignment — The term "schema" means a namespace-level object in Oracle and SQL Server but refers to the entire structural definition of a database in PostgreSQL and common academic usage. Teams migrating across platforms via database migration must explicitly reconcile these semantic differences.

Compliance documentationDatabase auditing and compliance frameworks, including those derived from NIST SP 800-53 (Rev. 5), use terms like "least privilege," "audit trail," and "data at rest encryption" with specific technical meanings that must align precisely with implemented database security and access control mechanisms.

Performance tuning discussions — Conflating "query optimization" (cost-based plan selection by the database engine) with "performance tuning" (the broader operational activity of improving system throughput) leads to misdirected engineering effort. Database performance tuning and database monitoring and observability are related but structurally distinct activities.

Architecture selection — The distinction between OLTP vs. OLAP workloads, between in-memory databases and disk-backed systems, and between columnar databases and row-oriented stores determines which platform is appropriate for a given use case. Misapplying these terms at the architecture stage produces systems that fail to meet throughput or latency requirements.


Decision boundaries

Selecting the correct term — and understanding where definitional boundaries exist — determines precision in system documentation, RFP language, regulatory submissions, and architectural records.

Relational vs. non-relational scope — Terms like "table," "row," "column," and "foreign key" are specific to the relational model. Relational database systems use this vocabulary exclusively. Document databases, graph databases, and time-series databases each employ distinct structural vocabularies (collections/documents, nodes/edges, measurements/tags respectively) that do not map 1-to-1 onto relational terminology.

Logical vs. physical distinction — "View," "table," and "index" each have both a logical definition (what they represent to the query layer) and a physical implementation (how they are stored on disk or in memory). Database views are logical constructs; materialized views carry physical storage implications. This distinction is critical in database version control and database change data capture contexts.

Consistency models — "Consistency" in the ACID context (data is always in a valid state after a transaction) differs from "consistency" in the CAP theorem context (all nodes see the same data at the same time). These homonyms generate the most persistent misunderstanding in distributed systems design, particularly when evaluating NewSQL databases that claim both relational consistency and horizontal scalability.

Cloud and managed service terminology — Cloud database services and database-as-a-service (DBaaS) providers introduce proprietary terminology that overlaps imprecisely with vendor-neutral standards. AWS Aurora's "cluster endpoints," Google Spanner's "interleaved tables," and Azure SQL's "elastic pools" each represent platform-specific implementations of concepts (replication topology, co-located storage, resource sharing) that have neutral equivalents in the base vocabulary. Practitioners evaluating popular database platforms compared must anchor platform-specific terms to their standards-based equivalents to enable objective comparison.

The database administrator role and database developer role each emphasize different subsets of this vocabulary in daily practice, and database certifications from vendors and standards bodies test against defined term sets that reflect these role boundaries.


References