top of page

Fundamentals of DBMS

Database systems are used for a variety of use cases

  • Temporary hot data storage.

  • Long-lived cold storage.

  • Complex analytical operations

  • Access value by key

  • Time-series data storage

  • Storage large blob

  • Graph-based data

There are different ways to group the databases, one of the popular way group databases into 3 categories

  • OLTP (Online Transaction Processing) databases: Handles a large number of user-facing requests and transactions. Queries are often short-lived.

  • OLAP (Online Analytical Processing) databases: Handles complex aggregations. Used for analytics queries.

  • HTAP (Hybrid Transactional & Analytical Processing) databases

Another way of categorization is

  • Key-value stores

  • Relational databases

  • Document oriented stores

  • Graph databases

DBMS architecture

DBMS architecture is based on a client-server model

  • Nodes are server

  • Applications are client

Transport layer

It is responsible for

  • Communication across the nodes in the cluster.

  • Communication between client to the database server.

Query processor

  • Query Parser: Parse and validate the query and perform an access control check.

  • Query Optimizer: Optimizes the query and generates the execution plan.

Execution Engine

  • For remote execution send the query to the remote nodes and collect results.

  • For local execution send the execution plan to the storage engine to execute.

Storage Engine

A storage engine is responsible for storing and access of the data.

It has the following components:

  • Transaction Manager: Manages the transaction and makes sure the database is always in a consistent state.

  • Lock Manager: Lock the database objects and make sure database integrity.

  • Access Method: Manage access and organization of the data on the disk.

  • Buffer management: Cache data pages in the memory

  • Recovery Manager: Responsible for recovering database from recovery from crash.

Memory vs Disk-based DBMS

Database system stores the data in memory as well as on the disk.

  • In-memory databases: Store data primarily in memory and use disk for logging and recovery.

  • Disk-based databases: Store data primarily on disk and use memory for caching purposes.

Durability in Datastores

Durability: Once data is stored in the database it shouldn’t be lost.

In-memory databases maintain the backup on the disk to provide durability guarantees and prevent the loss of the volatile data.

Whenever the database receives an operation to process, it first stores it in the log, eg. WAL logs in B-Tree based databases.

To avoid a complete replay of logs in case of failure or crash, the database also creates a backup copy.

Logs are applied to backup copy in batches. After the batch of logs is processed, the database marks this point as a snapshot, and logs up to this point can be discarded. This process is called checkpointing.

Column Vs Row oriented database

Most of the data systems store data as a set of records, which consists of rows and columns in a table.

  • The field is the intersection of the column.

  • A row represents an entity with a set of properties(Fields)

There are two ways of storing the table in the disk:

  • Horizontally: Store values belonging to the same row together.

  • Vertically: Store values belonging to the same column together.

Row oriented databases are MYSQL, PostgreSQL, SQLServer, etc.

Column-oriented databases are MonetDB and C-Store.

Row oriented data layout

Row oriented databases store data in rows.

This approach is better in scenarios where you want to access a row (entity) based on the keys. In such scenarios storing rows together improves spatial locality.

Column-oriented data layout

Column-oriented datastores partition data vertically (by column). Values for the same column are stored contiguously on disk.

Column-oriented databases are a good fit for analytical purposes.

Column-oriented datastores are Apache Kudu, ClickHouse, etc.

Data files and Index Files

The primary goal of the database is to store the data and allow to access it.

Why database don’t store the data in plain files?

Database store data in specialized files because

  • Storage Efficiency: The specialized files organizes data in an optimized way.

  • Access Efficiency: Data can be accessed at the smallest possible time.

  • Update efficiency: Update should be optimized to minimize the number of changes on the disk.

In the database, data is stored in the tables, and each table is stored in a separate file.

Lookup on data is done via a search key.

Databases use an additional data structure known as an index, which helps in locating data efficiently without scanning the entire table.

Data files store the data and Index files store the records metadata and use it to locate the record.

Files are partitioned into pages, which typically have the size of a single or multiple disk blocks.

Data files

Following are different ways to implement the data file:

  • Index organized tables

  • Heap organized files

  • Hash organized files

Index files

An index on the primary data file is called the primary index.

If the order of the data follows the order of the search key, then the index is known as the clustered index.

If the order of data doesn’t follow the order of keys, then the index is known as an unclustered index.

Source: Medium

The Tech Platform



bottom of page