Fundamentals of DBMS

The Tech Platform
May 14, 2021
3 min read

Database systems are used for a variety of use cases

Temporary hot data storage.
Long-lived cold storage.
Complex analytical operations
Access value by key
Time-series data storage
Storage large blob
Graph-based data

There are different ways to group the databases, one of the popular way group databases into 3 categories

OLTP (Online Transaction Processing) databases: Handles a large number of user-facing requests and transactions. Queries are often short-lived.
OLAP (Online Analytical Processing) databases: Handles complex aggregations. Used for analytics queries.
HTAP (Hybrid Transactional & Analytical Processing) databases

Another way of categorization is

Key-value stores
Relational databases
Document oriented stores
Graph databases

DBMS architecture

DBMS architecture is based on a client-server model

Nodes are server
Applications are client

Transport layer

It is responsible for

Communication across the nodes in the cluster.
Communication between client to the database server.

Query processor

Query Parser: Parse and validate the query and perform an access control check.
Query Optimizer: Optimizes the query and generates the execution plan.

Execution Engine

For remote execution send the query to the remote nodes and collect results.
For local execution send the execution plan to the storage engine to execute.

Storage Engine

A storage engine is responsible for storing and access of the data.

It has the following components:

Transaction Manager: Manages the transaction and makes sure the database is always in a consistent state.
Lock Manager: Lock the database objects and make sure database integrity.
Access Method: Manage access and organization of the data on the disk.
Buffer management: Cache data pages in the memory
Recovery Manager: Responsible for recovering database from recovery from crash.

Memory vs Disk-based DBMS

Database system stores the data in memory as well as on the disk.

In-memory databases: Store data primarily in memory and use disk for logging and recovery.
Disk-based databases: Store data primarily on disk and use memory for caching purposes.

Durability in Datastores

Durability: Once data is stored in the database it shouldn’t be lost.

In-memory databases maintain the backup on the disk to provide durability guarantees and prevent the loss of the volatile data.

Whenever the database receives an operation to process, it first stores it in the log, eg. WAL logs in B-Tree based databases.

To avoid a complete replay of logs in case of failure or crash, the database also creates a backup copy.

Logs are applied to backup copy in batches. After the batch of logs is processed, the database marks this point as a snapshot, and logs up to this point can be discarded. This process is called checkpointing.

Column Vs Row oriented database

Most of the data systems store data as a set of records, which consists of rows and columns in a table.

The field is the intersection of the column.
A row represents an entity with a set of properties(Fields)

There are two ways of storing the table in the disk:

Horizontally: Store values belonging to the same row together.
Vertically: Store values belonging to the same column together.

Row oriented databases are MYSQL, PostgreSQL, SQLServer, etc.

Column-oriented databases are MonetDB and C-Store.

Row oriented data layout

Row oriented databases store data in rows.

This approach is better in scenarios where you want to access a row (entity) based on the keys. In such scenarios storing rows together improves spatial locality.

Column-oriented data layout

Column-oriented datastores partition data vertically (by column). Values for the same column are stored contiguously on disk.

Column-oriented databases are a good fit for analytical purposes.

Column-oriented datastores are Apache Kudu, ClickHouse, etc.

Data files and Index Files

The primary goal of the database is to store the data and allow to access it.

Why database don’t store the data in plain files?

Database store data in specialized files because

Storage Efficiency: The specialized files organizes data in an optimized way.
Access Efficiency: Data can be accessed at the smallest possible time.
Update efficiency: Update should be optimized to minimize the number of changes on the disk.

In the database, data is stored in the tables, and each table is stored in a separate file.

Lookup on data is done via a search key.

Databases use an additional data structure known as an index, which helps in locating data efficiently without scanning the entire table.

Data files store the data and Index files store the records metadata and use it to locate the record.

Files are partitioned into pages, which typically have the size of a single or multiple disk blocks.

Data files

Following are different ways to implement the data file: