top of page

Introduction to Database Design.

Database Design is a collection of processes that facilitate the designing, development, implementation and maintenance of enterprise data management systems. Properly designed database are easy to maintain, improves data consistency and are cost effective in terms of disk storage space. The database designer decides how the data elements correlate and what data must be stored.


The main objectives of database design in DBMS are to produce logical and physical designs models of the proposed database system.


The logical model concentrates on the data requirements and the data to be stored independent of physical considerations. It does not concern itself with how the data will be stored or where it will be stored physically.


The physical data design model involves translating the logical DB design of the database onto physical media using hardware resources and software systems such as database management systems (DBMS).



Database Design Life Cycle


Although, the life cycle of a database is not an important discussion that has to be taken forward in this article because we are focused on the database design. But, before jumping directly on the designing models constituting database design it is important to understand the overall workflow and life-cycle of the database.


Requirement Analysis

First of all, the planning has to be done on what are the basic requirements of the project under which the design of the database has to be taken forward. Thus, they can be defined as:-

  • Planning - This stage is concerned with planning the entire DDLC (Database Development Life Cycle). The strategic considerations are taken into account before proceeding.

  • System definition - This stage covers the boundaries and scopes of the proper database after planning.


Database Designing

The next step involves designing the database considering the user-based requirements and splitting them out into various models so that load or heavy dependencies on a single aspect are not imposed. Therefore, there has been some model-centric approach and that's where logical and physical models play a crucial role.

  • Physical Model - The physical model is concerned with the practices and implementations of the logical model.

  • Logical Model - This stage is primarily concerned with developing a model based on the proposed requirements. The entire model is designed on paper without any implementation or adopting DBMS considerations.


Implementation

The last step covers the implementation methods and checking out the behavior that matches our requirements. It is ensured with continuous integration testing of the database with different data sets and conversion of data into machine understandable language. The manipulation of data is primarily focused on these steps where queries are made to run and check if the application is designed satisfactorily or not.

  • Data conversion and loading - This section is used to import and convert data from the old to the new system.

  • Testing - This stage is concerned with error identification in the newly implemented system. Testing is a crucial step because it checks the database directly and compares the requirement specifications.



Database Design Models

There are many kinds of data models. Some of the most common ones include:

  • Hierarchical database model

  • Relational model

  • Network model

  • Object-oriented database model

  • Entity-relationship model

  • Document model

  • Star schema


1. Relational model

The most common model, the relational model sorts data into tables, also known as relations, each of which consists of columns and rows. Each column lists an attribute of the entity in question, such as price, zip code, or birth date. Together, the attributes in a relation are called a domain. A particular attribute or combination of attributes is chosen as a primary key that can be referred to in other tables, when it’s called a foreign key.

Each row, also called a tuple, includes data about a specific instance of the entity in question, such as a particular employee.

The model also accounts for the types of relationships between those tables, including one-to-one, one-to-many, and many-to-many relationships. Here’s an example:


Within the database, tables can be normalized, or brought to comply with normalization rules that make the database flexible, adaptable, and scalable. When normalized, each piece of data is atomic, or broken into the smallest useful pieces.

Relational databases are typically written in Structured Query Language (SQL). The model was introduced by E.F. Codd in 1970.


Advantage

  • Relational model is one of the most popular used database model.

  • In relational model, changes in the database structure do not affect the data access.

  • The revision of any information as tables consisting of rows and columns is much easier to understand.

  • The relational database supports both data independence and structure independence concept which makes the database design, maintenance, administration and usage much easier than the other models.

  • In this we can write complex query to accesses or modify the data from database.

  • It is easier to maintain security as compare to other models.


Disadvantages

  • Mapping of objects in relational database is very difficult.

  • Object oriented paradigm is missing in relation model.

  • Data Integrity is difficult to ensure with Relational database.

  • Relational Model is not suitable for huge database but suitable for small database.

  • Hardware overheads are incurred which make it costly.

  • Ease of design can lead to bad design.

  • Relational database system hides the implementation complexities and the physical data storage details from the users.


2. Hierarchical model

The hierarchical model organizes data into a tree-like structure, where each record has a single parent or root. Sibling records are sorted in a particular order. That order is used as the physical order for storing the database. This model is good for describing many real-world relationships.

This model was primarily used by IBM’s Information Management Systems in the 60s and 70s, but they are rarely seen today due to certain operational inefficiencies.


Advantages:

The model allows us easy addition and deletion of new information.

  • Data at the top of the Hierarchy is very fast to access.

  • It worked well with linear data storage mediums such as tapes.

  • It relates well to anything that works through a one to many relationships.


Disadvantages

  • It requires data to be repetitively stored in many different entities.

  • Now a day there is no longer use of linear data storage mediums such as tapes.

  • Searching for data requires the DBMS to run through the entire model from top to bottom until the required information is found, making queries very slow.

  • This model support only one to many relationships, many to many relationships are not supported.


3. Network model

The network model builds on the hierarchical model by allowing many-to-many relationships between linked records, implying multiple parent records. Based on mathematical set theory, the model is constructed with sets of related records. Each set consists of one owner or parent record and one or more member or child records. A record can be a member or child in multiple sets, allowing this model to convey complex relationships.

It was most popular in the 70s after it was formally defined by the Conference on Data Systems Languages (CODASYL).


Advantage

  • The network model is conceptually simple and easy to design.

  • The network model can represent redundancy in data more effectively than in the hierarchical model.

  • The network model can handle the one to many and many to many relationships which is real help in modelling the real-life situations.

  • The data access is easier and flexible than the hierarchical model.

  • The network model is better than the hierarchical model in isolating the programs from the complex physical storage details.


Disadvantage:

  • All the records are maintained using pointers and hence the whole database structure becomes very complex.

  • The insertion, deletion and updating operations of any record require the large number of pointers adjustments.

  • The structural changes to the database is very difficult.

4. Object-oriented database model

This model defines a database as a collection of objects, or reusable software elements, with associated features and methods. There are several kinds of object-oriented databases:

  • A multimedia database incorporates media, such as images, that could not be stored in a relational database.

  • A hypertext database allows any object to link to any other object. It’s useful for organizing lots of disparate data, but it’s not ideal for numerical analysis.

The object-oriented database model is the best known post-relational database model, since it incorporates tables, but isn’t limited to tables. Such models are also known as hybrid database models.

Advantages

  • Object database can handle different types of data while relational data base handles a single data. Unlike traditional databases like hierarchical, network or relational, the object-oriented databases can handle the different types of data, for example, pictures, voice video, including text, numbers and so on.

  • Object-oriented databases provide us code reusability, real world modelling, and improved reliability and flexibility.

  • The object-oriented database is having low maintenance costs as compared to other model because most of the tasks within the system are encapsulated, they may be reused and incorporated into new tasks.


Disadvantages

  • There is no universally defined data model for an OODBMS, and most models lack a theoretical foundation.

  • In comparison to RDBMSs the use of OODBMS is still relatively limited.

  • There is a Lack of support for security in OODBMSs that do not provide adequate security mechanisms.

  • The system more complex than that of traditional DBMSs.

Object-relational model

This hybrid database model combines the simplicity of the relational model with some of the advanced functionality of the object-oriented database model. In essence, it allows designers to incorporate objects into the familiar table structure.

Languages and call interfaces include SQL3, vendor languages, ODBC, JDBC, and proprietary call interfaces that are extensions of the languages and interfaces used by the relational model.


5. Entity-relationship model

This model captures the relationships between real-world entities much like the network model, but it isn’t as directly tied to the physical structure of the database. Instead, it’s often used for designing a database conceptually.

Here, the people, places, and things about which data points are stored are referred to as entities, each of which has certain attributes that together make up their domain. The cardinality, or relationships between entities, are mapped as well.

A common form of the ER diagram is the star schema, in which a central fact table connects to multiple dimensional tables.


Advantages:

  • ER model is very simple because if we know relationship between entities and attributes, then we can easily draw an ER diagram.

  • ER model is a diagrammatic representation of any logical structure of database. By seeing ER diagram, we can easily understand relationship among entities and relationship.

  • It is an effective communication tool for database designer.

  • ER model can be easily converted into relational model by simply converting ER model into tables.

  • ER model can be easily converted into another data model like hierarchical data model, network data model and so on.


Disadvantages

  • Limited constraints and specification

  • Some information be lost or hidden in ER model

  • ER model represents limited relationship as compared to another data models like relational model etc.

  • It is difficult to show data manipulation in ER model.

  • ER model is very popular for designing high level design




6. Document Database Model

A document database is a type of NoSQL database which stores data as JSON documents instead of columns and rows. JSON is a native language used to both store and query data. These documents can be grouped together into collections to form database systems.



Each document consists of a number of key-value pairs. Here is an example of a document that consists of 4 key value pairs:

{
"ID" : "001",
"Book" : "Java: The Complete Reference",
"Genre" : "Reference work",
"Author" : "Herbert Schildt",
}

Using JSON enables app developers to store and query data in the same document-model format that they use to organize their app’s code. The object model can be converted into other formats, such as JSON, BSON and XML.


Advantages

  • Schema-less. There are no restrictions in the format and structure of data storage. This is good for retaining existing data at massive volumes and different structural states, especially in a continuously transforming system.

  • Faster creation and care. Minimal maintenance is required once you create the document, which can be as simple as adding your complex object once.

  • No foreign keys. With the absence of this relationship dynamic, documents can be independent of one another.

  • Open formats. A clean build process that uses XML, JSON and other derivatives to describe documents.

  • Built-in versioning. As your documents grow in size they can also grow in complexity. Versioning decreases conflicts.


Disadvantages

  • Consistency-Check Limitations. In the book database use case example above, it would be possible to search for books from a non-existent author. You could search the book collection and find documents that are not connected to an author collection. Each listing may also duplicate author information for each book. These inconsistencies aren’t significant in some contexts, but at upper-tier standards of RDB consistency audits, they seriously hamper database performance.

  • Atomicity weaknesses. Relational systems also let you modify data from one place without the need for JOINs. All new reading queries will inherit changes made to your data via a single command (such as updating or deleting a row). For document databases, a change involving two collections will require you to run two separate queries (per collection). This breaks atomicity requirements.

  • Security. Nearly half of web applications today actively leak sensitive data. Owners of NoSQL databases, therefore, need to pay careful attention to web app vulnerabilities.


7. Star Schema Model

A star schema is a database organizational structure optimized for use in a data warehouse or business intelligence that uses a single large fact table to store transactional or measured data, and one or more smaller dimensional tables that store attributes about the data. It is called a star schema because the fact table sits at the center of the logical diagram, and the small dimensional tables branch off to form the points of the star.


Fact Tables

A table in a star schema which contains facts and connected to dimensions. A fact table has two types of columns: those that include fact and those that are foreign keys to the dimension table. The primary key of the fact tables is generally a composite key that is made up of all of its foreign keys.


A fact table might involve either detail level fact or fact that have been aggregated (fact tables that include aggregated fact are often instead called summary tables). A fact table generally contains facts with the same level of aggregation.


Dimension Tables

A dimension is an architecture usually composed of one or more hierarchies that categorize data. If a dimension has not got hierarchies and levels, it is called a flat dimension or list. The primary keys of each of the dimensions table are part of the composite primary keys of the fact table. Dimensional attributes help to define the dimensional value. They are generally descriptive, textual values. Dimensional tables are usually small in size than fact table.


Fact tables store data about sales while dimension tables data about the geographic region (markets, cities), clients, products, times, channels.


Advantages:

  • simple design;

  • fast read and queries;

  • easy data aggregation; and

  • easy integration with OLAP systems and data cubes.


Disadvantages:

  • redundant data makes for larger storage on disk;

  • potential for data abnormalities, errors and inconsistencies;

  • slower queries;

  • limited flexibility on non-dimensional data.





The Tech Platform

bottom of page