top of page

Data Modelling

Data modeling (data modelling) is the process of creating a data model for the data to be stored in a database. This data model is a conceptual representation of Data objects, the associations between different data objects, and the rules. Data modeling helps in the visual representation of data and enforces business rules, regulatory compliances, and government policies on the data. Data Models ensure consistency in naming conventions, default values, semantics, security while ensuring quality of the data.

Data Models ensure consistency in naming conventions, default values, semantics, security while ensuring quality of the data. Data Model structure helps to define the relational tables, primary and foreign keys and stored procedures. there are three types of data modelling which are discussed below (Conceptual, Logical and Physical). The main aim of conceptual model is to establish the entities, their attributes, and their relationships. Logical data model defines the structure of the data elements and set the relationships between them. A Physical Data Model describes the database specific implementation of the data model.

The main goal of a designing data model is to make certain that data objects offered by the functional team are represented accurately. The biggest drawback is that even smaller change made in structure require modification in the entire application.


Techniques of Data Modelling


There are three basic types of data modelling techniques :

  1. Entity Relationship Diagrams (ER Diagrams)

  2. UML Class Diagram

  3. Data Dictionary


1. Entity Relationship Diagrams

Also referred to as ER diagrams or ERDs. Entity-Relationship modeling is a default technique for modeling and the design of relational (traditional) databases. In this notation architect identifies:

  1. Entities representing objects (or tables in relational database),

  2. Attributes of entities including data type,

  3. Relationships between entities/objects (or foreign keys in a database).

ERDs work well if you want to design a relational (classic) database, Excel databases or CSV files. Basically, any kind of tabular data. They work well for visualization of database schemas and communication of top-level view of data.

ERD created with Dataedo.


Advantages:

  1. Easier to see the big picture

  2. Easier to understand table relations

  3. Possible to use visual cues to communicate information (e.g. location, proximity, color, shape)

Disadvantages:

  1. Doesn't work with large data models due to space constraints and clutter

  2. Supports limited amount of details

  3. May contain very little descriptions (as notes on a diagram)

  4. Requires careful layout and fitting into canvas

  5. Hard to search


2. UML Class Diagram


UML (Unified Modeling Language) is a standardized family of notations for modeling and design of information systems. It was derived from various existing notations to provide a standard for software engineering. It comprises of several different diagrams representing different aspect of the system, and one of them being a Class Diagram that can be used for data modeling. Class diagrams are equivalent of ERDs in relational world and are mostly used to design classes in object-oriented programming languages (such as Java or C#).


In class diagrams architects define:

  1. Classes (equivalent of entity in relational world),

  2. Attributes of a class (same as in an ERD) including data type,

  3. Methods associated to specific class, representing its behavior (in relational world those would be stored procedures),

  4. Relationships grouped into two categories:

    • Relationships between objects (instances of Classes) differentiated into Dependency, Association, Aggregation and Composition (equivalent to relationships in an ERD),

    • Relationships between classes of two kinds Generalization/Inheritance and Realization/Implementation (this has no equivalent in relational world).


You can use class diagrams to design a tabular data (such as in RDBMS), but were designed and are used mostly for object-oriented programs (such as Java or C#).

UML Class Diagram created with Visual Paradigm.


3. Data Dictionary

Last technique I'd like to share with you is a Data Dictionary. Techniques mentioned above were visual and were based on diagrams, and data dictionaries are a tabular definition/representation of data assets. Data dictionary is an inventory of data sets/tables with the list of their attributes/columns.

Core data dictionary elements:

  1. List of data sets/tables,

  2. List of attributes/columns of each table with data type.

Optional data dictionary elements:

  1. Item descriptions,

  2. Relationships between tables/columns,

  3. Additional constraints, such as uniqueness, default values, value constraints or calculated columns.

Data dictionary is suitable as detailed specification of data assets and can be supplemented with ER diagrams, as both serve slightly different purpose.

Data Dictionary created with Dataedo.


Advantages:

  1. May include many data attributes (e.g. list of values, default values, owner, etc.)

  2. Includes detailed descriptions of each element (table, column)

  3. Easily searchable

Disadvantages:

  1. Less visually appealing

  2. More difficult to read



Types of Data Models

There are mainly three different types of data models: conceptual data models, logical data models, and physical data models, and each one has a specific purpose. The data models are used to represent the data and how it is stored in the database and to set the relationship between data items.

  1. Conceptual Data Model: This Data Model defines WHAT the system contains. This model is typically created by Business stakeholders and Data Architects. The purpose is to organize, scope and define business concepts and rules.

  2. Logical Data Model: Defines HOW the system should be implemented regardless of the DBMS. This model is typically created by Data Architects and Business Analysts. The purpose is to developed technical map of rules and data structures.

  3. Physical Data Model: This Data Model describes HOW the system will be implemented using a specific DBMS system. This model is typically created by DBA and developers. The purpose is actual implementation of the database.



1. Conceptual Data Model

A Conceptual Data Model is an organized view of database concepts and their relationships. The purpose of creating a conceptual data model is to establish entities, their attributes, and relationships. In this data modeling level, there is hardly any detail available on the actual database structure. Business stakeholders and data architects typically create a conceptual data model.

The 3 basic tenants of Conceptual Data Model are

  • Entity: A real-world thing

  • Attribute: Characteristics or properties of an entity

  • Relationship: Dependency or association between two entities


Data model example:

  • Customer and Product are two entities. Customer number and name are attributes of the Customer entity

  • Product name and price are attributes of product entity

  • Sale is the relationship between the customer and product


Characteristics of a conceptual data model

  • Offers Organization-wide coverage of the business concepts.

  • This type of Data Models are designed and developed for a business audience.

  • The conceptual model is developed independently of hardware specifications like data storage capacity, location or software specifications like DBMS vendor and technology. The focus is to represent data as a user will see it in the "real world."

Conceptual data models known as Domain models create a common vocabulary for all stakeholders by establishing basic concepts and scope.


2. Logical Data Model

The Logical Data Model is used to define the structure of data elements and to set relationships between them. The logical data model adds further information to the conceptual data model elements. The advantage of using a Logical data model is to provide a foundation to form the base for the Physical model. However, the modeling structure remains generic.


At this Data Modeling level, no primary or secondary key is defined. At this Data modeling level, you need to verify and adjust the connector details that were set earlier for relationships.

Characteristics of a Logical data model

  • Describes data needs for a single project but could integrate with other logical data models based on the scope of the project.

  • Designed and developed independently from the DBMS.

  • Data attributes will have datatypes with exact precisions and length.

  • Normalization processes to the model is applied typically till 3NF.


3. Physical Data Model

A Physical Data Model describes a database-specific implementation of the data model. It offers database abstraction and helps generate the schema. This is because of the richness of meta-data offered by a Physical Data Model. The physical data model also helps in visualizing database structure by replicating database column keys, constraints, indexes, triggers, and other RDBMS features.


Characteristics of a physical data model:

  • The physical data model describes data need for a single project or application though it maybe integrated with other physical data models based on project scope.

  • Data Model contains relationships between tables that which addresses cardinality and nullability of the relationships.

  • Developed for a specific version of a DBMS, location, data storage or technology to be used in the project.

  • Columns should have exact datatypes, lengths assigned and default values.

  • Primary and Foreign keys, views, indexes, access profiles, and authorizations, etc. are defined.


Importance of Data Modeling

  • A clear representation of data makes it easier to analyze the data properly. It provides a quick overview of the data which can then be used by the developers in varied applications.

  • Data modeling represents the data properly in a model. It rules out any chances of data redundancy and omission. This helps in clear analysis and processing.

  • Data modeling improves data quality and enables the concerned stakeholders to make data-driven decisions.


Advantages and Disadvantages of Data Model:


Advantages of Data model:

  • The main goal of a designing data model is to make certain that data objects offered by the functional team are represented accurately.

  • The data model should be detailed enough to be used for building the physical database.

  • The information in the data model can be used for defining the relationship between tables, primary and foreign keys, and stored procedures.

  • Data Model helps business to communicate the within and across organizations.

  • Data model helps to documents data mappings in ETL process

  • Help to recognize correct sources of data to populate the model

Disadvantages of Data model:

  • To develop Data model one should know physical data stored characteristics.

  • This is a navigational system produces complex application development, management. Thus, it requires a knowledge of the biographical truth.

  • Even smaller change made in structure require modification in the entire application.

  • There is no set data manipulation language in DBMS.


Resources: Dataedo, Guu99


The Tech Platform

0 comments
bottom of page