Updated: Jul 5
Scaling a database in a microservices architecture with different data partitioning strategies: horizontal, vertical, and functional partitioning.
Horizontal Partitioning (Sharding):
In this strategy, data is divided into shards based on a partition key, such as product keys.
Each shard is responsible for storing a specific subset of data, organized according to the sharding keys (e.g., alphabetically).
Sharding distributes the load across different servers, improving system performance by enabling parallel processing and reducing the workload on individual servers.
Horizontal partitioning is widely used in distributed databases to scale the database servers efficiently.
Vertical Partitioning:
Vertical partitioning involves dividing a table into multiple partitions, where each partition holds a subset of columns.
The partitioning is based on the pattern of column usage, grouping frequently accessed columns together and separating less frequently accessed fields into different partitions.
Vertical partitioning optimizes data retrieval by minimizing unnecessary processing of less frequently accessed data, improving query performance.
Functional Partitioning:
Functional partitioning aims to promote loose coupling and enhance scalability in a microservices architecture.
Each microservice independently manages its own data without interfering with other services.
This partitioning approach allows microservices to operate on their specific dataset, promoting autonomy and independent deployment of services.
The accompanying image illustrates horizontal partitioning (sharding) of product data by product key. Each shard holds a specific subset of the data according to the sharding keys, organized alphabetically. By distributing the data across different servers, sharding improves performance by balancing the load.
Scaling Databases in Microservices Architecture
1: Horizontal Partitioning - Sharding
Horizontal partitioning, also known as sharding, is a method of partitioning data where each partition, referred to as a shard, is a separate data store with the same schema. Sharding involves dividing the data based on a specific partition key, such as product keys, and distributing it across multiple shards.
In sharding, the data within each shard is organized according to the sharding keys, often arranged alphabetically or based on another relevant criterion. This distribution of data across shards helps separate the workload and balance the load among different servers, resulting in improved system performance and scalability.
The accompanying image illustrates the concept of sharding. The product data is divided into multiple shards, and each shard holds a specific subset of data based on the sharding keys. This partitioning distributes the data across different servers, reducing the burden on individual database servers and enhancing performance.
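To make the routing step concrete, here is a minimal sketch of alphabetical (range-based) sharding by product key. The range boundaries, the `shard_index`, `put_product`, and `get_product` names, and the use of in-memory dictionaries in place of real database servers are all assumptions made for illustration, not part of the original design.

```python
import bisect
from typing import Optional

# Hypothetical layout: each shard owns keys whose first letter falls in a range.
# shard 0: a-h, shard 1: i-p, shard 2: q-z (boundaries are illustrative only).
SHARD_UPPER_BOUNDS = ["h", "p", "z"]
SHARDS = [{}, {}, {}]  # plain dicts stand in for separate database servers


def shard_index(product_key: str) -> int:
    """Return the index of the shard that owns this product key."""
    first = product_key[:1].lower()
    if not "a" <= first <= "z":
        raise ValueError(f"unsupported product key: {product_key!r}")
    # bisect_left finds the first range whose upper bound is >= the letter.
    return bisect.bisect_left(SHARD_UPPER_BOUNDS, first)


def put_product(product_key: str, record: dict) -> None:
    """Store the record on the shard that owns the key."""
    SHARDS[shard_index(product_key)][product_key] = record


def get_product(product_key: str) -> Optional[dict]:
    """Read the record from the single shard that can contain the key."""
    return SHARDS[shard_index(product_key)].get(product_key)


put_product("monitor", {"price": 199})
print(shard_index("monitor"), get_product("monitor"))
```

Every read and write for a given key touches exactly one shard, which is what lets the servers share the load.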
Advantages of horizontal partitioning or sharding in distributed databases:
Scalability: Sharding allows for horizontal scaling by distributing data across multiple servers. As data volumes and workload increase, additional servers can be added to accommodate the growing demands, ensuring the scalability of the database system.
Increased Performance: Dividing data into shards and distributing them across multiple servers spreads the workload, and spreads it evenly when the partition key distributes requests uniformly. Each server handles a smaller subset of data and can process queries more efficiently.
Efficient Data Management: Sharding enables efficient management of large data volumes. By dividing data based on a partition key, related data can be stored together in a shard. This can enhance query performance and reduce the amount of data that needs to be accessed and processed for a given operation.
Fault Isolation: With horizontal partitioning, a failed shard or server does not take down the entire system. Each shard operates independently, so only the data on the failed shard becomes unavailable (unless it is replicated), while the remaining shards keep serving requests.
Improved Load Balancing: Sharding helps distribute the workload across multiple servers, preventing any single server from being overwhelmed with requests. This balanced distribution of load ensures that resources are utilized efficiently and no single server becomes a bottleneck for the entire system.
Cost-Effectiveness: Horizontal partitioning allows for cost-effective scaling. Instead of investing in a single high-end server, additional commodity servers can be added as needed. This approach provides flexibility in scaling and can be more cost-efficient for managing large and growing datasets.
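The fault isolation point in the list above can be sketched in a few lines. The `Shard` class, the `ShardUnavailable` exception, and the two-shard alphabetical split are hypothetical names and choices for this example. A real deployment would typically add replication so that a failed shard's data stays readable.

```python
class ShardUnavailable(Exception):
    """Raised when the shard that owns a key cannot be reached."""


class Shard:
    """In-memory stand-in for one database server."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.online = True

    def get(self, key):
        if not self.online:
            raise ShardUnavailable(self.name)
        return self.rows.get(key)


# Two hypothetical shards, split alphabetically on the first letter of the key.
shards = {"a-m": Shard("a-m"), "n-z": Shard("n-z")}


def owner(key):
    """Pick the shard responsible for the key."""
    return shards["a-m"] if key[:1].lower() <= "m" else shards["n-z"]


def read(key):
    return owner(key).get(key)


shards["a-m"].rows["hammer"] = {"price": 15}
shards["n-z"].rows["saw"] = {"price": 25}

# Simulate an outage of one shard: keys on the other shard remain readable.
shards["a-m"].online = False
print(read("saw"))
try:
    read("hammer")
except ShardUnavailable as exc:
    print("unavailable shard:", exc)
```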
The diagram shows one way of scaling the database in a microservice architecture, which is called horizontal partitioning or sharding. This means that the database is divided into a set of horizontal partitions or shards, each holding a subset of the data based on a partition key.
For example, the catalog service may have different shards for different product categories, and the ordering service may have different shards for different order statuses. This way, each shard can be hosted on a separate server, reducing the load and improving the performance of the database. Sharding also makes it possible to scale the system by adding new shards as storage needs grow.
However, sharding also introduces some challenges, such as data consistency, query complexity, and cross-shard transactions.
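Query complexity shows up as soon as a query does not filter on the partition key: it has to visit every shard and merge the partial results. The following is a minimal scatter-gather sketch under stated assumptions. The shard contents, `query_shard`, and `products_up_to` are illustrative names and data, and lists of dictionaries stand in for real shard databases.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards, each holding an alphabetical slice of the product data.
SHARDS = [
    [{"key": "anvil", "price": 40}, {"key": "hammer", "price": 15}],
    [{"key": "ladder", "price": 80}, {"key": "nails", "price": 5}],
    [{"key": "saw", "price": 25}, {"key": "wrench", "price": 12}],
]


def query_shard(shard, max_price):
    """Run the filter locally on one shard."""
    return [row for row in shard if row["price"] <= max_price]


def products_up_to(max_price):
    """Scatter the query to all shards in parallel, then merge and sort."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda shard: query_shard(shard, max_price), SHARDS)
        merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda row: row["price"])


print([row["key"] for row in products_up_to(20)])
```

A price filter cannot be routed by product key, so the coordinator pays for one request per shard plus the merge step. That extra work is the cost behind the query complexity mentioned above.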
2: Vertical Partitioning - Row Splitting
Vertical partitioning, also known as "Row Splitting," is a database optimization technique where a table is divided into multiple partitions, with each partition containing a subset of columns. This division is based on the pattern of column usage within the table.
The purpose of vertical partitioning is to improve query performance by storing frequently accessed columns together in one partition while placing less frequently accessed fields in separate partitions. By doing so, it allows for more efficient data retrieval and minimizes the amount of unnecessary data that needs to be processed during query execution.
To illustrate this concept, let's consider an example:
Suppose we have a table called "Customers" in a database, which contains various columns such as "customer_id," "name," "email," "phone_number," "address," and "last_login."
In this case, we can apply vertical partitioning to divide the table into two partitions based on column usage.
Partition 1 (Frequently Accessed Columns):
Partition 2 (Less Frequently Accessed Columns):
By splitting the table into these partitions, we optimize the retrieval of frequently accessed customer information. For instance, if we have a query that involves retrieving customer names and email addresses, the database engine can focus solely on the first partition, avoiding unnecessary processing of less frequently accessed data.
This approach can significantly enhance the overall performance of the database system, as it reduces the I/O overhead associated with retrieving and processing unnecessary data during query execution. However, it's important to note that vertical partitioning may introduce additional complexity, such as managing the relationships between partitions and maintaining data integrity across the partitions.
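As a minimal sketch of this split, the example below uses Python's built-in `sqlite3` module. The assignment of columns to partitions is an assumption: the text only implies that names and emails are frequently accessed, so `customers_core` and `customers_details` are hypothetical table names and the remaining columns are placed in the second partition for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed hot partition: columns read by most queries.
    CREATE TABLE customers_core (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        email       TEXT NOT NULL
    );
    -- Assumed cold partition: shares the key so rows can be rejoined.
    CREATE TABLE customers_details (
        customer_id  INTEGER PRIMARY KEY REFERENCES customers_core(customer_id),
        phone_number TEXT,
        address      TEXT,
        last_login   TEXT
    );
""")
conn.execute("INSERT INTO customers_core VALUES (1, 'Ada', 'ada@example.com')")
conn.execute(
    "INSERT INTO customers_details VALUES (1, '555-0100', '1 Main St', '2024-01-01')"
)

# Frequent query: touches only the narrow partition.
hot = conn.execute(
    "SELECT name, email FROM customers_core WHERE customer_id = 1"
).fetchone()

# Occasional query: needs a join on the shared key to rebuild the full row.
full = conn.execute("""
    SELECT c.name, c.email, d.phone_number, d.address, d.last_login
    FROM customers_core AS c
    JOIN customers_details AS d USING (customer_id)
    WHERE c.customer_id = 1
""").fetchone()

print(hot)
print(full)
```

The join in the second query is the added complexity noted above: both partitions must carry `customer_id`, and inserts or deletes have to keep the two tables in step.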
3: Functional Partitioning
Functionally partitioning data based on bounded contexts or subdomains is a common approach to managing data segregation. By aligning the data partitioning with the responsibilities of the bounded contexts, we can effectively decompose microservices and ensure that each service has ownership over its relevant data.
Bounded contexts represent distinct areas of the system where different business functionalities reside. By identifying these bounded contexts, we can partition the data in a way that aligns with the responsibilities of each microservice. This functional partitioning allows us to distribute the data across multiple databases or data stores, ensuring that each microservice has access to the specific data it needs to fulfill its purpose.
Advantages of functionally partitioning data in a microservices architecture:
Loose coupling: Functional partitioning promotes loose coupling between microservices, allowing them to independently manage their data without interfering with other services. This improves overall system performance and flexibility.
Enhanced scalability: Each microservice operates on its specific dataset, so each database can be scaled independently. A service whose data volume keeps growing can additionally shard its own database (horizontal partitioning) without affecting the other services.
Autonomous microservices: Functional partitioning enables the development of autonomous and independently deployable microservices. Each microservice can operate on its own partitioned data without causing conflicts or dependencies with other services.
Improved performance: Because data is distributed across separate databases, one per service, load on one service's data does not contend with the others. This spreads the workload and minimizes resource bottlenecks.
Horizontal scaling: Functional partitioning allows for horizontal scaling, where additional database instances are added as needed instead of enlarging a single server. This avoids the limitations of vertical scaling and provides a more cost-effective and flexible way to handle growing data volumes.
However, it's important to note that functional data partitioning and sharding introduce certain complexities. Managing data consistency, ensuring proper data distribution, and handling data updates across multiple shards require careful consideration and implementation. Additionally, querying data that spans multiple shards can be more challenging and may require additional mechanisms such as distributed joins or aggregations.
The database in this diagram is scaled by using functional partitioning, which means that each microservice has its own database with a specific schema and data model. This way, each microservice can access and modify its own data without affecting other microservices.
Functional partitioning also makes it possible to scale the system by adding new microservices and databases as needed. However, it introduces some challenges, such as data duplication, data consistency, and data integration.
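A small sketch can show what "each microservice has its own database" means in practice, and where the data duplication comes from. `CatalogService`, `OrderingService`, their methods, and the table layouts are hypothetical. Separate in-memory SQLite databases stand in for separate database servers, and a direct method call stands in for a network call between services.

```python
import sqlite3


class CatalogService:
    """Owns product data; no other service touches this database."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE products (product_id TEXT PRIMARY KEY, name TEXT, price REAL)"
        )

    def add_product(self, product_id, name, price):
        self._db.execute(
            "INSERT INTO products VALUES (?, ?, ?)", (product_id, name, price)
        )

    def get_price(self, product_id):
        row = self._db.execute(
            "SELECT price FROM products WHERE product_id = ?", (product_id,)
        ).fetchone()
        if row is None:
            raise KeyError(product_id)
        return row[0]


class OrderingService:
    """Owns order data; reaches product data only through the catalog's API."""

    def __init__(self, catalog):
        self._catalog = catalog
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
            "product_id TEXT, unit_price REAL, quantity INTEGER)"
        )

    def place_order(self, product_id, quantity):
        # The price is copied into the order row: deliberate data duplication.
        unit_price = self._catalog.get_price(product_id)
        cur = self._db.execute(
            "INSERT INTO orders (product_id, unit_price, quantity) VALUES (?, ?, ?)",
            (product_id, unit_price, quantity),
        )
        return cur.lastrowid

    def order_total(self, order_id):
        unit_price, quantity = self._db.execute(
            "SELECT unit_price, quantity FROM orders WHERE order_id = ?", (order_id,)
        ).fetchone()
        return unit_price * quantity


catalog = CatalogService()
catalog.add_product("p1", "Mug", 4.5)
ordering = OrderingService(catalog)
order_id = ordering.place_order("p1", 2)
print(ordering.order_total(order_id))
```

Because the order row keeps its own copy of the price, a later price change in the catalog does not alter existing orders. The same copy is also why consistency between the two stores has to be managed explicitly.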
Here are the possible steps of the flow in the diagram:
A user interacts with one of the client apps, such as the web or mobile app, to browse products, add items to the shopping cart, apply discounts, and place orders.
The client app sends the user’s requests to the API gateway, which is a single entry point for all the microservices.
The API gateway validates the requests and forwards them to the corresponding microservice, such as the catalog, shopping cart, discount, or ordering service.
The microservice processes the requests and returns the responses to the API gateway, which then sends them back to the client app.
The microservice also interacts with its own database to store or retrieve data related to its functionality. For example, the catalog service stores and retrieves product information, and the ordering service stores and retrieves order information.
The microservice also publishes or subscribes to messages from the message broker, which is a middleware that enables asynchronous communication between microservices. For example, the ordering service publishes a message when an order is placed, and the discount service subscribes to that message to apply a discount to the order.
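The steps above can be sketched end to end with in-process objects. Everything here is a simplified assumption: `MessageBroker` delivers messages synchronously, whereas a real broker is asynchronous. The request format, the `order_placed` topic name, and the 10% discount rule are also invented for the example.

```python
from collections import defaultdict


class MessageBroker:
    """Toy publish/subscribe hub (synchronous, in-process)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers[topic]:
            handler(message)


class OrderingService:
    def __init__(self, broker):
        self._broker = broker
        self.orders = {}  # stands in for the ordering service's own database

    def handle(self, request):
        order_id = len(self.orders) + 1
        self.orders[order_id] = {"items": request["items"]}
        # Notify interested services without calling them directly.
        self._broker.publish(
            "order_placed", {"order_id": order_id, "total": request["total"]}
        )
        return {"status": 201, "order_id": order_id}


class DiscountService:
    def __init__(self, broker):
        self.discounts = {}  # stands in for the discount service's own database
        broker.subscribe("order_placed", self.on_order_placed)

    def on_order_placed(self, message):
        # Hypothetical rule: record a 10% discount for every order.
        self.discounts[message["order_id"]] = round(message["total"] * 0.10, 2)


class ApiGateway:
    """Single entry point: validates the request and forwards it."""

    def __init__(self, routes):
        self._routes = routes

    def handle(self, request):
        service = self._routes.get(request.get("service"))
        if service is None:
            return {"status": 404}
        return service.handle(request)


broker = MessageBroker()
ordering = OrderingService(broker)
discount = DiscountService(broker)
gateway = ApiGateway({"ordering": ordering})

response = gateway.handle({"service": "ordering", "items": ["mug"], "total": 50.0})
print(response, discount.discounts)
```

The ordering service never references the discount service. The only shared contract is the topic name and message shape, which is what keeps the two services loosely coupled.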