How to model your data to work with Amazon Web Services’ NoSQL-based DynamoDB.
Why NoSQL?
Nowadays, storage is cheap and computational power is expensive. NoSQL leverages this fact by sacrificing some storage space in exchange for computationally cheaper queries. In practice, this means that when you design a NoSQL data model, you should always look for ways to simplify the queries you run against the database. When used correctly, NoSQL can be a much more cost-effective solution than a relational database.
Why DynamoDB?
Amazon DynamoDB is a key-value and document database that is fully managed, multi-region, and autoscaling, so you don’t have to worry about the infrastructure or datacenter. DynamoDB also offers an “On-Demand Capacity” pricing model. This lets applications of any size get started right away without provisioning capacity up front or upgrading later on.
Understanding the Basics
Unlike relational databases such as MySQL, NoSQL requires you to constantly ask how the data will be queried. Those questions drive how you organize your items and how you split them up so that queries stay fast. The first step is to define a primary key for your items, composed of a partition key and a sort key.
Note: You can use just the partition key as the primary key, but for most cases, you will also want to leverage a sort key.
Partition Key
DynamoDB tables are split into partitions. DynamoDB passes the partition key to an internal hash function, and the output of that function determines which partition the item is stored in.
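To make the idea concrete, here is a minimal sketch of hashing a partition key to a partition. DynamoDB's actual hash function is internal and not public, so the MD5-and-modulo scheme, the `partition_for` name, and the sample keys are all assumptions chosen only for illustration.

```python
import hashlib


def partition_for(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition index.

    Illustrative only: this is NOT DynamoDB's real hash function.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % partition_count


# Hypothetical partition keys spread across 3 partitions.
for key in ["tournament-a", "tournament-b", "tournament-c"]:
    print(key, "->", partition_for(key, 3))
```

The point to take away is that the same key always lands on the same partition, so the spread of your keys decides the spread of your workload.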
Hot Partitions
It is important to choose partition keys that spread your items, and therefore your workload, evenly across the partitions. Otherwise you run into the “hot” partition problem.
For example, let’s say your table is split into 3 partitions and that you have provisioned 3 RCUs (read capacity units) for your table. That means each partition has access to 1 RCU. If 1 partition is hit much more frequently than the other 2, you risk being throttled because that partition can exhaust its 1 RCU, even though you are still paying for all 3.
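The arithmetic in this example can be sketched in a few lines. This is a simplified model, assuming capacity is split evenly across partitions as described above and that each read costs exactly 1 RCU; the `throttled_reads` function is a hypothetical helper, not part of any AWS SDK.

```python
def throttled_reads(reads_per_partition, provisioned_rcus):
    """Return how many reads per second exceed each partition's share.

    Simplified model: capacity is divided evenly between partitions and
    every read consumes exactly 1 RCU.
    """
    share = provisioned_rcus / len(reads_per_partition)
    return [max(0.0, reads - share) for reads in reads_per_partition]


# 3 reads per second against 3 provisioned RCUs, spread two different ways.
even = throttled_reads([1, 1, 1], provisioned_rcus=3)
skewed = throttled_reads([3, 0, 0], provisioned_rcus=3)
print("even workload:", even)
print("skewed workload:", skewed)
```

Both workloads issue the same total number of reads, but only the skewed one exceeds a partition's share.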
You can find more about this on the AWS official docs: Designing Partition Keys to Distribute Your Workload Evenly.
Sort Key
All items with the same partition key are stored together and are ordered by the sort key. By following this pattern, you can very efficiently query for multiple items by using only the partition key.
An Example of Data Modeling
Let’s say that you are designing an application where you need to store information about sports tournaments. We could say that each tournament has teams, players, and matches. The tournament would also have some basic information like location, date, game, and prize.
A very common approach for modeling data in NoSQL is to think in terms of a hierarchy. So what goes on top of our hierarchy? Well, think of it this way: without a tournament, we wouldn’t have teams, players, or matches. The tournament provides the context that connects all of the other items together. So, for each tournament, we want to group all of the items alongside each other so that we can efficiently retrieve all of the tournament data in one query.
We’ll need to partition each of our tournaments based on a unique but uniformly distributed identifier. For this, I would recommend using UUIDv4 to generate unique tournament ids. So let’s take a look at what this could look like in a DynamoDB Table. Our UUIDv4 tournament id acts as our partition key.
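As a small sketch of the id generation, Python's standard `uuid` module can produce UUIDv4 values. The `new_tournament_id` helper name is an assumption for illustration.

```python
import uuid


def new_tournament_id() -> str:
    """Generate a random UUIDv4 string to use as a tournament partition key."""
    return str(uuid.uuid4())


tournament_id = new_tournament_id()
print(tournament_id)
```

Because UUIDv4 values are random, they avoid the clustering you would get from sequential ids or from keys derived from dates.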
In this layout, we have 4 individual items, all with the same partition key and ordered by the sort key. You’ll also notice that each sort key is either prefixed with a description (such as team-) or is just a hardcoded value (such as tournament-details). The queries below show why we do this. Each of these items has its own unique set of attributes, and they can all be retrieved with one simple query call to DynamoDB.
{
  "TableName": "tournaments",
  "KeyConditionExpression": "partitionKey = :tournamentId",
  "ExpressionAttributeValues": {
    ":tournamentId": "983d39a3-bdd6-4b61-88d5-58595d555b81"
  }
}
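To show what this query returns conceptually, here is an in-memory sketch of a table grouped by partition key and ordered by sort key. It does not talk to DynamoDB. The item attributes, the team names, and the `build_table` and `query` helpers are all hypothetical.

```python
from collections import defaultdict

TOURNAMENT_ID = "983d39a3-bdd6-4b61-88d5-58595d555b81"

# Hypothetical items; only the key names and prefixes come from the article.
items = [
    {"partitionKey": TOURNAMENT_ID, "sortKey": "tournament-details", "game": "chess"},
    {"partitionKey": TOURNAMENT_ID, "sortKey": "team-red"},
    {"partitionKey": TOURNAMENT_ID, "sortKey": "team-blue"},
    {"partitionKey": "some-other-tournament", "sortKey": "tournament-details"},
]


def build_table(items):
    """Group items by partition key and keep each group ordered by sort key."""
    table = defaultdict(list)
    for item in items:
        table[item["partitionKey"]].append(item)
    for collection in table.values():
        collection.sort(key=lambda item: item["sortKey"])
    return table


def query(table, partition_key):
    """Return every item stored under one partition key, in sort-key order."""
    return list(table.get(partition_key, []))


table = build_table(items)
for item in query(table, TOURNAMENT_ID):
    print(item["sortKey"])
```

One lookup by partition key returns the whole tournament, which is what makes grouping related items under one key attractive.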
What if you only want the teams for a given tournament id?
This is where the prefix team- comes in handy. Since we prefixed all of the team item sort keys with team-, we can use the begins_with function in our KeyConditionExpression. This query call retrieves all of the teams for a given tournament id (partition key).
{
  "TableName": "tournaments",
  "KeyConditionExpression": "partitionKey = :tournamentId and begins_with(sortKey, :teamPrefix)",
  "ExpressionAttributeValues": {
    ":teamPrefix": "team-",
    ":tournamentId": "983d39a3-bdd6-4b61-88d5-58595d555b81"
  }
}
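A prefix match is cheap when keys are already sorted, because all matching keys sit next to each other. The sketch below illustrates this with a binary search over a sorted list. It is an analogy, not DynamoDB's implementation, and the `player-` and `match-` sort keys are hypothetical examples.

```python
from bisect import bisect_left


def begins_with(sorted_keys, prefix):
    """Return the contiguous run of keys that start with the given prefix.

    Assumes sorted_keys is already in ascending order.
    """
    start = bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


# Hypothetical sort keys for one tournament, stored in sorted order.
sort_keys = sorted(
    ["tournament-details", "team-red", "team-blue", "player-1", "match-1"]
)
print(begins_with(sort_keys, "team-"))
```

The search jumps straight to the first team- key and stops at the first key that no longer matches, so unrelated items are never scanned.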
What if you only want the basic details?
Since we know both the partition key and the sort key, we can simply perform a DynamoDB GetItem call.
{
  "TableName": "tournaments",
  "Key": {
    "partitionKey": "983d39a3-bdd6-4b61-88d5-58595d555b81",
    "sortKey": "tournament-details"
  }
}
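The request above uses plain JSON values, which is the shape document-style clients accept. The low-level DynamoDB API instead wraps each value in a type descriptor such as {"S": "..."} for strings. Here is a minimal sketch of that conversion, covering only strings, numbers, and booleans. The helper names are assumptions, and real SDKs ship their own serializers.

```python
def to_attribute_value(value):
    """Wrap a plain Python value in a DynamoDB type descriptor."""
    if isinstance(value, bool):  # bool must be checked before int
        return {"BOOL": value}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers are sent as strings
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def to_low_level_key(key):
    """Convert a plain key dictionary into its typed low-level form."""
    return {name: to_attribute_value(value) for name, value in key.items()}


request = {
    "TableName": "tournaments",
    "Key": to_low_level_key(
        {
            "partitionKey": "983d39a3-bdd6-4b61-88d5-58595d555b81",
            "sortKey": "tournament-details",
        }
    ),
}
print(request)
```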
Wrapping Up
I hope this helps you out in your journey of modeling data for NoSQL databases like DynamoDB. It certainly took me quite a while to wrap my head around some of the patterns and techniques that I’ve tried to outline here. That said, I wanted to share the knowledge I have gained in hopes of giving you a head start when it comes to modeling your data.
Source: Paper.li