Graph Database in SQL Server

The Tech Platform
Feb 1, 2022
8 min read

Graph

A graph is a collection of node and edge tables. Node or edge tables can be created under any schema in the database, but they all belong to one logical graph. A node table is collection of similar type of nodes. For example, a Person node table holds all the Person nodes belonging to a graph.

Graph Database

Graph Database uses graph structure for semantic queries with nodes and edges which is used to represent and store the data. A key concept of the system is graph or edge or relationship. The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation. Graph databases hold the relationships between data as a priority. Querying relationships is fast because they are perpetually stored in the database. Relationships can be intuitively visualized using graph databases, making them useful for heavily inter-connected data

Some of the important features of a graph databases are:

Edges or relationships are first class entities in a Graph Database and can have attributes or properties associated with them.
A single edge can flexibly connect multiple nodes in a Graph Database.
We can express pattern matching and multi-hop navigation queries easily.
We can express transitive closure and polymorphic queries easily.

Architecture:

Example:

Imagine we have the following social graph. Simon is new to LinkedIn. He has recently signed up and has made connections with Emma and Sacha. We want to suggest connections to Simon based on the people Sacha and Emma know. In the diagram below you can see our social graph. We want to recommend the people in light blue. We can do this by moving one level up in our hierarch away from Simon and collecting the returned results.

Each of the elements on the diagram above has a name. The circles are referred to as Nodes. The lines connecting them are relationships or Edges. You can think of a node as an entity (a thing, a person, an account, a movie, an actor etc). The Edge is what connects them, it is what gives them context. A person knows a person, a movie is directed by a director. Node->Relationship->Node OR Node->Edge->Node. There can be multiple nodes and multiple edges. Simon might know Emma and work with Sacha (know and work are our edges/relationships). With this set up we can then ask questions such “Of the people who know Simon, who do they know?”. This would return our people coloured blue. A query that is relatively trivial in SQL at this level becomes incredibly complex when you add in “now show me who each of the blue nodes knows” and so forth, moving up and up the nodes.

This query pattern is perfect for creating a recommendation engine. Simon bought a book on Lambda Architecture, what did people who bought the same book also buy. That query would be very hard (yes possible, but complicated) to write in SQL. So graph databases were built to solve this problem. There are a few big players already in the Graph world, Neo4j being the big one. Neo4j has its own SQL like syntax for writing these matching processes call Cypher Query Language or CQL. CQL is very mature and we equipped to answer complex graph queries.

What is Node Table?

A node table is collection of similar type of nodes. For example, a Person node table holds all the Person nodes belonging to a graph. Similarly, an edge table is a collection of similar type of edges. For example, a Friends edge table holds all the edges that connect a Person to another Person

What is Edge Table?

An edge table represents a relationship in a graph. Edges are always directed and connect two nodes. An edge table enables users to model many-to-many relationships in the graph. An edge table may or may not have any user-defined attributes in it. Every time an edge table is created, along with the user-defined attributes, three implicit columns are created in the edge table:

$edge_id
$from-id
$to-id

Types of Graph Database

There are three types of Graph Database:

True graph database
triple stores
conventional database

True Graph Database:

Triple Stores :

Graph Database that stores data as network of objects and uses inference to uncover new information out of existing relations. It's flexible and dynamic nature allows linking diverse data, indexing it for semantic search and enriching it via text analysis to build big knowledge graphs.

The RDF triple store is a type of graph database that stores semantic facts. RDF, which stands for Resource Description Framework, is a model for data publishing and interchange on the Web standardized by W3C

Conventional Database:

It refer to databases that may or may not contain temporal data about persistent objects (and that usually do), but that do not contain explicitly bi-temporal tables and instead incorporate temporal data by using variations on one or more of the ad hoc methods

How to Model a Graph Database

Surely, we won’t go into much detail of all the features mentioned. But let’s define the basic graph database for this purpose.

Below are the 2 simple steps:

Define the nodes.
Define the edges.

Let’s start by defining the nodes in this sample:

Restaurant — Restaurants being serviced by the food delivery company.
Locations — Locations of restaurants and customers.
Orders — Orders made by customers to restaurants.
OrderDetails — Details of orders including food or beverages ordered and the quantities.
FoodBeverages — List of food and beverages.
Customers — List of customers.

Now, below are the edges:

isServed — Food is served in Restaurant.
isLocated — Restaurant is located in Location.
willReceiveIn — Customer will receive order in Location.
ordered — Customer made Order.
isIncluded — Food is included in Order.
received — Restaurants received Orders.
include — Order detail includes Food/beverage.

The conceptual model will look like this:

Figure 1: The graph database conceptual model for the food-delivery system

As a comparison, if we design a relational database diagram with all the primary and foreign keys, it will look like this:

Figure 2: The equivalent relational database design for the food delivery system

Notice that we needed the additional table RestaurantLocations to join Restaurants and Locations.

And if you create a database diagram of the conceptual model of the graph database in SSMS, it will look like floating objects with no relationships, just as shown below:

Figure 3: SSMS database diagram of the graph database model of the food delivery system

The database diagram in SSMS is no use if you want to view the relationships between the nodes and the edges graphically. Also, pay attention to the autogenerated columns. As I have already said, do not remove these columns or bother putting data into them.

How to Create Your First SQL Server Graph Database

Next, after you draw the nodes and edges on paper or any diagramming software, create the tables for the nodes and edges in SSMS using the query editor.

Below is the required syntax:

-- For a Node table
CREATE TABLE <tablename>
(
    column1 <datatype>,
    column2 <datatype>,
    columnN <datatype>
) AS NODE-- For an Edge table
CREATE TABLE <tablename>
(
    column1 <datatype>,
    column2 <datatype>,
    columnN <datatype>
) AS EDGE

Notice the keywords AS NODE and AS EDGE. These keywords differentiate a graph table from other types of tables.

But if you prefer, right-click the Tables folder in SSMS and select New -> Graph Table -> Node Table. And for edge tables, select New-> Graph Table -> Edge Table. See the screenshot in Figure 5 below:

Figure 4: Creating node and edge tables in SQL Server Management Studio (SSMS)

Following that, a new query window will appear. Fill in the table name and fields you need, then execute the commands.

Indexes will be created automatically for $node_ids of your node tables and for the $edge_ids of your edge tables. But you must create one more index for the $from_id and $to_id of each of your edge tables for performance reasons.

See the example below:

CREATE UNIQUE INDEX ix_ordered_from_to on ordered ($from_id, $to_id)

Now here’s the complete script of the node and edge tables we need:

CREATE TABLE Restaurants
(
[RestaurantID] INT PRIMARY KEY NOT NULL,
[Name] VARCHAR(20) NOT NULL,
[Description] VARCHAR(50) NOT NULL,
[Opening] TIME NOT NULL,
[Closing] TIME NOT NULL,
[DeliveryCharge] SMALLMONEY DEFAULT 0 NOT NULL
) AS NODE

CREATE TABLE Locations
(
[LocationID] INT PRIMARY KEY NOT NULL,
[Description] VARCHAR(50) NOT NULL,
[GeoLocation] GEOGRAPHY NOT NULL
) AS NODE

CREATE TABLE FoodBeverages
(
[FoodBeverageID] INT PRIMARY KEY NOT NULL,
[Name] VARCHAR(50) NOT NULL,
[Amount] MONEY NOT NULL
) AS NODE

CREATE TABLE Orders
(
[OrderID] INT PRIMARY KEY NOT NULL,
[CustomerID] INT NOT NULL,
[OrderDate] DATETIME NOT NULL,
[TotalCost] MONEY NOT NULL,
[LocationID] INT NOT NULL,
[RestaurantID] INT NOT NULL
) AS NODE

CREATE TABLE OrderDetails
(
[OrderDetailID] INT PRIMARY KEY NOT NULL,
[OrderID] INT NOT NULL,
[FoodBeverageID] INT NOT NULL,
[Quantity] TINYINT NOT NULL,
[UnitCost] MONEY NOT NULL,
[ComputedCost] MONEY NOT NULL
) AS NODE

CREATE TABLE Customers
(
[CustomerID] INT PRIMARY KEY NOT NULL,
[Name] VARCHAR(50) NOT NULL,
[email] VARCHAR(50) NOT NULL,
[password] VARCHAR(10) NOT NULL
) AS NODE

CREATE TABLE isServed AS EDGE
CREATE TABLE isLocated AS EDGE
CREATE TABLE ordered AS EDGE
CREATE TABLE isIncluded AS EDGE
CREATE TABLE received AS EDGE
CREATE TABLE includes AS EDGE
CREATE TABLE willReceiveIn AS EDGE

CREATE UNIQUE INDEX ix_isIncludes_from_to on isIncluded ($from_id, $to_id)
CREATE UNIQUE INDEX ix_includes_from_to on includes ($from_id, $to_id)
CREATE UNIQUE INDEX ix_ordered_from_to on ordered ($from_id, $to_id)
CREATE UNIQUE INDEX ix_received_from_to on received ($from_id, $to_id)
CREATE UNIQUE INDEX ix_isServed_from_to on isServed ($from_id, $to_id)
CREATE UNIQUE INDEX ix_isLocated_from_to on isLocated ($from_id, $to_id)
CREATE UNIQUE INDEX ix_willReceiveIn_from_to on willReceiveIn ($from_id, $to_id)

Inserting Your Initial Data

Preparing your data for the first time is kind of a pain because you need to insert data not just in the node tables but in the edge tables as well. Still, this will be advantageous for shorter and simpler queries. More on this later.

You will use an INSERT statement in the node and edge tables just like you do in other table types. And you can start adding records to node tables, as in the example below:

INSERT INTO Restaurants
VALUES
(1, 'Subway','American * Sandwiches * Fast Food * Salad','10:00','15:59',0)INSERT INTO FoodBeverages
VALUES
(1,'Sliced Chicken and Ham',149.50),
(2,'Subway Melt',184.00),
(3,'Meatball Marinara Regular Meal',281.75),
(4,'Coke in Can',51.75)

For the edge tables, you will need a node ID for the $from_id and another node ID for $to_id, just like the one below:

--  food&beverages served on Subway (RestaurantID = 1)

INSERT INTO isServed
VALUES
-- for the Sliced Chicken and Ham
(
 (SELECT $node_id FROM Restaurants r WHERE r.RestaurantID = 1),
 (SELECT $node_id FROM FoodBeverages fb WHERE fb.FoodBeverageID = 1)  
),
-- for the Subway Melt
(
 (SELECT $node_id FROM Restaurants r WHERE r.RestaurantID = 1),
 (SELECT $node_id FROM FoodBeverages fb WHERE fb.FoodBeverageID = 2) 
),
-- for the Meatball Marinara Regular Meal
(
 (SELECT $node_id FROM Restaurants r WHERE r.RestaurantID = 1),
 (SELECT $node_id FROM FoodBeverages fb WHERE fb.FoodBeverageID = 3)  
),
-- for  the coke in can
(
 (SELECT $node_id FROM Restaurants r WHERE r.RestaurantID = 1),
 (SELECT $node_id FROM FoodBeverages fb WHERE fb.FoodBeverageID = 4)

In the above example, we established a relationship between the restaurants and the food they serve. So, the node IDs of Restaurants and FoodBeverages were used.

And here’s the point: to establish relationships between node tables, you add records to edge tables.

Limitations:

Temporary tables, table type variables, system-versioned temporal tables, and memory-optimized tables cannot be node or edge tables.
In SQL Server Management Studio (SSMS), right-clicking a node or edge table will not show the Design menu item, unlike regular tables.
Creating indexes for $node_id, $from_id, and $to_id using the GUI in SSMS will generate an error. Use CREATE INDEX in the query editor instead.
You cannot update $from_id and $to_id using the UPDATE statement. If you need to change a relationship between 2 nodes, insert a new record with the new $from_id and $to_id, and delete the old record.
Cross-database queries on graph objects are not supported.
You cannot change a node table to an edge table and vice versa.
SQL Server cannot guess which node is related to another node. It’s up to you to provide the relationship of each node.