What is Data Modeling? (Graph DB)
In the last article, we looked at data modeling with a relational database (RDB), using a flight reservation system as an example. As noted in ‘What is Data Modeling? (RDB)’, this article introduces the graph database (GDB) and examines data modeling with GDB.
This article may be helpful to those who are either new to graph databases or struggling with modeling for analysis.
Data Modeling in GDB
The graph data model is not limited to social network services; it can represent data in many domains. Let’s look at the flight reservation system as an example.
GDB Components
To better understand the process of designing a graph data model, we must go over the components of the graph database. Property graphs have the following components.
- Vertex (Node)
A vertex (node) represents an entity. A node contains properties, which hold data as name-value pairs. You can also assign one or more labels to a node to indicate its role or type. A node can also be expressed in JSON.
- Edge (Link / Relationship)
An edge is a relationship between nodes. An edge has a relationship type, and its properties describe the characteristics of that relationship.
- Label
A label is a group of elements that share similar properties. Categories such as people, animals, and automobiles are examples of label names.
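The three components above can be illustrated with a minimal Python sketch. It is not tied to any particular graph database; the class names (Vertex, Edge), the helper vertices_with_label, and the sample passenger and flight data are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    """An entity with labels (role or type) and name-value properties."""
    vid: int
    labels: List[str]
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """A typed, directed relationship between two vertices, with its own properties."""
    source: int
    target: int
    rel_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


def vertices_with_label(vertices: List[Vertex], label: str) -> List[Vertex]:
    """Return the vertices that carry the given label."""
    return [v for v in vertices if label in v.labels]


# Hypothetical flight-reservation data.
passenger = Vertex(1, ["Passenger"], {"name": "Kim"})
flight = Vertex(2, ["Flight"], {"flight_no": "F100"})
booked = Edge(passenger.vid, flight.vid, "BOOKED", {"seat": "12A"})

# A vertex maps naturally onto a JSON document.
passenger_json = json.dumps(asdict(passenger))
```

Here the label plays the role of a lightweight type tag, while the seat number lives on the edge because it describes the booking relationship rather than the passenger or the flight.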
The modeling process is similar to that of RDB, but there is a slight difference as explained below.
1. Identifying the needs
Same as RDB.
2. Designing a conceptual data model (Whiteboard Modeling)
In this process, naming rules are set in consideration of clarity, relevance, uniqueness, and consistency. For each subject, derive the main nodes, the individual nodes, and the edges between them. Proceed freely, as if drawing on a whiteboard.
3. Designing a logical data model
In this step, nodes, labels, and edges are defined, and the resulting graph data model design is created and verified.
4. Designing a physical data model
In this step, properties are defined, along with every label (node or edge) and property that requires a unique constraint or an index. The model is also examined from a physical standpoint, for example by designing the system configuration and estimating data capacity.
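To make the idea of a unique constraint backed by an index concrete, here is a minimal in-memory sketch in Python. The UniqueIndex class and the flight properties are hypothetical; a real graph database provides its own constraint and index syntax, which is not shown here.

```python
class UniqueIndex:
    """Hypothetical in-memory index that enforces a unique constraint on one property."""

    def __init__(self, label, prop):
        self.label = label
        self.prop = prop
        self._entries = {}  # property value -> node properties

    def add(self, properties):
        """Register a node, rejecting duplicates of the constrained property."""
        key = properties[self.prop]
        if key in self._entries:
            raise ValueError(f"{self.label}.{self.prop}={key!r} already exists")
        self._entries[key] = properties

    def find(self, key):
        """Look a node up directly by the indexed property instead of scanning."""
        return self._entries.get(key)


# Assumed example: flight numbers must be unique within the Flight label.
flight_index = UniqueIndex("Flight", "flight_no")
flight_index.add({"flight_no": "F100", "origin": "ICN", "destination": "SFO"})
```

The same structure serves both purposes: the dictionary lookup is the index, and the duplicate check in add is the unique constraint.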
GDB Modeling TIP
There is no right or wrong way to model graphs. It is up to the modeler to decide which approach best fits their priorities. To find the data model that best suits your needs, it often helps to evaluate candidate models with a few analytic techniques. Below are some methods to help you determine a data model.
- Writing and prioritizing queries
Knowing what you want to ask of your data can help you determine the structure of your data model. For example, if you know that a query must return results for a specific date, the date should be stored as a separate node or relationship, not as a property of a node. Even if you don’t know the exact queries yet, you can model more accurately by understanding the purpose of the system being built and then shaping the model around those needs.
It is also quite difficult to find a model that is perfect for every query or function. Some things can always be improved, but the most important thing is to decide which model best suits your needs. Prioritize the queries that need the highest performance or that play a significant role in achieving your goal. If query priorities are adjusted as needed, the model can be implemented more flexibly.
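The date example can be sketched in Python as follows. This is only an illustration with assumed names (flights, departs_on, and the two lookup functions) and made-up data: the first function treats the date as a node property and scans every flight, while the second treats each date as its own node and follows its edges.

```python
from collections import defaultdict

# Hypothetical flights; in option A the date is a plain node property.
flights = [
    {"flight_no": "F100", "date": "2024-05-01"},
    {"flight_no": "F200", "date": "2024-05-01"},
    {"flight_no": "F300", "date": "2024-05-02"},
]


def flights_on_by_scan(date):
    """Option A: check the date property of every Flight node."""
    return [f["flight_no"] for f in flights if f["date"] == date]


# Option B: each date is its own node, linked to flights by a DEPARTS_ON edge.
# The mapping below stands in for the edges attached to each date node.
departs_on = defaultdict(list)
for f in flights:
    departs_on[f["date"]].append(f["flight_no"])


def flights_on_by_traversal(date):
    """Option B: start at the date node and follow its edges."""
    return list(departs_on.get(date, []))
```

Both functions return the same flights; the difference is that option B only touches the flights connected to the requested date, which is why a date-driven query can favor the separate-node design.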
- Setting node/edge property
Nodes represent existing entities and edges represent relationships and meanings between entities.
- Expressing relationship
An active-passive relationship pair, or a relationship expressed in both directions, is better expressed as a single relationship. In that case, create a direction rule using the properties. However, if you want to match a relationship regardless of direction, use either ‘-[]-’ or ‘<-[]->’ when writing a query. In addition, we recommend creating a relationship node when a single relationship connects multiple nodes in a system.
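The two recommendations above can be sketched in Python. The edge list, the relationship type names (OPERATES, MADE, FOR_FLIGHT, ASSIGNS), and the neighbors helper are assumptions for illustration. Each relationship is stored once, with the acting side as the source, and a booking is modeled as a relationship node because it ties a passenger, a flight, and a seat together.

```python
# Each relationship is stored once. Assumed direction rule: the acting side is the source.
edges = [
    ("Airline:EA", "OPERATES", "Flight:F100"),
    # Relationship node: one booking connects a passenger, a flight, and a seat.
    ("Passenger:Kim", "MADE", "Booking:B1"),
    ("Booking:B1", "FOR_FLIGHT", "Flight:F100"),
    ("Booking:B1", "ASSIGNS", "Seat:12A"),
]


def neighbors(node, rel_type, direction="both"):
    """Follow edges of one type from a node: 'out', 'in', or ignoring direction."""
    outgoing = [t for s, r, t in edges if r == rel_type and s == node]
    incoming = [s for s, r, t in edges if r == rel_type and t == node]
    if direction == "out":
        return outgoing
    if direction == "in":
        return incoming
    return outgoing + incoming
```

With direction="both", the helper behaves like an undirected pattern in a query: the passive reading ("which airline is this flight operated by?") is answered from the single stored OPERATES edge, so a second OPERATED_BY edge is unnecessary.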
- Performing a model test
Scenarios you did not anticipate in the design stage can be uncovered and resolved through actual model testing. Also, when choosing between more than one model, create a proof-of-concept test for each model and see how each one works. Load a portion of the data and run tests and queries on the system. Verify, visually where possible, that the results meet your requirements and performance goals.
- Just in case…
First, refrain from creating unnecessary nodes and edges: where possible, reduce the number of edges and use a property instead. If an actual graph has significantly more edges than nodes, GDB modeling is performed using a line graph, because a high density of nodes may cause a problem. A line graph is a graph in which the node-edge relationship is replaced by the edge-node relationship. If the density of a node increases because it has too many types of properties, consider separating some of them out with an edge.
Bayesian network modeling should be treated separately from graph data modeling. The basic rules above do not apply to it because the modeling logic is completely different. RDB is more effective for storing log data or long attribute values.
Fine-grained labels are more computationally efficient than generic labels combined with properties. If the data forms an unconnected graph, there is no need to use GDB, because unconnected graphs are inefficient and unproductive patterns to store as graphs.
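As an illustration of the line graph idea, the sketch below assumes the standard graph-theory definition: every edge of the original graph becomes a node, and two of these new nodes are linked when the original edges share an endpoint. The line_graph function and the airport codes are hypothetical, and the edges are treated as undirected for simplicity.

```python
from itertools import combinations


def line_graph(edges):
    """Build the line graph of an undirected graph given as (u, v) pairs.

    Returns (nodes, links): each original edge becomes a node, and two such
    nodes are linked when the original edges share an endpoint.
    """
    nodes = list(edges)
    links = [
        (e1, e2)
        for e1, e2 in combinations(nodes, 2)
        if set(e1) & set(e2)
    ]
    return nodes, links


# Hypothetical route graph: airports as nodes, routes as edges.
routes = [("ICN", "NRT"), ("NRT", "SFO"), ("SFO", "JFK")]
route_nodes, route_links = line_graph(routes)
```

In the result, each route is a node in its own right, so properties and further relationships can be attached to routes directly.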
Once data modeling in RDB is complete, the model can be converted to a GDB data model by following specific rules. The following section explains these rules.
RDB to GDB
From a certain point of view, the GDB can be thought of as a next-generation RDBMS that defines relationships explicitly, unlike the foreign keys of an RDBMS. In the native property graph model, the relationship between nodes (or data entities) is directly and physically formed. These relationships have a direction and can carry additional relationship-related information. Unlike an RDBMS, which requires search-and-match calculation with table joins, the GDB can find data by moving between nodes.
Converting an RDB data model to GDB follows the process below, using the flight reservation system as an example:
- Each entity table is converted to a node label
- Rows in the table are converted to nodes
- Table columns are converted to node properties
- Remove the physical primary key in the table (keep the logical key)
- Apply a unique constraint to the converted primary key
- Create an index on each property value that requires one
- Convert each table foreign key to a relationship, then remove the foreign key
- If there is a default value in the column, remove it (not required in GDB)
- Denormalized tables created for performance or business purposes are split into separate nodes
- Convert JOIN tables to relationships. If a JOIN table contains data columns in addition to its foreign keys, convert those columns to relationship properties
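Several of the rules above can be sketched in Python: tables to labels, rows to nodes, columns to properties, a unique check on the key, foreign keys to relationships, and JOIN tables to relationships with properties. All table contents, column names, relationship types, and helper functions here are hypothetical and are not taken from the flight reservation model of the previous article.

```python
# Hypothetical relational tables, each row as a dict.
airlines = [{"airline_id": 100, "name": "Example Air"}]
flights = [{"flight_id": 10, "flight_no": "F100", "airline_id": 100}]
passengers = [{"passenger_id": 1, "name": "Kim"}, {"passenger_id": 2, "name": "Lee"}]
bookings = [{"passenger_id": 1, "flight_id": 10, "seat": "12A"}]  # JOIN table


def rows_to_nodes(label, rows, key, foreign_keys=()):
    """Table -> label, row -> node, column -> property; foreign keys are dropped."""
    nodes = {}
    for row in rows:
        node_id = (label, row[key])
        if node_id in nodes:  # unique constraint on the key
            raise ValueError(f"duplicate key {node_id}")
        nodes[node_id] = {k: v for k, v in row.items() if k not in foreign_keys}
    return nodes


def foreign_key_to_edges(rows, src_label, src_key, rel_type, dst_label, fk):
    """Each foreign key value becomes a relationship to the referenced node."""
    return [((src_label, r[src_key]), rel_type, (dst_label, r[fk]), {}) for r in rows]


def join_table_to_edges(rows, src_label, src_fk, rel_type, dst_label, dst_fk):
    """Each JOIN row becomes a relationship; remaining columns become its properties."""
    edges = []
    for r in rows:
        props = {k: v for k, v in r.items() if k not in (src_fk, dst_fk)}
        edges.append(((src_label, r[src_fk]), rel_type, (dst_label, r[dst_fk]), props))
    return edges


nodes = {}
nodes.update(rows_to_nodes("Airline", airlines, "airline_id"))
nodes.update(rows_to_nodes("Flight", flights, "flight_id", foreign_keys=("airline_id",)))
nodes.update(rows_to_nodes("Passenger", passengers, "passenger_id"))

edges = foreign_key_to_edges(
    flights, "Flight", "flight_id", "OPERATED_BY", "Airline", "airline_id"
) + join_table_to_edges(
    bookings, "Passenger", "passenger_id", "BOOKED", "Flight", "flight_id"
)
```

In this sketch the bookings table disappears entirely: its two foreign keys become the endpoints of a BOOKED relationship, and its seat column becomes a property of that relationship.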
The conversion to the completed GDB data model is as follows.
So far, we have performed data modeling of RDB and GDB with a simple example, looked at the process, and compared the pros and cons of each method. Although the article dealt with a theoretical approach, the modeling method used in practice may differ due to other considerations.