Graph database concepts

Introduction

This guide covers the fundamental concepts of graph databases.

Neo4j uses a property graph database model. A graph data structure consists of nodes (discrete objects) that can be connected by relationships. The figure below shows a graph with three nodes (the circles) and three relationships (the arrows).

Figure 1. Concept of a graph structure

The Neo4j property graph database model consists of:

  • Nodes describe entities (discrete objects) of a domain.

  • Nodes can have zero or more labels to define (classify) what kind of nodes they are.

  • Relationships describe a connection between a source node and a target node.

  • Relationships always have a direction (one direction).

  • Relationships must have a type (one type) to define (classify) what type of relationship they are.

  • Nodes and relationships can have properties (key-value pairs), which further describe them.

In mathematics, graph theory is the study of graphs.

In graph theory:

  • Nodes are also referred to as vertices or points.

  • Relationships are also referred to as edges, links, or lines.

Example graph

The example graph shown below introduces the basic concepts of the property graph:

Figure 2. Example graph

To create the example graph, use the Cypher® clause CREATE.

CREATE (:Person:Actor {name: 'Tom Hanks', born: 1956})-[:ACTED_IN {roles: ['Forrest']}]->(:Movie {title: 'Forrest Gump', released: 1994})<-[:DIRECTED]-(:Person {name: 'Robert Zemeckis', born: 1951})
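
Once the graph exists, it can be read back with MATCH. The following is a minimal sketch, assuming the database contains only the example data created above; the column aliases (person, relationship, movie) are illustrative names, not part of the original example.

```cypher
// Find every Person node that points at a Movie node,
// and report the type of the connecting relationship.
// The aliases after AS are illustrative.
MATCH (p:Person)-[r]->(m:Movie)
RETURN p.name AS person, type(r) AS relationship, m.title AS movie
```

Against the example data, this should yield one row for Tom Hanks (ACTED_IN) and one for Robert Zemeckis (DIRECTED).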

Node

Nodes are used to represent entities (discrete objects) of a domain.

The simplest possible graph is a single node with no relationships. Consider the following graph, consisting of a single node.

Figure 3. Node

The node labels are:

  • Person

  • Actor

The properties are:

  • name: Tom Hanks

  • born: 1956

The node can be created with Cypher using the query:

CREATE (:Person:Actor {name: 'Tom Hanks', born: 1956})
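
To check what was stored, the node can be looked up again and its labels and properties returned. This is a sketch that assumes the node above has been created; the column aliases are illustrative.

```cypher
// Match the node by one of its labels and its name property,
// then return its labels and both property values.
MATCH (n:Person {name: 'Tom Hanks'})
RETURN labels(n) AS labels, n.name AS name, n.born AS born
```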

Node labels

Labels shape the domain by grouping (classifying) nodes into sets where all nodes with a certain label belong to the same set.

For example, all nodes representing users could be labeled with the label User. With that in place, you can ask Neo4j to perform operations only on your user nodes, such as finding all users with a given name.
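
A label-restricted lookup of this kind might look as follows. The name 'Alice' is a hypothetical value chosen for illustration, and the query assumes that User nodes carry a name property.

```cypher
// Only nodes labeled User are considered; 'Alice' is a hypothetical name.
MATCH (u:User {name: 'Alice'})
RETURN u
```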

Since labels can be added and removed at runtime, they can also be used to mark temporary states for nodes. For example, a Suspended label could denote bank accounts that are suspended, and a Seasonal label could denote vegetables that are currently in season.
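
Adding and removing a label on an existing node can be sketched with SET and REMOVE. The Account label, the number property, and its value are assumptions made for this illustration. The two statements are separated by semicolons, as accepted by tools such as Cypher Shell.

```cypher
// Mark a (hypothetical) account as suspended by adding a label.
MATCH (a:Account {number: '12345'})
SET a:Suspended;

// Lift the suspension by removing the label again.
MATCH (a:Account:Suspended {number: '12345'})
REMOVE a:Suspended;
```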

A node can have zero to many labels.

In the example graph, the node labels, Person, Actor, and Movie, are used to describe (classify) the nodes. More labels can be added to express different dimensions of the data.

The following graph shows the use of multiple labels.

Figure 4. Multiple labels

Relationship

A relationship describes how a source node and a target node are connected. It is possible for a node to have a relationship to itself.

A relationship:

  • Connects a source node and a target node.

  • Has a direction (one direction).

  • Must have a type (one type) to define (classify) what type of relationship it is.

  • Can have properties (key-value pairs), which further describe the relationship.

Relationships organize nodes into structures, allowing a graph to resemble a list, a tree, a map, or a compound entity — any of which may be combined into yet more complex, richly inter-connected structures.

Figure 5. Relationship

The relationship type is ACTED_IN.

The properties are:

  • roles: ['Forrest']

  • performance: 5

The roles property has an array value with a single item ('Forrest') in it.

The relationship can be created with Cypher using the query:

CREATE ()-[:ACTED_IN {roles: ['Forrest'], performance: 5}]->()

You must create or reference a source node and a target node to be able to create a relationship.
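
Referencing existing nodes, rather than creating anonymous ones, could look like this. The sketch assumes the Tom Hanks and Forrest Gump nodes from the example graph already exist; the variable names actor and movie are illustrative.

```cypher
// Look up the two existing nodes first ...
MATCH (actor:Person {name: 'Tom Hanks'}), (movie:Movie {title: 'Forrest Gump'})
// ... then create the relationship from the source node to the target node.
CREATE (actor)-[:ACTED_IN {roles: ['Forrest'], performance: 5}]->(movie)
```

Note that CREATE always adds a new relationship, so running this against the example graph would leave two ACTED_IN relationships between the same pair of nodes.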

Relationships always have a direction. However, the direction can be disregarded where it is not useful. This means that there is no need to add duplicate relationships in the opposite direction unless it is needed to describe the data model properly.
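
Disregarding the direction in a query is done by leaving the arrowhead out of the pattern. A minimal sketch, assuming the example graph:

```cypher
// No arrowhead on the relationship: the pattern matches ACTED_IN
// regardless of which node is the source and which is the target.
MATCH (m:Movie {title: 'Forrest Gump'})-[:ACTED_IN]-(p:Person)
RETURN p.name AS actor
```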

A node can have a relationship to itself. The fact that Tom Hanks KNOWS himself would be expressed as:

Figure 6. Relationship to a single node
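
In Cypher, such a self-relationship can be sketched by using the same variable as both source and target. This assumes the Tom Hanks node already exists.

```cypher
// The variable p is used on both ends, so the relationship
// starts and ends at the same node.
MATCH (p:Person {name: 'Tom Hanks'})
CREATE (p)-[:KNOWS]->(p)
```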

Relationship type

A relationship must have exactly one relationship type.

Below is an ACTED_IN relationship, with the Tom Hanks node as the source node and Forrest Gump as the target node.

Figure 7. Relationship type

Observe that the Tom Hanks node has an outgoing relationship, while the Forrest Gump node has an incoming relationship.
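
The type, source node, and target node of a relationship can be inspected with the functions type(), startNode(), and endNode(). The query below is a sketch that assumes the example graph; the column aliases are illustrative.

```cypher
// startNode(r) is the node with the outgoing relationship,
// endNode(r) is the node with the incoming relationship.
MATCH (:Person {name: 'Tom Hanks'})-[r:ACTED_IN]->(:Movie {title: 'Forrest Gump'})
RETURN type(r) AS type, startNode(r).name AS source, endNode(r).title AS target
```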

Properties

Properties are key-value pairs that are used for storing data on nodes and relationships.

The value part of a property:

  • Can hold different data types, such as number, string, or boolean.

  • Can hold a homogeneous list (array) containing, for example, strings, numbers, or boolean values.

Example 1. Number
CREATE (:Example {a: 1, b: 3.14})
  • The property a has the type integer with the value 1.

  • The property b has the type float with the value 3.14.

Example 2. String and boolean
CREATE (:Example {c: 'This is an example string', d: true, e: false})
  • The property c has the type string with the value 'This is an example string'.

  • The property d has the type boolean with the value true.

  • The property e has the type boolean with the value false.

Example 3. Lists
CREATE (:Example {f: [1, 2, 3], g: [2.71, 3.14], h: ['abc', 'example'], i: [true, true, false]})
  • The property f contains an array with the value [1, 2, 3].

  • The property g contains an array with the value [2.71, 3.14].

  • The property h contains an array with the value ['abc', 'example'].

  • The property i contains an array with the value [true, true, false].

For a thorough description of the available data types, refer to the Cypher manual → Values and types.
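
Stored property values can be read back and used like any other value. The following sketch assumes the node from Example 3 has been created; the column aliases are illustrative.

```cypher
// Select the Example node that has the list property f,
// then read one list element and the size of another list.
MATCH (e:Example)
WHERE e.f IS NOT NULL
RETURN e.f[0] AS firstOfF, size(e.h) AS lengthOfH
```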

Traversals and paths

A traversal is how you query a graph in order to find answers to questions, for example: "What music do my friends like that I don’t yet own?", or "What web services are affected if this power supply goes down?".

Traversing a graph means visiting nodes by following relationships according to some rules. In most cases only a subset of the graph is visited.

Example 4. Path matching.

To find out which movies Tom Hanks acted in according to the tiny example database, the traversal would start from the Tom Hanks node, follow any ACTED_IN relationships connected to the node, and end up with the Movie node Forrest Gump as the result (see the black lines):

graphdb traversal arr

The traversal result could be returned as a path with the length 1:

graphdb path arr
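
The traversal described above can be sketched in Cypher by binding the matched pattern to a path variable. This assumes the example graph; the names path, movie, title, and pathLength are illustrative.

```cypher
// Start at the Tom Hanks node, follow outgoing ACTED_IN relationships,
// and return each movie reached together with the length of the path.
MATCH path = (:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(movie:Movie)
RETURN movie.title AS title, length(path) AS pathLength
```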

The shortest possible path has length zero. It contains a single node and no relationships.

Figure 8. Path of length zero

A path containing one relationship has the length of 1.

Figure 9. Path of length one
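
Both path lengths can be observed with the length() function. This is a sketch assuming the Tom Hanks node exists and has at least one outgoing relationship; the two statements are separated by semicolons, as accepted by tools such as Cypher Shell.

```cypher
// A pattern with a single node and no relationships: length(p) is 0.
MATCH p = (:Person {name: 'Tom Hanks'})
RETURN length(p) AS pathLength;

// A pattern with one outgoing relationship of any type: length(p) is 1.
MATCH p = (:Person {name: 'Tom Hanks'})-->()
RETURN length(p) AS pathLength;
```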

Schema

A schema in Neo4j refers to indexes and constraints.

Neo4j is often described as schema optional, meaning that it is not necessary to create indexes and constraints. You can create data — nodes, relationships and properties — without defining a schema up front. Indexes and constraints can be introduced when desired, in order to gain performance or modeling benefits.

Indexes

Indexes are used to increase performance. To see examples of how to work with indexes, see Using indexes. For detailed descriptions of how to work with indexes in Cypher, see the Cypher manual → Indexes.
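
As a rough illustration, an index on the name property of Person nodes might be created as shown below. The index name person_name is hypothetical, and the syntax assumes a recent Neo4j version; check the Cypher manual for the version in use.

```cypher
// person_name is a hypothetical index name chosen for this sketch.
CREATE INDEX person_name IF NOT EXISTS
FOR (p:Person) ON (p.name)
```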

Constraints

Constraints are used to make sure that the data adheres to the rules of the domain. To see examples of how to work with constraints, see Using constraints. For detailed descriptions of how to work with constraints in Cypher, see the Cypher manual → Constraints.
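
As a rough illustration, a rule such as "no two movies share a title" might be expressed as a uniqueness constraint. The constraint name movie_title_unique and the rule itself are assumptions made for this sketch, and the syntax assumes a recent Neo4j version.

```cypher
// movie_title_unique is a hypothetical constraint name.
// After this, creating a second Movie node with an existing title fails.
CREATE CONSTRAINT movie_title_unique IF NOT EXISTS
FOR (m:Movie) REQUIRE m.title IS UNIQUE
```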

Naming conventions

Node labels, relationship types, and property keys are case-sensitive. For example, the property name is different from the property Name.

The following naming conventions are recommended:

Table 1. Naming conventions

Graph entity      | Recommended style                                       | Example
Node label        | Camel case, beginning with an upper-case character      | :VehicleOwner rather than :vehicle_owner
Relationship type | Upper case, using underscore to separate words          | :OWNS_VEHICLE rather than :ownsVehicle
Property          | Lower camel case, beginning with a lower-case character | firstName rather than first_name

For the precise naming rules, refer to the Cypher manual → Naming rules and recommendations.
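
The three recommended styles can be seen together in a single statement. The Vehicle label, the modelName property, and all property values are hypothetical and chosen only for illustration.

```cypher
// Node labels: camel case with an upper-case first character.
// Relationship type: upper case with underscores.
// Property keys: lower camel case.
CREATE (:VehicleOwner {firstName: 'Ada'})-[:OWNS_VEHICLE]->(:Vehicle {modelName: 'Roadster'})
```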