Graph database concepts

Neo4j uses a property graph database model.

A graph data structure consists of nodes (discrete objects) that can be connected by relationships.

Example 1. Concept of a graph structure.

A graph with three nodes (the circles) and three relationships (the arrows).

graph concept three nodes

The Neo4j property graph database model consists of:

  • Nodes describe entities (discrete objects) of a domain.

  • Nodes can have zero or more labels to define (classify) what kind of nodes they are.

  • Relationships describes a connection between a source node and a target node.

  • Relationships always has a direction (one direction).

  • Relationships must have a type (one type) to define (classify) what type of relationship they are.

  • Nodes and relationships can have properties (key-value pairs), which further describe them.

In mathematics, graph theory is the study of graphs.

In graph therory:

  • Nodes are also refered to as vertices or points.

  • Relationships are also refered to as edges, links, or lines.

1. Example graph

The example graph shown below, introduces the basic concepts of the property graph:

Example 2. Example graph.
graph simple
Example 3. Cypher.

To create the example graph, use the Cypher clause CREATE.

CREATE (:Person:Actor {name: 'Tom Hanks', born: 1956})-[:ACTED_IN {roles: ['Forrest']}]->(:Movie {title: 'Forrest Gump'})<-[:DIRECTED]-(:Person {name: 'Robert Zemeckis', born: 1951})

2. Node

Nodes are used to represent entities (discrete objects) of a domain.

The simplest possible graph is a single node with no relationships. Consider the following graph, consisting of a single node.

Example 4. Node.
graph single node

The node labels are:

  • Person

  • Actor

The properties are:

  • name: Tom Hanks

  • born: 1956

The node can be created with Cypher using the query:

CREATE (:Person:Actor {name: 'Tom Hanks', born: 1956})

3. Node labels

Labels shape the domain by grouping (classifying) nodes into sets where all nodes with a certain label belong to the same set.

For example, all nodes representing users could be labeled with the label User. With that in place, you can ask Neo4j to perform operations only on your user nodes, such as finding all users with a given name.

Since labels can be added and removed during runtime, they can also be used to mark temporary states for nodes. A Suspended label could be used to denote bank accounts that are suspended, and a Seasonal label can denote vegetables that are currently in season.

A node can have zero to many labels.

In the example graph, the node labels, Person, Actor, and Movie, are used to describe (classify) the nodes. More labels can be added to express different dimensions of the data.

The following graph shows the use of multiple labels.

Example 5. Multiple labels.
graphdb simple labels multi

4. Relationship

A relationship describes how a connection between a source node and a target node are related. It is possible for a node to have a relationship to itself.

A relationship:

  • Connects a source node and a target node.

  • Has a direction (one direction).

  • Must have a type (one type) to define (classify) what type of relationship it is.

  • Can have properties (key-value pairs), which further describe the relationship.

Relationships organize nodes into structures, allowing a graph to resemble a list, a tree, a map, or a compound entity — any of which may be combined into yet more complex, richly inter-connected structures.

Example 6. Relationship.
graph example relationship

The node type: ACTED_IN

The properties are:

  • roles: ['Forrest']

  • performance: 5

The roles property has an array value with a single item ('Forrest') in it.

The relationship can be created with Cypher using the query:

CREATE ()-[:ACTED_IN {roles: ['Forrest'], performance: 5}]->()

You must create or reference a source node and a target node to be able to create a relationship.

Relationships always have a direction. However, the direction can be disregarded where it is not useful. This means that there is no need to add duplicate relationships in the opposite direction unless it is needed to describe the data model properly.

A node can have relationships to itself. To express that Tom Hanks KNOWS himself would be expressed as:

Example 7. Relationship to a single node.
graphdb nodes and rel self

5. Relationship type

A relationship must have exactly one relationship type.

Below is an ACTED_IN relationship, with the Tom Hanks node as the source node and Forrest Gump as the target node.

Example 8. Relationsip type.
graphdb nodes and rel

Observe that the Tom Hanks node has an outgoing relationship, while the Forrest Gump node has an incoming relationship.

6. Properties

Properties are key-value pairs that are used for storing data on nodes and relationships.

The value part of a property:

  • Can hold different data types, such as number, string, or boolean.

  • Can hold a homogeneous list (array) containing, for example, strings, numbers, or boolean values.

Example 9. Number
CREATE (:Example {a: 1, b: 3.14})
  • The property a has the type integer with the value 1.

  • The property b has the type float with the value 3.14.

Example 10. String and boolean
CREATE (:Example {c: 'This is an example string', d: true, e: false})
  • The property c has the type string with the value 'This is an example string'.

  • The property d has the type boolean with the value true.

  • The property e has the type boolean with the value false.

Example 11. Lists
CREATE (:Example {f: [1, 2, 3], g: [2.71, 3.14], h: ['abc', 'example'], i: [true, true, false]})
  • The property f contains an array with the value [1, 2, 3].

  • The property g contains an array with the value [2.71, 3.14].

  • The property h contains an array with the value ['abc', 'example'].

  • The property i contains an array with the value [true, true, false].

For a thorough description of the available data types, refer to the Cypher manual → Values and types.

7. Traversals and paths

A traversal is how you query a graph in order to find answers to questions, for example: "What music do my friends like that I don’t yet own?", or "What web services are affected if this power supply goes down?".

Traversing a graph means visiting nodes by following relationships according to some rules. In most cases only a subset of the graph is visited.

Example 12. Path matching.

To find out which movies Tom Hanks acted in according to the tiny example database, the traversal would start from the Tom Hanks node, follow any ACTED_IN relationships connected to the node, and end up with Forrest Gump as the result (see the dashed lines):

graphdb traversal

The traversal result could be returned as a path with the length 1:

graphdb path

The shortest possible path has length zero. It contains a single node and no relationships.

Example 13. Path of length zero.

A path containing only a single node has the length of 0.

graphdb path zero
Example 14. Path of length one.

A path containing one relationship has the lenght of 1.

graphdb path example loop

8. Schema

A schema in Neo4j refers to indexes and constraints.

Neo4j is often described as schema optional, meaning that it is not necessary to create indexes and constraints. You can create data — nodes, relationships and properties — without defining a schema up front. Indexes and constraints can be introduced when desired, in order to gain performance or modeling benefits.

9. Indexes

Indexes are used to increase performance. To see examples of how to work with indexes, see Using indexes. For detailed descriptions of how to work with indexes in Cypher, see Cypher Manual → Indexes.

10. Constraints

Constraints are used to make sure that the data adheres to the rules of the domain. To see examples of how to work with constraints, see Using constraints. For detailed descriptions of how to work with constraints in Cypher, see the Cypher manual → Constraints.

11. Naming conventions

Node labels, relationship types, and properties (the key part) are case sensitive, meaning, for example, that the property name is different from the property Name.

The following naming conventions are recommended:

Table 1. Naming conventions
Graph entity Recommended style Example

Node label

Camel case, beginning with an upper-case character

:VehicleOwner rather than :vehice_owner

Relationship type

Upper case, using underscore to separate words

:OWNS_VEHICLE rather than :ownsVehicle

Property

Lower camel case, beginning with a lower-case character

firstName rather than first_name

For the precise naming rules, refer to the Cypher manual → Naming rules and recommendations.