Chapter 2. Graph database concepts

This chapter presents an introduction to graph database concepts.

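This chapter includes the following sections: Example graph, Nodes, Labels, Relationships, Relationship types, Properties, Traversals and paths, Schema, and Naming rules and recommendations.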

2.1. Example graph

We will use the example graph below to introduce the basic concepts of the property graph:

[Figure: the example graph]

2.2. Nodes

Nodes are often used to represent entities. The simplest possible graph is a single node.

Consider the graph below, consisting of a single node.

[Figure: a graph consisting of a single node]

2.3. Labels

Labels are used to shape the domain by grouping nodes into sets: all nodes that have a certain label belong to the same set.

For example, all nodes representing users could be labeled with the label :User. With that in place, you can ask Neo4j to perform operations only on your user nodes, such as finding all users with a given name.

Since labels can be added and removed at runtime, they can also be used to mark temporary states for nodes. A :Suspended label could denote bank accounts that are suspended, and a :Seasonal label could denote vegetables that are currently in season.

A node can have zero or more labels.

In the example above, the nodes have the labels Person and Movie, which is one possible way of describing the data. But assume that we want to express different dimensions of the data. One way of doing that is to add more labels.

Below is an example showing the use of multiple labels:

[Figure: the example graph with multiple labels on its nodes]
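
The grouping behavior of labels can be sketched outside Neo4j with a small in-memory model. This is only an illustration in Python, not Neo4j's API: the Node class, the nodes_with_label helper, the :Account label and the sample property values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical in-memory node: a set of labels plus a property map."""
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


def nodes_with_label(nodes, label):
    """Return the nodes that carry the given label."""
    return [n for n in nodes if label in n.labels]


alice = Node({"User"}, {"name": "Alice"})    # hypothetical user node
account = Node({"Account"}, {"number": 1})   # hypothetical bank account node
unlabeled = Node()                           # a node may have no labels at all

nodes = [alice, account, unlabeled]
users = nodes_with_label(nodes, "User")

# Labels can be added and removed at runtime to mark a temporary state.
account.labels.add("Suspended")              # the node now has two labels
suspended_before = nodes_with_label(nodes, "Suspended")
account.labels.discard("Suspended")
suspended_after = nodes_with_label(nodes, "Suspended")
```

Because each node holds a set of labels, the same node can belong to several groups at once, which is the point of using multiple labels.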

2.4. Relationships

A relationship connects two nodes. Relationships organize nodes into structures, allowing a graph to resemble a list, a tree, a map, or a compound entity, any of which may be combined into yet more complex, richly interconnected structures.

Our example graph will make a lot more sense once we add relationships to it:

[Figure: the example graph with relationships added]

2.5. Relationship types

A relationship must have exactly one relationship type.

Our example uses ACTED_IN and DIRECTED as relationship types. The roles property on the ACTED_IN relationship has an array value with a single item in it.

Below is an ACTED_IN relationship, with the Tom Hanks node as the source node and Forrest Gump as the target node.

[Figure: an ACTED_IN relationship from the Tom Hanks node to the Forrest Gump node]

We observe that the Tom Hanks node has an outgoing relationship, while the Forrest Gump node has an incoming relationship.

Relationships always have a direction. However, you only have to pay attention to the direction where it is useful. This means that there is no need to add duplicate relationships in the opposite direction unless it is needed in order to properly describe your use case.

Note that a node can have relationships to itself. If we want to express that Tom Hanks KNOWS himself, that would be expressed as:

[Figure: the Tom Hanks node with a KNOWS relationship to itself]
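
The ideas of type, direction and self-relationships can be sketched with a small in-memory model. This is an illustration in Python rather than Neo4j's API; the Relationship class and the outgoing, incoming and connected helpers are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """A hypothetical relationship: one source, exactly one type, one target."""
    source: str
    rel_type: str
    target: str


acted_in = Relationship("Tom Hanks", "ACTED_IN", "Forrest Gump")
knows_self = Relationship("Tom Hanks", "KNOWS", "Tom Hanks")  # source == target
relationships = [acted_in, knows_self]


def outgoing(node, rels):
    """Relationships that start at the node."""
    return [r for r in rels if r.source == node]


def incoming(node, rels):
    """Relationships that end at the node."""
    return [r for r in rels if r.target == node]


def connected(node, rels):
    """Relationships that touch the node, ignoring direction."""
    return [r for r in rels if node in (r.source, r.target)]
```

Since connected ignores direction, a single stored ACTED_IN relationship is enough to get from Forrest Gump back to Tom Hanks; no duplicate relationship in the opposite direction is needed for that.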

2.6. Properties

Properties are name-value pairs that are used to add qualities to nodes and relationships.

In our example graphs, we have used the properties name and born on Person nodes, title and released on Movie nodes, and the property roles on the :ACTED_IN relationship.

The value part of the property can hold different data types such as number, string and boolean. For a thorough description of the available data types, refer to the Cypher manual.
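
As a rough illustration, property maps behave much like dictionaries of name-value pairs. The Python sketch below is not Neo4j's API, the property_types helper is hypothetical, and the concrete values are assumptions chosen for illustration, since the text names only the property keys.

```python
# Property maps as plain dictionaries (values are illustrative assumptions).
person = {"name": "Tom Hanks", "born": 1956}
movie = {"title": "Forrest Gump", "released": 1994}
acted_in = {"roles": ["Forrest"]}  # an array value with a single item


def property_types(properties):
    """Map each property name to the Python type name of its value."""
    return {name: type(value).__name__ for name, value in properties.items()}


person_types = property_types(person)
acted_in_types = property_types(acted_in)
```

The Python types here only stand in for the idea that a value has a data type; the Cypher manual remains the reference for the types Neo4j actually supports.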

2.7. Traversals and paths

A traversal is how you query a graph in order to find answers to questions, for example: "What music do my friends like that I don’t yet own?", or "What web services are affected if this power supply goes down?".

Traversing a graph means visiting nodes by following relationships according to some rules. In most cases only a subset of the graph is visited.

If we want to find out which movies Tom Hanks acted in according to our tiny example database, the traversal would start from the Tom Hanks node, follow any :ACTED_IN relationships connected to the node, and end up with Forrest Gump as the result (see the dashed lines):

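[Figure: traversal from the Tom Hanks node along ACTED_IN relationships to the Forrest Gump node, shown with dashed lines]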
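
The traversal described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how Neo4j executes queries: the triple encoding and the traverse helper are hypothetical, and the director node is an assumption, since the text names only the DIRECTED type.

```python
# Relationships as (source, type, target) triples; a hypothetical encoding.
relationships = [
    ("Tom Hanks", "ACTED_IN", "Forrest Gump"),
    ("Robert Zemeckis", "DIRECTED", "Forrest Gump"),  # assumed director node
]


def traverse(start, rel_type, rels):
    """Follow outgoing relationships of one type from the start node."""
    return [
        target
        for source, current_type, target in rels
        if source == start and current_type == rel_type
    ]


movies = traverse("Tom Hanks", "ACTED_IN", relationships)
```

Only relationships that start at the chosen node and have the chosen type are followed, so the DIRECTED relationship is never visited.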

The traversal result could be returned as a path of length one:

[Figure: a path of length one]

The path above has length one.

The shortest possible path has length zero. It contains a single node and no relationships. For example:

[Figure: a path of length zero, consisting of a single node]

This path has length one:

[Figure: a path of length one]
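
Path length counts relationships, not nodes. The Python sketch below makes that rule concrete; the Path class is a hypothetical model for illustration, not a Neo4j type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """A hypothetical path: n + 1 nodes joined by n relationships."""
    nodes: tuple
    relationships: tuple = ()

    def __post_init__(self):
        # A path always has exactly one more node than it has relationships.
        if len(self.nodes) != len(self.relationships) + 1:
            raise ValueError("a path needs one more node than relationships")

    @property
    def length(self):
        """The length of a path is its number of relationships."""
        return len(self.relationships)


single_node = Path(("Tom Hanks",))
acted = Path(("Tom Hanks", "Forrest Gump"), ("ACTED_IN",))
self_loop = Path(("Tom Hanks", "Tom Hanks"), ("KNOWS",))
```

Here single_node has length zero, while acted and self_loop both have length one, even though self_loop visits only one distinct node.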

2.8. Schema

A schema in Neo4j refers to indexes and constraints.

Neo4j is often described as schema optional, meaning that it is not necessary to create indexes and constraints. You can create data — nodes, relationships and properties — without defining a schema up front. Indexes and constraints can be introduced when desired, in order to gain performance or modeling benefits.

2.8.1. Indexes

Indexes are used to increase performance. To see examples of how to work with indexes, see Using indexes. For detailed descriptions of how to work with indexes in Cypher, see the Cypher manual → Indexes.
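
Conceptually, an index trades some extra bookkeeping for faster lookups. The Python sketch below contrasts a full scan with a lookup through a precomputed mapping. It is only an analogy and says nothing about how Neo4j implements indexes; build_index, scan and the second person node are hypothetical.

```python
from collections import defaultdict

people = [
    {"name": "Tom Hanks"},
    {"name": "Robert Zemeckis"},  # assumed second node, for illustration
]


def scan(nodes, key, value):
    """Without an index: inspect every node."""
    return [n for n in nodes if n.get(key) == value]


def build_index(nodes, key):
    """With an index: map each property value to the nodes that have it."""
    index = defaultdict(list)
    for node in nodes:
        if key in node:
            index[node[key]].append(node)
    return index


name_index = build_index(people, "name")
found_by_scan = scan(people, "name", "Tom Hanks")
found_by_index = name_index.get("Tom Hanks", [])
```

Both lookups return the same nodes; the indexed one avoids visiting every node at query time, at the cost of maintaining the mapping.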

2.8.2. Constraints

Constraints are used to make sure that the data adheres to the rules of the domain. To see examples of how to work with constraints, see Using constraints. For detailed descriptions of how to work with constraints in Cypher, see the Cypher manual → Constraints.
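
As one example of a domain rule, a uniqueness rule rejects a second node with the same property value. The Python sketch below illustrates the idea only; in Neo4j, constraints are declared through Cypher as described in the Cypher manual, and the add_with_unique helper is hypothetical.

```python
def add_with_unique(store, node, key):
    """Add node to store, rejecting a duplicate value for key.

    `store` maps each seen value of `key` to the node that holds it.
    """
    value = node[key]
    if value in store:
        raise ValueError(f"a node with {key}={value!r} already exists")
    store[value] = node


movies_by_title = {}
add_with_unique(movies_by_title, {"title": "Forrest Gump"}, "title")

try:
    # A second node with the same title violates the rule.
    add_with_unique(movies_by_title, {"title": "Forrest Gump"}, "title")
    rejected = False
except ValueError:
    rejected = True
```

The rule is enforced at write time, so invalid data never enters the store.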

2.9. Naming rules and recommendations

Node labels, relationship types and properties are case-sensitive, meaning for example that the property name is different from the property Name. It is recommended to follow the naming conventions described in the following table:

Table 2.1. Naming conventions

Graph entity      | Recommended style                                       | Example
Node label        | Camel case, beginning with an upper-case character      | :VehicleOwner rather than :vehicle_owner
Relationship type | Upper case, using underscore to separate words          | :OWNS_VEHICLE rather than :ownsVehicle
Property          | Lower camel case, beginning with a lower-case character | firstName rather than first_name
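
The recommended styles can be approximated with regular expressions, for example in a lint step that runs before data is loaded. The Python sketch below is an assumption-laden approximation of the recommendations only, not of the precise naming rules Neo4j enforces; STYLE_PATTERNS and follows_convention are hypothetical names.

```python
import re

# Approximate patterns for the recommended styles (names without the colon).
STYLE_PATTERNS = {
    # Camel case, beginning with an upper-case character.
    "label": re.compile(r"[A-Z][A-Za-z0-9]*"),
    # Upper case, using underscore to separate words.
    "relationship_type": re.compile(r"[A-Z][A-Z0-9]*(_[A-Z0-9]+)*"),
    # Lower camel case, beginning with a lower-case character.
    "property": re.compile(r"[a-z][A-Za-z0-9]*"),
}


def follows_convention(kind, name):
    """Return True if the whole name matches the recommended style."""
    return STYLE_PATTERNS[kind].fullmatch(name) is not None
```

The patterns are deliberately loose. For instance, the label pattern also accepts an all-upper-case name, so treat a match as a hint rather than a guarantee.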

For the precise naming rules, refer to the Cypher manual → Naming rules and recommendations.