1.2. Graph database concepts

This chapter contains an introduction to the graph data model.

1.2.1. The Neo4j graph database

A graph database stores data in a graph, the most generic of data structures, capable of elegantly representing any kind of data in a highly accessible way. The Neo4j graph is based on the property graph model.

For graph database terminology, see Appendix B, Terminology.

Here’s an example graph which we will approach step by step in the following sections:

[Figure: the example graph]

1.2.1.1. Nodes

A node in Neo4j is a node as described in the property graph model, with properties and labels.

Nodes are often used to represent entities, but depending on the domain, relationships may be used for that purpose as well.

The simplest possible graph is a single node. Consider the graph below, consisting of one node with a single property title:

[Figure: a single node with the property title]

Let’s add two more nodes, and one more property to the node from the previous example:

[Figure: three nodes, the first now with two properties]
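
The steps above can be sketched in a few lines of Python. This is only an in-memory illustration of the property graph idea, not Neo4j's API: the Node class is hypothetical, and the released value and the third node's name are illustrative assumptions, since the text only names the title property.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node with a set of labels and a map of properties (hypothetical class)."""
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


# The simplest possible graph: a single node with one property, title.
movie = Node(properties={"title": "Forrest Gump"})

# Add one more property to that node (illustrative value) ...
movie.properties["released"] = 1994

# ... and two more nodes. No labels or relationships yet.
tom = Node(properties={"name": "Tom Hanks"})
robert = Node(properties={"name": "Robert Zemeckis"})

graph = [movie, tom, robert]
```

At this point the graph is just three unconnected nodes; labels and relationships are introduced in the following sections.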

1.2.1.2. Relationships

A relationship in Neo4j is a relationship as described in the property graph model, with a relationship type and properties.

Relationships between nodes are the key feature of graph databases, as they allow for finding related data. A relationship connects two nodes, and is guaranteed to have a valid source and target node.

Relationships organize nodes into arbitrary structures, allowing a graph to resemble a list, a tree, a map, or a compound entity — any of which may be combined into yet more complex, richly inter-connected structures.

Our example graph will make a lot more sense once we add relationships to it:

[Figure: the example graph with relationships added]

Our example uses ACTED_IN and DIRECTED as relationship types. The roles property on the ACTED_IN relationship has an array value with a single item in it.

Below is an ACTED_IN relationship, with the Tom Hanks node as the source node and Forrest Gump as the target node.

[Figure: an ACTED_IN relationship from the Tom Hanks node to the Forrest Gump node]

We observe that the Tom Hanks node has an outgoing relationship, while the Forrest Gump node has an incoming relationship.

Relationships are equally well traversed in either direction. This means that there is no need to add duplicate relationships in the opposite direction, either for traversal or for performance.

While relationships always have a direction, you can ignore the direction where it is not useful in your application.
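
To make the point about direction concrete, here is a minimal Python sketch, assuming a hypothetical Relationship class and neighbours helper (neither is part of Neo4j). A single stored relationship can be followed from either end, or with the direction ignored. The value of the roles list is an illustrative assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    """A typed, directed relationship with properties (hypothetical class)."""
    source: str
    target: str
    rel_type: str
    properties: dict = field(default_factory=dict)


# One relationship, stored once, with a single-item array property.
acted_in = Relationship("Tom Hanks", "Forrest Gump", "ACTED_IN",
                        {"roles": ["Forrest"]})


def neighbours(node, relationships, direction="both"):
    """Return nodes reachable from `node`, honouring or ignoring direction."""
    found = []
    for rel in relationships:
        if direction in ("outgoing", "both") and rel.source == node:
            found.append(rel.target)
        if direction in ("incoming", "both") and rel.target == node:
            found.append(rel.source)
    return found


outgoing = neighbours("Tom Hanks", [acted_in], "outgoing")
incoming = neighbours("Forrest Gump", [acted_in], "incoming")
either = neighbours("Forrest Gump", [acted_in])  # direction ignored
```

The same relationship answers both questions, which is why no second relationship in the opposite direction is needed.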

Note that a node can have relationships to itself as well:

[Figure: the Tom Hanks node with a KNOWS relationship to itself]

The example above would mean that Tom Hanks KNOWS himself.

Let’s have a look at what can be found by simply following the relationships of a node in our example graph:

[Figure: the example graph]

Table 1.1. Using relationship direction and type

What we want to know     Start from     Relationship type   Direction
get actors in movie      :Movie node    :ACTED_IN           incoming
get movies with actor    :Person node   :ACTED_IN           outgoing
get directors of movie   :Movie node    :DIRECTED           incoming
get movies directed by   :Person node   :DIRECTED           outgoing
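
The four rows of the table can be tried out with a small Python sketch. The follow function and the RELS list are hypothetical, and the director's name is an assumption added for illustration. Each lookup is just a start node, a relationship type, and a direction.

```python
# (source, type, target) triples standing in for the example graph.
RELS = [
    ("Tom Hanks", "ACTED_IN", "Forrest Gump"),
    ("Robert Zemeckis", "DIRECTED", "Forrest Gump"),  # assumed director node
]


def follow(start, rel_type, direction):
    """Follow relationships of one type from `start` in the given direction."""
    if direction == "outgoing":
        return [t for s, ty, t in RELS if s == start and ty == rel_type]
    if direction == "incoming":
        return [s for s, ty, t in RELS if t == start and ty == rel_type]
    raise ValueError("direction must be 'incoming' or 'outgoing'")


actors_in_movie = follow("Forrest Gump", "ACTED_IN", "incoming")
movies_with_actor = follow("Tom Hanks", "ACTED_IN", "outgoing")
directors_of_movie = follow("Forrest Gump", "DIRECTED", "incoming")
movies_directed_by = follow("Robert Zemeckis", "DIRECTED", "outgoing")
```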

1.2.1.3. Properties

A property in Neo4j is a property as described in the property graph model. Both nodes and relationships may have properties.

Properties are named values where the name (or key) is a string. The supported property values are:

  • Numeric values
  • String values
  • Boolean values
  • Lists of any of the above values

null is not a valid property value. Instead of storing it in the database, null can be modeled by the absence of a property key.
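
As a rough illustration of these rules, the Python sketch below accepts only the listed value kinds and treats a null (None in Python) as "remove the key". The function names are hypothetical, the born value is illustrative, and the checks are a simplification rather than a description of how Neo4j validates values.

```python
SCALAR_TYPES = (bool, int, float, str)


def is_valid_property_value(value):
    """Return True for numbers, strings, booleans, and lists of those."""
    if isinstance(value, SCALAR_TYPES):
        return True
    if isinstance(value, list):
        return all(isinstance(item, SCALAR_TYPES) for item in value)
    return False


def set_property(properties, key, value):
    """Store a property; a None value is modeled by the absence of the key."""
    if not isinstance(key, str):
        raise TypeError("property keys must be strings")
    if value is None:
        properties.pop(key, None)
        return
    if not is_valid_property_value(value):
        raise TypeError(f"unsupported property value: {value!r}")
    properties[key] = value


props = {}
set_property(props, "name", "Tom Hanks")
set_property(props, "born", 1956)
set_property(props, "born", None)  # removes the key instead of storing null
```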

Table 1.2. Property value types

Type      Description                             Value range
boolean   binary logic value                      true/false
integer   64-bit integer                          -9223372036854775808 to 9223372036854775807, inclusive
float     64-bit IEEE 754 floating-point number   -
String    sequence of Unicode characters          infinite
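
The integer bounds in the table are those of a signed 64-bit integer, that is, -2^63 to 2^63 - 1. The short Python check below confirms the arithmetic. The fits_integer helper is hypothetical and only illustrates the range, since Python's own integers are unbounded.

```python
INTEGER_MIN = -2**63      # -9223372036854775808
INTEGER_MAX = 2**63 - 1   #  9223372036854775807


def fits_integer(value):
    """Return True if `value` lies within the 64-bit integer range, inclusive."""
    return INTEGER_MIN <= value <= INTEGER_MAX
```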

For further details on types and values, see the Cypher type system CIP.

1.2.1.4. Labels

A label in Neo4j is a label as described in the property graph model. Labels assign roles or types to nodes.

A label is a named graph construct that is used to group nodes into sets; all nodes labeled with the same label belong to the same set. Many database queries can work with these sets instead of the whole graph, making queries easier to write and more efficient to execute. A node may be labeled with any number of labels, including none, making labels an optional addition to the graph.

Labels are used when defining constraints and adding indexes for properties (see Section 1.2.1.7, “Schema”).

For example, all nodes representing users could be labeled with the label :User. With that in place, you can ask Neo4j to perform operations only on your user nodes, such as finding all users with a given name.

However, you can use labels for much more. For instance, since labels can be added and removed during runtime, they can be used to mark temporary states for your nodes. A :Suspended label could be used to denote bank accounts that are suspended, a :Seasonal label to denote vegetables that are currently in season, and so on.
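
One way to picture labels is as named sets of nodes that can change at runtime. The sketch below assumes a hypothetical LabelIndex class with made-up account ids. It shows the idea only and says nothing about how Neo4j stores labels internally.

```python
from collections import defaultdict


class LabelIndex:
    """Maps each label name to the set of node ids carrying it (hypothetical)."""

    def __init__(self):
        self._nodes_by_label = defaultdict(set)

    def add_label(self, node_id, label):
        self._nodes_by_label[label].add(node_id)

    def remove_label(self, node_id, label):
        self._nodes_by_label[label].discard(node_id)

    def nodes_with(self, label):
        """Return a copy of the set of nodes labeled with `label`."""
        return set(self._nodes_by_label.get(label, ()))


index = LabelIndex()
index.add_label("acct-1", "Account")
index.add_label("acct-2", "Account")

# Mark a temporary state with a label, then clear it again.
index.add_label("acct-2", "Suspended")
suspended_before = index.nodes_with("Suspended")
index.remove_label("acct-2", "Suspended")
suspended_after = index.nodes_with("Suspended")
```

A query for suspended accounts only has to look at the nodes in the Suspended set, not at the whole graph.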

In our example, we’ll add :Person and :Movie labels to our graph:

[Figure: the example graph with :Person and :Movie labels]

To exemplify how nodes may have multiple labels, let’s add an :Actor label to the Tom Hanks node.

[Figure: the Tom Hanks node with an added :Actor label]

Label names

Any non-empty Unicode string can be used as a label name. In Cypher, you may need to use the backtick (`) syntax to avoid clashes with Cypher identifier rules or to allow non-alphanumeric characters in a label. By convention, labels are written with CamelCase notation, with the first letter in upper case; for instance, User or CarOwner. For more information on styling Cypher queries, refer to the Cypher style guide.
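
The backtick rule can be illustrated with a small helper that formats a label for use in a Cypher query. The cypher_label function is hypothetical, and its notion of a "plain" name (ASCII letters, digits, and underscores, not starting with a digit) is a deliberately conservative assumption. Cypher's actual identifier rules accept more than this. Doubling a backtick inside a quoted name is how Cypher escapes it.

```python
import re

# Conservative assumption: names matching this pattern need no quoting.
_PLAIN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def cypher_label(name):
    """Format a label name for a Cypher query, quoting it when needed."""
    if not name:
        raise ValueError("label names must be non-empty")
    if _PLAIN_NAME.fullmatch(name):
        return ":" + name
    # Quote with backticks; an embedded backtick is escaped by doubling it.
    return ":`" + name.replace("`", "``") + "`"


plain = cypher_label("CarOwner")
spaced = cypher_label("Car Owner")
tricky = cypher_label("a`b")
```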

Labels have an id space of an int, meaning the maximum number of labels the database can contain is roughly 2 billion.

1.2.1.5. Traversal

A traversal navigates through a graph to find paths.

A traversal is how you query a graph, navigating from starting nodes to related nodes, finding answers to questions like "what music do my friends like that I don’t yet own," or "if this power supply goes down, what web services are affected?"

Traversing a graph means visiting its nodes, following relationships according to some rules. In most cases only a subgraph is visited, as you already know where in the graph the interesting nodes and relationships are found.

Cypher provides a declarative way to query the graph, powered by traversals and other techniques. See Chapter 3, Cypher for more information.

If we want to find out which movies Tom Hanks acted in according to our tiny example database, the traversal would start from the Tom Hanks node, follow any :ACTED_IN relationships connected to the node, and end up with Forrest Gump as the result (see the dashed lines):

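[Figure: the traversal from the Tom Hanks node along :ACTED_IN to Forrest Gump, shown with dashed lines]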
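
The traversal just described can be sketched as a breadth-first walk that only follows relationships matching a rule. The traverse function below is a hypothetical illustration, not the Neo4j traversal framework, and the DIRECTED and KNOWS entries are included only to show relationships that the rule leaves unvisited.

```python
from collections import deque

# (source, type, target) triples standing in for the example graph.
RELS = [
    ("Tom Hanks", "ACTED_IN", "Forrest Gump"),
    ("Tom Hanks", "KNOWS", "Tom Hanks"),
    ("Robert Zemeckis", "DIRECTED", "Forrest Gump"),  # assumed director node
]


def traverse(start, allowed_types, max_depth):
    """Visit nodes reachable from `start` via outgoing relationships
    whose type is in `allowed_types`, up to `max_depth` hops away."""
    visited = [start]
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for source, rel_type, target in RELS:
            if source == node and rel_type in allowed_types \
                    and target not in visited:
                visited.append(target)
                queue.append((target, depth + 1))
    return visited[1:]  # everything found, excluding the start node


movies = traverse("Tom Hanks", {"ACTED_IN"}, max_depth=1)
```

Only the subgraph reachable under the rule is visited; the Robert Zemeckis node is never touched.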

1.2.1.6. Paths

A path in Neo4j is a path as described in the property graph model. Paths are retrieved from a Cypher query or traversal.

In the previous example, the traversal result could be returned as a path:

[Figure: the traversal result as a path]

The path above has length one.

The shortest possible path has length zero — that is, it contains only a single node and no relationships — and can look like this:

[Figure: a path of length zero, consisting of a single node]

This path has length one:

[Figure: a path of length one]
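
Path length counts relationships, not nodes. The Python sketch below uses a hypothetical Path class to show this for the cases discussed above. The self-relationship case is added as an assumption, to show that a node related to itself also yields a path of length one.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """An alternating sequence of nodes and relationships (hypothetical class)."""
    nodes: list
    relationships: list  # one relationship type between each consecutive node pair

    def __post_init__(self):
        if len(self.nodes) != len(self.relationships) + 1:
            raise ValueError("a path needs exactly one more node than relationships")

    @property
    def length(self):
        """The length of a path is its number of relationships."""
        return len(self.relationships)


traversal_result = Path(["Tom Hanks", "Forrest Gump"], ["ACTED_IN"])
single_node = Path(["Tom Hanks"], [])
self_loop = Path(["Tom Hanks", "Tom Hanks"], ["KNOWS"])
```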

1.2.1.7. Schema

Neo4j is a schema-optional graph database.

You can use Neo4j without any schema, and optionally introduce one to gain performance or modeling benefits. The schema stays out of your way until you reach a stage where you want to reap the benefits of having one.

Schema commands can only be applied on the master machine in a Neo4j cluster. If you apply them on a slave you will receive a Neo.ClientError.Transaction.InvalidType error code (see Section A.1, “Neo4j Status Codes”).

Indexes

Performance is gained by creating indexes, which improve the speed of looking up nodes in the database.

Once you have specified which properties to index, Neo4j will make sure your indexes are kept up to date as your graph evolves. Any operation that looks up nodes by the newly indexed properties will see a significant performance boost.

Indexes in Neo4j are eventually available. When you create an index, the operation returns immediately while the index is populated in the background, so the index is not immediately available for querying. Once it has been fully populated, the index comes online and is ready to be used in queries.

If something goes wrong with the index, it can end up in a failed state. A failed index is not used to speed up queries. To rebuild it, drop and recreate the index, and check the logs for clues about the failure.
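
The index lifecycle can be pictured with a toy model. The PropertyIndex class below is hypothetical: population is a plain method call rather than a background job, and the failure is triggered artificially (by an unhashable list value), so it illustrates only the populating, online, and failed states and the fallback to an unindexed lookup.

```python
class PropertyIndex:
    """Toy index over one property key (hypothetical, not Neo4j's implementation)."""

    def __init__(self, key):
        self.key = key
        self.state = "populating"  # creation returns before the index is usable
        self._entries = {}

    def populate(self, nodes):
        """Build the index; stands in for Neo4j's background population."""
        try:
            for node_id, props in nodes.items():
                if self.key in props:
                    self._entries.setdefault(props[self.key], set()).add(node_id)
        except TypeError:
            self.state = "failed"  # a failed index is never used for lookups
            return
        self.state = "online"

    def lookup(self, value, nodes):
        """Use the index when online, otherwise scan every node."""
        if self.state == "online":
            return set(self._entries.get(value, ()))
        return {nid for nid, props in nodes.items()
                if props.get(self.key) == value}


nodes = {1: {"name": "Tom Hanks"}, 2: {"name": "Robert Zemeckis"},
         3: {"title": "Forrest Gump"}}
index = PropertyIndex("name")
state_before = index.state
hits_before = index.lookup("Tom Hanks", nodes)  # answered by a full scan
index.populate(nodes)
hits_after = index.lookup("Tom Hanks", nodes)   # answered by the index

broken = PropertyIndex("roles")
broken.populate({1: {"roles": ["Forrest"]}})    # lists cannot be dict keys
```

Queries return the same answer in every state; only the way the answer is found changes.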

For working with indexes in Cypher, see Section 3.5.1, “Indexes”.

Constraints

Neo4j can help keep your data clean by using constraints. Constraints let you specify rules for what your data should look like, and any change that breaks these rules is denied.
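
As a minimal sketch of a rule being enforced, the Python below denies any node that would break a uniqueness rule on one property for one label. Uniqueness is chosen here only as an example kind of rule, and the UniqueStore class and ConstraintViolation exception are hypothetical names, not Neo4j's.

```python
class ConstraintViolation(Exception):
    """Raised when a change would break a constraint (hypothetical name)."""


class UniqueStore:
    """Stores nodes and denies duplicates of `key` among nodes with `label`."""

    def __init__(self, label, key):
        self.label = label
        self.key = key
        self.nodes = []  # list of (labels, properties) pairs

    def create(self, labels, properties):
        if self.label in labels and self.key in properties:
            for other_labels, other_props in self.nodes:
                if self.label in other_labels and \
                        other_props.get(self.key) == properties[self.key]:
                    raise ConstraintViolation(
                        f"{self.key}={properties[self.key]!r} already exists")
        self.nodes.append((set(labels), dict(properties)))


store = UniqueStore("Person", "name")
store.create({"Person"}, {"name": "Tom Hanks"})
try:
    store.create({"Person"}, {"name": "Tom Hanks"})
    denied = False
except ConstraintViolation:
    denied = True  # the second, duplicate node is rejected
```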

For working with constraints in Cypher, see Section 3.5.2, “Constraints”.