This chapter contains an introduction to the graph data model.
A graph database stores data in a graph, the most generic of data structures, capable of elegantly representing any kind of data in a highly accessible way. The Neo4j graph is based on the property graph model.
For graph database terminology, see Appendix B, Terminology.
The following sections build up an example graph step by step.
Nodes are often used to represent entities, but depending on the domain, relationships may be used for that purpose as well.
The simplest possible graph is a single node.
Consider a graph consisting of one node with a single property.
Let’s add two more nodes and one more property to the node in the previous example.
Relationships between nodes are the key feature of graph databases, as they allow for finding related data. A relationship connects two nodes, and is guaranteed to have a valid source and target node.
Relationships organize nodes into arbitrary structures, allowing a graph to resemble a list, a tree, a map, or a compound entity, any of which may be combined into yet more complex, richly interconnected structures.
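To make the guarantee about source and target nodes concrete, here is a minimal in-memory sketch in Python. The `Node`, `Relationship` and `Graph` classes are hypothetical illustrations of the property graph model, not part of any Neo4j API; the sketch assumes integer ids and simply refuses to create a relationship unless both endpoints already belong to the graph.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    id: int
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    id: int
    type: str
    start: int  # id of the source node
    end: int    # id of the target node
    properties: Dict[str, Any] = field(default_factory=dict)


class Graph:
    """Hypothetical in-memory property graph, for illustration only."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.rels: Dict[int, Relationship] = {}

    def add_node(self, **properties: Any) -> Node:
        node = Node(len(self.nodes), dict(properties))
        self.nodes[node.id] = node
        return node

    def add_relationship(self, start: Node, rel_type: str, end: Node,
                         **properties: Any) -> Relationship:
        # Refuse dangling relationships: both endpoints must already be in this graph.
        for endpoint in (start, end):
            if self.nodes.get(endpoint.id) is not endpoint:
                raise ValueError(f"node {endpoint.id} is not part of this graph")
        rel = Relationship(len(self.rels), rel_type, start.id, end.id, dict(properties))
        self.rels[rel.id] = rel
        return rel


g = Graph()
tom = g.add_node(name="Tom Hanks")
movie = g.add_node(title="Forrest Gump")
acted_in = g.add_relationship(tom, "ACTED_IN", movie)
```

Because the check happens when the relationship is created, every stored relationship can rely on its source and target being present.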
Our example graph will make a lot more sense once we add relationships to it.
Our example uses ACTED_IN and DIRECTED as relationship types.
The roles property on the ACTED_IN relationship has an array value with a single item in it.
Consider the ACTED_IN relationship with the Tom Hanks node as its source node and Forrest Gump as its target node.
We observe that the Tom Hanks node has an outgoing relationship, while the Forrest Gump node has an incoming relationship.
Relationships are equally well traversed in either direction.
This means that there is no need to add duplicate relationships in the opposite direction (with regard to traversal or performance).
While relationships always have a direction, you can ignore the direction where it is not useful in your application.
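The following sketch, assuming a simple in-memory adjacency index (the `AdjacencyIndex` class and its direction strings are hypothetical, not Neo4j's API), illustrates why one stored relationship suffices: the same relationship is reachable from its source node's outgoing list and from its target node's incoming list, and a caller that does not care about direction can ask for both.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A relationship is stored once, as (type, source node, target node).
Rel = Tuple[str, str, str]


class AdjacencyIndex:
    """Hypothetical sketch: each relationship is reachable from both of its endpoints."""

    def __init__(self) -> None:
        self.outgoing: Dict[str, List[Rel]] = defaultdict(list)
        self.incoming: Dict[str, List[Rel]] = defaultdict(list)

    def connect(self, source: str, rel_type: str, target: str) -> Rel:
        rel = (rel_type, source, target)
        # Both endpoints reference the same tuple; no reverse relationship is created.
        self.outgoing[source].append(rel)
        self.incoming[target].append(rel)
        return rel

    def relationships(self, node: str, direction: str = "BOTH") -> List[Rel]:
        if direction == "OUTGOING":
            return list(self.outgoing[node])
        if direction == "INCOMING":
            return list(self.incoming[node])
        if direction == "BOTH":
            # Direction ignored. A relationship from the node to itself sits in both
            # lists, so incoming self-relationships are skipped to avoid listing it twice.
            return self.outgoing[node] + [r for r in self.incoming[node] if r[1] != node]
        raise ValueError(f"unknown direction: {direction!r}")
```

For example, after `idx.connect("Tom Hanks", "ACTED_IN", "Forrest Gump")`, the single stored relationship is returned both by `idx.relationships("Tom Hanks", "OUTGOING")` and by `idx.relationships("Forrest Gump", "INCOMING")`.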
Note that a node can have relationships to itself as well; such a relationship has the same node as both its source and its target.
Let’s have a look at what can be found by simply following the relationships of a node in our example graph:
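| What we want to know | Start from | Relationship type | Direction |
| get actors in movie | movie node | :ACTED_IN | incoming |
| get movies with actor | person node | :ACTED_IN | outgoing |
| get directors of movie | movie node | :DIRECTED | incoming |
| get movies directed by | person node | :DIRECTED | outgoing |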
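Each lookup listed above amounts to following one relationship type in one direction from a start node. A minimal Python sketch over a list of (source, type, target) triples makes that explicit; the `follow` helper is hypothetical, and the Robert Zemeckis director node is an illustrative assumption rather than part of the text.

```python
from typing import List, Tuple

# Illustrative data as (source, relationship type, target) triples.
RELATIONSHIPS: List[Tuple[str, str, str]] = [
    ("Tom Hanks", "ACTED_IN", "Forrest Gump"),
    ("Robert Zemeckis", "DIRECTED", "Forrest Gump"),  # assumed director node
]


def follow(start: str, rel_type: str, direction: str) -> List[str]:
    """Return the nodes reached from `start` over one relationship of `rel_type`."""
    if direction == "outgoing":
        return [dst for src, t, dst in RELATIONSHIPS if t == rel_type and src == start]
    if direction == "incoming":
        return [src for src, t, dst in RELATIONSHIPS if t == rel_type and dst == start]
    raise ValueError(f"unknown direction: {direction}")


# One call per lookup: start node, relationship type, direction.
actors_in_movie = follow("Forrest Gump", "ACTED_IN", "incoming")
movies_with_actor = follow("Tom Hanks", "ACTED_IN", "outgoing")
directors_of_movie = follow("Forrest Gump", "DIRECTED", "incoming")
movies_directed_by = follow("Robert Zemeckis", "DIRECTED", "outgoing")
```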
A property in Neo4j is a property as described in the property graph model. Both nodes and relationships may have properties.
Properties are named values where the name (or key) is a string. The supported property types are:
Number, an abstract type, which has the subtypes Integer and Float
A label in Neo4j is a label as described in the property graph model. Labels assign roles or types to nodes.
A label is a named graph construct that is used to group nodes into sets; all nodes labeled with the same label belong to the same set. Many database queries can work with these sets instead of the whole graph, making queries easier to write and more efficient to execute. A node may be labeled with any number of labels, including none, making labels an optional addition to the graph.
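A minimal sketch of labels as named sets of node ids, assuming an in-memory store (the `LabeledStore` class is hypothetical, not Neo4j's implementation). A node may carry zero or more labels, and a lookup by label only inspects the members of that label's set.

```python
from collections import defaultdict
from typing import Any, Dict, Set


class LabeledStore:
    """Hypothetical in-memory store in which each label names a set of node ids."""

    def __init__(self) -> None:
        self.properties: Dict[int, Dict[str, Any]] = {}
        self.labels_of: Dict[int, Set[str]] = {}
        self.members: Dict[str, Set[int]] = defaultdict(set)

    def create(self, *labels: str, **props: Any) -> int:
        node_id = len(self.properties)
        self.properties[node_id] = dict(props)
        self.labels_of[node_id] = set()
        for label in labels:  # passing no labels at all is allowed
            self.labels_of[node_id].add(label)
            self.members[label].add(node_id)
        return node_id

    def find(self, label: str, key: str, value: Any) -> Set[int]:
        # Only the nodes in this label's set are inspected, not the whole graph.
        return {n for n in self.members.get(label, set())
                if self.properties[n].get(key) == value}


store = LabeledStore()
tom = store.create("Actor", name="Tom Hanks")
movie = store.create("Movie", title="Forrest Gump")
unlabeled = store.create(name="Tom Hanks")  # same property, but no label
```

Here `store.find("Actor", "name", "Tom Hanks")` returns only `tom`, because the unlabeled node is not a member of the Actor set.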
Labels are used when defining constraints and adding indexes for properties (see “Schema” later in this chapter).
For example, all nodes representing users could be labeled with a common label.
With that in place, you can ask Neo4j to perform operations only on your user nodes, such as finding all users with a given property value.
However, you can use labels for much more.
For instance, since labels can be added and removed during runtime, they can be used to mark temporary states for your nodes.
A :Suspended label could be used to denote bank accounts that are suspended, a :Seasonal label to denote vegetables that are currently in season, and so on.
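Because a label is just set membership, marking and clearing a temporary state is an insert into or a removal from a set. A hypothetical sketch, assuming nodes identified by strings and an `Account` label invented for the example:

```python
from typing import Dict, List, Set

# Hypothetical data: node id -> the labels it currently carries.
labels: Dict[str, Set[str]] = {
    "acct-1": {"Account"},
    "acct-2": {"Account"},
}


def add_label(node: str, label: str) -> None:
    labels[node].add(label)  # takes effect immediately for later lookups


def remove_label(node: str, label: str) -> None:
    labels[node].discard(label)  # no error if the label was not present


def nodes_with(label: str) -> List[str]:
    return sorted(n for n, node_labels in labels.items() if label in node_labels)


add_label("acct-2", "Suspended")     # the account enters a temporary state
suspended_now = nodes_with("Suspended")
remove_label("acct-2", "Suspended")  # and leaves it again
```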
In our example, we’ll add labels such as :Movie to our graph.
To exemplify how nodes may have multiple labels, let’s add an :Actor label to the Tom Hanks node.
Any non-empty Unicode string can be used as a label name.
In Cypher, you may need to use the backtick (`) syntax to avoid clashes with Cypher identifier rules or to allow non-alphanumeric characters in a label.
By convention, labels are written in CamelCase notation, with the first letter in upper case, as with the :Movie and :Actor labels in our example.
For more information on styling Cypher queries, refer to the Cypher style guide.
Labels have an id space of an int, meaning the maximum number of distinct labels the database can contain is roughly 2 billion.
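As a quick check of the arithmetic, assuming the id is a signed 32-bit integer (a Java `int`) and that only non-negative values are used as label ids:

```python
# Assumption: label ids are the non-negative values of a signed 32-bit integer.
INT_BITS = 32
max_label_id = 2 ** (INT_BITS - 1) - 1  # largest signed 32-bit value
max_labels = max_label_id + 1           # ids 0 through max_label_id
print(f"{max_labels:,}")                # prints 2,147,483,648
```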
A traversal navigates through a graph to find paths.
A traversal is how you query a graph, navigating from starting nodes to related nodes, finding answers to questions like "what music do my friends like that I don’t yet own," or "if this power supply goes down, what web services are affected?"
Traversing a graph means visiting its nodes, following relationships according to some rules. In most cases only a subgraph is visited, as you already know where in the graph the interesting nodes and relationships are found.
Cypher provides a declarative way to query the graph powered by traversals and other techniques. See Chapter 3, Cypher, for more information.
If we want to find out which movies Tom Hanks acted in according to our tiny example database, the traversal would start from the Tom Hanks node, follow any :ACTED_IN relationships connected to the node, and end up with Forrest Gump as the result.
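The traversal just described can be sketched as a breadth-first search whose rules are the relationship types it may follow and a maximum depth. This is a minimal Python sketch over a hypothetical adjacency list, not how Neo4j or Cypher implements traversals; for brevity it follows outgoing relationships only, and the director node is an illustrative assumption.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical adjacency list: node -> outgoing (relationship type, target) pairs.
Graph = Dict[str, List[Tuple[str, str]]]


def traverse(graph: Graph, start: str, follow: Set[str], max_depth: int) -> Dict[str, int]:
    """Breadth-first traversal from `start` along outgoing relationships whose type
    is in `follow`, stopping at `max_depth`. Returns each visited node with its
    depth; nodes the rules never reach are never touched."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # rule: do not expand past the maximum depth
        for rel_type, target in graph.get(node, []):
            if rel_type in follow and target not in depth:
                depth[target] = depth[node] + 1
                queue.append(target)
    return depth


example: Graph = {
    "Tom Hanks": [("ACTED_IN", "Forrest Gump")],
    "Robert Zemeckis": [("DIRECTED", "Forrest Gump")],  # assumed, for illustration
}
visited = traverse(example, "Tom Hanks", {"ACTED_IN"}, max_depth=1)
movies = [n for n, d in visited.items() if d == 1]
```

Only the subgraph reachable under the rules is visited: starting from Tom Hanks, the Robert Zemeckis node is never touched.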
A path in Neo4j is a path as described in the property graph model. Paths are retrieved from a Cypher query or traversal.
In the previous example, the traversal result could be returned as a path. The length of a path is the number of relationships it contains, so that path has length one.
The shortest possible path has length zero, that is, it contains only a single node and no relationships.
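A minimal sketch of a path value, assuming nodes and relationship types are represented as plain strings (the `Path` class is hypothetical): a path alternates nodes and relationships, always has one more node than relationships, and its length counts the relationships.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Path:
    """Hypothetical path value: nodes[i] -(relationships[i])-> nodes[i + 1]."""
    nodes: Tuple[str, ...]
    relationships: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # A path always has exactly one more node than it has relationships.
        if len(self.nodes) != len(self.relationships) + 1:
            raise ValueError("a path needs exactly one more node than relationships")

    @property
    def length(self) -> int:
        # Path length counts relationships, not nodes.
        return len(self.relationships)


single = Path(("Tom Hanks",))                               # length zero
acted = Path(("Tom Hanks", "Forrest Gump"), ("ACTED_IN",))  # length one
```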
Neo4j is a schema-optional graph database.
You can use Neo4j without any schema. Optionally, you can introduce it in order to gain performance or modeling benefits. This allows a way of working where the schema does not get in your way until you are at a stage where you want to reap the benefits of having one.
Schema commands can only be applied on the master machine in a Neo4j cluster.
If you apply them on a slave, you will receive an error.
Performance is gained by creating indexes, which improve the speed of looking up nodes in the database.
Once you have specified which properties to index, Neo4j will make sure your indexes are kept up to date as your graph evolves. Any operation that looks up nodes by the newly indexed properties will see a significant performance boost.
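Conceptually, a property index is a map from property value to the nodes that have it, updated on every write so that lookups never see stale entries. A hypothetical single-property sketch in Python, contrasting a full scan with an index lookup (the `IndexedNodes` class is invented here and is not how Neo4j stores its indexes):

```python
from collections import defaultdict
from typing import Any, Dict, Set


class IndexedNodes:
    """Hypothetical sketch of a property index kept up to date on every write."""

    def __init__(self, indexed_key: str) -> None:
        self.key = indexed_key
        self.nodes: Dict[int, Dict[str, Any]] = {}
        self.index: Dict[Any, Set[int]] = defaultdict(set)

    def set_property(self, node_id: int, key: str, value: Any) -> None:
        props = self.nodes.setdefault(node_id, {})
        if key == self.key and key in props:
            self.index[props[key]].discard(node_id)  # drop the stale entry
        props[key] = value
        if key == self.key:
            self.index[value].add(node_id)

    def find_by_scan(self, value: Any) -> Set[int]:
        # Without an index: inspect every node, O(n).
        return {n for n, p in self.nodes.items() if p.get(self.key) == value}

    def find_by_index(self, value: Any) -> Set[int]:
        # With an index: a single dictionary lookup.
        return set(self.index.get(value, set()))
```

Both lookups return the same answer; the index only changes how much work it takes to find it.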
Indexes in Neo4j are eventually available. When you first create an index, the operation returns immediately, but the index is populated in the background and so is not yet available for querying. Once it has been fully populated, the index comes online, which means it is ready to be used in queries.
If something goes wrong with an index, it can end up in a failed state, in which it is not used to speed up queries. To rebuild it, drop and recreate the index; the logs may contain clues about the cause of the failure.
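The lifecycle described above can be sketched as a small state machine. The state names POPULATING, ONLINE and FAILED mirror the states Neo4j reports for its indexes, but the `BackgroundIndex` class and its synchronous `populate` method are hypothetical simplifications: in Neo4j, population runs in the background.

```python
from enum import Enum
from typing import Dict, Iterable, Optional, Set


class IndexState(Enum):
    POPULATING = "POPULATING"
    ONLINE = "ONLINE"
    FAILED = "FAILED"


class BackgroundIndex:
    """Hypothetical sketch of an eventually available index."""

    def __init__(self) -> None:
        # Creating the index returns at once; nothing has been populated yet.
        self.state = IndexState.POPULATING
        self.entries: Dict[str, Set[int]] = {}
        self.failure: Optional[str] = None

    def populate(self, rows: Iterable[tuple]) -> None:
        """Stand-in for the background population job."""
        try:
            for node_id, value in rows:
                self.entries.setdefault(value, set()).add(node_id)
            self.state = IndexState.ONLINE
        except Exception as exc:
            # Any error leaves the index FAILED; the reason is kept, as a log would.
            self.state = IndexState.FAILED
            self.failure = str(exc)

    def usable(self) -> bool:
        # Only an ONLINE index is used to speed up queries.
        return self.state is IndexState.ONLINE
```

Rebuilding a failed index corresponds here to discarding the object and creating a new `BackgroundIndex`, which starts again in the POPULATING state.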
For working with indexes in Cypher, see Section 3.5.1, “Indexes”.
Neo4j can help keep your data clean. It does so using constraints. Constraints allow you to specify the rules for what your data should look like. Any changes that break these rules will be denied.
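As an illustration of a rule that denies breaking changes, here is a hypothetical uniqueness rule on a single property, checked before a node is stored. The `ConstrainedStore` class and `ConstraintViolation` exception are invented for this sketch and say nothing about how Neo4j enforces its constraints; nodes that lack the property are allowed.

```python
from typing import Any, Dict, List


class ConstraintViolation(Exception):
    """Raised when a change would break the rule."""


class ConstrainedStore:
    """Hypothetical sketch: a uniqueness rule on one property, checked before every write."""

    def __init__(self, unique_key: str) -> None:
        self.unique_key = unique_key
        self.nodes: List[Dict[str, Any]] = []

    def create(self, **props: Any) -> int:
        value = props.get(self.unique_key)
        if value is not None and any(n.get(self.unique_key) == value for n in self.nodes):
            # The change would break the rule, so it is denied and nothing is stored.
            raise ConstraintViolation(f"{self.unique_key}={value!r} already exists")
        self.nodes.append(dict(props))
        return len(self.nodes) - 1
```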
For working with constraints in Cypher, see Section 3.5.2, “Constraints”.