Defining a schema

1. Example graph

First we create some data to use for our examples:

CREATE (matrix:Movie {title: 'The Matrix', released: 1999})
CREATE (cloudAtlas:Movie {title: 'Cloud Atlas', released: 2012})
CREATE (forrestGump:Movie {title: 'Forrest Gump', released: 1994})
CREATE (keanu:Person:Actor {name: 'Keanu Reeves'})
CREATE (robert:Person {name: 'Robert Zemeckis', born: 1951})
CREATE (tom:Person:Actor {name: 'Tom Hanks', born: 1956})
CREATE (tom)-[:ACTED_IN {roles: ['Forrest']}]->(forrestGump)
CREATE (tom)-[:ACTED_IN {roles: ['Zachry']}]->(cloudAtlas)
CREATE (robert)-[:DIRECTED]->(forrestGump)

This is the resulting graph:

[Figure: example graph (cypher intro schema01)]

2. Using indexes

The main reason for using indexes in a graph database is to find the starting point of a graph traversal. Once that starting point is found, the traversal relies on in-graph structures to achieve high performance.
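As a rough illustration of this idea, the query below first locates a single starting node by name and then follows relationships from it. It is only a sketch and assumes the Actor label and the ACTED_IN relationship type used in the other queries in this section.

```cypher
// Find the starting node by its name property (this lookup is what an index speeds up),
// then traverse ACTED_IN relationships from that node.
MATCH (actor:Actor {name: 'Tom Hanks'})-[:ACTED_IN]->(movie:Movie)
RETURN movie.title
```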

Indexes can be added at any time. Note, however, that if there is existing data in the database, it will take some time for an index to come online.

In this case we want to create an index to speed up finding actors by name in the database:

CREATE INDEX FOR (a:Actor) ON (a.name)

In most cases it is not necessary to specify indexes when querying for data, as the appropriate indexes will be used automatically. For example, the following query will automatically use the index defined above:

MATCH (actor:Actor {name: 'Tom Hanks'})
RETURN actor
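One way to check whether an index is picked up is to prefix the query with EXPLAIN, which returns the execution plan without running the query. The sketch below assumes an index on the name property of Actor nodes exists; operator names in the plan vary between Neo4j versions, and an index lookup typically shows up as an operator such as NodeIndexSeek.

```cypher
// Show the execution plan only; the query itself is not executed.
EXPLAIN
MATCH (actor:Actor {name: 'Tom Hanks'})
RETURN actor
```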

A composite index is an index on multiple properties for all nodes that have a particular label. For example, the following statement will create a composite index on all nodes that are labeled Actor and have both a name and a born property. Note that the node with the Actor label and the name "Keanu Reeves" does not have the born property, so that node will not be added to the index.

CREATE INDEX FOR (a:Actor) ON (a.name, a.born)
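A query that could be served by a composite index on name and born might look like the sketch below. It assumes that the planner only considers a composite index when the query has predicates on all of the indexed properties, which is the documented behavior in some Neo4j versions but may differ in yours.

```cypher
// Predicates on both indexed properties, name and born.
MATCH (actor:Actor)
WHERE actor.name = 'Tom Hanks' AND actor.born = 1956
RETURN actor
```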

We can inspect our database to find out what indexes are defined. We do this by calling the built-in procedure db.indexes:

CALL db.indexes
YIELD description, tokenNames, properties, type;
Rows: 2

| description                   | tokenNames | properties       | type                  |
| 'INDEX ON :Actor(name)'       | ['Actor']  | ['name']         | 'node_label_property' |
| 'INDEX ON :Actor(name, born)' | ['Actor']  | ['name', 'born'] | 'node_label_property' |

Learn more about indexes in Cypher Manual → Indexes.

It is possible to specify which index to use in a particular query by using index hints. This is one of several options for query tuning, described in detail in Cypher Manual → Query tuning.
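As a hedged sketch, a hint naming the single-property index on the name property of Actor nodes could look like this. The planner normally selects a suitable index on its own, so the hint is shown only to illustrate the syntax.

```cypher
// USING INDEX asks the planner to use the index on :Actor(name) for this match.
MATCH (actor:Actor {name: 'Tom Hanks'})
USING INDEX actor:Actor(name)
RETURN actor
```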

3. Using constraints

Constraints are used to make sure that the data adheres to the rules of the domain. For example: "If a node has a label of Actor and a property of name, then the value of name must be unique among all nodes that have the Actor label".

To create a constraint that makes sure that our database will never contain more than one node with the label Movie and the same value for the title property, we use the IS UNIQUE syntax:

CREATE CONSTRAINT ON (movie:Movie) ASSERT movie.title IS UNIQUE

Adding the unique constraint will implicitly add an index on that property. If the constraint is dropped, but the index is still needed, the index will have to be created explicitly.
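To illustrate what the constraint enforces, assume the uniqueness constraint on the title property of Movie nodes is in place and the example graph has been loaded. An attempt to create a second movie with an existing title should then be rejected; the exact error message depends on the Neo4j version.

```cypher
// Expected to fail: a Movie node with this title already exists in the example graph.
CREATE (duplicate:Movie {title: 'The Matrix'})
```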

Constraints can be added to a database that already has data in it. This requires that the existing data complies with the constraint that is being added.
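Before adding a uniqueness constraint to existing data, it can help to look for values that would violate it. The following is a minimal sketch for the title property of Movie nodes; an empty result suggests that the existing data complies with the constraint.

```cypher
// List titles shared by more than one Movie node.
MATCH (m:Movie)
WITH m.title AS title, count(*) AS occurrences
WHERE occurrences > 1
RETURN title, occurrences
```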

We can inspect our database to find out what constraints are defined. We do this by calling the built-in procedure db.constraints:

CALL db.constraints
Rows: 1

| description                                                |
| 'CONSTRAINT ON (movie:Movie) ASSERT movie.title IS UNIQUE' |

The constraint described above is available for all editions of Neo4j. Additional constraints are available for Neo4j Enterprise Edition.

Learn more about constraints in Cypher Manual → Constraints.