If you have used SQL and want to learn Cypher, this chapter is for you! We won’t dig very deep into either of the languages, but focus on bridging the gap.

Data Model

For our example, we will use data about persons who act in, direct, and produce movies.

Here’s an entity-relationship model for the example:

"Person" -> "Movie" [label="acted in" arrowhead="crow" arrowtail="crow" dir="both"]
"Person" -> "Movie" [label="directed" arrowhead="crow" arrowtail="crow" dir="both"]
"Person" -> "Movie" [label="produced" arrowhead="crow" arrowtail="crow" dir="both"]

We have Person and Movie entities, which are related in three different ways, each of which have many-to-many cardinality.

In a RDBMS we would use tables for the entities as well as for the associative entities (join tables) needed. In this case we decided to go with the following tables: movie, person, acted_in, directed, produced. You’ll find the SQL for this below.

In Neo4j, the basic data units are nodes and relationships. Both can have properties, which correspond to attributes in a RDBMS.

Nodes can be grouped by putting labels on them. In the example, we will use the labels Movie and Person.

When using Neo4j, related entities can be represented directly by using relationships. There’s no need to deal with foreign keys to handle the relationships, the database will take care of such mechanics. Also, the relationships always have full referential integrity. There are no constraints to enable for this, as it’s not optional; it’s really part of the underlying data model. Relationships always have a type, and we will differentiate the different kinds of relationships by using the types ACTED_IN, DIRECTED, PRODUCED.

Sample Data

First off, let’s see how to set up our example data in a RDBMS. We’ll start out creating a few tables and then go on to populate them.

  title VARCHAR(100),
  released INTEGER,
  tagline VARCHAR(100)
  name VARCHAR(100),
  born INTEGER
CREATE TABLE acted_in (
  role varchar(100),
  person_id INTEGER,
  movie_id INTEGER
CREATE TABLE directed (
  person_id INTEGER,
  movie_id INTEGER
CREATE TABLE produced (
  person_id INTEGER,
  movie_id INTEGER

Populating with data:

INSERT INTO movie (id, title, released, tagline)
  (1, 'The Matrix', 1999, 'Welcome to the Real World'),
  (2, 'The Devil''s Advocate', 1997, 'Evil has its winning ways'),
  (3, 'Monster', 2003, 'The first female serial killer of America')
INSERT INTO person (id, name, born)
  (1, 'Keanu Reeves', 1964),
  (2, 'Carrie-Anne Moss', 1967),
  (3, 'Laurence Fishburne', 1961),
  (4, 'Hugo Weaving', 1960),
  (5, 'Andy Wachowski', 1967),
  (6, 'Lana Wachowski', 1965),
  (7, 'Joel Silver', 1952),
  (8, 'Charlize Theron', 1975),
  (9, 'Al Pacino', 1940),
  (10, 'Taylor Hackford', 1944)
INSERT INTO acted_in (role, person_id, movie_id)
  ('Neo', 1, 1),
  ('Trinity', 2, 1),
  ('Morpheus', 3, 1),
  ('Agent Smith', 4, 1),
  ('Kevin Lomax', 1, 2),
  ('Mary Ann Lomax', 8, 2),
  ('John Milton', 9, 2),
  ('Aileen', 8, 3)
INSERT INTO directed (person_id, movie_id)
  (5, 1),
  (6, 1),
  (10, 2)
INSERT INTO produced (person_id, movie_id)
  (7, 1),
  (8, 3)

Doing this in Neo4j will look quite different. To begin with, we won’t create any schema up front. We’ll come back to schema later, for now it’s enough to know that labels can be used right away without declaring them.

CREATE in Neo4j

In the CREATE statements below, we tell Neo4j what data we want to have in the graph. Simply put, the parentheses denote nodes, while the arrows (-→, or in our case with a relationship type included -[:DIRECTED]→) denote relationships. For the nodes we set identifiers like TheMatrix so we can easily refer to them later on in the statement. Note that the identifiers are scoped to the statement, and not visible to other Cypher statements. We could use identifiers for the relationships as well, but there’s no need for that in this case.

CREATE (TheMatrix:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'})
CREATE (Keanu:Person    {name:'Keanu Reeves',       born:1964})
CREATE (Carrie:Person   {name:'Carrie-Anne Moss',   born:1967})
CREATE (Laurence:Person {name:'Laurence Fishburne', born:1961})
CREATE (Hugo:Person     {name:'Hugo Weaving',       born:1960})
CREATE (AndyW:Person    {name:'Andy Wachowski',     born:1967})
CREATE (LanaW:Person    {name:'Lana Wachowski',     born:1965})
CREATE (JoelS:Person    {name:'Joel Silver',        born:1952})
  (Keanu)    -[:ACTED_IN {roles:['Neo']}]->         (TheMatrix),
  (Carrie)   -[:ACTED_IN {roles:['Trinity']}]->     (TheMatrix),
  (Laurence) -[:ACTED_IN {roles:['Morpheus']}]->    (TheMatrix),
  (Hugo)     -[:ACTED_IN {roles:['Agent Smith']}]-> (TheMatrix),
  (AndyW)    -[:DIRECTED]->    (TheMatrix),
  (LanaW)    -[:DIRECTED]->    (TheMatrix),
  (JoelS)    -[:PRODUCED]->    (TheMatrix)
CREATE (TheDevilsAdvocate:Movie {title:"The Devil's Advocate", released:1997, tagline: 'Evil has its winning ways'})
CREATE (Monster:Movie {title: 'Monster', released: 2003, tagline: 'The first female serial killer of America'})
CREATE (Charlize:Person {name:'Charlize Theron', born:1975})
CREATE (Al:Person       {name:'Al Pacino',       born:1940})
CREATE (Taylor:Person   {name:'Taylor Hackford', born:1944})
  (Keanu)    -[:ACTED_IN {roles:['Kevin Lomax']}]->    (TheDevilsAdvocate),
  (Charlize) -[:ACTED_IN {roles:['Mary Ann Lomax']}]-> (TheDevilsAdvocate),
  (Al)       -[:ACTED_IN {roles:['John Milton']}]->    (TheDevilsAdvocate),
  (Taylor)   -[:DIRECTED]->                            (TheDevilsAdvocate),
  (Charlize) -[:ACTED_IN {roles:['Aileen']}]->         (Monster),
  (Charlize) -[:PRODUCED {roles:['Aileen']}]->         (Monster)

Simple read of data

Let’s find all entries in the movie table and output their title attribute in our RDBMS:

SELECT movie.title
FROM movie;

Using Neo4j, find all nodes labeled Movie and output their title property:

MATCH (movie:Movie)
RETURN movie.title;

MATCH tells Neo4j to match a pattern in the graph. In this case the pattern is very simple: any node with a Movie label on it. We bind the result of the pattern matching to the identifier movie, for use in the RETURN clause. And as you can see, the RETURN keyword of Cypher is similar to SELECT in SQL.

Now let’s get movies released after 1998.

SELECT movie.title
FROM movie
WHERE movie.released > 1998;

In this case the addition actually looks identical in Cypher.

MATCH (movie:Movie)
WHERE movie.released > 1998
RETURN movie.title;

Note however that the semantics of WHERE in Cypher is somewhat different, see [query-where] for more information.


Let’s list all persons and the movies they acted in.

SELECT, movie.title
FROM person
  JOIN acted_in AS acted_in ON acted_in.person_id =
  JOIN movie ON acted_in.movie_id =;

The same using Cypher:

Here we match a Person and a Movie node, in case they are connected with an ACTED_IN relationship.

MATCH (person:Person)-[:ACTED_IN]->(movie:Movie)
RETURN, movie.title;

Co-Actors of Keanu Reeves

To make things slightly more complex, let’s search for the co-actors of Keanu Reeves. In SQL we use a self join on the person table and join on the acted_in table once for Keanu, and once for the co-actors.

FROM person AS keanu
  JOIN acted_in AS acted_in1 ON acted_in1.person_id =
  JOIN acted_in AS acted_in2 ON acted_in2.movie_id = acted_in1.movie_id
  JOIN person AS co_actor
    ON acted_in2.person_id = AND <>
WHERE = 'Keanu Reeves';

In Cypher, we use a pattern with two paths that target the same Movie node.

MATCH (keanu:Person)-[:ACTED_IN]->(movie:Movie),
WHERE = 'Keanu Reeves'

You may have noticed that we used the <> predicate in SQL only. This is because Neo4j will only match on the ACTED_IN relationship once in the same pattern. If this is not what we want, we can split the pattern up by using two MATCH clauses like this:

MATCH (keanu:Person)-[:ACTED_IN]->(movie:Movie)
MATCH (coActor:Person)-[:ACTED_IN]->(movie)
WHERE = 'Keanu Reeves'

This time Keanu Reeves is included in the result as well

Who are the Actor/Producers?

Next, let’s find out who has both acted in and produced movies.

FROM person
WHERE IN (SELECT person_id FROM acted_in)
  AND IN (SELECT person_id FROM produced)

In Cypher, we use patterns as predicates in this case. That is, we require the relationships to exist, but don’t care about the connected nodes; thus the empty parentheses.

MATCH (person:Person)
WHERE (person)-[:ACTED_IN]->() AND (person)-[:PRODUCED]->()


Now let’s find out a bit about the directors in movies that Keanu Reeves acted in. We want to know how many of those movies each of them directed.

SELECT, count(*)
FROM person keanu
  JOIN acted_in ON = acted_in.person_id
  JOIN directed ON acted_in.movie_id = directed.movie_id
  JOIN person AS director ON directed.person_id =
WHERE = 'Keanu Reeves'
ORDER BY count(*) DESC

Here’s how we’ll do the same in Cypher:

MATCH (keanu:Person {name: 'Keanu Reeves'})-[:ACTED_IN]->(movie:Movie),
RETURN, count(*)
ORDER BY count(*) DESC

As you can see there is no GROUP BY in the Cypher equivalent. Instead, Neo4j will automatically figure out the grouping key.