Composing large statements

Example graph

We continue using the same example data as before:

CREATE (matrix:Movie {title: 'The Matrix', released: 1997})
CREATE (cloudAtlas:Movie {title: 'Cloud Atlas', released: 2012})
CREATE (forrestGump:Movie {title: 'Forrest Gump', released: 1994})
CREATE (keanu:Person {name: 'Keanu Reeves', born: 1964})
CREATE (robert:Person {name: 'Robert Zemeckis', born: 1951})
CREATE (tom:Person {name: 'Tom Hanks', born: 1956})
CREATE (tom)-[:ACTED_IN {roles: ['Forrest']}]->(forrestGump)
CREATE (tom)-[:ACTED_IN {roles: ['Zachry']}]->(cloudAtlas)
CREATE (robert)-[:DIRECTED]->(forrestGump)

This is the resulting graph:

Figure 1. Movie graph

UNION

If you want to combine the results of two statements that have the same result structure, you can use UNION [ALL]. UNION removes duplicate rows from the combined result, while UNION ALL keeps them.

For example, the following query lists both actors and directors:

MATCH (actor:Person)-[r:ACTED_IN]->(movie:Movie)
RETURN actor.name AS name, type(r) AS type, movie.title AS title
UNION
MATCH (director:Person)-[r:DIRECTED]->(movie:Movie)
RETURN director.name AS name, type(r) AS type, movie.title AS title
Rows: 3

+-------------------------------------------------+
| name              | type       | title          |
+-------------------------------------------------+
| 'Tom Hanks'       | 'ACTED_IN' | 'Cloud Atlas'  |
| 'Tom Hanks'       | 'ACTED_IN' | 'Forrest Gump' |
| 'Robert Zemeckis' | 'DIRECTED' | 'Forrest Gump' |
+-------------------------------------------------+

Note that the returned columns must have the same aliases in every part of the UNION.

The query above is equivalent to this more compact query:

MATCH (actor:Person)-[r:ACTED_IN|DIRECTED]->(movie:Movie)
RETURN actor.name AS name, type(r) AS type, movie.title AS title
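To illustrate the [ALL] variant, here is a minimal sketch that assumes only the movie graph created at the start of this section. Because UNION ALL keeps duplicate rows, a person who matches more than once is returned more than once:

```cypher
// Names of everyone who acted in or directed a movie.
// UNION ALL keeps duplicate rows; replace it with UNION to return each name once.
MATCH (p:Person)-[:ACTED_IN]->(:Movie)
RETURN p.name AS name
UNION ALL
MATCH (p:Person)-[:DIRECTED]->(:Movie)
RETURN p.name AS name
```

On the example graph this should return three rows (Tom Hanks twice, because he acted in two movies, and Robert Zemeckis once). With plain UNION the duplicate row is removed and two rows remain.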

WITH

In Cypher®, you can chain fragments of statements together, similar to how it is done within a data-flow pipeline. Each fragment works on the output from the previous one, and its results can feed into the next one. Only columns declared in the WITH clause are available in subsequent query parts.

The WITH clause is used to combine the individual parts and declare which data flows from one to the other. WITH is similar to the RETURN clause. The main difference is that the WITH clause does not finish the query, but prepares the input for the next part. Expressions, aggregations, ordering and pagination can be used in the same way as in the RETURN clause. The other difference is that every expression must be aliased; a plain variable such as person can be passed on as it is.

The following example collects the movies each person acted in and then filters out the people who appear in only one movie.

MATCH (person:Person)-[:ACTED_IN]->(m:Movie)
WITH person, count(*) AS appearances, collect(m.title) AS movies
WHERE appearances > 1
RETURN person.name, appearances, movies
Rows: 1

+-------------------------------------------------------------+
| person.name | appearances | movies                          |
+-------------------------------------------------------------+
| 'Tom Hanks' | 2           | ['Cloud Atlas', 'Forrest Gump'] |
+-------------------------------------------------------------+
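Ordering and pagination work in WITH as they do in RETURN. The following sketch, again assuming only the movie graph above, sorts the intermediate rows and keeps the first one before the final RETURN:

```cypher
// Count appearances per person, then keep only the person with the most.
MATCH (person:Person)-[:ACTED_IN]->(m:Movie)
WITH person, count(m) AS appearances
ORDER BY appearances DESC
LIMIT 1
RETURN person.name AS name, appearances
```

Because ORDER BY and LIMIT are attached to the WITH clause, they are applied to the intermediate result, so later query parts only see the single remaining row.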

Using the WITH clause, you can pass values from one section of a query to another. This allows you to perform some intermediate calculations or operations within your query to use later.

The examples below use a second dataset of people, companies, and technologies. To reproduce the graph, run this Cypher code:

CREATE (diana:Person {name: "Diana"})
CREATE (melissa:Person {name: "Melissa", twitter: "@melissa"})
CREATE (dan:Person {name: "Dan", twitter: "@dan", yearsExperience: 6})
CREATE (sally:Person {name: "Sally", yearsExperience: 4})
CREATE (john:Person {name: "John", yearsExperience: 5})
CREATE (jennifer:Person {name: "Jennifer", twitter: "@jennifer", yearsExperience: 5})
CREATE (joe:Person {name: "Joe"})
CREATE (mark:Person {name: "Mark", twitter: "@mark"})
CREATE (ann:Person {name: "Ann"})
CREATE (xyz:Company {name: "XYZ"})
CREATE (x:Company {name: "Company X"})
CREATE (a:Company {name: "Company A"})
CREATE (Neo4j:Company {name: "Neo4j"})
CREATE (abc:Company {name: "ABC"})
CREATE (query:Technology {type: "Query Languages"})
CREATE (etl:Technology {type: "Data ETL"})
CREATE (integrations:Technology {type: "Integrations"})
CREATE (graphs:Technology {type: "Graphs"})
CREATE (dev:Technology {type: "Application Development"})
CREATE (java:Technology {type: "Java"})
CREATE (diana)-[:LIKES]->(query)
CREATE (melissa)-[:LIKES]->(query)
CREATE (dan)-[:LIKES]->(etl)<-[:LIKES]-(melissa)
CREATE (xyz)<-[:WORKS_FOR]-(sally)-[:LIKES]->(integrations)<-[:LIKES]-(dan)
CREATE (sally)<-[:IS_FRIENDS_WITH]-(john)-[:LIKES]->(java)
CREATE (john)<-[:IS_FRIENDS_WITH]-(jennifer)-[:LIKES]->(java)
CREATE (john)-[:WORKS_FOR]->(xyz)
CREATE (sally)<-[:IS_FRIENDS_WITH]-(jennifer)-[:IS_FRIENDS_WITH]->(melissa)
CREATE (joe)-[:LIKES]->(query)
CREATE (x)<-[:WORKS_FOR]-(diana)<-[:IS_FRIENDS_WITH]-(joe)-[:IS_FRIENDS_WITH]->(mark)-[:LIKES]->(graphs)<-[:LIKES]-(jennifer)-[:WORKS_FOR]->(Neo4j)
CREATE (ann)<-[:IS_FRIENDS_WITH]-(jennifer)-[:IS_FRIENDS_WITH]->(mark)
CREATE (john)-[:LIKES]->(dev)<-[:LIKES]-(ann)-[:IS_FRIENDS_WITH]->(dan)-[:WORKS_FOR]->(abc)
CREATE (ann)-[:WORKS_FOR]->(abc)
CREATE (a)<-[:WORKS_FOR]-(melissa)-[:LIKES]->(graphs)<-[:LIKES]-(diana)

You must specify the variables in the WITH clause that you want to use later. Only those variables are passed on to the next part of the query. There are a variety of ways to use this functionality (e.g. count, collect, filter, limit results).

For more information on how to use WITH, see the WITH section of the Cypher Manual.

//Query1: find and list the technologies people like
MATCH (a:Person)-[r:LIKES]-(t:Technology)
WITH a.name AS name, collect(t.type) AS technologies
RETURN name, technologies;

Query1 results:

Rows: 9

+----------------------------------------------------------+
| name        | technologies                               |
+----------------------------------------------------------+
| 'Sally'     | ['Integrations']                           |
| 'Dan'       | ['Data ETL', 'Integrations']               |
| 'John'      | ['Java', 'Application Development']        |
| 'Diana'     | ['Query Languages', 'Graphs']              |
| 'Jennifer'  | ['Java', 'Graphs']                         |
| 'Ann'       | ['Application Development']                |
| 'Mark'      | ['Graphs']                                 |
| 'Joe'       | ['Query Languages']                        |
| 'Melissa'   | ['Query Languages', 'Data ETL', 'Graphs']  |
+----------------------------------------------------------+

//Query2: for each person, list the friends who have more than one friendship
MATCH (p:Person)-[:IS_FRIENDS_WITH]->(friend:Person)
WITH p, collect(friend.name) AS friendsList, count{(friend)-[:IS_FRIENDS_WITH]-(:Person)} AS numberOfFoFs
WHERE numberOfFoFs > 1
RETURN p.name, friendsList, numberOfFoFs;

Query2 results:

Rows: 3

+--------------------------------------------------------------+
| p.name     | friendsList                      | numberOfFoFs |
+--------------------------------------------------------------+
| 'Joe'      | ['Mark']                         | 2            |
| 'Jennifer' | ['Sally', 'John', 'Ann', 'Mark'] | 2            |
| 'John'     | ['Sally']                        | 2            |
+--------------------------------------------------------------+

In the first query, only the Person name and the collected list of Technology types are passed on. Therefore, only these two items can be referenced in the RETURN clause. Neither the relationship (r) nor any other Person property, such as twitter, can be used because those values were not passed along.

In the second query, only p and any of its properties (name, yearsExperience, twitter), the collection of friends (as a whole, not each value), and the number of friend-of-friends can be referenced. Since those values were passed along in the WITH clause, they can be used in the WHERE or RETURN clauses.

WITH requires every value it passes on to be bound to a variable, so expressions that do not already have one need an alias. The Person nodes are bound to a variable (p) in the MATCH clause, so no alias needs to be assigned there.
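As a sketch of the scoping rules described above, the following variation of Query1 passes the whole node instead of only its name. It assumes the people and technology dataset created earlier in this section:

```cypher
// Passing the node a (rather than a.name) keeps all of its properties
// available after WITH. The aggregation has no variable yet, so it needs an alias.
MATCH (a:Person)-[:LIKES]->(t:Technology)
WITH a, collect(t.type) AS technologies
RETURN a.name AS name, a.twitter AS twitter, technologies
```

Here a.twitter can be returned because a itself was passed along; for people without a twitter property the value is null. The variable t is no longer in scope after WITH, so referencing it in the RETURN clause would be an error.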

WITH is also very helpful for setting up parameters at the start of a query. This is often useful for parameter keys, URL strings, and other query variables when importing data.

//Find people with 2-6 years of experience
WITH 2 AS experienceMin, 6 AS experienceMax
MATCH (p:Person)
WHERE experienceMin <= p.yearsExperience <= experienceMax
RETURN p
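Values set up this way follow the same scoping rules as any other variable: they stay available until the next WITH and must be listed there to be used afterwards. The following sketch extends the query above; the alias yearsBelowMax is an illustrative name, not part of the original example:

```cypher
// Set up the bounds once, then carry experienceMax through the second WITH
// so that it can still be used in the RETURN clause.
WITH 2 AS experienceMin, 6 AS experienceMax
MATCH (p:Person)
WHERE experienceMin <= p.yearsExperience <= experienceMax
WITH p, experienceMax
RETURN p.name AS name, experienceMax - p.yearsExperience AS yearsBelowMax
ORDER BY name
```

On the example dataset, only the four people who have a yearsExperience property (Dan, Jennifer, John, and Sally) should be returned, because comparisons against a missing property evaluate to null and the row is filtered out. experienceMin is dropped by the second WITH and can no longer be referenced after it.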