Cross Product Cypher queries will not perform well

As in SQL, if you do not properly connect the parts of your query, the result is a cross (Cartesian) product, which is seldom what you want. Take the following example:

MATCH (p:Person), (m:Movie)
RETURN p, m;

In this query, p matches every node in the graph with the :Person label, and m matches every node in the graph with the :Movie label. Because nothing in the pattern connects p to m, returning both pairs each p node with each m node.

If there are three nodes with the label Person:

  • Neo,
  • Trinity, and
  • Morpheus

and three nodes with the label Movie:

  • The Matrix,
  • The Matrix Reloaded, and
  • The Matrix Revolutions

then the query above returns nine rows (3 × 3):

p           m
Neo         The Matrix
Neo         The Matrix Reloaded
Neo         The Matrix Revolutions
Trinity     The Matrix
Trinity     The Matrix Reloaded
Trinity     The Matrix Revolutions
Morpheus    The Matrix
Morpheus    The Matrix Reloaded
Morpheus    The Matrix Revolutions

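Keep in mind that this is a simple example, so the result set is small. The number of rows returned is the number of :Person nodes multiplied by the number of :Movie nodes, so on a production-size graph this would be a very large, potentially memory-intensive query.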
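
To get a feel for how large such a result would be without actually producing it, you can count each label separately and multiply the counts. The following is a minimal sketch, not a tuned query; the aliases people, movies, and crossProductRows are illustrative names chosen here.

```cypher
// Count each label on its own, then multiply the two counts.
// This avoids materializing the cross product itself.
MATCH (p:Person)
WITH count(p) AS people
MATCH (m:Movie)
WITH people, count(m) AS movies
RETURN people, movies, people * movies AS crossProductRows;
```

With the three Person nodes and three Movie nodes from the example above, crossProductRows would be 9.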

In general, inadvertent cross products happen in more complex queries. They are especially common in queries with many WITH clauses, where a close look at the query is needed to flush out the issue. You can avoid them by following general performance best practices: be as specific with your query as possible, use identifiers to tie the parts of the query together, and return only the data you need. Also profile your slow queries so that you can see where the time and effort are spent.
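
As a rough illustration of tying the parts of a query together, the sketch below reuses the identifier p in the second MATCH so that each movie is matched through a relationship to a specific person. It assumes a data model that is not described in this article: an ACTED_IN relationship from Person to Movie, and name and title properties, as found in the Neo4j Movies sample graph. Substitute the relationship type and properties from your own model.

```cypher
// Assumption: (:Person)-[:ACTED_IN]->(:Movie), with p.name and m.title
// properties, as in the Movies sample graph. Adjust to your own model.
// PROFILE runs the query and reports rows and database hits per operator.
PROFILE
MATCH (p:Person)
WITH p
// Reusing p here connects this MATCH to the previous one.
// Writing MATCH (m:Movie) instead would pair every p with every m.
MATCH (p)-[:ACTED_IN]->(m:Movie)
RETURN p.name AS person, m.title AS movie;
```

Returning p.name and m.title rather than the whole nodes also follows the advice to return only the data you need.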

As of Neo4j 2.3, a warning that highlights this issue is shown in the Neo4j Browser and when you run your query with EXPLAIN.
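
To check a query for this warning without executing it, prefix it with EXPLAIN. A minimal sketch using the query from the start of this article:

```cypher
// EXPLAIN produces the execution plan without running the query,
// so it is safe to use even when the result would be very large.
EXPLAIN
MATCH (p:Person), (m:Movie)
RETURN p, m;
```

The exact wording of the warning and the shape of the plan may vary between Neo4j versions.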