Query tuning

Neo4j aims to execute queries as fast as possible.

However, when optimizing for maximum query execution performance, it may be helpful to rephrase queries using knowledge about the domain and the application.

The overall goal of manual query performance optimization is to ensure that only necessary data is retrieved from the graph. At the very least, data should get filtered out as early as possible in order to reduce the amount of work that has to be done in the later stages of query execution. This also applies to what gets returned: returning whole nodes and relationships ought to be avoided in favour of selecting and returning only the data that is needed. You should also make sure to set an upper limit on variable length patterns, so they don’t cover larger portions of the dataset than needed.

Each Cypher® query gets optimized and transformed into an execution plan by the Cypher query planner. To minimize the resources used for this, try to use parameters instead of literals when possible. This allows Cypher to re-use your queries instead of having to parse and build new execution plans.

To read more about the execution plan operators mentioned in this chapter, see Execution plans.

Cypher query options

Query execution can be fine-tuned through the use of query options. In order to use one or more of these options, the query must be prepended with CYPHER, followed by the query option(s), as exemplified thus: CYPHER query-option [further-query-options] query.

Cypher version

Occasionally, there is a requirement to use a previous version of the Cypher compiler when running a query. Here we detail the available versions:

Query option Description Default?

2.3

This will force the query to use Neo4j Cypher 2.3.

3.4

This will force the query to use Neo4j Cypher 3.4.

3.5

This will force the query to use Neo4j Cypher 3.5. As this is the default version, it is not necessary to use this option explicitly.

X

Cypher query planner

Each query is turned into an execution plan by the Cypher query planner. The execution plan tells Neo4j which operations to perform when executing the query.

Neo4j uses a cost-based execution planning strategy (known as the 'cost' planner): the statistics service in Neo4j is used to assign a cost to alternative plans and picks the cheapest one.

Cypher rule planner

All versions of Neo4j prior to Neo4j 3.2 also included a rule-based planner, which used rules to produce execution plans. This planner considered available indexes, but did not use statistical information to guide the query compilation. The rule planner was removed in Neo4j 3.2 owing to inferior query execution performance when compared with the cost planner. It can still be used, but doing so will fallback to the Neo4j 3.1 rule planner.

Option Description Default?

planner=rule

This will force the query to use the rule planner, and will therefore cause the query to fall back to using Cypher 3.1.

planner=cost

Neo4j 3.5 uses the cost planner for all queries, and therefore it is not necessary to use this option explicitly.

X

It is also possible to change the default planner by using the cypher.planner configuration setting (see Operations Manual → Configuration Settings).

You can see which planner was used by looking at the execution plan.

When Cypher is building execution plans, it looks at the schema to see if it can find indexes it can use. These index decisions are only valid until the schema changes, so adding or removing indexes leads to the execution plan cache being flushed.

Cypher runtime

Using the execution plan, the query is executed — and records returned — by the Cypher runtime. Depending on whether Neo4j Enterprise Edition or Neo4j Community Edition is used, there are three different runtimes available. In Enterprise Edition, the Cypher query planner selects the runtime, falling back to alternative runtimes as follows:

  • Try the compiled runtime first.

  • If the compiled runtime does not support the query, then fall back to use the slotted runtime.

  • Finally, if the slotted runtime does not support the query, fall back to the interpreted runtime. The interpreted runtime supports all queries.

Interpreted

In this runtime, the operators in the execution plan are chained together in a tree, where each non-leaf operator feeds from one or two child operators. The tree thus comprises nested iterators, and the records are streamed in a pipelined manner from the top iterator, which pulls from the next iterator and so on.

Slotted

This is very similar to the interpreted runtime, except that there are additional optimizations regarding the way in which the records are streamed through the iterators. This results in improvements to both the performance and memory usage of the query. In effect, this can be thought of as the 'faster interpreted' runtime.

Compiled

Algorithms are employed to intelligently group the operators in the execution plan in order to generate new combinations and orders of execution which are optimised for performance and memory usage. While this should lead to superior performance in most cases (over both the interpreted and slotted runtimes), it is still under development and does not support all possible operators or queries (the slotted runtime covers all operators and queries).

Option Description Default?

runtime=interpreted

This will force the query planner to use the interpreted runtime.

This is not used in Enterprise Edition unless explicitly asked for. It is the only option for all queries in Community Edition—​it is not necessary to specify this option in Community Edition.

runtime=slotted

This will cause the query planner to use the slotted runtime.

This is the default option for all queries which are not supported by runtime=compiled in Enterprise Edition.

runtime=compiled

This will cause the query planner to use the compiled runtime if it supports the query. If the compiled runtime does not support the query, the planner will fall back to the slotted runtime.

This is the default option for some queries in Enterprise Edition.

Profiling a query

There are two options to choose from when you want to analyze a query by looking at its execution plan:

EXPLAIN

If you want to see the execution plan but not run the statement, prepend your Cypher statement with EXPLAIN. The statement will always return an empty result and make no changes to the database.

PROFILE

If you want to run the statement and see which operators are doing most of the work, use PROFILE. This will run your statement and keep track of how many rows pass through each operator, and how much each operator needs to interact with the storage layer to retrieve the necessary data. Please note that profiling your query uses more resources, so you should not profile unless you are actively working on a query.

See Execution plans for a detailed explanation of each of the operators contained in an execution plan.

Being explicit about what types and labels you expect relationships and nodes to have in your query helps Neo4j use the best possible statistical information, which leads to better execution plans. This means that when you know that a relationship can only be of a certain type, you should add that to the query. The same goes for labels, where declaring labels on both the start and end nodes of a relationship helps Neo4j find the best way to execute the statement.