Query tuning
Neo4j aims to execute queries as fast as possible.
However, when optimizing for maximum query execution performance, it may be helpful to rephrase queries using knowledge about the domain and the application.
The overall goal of manual query performance optimization is to ensure that only necessary data is retrieved from the graph. At the very least, data should get filtered out as early as possible in order to reduce the amount of work that has to be done in the later stages of query execution. This also applies to what gets returned: returning whole nodes and relationships ought to be avoided in favour of selecting and returning only the data that is needed. You should also make sure to set an upper limit on variable length patterns, so they don’t cover larger portions of the dataset than needed.
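As a rough illustration of these guidelines, the sketch below assumes a hypothetical graph of Person nodes connected by KNOWS relationships. The labels, the property names, the $name parameter, and the three-hop bound are all invented for the example.

```cypher
// Anchor the match on a selective start node so filtering happens early.
// The variable-length pattern has an explicit upper bound (3 hops),
// and only the needed property is returned instead of whole nodes.
MATCH (p:Person {name: $name})-[:KNOWS*1..3]-(friend:Person)
WHERE friend.age > 30
RETURN DISTINCT friend.name
LIMIT 100
```

Whether this shape is actually faster depends on the data, so it is worth comparing the execution plans of the original and the rewritten query.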
Each Cypher® query gets optimized and transformed into an execution plan by the Cypher query planner. To minimize the resources used for this, try to use parameters instead of literals when possible. This allows Cypher to re-use your queries instead of having to parse and build new execution plans.
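The difference between literals and parameters can be sketched as follows. The Person label, its properties, and the $name parameter are hypothetical and only serve the example.

```cypher
// Literal value: the query text changes with every name, so Cypher
// may have to parse and plan the query again each time.
MATCH (p:Person {name: 'Alice'})
RETURN p.age;

// Parameter: the query text is identical for every name, which allows
// the cached execution plan to be re-used. The value of $name is
// supplied by the client together with the query.
MATCH (p:Person {name: $name})
RETURN p.age;
```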
To read more about the execution plan operators mentioned in this chapter, see Execution plans.
Cypher query options
Query execution can be fine-tuned through the use of query options.
In order to use one or more of these options, the query must be prepended with CYPHER, followed by the query option(s), as exemplified thus: CYPHER query-option [further-query-options] query.
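A minimal sketch of this syntax with two options combined might look like the following. The option names runtime=slotted and replan=skip follow the key=value form used elsewhere in this chapter, and the Person label is a hypothetical placeholder.

```cypher
// Two query options are given after the CYPHER keyword,
// separated by whitespace, followed by the query itself.
CYPHER runtime=slotted replan=skip
MATCH (n:Person)
RETURN count(n)
```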
Cypher version
Occasionally, there is a requirement to use a previous version of the Cypher compiler when running a query. Here we detail the available versions:
Query option | Description | Default |
---|---|---|
3.5 | This will force the query to use Neo4j Cypher 3.5. | |
4.1 | This will force the query to use Neo4j Cypher 4.1. | |
4.2 | This will force the query to use Neo4j Cypher 4.2. As this is the default version, it is not necessary to use this option explicitly. | Yes |

In Neo4j 4.2, the support for Cypher 3.5 is provided only at the parser level. The consequence is that some underlying features available in Neo4j 3.5 are no longer available and will result in runtime errors. Please refer to the discussion in Cypher Compatibility for more information on which features are affected.
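For instance, a query could be pinned to an earlier compiler version roughly like this, assuming the version number itself is given as the query option:

```cypher
// Run this query with the Cypher 4.1 compiler instead of the default.
CYPHER 4.1
MATCH (n)
RETURN count(n)
```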
Cypher runtime
Using the execution plan, the query is executed — and records returned — by the Cypher runtime. Depending on whether Neo4j Enterprise Edition or Neo4j Community Edition is used, there are three different runtimes available:
- Interpreted: In this runtime, the operators in the execution plan are chained together in a tree, where each non-leaf operator feeds from one or two child operators. The tree thus comprises nested iterators, and the records are streamed in a pipelined manner from the top iterator, which pulls from the next iterator and so on.
- Slotted: This is very similar to the interpreted runtime, except that there are additional optimizations regarding the way in which the records are streamed through the iterators. This results in improvements to both the performance and memory usage of the query. In effect, this can be thought of as the 'faster interpreted' runtime.
- Pipelined: The pipelined runtime was introduced in Neo4j 4.0 as a replacement for the older compiled runtime used in the Neo4j 3.x versions. It combines some of the advantages of the compiled runtime in a new architecture that allows for support of a wider range of queries. Algorithms are employed to intelligently group the operators in the execution plan in order to generate new combinations and orders of execution which are optimised for performance and memory usage. While this should lead to superior performance in most cases (over both the interpreted and slotted runtimes), it is still under development and does not support all possible operators or queries (the slotted runtime covers all operators and queries).
Option | Description | Default |
---|---|---|
runtime=interpreted | This will force the query planner to use the interpreted runtime. | This is not used in Enterprise Edition unless explicitly asked for. It is the only option for all queries in Community Edition, so it is not necessary to specify this option in Community Edition. |
runtime=slotted | This will cause the query planner to use the slotted runtime. | This is the default option for all queries which are not supported by the pipelined runtime. |
runtime=pipelined | This will cause the query planner to use the pipelined runtime if it supports the query. If the pipelined runtime does not support the query, the planner will fall back to the slotted runtime. | This is the default option for some queries in Enterprise Edition. |
In Enterprise Edition, the Cypher query planner selects the runtime, falling back to alternative runtimes as follows:
- Try the pipelined runtime first.
- If the pipelined runtime does not support the query, then fall back to the slotted runtime.
- Finally, if the slotted runtime does not support the query, fall back to the interpreted runtime. The interpreted runtime supports all queries, and is the only option in Neo4j Community Edition.
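To override this selection for a single query, the runtime can be requested explicitly. The sketch below assumes the runtime=slotted option form and a hypothetical Person/KNOWS graph. Combining it with EXPLAIN means the query is only planned, and the plan summary should report which runtime was chosen.

```cypher
// Request the slotted runtime and inspect the resulting plan
// without executing the query.
CYPHER runtime=slotted
EXPLAIN
MATCH (p:Person)-[:KNOWS]->(friend:Person)
RETURN friend.name
```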
Cypher planner
The Cypher planner takes a Cypher query and computes an execution plan that solves it. For any given query there are likely to be several candidate execution plans, each solving the query in a different way. The planner uses a search algorithm to find the execution plan with the lowest estimated execution cost.
This table describes the available planner options:
Query option | Description | Default |
---|---|---|
planner=cost | Use cost based planning with default limits on plan search space and time. | Yes |
planner=idp | Synonym for planner=cost. | |
planner=dp | Use cost based planning without limits on plan search space and time to perform an exhaustive search for the best execution plan. | |
Cypher connect-components planner
One part of the Cypher planner is responsible for combining sub-plans for separate patterns into larger plans - a task referred to as connecting components.
This table describes the available query options for the connect-components planner:
Query option | Description | Default |
---|---|---|
connectComponentsPlanner=greedy | Use a greedy approach when combining sub-plans. | Yes |
connectComponentsPlanner=idp | Use the cost based IDP search algorithm when combining sub-plans. | |
Cypher update strategy
This option affects the eagerness of updating queries.
The possible values are:
Query option | Description | Default |
---|---|---|
updateStrategy=default | Update queries are executed eagerly when needed. | Yes |
updateStrategy=eager | Update queries are always executed eagerly. | |
Cypher expression engine
This option affects how the runtime evaluates expressions.
The possible values are:
Query option | Description | Default |
---|---|---|
expressionEngine=default | Compile expressions and use the compiled expression engine when needed. | Yes |
expressionEngine=interpreted | Always use the interpreted expression engine. | |
expressionEngine=compiled | Always compile expressions and use the compiled expression engine. Cannot be used together with | |
Cypher operator engine
This query option affects whether the pipelined runtime attempts to generate compiled code for groups of operators.
The possible values are:
Query option | Description | Default |
---|---|---|
operatorEngine=default | Attempt to generate compiled operators when applicable. | Yes |
operatorEngine=interpreted | Never attempt to generate compiled operators. | |
operatorEngine=compiled | Always attempt to generate compiled operators. Cannot be used together with | |
Cypher interpreted pipes fallback
This query option affects how the pipelined runtime behaves for operators it does not directly support.
The available options are:
Query option | Description | Default |
---|---|---|
interpretedPipesFallback=default | Equivalent to | Yes |
interpretedPipesFallback=disabled | If the plan contains any operators not supported by the pipelined runtime then another runtime is chosen to execute the entire plan. Cannot be used together with | |
interpretedPipesFallback=whitelisted_plans_only | Parts of the execution plan can be executed on another runtime. Only certain operators are allowed to execute on another runtime. Cannot be used together with | |
interpretedPipesFallback=all | Parts of the execution plan may be executed on another runtime. Any operator is allowed to execute on another runtime. Queries with this option set might produce incorrect results, or fail. Cannot be used together with | |
Cypher replanning
Cypher replanning occurs in the following circumstances:
- When the query is not in the cache. This can happen when the server is first started or restarted, when the cache has recently been cleared, or when dbms.query_cache_size was exceeded.
- When more time than the cypher.min_replan_interval value has passed, and the database statistics have changed by more than the cypher.statistics_divergence_threshold value.
There may be situations where Cypher query planning occurs at a non-ideal time, for example when a query must be as fast as possible and a valid plan is already in place.
Replanning is not performed for all queries at once; it is performed in the same thread as running the query, and can block the query. However, replanning one query does not replan any other queries.
There are three different replan options available:
Option | Description | Default |
---|---|---|
replan=default | This is the planning and replanning option as described above. | Yes |
replan=force | This will force a replan, even if the plan is valid according to the planning rules. Once the new plan is complete, it replaces the existing one in the query cache. | |
replan=skip | If a valid plan already exists, it will be used even if the planning rules would normally dictate that it should be replanned. | |
The replan option is prepended to queries. For example:
CYPHER replan=force MATCH ...
In a mixed workload, you can force replanning by using the Cypher EXPLAIN command.
This can be useful to schedule replanning of queries which are expensive to plan, at known times of low load.
Using EXPLAIN will make sure the query is only planned, but not executed.
For example:
CYPHER replan=force EXPLAIN MATCH ...
During times of known high load, replan=skip can be useful to avoid introducing unwanted latency spikes.
Profiling a query
There are two options to choose from when you want to analyze a query by looking at its execution plan:
- EXPLAIN: If you want to see the execution plan but not run the statement, prepend your Cypher statement with EXPLAIN. The statement will always return an empty result and make no changes to the database.
- PROFILE: If you want to run the statement and see which operators are doing most of the work, use PROFILE. This will run your statement and keep track of how many rows pass through each operator, and how much each operator needs to interact with the storage layer to retrieve the necessary data. Note that profiling your query uses more resources, so you should not profile unless you are actively working on a query.
See Execution plans for a detailed explanation of each of the operators contained in an execution plan.
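The two options can be compared on the same statement. The sketch below uses a hypothetical Person/KNOWS graph; only the EXPLAIN and PROFILE keywords come from the text above.

```cypher
// Plan only: returns an empty result and changes nothing.
EXPLAIN
MATCH (p:Person)-[:KNOWS]->(friend:Person)
RETURN friend.name;

// Plan and run: records the rows passing through each operator
// and each operator's interaction with the storage layer.
PROFILE
MATCH (p:Person)-[:KNOWS]->(friend:Person)
RETURN friend.name;
```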
Being explicit about what types and labels you expect relationships and nodes to have in your query helps Neo4j use the best possible statistical information, which leads to better execution plans. This means that when you know that a relationship can only be of a certain type, you should add that to the query. The same goes for labels, where declaring labels on both the start and end nodes of a relationship helps Neo4j find the best way to execute the statement.
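As an illustration, the two statements below ask a similar question with different amounts of detail. The Person and Movie labels, the ACTED_IN relationship type, and the $name parameter are assumptions made for the example.

```cypher
// Less specific: no labels and no relationship type, so the planner
// has little statistical information to work with.
MATCH (a)-[r]->(b)
WHERE a.name = $name
RETURN b.title;

// More specific: labels on both the start and end nodes and an
// explicit relationship type.
MATCH (a:Person {name: $name})-[:ACTED_IN]->(b:Movie)
RETURN b.title;
```

Running both with PROFILE is one way to check whether the extra detail changes the plan on a given dataset.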