Query tuning

Neo4j aims to execute queries as fast as possible. However, when optimizing for maximum query execution performance, it may be helpful to rephrase queries using knowledge about the domain and the application.

This page contains information about how to tune queries using different strategies.

For information about changing the runtime of a query, see the page about Cypher® runtime concepts.

General recommendations

The overall goal of manual query performance optimization is to ensure that only necessary data is retrieved from the graph.

Queries should aim to filter data as early as possible in order to reduce the amount of work that has to be done in the later stages of query execution. This also applies to what gets returned: returning whole nodes and relationships ought to be avoided in favour of selecting and returning only the data that is needed. You should also make sure to set an upper limit on variable-length patterns, so they don’t cover larger portions of the dataset than needed.
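As a rough illustration of these recommendations, the sketch below anchors the pattern on a selective start node, puts an upper bound on the variable-length pattern, and returns a single property rather than whole nodes. The Person label, the KNOWS relationship type, and the name property are hypothetical and not taken from any particular dataset.

```cypher
// Hypothetical schema: (:Person {name})-[:KNOWS]->(:Person)
// The pattern starts from one specific person, expands at most three hops,
// and returns only the property that is needed instead of whole nodes.
MATCH (p:Person {name: 'Alice'})-[:KNOWS*1..3]->(friend:Person)
RETURN DISTINCT friend.name AS name
```

Without the upper bound of 3, the expansion could traverse a much larger part of the graph than intended.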

Each Cypher query gets optimized and transformed into an execution plan by the Cypher query planner. To minimize the resources used for this, try to use parameters instead of literals when possible. This allows Cypher to re-use queries instead of having to parse and build new execution plans.
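A minimal sketch of the difference, assuming a hypothetical Person label with name and born properties and a parameter called $name supplied by the client:

```cypher
// Literal value: the query text changes whenever the name changes.
MATCH (p:Person {name: 'Alice'})
RETURN p.born;

// Parameter: the query text stays the same for every name, so an
// existing execution plan can be reused. $name is provided by the client.
MATCH (p:Person {name: $name})
RETURN p.born;
```

How the parameter value is supplied depends on the client or driver in use.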

To read more about the execution plan operators mentioned in this section, see Operators.

Query options

Query execution can be fine-tuned through the use of query options.

To use one or more of these options, prepend the query with CYPHER followed by the query option(s), as shown here:

CYPHER query-option [further-query-options] query
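For instance, two of the options described on this page could be combined as in the sketch below. The query itself is only illustrative, and the Person label and name property are hypothetical.

```cypher
// Two query options, separated by whitespace, precede the query.
CYPHER planner=cost replan=force
MATCH (p:Person)
RETURN p.name
LIMIT 10
```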

For information about the various runtimes available in Cypher, see Cypher runtimes.

Cypher planner

The Cypher planner takes a Cypher query and computes an execution plan that solves it. For any given query, there are likely to be several candidate execution plans that each solve the query in a different way. The planner uses a search algorithm to find the execution plan with the lowest estimated execution cost.

This table describes the available planner options:

| Query option | Description | Default |
| --- | --- | --- |
| planner=cost | Use cost-based planning with default limits on plan search space and time. | Yes |
| planner=idp | Synonym for planner=cost. | |
| planner=dp | Use cost-based planning without limits on plan search space and time to perform an exhaustive search for the best execution plan. Using this option can significantly increase the planning time of the query. | |
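As a sketch of how an exhaustive plan search might be tried out, the query below combines planner=dp with EXPLAIN so that the query is planned but not executed. The labels, relationship type, and properties are hypothetical.

```cypher
// Exhaustive plan search; EXPLAIN shows the plan without running the query.
CYPHER planner=dp
EXPLAIN
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:KNOWS]->(c:Person)
RETURN a.name, c.name
```

Comparing the resulting plan with the one produced without the option can indicate whether the extra planning time is worthwhile for a given query.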

Cypher connect-components planner

One part of the Cypher planner is responsible for combining sub-plans for separate patterns into larger plans, a task referred to as connecting components.

This table describes the available query options for the connect-components planner:

| Query option | Description | Default |
| --- | --- | --- |
| connectComponentsPlanner=greedy | Use a greedy approach when combining sub-plans. Using this option can significantly reduce the planning time of the query. | |
| connectComponentsPlanner=idp | Use the cost-based IDP search algorithm when combining sub-plans. Using this option can significantly increase the planning time of the query but usually finds better plans. | Yes |

The Cypher query option connectComponentsPlanner is deprecated and will be removed without a replacement. The product’s default behavior of using a cost-based IDP search algorithm when combining sub-plans will be kept.
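For completeness, a query with two separate patterns is where this option would apply. The sketch below is only an illustration using hypothetical labels and properties, and because the option is deprecated it is best not relied on in new queries.

```cypher
// Two disconnected patterns whose sub-plans must be combined.
CYPHER connectComponentsPlanner=greedy
EXPLAIN
MATCH (a:Person), (m:Movie)
WHERE a.born = m.released
RETURN a.name, m.title
```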

Cypher update strategy

This option affects the eagerness of updating queries.

The possible values are:

| Query option | Description | Default |
| --- | --- | --- |
| updateStrategy=default | Update queries are executed eagerly when needed. | Yes |
| updateStrategy=eager | Update queries are always executed eagerly. | |
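As an illustration, the option is placed in front of an updating query in the same way as the other query options. The Person label and name property below are hypothetical.

```cypher
// Request eager execution for an updating query that reads and
// creates nodes with the same label.
CYPHER updateStrategy=eager
MATCH (p:Person)
CREATE (:Person {name: p.name + ' (copy)'})
```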

Cypher expression engine

This option affects how the runtime evaluates expressions.

The possible values are:

| Query option | Description | Default |
| --- | --- | --- |
| expressionEngine=default | Compile expressions and use the compiled expression engine when needed. | Yes |
| expressionEngine=interpreted | Always use the interpreted expression engine. | |
| expressionEngine=compiled | Always compile expressions and use the compiled expression engine. | |
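A minimal sketch of selecting the expression engine for a single query, assuming a hypothetical Person label with born and name properties:

```cypher
// Evaluate the predicates below with the interpreted expression engine.
CYPHER expressionEngine=interpreted
MATCH (p:Person)
WHERE p.born > 1980 AND p.name STARTS WITH 'A'
RETURN p.name
```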

Cypher operator engine

This query option affects whether the pipelined runtime attempts to generate compiled code for groups of operators.

The possible values are:

| Query option | Description | Default |
| --- | --- | --- |
| operatorEngine=default | Attempt to generate compiled operators when applicable. | Yes |
| operatorEngine=interpreted | Never attempt to generate compiled operators. | |
| operatorEngine=compiled | Always attempt to generate compiled operators. Cannot be used together with runtime=slotted. | |
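Because this option concerns the pipelined runtime, one way to use it is together with an explicit runtime option, as in the sketch below. This assumes the pipelined runtime is available in the installation, and the Person label is hypothetical.

```cypher
// Ask the pipelined runtime to attempt compiled operators.
CYPHER runtime=pipelined operatorEngine=compiled
MATCH (p:Person)
RETURN count(p) AS people
```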

Cypher interpreted pipes fallback

This query option affects how the pipelined runtime behaves for operators it does not directly support.

The available options are:

| Query option | Description | Default |
| --- | --- | --- |
| interpretedPipesFallback=default | Equivalent to interpretedPipesFallback=whitelisted_plans_only. | Yes |
| interpretedPipesFallback=disabled | If the plan contains any operators not supported by the pipelined runtime, another runtime is chosen to execute the entire plan. Cannot be used together with runtime=slotted. | |
| interpretedPipesFallback=whitelisted_plans_only | Parts of the execution plan can be executed on another runtime. Only certain operators are allowed to execute on another runtime. Cannot be used together with runtime=slotted. | |
| interpretedPipesFallback=all | Parts of the execution plan may be executed on another runtime. Any operator is allowed to execute on another runtime. Queries with this option set might produce incorrect results, or fail. Cannot be used together with runtime=slotted. | |

This setting is experimental, and using it in a production environment is discouraged.
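As a sketch, the fallback behavior can be set per query alongside an explicit pipelined runtime. The example assumes the pipelined runtime is available, and the labels and properties are hypothetical.

```cypher
// If any operator in the plan is unsupported by the pipelined runtime,
// the whole plan is executed by another runtime instead.
CYPHER runtime=pipelined interpretedPipesFallback=disabled
MATCH (p:Person)-[:KNOWS]->(friend:Person)
RETURN p.name, friend.name
```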

Cypher replanning

Translating a query string into an efficient execution plan can be an expensive operation. Once an execution plan is obtained for a query, it is placed in a cache. If the exact same query is to be executed again, the planning step is skipped. Instead, the execution plan is obtained from the cache.

"Replanning" refers to cases where a query must be planned again, even though it has been planned before. Cypher replanning occurs in the following circumstances:

  • When the query is not in the cache. This can happen when the server is first started or restarted, when the cache has recently been cleared, or when server.memory.query_cache.per_db_cache_num_entries has been exceeded.

  • When the dbms.cypher.min_replan_interval time has elapsed and the database statistics have changed by more than the dbms.cypher.statistics_divergence_threshold value.

  • When the cached query plan has notifications that have become invalid. For example, suppose a query plan has the notification 01N50: Label does not exist. If a node with the previously missing label is then added, the query must be replanned the next time it is run.
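The settings named in the list above can be inspected from Cypher. The sketch below assumes a Neo4j version that supports the SHOW SETTINGS command and a user with sufficient privileges to run it.

```cypher
// List the current values of the settings that influence replanning.
SHOW SETTINGS
YIELD name, value
WHERE name IN [
  'server.memory.query_cache.per_db_cache_num_entries',
  'dbms.cypher.min_replan_interval',
  'dbms.cypher.statistics_divergence_threshold'
]
RETURN name, value
```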

There may be situations where Cypher query planning occurs at a non-ideal time, for example when a query must be as fast as possible and a valid plan is already in place.

Replanning is not performed for all queries at once; it is performed in the same thread as running the query, and can block the query. However, replanning one query does not replan any other queries.

There are three different replan options available:

| Option | Description | Default |
| --- | --- | --- |
| replan=default | This is the planning and replanning option as described above. | Yes |
| replan=force | This will force a replan, even if the plan is valid according to the planning rules. Once the new plan is complete, it replaces the existing one in the query cache. | |
| replan=skip | If a valid plan already exists, it will be used even if the planning rules would normally dictate that it should be replanned. | |

The replan option is prepended to queries.

For example:

CYPHER replan=force MATCH ...

In a mixed workload, you can force replanning by using the Cypher EXPLAIN command. This can be useful for scheduling the replanning of queries that are expensive to plan at known times of low load. Using EXPLAIN ensures that the query is only planned, not executed.

For example:

CYPHER replan=force EXPLAIN MATCH ...

During times of known high load, replan=skip can help avoid unwanted latency spikes.
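A minimal sketch of such a latency-sensitive query, assuming a hypothetical Person label and a $name parameter supplied by the client:

```cypher
// Reuse an existing valid plan rather than replanning under load.
CYPHER replan=skip
MATCH (p:Person {name: $name})
RETURN p.born
```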

When a schema change is committed while a query is being planned, replanning occurs for the query that is being planned. For example, dropping an index is a schema change. The schema change can make the obtained execution plan invalid or inefficient. Instead of continuing with the obtained execution plan, the query will be planned again. The Cypher option replan does not have any effect on replanning due to schema changes.

Cypher infer schema parts

For some queries, the planner can infer predicates such as labels or types from the graph structure, thereby enhancing its ability to estimate the number of rows each operator will produce. (See Understanding execution plans - Reading execution plans for more information about the role of operators and estimated row counts in query execution plans.) The option inferSchemaParts controls the extent to which the planner should infer predicates.

| Option | Description |
| --- | --- |
| inferSchemaParts=off | No predicates are inferred. |
| inferSchemaParts=most_selective_label | Relationship types are used to infer labels on connected nodes. The label corresponding to the smallest number of nodes is used to estimate rows. Avoiding the inference of multiple labels improves accuracy for nodes with several dependent labels, such as every :Actor being a :Person. |

If this query option is not provided, then the value set in Operations Manual → Configuration settings → dbms.cypher.infer_schema_parts will be used.
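To see the effect on estimated rows, the option could be combined with EXPLAIN on a pattern that has no explicit node labels, as in the sketch below. The ACTED_IN relationship type and the name and title properties are hypothetical.

```cypher
// No node labels are given, so the planner may infer them from ACTED_IN.
CYPHER inferSchemaParts=most_selective_label
EXPLAIN
MATCH (a)-[:ACTED_IN]->(m)
RETURN a.name, m.title
```

Running the same EXPLAIN with inferSchemaParts=off gives a plan whose estimated row counts can be compared against this one.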