11.8. Statistics and execution plans

This section describes the configuration options that affect how statistics are gathered and when execution plans are replanned in the Cypher query engine.

When a Cypher query is issued, it is compiled into an execution plan that can be run to answer the query. To produce this plan, the Cypher query engine uses available information about the database, such as schema information about which indexes and constraints exist. Neo4j also uses statistical information about the database to optimize the execution plan.

For further details, please see Cypher Manual → Query tuning and Cypher Manual → Execution plans.

The frequency of statistics gathering and the replanning of execution plans are described in the sections below.

11.8.1. Statistics

The statistical information that Neo4j keeps is:

  1. The number of nodes having a certain label.
  2. The number of relationships by type.
  3. The number of relationships by type, ending at or starting from a node with a specific label.
  4. Selectivity per index.

Neo4j keeps the statistics up to date in two different ways. Label and relationship counts are updated whenever the underlying data changes, for example when you set or remove a label on a node. For indexes, however, Neo4j needs to scan the full index to produce the selectivity number. Since this is potentially a very time-consuming operation, these numbers are collected in the background once enough of the data in the index has changed.

The following settings allow you to control whether statistics are collected automatically, and at which rate:

Parameter name: dbms.index_sampling.background_enabled
Default value: true
Description: Controls whether indexes are automatically re-sampled when they have been updated enough. The Cypher query planner depends on accurate statistics to create efficient plans, so it is important that they are kept up to date as the database evolves.

Parameter name: dbms.index_sampling.update_percentage
Default value: 5
Description: Controls the percentage of the index that has to have been updated before a new sampling run is triggered.
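To make the effect of dbms.index_sampling.update_percentage more concrete, the following Python sketch models the trigger condition as a simple ratio. It is an illustration only, not Neo4j's implementation: the function name needs_resampling, its parameters, the inclusive comparison, and the treatment of an empty index are all assumptions made for this example.

```python
def needs_resampling(index_size: int,
                     updates_since_sample: int,
                     update_percentage: float = 5) -> bool:
    """Hypothetical model of the background sampling trigger.

    Assumption: a sampling run is due once the number of updated index
    entries reaches `update_percentage` percent of the index size.
    """
    if index_size == 0:
        # Assumption: any update to an empty index counts as "enough".
        return updates_since_sample > 0
    updated_percent = 100 * updates_since_sample / index_size
    return updated_percent >= update_percentage


# With the default of 5, an index of 10,000 entries would be re-sampled
# after about 500 updates under this model.
print(needs_resampling(10_000, 499))  # False
print(needs_resampling(10_000, 500))  # True
```

Under this reading, lowering the percentage makes sampling runs more frequent and keeps selectivity estimates fresher, at the cost of more background index scans.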

It is possible to manually trigger index sampling, using the following built-in procedures:

db.resampleIndex()
  Trigger resampling of an index.

db.resampleOutdatedIndexes()
  Trigger resampling of all outdated indexes.

Example 11.5. Manually trigger index resampling

The following example illustrates how to trigger a resampling of the index on the label Person and property name, by calling db.resampleIndex():

CALL db.resampleIndex(":Person(name)");

The following example illustrates how to call db.resampleOutdatedIndexes() in order to trigger a resampling of all outdated indexes:

CALL db.resampleOutdatedIndexes();

11.8.2. Execution plans

Execution plans are cached and will not be replanned until the statistical information used to produce the plan has changed. The following setting enables you to control how sensitive replanning should be to updates of the database:

Parameter name: cypher.statistics_divergence_threshold
Default value: 0.75
Description: The threshold above which a plan is considered stale. If any of the underlying statistics used to create the plan have changed by more than this value, the plan is considered stale and is replanned. Change is calculated as abs(a-b)/max(a,b). This means that a value of 0.75 requires the database to approximately quadruple in size before replanning occurs.

A value of 0 means replan as soon as possible, with the soonest being defined by the parameter cypher.min_replan_interval, which defaults to 10s. After this interval, the divergence threshold slowly starts to decline, reaching 10% after about 7h. This ensures that long-running databases still get query replanning on even modest changes, while not replanning frequently unless the changes are very large.
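The change formula given for cypher.statistics_divergence_threshold can be worked through with a small calculation. The following Python sketch is illustrative only: the function names divergence and is_stale are hypothetical, and the strict comparison and the handling of two zero values are assumptions rather than Neo4j's actual implementation. It does not model the gradual decline of the threshold over time.

```python
def divergence(a: float, b: float) -> float:
    """Relative change between two statistic values: abs(a-b)/max(a,b)."""
    largest = max(a, b)
    if largest == 0:
        # Assumption: two empty counts are treated as unchanged.
        return 0.0
    return abs(a - b) / largest


def is_stale(old: float, new: float, threshold: float = 0.75) -> bool:
    """Assumption: a plan is stale when the change exceeds the threshold."""
    return divergence(old, new) > threshold


# A statistic growing from 1000 to 4000 gives (4000-1000)/4000 = 0.75,
# which sits exactly on the default threshold and does not exceed it.
print(divergence(1000, 4000))  # 0.75
print(is_stale(1000, 4000))    # False
print(is_stale(1000, 4001))    # True
```

Because the formula divides by the larger of the two values, it is symmetric: a statistic shrinking to a quarter of its previous value yields the same divergence as one growing fourfold.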