Statistics and execution plans

This section describes how to configure the Neo4j statistics collection and the query replanning in the Cypher query engine.

When a Cypher query is issued, it gets compiled to an execution plan that can run and answer the query. The Cypher query engine uses the available information about the database, such as schema information about which indexes and constraints exist in the database. Neo4j also uses statistical information about the database to optimize the execution plan. For more information, see Cypher Manual → Query tuning and Cypher Manual → Execution plans.

Configure statistics collection

The Cypher query planner depends on accurate statistics to create efficient plans. Therefore, these statistics are kept up-to-date as the database evolves.

For each database in the DBMS, Neo4j collects the following statistical information and keeps it up-to-date:

For graph entities
  • The number of nodes with a certain label.

  • The number of relationships by type.

  • The number of relationships by type between nodes with a specific label.

These numbers are updated whenever you set or remove a label from a node.

For database schema
  • Selectivity per index.

To produce a selectivity number, Neo4j runs a full index scan in the background. Because this could potentially be a very time-consuming operation, a full index scan is triggered only when the changed data reaches a specified threshold.

Automatic statistics collection

You can control whether and how often statistics are collected automatically by configuring the following settings:

Parameter name Default value Description

dbms.index_sampling.background_enabled

true

Enable the automatic (background) index sampling.

dbms.index_sampling.update_percentage

5

Percentage of index updates of total index size required before sampling of a given index is triggered.

Manual statistics collection

You can manually trigger index resampling by using the built-in procedures db.resampleIndex() and db.resampleOutdatedIndexes().

db.resampleIndex()

Trigger resampling of a specified index.

CALL db.resampleIndex("indexName")
db.resampleOutdatedIndexes()

Trigger resampling of all outdated indexes.

CALL db.resampleOutdatedIndexes()

Configure the replanning of execution plans

Execution plans are cached and are not replanned until the statistical information used to produce the plan changes.

Automatic replanning

You can control how sensitive the replanning should be to database updates by configuring the following settings:

Parameter name Default value Description

cypher.statistics_divergence_threshold

0.75

The threshold for statistics above which a plan is considered stale.
When the changes to the underlying statistics of an execution plan meet the specified threshold, the plan is considered stale and is replanned. Change is calculated as abs(a-b)/max(a,b).
This means that a value of 0.75 requires the database to approximately quadruple in size before replanning occurs. A value of 0 means that the query is replanned as soon as there is a change in the statistics and the replan interval elapses.

cypher.min_replan_interval

10s

The minimum amount of time between two query replanning executions. After this time, the graph statistics are evaluated, and if they have changed more than the value set in cypher.statistics_divergence_threshold, the query is replanned. Each time the statistics are evaluated, the divergence threshold is reduced until it reaches 10% after about 7h. This ensures that even moderately changing databases see query replanning after a sufficiently long time interval.

Manual replanning

You can manually force the database to replan the execution plans that are already in the cache by using the following built-in procedures:

db.clearQueryCaches()

Clear all query caches. Does not change the database statistics.

CALL db.clearQueryCaches()
db.prepareForReplanning()

Completely recalculates all database statistics to be used for any subsequent query planning.

The procedure triggers an index resampling, waits for it to complete, and clears all query caches. Afterwards, queries are planned based on the latest database statistics.

CALL db.prepareForReplanning()

You can use Cypher replanning to specify whether you want to force a replan, even if the plan is valid according to the planning rules, or skip replanning entirely should you wish to use a valid plan that already exists.

For more information, see: