This section describes the configuration options that affect the gathering of statistics, and the replanning of query plans in the Cypher query engine.
When a Cypher query is issued, it gets compiled to an execution plan that can run and answer the query. The Cypher query engine uses available information about the database, such as schema information about which indexes and constraints exist in the database. Neo4j also uses statistical information about the database to optimize the execution plan.
For further details, please see Developer Manual → Query tuning and Developer Manual → Execution plans.
The frequency of statistics gathering and the replanning of execution plans are described in the sections below.
The statistical information that Neo4j keeps is:
Neo4j keeps the statistics up to date in two different ways. Label and relationship counts are updated as the data changes, for example whenever a label is set on or removed from a node. Index selectivity, however, can only be produced by scanning the full index. Since this is potentially a very time-consuming operation, these numbers are collected in the background once enough data in the index has changed.
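The difference between the two strategies can be illustrated with a small Python model. This is a simplified sketch, not Neo4j's implementation: the class `ToyStatistics` and its methods are hypothetical, only label counts are modelled, and selectivity is assumed to be the number of distinct values divided by the number of index entries.

```python
from __future__ import annotations

from collections import Counter


class ToyStatistics:
    """Hypothetical model of eager counts versus scan-based selectivity."""

    def __init__(self) -> None:
        # Label counts are maintained eagerly, on every write.
        self.label_counts: Counter[str] = Counter()
        # Index selectivity is only refreshed by an explicit, expensive scan.
        self.index_selectivity: dict[str, float] = {}

    def add_label(self, label: str) -> None:
        self.label_counts[label] += 1

    def remove_label(self, label: str) -> None:
        self.label_counts[label] -= 1

    def sample_index(self, name: str, values: list[object]) -> float:
        # Full scan over every index entry. Assumed formula:
        # distinct values / total entries (an empty index counts as 1.0).
        selectivity = len(set(values)) / len(values) if values else 1.0
        self.index_selectivity[name] = selectivity
        return selectivity
```

Label counts are correct immediately after every write, whereas the stored selectivity stays unchanged until the next full scan, which is why that scan is deferred to the background.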
The following settings allow you to control whether statistics are collected automatically, and at which rate:
| Parameter name | Default value | Description |
|---|---|---|
| | | Controls whether indexes are automatically re-sampled once they have been updated enough. The Cypher query planner depends on accurate statistics to create efficient plans, so it is important that the statistics are kept up to date as the database evolves. |
| | | Controls the percentage of the index that must have been updated before a new sampling run is triggered. |
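As a rough illustration of how the two settings above interact, the following Python function sketches the re-sampling decision. The function and parameter names are hypothetical, the exact comparison (`>=` rather than `>`) and the empty-index behaviour are assumptions, and none of it is taken from Neo4j's source code.

```python
def should_resample(updates_since_last_sample: int,
                    index_size: int,
                    update_percentage: float,
                    background_enabled: bool = True) -> bool:
    """Hypothetical sketch of the background re-sampling trigger.

    Returns True when background sampling is enabled and the share of the
    index updated since the last sample has reached update_percentage.
    """
    if not background_enabled:
        # Automatic re-sampling is switched off; only manual sampling applies.
        return False
    if index_size == 0:
        # Assumption: for an empty index, any update makes the sample stale.
        return updates_since_last_sample > 0
    updated_percent = 100.0 * updates_since_last_sample / index_size
    return updated_percent >= update_percentage
```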
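It is possible to trigger manual sampling using the following Neo4j Shell commands:
schema sample -a
Trigger sampling of all indexes.
schema sample -l Person -p name
Trigger sampling of the index for label Person on property name (if it exists).
schema sample -a -f
Force sampling of all indexes.
schema sample -f -l :Person -p name
Force sampling of the index for label Person on property name.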
Execution plans are cached and will not be replanned until the statistical information used to produce the plan has changed. The following setting enables you to control how sensitive replanning should be to updates of the database:
| Parameter name | Default value | Description |
|---|---|---|
| | | Controls how much the statistical information is allowed to change before an execution plan is considered stale and has to be replanned. If the relative change in any of the statistics is larger than this threshold, the plan will be thrown away and a new one will be created. A threshold of 0.0 means always replan, and a value of 1.0 means never replan. |
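The replanning rule can be sketched as a small plan cache in Python. This is a hypothetical model rather than Neo4j's implementation: `ToyPlanCache`, `CachedPlan`, `relative_change` and the `planner` callback are illustrative names, and the relative change is assumed to be |new - old| / max(|old|, |new|), which always lies between 0 and 1.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


def relative_change(old: float, new: float) -> float:
    """Assumed divergence measure, bounded to [0, 1]."""
    largest = max(abs(old), abs(new))
    return 0.0 if largest == 0 else abs(new - old) / largest


@dataclass
class CachedPlan:
    plan: str                     # stand-in for a compiled execution plan
    statistics: dict[str, float]  # snapshot of the statistics used to plan


class ToyPlanCache:
    """Hypothetical plan cache that replans when statistics diverge."""

    def __init__(self, threshold: float,
                 planner: Callable[[str, dict[str, float]], str]) -> None:
        self.threshold = threshold
        self.planner = planner
        self.entries: dict[str, CachedPlan] = {}
        self.plans_built = 0

    def is_stale(self, cached: CachedPlan, current: dict[str, float]) -> bool:
        # Stale if ANY statistic changed by more than the threshold.
        return any(
            relative_change(old, current.get(key, 0.0)) > self.threshold
            for key, old in cached.statistics.items()
        )

    def get_plan(self, query: str, current: dict[str, float]) -> str:
        cached = self.entries.get(query)
        if cached is None or self.is_stale(cached, current):
            # Plan (or replan) and remember the statistics it was based on.
            plan = self.planner(query, current)
            self.entries[query] = CachedPlan(plan, dict(current))
            self.plans_built += 1
            return plan
        return cached.plan
```

Under this assumed measure, a threshold of 0.0 replans as soon as any statistic changes at all, and a threshold of 1.0 never replans, because the relative change can never exceed 1.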