Execution plans
This page describes the characteristics of query execution plans and provides details about each of the operators.
For information on replanning, see Cypher® replanning. |
The task of executing a query is decomposed into operators, each of which implements a specific piece of work. The operators are combined into a tree-like structure called an execution plan. Each operator in the execution plan is represented as a node in the tree. Each operator takes as input zero or more rows, and produces as output zero or more rows. This means that the output from one operator becomes the input for the next operator. Operators that join two branches in the tree combine input from two incoming streams and produce a single output.
Evaluation of the execution plan begins at the leaf nodes of the tree. Leaf nodes have no input rows and generally comprise operators such as scans and seeks. These operators obtain the data directly from the storage engine, thus incurring database hits. Any rows produced by leaf nodes are then piped into their parent nodes, which in turn pipe their output rows to their parent nodes and so on, all the way up to the root node. The root node produces the final results of the query.
In general, query evaluation is lazy: most operators pipe their output rows to their parent operators as soon as they are produced. This means that a child operator may not be fully exhausted before the parent operator starts consuming the input rows produced by the child.
However, some operators, such as those used for aggregation and sorting, need to aggregate all their rows before they can produce output. Such operators need to complete execution in its entirety before any rows are sent to their parents as input. These operators are called eager operators, and are denoted as such in Execution plan operators. Eagerness can cause high memory usage and may therefore be the cause of query performance issues.
Each operator is assigned a unique ID, which is shown in the execution plan. The IDs can be used to refer unambiguously to operators. There are no guarantees about the order of IDs, although they will usually start with 0 at the root, and will increase towards the leaves of the tree.
Each operator is annotated with statistics.
Rows
-
The number of rows that the operator produced. This is only available if the query was profiled.
EstimatedRows
-
This is the estimated number of rows that is expected to be produced by the operator. The estimate is an approximate number based on the available statistical information. The compiler uses this estimate to choose a suitable execution plan.
DbHits
-
Each operator will ask the Neo4j storage engine to do work such as retrieving or updating data. A database hit is an abstract unit of this storage engine work. The actions triggering a database hit are listed in Database hits.
Page Cache Hits
,Page Cache Misses
,Page Cache Hit Ratio
-
These metrics are only shown for some queries when using Neo4j Enterprise Edition. The page cache is used to cache data and avoid accessing disk, so having a high number of
hits
and a low number ofmisses
will typically make the query run faster. Whenever several operators are fused together for more efficient execution we can no longer associate this metric with a given operator and then nothing will appear here. Time
-
Time
is only shown for some operators when using thepipelined
runtime. The number shown is the time in milliseconds it took to execute the given operator. Whenever several operators are fused together for more efficient execution we can no longer associate a duration with a given operator and then nothing will appear here.
To produce an efficient plan for a query, the Cypher query planner requires information about the Neo4j database. This information includes which indexes and constraints are available, as well as various statistics maintained by the database. The Cypher query planner uses this information to determine which access patterns will produce the best execution plan.
The statistical information maintained by Neo4j is:
-
The number of nodes having a certain label.
-
The number of relationships by type.
-
Selectivity per index.
-
The number of relationships by type, ending with or starting from a node with a specific label.
Information about how the statistics are kept up to date, as well as configuration options for managing query replanning and caching, can be found in the Operations Manual → Statistics and execution plans.
Query tuning describes how to tune Cypher queries. In particular, see Profile a query for how to view the execution plan for a query and Planner hints and the USING keyword for how to use hints to influence the decisions of the planner when building an execution plan for a query.
For a deeper understanding of how each operator works, refer to Execution plan operators and the linked sections per operator. Please remember that the statistics of the particular database where the queries run will decide the plan used. There is no guarantee that a specific query will always be solved with the same plan.
Was this page helpful?