7.1. Introduction

The task of executing a query is decomposed into operators, each of which implements a specific piece of work. The operators are combined into a tree-like structure called an execution plan. Each operator in the execution plan is represented as a node in the tree. Each operator takes as input zero or more rows, and produces as output zero or more rows. This means that the output from one operator becomes the input for the next operator. Operators that join two branches in the tree combine input from two incoming streams and produce a single output.

Evaluation model. Evaluation of the execution plan begins at the leaf nodes of the tree. Leaf nodes have no input rows and generally comprise operators such as scans and seeks. These operators obtain the data directly from the storage engine, thus incurring database hits. Any rows produced by leaf nodes are then piped into their parent nodes, which in turn pipe their output rows to their parent nodes and so on, all the way up to the root node. The root node produces the final results of the query.

Eager and lazy evaluation. In general, query evaluation is lazy: most operators pipe their output rows to their parent operators as soon as they are produced. This means that a child operator may not be fully exhausted before the parent operator starts consuming the input rows produced by the child.

However, some operators, such as those used for aggregation and sorting, need to aggregate all their rows before they can produce output. Such operators need to complete execution in its entirety before any rows are sent to their parents as input. These operators are called eager operators, and are denoted as such in Section 7.2, “Execution plan operators at a glance”. Eagerness can cause high memory usage and may therefore be the cause of query performance issues.

Statistics. Each operator is annotated with statistics.

Rows
The number of rows that the operator produced. This is only available if the query was profiled.
EstimatedRows
This is the estimated number of rows that is expected to be produced by the operator. The estimate is an approximate number based on the available statistical information. The compiler uses this estimate to choose a suitable execution plan.
DbHits
Each operator will ask the Neo4j storage engine to do work such as retrieving or updating data. A database hit is an abstract unit of this storage engine work. The actions triggering a database hit are listed in Section 7.3, “Database hits (DbHits)”.
Page Cache Hits, Page Cache Misses, Page Cache Hit Ratio
These metrics are only shown for some queries when using Neo4j Enterprise Edition. The page cache is used to cache data and avoid accessing disk, so having a high number of hits and a low number of misses will typically make the query run faster.
Time
Time is only shown for some operators when using the pipelined runtime. The number shown is the time in milliseconds it took to execute the given operator.

To produce an efficient plan for a query, the Cypher query planner requires information about the Neo4j database. This information includes which indexes and constraints are available, as well as various statistics maintained by the database. The Cypher query planner uses this information to determine which access patterns will produce the best execution plan.

The statistical information maintained by Neo4j is:

  1. The number of nodes having a certain label.
  2. The number of relationships by type.
  3. Selectivity per index.
  4. The number of relationships by type, ending with or starting from a node with a specific label.

Information about how the statistics are kept up to date, as well as configuration options for managing query replanning and caching, can be found in the Operations Manual → Statistics and execution plans.

Chapter 6, Query tuning describes how to tune Cypher queries. In particular, see Section 6.2, “Profiling a query” for how to view the execution plan for a query and Section 6.6, “Planner hints and the USING keyword” for how to use hints to influence the decisions of the planner when building an execution plan for a query.

For a deeper understanding of how each operator works, refer to Section 7.2, “Execution plan operators at a glance” and the linked sections per operator. Please remember that the statistics of the particular database where the queries run will decide the plan used. There is no guarantee that a specific query will always be solved with the same plan.