The impact of indexes on query performance

Search-performance indexes enable quicker and more efficient pattern matching by solving a particular combination of node label/relationship type and property predicate. They are used automatically by the Cypher® planner in MATCH clauses, usually at the start of a query, to scan the graph for the most appropriate place to start the pattern-matching process.

By examining query execution plans, this page will explain the scenarios in which the various search-performance indexes are used to improve the performance of Cypher queries. It will also provide some general heuristics for when to use indexes, and advice about how to avoid over-indexing.

Example graph

The examples on this page center around finding routes and points of interest in Central Park, New York, based on data provided by OpenStreetMap. The data model contains two node labels:

  • OSMNode (Open Street Map Node) — a junction node with geo-spatial properties linking together routes from specific points.

  • PointOfInterest  — a subcategory of OSMNode. In addition to geospatial properties, these nodes also contain information about specific points of interest, such as statues, baseball courts, etc. in Central Park.

The data model also contains one relationship type: ROUTE, which specifies the distance in meters between the nodes in the graph.

using indexes example graph

In total, the graph contains 69165 nodes (of which 188 have the label PointOfInterest) and 152077 ROUTE relationships.

To recreate the graph, download and import the 5.0 dump file to an empty Neo4j database. Dump files can be imported for both Aura and on-prem instances.

Token lookup indexes

Two token lookup indexes are present by default when creating a Neo4j database. They store copies of all node labels and relationship types in the database and only solve node label and relationship type predicates.

The following query [1], which counts the number of PointOfInterest nodes that have value baseball for the type property, will access the node label lookup index:

Query
PROFILE
MATCH (n:PointOfInterest)
WHERE n.type = 'baseball'
RETURN count(n)
Table 1. Result
count(n)

26

Rows: 1

Execution plan
+-------------------+----+------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| Operator          | Id | Details                | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline            |
+-------------------+----+------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults   |  0 | `count(n)`             |              1 |    1 |       0 |              0 |                    0/0 |     0.075 | In Pipeline 1       |
| |                 +----+------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +EagerAggregation |  1 | count(n) AS `count(n)` |              1 |    1 |       0 |             32 |                        |           |                     |
| |                 +----+------------------------+----------------+------+---------+----------------+                        |           |                     |
| +Filter           |  2 | n.type = $autostring_0 |              9 |   26 |     376 |                |                        |           |                     |
| |                 +----+------------------------+----------------+------+---------+----------------+                        |           |                     |
| +NodeByLabelScan  |  3 | n:PointOfInterest      |            188 |  188 |     189 |            376 |                  116/0 |     8.228 | Fused in Pipeline 0 |
+-------------------+----+------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+

Total database accesses: 565, total allocated memory: 472

The following details are worth highlighting in the execution plan:

  • The NodeByLabelScan operator accesses the node label lookup index and produces 188 rows, representing the 188 nodes with the PointOfInterest label in the database.

  • The query required 565 DB hits (each DB hit represents an instance when the query required access to the database).

  • The query completed in just over 8 milliseconds.

Token lookup indexes are very important because they improve the performance of Cypher queries and the population of other indexes, and deleting them will lead to severe performance degradation. In the above example, had a node label lookup index not existed, the NodeByLabelScan operator would have been replaced with AllNodesScan, which would have had to read all 69165 nodes from the database before returning a result.

While useful, token lookup indexes will rarely be sufficient for applications querying databases of a non-trivial size because they cannot solve any property-related predicates.

For more information about the predicates supported by token lookup indexes, see Create, show, and delete indexes → Token lookup indexes: supported predicates.

Range indexes

Range indexes solve most types of predicates, and they are used for efficiently retrieving data based on a range of values. They are particularly useful for dealing with properties that have ordered, comparable values.

The following example first creates a relevant index on the type property for PointOfInterest nodes, and then runs the above query again, counting the number of PointOfInterest nodes that have a baseball type value:

Create a range index
CREATE INDEX range_index_type FOR (n:PointOfInterest) ON (n.type)
If no index type is specified when creating an index, Neo4j will default to create a range index. For more information about creating indexes, see Create, show, and delete indexes → CREATE INDEX.
Rerun query after the creation of a relevant index
PROFILE
MATCH (n:PointOfInterest)
WHERE n.type = 'baseball'
RETURN count(n)
Execution plan
+-------------------+----+----------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| Operator          | Id | Details                                                        | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline            |
+-------------------+----+----------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults   |  0 | `count(n)`                                                     |              1 |    1 |       0 |              0 |                    0/0 |     0.057 | In Pipeline 1       |
| |                 +----+----------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +EagerAggregation |  1 | count(n) AS `count(n)`                                         |              1 |    1 |       0 |             32 |                        |           |                     |
| |                 +----+----------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +NodeIndexSeek    |  2 | RANGE INDEX n:PointOfInterest(type) WHERE type = $autostring_0 |              5 |   26 |      27 |            376 |                    0/1 |     0.945 | Fused in Pipeline 0 |
+-------------------+----+----------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+

Total database accesses: 27, total allocated memory: 472

Comparing this query plan with the plan generated before the creation of a relevant range index, the following has changed:

  • NodeByLabelScan has been replaced by NodeIndexSeek. This only produces 26 rows (representing the 26 PointOfInterest nodes in the database with a type value set to baseball).

  • The query now only requires 27 DB hits.

  • The query completed in less than 1 millisecond - almost 8 times faster than it took the query to complete without a range index.

These points all illustrate the fundamental point that search-performance indexes can significantly improve the performance of Cypher queries.

For more information about the predicates supported by range indexes, see Create, show, and delete indexes → Range indexes: supported predicates.

Text indexes

Text indexes are used for queries filtering on STRING properties.

If there exists both a range and a text index on a given STRING property, the text index will only be used by the Cypher planner for queries filtering with the CONTAINS or ENDS WITH operators. In all other cases, the range index will be used.

To show this behavior, it is necessary to create a text index and a range index on the same property:

Create a text index
CREATE TEXT INDEX text_index_name FOR (n:PointOfInterest) ON (n.name)
Create a range index
CREATE INDEX range_index_name FOR (n:PointOfInterest) ON (n.name)

The following query filters all PointOfInterest nodes with a name property that CONTAINS 'William':

Query filtering on what a STRING property CONTAINS
PROFILE
MATCH (n:PointOfInterest)
WHERE n.name CONTAINS 'William'
RETURN n.name AS name, n.type AS type
Table 2. Result
name type

"William Shakespeare"

"statue"

"William Tecumseh Sherman"

"equestrian statue"

Rows: 2

Execution plan
+------------------------+----+----------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| Operator               | Id | Details                                                              | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline            |
+------------------------+----+----------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults        |  0 | name, type                                                           |              1 |    2 |       0 |              0 |                        |           |                     |
| |                      +----+----------------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +Projection            |  1 | cache[n.name] AS name, cache[n.type] AS type                         |              1 |    2 |       0 |                |                        |           |                     |
| |                      +----+----------------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +CacheProperties       |  2 | cache[n.type], cache[n.name]                                         |              1 |    2 |       6 |                |                        |           |                     |
| |                      +----+----------------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +NodeIndexContainsScan |  3 | TEXT INDEX n:PointOfInterest(name) WHERE name CONTAINS $autostring_0 |              1 |    2 |       3 |            248 |                    4/0 |    53.297 | Fused in Pipeline 0 |
+------------------------+----+----------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+

Total database accesses: 9, total allocated memory: 312

The plan shows that the query uses the text index to find all relevant nodes. If, however, the query is changed to use the STARTS WITH operator instead of CONTAINS, the query will use the range index instead:

Query filtering on what a STRING property STARTS WITH
PROFILE
MATCH (n:PointOfInterest)
WHERE n.name STARTS WITH 'William'
RETURN n.name, n.type
Execution plan
+-----------------------+----+-----------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| Operator              | Id | Details                                                                                 | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline            |
+-----------------------+----+-----------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults       |  0 | `n.name`, `n.type`                                                                      |              1 |    2 |       0 |              0 |                        |           |                     |
| |                     +----+-----------------------------------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +Projection           |  1 | cache[n.name] AS `n.name`, n.type AS `n.type`                                           |              1 |    2 |       4 |                |                        |           |                     |
| |                     +----+-----------------------------------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +NodeIndexSeekByRange |  2 | RANGE INDEX n:PointOfInterest(name) WHERE name STARTS WITH $autostring_0, cache[n.name] |              1 |    2 |       3 |            248 |                    4/1 |     1.276 | Fused in Pipeline 0 |
+-----------------------+----+-----------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+

Total database accesses: 7, total allocated memory: 312

This is because range indexes store STRING values alphabetically. This means that, while they are very efficient for retrieving exact matches of a STRING, or for prefix matching, they are less efficient for suffix and contains searches, where they have to scan all relevant properties to filter any matches. Text indexes do not store STRING properties alphabetically, and are instead optimized for suffix and contains searches. That said, if no range index had been present on the name property, the previous query would still have been able to utilize the text index. It would have done so less efficiently than a range index, but it still would have been useful.

For more information about range index ordering, see the section on Range index-backed ORDER BY.

Text indexes are only used for exact query matches. To perform approximate matches (including, for example, variations and typos), and to compute a similarity score between STRING values, use semantic full-text indexes instead.

For more information about the predicates supported by text indexes, see Create, show, and delete indexes → Text indexes: supported predicates.

Text indexes and STRING sizes

The size of the indexed STRING properties is also relevant to the planner’s selection between range and text indexes.

Range indexes have a maximum key size limit of around 8 kb. This means that range indexes cannot be used to index STRING values larger than 8 kb. Text indexes, on the other hand, have a maximum key size limit of around 32 kb. As a result, they can be used to index STRING values up to that size.

For information about calculating the size of indexes, see Neo4j Knowledge Base → A method to calculate the size of an index in Neo4j.

Point indexes

Point indexes solve predicates operating on spatial POINT values. Point indexes are optimized for queries filtering for the distance between property values, or for property values within a bounding box.

The following example creates a point index which is then used in a query returning the name and type of all PointOfInterest nodes within a set bounding box:

Create a point index
CREATE POINT INDEX point_index_location FOR (n:PointOfInterest) ON (n.location)
Query using the point.withinBBox() function
PROFILE
MATCH (n:PointOfInterest)
WHERE point.withinBBox(
  n.location,
  point({srid: 4326, x: -73.9723702, y: 40.7697989}),
  point({srid: 4326, x: -73.9725659, y: 40.770193}))
RETURN n.name AS name, n.type AS type
Table 3. Result
name type

"Heckscher Ballfield 3"

"baseball"

"Heckscher Ballfield 4"

"baseball"

"Heckscher Ballfield 1"

"baseball"

"Robert Burns"

"statue"

"Christopher Columbus"

"statue"

"Walter Scott"

"statue"

"William Shakespeare"

"statue"

"Balto"

"statue"

Rows: 8

Execution plan
+-----------------------+----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| Operator              | Id | Details                                                                                              | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline            |
+-----------------------+----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults       |  0 | `n.name`, `n.type`                                                                                   |              4 |    8 |       0 |              0 |                        |           |                     |
| |                     +----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +Projection           |  1 | cache[n.name] AS `n.name`, cache[n.type] AS `n.type`                                                 |              4 |    8 |       0 |                |                        |           |                     |
| |                     +----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +CacheProperties      |  2 | cache[n.type], cache[n.name]                                                                         |              4 |    8 |      24 |                |                        |           |                     |
| |                     +----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +NodeIndexSeekByRange |  3 | POINT INDEX n:PointOfInterest(location) WHERE point.withinBBox(location, point($autoint_0, $autodoub |              4 |    8 |      10 |            248 |                  302/0 |     2.619 | Fused in Pipeline 0 |
|                       |    | le_1, $autodouble_2), point($autoint_3, $autodouble_4, $autodouble_5))                               |                |      |         |                |                        |           |                     |
+-----------------------+----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+

Total database accesses: 34, total allocated memory: 312

For more information about the predicates supported by point indexes, see Create, show, and delete indexes → Point indexes: supported predicates.

Point index configuration settings

It is possible to configure point indexes to only index properties within a specific geographical area. This is done by specifying either of the following settings in the indexConfig part of the OPTIONS clause when creating a point index:

  • spatial.cartesian.min and spatial.cartesian.max: used for Cartesian 2D coordinate systems.

  • spatial.cartesian-3d.min and spatial.cartesian-3d.max: used for Cartesian 3D coordinate systems.

  • spatial.wgs-84.min and spatial.wgs-84.max: used for WGS-84 2D coordinate systems.

  • spatial.wgs-84-3d.min and spatial.wgs-84-3d.max: used for WGS-84 3D coordinate systems.

The min and max of each setting define the minimum and maximum bounds for the spatial data in each coordinate system.

For example, the following index would only store OSMNodes in the northern half of Central Park:

Create point index with configuration settings
CREATE POINT INDEX central_park_north
FOR (o:OSMNode) ON (o.location)
OPTIONS {
  indexConfig: {
    `spatial.wgs-84.min`:[40.7714, -73.9743],
    `spatial.wgs-84.max`:[40.7855, -73.9583]
  }
}

Restricting the geographic area of a point index improves the performance of spatial queries. This is especially beneficial when dealing with complex, large geo-spatial data, and when spatial queries are a significant part of an application’s functionality.

Composite indexes

It is possible to create a range index on a single property or multiple properties (text and point indexes are single-property only). The latter are called composite indexes and can be useful if queries against a database frequently filter on all the properties indexed by the composite index.

The following example first creates a composite index on PointOfInterest nodes for the properties name and type, and then queries the graph using the shortestPath function to determine both the path length (in terms of traversed relationships in the graph) and geographical distance between the Zoo School and its nearest tennis pitch (note that there are 32 unique PointOfInterest tennis pitch nodes in the graph):

Create composite index
CREATE INDEX composite_index FOR (n:PointOfInterest) ON (n.name, n.type)
Query with a filter on both properties indexed by the composite index
PROFILE
MATCH (tennisPitch: PointOfInterest {name: 'pitch', type: 'tennis'})
WITH tennisPitch
MATCH path = shortestPath((tennisPitch)-[:ROUTE*]-(:PointOfInterest {name: 'Zoo School'}))
WITH path, relationships(path) AS relationships
ORDER BY length(path) ASC
LIMIT 1
UNWIND relationships AS rel
RETURN length(path) AS pathLength, sum(rel.distance) AS geographicalDistance
Table 4. Result
pathLength geographicalDistance

25

2410.4495689536334

Rows: 1

Execution plan
+---------------------+----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+------------------+---------------------+
| Operator            | Id | Details                                                                                              | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Ordered by       | Pipeline            |
+---------------------+----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+------------------+---------------------+
| +ProduceResults     |  0 | pathLength, geographicalDistance                                                                     |              1 |    1 |       0 |              0 |                    0/0 |     0.065 |                  |                     |
| |                   +----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+                  |                     |
| +OrderedAggregation |  1 | length(path) AS pathLength, sum(rel.distance) AS geographicalDistance                                |              1 |    1 |      50 |           5140 |                   31/0 |     4.097 | pathLength ASC   | In Pipeline 3       |
| |                   +----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+------------------+---------------------+
| +Unwind             |  2 | relationships AS rel                                                                                 |              1 |   25 |       0 |           3112 |                    0/0 |     0.180 |                  | In Pipeline 2       |
| |                   +----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+                  +---------------------+
| +Projection         |  3 | relationships(path) AS relationships                                                                 |              0 |    1 |       0 |                |                    0/0 |     0.050 |                  |                     |
| |                   +----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+                  |                     |
| +Top                |  4 | `length(path)` ASC LIMIT 1                                                                           |              0 |    1 |       0 |          57472 |                    0/0 |     1.763 | length(path) ASC | In Pipeline 1       |
| |                   +----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+------------------+---------------------+
| +Projection         |  5 | length(path) AS `length(path)`                                                                       |              0 |   32 |       0 |                |                        |           |                  |                     |
| |                   +----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+                        |           +------------------+                     |
| +ShortestPath       |  6 | path = (tennisPitch)-[anon_0:ROUTE*]-(anon_1)                                                        |              0 |   32 |  181451 |          70080 |                        |           |                  |                     |
| |                   +----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+                        |           +------------------+                     |
| +MultiNodeIndexSeek |  7 | RANGE INDEX tennisPitch:PointOfInterest(name, type) WHERE name = $autostring_0 AND type = $autostrin |              0 |   31 |       0 |            376 |               131215/1 |   188.723 |                  | Fused in Pipeline 0 |
|                     |    | g_1, RANGE INDEX anon_1:PointOfInterest(name) WHERE name = $autostring_2                             |                |      |         |                |                        |           |                  |                     |
+---------------------+----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+------------------+---------------------+

Total database accesses: 181501, total allocated memory: 116040

The query plan shows the composite index being used, and not the previously created range index on the type property. This is because the composite index solves the queried predicate simultaneously, while the single propertied index would only be able to solve part of the predicate.

Property order and query planning

Like single-property range indexes, composite indexes support all predicates:

  • Equality check: n.prop = value

  • List membership check: n.prop IN [value, …​]

  • Existence check: n.prop IS NOT NULL

  • Range search: n.prop > value

  • Prefix search: n.prop STARTS WITH value

However, the order in which properties are defined when creating a composite index impacts how the planner will use the index to solve predicates. For example, a composite index on (n.prop1, n.prop2, n.prop3) will generate a different query plan than a composite index created on (n.prop3, n.prop2, n.prop1).

The following example shows how composite indexes on the same properties defined in a different order will generate different execution plans:

Create a composite index on three properties
CREATE INDEX composite_2 FOR (n:PointOfInterest) ON (n.lat, n.name, n.type)

Note the order in which the properties are defined when creating the index, with lat first, name second, and type last.

Query with a filter on the three indexed properties
PROFILE
MATCH (n:PointOfInterest)
WHERE n.lat = 40.7697989 AND n.name STARTS WITH 'William' AND n.type IS NOT NULL
RETURN n.name AS name
Table 5. Result
name

"William Shakespeare"

Rows: 1

Execution plan
+-----------------+----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| Operator        | Id | Details                                                                                              | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline            |
+-----------------+----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults |  0 | name                                                                                                 |              0 |    0 |       0 |              0 |                        |           |                     |
| |               +----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +Projection     |  1 | cache[n.name] AS name                                                                                |              0 |    0 |       0 |                |                        |           |                     |
| |               +----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +NodeIndexSeek  |  2 | RANGE INDEX n:PointOfInterest(lat, name, type) WHERE lat = $autodouble_0 AND name STARTS WITH $autos |              0 |    0 |       1 |            248 |                    0/2 |     1.276 | Fused in Pipeline 0 |
|                 |    | tring_1 AND type IS NOT NULL, cache[n.name]                                                          |                |      |         |                |                        |           |                     |
+-----------------+----+------------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+

Total database accesses: 1, total allocated memory: 312

The plan shows the recently created composite index is used. It also shows that the predicates are filtered as specified in the query (i.e. an equality check on the lat property, a prefix search on the name property, and an existence check on the type property).

However, if the order of the properties is altered when creating the index, a different query plan will be generated. To demonstrate this behavior, it is first necessary to drop the recently created composite_2 index and create a new composite index on the same properties defined in a different order:

Drop index
DROP INDEX composite_2
Create a composite index on same three properties defined in a different order
CREATE INDEX composite_3 FOR (n:PointOfInterest) ON (n.name, n.type, n.lat)

Note that the order of the properties has changed: the name property is now the first property defined in the composite index, and the lat property is indexed last.

Rerun query after the creation of a different composite index
PROFILE
MATCH (n:PointOfInterest)
WHERE n.lat = 40.769798 AND n.name STARTS WITH 'William' AND n.type IS NOT NULL
RETURN n.name AS name
Execution plan
+-----------------+----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| Operator        | Id | Details                                                                                             | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline            |
+-----------------+----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults |  0 | name                                                                                                |              0 |    0 |       0 |              0 |                        |           |                     |
| |               +----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +Projection     |  1 | cache[n.name] AS name                                                                               |              0 |    0 |       0 |                |                        |           |                     |
| |               +----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +Filter         |  2 | cache[n.lat] = $autodouble_0                                                                        |              0 |    0 |       0 |                |                        |           |                     |
| |               +----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +NodeIndexSeek  |  3 | RANGE INDEX n:PointOfInterest(name, type, lat) WHERE name STARTS WITH $autostring_1 AND type IS NOT |              0 |    2 |       3 |            248 |                    2/0 |     0.807 | Fused in Pipeline 0 |
|                 |    | NULL AND lat IS NOT NULL, cache[n.name], cache[n.lat]                                               |                |      |         |                |                        |           |                     |
+-----------------+----+-----------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+

Total database accesses: 3, total allocated memory: 312

This plan now shows that, while a prefix search has been used to solve the name property predicate, the lat property predicate is no longer solved with an equality check, but rather with an existence check and an explicit filter operation afterward. Note that if the composite_2 index had not been dropped before the query was rerun, the planner would have used it instead of the composite_3 index.

This is because, when using composite indexes, any predicate after a prefix search will automatically be planned as an existence check predicate.

Composite index rules

  • If a query contains an equality check or a list membership check predicates, they need to be for the first properties defined when creating the composite index.

  • Queries utilizing a composite index can contain up to one range search or prefix search predicate.

  • There can be any number of existence check predicates.

  • Any predicates following a prefix search or an existence check will be planned as existence checks.

  • Suffix and substring search predicates can utilize composite indexes. However, they are always planned as an existence check and any subsequent query predicates will accordingly also be planned as such. Note that if these predicates are used, and a text index also exists on any of the indexed (STRING) properties, the planner will use the text index instead of a composite index.

These rules can be important when creating composite indexes, as some checks are more efficient than others. For instance, it is generally more efficient for the planner to perform an equality check on a property than an existence check. Depending on the queries and the application, it may, therefore, be cost-effective to consider the order in which properties are defined when creating a composite index.

Additionally, it bears repeating that composite indexes can only be used if a predicate filters on all the properties indexed by the composite index, and that composite indexes can only be created for range indexes.

Range index-backed ORDER BY

Range indexes store properties in ascending order (alphabetically for STRING values, and numerically for FLOAT and INTEGER values). This can have important implications for query performance, because the planner may be able to take advantage of a pre-existing index order and therefore not have to perform an expensive Sort operation later in the query.

To demonstrate this behavior, the following query will filter out any ROUTE relationships with a distance property less than 30, and return the distance property of the matched relationships in ascending numerical order using the ORDER BY clause.

Query to return order of results without a relevant index
PROFILE
MATCH ()-[r:ROUTE]-()
WHERE r.distance < 30
RETURN r.distance AS distance
ORDER BY distance
Execution plan
+---------------------------------+----+--------------------------------+----------------+-------+---------+----------------+------------------------+-----------+--------------+---------------------+
| Operator                        | Id | Details                        | Estimated Rows | Rows  | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Ordered by   | Pipeline            |
+---------------------------------+----+--------------------------------+----------------+-------+---------+----------------+------------------------+-----------+--------------+---------------------+
| +ProduceResults                 |  0 | distance                       |           3013 |  6744 |       0 |              0 |                    0/0 |    14.397 |              |                     |
| |                               +----+--------------------------------+----------------+-------+---------+----------------+------------------------+-----------+              |                     |
| +Sort                           |  1 | distance ASC                   |           3013 |  6744 |       0 |         540472 |                    0/0 |    16.844 | distance ASC | In Pipeline 1       |
| |                               +----+--------------------------------+----------------+-------+---------+----------------+------------------------+-----------+--------------+---------------------+
| +Projection                     |  2 | cache[r.distance] AS distance  |           3013 |  6744 |       0 |                |                        |           |              |                     |
| |                               +----+--------------------------------+----------------+-------+---------+----------------+                        |           +--------------+                     |
| +Filter                         |  3 | cache[r.distance] < $autoint_0 |           3013 |  6744 |   10041 |                |                        |           |              |                     |
| |                               +----+--------------------------------+----------------+-------+---------+----------------+                        |           +--------------+                     |
| +UndirectedRelationshipTypeScan |  4 | (anon_0)-[r:ROUTE]-(anon_1)    |          10044 | 10041 |    5023 |            376 |                   84/0 |    22.397 |              | Fused in Pipeline 0 |
+---------------------------------+----+--------------------------------+----------------+-------+---------+----------------+------------------------+-----------+--------------+---------------------+

Total database accesses: 15064, total allocated memory: 540808

This plan shows two important points about indexes and the ordering of results:

  • Only the relationship type lookup index was used in this query (accessed by the UndirectedRelationshipTypeScan operator, which fetches all relationships and their start and end nodes from the relationship type index).

  • As a result, the planner has to perform a Sort operation to order the results by the distance property (in this case, it required 540472 bytes of memory).

To see how an index could impact the query plan, it is first necessary to create a range index on the distance property:

Create a range index on a relationship type property
CREATE INDEX range_index_relationships FOR ()-[r:ROUTE]-() ON (r.distance)

Re-running the query, it now generates a different plan:

Rerun query after the creation of a relevant index
PROFILE
MATCH ()-[r:ROUTE]-()
WHERE r.distance < 30
RETURN r.distance AS distance
ORDER BY distance
Execution plan
+-----------------------------------------+----+--------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+----------------+---------------------+
| Operator                                | Id | Details                                                                                          | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Ordered by     | Pipeline            |
+-----------------------------------------+----+--------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+----------------+---------------------+
| +ProduceResults                         |  0 | distance                                                                                         |            301 | 6744 |       0 |              0 |                        |           |                |                     |
| |                                       +----+--------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                |                     |
| +Projection                             |  1 | cache[r.distance] AS distance                                                                    |            301 | 6744 |       0 |                |                        |           | distance ASC   |                     |
| |                                       +----+--------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+                        |           +----------------+                     |
| +UndirectedRelationshipIndexSeekByRange |  2 | RANGE INDEX (anon_0)-[r:ROUTE(distance)]-(anon_1) WHERE distance < $autoint_0, cache[r.distance] |            301 | 6744 |    3373 |            248 |                2361/10 |    76.542 | r.distance ASC | Fused in Pipeline 0 |
+-----------------------------------------+----+--------------------------------------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+----------------+---------------------+

Total database accesses: 3373, total allocated memory: 312

Focusing on the same two points in the plan, the following has changed:

  • The recently created range index on the relationship type property distance is now used.

  • As a result, the plan no longer needs to perform a Sort operation to order the results (because the distance property is already ordered by the index), and this substantially reduces the cost of the query (the total memory cost of the query is now 312 bytes).

In the same way, the order of a range index can significantly improve queries using the max() and min() functions.

Multiple index use

Indexes are principally used to find the starting points of patterns. If a query contains one MATCH clause, then, as a general rule, only the index that best suits the predicates in that clause will be selected by the planner. If, however, a query contains two or more MATCH clauses, it is possible to use several indexes.

To show multiple indexes used in one query, the following example will first create a new index on the lon (longitude) property for PointOfInterest nodes. It then uses a query that finds all PointOfInterest nodes north of the William Shakespeare statue in Central Park.

Create a range index on the longitude property
CREATE INDEX range_index_lon FOR (n:PointOfInterest) ON (n.lon)
Query to find all PointOfInterest nodes north of the William Shakespeare statue
PROFILE
MATCH (ws:PointOfInterest {name:'William Shakespeare'})
WITH ws
MATCH (poi:PointOfInterest)
WHERE poi.lon > ws.lon
RETURN poi.name AS name
Execution plan
+-------------------------+----+-----------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| Operator                | Id | Details                                                         | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline            |
+-------------------------+----+-----------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults         |  0 | name                                                            |              9 |  143 |       0 |              0 |                        |           |                     |
| |                       +----+-----------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +Projection             |  1 | poi.name AS name                                                |              9 |  143 |     283 |                |                        |           |                     |
| |                       +----+-----------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +Apply                  |  2 |                                                                 |              9 |  143 |       0 |                |                        |           |                     |
| |\                      +----+-----------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| | +NodeIndexSeekByRange |  3 | RANGE INDEX poi:PointOfInterest(lon) WHERE lon > ws.lon         |              9 |  143 |     146 |           2280 |                  233/1 |     1.460 | Fused in Pipeline 1 |
| |                       +----+-----------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +NodeIndexSeek          |  4 | RANGE INDEX ws:PointOfInterest(name) WHERE name = $autostring_0 |              2 |    1 |       2 |            376 |                    1/0 |     0.635 | In Pipeline 0       |
+-------------------------+----+-----------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+

Total database accesses: 431, total allocated memory: 2616

This plan shows that a separate index is used to improve the performance of each MATCH clause (first by utilizing the index on the name property to find the William Shakespeare node, and then by using the index on the lon property to find all nodes with a greater longitudinal value).

Indexes and null values

Neo4j indexes do not store null values. This means that the planner must be able to rule out the possibility of null values in order for queries to use an index.

The following query demonstrates the incompatibility between null values and indexes by counting all PointOfInterest nodes with an unset name property:

Query to count nodes with a null name value
PROFILE
MATCH (n:PointOfInterest)
WHERE n.name IS NULL
RETURN count(n) AS nodes
Table 6. Result
nodes

3

Rows:1

Execution plan
+-------------------+----+-------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| Operator          | Id | Details           | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline            |
+-------------------+----+-------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults   |  0 | nodes             |              1 |    1 |       0 |              0 |                    0/0 |     0.012 | In Pipeline 1       |
| |                 +----+-------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +EagerAggregation |  1 | count(n) AS nodes |              1 |    1 |       0 |             32 |                        |           |                     |
| |                 +----+-------------------+----------------+------+---------+----------------+                        |           |                     |
| +Filter           |  2 | n.name IS NULL    |            141 |    3 |     373 |                |                        |           |                     |
| |                 +----+-------------------+----------------+------+---------+----------------+                        |           |                     |
| +NodeByLabelScan  |  3 | n:PointOfInterest |            188 |  188 |     189 |            376 |                  115/0 |     0.769 | Fused in Pipeline 0 |
+-------------------+----+-------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+

Total database accesses: 562, total allocated memory: 472

The plan shows that neither of the two available indexes (range and text) on the name property is used to solve the predicate.

However, if a query predicate is added which is able to exclude the presence of any null values, then an index can be used. The following query shows this by adding a substring predicate to the above query:

Query to count nodes with a null name value or nodes with a name property containing 'William'
PROFILE
MATCH (n:PointOfInterest)
WHERE n.name IS NULL OR n.name CONTAINS 'William'
RETURN count(n) AS nodes
Table 7. Result
nodes

5

Rows: 1

The query result now includes both the three nodes with an unset name value found in the previous query and the two nodes with a name value containing William (William Shakespeare and William Tecumseh Sherman).

Execution plan
+--------------------------+----+----------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| Operator                 | Id | Details                                                              | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline            |
+--------------------------+----+----------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults          |  0 | nodes                                                                |              1 |    1 |       0 |              0 |                    0/0 |     0.010 | In Pipeline 3       |
| |                        +----+----------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +EagerAggregation        |  1 | count(n) AS nodes                                                    |              1 |    1 |       0 |             32 |                        |           |                     |
| |                        +----+----------------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +Distinct                |  2 | n                                                                    |            141 |    5 |       0 |            352 |                        |           |                     |
| |                        +----+----------------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +Union                   |  3 |                                                                      |            142 |    5 |       0 |            352 |                    0/0 |     0.220 | Fused in Pipeline 2 |
| |\                       +----+----------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| | +NodeIndexContainsScan |  4 | TEXT INDEX n:PointOfInterest(name) WHERE name CONTAINS $autostring_0 |              1 |    2 |       3 |            376 |                    4/0 |     0.456 | In Pipeline 1       |
| |                        +----+----------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +Filter                  |  5 | n.name IS NULL                                                       |            141 |    3 |     373 |                |                        |           |                     |
| |                        +----+----------------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +NodeByLabelScan         |  6 | n:PointOfInterest                                                    |            188 |  188 |     189 |            376 |                  115/0 |     0.673 | Fused in Pipeline 0 |
+--------------------------+----+----------------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+

Total database accesses: 565, total allocated memory: 1352

This plan shows that an index is only used to solve the second part of the WHERE clause, which excludes the presence of null values.

The presence of null values within an indexed property therefore does not negate the use of an index. Index use is only negated if the planner is unable to rule out the inclusion of any unset properties in the matching process.

The presence of null values may not be known in advance, and this can cause unexpected instances of indexes not being used. There are, however, a few strategies to ensure that an index will be used.

Property existence checks

One method to ensure an index is used is to explicitly filter out any null values by appending IS NOT NULL to the queried property. The following example uses the same query as above but exchanges IS NULL with IS NOT NULL in the WHERE clause:

Query to count PointOfInterest nodes without a null name value
PROFILE
MATCH (n:PointOfInterest)
WHERE n.name IS NOT NULL
RETURN count(n) AS nodes
Table 8. Result
nodes

185

Rows: 1

Execution plan
+-------------------+----+------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| Operator          | Id | Details                                                    | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline            |
+-------------------+----+------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults   |  0 | nodes                                                      |              1 |    1 |       0 |              0 |                    0/0 |     0.013 | In Pipeline 1       |
| |                 +----+------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +EagerAggregation |  1 | count(n) AS nodes                                          |              1 |    1 |       0 |             32 |                        |           |                     |
| |                 +----+------------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +NodeIndexScan    |  2 | RANGE INDEX n:PointOfInterest(name) WHERE name IS NOT NULL |            185 |  185 |     186 |            376 |                    0/1 |     0.691 | Fused in Pipeline 0 |
+-------------------+----+------------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+

Total database accesses: 186, total allocated memory: 472

This plan shows that the previously created range index on the name property is now used to solve the predicate.

Text indexes and type predicate expressions

Text indexes require that predicates only include STRING properties.

To use text indexes in situations where any of the queried properties may be either of an incompatible type or null rather than a STRING value, add the type predicate expression IS :: STRING NOT NULL (or its alias, introduced in Neo4j 5.14, IS :: STRING!) to the query. This will enforce both the existence of a property and its STRING type, discarding any rows where the property is missing or not of type STRING, and thereby enable the use of text indexes.

For example, if the WHERE predicate in the previous query is altered to instead append IS :: STRING NOT NULL, then the text index rather than the range index is used (range indexes do not support type predicate expressions):

Query using a type predicate expression
PROFILE
MATCH (n:PointOfInterest)
WHERE n.name IS :: STRING NOT NULL
RETURN count(n) AS nodes
Execution plan
+-------------------+----+-----------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| Operator          | Id | Details                                                   | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline            |
+-------------------+----+-----------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults   |  0 | nodes                                                     |              1 |    1 |       0 |              0 |                    0/0 |     0.009 | In Pipeline 1       |
| |                 +----+-----------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +EagerAggregation |  1 | count(n) AS nodes                                         |              1 |    1 |       0 |             32 |                        |           |                     |
| |                 +----+-----------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +NodeIndexScan    |  2 | TEXT INDEX n:PointOfInterest(name) WHERE name IS NOT NULL |            185 |  185 |     186 |            376 |                    0/0 |     0.343 | Fused in Pipeline 0 |
+-------------------+----+-----------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+

Total database accesses: 186, total allocated memory: 472
While type predicate expressions were introduced in Neo4j 5.9, the IS :: STRING NOT NULL syntax only became an index-compatible predicate in Neo4j 5.15. For more information, see the page about type predicate expressions.

The toString function can also be used to convert an expression to STRING values, and thereby help the planner to select a text index.

Type constraints

For indexes that are compatible only with specific types (i.e. text and point indexes), the Cypher planner needs to deduce that a predicate will evaluate to null for non-compatible values in order to use the index. If a predicate is not explicitly defined as the required type (STRING or POINT), this can lead to situations where a text or point index is not used.

Since type constraints guarantee that a property is always of the same type, they can be used to extend the scenarios in which text and point indexes are compatible with a predicate.

To show this, the following example will first drop the existing range index on the name property (this is necessary because type constraints only extend the compatibility of type-specific indexes - range indexes are not limited by a value type). It will then run the same query with a WHERE predicate on the name property (for which there exists a previously created text index) before and after creating a type constraint, and compare the resulting execution plans.

Drop range index
DROP INDEX range_index_name
Query to count PointOfInterest nodes without a null name value
PROFILE
MATCH (n:PointOfInterest)
WHERE n.name IS NOT NULL
RETURN count(n) AS nodes
Execution plan
+-------------------+----+--------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| Operator          | Id | Details            | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline            |
+-------------------+----+--------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults   |  0 | nodes              |              1 |    1 |       0 |              0 |                    0/0 |     0.012 | In Pipeline 1       |
| |                 +----+--------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +EagerAggregation |  1 | count(n) AS nodes  |              1 |    1 |       0 |             32 |                        |           |                     |
| |                 +----+--------------------+----------------+------+---------+----------------+                        |           |                     |
| +Filter           |  2 | n.name IS NOT NULL |            187 |  185 |     373 |                |                        |           |                     |
| |                 +----+--------------------+----------------+------+---------+----------------+                        |           |                     |
| +NodeByLabelScan  |  3 | n:PointOfInterest  |            188 |  188 |     189 |            376 |                  259/0 |     0.363 | Fused in Pipeline 0 |
+-------------------+----+--------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+

Total database accesses: 562, total allocated memory: 472

This plan shows that the available text index on the name property was not used to solve the predicate. This is because the planner was not able to deduce that all name values are of type STRING.

However, if a type constraint is created to ensure that all name properties have a STRING value, a different query plan is generated.

Create STRING type constraint on the name property
CREATE CONSTRAINT type_constraint
FOR (n:PointOfInterest) REQUIRE n.name IS :: STRING
Rerun the query after the creation of a type constraint
PROFILE
MATCH (n:PointOfInterest)
WHERE n.name IS NOT NULL
RETURN count(n) AS nodes
Execution plan
+-------------------+----+-----------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| Operator          | Id | Details                                                   | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Pipeline            |
+-------------------+----+-----------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +ProduceResults   |  0 | nodes                                                     |              1 |    1 |       0 |              0 |                    0/0 |     0.013 | In Pipeline 1       |
| |                 +----+-----------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+
| +EagerAggregation |  1 | count(n) AS nodes                                         |              1 |    1 |       0 |             32 |                        |           |                     |
| |                 +----+-----------------------------------------------------------+----------------+------+---------+----------------+                        |           |                     |
| +NodeIndexScan    |  2 | TEXT INDEX n:PointOfInterest(name) WHERE name IS NOT NULL |            187 |  185 |     186 |            376 |                    0/0 |     0.328 | Fused in Pipeline 0 |
+-------------------+----+-----------------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+---------------------+

Total database accesses: 186, total allocated memory: 472

Because of the type constraint on the name property, the planner is now able to deduce that all name properties are of type STRING, and therefore use the available text index.

Point indexes can be extended in the same way if a type constraint is created to ensure that all properties are POINT values.

Note that property existence constraints do not currently leverage index use in the same way.

Heuristics: deciding what to index

While it is impossible to give exact directions on when a search-performance index might be beneficial for a particular use-case, the following points provide some useful heuristics for when creating an index might improve query performance:

  • Frequent property-based queries: if some properties are used frequently for filtering or matching, consider creating an index on them.

  • Performance optimization: If certain queries are too slow, re-examine the properties that are filtered on, and consider creating indexes for those properties that may cause bottlenecking.

  • High cardinality properties: high cardinality properties have many distinct values (e.g., unique identifiers, timestamps, or user names). Queries that seek to retrieve such properties will likely benefit from indexing.

  • Complex queries: if queries traverse complex paths in a graph (for example, by involving multiple hops and several layers of filtering), adding indexes to the properties used in those queries can improve query performance.

  • Experiment and test: It is good practice to experiment with different indexes and query patterns, and to measure the performance of critical queries with and without different indexes to evaluate their effectiveness.

Over-indexing: considerations and solutions

Search-performance indexes can significantly improve query performance. They should, however, be used judiciously for the following reasons:

  • Storage space: because each index is a secondary copy of the data in the primary database, each index essentially doubles the amount of storage space occupied by the indexed data.

  • Slower write queries: adding indexes impacts the performance of write queries. This is because indexes are updated with each write query. If a system needs to perform a lot of writes quickly, it may be counterproductive to have an index on the affected data entities. In other words, if write performance is crucial for a particular use case, it may be beneficial to only add indexes where they are necessary for read purposes.

As a result of these two points, deciding what to index (and what not to index) is an important and non-trivial task.

Keeping track of index-use: lastRead, readCount, and trackedSince

Unused indexes take up unnecessary storage space and it may be beneficial to remove them. Knowing which indexes are most frequently used by the queries against a database can, however, be difficult. As of Neo4j 5.8, there are three relevant columns returned by the SHOW INDEX command which can help identify redundant indexes:

  • lastRead: returns the last time the index was used for reading.

  • readCount: returns the number of read queries issued to the index.

  • trackedSince returns the time when usage statistics tracking started for an index.[2]

To return these values (along with other relevant information) for the indexes in a database, run the following query:

Query to identify redundant indexes
SHOW INDEX YIELD name, type, entityType, labelsOrTypes, properties, lastRead, readCount, trackedSince

If any unused indexes are identified, it may be beneficial to delete them using the DROP INDEX command.

Summary

  • Range indexes can be used to solve most predicates.

  • Text indexes are used over range indexes for CONTAINS and ENDS WITH predicates on STRING properties, and if the queried STRING properties exceed 8 kb.

  • Point indexes are used when queries filter on distances and bounding boxes.

  • Token lookup indexes only solve node label and relationship type predicates. They do not solve any property predicates. Deleting token lookup indexes will negatively impact query performance.

  • Composite indexes are only used if the query filters on all properties indexed by the composite index. The order in which the properties are defined when creating a composite index impacts how the planner solves query predicates.

  • Queries ordering results using ORDER BY can leverage the pre-existing order in range indexes and thereby improve query performance.

  • A Cypher query can use several indexes if the planner deems it beneficial to the performance of a query.

  • Neo4j indexes do not store null values, and the planner must be able to rule out any entities with properties containing null values in order to use an index. There are several strategies to ensure the use of indexes.

  • The columns lastRead, readCount, and trackedSince returned by the SHOW INDEX command can be used to identify redundant indexes that take up unnecessary space.


1. The example queries on this page are prepended with PROFILE. This both runs the query and generates its execution plan. For more information, see Execution plans and query tuning → Note on PROFILE and EXPLAIN.
2. The trackedSince column is not part of the default return columns for the SHOW INDEXES command. To return this and all other non-default columns, use SHOW INDEXES YIELD *. For more information, see Create, show, and delete indexes → Result columns for listing indexes.