This page describes how to configure Neo4j indexes to enhance search performance and enable full-text search. The supported index types are range, point, text, full-text, and token lookup.
All types of indexes can be created and dropped using Cypher and they can also all be used to index both nodes and relationships. The token lookup index is the only index present by default in the database.
Range, point, text, and full-text indexes provide a mapping from a property value to an entity (node or relationship). Token lookup indexes are different: they provide a mapping from labels to nodes, or from relationship types to relationships, instead of between properties and entities.
When you write a Cypher query, you do not need to specify which indexes to use. Cypher’s query planner decides which of the available indexes to use.
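As a minimal sketch of the planner choosing an index on its own, the following uses EXPLAIN to inspect a plan without running the query. The Person label, the name property, and the index name person_name are hypothetical and only serve the illustration.

```cypher
// Hypothetical schema: nodes labeled Person with a name property.
CREATE RANGE INDEX person_name IF NOT EXISTS
FOR (p:Person) ON (p.name);

// No index is named in the query. EXPLAIN returns the plan the planner
// selected, without executing the query.
EXPLAIN
MATCH (p:Person)
WHERE p.name = 'Alice'
RETURN p;
```

If the planner decides to use the index, the plan should contain an index operator rather than a label scan followed by a filter.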
The rest of this page provides information on the available indexes and their configuration aspects. For further details on creating, querying, and dropping indexes, see Cypher Manual → Indexes for search performance and Cypher Manual → Indexes to support full-text search.
The type of an index can be identified according to the table below:
Token lookup index
You cannot have indexes of the same type over the same properties.
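One way to check which indexes exist, and of which type, is the SHOW INDEXES command. The sketch below only selects a few of the columns that the command returns.

```cypher
// List every index with its type and the schema it covers.
// The type column distinguishes range, point, text, full-text,
// and token lookup indexes.
SHOW INDEXES
YIELD name, type, entityType, labelsOrTypes, properties
RETURN name, type, entityType, labelsOrTypes, properties
ORDER BY type, name;
```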
Range indexes can be used for exact lookups on all types of values, range scans, full scans, and prefix searches.
Range indexes are the most general-purpose of the property indexes, as they support all value types and a wide range of operations.
Range indexes have a key size limit of around 8kB.
If a transaction reaches the key size limit for one or more of its changes, that transaction fails before committing any changes. If the limit is reached during index population, the resulting index is in a failed state, and as such is not usable for any queries.
Since text indexes have a key size limit of around 32kB, the key size limit of a range index can be worked around by using a text index instead. However, a text index is not a general-purpose index like a range index, so this workaround cannot be applied in all cases. For more information, see Text indexes.
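The operations that range indexes support (exact lookups, range scans, full scans, and prefix searches) can be sketched as follows. The Person label, its name and born properties, and the index names are hypothetical. Whether the planner actually uses an index for a given query depends on the data and statistics.

```cypher
// Hypothetical range indexes on two Person properties.
CREATE RANGE INDEX person_name IF NOT EXISTS
FOR (p:Person) ON (p.name);
CREATE RANGE INDEX person_born IF NOT EXISTS
FOR (p:Person) ON (p.born);

// Exact lookup.
MATCH (p:Person) WHERE p.name = 'Alice' RETURN p;

// Range scan.
MATCH (p:Person) WHERE p.born >= 1980 AND p.born < 1990 RETURN p;

// Prefix search.
MATCH (p:Person) WHERE p.name STARTS WITH 'Al' RETURN p;

// Full scan of the index: every Person that has the property.
MATCH (p:Person) WHERE p.name IS NOT NULL RETURN p;
```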
Point indexes are a highly specialized type of single-property index. Unlike range indexes, they index only properties with Point values.
Point indexes are designed to speed up spatial queries, specifically bounding box queries.
Exact lookups are the only non-spatial query that this index type supports.
For more information on the queries a point index can be used for, refer to Cypher Manual → Query Tuning → The use of indexes.
A point index optionally accepts configuration properties for tuning the behavior of spatial search. For more information on configuring point indexes, refer to Cypher Manual → Indexes for search performance.
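A minimal sketch of a point index with spatial configuration, followed by a bounding box query and an exact lookup. The Place label, the location property, the index name, and the coordinate bounds are all hypothetical. The spatial.cartesian.min and spatial.cartesian.max settings apply to 2D Cartesian points only.

```cypher
// Hypothetical point index with explicit bounds for 2D Cartesian points.
CREATE POINT INDEX place_location IF NOT EXISTS
FOR (pl:Place) ON (pl.location)
OPTIONS {
  indexConfig: {
    `spatial.cartesian.min`: [-100.0, -100.0],
    `spatial.cartesian.max`: [100.0, 100.0]
  }
};

// Bounding box query: the kind of spatial query point indexes target.
MATCH (pl:Place)
WHERE point.withinBBox(
  pl.location,
  point({x: 0.0, y: 0.0}),
  point({x: 10.0, y: 10.0})
)
RETURN pl;

// Exact lookup: the only non-spatial query a point index supports.
MATCH (pl:Place)
WHERE pl.location = point({x: 5.0, y: 5.0})
RETURN pl;
```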
Text indexes are a type of single-property index. Unlike range indexes, text indexes index only properties with string values.
Text indexes are specifically designed to handle ENDS WITH and CONTAINS queries efficiently.
They are used through Cypher and they support a smaller set of string queries.
Even though text indexes do support other text queries, ENDS WITH and CONTAINS queries are the only ones for which this index type provides an advantage over a range index.
For more information on the queries a text index can be used for, refer to Cypher Manual → Query Tuning → The use of indexes.
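The two query shapes that text indexes target can be sketched as follows. The Person label, the email property, and the index name are hypothetical.

```cypher
// Hypothetical text index on a string property.
CREATE TEXT INDEX person_email_text IF NOT EXISTS
FOR (p:Person) ON (p.email);

// Suffix search.
MATCH (p:Person) WHERE p.email ENDS WITH '@example.com' RETURN p;

// Substring search.
MATCH (p:Person) WHERE p.email CONTAINS 'smith' RETURN p;
```

Because a text index and a range index are different index types, both can exist over the same property, and the planner picks whichever suits the predicate.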
For more information on the different index types, refer to Cypher Manual → Indexes for search performance.
Neo4j 5.1 introduces an improved index provider,
Text indexes only index single property strings.
The index has a key size limit for single property strings of around 32kB. If a transaction reaches the key size limit for one or more of its changes, that transaction fails before committing any changes. If the limit is reached during index population, the resulting index is in a failed state, and as such is not usable for any queries.
Full-text indexes are optimized for indexing and searching texts.
Even though text and full-text indexes might seem to solve very similar problems, there are essential differences. Unlike text indexes, which index only single property strings, full-text indexes can index any kind of string data. Text indexes solve substring match and exact string match according to the semantics defined by the Cypher language. Full-text indexes, by contrast, use pluggable analyzers, many of which provide language-specific processing of the text that allows for more sophisticated queries than a simple substring match. Depending on which analyzer is used, the full-text index can be used for different text search types, such as exact matches, relevance matches, phrase queries, autocompletion, and many others. Additionally, the results are ordered by relevance.
An example of a use case for full-text indexes is parsing a book for a certain term and taking advantage of the knowledge that the book is written in a certain language. The use of an analyzer for that language enables the exclusion of stop words, such as "if" and "and", and the inclusion of word forms.
Another use case example is indexing the various address fields and text data in a corpus of emails.
In contrast to range and text indexes, full-text indexes are queried using built-in procedures. They are, however, created and dropped using Cypher. Using full-text indexes requires familiarity with how those indexes operate.
Full-text indexes are powered by the Apache Lucene indexing and search library. A full description of how to create and use full-text indexes is provided in the Cypher Manual → Indexes to support full-text search.
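As a minimal sketch, a full-text index is created with Cypher and queried through a procedure. The Book label, its title and description properties, the index name, and the search string are hypothetical.

```cypher
// Hypothetical full-text index over two string properties.
CREATE FULLTEXT INDEX book_text IF NOT EXISTS
FOR (b:Book) ON EACH [b.title, b.description];

// Query the index through the built-in procedure.
// The query string uses Lucene query syntax.
CALL db.index.fulltext.queryNodes('book_text', 'white whale')
YIELD node, score
RETURN node.title AS title, score;
```

The score column reflects relevance, and results are returned with the best matches first.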
The following options are available for configuring full-text indexes. For a complete list of Neo4j procedures, see [reference/procedures/].
The name of the default analyzer when creating a new full-text index. Once created, the index’s analyzer is not affected by this setting.
The default consistency model when creating a new full-text index. Once created, the index’s consistency model is not affected by this setting.
Indexes are normally fully consistent, and the committing of a transaction does not return until both the store and indexes are updated. Eventually consistent full-text indexes, on the other hand, are not updated as part of a commit but instead have their updates queued up and applied in a background thread. This means that there can be a short delay between committing a change and that change becoming visible via any eventually consistent full-text indexes. This delay is just an artifact of the queueing and is usually relatively small since eventually consistent indexes are updated "as soon as possible".
By default, this is turned off, and full-text indexes are fully consistent.
Eventually consistent full-text indexes have their updates queued up and applied in a background thread, and this setting determines the maximum size of that update queue. If the maximum queue size is reached, then committing transactions block and wait until there is more room in the queue before adding more updates to it.
This setting applies to all eventually consistent full-text indexes, and they all use the same queue. The maximum queue length must be at least 1 index update and no more than 50 million due to heap space usage considerations.
The default maximum queue length is 10,000 index updates.
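The settings above provide defaults for new indexes. As a sketch, the consistency model can also be chosen per index at creation time through the index configuration. The Email label, its properties, and the index name are hypothetical.

```cypher
// Hypothetical eventually consistent full-text index.
CREATE FULLTEXT INDEX email_text IF NOT EXISTS
FOR (e:Email) ON EACH [e.subject, e.body]
OPTIONS {
  indexConfig: {
    `fulltext.eventually_consistent`: true
  }
};

// Wait until all queued updates have been applied, for example
// before running a query that must see recently committed data.
CALL db.index.fulltext.awaitEventuallyConsistentIndexRefresh();
```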
By default, the full-text index uses the standard-no-stop-words analyzer, specified in the db.index.fulltext.default_analyzer configuration setting.
This analyzer is the same as Lucene’s StandardAnalyzer, except that no stop words are filtered out.
To specify another analyzer, use the OPTIONS clause of the full-text index creation command.
The list of all available analyzers can be retrieved with the db.index.fulltext.listAvailableAnalyzers() Cypher procedure.
By default, the analyzer analyzes both the indexed values and query string. In some cases, however, using different analyzers for the indexed values and query string is more appropriate. You can do that by specifying an analyzer for the query string when using the full-text search procedures.
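The analyzer choices described above can be sketched as follows: listing the available analyzers, selecting one in the OPTIONS clause, and overriding the analyzer for a single query string. The Book label, its properties, the index name, and the choice of the english and standard analyzers are assumptions made for illustration.

```cypher
// List the analyzers available in this installation.
CALL db.index.fulltext.listAvailableAnalyzers()
YIELD analyzer, description
RETURN analyzer, description;

// Hypothetical index that analyzes indexed values with the english analyzer.
CREATE FULLTEXT INDEX book_text_en IF NOT EXISTS
FOR (b:Book) ON EACH [b.title, b.description]
OPTIONS {
  indexConfig: {
    `fulltext.analyzer`: 'english'
  }
};

// Analyze only the query string with a different analyzer.
CALL db.index.fulltext.queryNodes('book_text_en', 'whales', {analyzer: 'standard'})
YIELD node, score
RETURN node.title AS title, score;
```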
For detailed information on how to create and use full-text indexes, see the Cypher Manual → Indexes to support full-text search.
A full-text index can be created over multiple properties.
If different properties require different analyzers, the standard approach in Lucene is to create a custom composite analyzer.
The Lucene project provides PerFieldAnalyzerWrapper, which can associate analyzers with specific fields.
For more information, see the official Lucene documentation.
Token lookup indexes are used to look up nodes with a specific label or relationships of a specific type. They are always created over all labels or relationship types. Therefore, a database can have a maximum of two token lookup indexes: one for nodes and one for relationships.
Token lookup indexes are the most important indexes, as they significantly speed up the population of other indexes. They are also essential for Cypher query execution and Core API operations. Therefore, dropping them should be carefully considered.
The node label lookup index is important for queries that match a node by one or more labels. It can also be used for matching labels and properties of a node when there are no suitable indexes available. Likewise, the relationship type lookup index is important for queries that match relationships by their types.
Most queries are executed by matching nodes and expanding their relationships. Hence, the node label lookup index is slightly more significant than the relationship type lookup index.
Both the node label lookup index and the relationship type lookup index are present by default in all databases created in 4.3 and onwards.
Databases created before 4.3 do not get a relationship type lookup index automatically, in order to preserve the backwards compatibility and performance characteristics of such databases.
If needed, such databases can get a relationship type lookup index by creating it explicitly through Cypher.
Creating a relationship type lookup index on a large database can take a significant amount of time, as all relationships need to be scanned when populating such an index.
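As a sketch, the existing token lookup indexes can be listed and a missing one created explicitly. The index names below are hypothetical.

```cypher
// Check which token lookup indexes the database already has.
SHOW INDEXES
YIELD name, type, entityType
WHERE type = 'LOOKUP'
RETURN name, entityType;

// Create the relationship type lookup index if it is missing,
// for example in a database created before 4.3.
CREATE LOOKUP INDEX rel_type_lookup IF NOT EXISTS
FOR ()-[r]-() ON EACH type(r);

// The node label lookup index can be recreated the same way
// if it has been dropped.
CREATE LOOKUP INDEX node_label_lookup IF NOT EXISTS
FOR (n) ON EACH labels(n);
```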