Index configuration

How to configure indexes to enhance performance in search, and to enable full-text search.

1. Introduction

In Neo4j there are three different index types: b-tree, full-text, and token lookup.

All three types of indexes can be created and dropped using Cypher. All three types can also be used to index both nodes and relationships. The token lookup index is the only index present by default in the database.

B-tree and full-text indexes provide a mapping from a property value to an entity (node or relationship). Token lookup indexes differ in that they do not work with properties; instead, they provide a mapping from labels to nodes or from relationship types to relationships. Users do not need to know the difference between b-tree and token lookup indexes in order to use an index, since Cypher’s query planner decides which index to use in which situation. B-tree indexes are good for exact look-ups on all types of values, range scans, full scans, and prefix searches.

For details on the configuration aspects of b-tree indexes, see B-tree indexes.
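As a rough illustration of the query shapes listed above, the following sketch creates a b-tree index and runs queries that the planner may choose to answer with it. The label Person, the property name, and the index name person_name are hypothetical, and the syntax assumes Neo4j 4.3.

```cypher
// Create a single-property b-tree index (hypothetical label and property).
CREATE INDEX person_name IF NOT EXISTS FOR (p:Person) ON (p.name);

// Exact look-up.
MATCH (p:Person) WHERE p.name = 'Alice' RETURN p;

// Range scan.
MATCH (p:Person) WHERE p.name >= 'A' AND p.name < 'C' RETURN p;

// Prefix search.
MATCH (p:Person) WHERE p.name STARTS WITH 'Al' RETURN p;

// Full scan of all indexed entries.
MATCH (p:Person) WHERE p.name IS NOT NULL RETURN p;
```

Whether the index is actually used is decided by the planner. Prefixing a query with EXPLAIN shows the chosen plan.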

Full-text indexes differ from b-tree indexes, in that they are optimized for indexing and searching text. They are used for queries that demand an understanding of language and they only index string data. They must also be queried explicitly via procedures, as Cypher will not make plans that rely on them.

An example of a use case for full-text indexes is parsing a book for a certain term and taking advantage of the knowledge that the book is written in a certain language. The use of an analyzer for that language will, among other things, enable you to exclude words that are not relevant for the search (for example "if" and "and"), and include conjugations of words that are.

Another example use case is indexing the various address fields and text data in a corpus of emails. Indexing this data using the email analyzer would enable someone to find all emails that are sent from, sent to, or mention an email account.

In contrast to b-tree indexes, full-text indexes are queried using built-in procedures. They are, however, created and dropped using Cypher. Using full-text indexes does require familiarity with how the indexes operate.

For details on the configuration aspects of full-text indexes, see Full-text indexes.

For details on creating, querying and dropping full-text indexes, see Cypher Manual → Indexes to support full-text search.
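The life cycle described above can be sketched as follows, assuming Neo4j 4.3 syntax. The index name bookText, the label Book, and the properties title and content are hypothetical.

```cypher
// Create a full-text index over two string properties (hypothetical names).
CREATE FULLTEXT INDEX bookText IF NOT EXISTS
FOR (b:Book) ON EACH [b.title, b.content];

// Query the index explicitly through the built-in procedure.
CALL db.index.fulltext.queryNodes('bookText', 'graph database')
YIELD node, score
RETURN node.title AS title, score
ORDER BY score DESC
LIMIT 10;

// Drop the index again using Cypher.
DROP INDEX bookText IF EXISTS;
```

The query string is parsed with Lucene query syntax, so operators such as AND, OR, and quoted phrases can be used inside it.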

The type of an index can be identified according to the table below:

Index type           Cypher command          Core API
B-tree index         SHOW INDEXES#BTREE      org.neo4j.graphdb.schema.IndexType#BTREE
Full-text index      SHOW INDEXES#FULLTEXT   org.neo4j.graphdb.schema.IndexType#FULLTEXT
Token lookup index   SHOW INDEXES#LOOKUP     org.neo4j.graphdb.schema.IndexType#LOOKUP
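A minimal sketch of identifying index types from Cypher follows. It assumes Neo4j 4.3, and that the suffix after # in the table above corresponds to the value reported in the type column of SHOW INDEXES.

```cypher
// List every index with its type (BTREE, FULLTEXT, or LOOKUP)
// and whether it indexes nodes or relationships.
SHOW INDEXES
YIELD name, type, entityType, labelsOrTypes, properties
ORDER BY type, name;
```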

2. B-tree indexes

B-tree indexes can be backed by two different index providers, native-btree-1.0 and lucene+native-3.0. If not explicitly set, native-btree-1.0 will be used.

For more information on the different index types, refer to Cypher Manual → Indexes for search performance and Cypher Manual → Indexes to support full-text search.

2.1. Limitations

There are a few limitations for b-tree indexes, listed below together with suggested workarounds.

2.1.1. Limitations for queries using CONTAINS and ENDS WITH

The index provider native-btree-1.0 has limited support for ENDS WITH and CONTAINS queries. These queries cannot perform an optimized search in the way that queries using STARTS WITH, =, and <> can. Instead, the index result is produced by an index scan with filtering.

In the future, ENDS WITH and CONTAINS queries will be supported with full-text indexes, but for now the index provider lucene+native-3.0 can be used instead. Please note that lucene+native-3.0 only has support for ENDS WITH and CONTAINS for single property strings.
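One way to observe the difference is to compare the plans of a CONTAINS query and a STARTS WITH query. This is only a sketch: the label Person and the property name are hypothetical, an index on them is assumed to exist, and the exact operators shown depend on the Neo4j version and index provider.

```cypher
// Plan for a substring search (hypothetical label and property).
EXPLAIN
MATCH (p:Person)
WHERE p.name CONTAINS 'son'
RETURN p.name;

// Plan for a prefix search on the same property, for comparison.
EXPLAIN
MATCH (p:Person)
WHERE p.name STARTS WITH 'And'
RETURN p.name;
```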

2.1.2. Limitations on key size

The index provider native-btree-1.0 has a key size limit of around 8 kB.

If a transaction reaches the key size limit for one or more of its changes, that transaction will fail before committing any changes. If the limit is reached during index population, the resulting index will be in a failed state, and as such will not be usable for any queries.

If this is an issue, you can use the index provider lucene+native-3.0 instead. This provider has a key size limit for single-property strings of around 32 kB.
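To check whether an index ended up in a failed state, for example after hitting the key size limit during population, a query along these lines may help. It assumes the SHOW INDEXES columns available in Neo4j 4.3.

```cypher
// List indexes that failed, together with the reported reason.
SHOW INDEXES
YIELD name, state, populationPercent, indexProvider, failureMessage
WHERE state = 'FAILED';
```

A failed index has to be dropped and recreated before it can serve queries again.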

2.1.3. Workarounds to address limitations

To work around problems with key size, or performance issues related to ENDS WITH or CONTAINS, you can use the index provider lucene+native-3.0. This only works for single-property string indexes.

This can be done using either of the following methods:

Option 1. Use OPTIONS clause in create command (recommended)

The Cypher commands for index creation, unique property constraint creation, and node key creation contain an optional OPTIONS clause. This clause can be used to specify the index provider.

For details on indexes, see Cypher Manual → Indexes for search performance. For details on constraints, see Cypher Manual → Constraints.
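A minimal sketch of the OPTIONS clause, assuming Neo4j 4.3 syntax (later versions write constraints as FOR ... REQUIRE instead of ON ... ASSERT). The index and constraint names, the label Person, and the properties bio and email are hypothetical.

```cypher
// Single-property string index backed by lucene+native-3.0.
CREATE INDEX person_bio FOR (p:Person) ON (p.bio)
OPTIONS {indexProvider: 'lucene+native-3.0'};

// Unique property constraint whose backing index uses the same provider.
CREATE CONSTRAINT person_email_unique ON (p:Person) ASSERT p.email IS UNIQUE
OPTIONS {indexProvider: 'lucene+native-3.0'};
```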

Option 2. Use a built-in procedure (deprecated)

Please note that this option uses built-in procedures that have been deprecated, and will be removed in a future release. These have been replaced with the Cypher commands in Option 1.

The built-in procedures db.createIndex, db.createUniquePropertyConstraint, and db.createNodeKey can be used to specify the index provider on index creation, unique property constraint creation, and node key creation.

For details on constraints, see Cypher Manual → Constraints, and for more information on built-in procedures, see Procedures.

Option 3. Change the config (deprecated)

Please note that this option uses the index setting dbms.index.default_schema_provider, which has been deprecated and will be removed in a future release. Which index provider an index uses will then become a fully internal concern.

  1. Configure the setting dbms.index.default_schema_provider to the one required.

  2. Restart Neo4j.

  3. Drop and recreate the relevant index.

  4. Change dbms.index.default_schema_provider back to the original value.

  5. Restart Neo4j.

    The recommended way to set index provider for an index is to use the OPTIONS clause for index creation, unique property constraint creation, and node key creation.

2.2. Index migration

When upgrading a 3.5 store to 4.3.2, all indexes will be upgraded to the latest index version and rebuilt automatically, with the exception of indexes that previously used Lucene for single-property strings. Those will be upgraded to a fallback version which still uses Lucene for those properties. Please note that they will still need to be rebuilt.
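After an upgrade, the provider and rebuild progress of each b-tree index can be inspected with a query such as the following sketch, which assumes the SHOW INDEXES columns available in Neo4j 4.3.

```cypher
// Show which provider backs each b-tree index and how far
// its population has progressed.
SHOW INDEXES
YIELD name, type, indexProvider, state, populationPercent
WHERE type = 'BTREE';
```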

2.3. Procedures to create index and index backed constraint (deprecated)

Indexes and constraints are best created through Cypher, but can still be created through the deprecated procedures described in the example below. Index provider and index settings can both be specified using the optional OPTIONS clause for the Cypher commands.

Example 1. Example of procedures to create index and index backed constraint (deprecated)

The following procedures provide the option to specify both index provider and index settings (optional). Note that settings keys need to be escaped with back-ticks if they contain dots.

Use db.createIndex procedure to create an index:

CALL db.createIndex("MyIndex", ["Person"], ["name"], "native-btree-1.0", {`spatial.cartesian.max`: [100.0,100.0], `spatial.cartesian.min`: [-100.0,-100.0]})

If a settings map is not provided, the settings will be picked up from the Neo4j config file, the same way as when creating an index or constraint through Cypher.

CALL db.createIndex("MyIndex", ["Person"], ["name"], "native-btree-1.0")

Use db.createUniquePropertyConstraint to create a node property uniqueness constraint (the settings map is omitted from the example for brevity):

CALL db.createUniquePropertyConstraint("MyIndex", ["Person"], ["name"], "native-btree-1.0")

Use db.createNodeKey to create a node key constraint (the settings map is omitted from the example for brevity):

CALL db.createNodeKey("MyIndex", ["Person"], ["name"], "native-btree-1.0")
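For comparison, the non-deprecated Cypher commands with the OPTIONS clause might look as follows. This is a sketch assuming Neo4j 4.3 syntax. The constraint names MyUniqueConstraint and MyNodeKey are hypothetical, and the three statements are alternatives: an index and a constraint on the same label and property cannot coexist. Node key constraints require Enterprise Edition.

```cypher
// Counterpart of db.createIndex, with provider and settings.
// Settings keys containing dots are escaped with back-ticks.
CREATE INDEX MyIndex FOR (n:Person) ON (n.name)
OPTIONS {
  indexProvider: 'native-btree-1.0',
  indexConfig: {
    `spatial.cartesian.min`: [-100.0, -100.0],
    `spatial.cartesian.max`: [100.0, 100.0]
  }
};

// Counterpart of db.createUniquePropertyConstraint.
CREATE CONSTRAINT MyUniqueConstraint ON (n:Person) ASSERT n.name IS UNIQUE
OPTIONS {indexProvider: 'native-btree-1.0'};

// Counterpart of db.createNodeKey.
CREATE CONSTRAINT MyNodeKey ON (n:Person) ASSERT (n.name) IS NODE KEY
OPTIONS {indexProvider: 'native-btree-1.0'};
```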

3. Full-text indexes

Full-text indexes are powered by the Apache Lucene indexing and search library. A full-text index enables you to write queries that match within the contents of indexed string properties. A full description of how to create and use full-text indexes is provided in the Cypher Manual → Indexes to support full-text search.

3.1. Configuration

The following options are available for configuring full-text indexes:

dbms.index.fulltext.default_analyzer

The name of the analyzer that the full-text indexes should use by default. This setting only has effect when a full-text index is created, and will be remembered as an index-specific setting from then on.

The list of possible analyzers is available through the db.index.fulltext.listAvailableAnalyzers() Cypher procedure.

Unless otherwise specified, the default analyzer is standard-no-stop-words, which is the same as the StandardAnalyzer from Lucene, except no stop-words are filtered out.
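The analyzer can also be chosen per index at creation time instead of relying on the default. The sketch below assumes Neo4j 4.3 syntax. The index name bookTextEnglish, the label Book, and the properties title and content are hypothetical.

```cypher
// List the analyzers available in this installation.
CALL db.index.fulltext.listAvailableAnalyzers()
YIELD analyzer, description
RETURN analyzer, description
ORDER BY analyzer;

// Create a full-text index that uses the english analyzer
// rather than the configured default.
CREATE FULLTEXT INDEX bookTextEnglish FOR (b:Book) ON EACH [b.title, b.content]
OPTIONS {indexConfig: {`fulltext.analyzer`: 'english'}};
```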

dbms.index.fulltext.eventually_consistent

Used to declare whether full-text indexes should be eventually consistent, or not. This setting only has effect when a full-text index is created, and will be remembered as an index-specific setting from then on.

Indexes are normally fully consistent, and the committing of a transaction does not return until both the store and the indexes have been updated. Eventually consistent full-text indexes, on the other hand, are not updated as part of commit, but instead have their updates queued up and applied in a background thread. This means that there can be a short delay between committing a change, and that change becoming visible via any eventually consistent full-text indexes. This delay is just an artifact of the queueing, and will usually be quite small since eventually consistent indexes are updated "as soon as possible".

By default, this is turned off, and full-text indexes are fully consistent.
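Eventual consistency can likewise be requested for a single index when it is created. The following sketch assumes Neo4j 4.3 syntax. The index name emailText, the label Email, and the properties subject and body are hypothetical.

```cypher
// Create an eventually consistent full-text index.
CREATE FULLTEXT INDEX emailText FOR (e:Email) ON EACH [e.subject, e.body]
OPTIONS {indexConfig: {`fulltext.eventually_consistent`: true}};

// Wait until all queued updates have been applied, for example
// before running a query that must see recently committed changes.
CALL db.index.fulltext.awaitEventuallyConsistentIndexRefresh();
```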

dbms.index.fulltext.eventually_consistent_index_update_queue_max_length

Eventually consistent full-text indexes have their updates queued up and applied in a background thread, and this setting determines the maximum size of that update queue. If the maximum queue size is reached, then committing transactions will block and wait until there is more room in the queue, before adding more updates to it.

This setting applies to all eventually consistent full-text indexes, and they all use the same queue. The maximum queue length must be at least 1 index update, and must be no more than 50 million due to heap space usage considerations.

The default maximum queue length is 10,000 index updates.
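The values currently in effect for these settings can be inspected from Cypher. This sketch assumes the dbms.listConfig procedure as available in Neo4j 4.x, which normally requires administrator privileges.

```cypher
// Show all configuration settings whose name contains the given string.
CALL dbms.listConfig('dbms.index.fulltext')
YIELD name, value
RETURN name, value
ORDER BY name;
```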

4. Token lookup indexes

Token lookup indexes, as the name suggests, are used to look up nodes with a specific label or relationships of a specific type. A token lookup index is always created over all labels or all relationship types, respectively, so a database can have at most two token lookup indexes: one for nodes and one for relationships.

Token lookup indexes were introduced in 4.3. The relationship type lookup index is a new concept, whereas the node label lookup index is not: it evolved from the label scan store, which has been present in various forms for a long time. The node label lookup index provides the same functionality as the former label scan store, but has additional features that are common to all indexes, such as the ability to be created and dropped using non-blocking population.

4.1. Use and Significance

Token lookup indexes are the most important indexes that can be present in a database. They are essential for both Cypher queries and Core API operations. More importantly, their presence significantly speeds up the population of other indexes: the node label lookup index speeds up the population of node b-tree and full-text indexes, and the relationship type lookup index does the same for the corresponding relationship indexes.

The node label lookup index is important for queries that match a node by one or more labels. It can be used even when matching labels and properties of a node, if there are no suitable b-tree indexes available. This is essential considering that no b-tree indexes are defined by default. In other words, a node label lookup index is often the best way to approach a query that matches labels, unless the user has defined a more appropriate b-tree index. Likewise, the relationship type lookup index does the same for relationships and their types.

Most queries are executed by matching nodes and expanding their relationships, and hence the node label lookup index is slightly more significant than the relationship type lookup index.

Since these indexes are important for both query execution and index population, consider carefully before dropping them.

Both the node label lookup index and the relationship type lookup index are present by default in all databases created in 4.3 and onwards. Please see the next section for details on databases created in earlier versions.
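To see which token lookup indexes exist in a database, and the kind of queries they can serve, a sketch such as the following may be useful. It assumes Neo4j 4.3. The label Person and the relationship type KNOWS are hypothetical.

```cypher
// List the token lookup indexes (at most one for nodes, one for relationships).
SHOW INDEXES
YIELD name, type, entityType, state
WHERE type = 'LOOKUP';

// A label-only match, which the node label lookup index can serve.
MATCH (p:Person) RETURN p LIMIT 10;

// A type-only match, which the relationship type lookup index can serve.
MATCH ()-[r:KNOWS]->() RETURN r LIMIT 10;
```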

4.2. Databases created before 4.3

Databases created before 4.3 will get only a node label lookup index when used in a DBMS of version 4.3 or later, by default. This is to preserve backwards compatibility and performance characteristics of such databases.

If needed, such databases can get a relationship type lookup index by creating it explicitly through Cypher.

Creating a relationship type lookup index on a large database can take a significant amount of time, as all relationships need to be scanned when populating such an index.
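A minimal sketch of creating the relationship type lookup index explicitly and monitoring its population, assuming Neo4j 4.3 syntax. The index name rel_type_lookup is hypothetical.

```cypher
// Create the relationship type lookup index if it does not exist yet.
CREATE LOOKUP INDEX rel_type_lookup IF NOT EXISTS
FOR ()-[r]-() ON EACH type(r);

// Check population progress. The index is usable once state is ONLINE.
SHOW INDEXES
YIELD name, type, entityType, state, populationPercent
WHERE type = 'LOOKUP' AND entityType = 'RELATIONSHIP';
```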

When used in a DBMS of version 4.3 or later, databases created before 4.3 will automatically get a node label lookup index, which is created by converting the former label scan store and naming it __org_neo4j_schema_index_label_scan_store_converted_to_token_index. This index name is reserved from 4.3 onwards, and an error will be returned if you attempt to create a user-defined index with this name. Similarly, in the unlikely situation that an index with this name was created in a previous version, it must be dropped and recreated with a different name before upgrading to 4.3.

The following table summarizes which of the token lookup indexes and the label scan store are present by default in various versions. Note that the table represents only the default indexes and that the relationship type lookup index can be created explicitly through Cypher, if needed.

Database created                 before 4.3   before 4.3   in 4.3 or later
Neo4j version                    < 4.3        >= 4.3       >= 4.3
Label scan store                 yes          no           no
Node label lookup index          no           yes          yes
Relationship type lookup index   no           no           yes