13.2. Index configuration

This section describes how to configure indexes to improve search performance and to enable full-text search.


13.2.1. Introduction

In Neo4j there are two different index types: b-tree and full-text.

B-tree indexes can be created and dropped using Cypher. Users typically do not need to know about an index in order to use it, since Cypher's query planner decides which index to use in each situation. B-tree indexes are well suited to exact look-ups on all types of values, as well as to range scans, full scans, and prefix searches.

For details on the configuration aspects of b-tree indexes, see Section 13.2.2, “B-tree indexes”.

Full-text indexes differ from b-tree indexes in that they are optimized for indexing and searching text. They are used for queries that demand an understanding of language, and they only index string data. They must also be queried explicitly via procedures, as Cypher will not make plans that rely on them.

An example of a use case for full-text indexes is searching a book for a certain term, taking advantage of the knowledge that the book is written in a certain language. Using an analyzer for that language will, among other things, enable you to exclude words that are not relevant for the search (for example "if" and "and"), and to include conjugations of words that are.

Another example use case is indexing the various address fields and text data in a corpus of emails. Indexing this data using the email analyzer makes it possible to find all emails that were sent from or to, or that mention, a given email account.

In contrast to b-tree indexes, full-text indexes are created, queried, and dropped using built-in procedures. Using full-text indexes does require familiarity with how the indexes operate.

For details on the configuration aspects of full-text indexes, see Section 13.2.3, “Full-text indexes”.

For details on creating, querying and dropping full-text indexes, see Cypher Manual → Indexes to support full-text search.

The type of an index can be identified according to the table below:

Index type        Procedure             Core API
B-tree index      db.indexes#BTREE      org.neo4j.graphdb.schema.IndexType#BTREE
Full-text index   db.indexes#FULLTEXT   org.neo4j.graphdb.schema.IndexType#FULLTEXT

13.2.2. B-tree indexes

B-tree indexes can be backed by two different index providers, native-btree-1.0 and the deprecated lucene+native-3.0. If not explicitly set, native-btree-1.0 will be used.

For more information on the different index types, refer to Cypher Manual → Indexes.

Deprecated index providers

Index provider lucene+native-3.0 has been deprecated, and will be removed in a future release.

The recommended index provider to use is native-btree-1.0.

The only reason to use a deprecated provider should be the limitations described in the section called “Limitations for queries using CONTAINS and ENDS WITH” or the section called “Limitations on key size”.

13.2.2.1. Limitations

This section describes some limitations of b-tree indexes, together with suggested workarounds.

Limitations for queries using CONTAINS and ENDS WITH

The index provider native-btree-1.0 has limited support for ENDS WITH and CONTAINS queries. Unlike queries that use STARTS WITH, =, and <>, these queries cannot perform an optimized search. Instead, the index returns the result of an index scan, with filtering applied.
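To see why the two kinds of predicate behave differently, the following Python sketch models a b-tree index as a sorted list of keys. This is a deliberate simplification for illustration, not how native-btree-1.0 is implemented: a prefix search can jump to the first candidate key and stop at the first non-match, while CONTAINS and ENDS WITH have no position to start from and must examine every key.

```python
from bisect import bisect_left


def starts_with_seek(sorted_keys, prefix):
    """STARTS WITH: jump to the first candidate, stop at the first key that does not match."""
    results = []
    i = bisect_left(sorted_keys, prefix)
    # Keys sharing a prefix are contiguous in sorted order, so we can stop early.
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        results.append(sorted_keys[i])
        i += 1
    return results


def filter_scan(sorted_keys, predicate):
    """CONTAINS / ENDS WITH: sort order says nothing about where matches are,
    so every key is visited and filtered."""
    return [key for key in sorted_keys if predicate(key)]


keys = sorted(["Malice", "Bob", "Alicia", "Carol", "Alice"])
print(starts_with_seek(keys, "Ali"))                   # ['Alice', 'Alicia']
print(filter_scan(keys, lambda k: "lic" in k))         # ['Alice', 'Alicia', 'Malice']
print(filter_scan(keys, lambda k: k.endswith("ice")))  # ['Alice', 'Malice']
```

The seek touches only the matching range, whereas the scan's cost grows with the total number of keys in the index, regardless of how few of them match.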

In the future, ENDS WITH and CONTAINS queries will be supported by full-text indexes, but for now the deprecated index provider lucene+native-3.0 can be used instead. Please note that lucene+native-3.0 only supports ENDS WITH and CONTAINS for single-property strings.

Limitations on key size

The index provider native-btree-1.0 has a key size limit of around 8kB.

If a transaction reaches the key size limit for one or more of its changes, that transaction will fail before committing any changes. If the limit is reached during index population, the resulting index will be in a failed state, and as such will not be usable for any queries.
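If your data may contain very large strings, one option is to check value sizes before writing them, so that neither a transaction nor an index population runs into the limit. The Python sketch below is a hypothetical pre-check, not a Neo4j tool: it assumes the limit applies to the UTF-8 encoded size of a value and uses 8,000 bytes as a conservative stand-in for "around 8kB". The real limit also depends on internal key overhead, so treat the threshold as an assumption and leave a margin.

```python
# Assumption: "around 8kB" is approximated as 8,000 bytes of UTF-8. The real limit
# also includes internal key overhead, so a safety margin is advisable.
APPROX_KEY_LIMIT_BYTES = 8_000


def find_oversized(values, limit=APPROX_KEY_LIMIT_BYTES):
    """Return positions of string values whose UTF-8 size reaches the assumed limit."""
    return [i for i, value in enumerate(values) if len(value.encode("utf-8")) >= limit]


values = ["short title", "x" * 10_000, "é" * 4_500]  # "é" takes 2 bytes in UTF-8
print(find_oversized(values))  # [1, 2]
```

Note that the third value has only 4,500 characters but 9,000 bytes, which is why the check measures encoded size rather than string length.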

If this is an issue, you can use the deprecated index provider lucene+native-3.0 instead. For single-property strings, this provider has a key size limit of around 32kB.

Please note that lucene+native-3.0 has been deprecated and will be removed in the future, at which point 8kB will be the key size limit for b-tree indexes. For such large values, the recommended option is to use a full-text index.

Workarounds to address limitations

To work around problems with key size, or performance issues related to ENDS WITH or CONTAINS, you can use the deprecated index provider lucene+native-3.0. This only works for single-property string indexes.

This can be done using either of the following methods:

Option 1. Use a built-in procedure (recommended)

There are built-in procedures that can be used to specify the index provider when creating an index, a unique property constraint, or a node key (for details on constraints, see Cypher Manual → Constraints).

For more information, see Built-in procedures.

Option 2. Change the config
  1. Set dbms.index.default_schema_provider to the required index provider.
  2. Restart Neo4j.
  3. Drop and recreate the relevant index.
  4. Change dbms.index.default_schema_provider back to the original value.
  5. Restart Neo4j.

Please note that the index setting dbms.index.default_schema_provider has been deprecated and will be removed in a future release. Which index provider an index uses will then be a fully internal concern.

The recommended way to set the index provider for an index is to use the built-in procedures for index creation, unique property constraint creation, and node key creation.

For more information, see Built-in procedures.

13.2.2.2. Index migration

When a 3.5 store is upgraded to 4.0.0, all indexes are upgraded to the latest index version and rebuilt automatically, with the exception of indexes that previously used Lucene for single-property strings. These are upgraded to a fallback version that still uses Lucene for those properties. Please note that they will still need to be rebuilt.

The table below shows the migration mapping:

Index provider in 3.5   Index provider in 4.0
native-btree-1.0        native-btree-1.0
lucene+native-2.0       native-btree-1.0
lucene+native-1.0       lucene+native-3.0
lucene-1.0              lucene+native-3.0
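When planning an upgrade, it can help to know in advance which indexes will end up on the deprecated lucene+native-3.0 provider. The Python sketch below encodes the migration mapping from the table; the input format (a dict from a hypothetical index description to its 3.5 provider) is an assumption made for illustration.

```python
# Provider mapping applied when a 3.5 store is upgraded to 4.0 (from the migration table).
MIGRATION_MAP = {
    "native-btree-1.0": "native-btree-1.0",
    "lucene+native-2.0": "native-btree-1.0",
    "lucene+native-1.0": "lucene+native-3.0",
    "lucene-1.0": "lucene+native-3.0",
}
DEPRECATED_IN_4_0 = {"lucene+native-3.0"}


def plan_upgrade(indexes_3_5):
    """Map each index's 3.5 provider to its 4.0 provider.

    indexes_3_5: hypothetical input, a dict of index description -> 3.5 provider name.
    Returns a dict of index description -> (4.0 provider, is_deprecated).
    """
    plan = {}
    for description, provider in indexes_3_5.items():
        target = MIGRATION_MAP[provider]  # raises KeyError for providers not in the table
        plan[description] = (target, target in DEPRECATED_IN_4_0)
    return plan


plan = plan_upgrade({":Person(name)": "lucene+native-1.0", ":Movie(title)": "lucene+native-2.0"})
for description, (provider, deprecated) in plan.items():
    print(description, "->", provider, "(deprecated)" if deprecated else "")
```

Indexes flagged as deprecated are the candidates to review against the limitations described earlier, since they keep using Lucene after the upgrade.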

The caching of indexes takes place in different memory areas for different index providers. See Section 13.1, “Memory configuration”. It can be useful to run neo4j-admin memrec --database before and after the rebuilding of indexes, and adjust memory settings in accordance with the findings.

13.2.2.3. Procedures to create index and index backed constraint

Indexes and constraints are best created through Cypher. However, when an index or constraint needs more specific configuration than Cypher allows, you can use the procedures described in the example below.

Example 13.4. Example of procedures to create index and index backed constraint

The following procedures let you specify both the index provider and, optionally, index settings. Note that setting keys must be escaped with backticks if they contain dots.

Use db.createIndex procedure to create an index:

CALL db.createIndex("MyIndex", ["Person"], ["name"], "native-btree-1.0", {`spatial.cartesian.max`: [100.0,100.0], `spatial.cartesian.min`: [-100.0,-100.0]})

If a settings map is not provided, the settings will be picked up from the Neo4j config file, the same way as when creating an index or constraint through Cypher.

CALL db.createIndex("MyIndex", ["Person"], ["name"], "native-btree-1.0")

Use db.createUniquePropertyConstraint to create a node property uniqueness constraint:

CALL db.createUniquePropertyConstraint("MyIndex", ["Person"], ["name"], "native-btree-1.0", {`spatial.cartesian.max`: [100.0,100.0], `spatial.cartesian.min`: [-100.0,-100.0]})

Use db.createNodeKey to create a node key constraint:

CALL db.createNodeKey("MyIndex", ["Person"], ["name"], "native-btree-1.0", {`spatial.cartesian.max`: [100.0,100.0], `spatial.cartesian.min`: [-100.0,-100.0]})
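The backtick rule for dotted setting keys is easy to get wrong when these calls are built programmatically. As a sketch, the hypothetical Python helpers below render a settings map as a Cypher map literal, quoting dotted keys with backticks, and assemble a db.createIndex call in the same shape as the examples above. They cover only a small subset of value types and do not handle keys that themselves contain backticks.

```python
def cypher_literal(value):
    """Render a small subset of Python values as Cypher literals."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(value, list):
        return "[" + ",".join(cypher_literal(v) for v in value) + "]"
    raise TypeError(f"unsupported value: {value!r}")


def cypher_settings_map(settings):
    """Render a settings dict as a Cypher map, backtick-quoting keys that contain dots."""
    entries = []
    for key, value in settings.items():
        quoted = f"`{key}`" if "." in key else key
        entries.append(f"{quoted}: {cypher_literal(value)}")
    return "{" + ", ".join(entries) + "}"


def create_index_call(name, labels, properties, provider, settings=None):
    """Build a db.createIndex call; the settings map is optional."""
    args = [cypher_literal(name), cypher_literal(labels),
            cypher_literal(properties), cypher_literal(provider)]
    if settings:
        args.append(cypher_settings_map(settings))
    return "CALL db.createIndex(" + ", ".join(args) + ")"


print(create_index_call(
    "MyIndex", ["Person"], ["name"], "native-btree-1.0",
    {"spatial.cartesian.max": [100.0, 100.0], "spatial.cartesian.min": [-100.0, -100.0]},
))
```

When the call is issued through a driver, passing the settings map as a query parameter (for example $settings) instead of inlining it generally avoids the need for escaping altogether, since the keys of a parameter map are plain strings.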

13.2.3. Full-text indexes

Full-text indexes are powered by the Apache Lucene indexing and search library. A full-text index enables you to write queries that match within the contents of indexed string properties. A full description of how to create and use full-text indexes is provided in Cypher Manual → Indexes to support full-text search.

13.2.3.1. Configuration

The following options are available for configuring full-text indexes:

dbms.index.fulltext.default_analyzer

The name of the analyzer that the full-text indexes should use by default. This setting only has effect when a full-text index is created, and will be remembered as an index-specific setting from then on.

The list of possible analyzers is available through the db.index.fulltext.listAvailableAnalyzers() Cypher procedure.

Unless otherwise specified, the default analyzer is standard-no-stop-words, which is the same as Lucene's StandardAnalyzer, except that no stop words are filtered out.
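The following Python sketch is a rough illustration of what filtering stop words means for the indexed terms. It is not Lucene's StandardAnalyzer: the tokenization is a simple lowercase split on non-alphanumeric characters, and the stop-word list is a small hypothetical subset chosen for the example.

```python
import re

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "if", "of", "the"}


def toy_analyze(text, filter_stop_words):
    """Lowercase, split on runs of non-alphanumeric characters, optionally drop stop words.
    A rough stand-in for an analyzer, not Lucene's actual tokenization rules."""
    tokens = [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]
    if filter_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


text = "The Lord of the Rings"
print(toy_analyze(text, filter_stop_words=False))  # ['the', 'lord', 'of', 'the', 'rings']
print(toy_analyze(text, filter_stop_words=True))   # ['lord', 'rings']
```

With a no-stop-words analyzer, words such as "the" and "of" remain among the indexed terms, so they can still contribute to matches.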

dbms.index.fulltext.eventually_consistent

Declares whether full-text indexes should be eventually consistent. This setting only has effect when a full-text index is created, and will be remembered as an index-specific setting from then on.

Indexes are normally fully consistent: committing a transaction does not return until both the store and the indexes have been updated. Eventually consistent full-text indexes, on the other hand, are not updated as part of the commit; instead, their updates are queued and applied in a background thread. This means that there can be a short delay between committing a change and that change becoming visible through any eventually consistent full-text index. This delay is only an artifact of the queueing, and is usually quite small, since eventually consistent indexes are updated "as soon as possible".

By default, this is turned off, and full-text indexes are fully consistent.
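The queueing behaviour described above can be modelled with a small Python sketch. The class below is hypothetical and only illustrates the idea: a commit returns once its update is queued, a background thread applies queued updates to the searchable entries, and a search issued in between may not yet see the change. It does not reflect Neo4j's internal implementation.

```python
import queue
import threading


class EventuallyConsistentIndex:
    """Toy model: commits enqueue updates; a background thread applies them later."""

    def __init__(self, max_queue_length=10_000):
        self._updates = queue.Queue(maxsize=max_queue_length)
        self._entries = {}  # node id -> indexed text, as seen by searches
        self._lock = threading.Lock()
        threading.Thread(target=self._apply_loop, daemon=True).start()

    def commit(self, node_id, text):
        # Returns as soon as the update is queued; the index may not reflect it yet.
        self._updates.put((node_id, text))

    def _apply_loop(self):
        while True:
            node_id, text = self._updates.get()
            with self._lock:
                self._entries[node_id] = text
            self._updates.task_done()

    def search(self, word):
        with self._lock:
            return sorted(n for n, t in self._entries.items() if word in t.split())

    def wait_until_applied(self):
        self._updates.join()  # illustration helper: block until the queue has drained


idx = EventuallyConsistentIndex()
idx.commit(1, "graph databases")
# A search issued here may or may not see node 1 yet.
idx.wait_until_applied()
print(idx.search("graph"))  # [1]
```

In a fully consistent index, by contrast, commit would update the entries itself before returning, so a search issued right after it would always see the change.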

dbms.index.fulltext.eventually_consistent_index_update_queue_max_length

Eventually consistent full-text indexes have their updates queued up and applied in a background thread, and this setting determines the maximum size of that update queue. If the maximum queue size is reached, then committing transactions will block and wait until there is more room in the queue, before adding more updates to it.

This setting applies to all eventually consistent full-text indexes, and they all use the same queue. The maximum queue length must be at least 1 index update, and must be no more than 50 million due to heap space usage considerations.

The default maximum queue length is 10,000 index updates.
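As a sketch of these rules, the Python code below checks a configured value against the documented bounds and models a committing transaction waiting on a shared, bounded queue. The helper names are hypothetical, and the timeout exists only so that the example terminates; as described above, a real commit simply blocks until there is room in the queue.

```python
import queue

MIN_QUEUE_LENGTH = 1
MAX_QUEUE_LENGTH = 50_000_000
DEFAULT_QUEUE_LENGTH = 10_000


def validate_queue_max_length(value):
    """Check a dbms.index.fulltext.eventually_consistent_index_update_queue_max_length value."""
    if not MIN_QUEUE_LENGTH <= value <= MAX_QUEUE_LENGTH:
        raise ValueError(
            f"queue max length must be between {MIN_QUEUE_LENGTH} and {MAX_QUEUE_LENGTH}, got {value}"
        )
    return value


def try_enqueue(updates, update, timeout):
    """Model of a committing transaction: waits while the shared queue is full.
    Returns False if no room appeared within `timeout` seconds."""
    try:
        updates.put(update, timeout=timeout)
        return True
    except queue.Full:
        return False


# One queue shared by all eventually consistent indexes, kept tiny for the example.
shared = queue.Queue(maxsize=validate_queue_max_length(2))
print(try_enqueue(shared, "update-1", timeout=0.1))  # True
print(try_enqueue(shared, "update-2", timeout=0.1))  # True
print(try_enqueue(shared, "update-3", timeout=0.1))  # False: full, and nothing is draining it
```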