11.2.2. Schema indexes

This section describes schema indexes.

11.2.2.1. Introduction

Neo4j uses a combination of native indexes and Apache Lucene for its indexing functionality. The native index is an implementation of the classic B+Tree.

For performance reasons, it is recommended to use native indexes whenever possible.

For more information on the different index types, refer to Cypher manual → Indexes.

11.2.2.2. Index providers

The index provider used when creating new indexes is controlled by the setting dbms.index.default_schema_provider. If it is not configured explicitly, dbms.index.default_schema_provider defaults to the newest provider available in that particular version of Neo4j.

The table below lists the available index providers and their support for native indexing:

Index provider      Value types supported for native indexing            Type of native index supported
native-btree-1.0    spatial, temporal, numeric, string, array, boolean   Single-property and composite indexes
lucene+native-2.0   spatial, temporal, numeric, string                   Single-property indexes
lucene+native-1.0   spatial, temporal, numeric                           Single-property indexes
lucene-1.0          spatial, temporal                                    Single-property indexes

Deprecated index providers

Index providers lucene-1.0, lucene+native-1.0, and lucene+native-2.0 have been deprecated, and will be removed in a future release.

The recommended index provider to use is native-btree-1.0.

The only reason to use a deprecated provider should be the limitations described in Section 11.2.2.3, “Limitations of native indexes”. There are currently no alternatives that cover these limitations, and the deprecated providers will not be removed until there are.

11.2.2.3. Limitations of native indexes

Typically, the newest index provider version will provide the best performance. However, the native B+Tree implementation has some limitations which may require special handling.

Key size

The native B+Tree index has a key size limit that manifests itself in different ways depending on whether the key holds a single string, a single array, or multiple values (that is, the key belongs to a composite index).

If a transaction reaches this limit for one or more of its changes, that transaction will fail before committing any changes. If the limit is reached during index population, the resulting index will be in a failed state and thus not usable for any queries.

The following sections describe the different key size limits in detail.

Element size calculations

It is useful to know how to calculate the size of a single value when calculating the total size of the resulting key. In some cases the size of an entry differs depending on whether the entry is in an array or not.

Table 11.1. Element sizes
Type                   elementSizeifSingle *    elementSizeifInArray **
Byte                   2                        1
Short                  3                        2
Int                    5                        4
Long                   9                        8
Float                  5                        4
Double                 9                        8
Boolean                1                        1
Date                   8                        8
Time                   12                       12
LocalTime              8                        8
DateTime               16                       16
LocalDateTime          12                       12
Duration               28                       28
Period                 28                       28
Point (Cartesian)      28                       24
Point (Cartesian 3D)   36                       32
Point (WGS-84)         28                       24
Point (WGS-84 3D)      36                       32
String                 2 + utf8StringSize ***   2 + utf8StringSize ***
Array                  Nested arrays are not supported

* elementSizeifSingle denotes the size of an element if it is a single entry.

** elementSizeifInArray denotes the size of an element if it is part of an array.

*** utf8StringSize is the size of the String in bytes when encoded with UTF8.

elementSizeArray is the size of an array element, and is calculated using the following formulas:

  • If the data type of the array is a numeric data type:

    elementSizeArray = 3 + ( arrayLength * elementSizeifInArray )

  • If the data type of the array is a geometry data type:

    elementSizeArray = 5 + ( arrayLength * elementSizeifInArray )

  • If the data type of the array is non-numeric:

    elementSizeArray = 2 + ( arrayLength * elementSizeifInArray )
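The formulas above can be sketched in Python as follows. This is only an illustration, not Neo4j code: the names IN_ARRAY_SIZE, NUMERIC, array_overhead and element_size_array are hypothetical, the dictionary holds just a few rows of Table 11.1, and the sketch assumes that every type that is neither numeric nor a geometry uses the "non-numeric" formula.

```python
# elementSizeifInArray in bytes for a few of the types in Table 11.1.
IN_ARRAY_SIZE = {
    "Byte": 1, "Short": 2, "Int": 4, "Long": 8, "Float": 4, "Double": 8,
    "Boolean": 1, "Date": 8, "DateTime": 16,
    "Point (Cartesian)": 24, "Point (WGS-84 3D)": 32,
}

NUMERIC = {"Byte", "Short", "Int", "Long", "Float", "Double"}


def array_overhead(data_type: str) -> int:
    """Fixed number of bytes added on top of the elements of an array."""
    if data_type in NUMERIC:
        return 3
    if data_type.startswith("Point"):
        return 5
    # Assumption: all remaining types use the "non-numeric" formula.
    return 2


def element_size_array(data_type: str, array_length: int) -> int:
    """elementSizeArray = overhead + (arrayLength * elementSizeifInArray)."""
    return array_overhead(data_type) + array_length * IN_ARRAY_SIZE[data_type]


print(element_size_array("Long", 10))               # 3 + 10 * 8
print(element_size_array("Point (Cartesian)", 10))  # 5 + 10 * 24
print(element_size_array("Date", 10))               # 2 + 10 * 8
```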

String encoding with UTF8

It is worth noting that common characters, such as letters, digits and some symbols, translate into one byte per character. Non-Latin characters may occupy more than one byte per character. For example, a string that contains 100 characters or fewer may still be longer than 100 bytes if it contains multi-byte characters.

More specifically, the relevant length of a string is its length in bytes when encoded with UTF8.

Example 11.4. Calculate the size of a string when used in an index

Consider the string Sju sjösjuka sjömän sköttes av sju undersköna sjuksköterskor på skeppet Shang Hai.

This string has 74 characters that each occupy one byte. Additionally, there are 7 characters that each occupy two bytes, which adds 14 bytes to the total. Therefore, the size of the string in bytes when encoded with UTF8, utf8StringSize, is 88 bytes.

If this string is part of a native index, we get:

elementSize = 2 + utf8StringSize = 2 + 88 = 90 bytes
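The byte count in this example can be checked with a few lines of Python. This is a minimal sketch; the function name string_element_size is hypothetical, and the string is taken without the full stop that ends the sentence above.

```python
def string_element_size(value: str) -> int:
    """elementSize of a string: 2 bytes of overhead plus its UTF-8 length."""
    utf8_string_size = len(value.encode("utf-8"))
    return 2 + utf8_string_size


text = ("Sju sjösjuka sjömän sköttes av sju undersköna "
        "sjuksköterskor på skeppet Shang Hai")

print(len(text))                  # 81 characters
print(len(text.encode("utf-8")))  # 88 bytes (utf8StringSize)
print(string_element_size(text))  # 90 bytes (elementSize)
```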

Example 11.5. Calculate the size of an array when used in an index

Consider the array [19, 84, 20, 11, 54, 9, 59, 76, 82, 27, 9, 35, 56, 80, 65, 95, 16, 91, 61, 11].

This array has 20 elements of the type Int. Since they are in an array, we need to use elementSizeifInArray, which is 4 for Int.

Applying the formula for arrays of numeric data types, we get:

elementSizeArray = 3 + ( arrayLength * elementSizeifInArray ) = 3 + ( 20 * 4 ) = 83 bytes

Non-composite indexes

The only way that a non-composite index can violate the size limit is if the value is a long string or a large array.

Strings

Applicable to index providers: native-btree-1.0 and lucene+native-2.0.

Please see Deprecated index providers for information on deprecation of lucene-based index providers.

Strings in non-composite native B+Tree indexes have a key size limit of 4036 bytes for index provider native-btree-1.0, and 4039 bytes for index provider lucene+native-2.0.

Arrays

Applicable to index provider: native-btree-1.0.

The following formula is used for arrays in non-composite indexes:

1 + elementSizeArray ≤ 4039

Here elementSizeArray is the number calculated from Table 11.1, “Element sizes”.

If we count backwards, we can get the exact array length limit for each data type:

  • maxArrayLength = FLOOR( ( 4039 - 3 - 1 ) / elementSizeifInArray ) for numeric types.
  • maxArrayLength = FLOOR( ( 4039 - 2 - 1 ) / elementSizeifInArray ) for non-numeric types.

These calculations result in the table below:

Table 11.2. Maximum array length, per data type
Data type              maxArrayLength
Byte                   4035
Short                  2017
Int                    1008
Long                   504
Float                  1008
Double                 504
Boolean                4036
String                 See Table 11.3, “Maximum array length, examples for strings”
Date                   504
Time                   336
LocalTime              504
DateTime               252
LocalDateTime          336
Duration               144
Period                 144
Point (Cartesian)      168
Point (Cartesian 3D)   126
Point (WGS-84)         168
Point (WGS-84 3D)      126

Note that in most cases, Cypher will use Long or Double when working with numbers.
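The values in Table 11.2 follow directly from the two maxArrayLength formulas. The Python sketch below recomputes them from the in-array sizes of Table 11.1; the names KEY_SIZE_LIMIT, IN_ARRAY_SIZE, NUMERIC and max_array_length are hypothetical and only serve the illustration.

```python
KEY_SIZE_LIMIT = 4039  # bytes, from the formula 1 + elementSizeArray <= 4039

# elementSizeifInArray per data type, copied from Table 11.1.
IN_ARRAY_SIZE = {
    "Byte": 1, "Short": 2, "Int": 4, "Long": 8, "Float": 4, "Double": 8,
    "Boolean": 1, "Date": 8, "Time": 12, "LocalTime": 8, "DateTime": 16,
    "LocalDateTime": 12, "Duration": 28, "Period": 28,
    "Point (Cartesian)": 24, "Point (Cartesian 3D)": 32,
    "Point (WGS-84)": 24, "Point (WGS-84 3D)": 32,
}
NUMERIC = {"Byte", "Short", "Int", "Long", "Float", "Double"}


def max_array_length(data_type: str) -> int:
    """Largest array length that still fits, per the two formulas above."""
    overhead = 3 if data_type in NUMERIC else 2
    # Integer division implements FLOOR for these positive values.
    return (KEY_SIZE_LIMIT - overhead - 1) // IN_ARRAY_SIZE[data_type]


for data_type in IN_ARRAY_SIZE:
    print(f"{data_type:22} {max_array_length(data_type)}")
```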

Properties with the type of String are a special case because they are dynamically sized. The table below shows the maximum number of array elements in an array, based on certain string sizes:

Table 11.3. Maximum array length, examples for strings
String size   maxArrayLength
1             1345
10            336
100           39
1000          4

The table can be used as a reference point. For example: if we know that all the strings in an array occupy 100 bytes or less, then arrays of length 39 or lower will definitely fit.
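For other string sizes, the same calculation can be repeated. The sketch below assumes that every string in the array occupies the same number of bytes in UTF8, which gives a worst-case bound when that number is the largest string size in the array; the function name max_string_array_length is hypothetical.

```python
KEY_SIZE_LIMIT = 4039  # bytes


def max_string_array_length(utf8_string_size: int) -> int:
    """Maximum array length when each string occupies utf8_string_size bytes."""
    element_size_in_array = 2 + utf8_string_size  # String row of Table 11.1
    # Non-numeric formula: FLOOR((4039 - 2 - 1) / elementSizeifInArray).
    return (KEY_SIZE_LIMIT - 2 - 1) // element_size_in_array


for size in (1, 10, 100, 1000):
    print(size, max_string_array_length(size))
```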

Composite indexes

Applicable to index provider: native-btree-1.0.

This limitation only applies if one or more of the following criteria are met:

  • Composite index contains strings
  • Composite index contains arrays
  • Composite index targets many different properties (>50)

We call each targeted property of a composite index a slot, and the number of slots numberOfSlots. For example, an index on :Person(firstName, surName, age) has three properties and thus numberOfSlots = 3.

In the index, each slot is filled by an element. In order to calculate the size of the index, we must have the size of each element in the index, i.e. the elementSize, as calculated in previous sections.

The following equation can be used to verify that a specific composite index entry is within bounds:

numberOfSlots + sum( elementSize ) ≤ 4039

Here, sum( elementSize ) is the sum of the sizes of all the elements of the composite key as defined in the section called “Element size calculations”, and numberOfSlots is the number of targeted properties for the index.

Example 11.6. The size of a composite index containing strings

Consider a composite index of five strings that can each occupy a maximum of 500 bytes.

Using the equation above we get:

numberOfSlots + sum( elementSize ) = 5 + ( 5 * ( 2 + 500 ) ) = 2515 < 4039

We are well within bounds for our composite index.

Example 11.7. The size of an index containing arrays

Consider a composite index of five arrays of type Float that each have a length of 250.

First we calculate the size of each array element:

elementSizeArray = 3 + ( arrayLength * elementSizeifInArray ) = 3 + ( 250 * 4 ) = 1003

Then we calculate the size of the composite index:

numberOfSlots + sum( elementSizeArray ) = 5 + ( 5 * 1003 ) = 5020 > 4039

This index key will exceed the key size limit for native indexes.

To work around this, it is possible to create the index using the lucene+native-2.0 index provider, as described in the section called “Workarounds to address limitations”, but please note that this index provider has been deprecated.
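The composite key check and the two examples above can be expressed as a short Python sketch. The function names composite_key_size and fits_in_native_index are hypothetical; the element sizes passed in are assumed to have been calculated beforehand as described in “Element size calculations”.

```python
from typing import Sequence

KEY_SIZE_LIMIT = 4039  # bytes


def composite_key_size(element_sizes: Sequence[int]) -> int:
    """numberOfSlots + sum(elementSize) for one composite index entry."""
    return len(element_sizes) + sum(element_sizes)


def fits_in_native_index(element_sizes: Sequence[int]) -> bool:
    """True if the entry stays within the key size limit."""
    return composite_key_size(element_sizes) <= KEY_SIZE_LIMIT


# Example 11.6: five strings of at most 500 bytes each.
strings = [2 + 500] * 5
# Example 11.7: five Float arrays of length 250.
float_arrays = [3 + 250 * 4] * 5

print(composite_key_size(strings), fits_in_native_index(strings))            # 2515 True
print(composite_key_size(float_arrays), fits_in_native_index(float_arrays))  # 5020 False
```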

Queries using CONTAINS and ENDS WITH

Applicable to index providers: native-btree-1.0 and lucene+native-2.0.

Please see Deprecated index providers for information on deprecation of lucene-based index providers.

Native B+Tree indexes have limited support for ENDS WITH and CONTAINS queries. These queries cannot use an optimized search in the way that queries using STARTS WITH, = and <> can. Instead, the index result will be a stream of an index scan with filtering.

For details about execution plans, refer to Cypher Manual → Execution plans. For details about string operators, refer to Cypher Manual → Operators.
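The difference between the two kinds of query can be illustrated on any sorted collection of keys. The sketch below uses a sorted Python list as a stand-in for the ordered keys of a B+Tree; it is a generic illustration and not Neo4j's implementation, and the function names starts_with and contains are hypothetical.

```python
from bisect import bisect_left


def starts_with(sorted_keys, prefix):
    """Seek to the first key >= prefix, then read only the matching run."""
    matches = []
    position = bisect_left(sorted_keys, prefix)
    while position < len(sorted_keys) and sorted_keys[position].startswith(prefix):
        matches.append(sorted_keys[position])
        position += 1
    # Keys sharing a prefix are adjacent in sorted order, so we can stop here.
    return matches


def contains(sorted_keys, fragment):
    """Sorted order does not help for substrings: scan every key and filter."""
    return [key for key in sorted_keys if fragment in key]


keys = sorted(["anna", "annika", "bo", "johanna", "susanna"])
print(starts_with(keys, "ann"))  # ['anna', 'annika']
print(contains(keys, "ann"))     # ['anna', 'annika', 'johanna', 'susanna']
```

The prefix search touches only the keys it returns, whereas the substring search has to visit every key, which corresponds to the index scan with filtering described above.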

Workarounds to address limitations

If any of the limitations described in this section becomes a problem, a workaround is to specify an index provider that uses Lucene for that particular index. This can be done using either of the following methods:

Option 1: Change the configuration

  1. Set dbms.index.default_schema_provider to the required index provider.
  2. Restart Neo4j.
  3. Drop and recreate the relevant index.
  4. Change dbms.index.default_schema_provider back to the original value.
  5. Restart Neo4j.

Option 2: Use a built-in procedure

There are built-in procedures that can be used to specify the index provider on index creation, unique property constraint creation, and node key creation. For details on constraints, see Cypher Manual → Constraints. For more information, see Built-in procedures.

11.2.2.4. Limitations of Lucene-backed indexes

Applicable to index providers: lucene+native-1.0 and lucene-1.0.

Please see Deprecated index providers for information on deprecation of lucene-based index providers.

Lucene has a string size limit of 32766 bytes when the string is encoded using UTF-8. In a composite index, this limit applies to each individual property. This means that a composite index key can hold values that together are larger than 32766 bytes, but no single value can be larger.
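A per-property check of this limit could look like the following Python sketch. The names LUCENE_MAX_STRING_BYTES and oversized_properties are hypothetical, and the sketch assumes that a value of exactly 32766 bytes is still accepted.

```python
LUCENE_MAX_STRING_BYTES = 32766


def oversized_properties(values):
    """Return the positions of string values that exceed the per-value limit."""
    return [
        position
        for position, value in enumerate(values)
        if len(value.encode("utf-8")) > LUCENE_MAX_STRING_BYTES
    ]


# Two values of 30000 bytes each: 60000 bytes in total, but each value fits.
print(oversized_properties(["a" * 30000, "b" * 30000]))  # []
# 20000 two-byte characters: 40000 bytes in UTF-8, so this value is too large.
print(oversized_properties(["é" * 20000]))               # [0]
```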

11.2.2.5. Upgrade considerations

When creating an index, the current index provider will be assigned to it and will remain the provider for that index until it is dropped. Therefore, when upgrading to newer versions of Neo4j, an existing index needs to be dropped and recreated in order to take advantage of improved indexing features.

The caching of indexes takes place in different memory areas for different index providers. See Section 11.1, “Memory configuration”. It can be useful to run neo4j-admin memrec --database before and after the rebuilding of indexes, and adjust memory settings in accordance with the findings.