Appendix D. Index key calculations

This appendix describes how to calculate key sizes for native indexes.

As described in the section about Schema indexes, there are limitations to how large the key size can be when using native indexes. This appendix describes in detail how the sizes can be calculated.

Element size calculations

It is useful to know how to calculate the size of a single value when calculating the total size of the resulting key. In some cases, the size of an entry differs depending on whether the entry is in an array or not.

Table D.1. Element sizes

| Type | `elementSizeifSingle` * | `elementSizeifInArray` ** |
|---|---|---|
| `Byte` | 2 | 1 |
| `Short` | 3 | 2 |
| `Int` | 5 | 4 |
| `Long` | 9 | 8 |
| `Float` | 5 | 4 |
| `Double` | 9 | 8 |
| `Boolean` | 1 | 1 |
| `Date` | 8 | 8 |
| `Time` | 12 | 12 |
| `LocalTime` | 8 | 8 |
| `DateTime` | 16 | 16 |
| `LocalDateTime` | 12 | 12 |
| `Duration` | 28 | 28 |
| `Period` | 28 | 28 |
| `Point (Cartesian)` | 28 | 24 |
| `Point (Cartesian 3D)` | 36 | 32 |
| `Point (WGS-84)` | 28 | 24 |
| `Point (WGS-84 3D)` | 36 | 32 |
| `String` | `2 + utf8StringSize` *** | `2 + utf8StringSize` *** |
| `Array` | See `elementSizeArray` below | Nested arrays are not supported |

* `elementSizeifSingle` denotes the size of an element if it is a single entry.

** `elementSizeifInArray` denotes the size of an element if it is part of an array.

*** `utf8StringSize` is the size of the `String` in bytes when encoded with UTF8.

`elementSizeArray` is the size of an array element, and is calculated using the following formulas:

• If the data type of the array is a numeric data type:

`elementSizeArray = 3 + ( arrayLength * elementSizeifInArray )`

• If the data type of the array is a geometry data type:

`elementSizeArray = 5 + ( arrayLength * elementSizeifInArray )`

• If the data type of the array is any other non-numeric data type:

`elementSizeArray = 2 + ( arrayLength * elementSizeifInArray )`
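The three formulas above can be sketched in Python as follows. This is only an illustration: the names `SIZE_IN_ARRAY`, `NUMERIC_TYPES`, `GEOMETRY_TYPES` and `element_size_array` are hypothetical, and the grouping of types into numeric, geometry and other is an assumption based on Table D.1. Arrays of `String` are left out because their elements do not have a fixed size.

```python
# Size in bytes of one element inside an array, copied from Table D.1.
SIZE_IN_ARRAY = {
    "Byte": 1, "Short": 2, "Int": 4, "Long": 8, "Float": 4, "Double": 8,
    "Boolean": 1, "Date": 8, "Time": 12, "LocalTime": 8, "DateTime": 16,
    "LocalDateTime": 12, "Duration": 28, "Period": 28,
    "Point (Cartesian)": 24, "Point (Cartesian 3D)": 32,
    "Point (WGS-84)": 24, "Point (WGS-84 3D)": 32,
}

# Assumed grouping of the types into the three formula categories.
NUMERIC_TYPES = {"Byte", "Short", "Int", "Long", "Float", "Double"}
GEOMETRY_TYPES = {name for name in SIZE_IN_ARRAY if name.startswith("Point")}


def element_size_array(data_type: str, array_length: int) -> int:
    """Return elementSizeArray for an array of fixed-size elements."""
    if data_type in NUMERIC_TYPES:
        overhead = 3
    elif data_type in GEOMETRY_TYPES:
        overhead = 5
    else:
        overhead = 2
    return overhead + array_length * SIZE_IN_ARRAY[data_type]


print(element_size_array("Int", 20))      # 3 + 20 * 4 = 83
print(element_size_array("Boolean", 10))  # 2 + 10 * 1 = 12
```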

String encoding with UTF8

It is worth noting that common characters, such as unaccented Latin letters, digits and some symbols, translate into one byte per character. Other characters, such as accented or non-Latin characters, may occupy more than one byte per character. Therefore, for example, a string that contains 100 characters or fewer may be longer than 100 bytes if it contains multi-byte characters. The length that matters for the key size is the number of bytes in the string when encoded with UTF8.

Example D.1. Calculate the size of a string when used in an index

Consider the string `His name was Måns Lööv`.

This string has 19 characters that each occupy one byte. Additionally, there are three characters (`å`, `ö` and `ö`) that each occupy two bytes, which adds six to the total. Therefore, the size of the `String` in bytes when encoded with UTF8, `utf8StringSize`, is 25 bytes.

If this string is part of a native index, we get:

`elementSize = 2 + utf8StringSize = 2 + 25 = 27 bytes`
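The same calculation can be checked with a few lines of Python. This is a minimal sketch; the helper name `string_element_size` is hypothetical, and the non-ASCII characters are written as escape sequences so that the example does not depend on the encoding of the source file.

```python
def string_element_size(value: str) -> int:
    """Return 2 + utf8StringSize for a single string value."""
    return 2 + len(value.encode("utf-8"))


name = "His name was M\u00e5ns L\u00f6\u00f6v"

print(len(name))                    # 22 characters
print(len(name.encode("utf-8")))    # 25 bytes when encoded with UTF-8
print(string_element_size(name))    # 27 bytes as an index element
```

Note that `len(name)` counts characters, not bytes, which is why the encoded length must be used.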

Example D.2. Calculate the size of an array when used in an index

Consider the array [19, 84, 20, 11, 54, 9, 59, 76, 82, 27, 9, 35, 56, 80, 65, 95, 16, 91, 61, 11].

This array has 20 elements of the type `Int`. Since they are in an array, we need to use `elementSizeifInArray`, which is `4` for `Int`.

Applying the formula for arrays of numeric data types, we get:

`elementSizeArray = 3 + ( arrayLength * elementSizeifInArray ) = 3 + ( 20 * 4 ) = 83 bytes`

Non-composite indexes

The only way that a non-composite index can violate the size limit is if the value is a long string or a large array.

Strings

Strings in non-composite indexes have a key size limit of 4036 bytes.

Arrays

The following formula is used for arrays in non-composite indexes:

`1 + elementSizeArray <= 4039`

Here `elementSizeArray` is the number calculated from Table D.1, “Element sizes”.

If we count backwards, we can get the exact array length limit for each data type:

• `maxArrayLength = FLOOR( ( 4039 - 3 - 1 ) / elementSizeifInArray )` for numeric types.
• `maxArrayLength = FLOOR( ( 4039 - 5 - 1 ) / elementSizeifInArray )` for geometry types.
• `maxArrayLength = FLOOR( ( 4039 - 2 - 1 ) / elementSizeifInArray )` for other non-numeric types.

These calculations result in the table below:

Table D.2. Maximum array length, per data type

| Data type | `maxArrayLength` |
|---|---|
| `Byte` | 4035 |
| `Short` | 2017 |
| `Int` | 1008 |
| `Long` | 504 |
| `Float` | 1008 |
| `Double` | 504 |
| `Boolean` | 4036 |
| `String` | Depends on the string sizes, see Table D.3 |
| `Date` | 504 |
| `Time` | 336 |
| `LocalTime` | 504 |
| `DateTime` | 252 |
| `LocalDateTime` | 336 |
| `Duration` | 144 |
| `Period` | 144 |
| `Point (Cartesian)` | 168 |
| `Point (Cartesian 3D)` | 126 |
| `Point (WGS-84)` | 168 |
| `Point (WGS-84 3D)` | 126 |

Note that in most cases, Cypher will use `Long` or `Double` when working with numbers.
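The per-type limits can be reproduced with a short Python sketch. The names `KEY_SIZE_LIMIT`, `ARRAY_LAYOUT` and `max_array_length` are hypothetical, and the array overhead assigned to each type (3 for numeric types, 5 for geometry types, 2 for the rest) is an assumption based on the array formulas in the section on element size calculations.

```python
KEY_SIZE_LIMIT = 4039  # upper bound for 1 + elementSizeArray

# data type -> (array overhead, elementSizeifInArray), taken from Table D.1.
ARRAY_LAYOUT = {
    "Byte": (3, 1), "Short": (3, 2), "Int": (3, 4), "Long": (3, 8),
    "Float": (3, 4), "Double": (3, 8), "Boolean": (2, 1),
    "Date": (2, 8), "Time": (2, 12), "LocalTime": (2, 8),
    "DateTime": (2, 16), "LocalDateTime": (2, 12),
    "Duration": (2, 28), "Period": (2, 28),
    "Point (Cartesian)": (5, 24), "Point (Cartesian 3D)": (5, 32),
    "Point (WGS-84)": (5, 24), "Point (WGS-84 3D)": (5, 32),
}


def max_array_length(data_type: str) -> int:
    """Largest array length for which 1 + elementSizeArray stays within the limit."""
    overhead, size_in_array = ARRAY_LAYOUT[data_type]
    # Integer division plays the role of FLOOR in the formulas.
    return (KEY_SIZE_LIMIT - overhead - 1) // size_in_array


for data_type in ARRAY_LAYOUT:
    print(f"{data_type}: {max_array_length(data_type)}")
```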

Properties with the type of `String` are a special case because they are dynamically sized. The table below shows the maximum number of array elements in an array, based on certain string sizes:

Table D.3. Maximum array length, examples for strings

| String size (bytes) | `maxArrayLength` |
|---|---|
| 1 | 1345 |
| 10 | 336 |
| 100 | 39 |
| 1000 | 4 |

The table can be used as a reference point. For example, if we know that every string in an array occupies 100 bytes or less, then arrays of length 39 or lower will definitely fit.
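For string sizes not listed in the table, the limit can be estimated with a sketch like the one below. The function names are hypothetical. The calculation assumes that an array of strings uses the non-numeric array overhead of 2 bytes and that each string in the array occupies `2 + utf8StringSize` bytes, as given in Table D.1.

```python
KEY_SIZE_LIMIT = 4039  # upper bound for 1 + elementSizeArray


def max_string_array_length(max_string_bytes: int) -> int:
    """Array length that is guaranteed to fit if no string exceeds max_string_bytes."""
    return (KEY_SIZE_LIMIT - 2 - 1) // (2 + max_string_bytes)


def string_array_key_size(strings: list[str]) -> int:
    """Key size of a non-composite index entry that holds an array of strings."""
    element_size_array = 2 + sum(2 + len(s.encode("utf-8")) for s in strings)
    return 1 + element_size_array


for size in (1, 10, 100, 1000):
    print(size, max_string_array_length(size))

# An array of 39 strings of 100 bytes each stays within the limit.
print(string_array_key_size(["a" * 100] * 39) <= KEY_SIZE_LIMIT)
```

When the actual strings are known, `string_array_key_size` gives the exact key size, which may allow longer arrays than the worst-case estimate.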

Composite indexes

The key size limit is only a concern for composite indexes if one or more of the following criteria are met:

• Composite index contains strings
• Composite index contains arrays
• Composite index targets many different properties (>50)

We call a targeted property of a composite index a `slot`, and the number of slots `numberOfSlots`. For example, an index on `:Person(firstName, surName, age)` has three properties and thus `numberOfSlots = 3`.

In the index, each slot is filled by an element. In order to calculate the size of the index, we must have the size of each element in the index, i.e. the `elementSize`, as calculated in previous sections.

The following equation can be used to verify that a specific composite index entry is within bounds:

`numberOfSlots + sum( elementSize ) <= 4039`

Here, `sum( elementSize )` is the sum of the sizes of all the elements of the composite key as defined in the section called “Element size calculations”, and `numberOfSlots` is the number of targeted properties for the index.
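The check can be expressed as a small Python sketch. The function names and the property values are hypothetical, and the example assumes that `age` is stored as a `Long`, which occupies 9 bytes as a single entry according to Table D.1.

```python
KEY_SIZE_LIMIT = 4039


def composite_key_size(element_sizes: list[int]) -> int:
    """Return numberOfSlots + sum(elementSize) for one composite index entry."""
    return len(element_sizes) + sum(element_sizes)


def fits_in_index(element_sizes: list[int]) -> bool:
    """True if the composite index entry is within the key size limit."""
    return composite_key_size(element_sizes) <= KEY_SIZE_LIMIT


def string_element_size(value: str) -> int:
    """Return 2 + utf8StringSize for a single string value."""
    return 2 + len(value.encode("utf-8"))


# One entry of an index on :Person(firstName, surName, age).
entry = [
    string_element_size("M\u00e5ns"),       # firstName: 2 + 5 bytes
    string_element_size("L\u00f6\u00f6v"),  # surName: 2 + 6 bytes
    9,                                      # age, assumed to be a Long
]

print(composite_key_size(entry))  # 3 + (7 + 8 + 9) = 27
print(fits_in_index(entry))       # True
```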

Example D.3. The size of a composite index containing strings

Consider a composite index of five strings that each can occupy the maximum of 500 bytes.

Using the equation above we get:

`numberOfSlots + sum( elementSize ) = 5 + ( 5 * ( 2 + 500 ) ) = 2515 < 4039`

We are well within bounds for our composite index.

Example D.4. The size of an index containing arrays

Consider a composite index of five arrays of type `Float` that each have a length of 250.

First we calculate the size of each array element:

`elementSizeArray = 3 + ( arrayLength * elementSizeifInArray ) = 3 + ( 250 * 4 ) = 1003`

Then we calculate the size of the composite index:

`numberOfSlots + sum( elementSizeArray ) = 5 + ( 5 * 1003 ) = 5020 > 4039`

This index key will exceed the key size limit for native indexes.

To work around this, it is possible to create the index using the `lucene+native-2.0` index provider, as described in the section called “Workarounds to address limitations”, but please note that this index provider has been deprecated.