11.1. Memory configuration

This section describes the different aspects of Neo4j memory configuration and use.

11.1.1. Overview

Consider the image below. As you can see, the RAM of the Neo4j server has a number of usage areas, with some sub-areas:

Figure 11.1. Neo4j memory management

OS memory

Some memory must be reserved for the operating system's own processes. The amount of RAM reserved for the operating system cannot be configured explicitly; it is whatever RAM remains after the page cache and heap space have been configured. However, if we do not leave enough memory for the OS, it will start swapping to disk, which will severely degrade performance.

1GB is a good starting point for a server that is dedicated to running Neo4j. However, there are cases where the amount reserved for the OS needs to be significantly larger than 1GB, such as servers with exceptionally large RAM.

Lucene index cache

Neo4j uses Apache Lucene for some of its indexing functionality. Index lookup performance is optimized by ensuring that as much of the indexes as possible is cached in memory. Similar to the OS memory, the Lucene index cache cannot be explicitly configured. Instead, we estimate the memory needed and make sure that there is enough headroom left for Lucene indexes after the page cache and heap have been assigned.

Page cache

The page cache is used to cache the Neo4j data and native indexes. Caching graph data and indexes in memory helps avoid costly disk access and improves performance.

The parameter for specifying how much memory Neo4j is allowed to use for the page cache is: dbms.memory.pagecache.size.

Heap size

The heap space is used for query execution, transaction state, management of the graph, and so on. The heap size needed depends heavily on how Neo4j is used. For example, long-running or very complicated queries are likely to require a larger heap than simpler queries.

Generally speaking, in order to aid performance, we want to configure a large enough heap to sustain concurrent operations.

In case of performance issues we may have to tune our queries, and monitor their memory usage, in order to determine whether the heap needs to be increased.

The heap memory size is determined by the parameters dbms.memory.heap.initial_size and dbms.memory.heap.max_size. It is recommended to set these two parameters to the same value. This will help avoid unwanted full garbage collection pauses.
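
As an illustration, a neo4j.conf fragment that pins the heap to a fixed size could look like the following. The 16g value is only a placeholder chosen for this sketch, not a recommendation; the point is that both parameters carry the same value.

```properties
# Illustrative value only; size the heap for your own workload.
dbms.memory.heap.initial_size=16g
dbms.memory.heap.max_size=16g
```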

Transaction state

Transaction state is the memory that is needed to hold data and intermediate results in transactions that update records in the database. Queries that only read data do not require transaction state memory allocation. By default, transaction state is allocated from the heap (on-heap). Note that when the transaction state is allocated on-heap, its maximum size cannot be specified.

Transaction state can also be configured to be allocated separately from the heap (off-heap) by using the parameter dbms.tx_state.memory_allocation. When the transaction state is allocated off-heap, the maximum size of the transaction state can be defined using the parameter dbms.tx_state.max_off_heap_memory. Note that the transaction state memory is not pre-allocated; it will grow and shrink as needed by the activity in the database. Keeping transaction state off-heap is particularly beneficial to applications characterized by large, write-intensive transactions.
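
A minimal sketch of the corresponding neo4j.conf entries is shown below. It assumes that dbms.tx_state.memory_allocation accepts the values ON_HEAP and OFF_HEAP; check the configuration reference for your Neo4j version. The 2G limit is a placeholder value for illustration only.

```properties
# Assumption: OFF_HEAP is the value that selects off-heap allocation.
dbms.tx_state.memory_allocation=OFF_HEAP
# Upper bound for off-heap transaction state; placeholder value.
dbms.tx_state.max_off_heap_memory=2G
```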

11.1.2. Considerations

Always use explicit configuration

In order to have good control of a system’s behavior, it is recommended that you always define the page cache and heap size parameters explicitly in neo4j.conf. If these parameters are not explicitly defined, some heuristic values will be computed at startup based on available system resources.

Initial memory recommendation

Use the neo4j-admin memrec command to get an initial recommendation for how to distribute a certain amount of memory. The values may need to be adjusted to cater for each specific use case.

Inspect the memory settings of a database

The neo4j-admin memrec --database command is useful for inspecting the current distribution of data and indexes in an existing database.

Example 11.1. Use neo4j-admin memrec to inspect the memory settings of a database

We wish to estimate the total size of the database files.

$neo4j-home> bin/neo4j-admin memrec --database=graph.db
...
...
...
# Lucene indexes: 6690m
# Data volume and native indexes: 17050m

We can see that the Lucene indexes take up approximately 6.7GB, and that the data volume and native indexes combined take up approximately 17GB.

Using this information, we can do a sanity check of our memory configuration:

  • Compare the value for data volume and native indexes to the value of dbms.memory.pagecache.size.
  • Compare the value for Lucene indexes to how much memory is left after assigning dbms.memory.pagecache.size and dbms.memory.heap.initial_size.
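
The two comparisons above can be sketched as a small shell script. This is an illustration only: the memrec_mb helper is hypothetical and only handles whole-megabyte values such as those in Example 11.1, and the RAM, page cache, heap and OS reserve figures are made-up placeholders rather than recommendations.

```shell
# Hypothetical helper: extract a size in megabytes from memrec-style
# comment lines on stdin, e.g. "# Lucene indexes: 6690m" -> 6690.
memrec_mb() {
  sed -n "s/^# $1: \([0-9][0-9]*\)m\$/\1/p"
}

# Sample output copied from Example 11.1.
sample='# Lucene indexes: 6690m
# Data volume and native indexes: 17050m'

lucene_mb=$(printf '%s\n' "$sample" | memrec_mb "Lucene indexes")
data_mb=$(printf '%s\n' "$sample" | memrec_mb "Data volume and native indexes")

# Placeholder machine and configuration values, all in megabytes.
total_ram_mb=65536
pagecache_mb=20480
heap_mb=16384
os_reserve_mb=1024

# Memory left for Lucene once page cache, heap and OS reserve are assigned.
headroom_mb=$((total_ram_mb - pagecache_mb - heap_mb - os_reserve_mb))

if [ "$pagecache_mb" -ge "$data_mb" ]; then
  echo "page cache covers data volume and native indexes"
else
  echo "page cache is smaller than data volume and native indexes"
fi

if [ "$headroom_mb" -ge "$lucene_mb" ]; then
  echo "remaining memory covers Lucene indexes"
else
  echo "remaining memory is smaller than Lucene indexes"
fi
```

In practice, the sample text would be replaced by the actual output of neo4j-admin memrec --database piped into the helper.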

Note that even though we strive to cache as much of our data and indexes as possible, in some production systems memory is limited and must be divided between different areas. In such cases, a certain amount of testing and tuning is needed to find the optimal division of the available memory.

The effect of index providers on memory usage

After an upgrade from an earlier version of Neo4j, it is worth rebuilding certain indexes in order to take advantage of new index features. For details, see Section 11.2, “Index configuration”. Rebuilding indexes will change the distribution of memory utilization. In a database with many indexes, a significant amount of memory may have been reserved for Lucene. After the rebuild, it could be necessary to allocate some of that memory to the page cache instead. Use neo4j-admin memrec --database to inspect the database before and after rebuilding indexes.

11.1.3. Capacity planning

In many use cases, it is advantageous to try to cache as much of the data and indexes as possible. The following examples illustrate methods for estimating the page cache size, depending on whether we are already running in production or planning for a future deployment:

Example 11.2. Estimate page cache for an existing Neo4j database

First estimate the total size of data and indexes, and then add some margin, for example 20%, to allow for growth.

$neo4j-home> bin/neo4j-admin memrec --database=graph.db
...
...
...
# Lucene indexes: 6690m
# Data volume and native indexes: 35050m

We can see that the data volume and native indexes combined take up approximately 35GB. In our specific use case we estimate that 20% will provide sufficient headroom for growth.

dbms.memory.pagecache.size = 1.2 * (35GB) = 42GB

We configure the page cache by adding the following to neo4j.conf:

dbms.memory.pagecache.size=42GB
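
The same calculation can be scripted so that it is easy to repeat as the database grows. The pagecache_gb function below is a hypothetical helper written for this sketch; it uses integer arithmetic and rounds up to whole gigabytes.

```shell
# Hypothetical helper: measured size plus a growth margin, in whole gigabytes.
# $1 = measured data volume and native indexes in GB
# $2 = growth margin in percent
pagecache_gb() {
  # Add 99 before dividing by 100 so that the result is rounded up.
  echo $(( ($1 * (100 + $2) + 99) / 100 ))
}

# Values from Example 11.2: 35GB measured, 20% growth.
echo "dbms.memory.pagecache.size=$(pagecache_gb 35 20)GB"
```

With the values from this example, the script prints the same setting as the line added to neo4j.conf above.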

Example 11.3. Estimate page cache for a new Neo4j database

When planning for a future database, it is useful to run an import with a fraction of the data, and then multiply the resulting store size by the inverse of that fraction, plus some percentage for growth. For example, import 1/100th of the data and measure its data volume and native indexes. Then multiply that number by 120 to scale the result up by a factor of 100 and allow for 20% growth.

Assume that we have imported 1/100th of the data into a test database.

$neo4j-home> bin/neo4j-admin memrec --database=graph.db
...
...
...
# Lucene indexes: 425.0
# Data volume and native indexes: 251100k

We can see that the data volume and native indexes combined take up approximately 250MB. We size up the result and additionally reserve 20% for growth:

dbms.memory.pagecache.size = 120 * (250MB) = 30GB

We configure the page cache by adding the following to neo4j.conf:

dbms.memory.pagecache.size=30G
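
As a rough sketch, the scaling step can also be computed from the unrounded memrec value. The scaled_pagecache_mb function is a hypothetical helper, and it assumes 1 MB = 1000 kB, in line with the rounding of 251100k to approximately 250MB used above.

```shell
# Hypothetical helper: scale a sample import up to the full data set and
# add a growth margin. Prints the result in whole megabytes.
# $1 = sample size in kilobytes (the memrec value without its "k" suffix)
# $2 = inverse of the imported fraction (100 for a 1/100th sample)
# $3 = growth margin in percent
scaled_pagecache_mb() {
  echo $(( $1 * $2 * (100 + $3) / 100 / 1000 ))
}

# Values from Example 11.3: 251100k sample, 1/100th of the data, 20% growth.
scaled_pagecache_mb 251100 100 20
```

For the sample above this prints 30132, that is, roughly the 30GB configured in this example.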