13.1. Memory configuration

This section describes the different aspects of Neo4j memory configuration and use.

13.1.1. Overview

Consider the image below. As you can see, the RAM of the Neo4j server has a number of usage areas, with some sub-areas:

Figure 13.1. Neo4j memory management

OS memory

Some memory must be reserved for running the processes of the operating system itself. The amount of RAM reserved for the operating system cannot be configured explicitly; it is whatever remains after the page cache and heap space have been configured. However, if we do not leave enough memory for the OS, it will start swapping to disk, which will severely degrade performance.

1GB is a good starting point for a server that is dedicated to running Neo4j. However, in some cases the amount reserved for the OS should be significantly larger than 1GB, for example on servers with exceptionally large RAM.

Lucene index cache

Neo4j uses Apache Lucene for some of its indexing functionality. Index lookup performance is optimized by ensuring that as much of the indexes as possible is cached in memory. As with the OS memory, the Lucene index cache cannot be configured explicitly. Instead, we estimate the memory needed and make sure that enough headroom is left for the Lucene indexes after the page cache and heap have been assigned.

Page cache

The page cache is used to cache the Neo4j data and native indexes. The caching of graph data and indexes into memory will help avoid costly disk access and result in optimal performance.

The parameter for specifying how much memory Neo4j is allowed to use for the page cache is: dbms.memory.pagecache.size.

Heap size

The heap space is used for query execution, transaction state, management of the graph etc. The size needed for the heap is very dependent on the nature of the usage of Neo4j. For example, long-running queries, or very complicated queries, are likely to require a larger heap than simpler queries.

Generally speaking, in order to aid performance, we want to configure a large enough heap to sustain concurrent operations.

In case of performance issues we may have to tune our queries, and monitor their memory usage, in order to determine whether the heap needs to be increased.

The heap memory size is determined by the parameters dbms.memory.heap.initial_size and dbms.memory.heap.max_size. It is recommended to set these two parameters to the same value. This will help avoid unwanted full garbage collection pauses.
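
As a minimal sketch of this recommendation, the hypothetical Python helper below renders both heap parameters from a single value, so that the initial and maximum sizes cannot drift apart. The helper name and the 16g figure are illustrative assumptions, not recommended values.

```python
from typing import List


def heap_settings(size: str) -> List[str]:
    """Return neo4j.conf lines that set initial and max heap to one size."""
    # A single input guarantees that the two parameters stay identical.
    return [
        f"dbms.memory.heap.initial_size={size}",
        f"dbms.memory.heap.max_size={size}",
    ]


if __name__ == "__main__":
    # 16g is an arbitrary example value, not a recommendation.
    print("\n".join(heap_settings("16g")))
```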

Transaction state

Transaction state is the memory that is needed to hold data and intermediate results in transactions that update records in the database. Queries that only read data do not require transaction state memory allocation. By default, transaction state is allocated off-heap. When the transaction state is allocated off-heap, the maximum size of the transaction state can be defined using the parameter dbms.tx_state.max_off_heap_memory. Note that the transaction state memory is not pre-allocated; it will grow and shrink as needed by the activity in the database. Keeping transaction state off-heap is particularly beneficial to applications characterized by large, write-intensive transactions.

Transaction state can also be configured to be allocated on-heap, by using the parameter dbms.tx_state.memory_allocation. Note that when the transaction state is configured on-heap, its maximum size cannot be specified.

13.1.2. Considerations

Always use explicit configuration

In order to have good control of a system’s behavior, it is recommended that you always define the page cache and heap size parameters explicitly in neo4j.conf. If these parameters are not explicitly defined, heuristic values will be computed at startup based on available system resources.

Initial memory recommendation

Use the neo4j-admin memrec command to get an initial recommendation for how to distribute a certain amount of memory. The values may need to be adjusted to cater for each specific use case.

Inspect the memory settings of a database

The neo4j-admin memrec --database command is useful for inspecting the current distribution of data and indexes in an existing database.

Example 13.1. Use neo4j-admin memrec to inspect the memory settings of a database

We wish to estimate the total size of the database files.

$neo4j-home> bin/neo4j-admin memrec --database=neo4j
# Lucene indexes: 6690m
# Data volume and native indexes: 17050m

We can see that the Lucene indexes take up approximately 6.7GB of data, and that the data volume and native indexes combined take up approximately 17GB.

Using this information, we can do a sanity check of our memory configuration:

  • Compare the value for data volume and native indexes to the value of dbms.memory.pagecache.size.
  • For cases when off-heap transaction state is used, estimate the transactional workload, and compare that estimate and the amount of memory left to the value of dbms.tx_state.max_off_heap_memory.
  • Compare the value for Lucene indexes to how much memory is left after assigning dbms.memory.pagecache.size and dbms.memory.heap.initial_size.

Note that even though we strive to cache as much of our data and indexes as possible, in some production systems memory is limited and must be divided between the different areas. In that case, a certain amount of testing and tuning is needed to find the optimal division of the available memory.
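
The sanity check described in this example can be sketched in Python as follows. This is an illustration under stated assumptions, not a Neo4j tool: the function name, the choice to subtract the OS reserve and the off-heap transaction state limit before computing the Lucene headroom, and the 64GB server in the usage lines are all hypothetical.

```python
def memory_check(total_mb, os_reserve_mb, pagecache_mb, heap_mb,
                 tx_state_max_mb, data_mb, lucene_mb):
    """Compare memrec figures with a proposed memory split (all values in MB)."""
    # Assumption: whatever is not given to the OS, page cache, heap, or
    # off-heap transaction state remains for the Lucene index cache.
    headroom_mb = (total_mb - os_reserve_mb - pagecache_mb
                   - heap_mb - tx_state_max_mb)
    return {
        "pagecache_covers_data": pagecache_mb >= data_mb,
        "lucene_headroom_mb": headroom_mb,
        "lucene_fits": headroom_mb >= lucene_mb,
    }


if __name__ == "__main__":
    # Hypothetical 64GB server, using the memrec figures from the example.
    report = memory_check(total_mb=65536, os_reserve_mb=1024,
                          pagecache_mb=20480, heap_mb=16384,
                          tx_state_max_mb=2048,
                          data_mb=17050, lucene_mb=6690)
    for name, value in report.items():
        print(f"{name}: {value}")
```

A negative headroom, or a False in either check, suggests that the split should be revisited before it is written to neo4j.conf.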

13.1.3. Capacity planning

In many use cases, it is advantageous to try to cache as much of the data and indexes as possible. The following examples illustrate methods for estimating the page cache size, depending on whether we are already running in production or planning for a future deployment:

Example 13.2. Estimate page cache for an existing Neo4j database

First estimate the total size of data and indexes, and then add a margin, for example 20%, to allow for growth.

$neo4j-home> bin/neo4j-admin memrec --database=neo4j
# Lucene indexes: 6690m
# Data volume and native indexes: 35050m

We can see that the data volume and native indexes combined take up approximately 35GB. In our specific use case we estimate that 20% will provide sufficient head room for growth.

dbms.memory.pagecache.size = 1.2 * (35GB) =  42GB
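
The same arithmetic can be sketched in Python, starting from the memrec output shown above. The function names are hypothetical, and the sketch assumes that the k, m, and g suffixes denote binary multiples and that the figure always carries one of them.

```python
import re

# Assumed unit suffixes in memrec output, treated here as binary multiples.
UNIT_TO_KB = {"k": 1, "m": 1024, "g": 1024 * 1024}


def data_volume_kb(memrec_output: str) -> int:
    """Extract the 'Data volume and native indexes' figure, in kilobytes."""
    match = re.search(r"Data volume and native indexes:\s*(\d+)([kmg])",
                      memrec_output)
    if match is None:
        raise ValueError("no data volume line with a k/m/g suffix found")
    return int(match.group(1)) * UNIT_TO_KB[match.group(2)]


def pagecache_mb(memrec_output: str, growth_percent: int = 20) -> int:
    """Data volume plus a growth margin, rounded up to whole megabytes."""
    grown_kb = data_volume_kb(memrec_output) * (100 + growth_percent)
    return -(-grown_kb // (100 * 1024))  # ceiling division


if __name__ == "__main__":
    sample = ("# Lucene indexes: 6690m\n"
              "# Data volume and native indexes: 35050m\n")
    print(f"dbms.memory.pagecache.size={pagecache_mb(sample)}m")
```

For the sample output this gives 42060m, which corresponds to the rounded figure of 42GB used in the text.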

We configure the page cache by adding the following to neo4j.conf:

dbms.memory.pagecache.size=42g

Example 13.3. Estimate page cache for a new Neo4j database

When planning for a future database, it is useful to run an import with a fraction of the data, and then multiply the resulting store size by the inverse of that fraction, plus some percentage for growth. For example, import 1/100th of the data and measure its data volume and native indexes. Then multiply that number by 120 to scale the result up to full size (a factor of 100) and allow for 20% growth.

Assume that we have imported 1/100th of the data into a test database.

$neo4j-home> bin/neo4j-admin memrec --database=neo4j
# Lucene indexes: 425.0
# Data volume and native indexes: 251100k

We can see that the data volume and native indexes combined take up approximately 250MB. We size up the result and additionally reserve 20% for growth:

dbms.memory.pagecache.size = 120 * (250MB) =  30GB
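
The scaling step can be sketched in Python as below. The function and parameter names are hypothetical, and the sketch assumes that the store size grows linearly with the amount of imported data, which may not hold for every data model.

```python
def projected_pagecache_mb(sample_mb: int, scale_factor: int,
                           growth_percent: int = 20) -> int:
    """Scale a sample import up to full size, then add a growth margin."""
    # A 1/100th sample is scaled back up with scale_factor=100.
    full_size_mb = sample_mb * scale_factor
    # Ceiling division, so that the estimate is never rounded down.
    return -(-full_size_mb * (100 + growth_percent) // 100)


if __name__ == "__main__":
    # 250MB measured after importing 1/100th of the data, 20% growth.
    print(projected_pagecache_mb(250, 100), "MB")
```

With the figures from this example, the result is 30000MB, matching the 30GB computed above.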

We configure the page cache by adding the following to neo4j.conf:

dbms.memory.pagecache.size=30g


13.1.4. Configure query heap usage

When running a Cypher query, Neo4j will utilize the heap internally for storing results. It may be difficult to predict how much memory a query needs and if a query ends up using too much memory it could severely hamper the overall performance of the database.

There are two settings that can be enabled in neo4j.conf to help manage the heap utilization of Neo4j:

  • dbms.track_query_allocation=true
  • cypher.query_max_allocations.size

If the former setting is enabled, Neo4j tracks the heap utilization of all Cypher queries. You can view the utilization of running queries by calling:

CALL dbms.listQueries()

Alternatively, you can enable dbms.logs.query.allocation_logging_enabled, and the memory usage of each query will be logged in the query log.

By setting the latter configuration, cypher.query_max_allocations.size, you can limit the amount of memory each query can use. Whenever that limit is reached, the query will be gracefully terminated without affecting the overall health of the database.

The reported heap usage of a query is only an estimate; the actual heap utilization might be slightly larger or smaller than the estimated value.