Memory configuration

This section describes the different aspects of Neo4j memory configuration and use.

1. Overview

The RAM of the Neo4j server has a number of usage areas, with some sub-areas:

Figure 1. Neo4j memory management

OS memory

Some memory must be reserved for running the processes of the operating system itself. It is not possible to explicitly configure the amount of RAM reserved for the operating system; it is whatever RAM remains available after configuring Neo4j. If you do not leave enough space for the OS, it will start to swap memory to disk, which will heavily affect performance.

1GB is a good starting point for a server that is dedicated to running Neo4j. However, there are cases where the amount reserved for the OS is significantly larger than 1GB, such as servers with exceptionally large RAM.

JVM Heap

The JVM has a heap that is the runtime data area from which memory for all class instances and arrays is allocated. Heap storage for objects is reclaimed by an automatic storage management system (known as a garbage collector or GC).

The heap memory size is determined by the parameters dbms.memory.heap.initial_size and dbms.memory.heap.max_size. It is recommended to set these two parameters to the same value to avoid unwanted full garbage collection pauses.

Generally, to aid performance, you should configure a large enough heap to sustain concurrent operations.
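
As a minimal sketch of the recommendation to keep both heap parameters equal, the following helper renders the two neo4j.conf lines from a single value. The function name heap_settings and the 8g value are illustrative assumptions, not part of Neo4j.

```python
def heap_settings(size: str) -> list[str]:
    """Return neo4j.conf lines that pin initial and maximum heap to one value."""
    return [
        f"dbms.memory.heap.initial_size={size}",
        f"dbms.memory.heap.max_size={size}",
    ]


# "8g" is only an example value; choose a size that fits your workload.
for line in heap_settings("8g"):
    print(line)
```

Generating both lines from one value makes it harder for the two settings to drift apart during later edits.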

Native memory

Native memory, sometimes referred to as off-heap memory, is memory directly allocated by Neo4j from the OS. This memory will grow dynamically as needed and is not subject to the garbage collector.

DBMS

The database management system, or DBMS, contains the global components of the Neo4j instance, for example, the Bolt server, logging service, and monitoring service.

Database

Each database in the system comes with an overhead. In deployments with multiple databases, this overhead needs to be accounted for.

Transaction

When executing a transaction, Neo4j holds not yet committed data, the result, and intermediate states of the queries in memory. The size needed for this is very dependent on the nature of the usage of Neo4j. For example, long-running queries, or very complicated queries, are likely to require more memory. Some parts of the transactions can optionally be placed off-heap, but for the best performance, it is recommended to keep the default with everything on-heap.

This memory group can be limited with the setting dbms.memory.transaction.global_max_size.

Page cache

The page cache is used to cache the Neo4j data stored on disk. Caching graph data and indexes in memory helps avoid costly disk access and results in optimal performance.

The parameter for specifying how much memory Neo4j is allowed to use for the page cache is: dbms.memory.pagecache.size.

Network buffers

Direct buffers are used by Neo4j to send and receive data. Direct byte buffers are important for improving performance because they allow native code and Java code to share data without copying it. However, they are expensive to create, which means byte buffers are usually reused once they are created.

Other shared buffers

This includes unspecified shared direct buffers.

JVM overhead

The JVM will require some memory to function correctly. For example, this can be:

  • Thread stacks – Each thread has its own call stack. The stack stores primitive local variables and object references along with the call stack (list of method invocations) itself. The stack is cleaned up as stack frames move out of context, so there is no GC performed here.

  • Metaspace – Metaspace stores the Java class definitions and some other metadata.

  • Code cache – The JIT compiler stores the native code it generates in the code cache to improve performance by reusing it.

For more details, and for ways to limit the memory used by the JVM, please consult your JVM documentation.
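
The areas described in this overview can be combined into a rough budget check. The sketch below is a simplification under stated assumptions: the function name, the single "other" allowance for native memory and JVM overhead, and all figures are hypothetical and only illustrate that the OS gets whatever is left.

```python
def remaining_for_os_mb(total_ram_mb: int, heap_mb: int, page_cache_mb: int,
                        other_neo4j_mb: int = 0) -> int:
    """Memory (in MB) left for the operating system after Neo4j areas are assigned."""
    remaining = total_ram_mb - heap_mb - page_cache_mb - other_neo4j_mb
    if remaining < 0:
        raise ValueError("configured memory exceeds the available RAM")
    return remaining


# Hypothetical 64 GB server: 16 GB heap, 42 GB page cache, and a 2 GB
# allowance for native memory and JVM overhead (illustrative figures only).
left = remaining_for_os_mb(64 * 1024, 16 * 1024, 42 * 1024, 2 * 1024)
print(left)  # 4096
```

In this example 4096 MB remain for the OS, which is above the 1GB starting point mentioned earlier for a dedicated server.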

2. Considerations

Always use explicit configuration

To have good control of the system behavior, it is recommended to always define the page cache and heap size parameters explicitly in neo4j.conf. Otherwise, Neo4j computes some heuristic values at startup based on the available system resources.

Initial memory recommendation

Use the neo4j-admin memrec command to get an initial recommendation for how to distribute a certain amount of memory. The values may need to be adjusted to cater for each specific use case.

Inspect the memory settings of all databases in a DBMS

The neo4j-admin memrec command is useful for inspecting the current distribution of data and indexes.

Example 1. Use neo4j-admin memrec to inspect the memory settings of all your databases

Estimate the total size of the database files.

$neo4j-home> bin/neo4j-admin memrec
...
...
...
# Total size of lucene indexes in all databases: 6690m
# Total size of data and native indexes in all databases: 17050m

You can see that the Lucene indexes take up approximately 6.7GB of data, and that the data volume and native indexes combined take up approximately 17GB.

Using this information, you can do a sanity check of your memory configuration:

  • Compare the value for data volume and native indexes to the value of dbms.memory.pagecache.size.

  • For cases when off-heap transaction state is used, estimate the transactional workload and compare the memory that is left to the value of dbms.tx_state.max_off_heap_memory.

  • Compare the value for Lucene indexes to how much memory is left after assigning dbms.memory.pagecache.size and dbms.memory.heap.initial_size.

In some production systems the access to memory is limited and must be negotiated between different areas. Therefore, it is recommended to perform a certain amount of testing and tuning of these settings to figure out the optimal division of the available memory.
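
The first comparison in the sanity check above can be sketched as a small script. It assumes the memrec summary lines always look like the sample shown in Example 1 (sizes in whole megabytes with an "m" suffix); the function name and the 20g page cache value are hypothetical.

```python
import re

# Matches summary lines such as:
#   "# Total size of lucene indexes in all databases: 6690m"
MEMREC_LINE = re.compile(r"^# Total size of (.+?) in all databases: (\d+)m$")


def parse_memrec_totals(output: str) -> dict[str, int]:
    """Extract the 'Total size of ...' figures (in MB) from memrec output."""
    totals = {}
    for line in output.splitlines():
        match = MEMREC_LINE.match(line.strip())
        if match:
            totals[match.group(1)] = int(match.group(2))
    return totals


sample = """\
# Total size of lucene indexes in all databases: 6690m
# Total size of data and native indexes in all databases: 17050m
"""
totals = parse_memrec_totals(sample)

page_cache_mb = 20 * 1024  # hypothetical dbms.memory.pagecache.size of 20g
data_mb = totals["data and native indexes"]
print(data_mb <= page_cache_mb)  # True
```

A result of True only says that the data and native indexes would fit in the configured page cache; it does not replace testing under a realistic workload.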

3. Capacity planning

In many use cases, it is advantageous to try to cache as much of the data and indexes as possible. The following examples illustrate methods for estimating the page cache size, depending on whether you are already running in production or planning for a future deployment:

Example 2. Estimate page cache for the existing Neo4j databases

First, estimate the total size of data and indexes, and then multiply by a growth factor, for example 1.2 to allow for 20% growth.

$neo4j-home> bin/neo4j-admin memrec
...
...
...
# Total size of lucene indexes in all databases: 6690m
# Total size of data and native indexes in all databases: 35050m

You can see that the data volume and native indexes combined take up approximately 35GB. In your specific use case, you estimate that 20% will provide sufficient headroom for growth.

dbms.memory.pagecache.size = 1.2 * (35GB) = 42GB

You configure the page cache by adding the following to neo4j.conf:

dbms.memory.pagecache.size=42GB
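
The arithmetic in Example 2 can be sketched as a one-line helper. The function name and the default growth of 20% are assumptions taken from this example, not a Neo4j API.

```python
def page_cache_with_growth(data_gb: float, growth: float = 0.20) -> float:
    """Current data and native index size (GB) plus headroom for growth."""
    return round(data_gb * (1 + growth), 1)


# 35 GB of data and native indexes, 20% headroom, as in Example 2.
print(page_cache_with_growth(35))  # 42.0
```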

Example 3. Estimate page cache for a new Neo4j database

When planning for a future database, it is useful to run an import with a fraction of the data, and then divide the resulting store size delta by that fraction (that is, multiply by its inverse) and add some percentage for growth.

  1. Run the memrec command to see the total size of the data and indexes in all current databases.

    $neo4j-home> bin/neo4j-admin memrec
    ...
    ...
    ...
    # Total size of lucene indexes in all databases: 6690m
    # Total size of data and native indexes in all databases: 35050m

  2. Import 1/100th of the data and again measure the data volume and native indexes of all databases.

    $neo4j-home> bin/neo4j-admin memrec
    ...
    ...
    ...
    # Total size of lucene indexes in all databases: 6690m
    # Total size of data and native indexes in all databases: 35400m

    You can see that the data volume and native indexes combined take up approximately 35.4GB.

  3. Multiply the resulting store size delta by the inverse of that fraction (100).

    (35.4GB - 35GB) * 100 = 0.4GB * 100 = 40GB

  4. Multiply that number by 1.2 to size up the result, and allow for 20% growth.

    dbms.memory.pagecache.size = 1.2 * (40GB) = 48GB

  5. Configure the page cache by adding the following to neo4j.conf:

    dbms.memory.pagecache.size=48G
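
The steps in Example 3 can be sketched as follows. The function name and parameters are hypothetical, and the inputs are the rounded figures used in the example (35GB and 35.4GB), so the result matches the worked numbers rather than the exact megabyte values reported by memrec.

```python
def extrapolate_page_cache(before_gb: float, after_gb: float,
                           sample_fraction: float, growth: float = 0.20) -> float:
    """Scale the store size delta of a partial import up to the full data set."""
    delta_gb = after_gb - before_gb              # growth caused by the sample import
    full_size_gb = delta_gb / sample_fraction    # extrapolate to 100% of the data
    return round(full_size_gb * (1 + growth), 1)  # add headroom for growth


# 1/100th of the data grew the store from about 35 GB to about 35.4 GB.
print(extrapolate_page_cache(35.0, 35.4, 1 / 100))  # 48.0
```

Because the delta is multiplied by 100, small rounding differences in the two measurements have a large effect on the estimate, so it is worth treating the result as a starting point.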

4. Limit transaction memory usage

By using the dbms.memory.transaction.global_max_size setting you can configure a global maximum memory usage for all of the transactions running on the server. This setting must be configured low enough so that you do not run out of memory. If you are experiencing OutOfMemory messages during high transaction load, try to lower this limit.

Neo4j also offers the following settings to provide fairness, which can help improve stability in multi-tenant deployments.

  • The setting dbms.memory.transaction.database_max_size limits the transaction memory usage per database.

  • The setting dbms.memory.transaction.max_size constrains each transaction.

When any of the limits are reached, the transaction is terminated without affecting the overall health of the database.
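
How the three limits relate can be illustrated with a toy model. This is not Neo4j's internal accounting: the function, its parameters, the check order, and the figures are assumptions made only to show that a transaction can hit the per-transaction, per-database, or global limit first.

```python
from typing import Optional


def violated_limit(tx_mb: int, database_mb: int, global_mb: int, *,
                   max_size: int, database_max_size: int,
                   global_max_size: int) -> Optional[str]:
    """Return the name of the first limit that has been reached, or None."""
    if tx_mb >= max_size:
        return "dbms.memory.transaction.max_size"
    if database_mb >= database_max_size:
        return "dbms.memory.transaction.database_max_size"
    if global_mb >= global_max_size:
        return "dbms.memory.transaction.global_max_size"
    return None


# Illustrative usage in MB: the transaction and its database are within
# their limits, but the total across the server has reached the global limit.
hit = violated_limit(300, 900, 1500,
                     max_size=512, database_max_size=1024, global_max_size=1400)
print(hit)  # dbms.memory.transaction.global_max_size
```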

To help configure these settings you can use the following commands to list the current usage:

CALL dbms.listPools()
CALL dbms.listTransactions()
CALL dbms.listQueries()

Alternatively, you can enable dbms.logs.query.allocation_logging_enabled and monitor the memory usage of each query in query.log.

The reported heap usage is only an estimate; the actual heap utilization might be slightly higher or lower than the estimated value.