Chapter 9. Performance

This chapter describes factors that affect operational performance and how to tune Neo4j for optimal throughput.

9.1. Memory tuning

This section covers how to configure memory for a Neo4j instance. The various memory requirements and trade-offs are explained as well as the characteristics of garbage collection.

Neo4j will automatically configure default values for memory-related configuration parameters that are not explicitly defined within its configuration on startup. In doing so, it will assume that all of the RAM on the machine is available for running Neo4j.

There are three types of memory to consider: OS Memory, Page Cache and Heap Space.

Note that OS memory is not explicitly configurable; it is whatever is left once the page cache and heap space have been specified. If the page cache and heap space together equal or exceed the available RAM, or do not leave enough headroom for the OS, the OS will start swapping to disk, which severely degrades performance. Therefore, follow this checklist:

  1. Plan OS memory sizing
  2. Plan page cache sizing
  3. Plan heap sizing
  4. Do the sanity check:

Actual OS allocation = available RAM - (page cache + heap size)

Make sure that your system is configured such that it will never need to swap.
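The sanity check above can be sketched as a small calculation. This is a minimal illustration in Python; the function names and the example figures (a 64 GB machine with a 40 GB page cache and a 16 GB heap) are assumptions made for the example and are not part of Neo4j.

```python
GIB = 1024 ** 3  # bytes per gigabyte (binary)


def actual_os_allocation(available_ram, page_cache, heap_size):
    """Actual OS allocation = available RAM - (page cache + heap size)."""
    return available_ram - (page_cache + heap_size)


def leaves_enough_for_os(available_ram, page_cache, heap_size, planned_os_memory):
    """True if the memory left over covers the planned OS memory."""
    left = actual_os_allocation(available_ram, page_cache, heap_size)
    return left >= planned_os_memory


# Hypothetical figures: 64 GB RAM, 40 GB page cache, 16 GB heap.
left_over = actual_os_allocation(64 * GIB, 40 * GIB, 16 * GIB)
print(left_over // GIB)  # 8
```

If the result is smaller than the OS memory planned in step 1, reduce the page cache or the heap until it fits.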

9.1.1. OS memory sizing

Some memory must be reserved for all activities on the server that are not Neo4j related. In addition, leave enough memory for the operating system file buffer cache to fit the contents of the index and schema directories, since index lookup performance suffers if the indexes cannot fit in memory. 1 GB is a good starting point when Neo4j is the only server running on the machine.

OS Memory = 1GB + (size of graph.db/index) + (size of graph.db/schema)
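The formula above can be evaluated against a database directory on disk. The following Python sketch is one way to do that; the helper names are hypothetical, and it assumes the index and schema directories sit directly under the database directory, as the formula suggests.

```python
import os

GIB = 1024 ** 3  # bytes per gigabyte (binary)


def directory_size(path):
    """Total size in bytes of all regular files under path (0 if path is missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def os_memory_estimate(database_dir, base=1 * GIB):
    """OS Memory = 1GB + (size of index directory) + (size of schema directory)."""
    index_size = directory_size(os.path.join(database_dir, "index"))
    schema_size = directory_size(os.path.join(database_dir, "schema"))
    return base + index_size + schema_size
```

Calling os_memory_estimate("data/databases/graph.db") from NEO4J_HOME returns the estimate in bytes.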

9.1.2. Page cache sizing

The page cache is used to cache the Neo4j data as stored on disk. Ensuring that all, or at least most, of the graph data from disk is cached into memory will help avoid costly disk access and result in optimal performance. You can determine the total memory needed for the page cache by summing up the sizes of the NEO4J_HOME/data/databases/graph.db/*store.db* files and adding 20% for growth.

The parameter for specifying the page cache is: dbms.memory.pagecache.size. This specifies how much memory Neo4j is allowed to use for the page cache. To ensure that you have control over your system’s behavior, it is always recommended to define this parameter explicitly in neo4j.conf. If it is not explicitly defined, then a heuristic setting will be computed at startup based on available system resources.

The following are two possible methods for estimating the page cache size:

  1. For an existing Neo4j database, sum the sizes of all the *store.db* files in your store file directory to determine how large a page cache you need to fit all your data, then add another 20% for growth. For instance, on a POSIX system you can look at the total reported by running $ du -hc *store.db* in the data/databases/graph.db directory.
  2. For a new Neo4j database, it is useful to run an import with a fraction (e.g. 1/100th) of the data and then multiply the resulting store size by the inverse of that fraction (x 100). Add another 20% for growth. For example: import 1/100th of the data and sum up the sizes of the resulting database files, then multiply by 120 for a total estimate of the database size, including 20% for growth.
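Both estimation methods reduce to simple arithmetic. The Python sketch below illustrates them under stated assumptions: the function names and parameters are hypothetical, and integer arithmetic is used so the results are exact byte counts.

```python
import glob
import os


def page_cache_for_existing(database_dir, growth_percent=20):
    """Sum the sizes of the *store.db* files and add growth_percent for growth."""
    pattern = os.path.join(database_dir, "*store.db*")
    total = sum(os.path.getsize(f) for f in glob.glob(pattern) if os.path.isfile(f))
    return total * (100 + growth_percent) // 100


def page_cache_for_new(sample_store_bytes, scale_factor=100, growth_percent=20):
    """Scale a partial import up to the full data set and add growth_percent.

    scale_factor is the inverse of the imported fraction (100 for 1/100th).
    """
    return sample_store_bytes * scale_factor * (100 + growth_percent) // 100


# A 1/100th sample import of 1000 bytes scales to 1000 x 120 bytes.
print(page_cache_for_new(1000))  # 120000
```

The first function mirrors the du -hc *store.db* approach; the second mirrors the "multiply by 120" rule for a 1/100th sample.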

9.1.3. Heap sizing

The size of the available heap memory is an important aspect for the performance of Neo4j.

Generally speaking, it is beneficial to configure a large enough heap space to sustain concurrent operations. For many setups, a heap size between 8G and 16G is large enough to run Neo4j reliably.

The heap memory size is determined by two parameters in NEO4J_HOME/conf/neo4j.conf: dbms.memory.heap.initial_size and dbms.memory.heap.max_size. Each takes the heap size in megabytes or with an explicit unit, e.g. 16000 or, preferably, 16G. It is recommended to set these two parameters to the same value to avoid unwanted full garbage collection pauses.
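As an illustration of the recommendation above, the following Python sketch reads the text of a neo4j.conf file and checks whether the two heap parameters are set to the same value. The helper names are hypothetical. The sketch assumes that a bare number means megabytes (as stated above) and that the suffixes k, m and g denote binary multiples; treat that unit handling as an assumption, not a description of Neo4j's own parser.

```python
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}  # assumed binary multiples


def heap_bytes(value):
    """Convert a value such as '16000' (megabytes) or '16G' into bytes."""
    value = value.strip().lower()
    if value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value) * UNITS["m"]  # no unit given: megabytes


def heap_settings(conf_text):
    """Collect dbms.memory.heap.* settings from the text of neo4j.conf."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not key=value
        key, value = (part.strip() for part in line.split("=", 1))
        if key.startswith("dbms.memory.heap.") and value:
            settings[key] = heap_bytes(value)
    return settings


def initial_equals_max(conf_text):
    """True if both heap sizes are set explicitly and to the same value."""
    settings = heap_settings(conf_text)
    initial = settings.get("dbms.memory.heap.initial_size")
    maximum = settings.get("dbms.memory.heap.max_size")
    return initial is not None and initial == maximum
```

For example, a file containing dbms.memory.heap.initial_size=16G and dbms.memory.heap.max_size=16G passes the check, while one that sets only the maximum does not.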

9.1.4. Tuning of the garbage collector

The heap is separated into an old generation and a young generation. New objects are allocated in the young generation, and then later moved to the old generation if they stay live (in use) for long enough. When a generation fills up, the garbage collector performs a collection, during which all other threads in the process are paused. The young generation is quick to collect since the pause time correlates with the live set of objects, and is independent of the size of the young generation. In the old generation, pause times roughly correlate with the size of the heap. For this reason, the heap should ideally be sized and tuned such that transaction and query state never makes it to the old generation.

The maximum heap size is configured with the dbms.memory.heap.max_size setting (in MB) in the neo4j.conf file. The initial size of the heap is specified by the dbms.memory.heap.initial_size setting, or with the -Xms???m flag, or chosen heuristically by the JVM itself if left unspecified. The JVM will automatically grow the heap as needed, up to the maximum size. Growing the heap requires a full garbage collection cycle. It is recommended to set the initial heap size and the maximum heap size to the same value, so that the pause that occurs when the garbage collector grows the heap can be avoided.

The ratio between the sizes of the old generation and the new generation of the heap is controlled by the -XX:NewRatio=N flag. N is typically between 2 and 8 by default. A ratio of 2 means that the old generation size, divided by the new generation size, is equal to 2. In other words, two thirds of the heap memory will be dedicated to the old generation. A ratio of 3 will dedicate three quarters of the heap to the old generation, and a ratio of 1 will keep the two generations about the same size. A ratio of 1 is quite aggressive, but may be necessary if your transactions change a lot of data. Having a large new generation can also be important if you run Cypher queries that need to keep a lot of data resident, for example when sorting big result sets.
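The arithmetic behind these ratios is that, for -XX:NewRatio=N, the old generation receives N/(N+1) of the heap and the new generation receives 1/(N+1). A minimal Python sketch of that calculation (the function name is chosen for illustration):

```python
from fractions import Fraction


def generation_shares(new_ratio):
    """Return (old_share, new_share) of the heap for -XX:NewRatio=new_ratio."""
    old_share = Fraction(new_ratio, new_ratio + 1)
    return old_share, 1 - old_share


for n in (1, 2, 3):
    old_share, new_share = generation_shares(n)
    print(n, old_share, new_share)
# 1 1/2 1/2
# 2 2/3 1/3
# 3 3/4 1/4
```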

If the new generation is too small, short-lived objects may be moved to the old generation too soon. This is called premature promotion and will slow the database down by increasing the frequency of old generation garbage collection cycles. If the new generation is too big, the garbage collector may decide that the old generation does not have enough space to fit all the objects it expects to promote from the new to the old generation. This turns new generation garbage collection cycles into old generation garbage collection cycles, again slowing the database down. Running more concurrent threads means that more allocations can take place in a given span of time, in turn increasing the pressure on the new generation in particular.

The Compressed OOPs feature in the JVM allows object references to be compressed to use only 32 bits. The feature saves a lot of memory, but is not enabled for heaps larger than 32 GB. Gains from increasing the heap size beyond 32 GB can therefore be small or even negative, unless the increase is significant (64 GB or above).
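This guidance can be turned into a simple check on a planned heap size. The Python sketch below uses the round 32 GB and 64 GB figures from the paragraph above; the function name is hypothetical, and the exact threshold at which a given JVM disables compressed OOPs is not modelled.

```python
GIB = 1024 ** 3  # bytes per gigabyte (binary)


def in_compressed_oops_gap(heap_size):
    """True for heaps above 32 GB but below 64 GB.

    In this range compressed OOPs are off, but the heap may not be large
    enough to make up for the extra memory used by uncompressed references.
    """
    return 32 * GIB < heap_size < 64 * GIB


print(in_compressed_oops_gap(31 * GIB))  # False
print(in_compressed_oops_gap(40 * GIB))  # True
print(in_compressed_oops_gap(64 * GIB))  # False
```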

Neo4j has a number of long-lived objects that stay around in the old generation, effectively for the lifetime of the Java process. To process them efficiently, and without adversely affecting the garbage collection pause time, we recommend using a concurrent garbage collector.

How to tune the specific garbage collection algorithm depends on both the JVM version and the workload. It is recommended to test the garbage collection settings under realistic load for days or weeks. Problems like heap fragmentation can take a long time to surface.

To gain good performance, these are the things to look into first:

  • Make sure the JVM is not spending too much time performing garbage collection. The goal is to have a large enough heap to make sure that heavy/peak load will not result in so-called GC thrashing. Performance can drop by as much as two orders of magnitude when GC thrashing happens. Having too large a heap may also hurt performance, so you may have to try some different heap sizes.
  • Use a concurrent garbage collector. We find that -XX:+UseG1GC works well in most use-cases.
    • The Neo4j JVM needs enough heap memory for the transaction state and query processing, plus some headroom for the garbage collector. Because the heap memory needs are so workload dependent, it is common to see configurations from 1 GB up to 32 GB of heap memory.
  • Start the JVM with the -server flag and a good sized heap.
    • The operating system on a dedicated server can usually make do with 1 to 2 GB of memory, but the more physical memory the machine has, the more memory the operating system will need.

Edit the following properties:

Table 9.1. neo4j.conf JVM tuning properties

Property Name                    Meaning
dbms.memory.heap.initial_size    initial heap size (in MB)
dbms.memory.heap.max_size        maximum heap size (in MB)
dbms.jvm.additional              additional literal JVM parameter