This chapter describes factors that affect operational performance and how to tune Neo4j for optimal throughput.
This section covers how to configure memory for a Neo4j instance. The various memory requirements and trade-offs are explained as well as the characteristics of garbage collection.
Neo4j will automatically configure default values for memory-related configuration parameters that are not explicitly defined within its configuration on startup. In doing so, it will assume that all of the RAM on the machine is available for running Neo4j.
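To see which memory settings are explicitly defined, and therefore not left to the startup defaults, the configuration file can be filtered. The sketch below is a minimal illustration; the function name `explicit_memory_settings` is hypothetical, and it assumes that memory-related settings share the `dbms.memory.` prefix used by the heap parameters in this section.

```shell
# List memory-related settings that are explicitly defined in a config file.
# Lines that are commented out (starting with '#') do not match.
explicit_memory_settings() (
  conf_file="$1"
  # '|| true' keeps the exit status at 0 when nothing is defined explicitly
  grep -E '^[[:space:]]*dbms\.memory\.' "$conf_file" || true
)
```

It could be called as `explicit_memory_settings "$NEO4J_HOME/conf/neo4j.conf"`; an empty result would suggest that all memory settings are being left to the defaults computed at startup.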
There are three types of memory to consider: OS Memory, Page Cache and Heap Space.
Note that OS memory is not explicitly configurable; it is what is left once page cache and heap space have been specified. If page cache and heap space together equal or exceed the available RAM, or if too little headroom is left for the OS, the OS will start swapping to disk, which heavily affects performance. Therefore, follow this checklist:
Actual memory left for the OS = available RAM - (page cache + heap size)
Make sure that your system is configured such that it will never need to swap.
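The first checklist item is simple arithmetic and can be sketched as a small helper. The function name `os_memory_left_mb` is hypothetical and all values are assumed to be in MB.

```shell
# Memory left for the OS, all values in MB:
#   available RAM - (page cache + heap size)
os_memory_left_mb() (
  ram_mb="$1"
  page_cache_mb="$2"
  heap_mb="$3"
  echo $(( ram_mb - (page_cache_mb + heap_mb) ))
)
```

For example, `os_memory_left_mb 65536 40960 16384` prints `8192`. A result that is zero or negative means that page cache and heap alone consume all available RAM, so the machine would be expected to swap.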
Some memory must be reserved for all activities on the server that are not Neo4j related. In addition, leave enough memory for the operating system file buffer cache to fit the contents of the index and schema directories, since index lookup performance will suffer if the indexes cannot fit in memory.
1 GB is a good starting point when Neo4j is the only server running on the machine.
OS Memory = 1GB + (size of graph.db/index) + (size of graph.db/schema)
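The formula above can be evaluated against a database directory with `du`. This is a rough sketch: the function name `os_memory_reserve_mb` is hypothetical, disk usage is only an approximation of what the file buffer cache needs, and a missing `index` or `schema` subdirectory is simply counted as zero.

```shell
# Estimate the OS memory reserve in MB:
#   1 GB + size of <db>/index + size of <db>/schema
os_memory_reserve_mb() (
  db_dir="$1"
  total_kb=$(( 1024 * 1024 ))            # 1 GB base reserve, expressed in KB
  for sub in index schema; do
    if [ -d "$db_dir/$sub" ]; then
      # du -sk prints the disk usage of the directory in 1024-byte units
      size_kb=$(du -sk "$db_dir/$sub" | awk '{print $1}')
      total_kb=$(( total_kb + size_kb ))
    fi
  done
  echo $(( (total_kb + 1023) / 1024 ))   # round up to whole MB
)
```

A call such as `os_memory_reserve_mb databases/graph.db` prints the reserve in MB, which can then be compared with the memory left for the OS.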
The page cache is used to cache the Neo4j data as stored on disk. Ensuring that all, or at least most, of the graph data from disk is cached into memory will help avoid costly disk access and result in optimal performance.
The parameter for specifying the page cache is:
This specifies how much memory Neo4j is allowed to use for the page cache.
To ensure that you have control over your system’s behavior, it is always recommended to define this parameter explicitly.
If it is not explicitly defined, then a heuristic setting will be computed at startup based on available system resources.
The following are two methods for estimating the page cache size, depending on whether you are already running in production or planning for a future deployment:
Estimate database size
The database files are located in the data directory. You can determine the total size of the database files by adding up the sizes of the files and directories as described below.
find databases/graph.db -type f -regex '.*store\.db.*' -exec du -hc {} + | tail -1
find databases/graph.db/schema/index -type f -regex '.*/native.*' -exec du -hc {} + | tail -1
The size of the available heap memory is an important aspect for the performance of Neo4j.
Generally speaking, it is beneficial to configure a large enough heap space to sustain concurrent operations. For many setups, a heap size between 8G and 16G is large enough to run Neo4j reliably.
The heap memory size is determined by the parameters dbms.memory.heap.initial_size and dbms.memory.heap.max_size in NEO4J_HOME/conf/neo4j.conf. Each takes the heap size in megabytes or with an explicit unit, e.g. 16000 or, preferably, 16G. It is recommended to set these two parameters to the same value to avoid unwanted full garbage collection pauses.
The heap is separated into an old generation and a young generation. New objects are allocated in the young generation, and then later moved to the old generation, if they stay live (in use) for long enough. When a generation fills up, the garbage collector performs a collection, during which all other threads in the process are paused. The young generation is quick to collect since the pause time correlates with the live set of objects, and is independent of the size of the young generation. In the old generation, pause times roughly correlate with the size of the heap. For this reason, the heap should ideally be sized and tuned such that transaction and query state never makes it to the old generation.
The heap size is configured with the dbms.memory.heap.max_size setting (in MB) in the neo4j.conf file.
The initial size of the heap is specified by the dbms.memory.heap.initial_size setting, or with the -Xms???m flag, or chosen heuristically by the JVM itself if left unspecified.
The JVM will automatically grow the heap as needed, up to the maximum size.
The growing of the heap requires a full garbage collection cycle.
It is recommended to set the initial heap size and the maximum heap size to the same value.
This way the pause that happens when the garbage collector grows the heap can be avoided.
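Whether a configuration follows this recommendation can be checked mechanically. The sketch below uses two hypothetical helpers, `conf_value` and `heap_sizes_match`. It compares the two settings as plain strings, so `16G` and `16384m` would be reported as different, and if a key appears more than once it reports the last occurrence, which is an assumption of this sketch rather than documented Neo4j behavior.

```shell
# Print the value of a setting from a properties-style config file.
# Prints nothing if the setting is absent or commented out.
conf_value() (
  key="$1"
  conf_file="$2"
  awk -F= -v key="$key" '
    { k = $1; gsub(/[ \t]/, "", k) }                       # key, whitespace stripped
    k == key { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); val = v }
    END { if (val != "") print val }
  ' "$conf_file"
)

# Succeed only if both heap settings are present and textually identical.
heap_sizes_match() (
  conf_file="$1"
  initial=$(conf_value dbms.memory.heap.initial_size "$conf_file")
  max=$(conf_value dbms.memory.heap.max_size "$conf_file")
  [ -n "$initial" ] && [ "$initial" = "$max" ]
)
```

A possible use is `heap_sizes_match "$NEO4J_HOME/conf/neo4j.conf" || echo "initial and maximum heap size differ"`.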
The ratio of the size between the old generation and the new generation of the heap is controlled by the -XX:NewRatio=N flag.
N is typically between 2 and 8 by default.
A ratio of 2 means that the old generation size, divided by the new generation size, is equal to 2.
In other words, two thirds of the heap memory will be dedicated to the old generation.
A ratio of 3 will dedicate three quarters of the heap to the old generation, and a ratio of 1 will keep the two generations about the same size.
A ratio of 1 is quite aggressive, but may be necessary if your transactions change a lot of data.
Having a large new generation can also be important if you run Cypher queries that need to keep a lot of data resident, for example when sorting big result sets.
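The relationship between the ratio N and the share of the heap given to the old generation is N / (N + 1). The short sketch below works this out for the ratios discussed above; the function name `old_gen_percent` is hypothetical and the result is rounded down to a whole percent.

```shell
# Percentage of the heap given to the old generation for a ratio N,
# where N = old generation size / new generation size.
# Old generation share = N / (N + 1), rounded down to a whole percent.
old_gen_percent() (
  ratio="$1"
  echo $(( 100 * ratio / (ratio + 1) ))
)

for n in 1 2 3; do
  echo "ratio $n: old generation gets $(old_gen_percent "$n")% of the heap"
done
```

The loop reports 50%, 66% and 75%, matching the halves, two thirds and three quarters described in the text.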
If the new generation is too small, short-lived objects may be moved to the old generation too soon. This is called premature promotion and will slow the database down by increasing the frequency of old generation garbage collection cycles. If the new generation is too big, the garbage collector may decide that the old generation does not have enough space to fit all the objects it expects to promote from the new to the old generation. This turns new generation garbage collection cycles into old generation garbage collection cycles, again slowing the database down. Running more concurrent threads means that more allocations can take place in a given span of time, in turn increasing the pressure on the new generation in particular.
The Compressed OOPs feature in the JVM allows object references to be compressed to use only 32 bits. The feature saves a lot of memory, but is not enabled for heaps larger than 32 GB. Gains from increasing the heap size beyond 32 GB can therefore be small or even negative, unless the increase is significant (64 GB or above).
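As a rough planning aid, a proposed heap size can be classified against the thresholds given above. This is only a sketch: the function name `heap_size_zone` and its three labels are hypothetical, the 32 GB and 64 GB figures are taken from the text, and the exact cut-off at which a particular JVM disables compressed references may differ slightly.

```shell
# Classify a planned heap size (whole GB) against the thresholds in the text.
heap_size_zone() (
  heap_gb="$1"
  if [ "$heap_gb" -le 32 ]; then
    echo "compressed"   # object references can still use 32 bits
  elif [ "$heap_gb" -lt 64 ]; then
    echo "avoid"        # wider references may cancel out the extra heap
  else
    echo "large"        # increase is significant enough to be worthwhile
  fi
)
```

For instance, `heap_size_zone 40` prints `avoid`, reflecting that a step from 32 GB to 40 GB may yield little or no usable gain.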
Neo4j has a number of long-lived objects that stay around in the old generation, effectively for the lifetime of the Java process. To process them efficiently, and without adversely affecting the garbage collection pause time, we recommend using a concurrent garbage collector.
How to tune the specific garbage collection algorithm depends on both the JVM version and the workload. It is recommended to test the garbage collection settings under realistic load for days or weeks. Problems like heap fragmentation can take a long time to surface.
To gain good performance, these are the things to look into first:
Use a concurrent garbage collector. We find that -XX:+UseG1GC works well in most use-cases.
Start the JVM with the -server flag and a good sized heap.
Edit the following properties:
initial heap size (in MB)
maximum heap size (in MB)
additional literal JVM parameter