24.4. Performance Guide

This is the Neo4j performance guide. It will give you guidance on how to tune Neo4j to achieve maximum performance.

Try this first

The first thing to look at when Neo4j is not performing as expected is how the Cypher queries are being executed. Make sure that they don’t do more work than they have to. Some queries may accidentally be written in a way that generates a large cartesian product. Other queries may have to perform expensive label scans because an important index is missing. The Chapter 15, Query Tuning chapter has more information on how to investigate Cypher performance issues.

The second thing to look at is the Java Virtual Machine process. Make sure that it has enough memory to do its work. If there is not enough memory to keep the JVM heap resident, the operating system will swap it out to storage. When garbage collection takes place, the swapped out heap memory has to be swapped back in, and something else swapped out. This is called swap-thrashing and has dramatic impact on the performance of a database, rendering it practically unusable. A well-tuned Neo4j database should not have any swap activity in its steady-state.

Assigning sufficient memory to the JVM also limits the time spent in garbage collection. The goal is to have a large enough heap to handle peak load without thrashing occurring in the garbage collector. Performance can drop as much as two orders of magnitude when GC-thrashing happens.

Start the JVM with -server flag and -Xmx<good sized heap>, for example -Xmx512m for 512 MB memory or -Xmx3g for 3 GB memory. Having the heap be too large may also hurt performance so you may have to try out some different heap sizes. Make sure that you are using a concurrent garbage collector. We find that -XX:+UseG1GC works well in most use cases.

The next thing to look at, is the file caching memory. Neo4j uses its own page cache for the store files, and relies on the operating system for caching the index files. Make sure that the dbms.pagecache.memory setting (in neo4j.properties) is large enough to fit the entire store, if possible. But also make sure that you are not allocating so much memory to the JVM and the Neo4j page cache, that there is no memory left for the operating system to cache the Lucene index files. For more information on configuration see Chapter 24, Configuration & Performance.

Configuring heap size and garbage collection

The size of the JVM heap is an important aspect of the performance of any Java application. The heap is separated into an old generation and a young generation. New objects are allocated in the young generation, and then later moved to the old generation, if they stay live (in use) for long enough. When a generation fills up, the garbage collector performs a collection, during which all other threads in the process are paused. The young generation is quick to collect since the pause time correlates with the live set of objects, and is independent of the size of the young generation. In the old generation, pause times roughly correlates with the size of the heap. For this reason, the heap should ideally be sized and tuned such that transaction and query state never makes it to the old generation.

[Note]Note

When using Neo4j Server, JVM configuration goes into the conf/neo4j-wrapper.conf file. See Section 24.2, “Server Configuration”.

In server deployments, the heap size is configured with the wrapper.java.maxmemory (in MBs) setting in the neo4j-wrapper.conf file. For embedded, the heap size is specified by giving the -Xmx???m command line flag to the java process, where the ??? is the maximum heap size in MBs. The initial size of the heap is specified by the wrapper.java.initmemory setting, or with the -Xms???m flag, or chosen heuristically by the JVM itself if left unspecified. The JVM will automatically grow the heap as needed, up to the maximum size. The growing of the heap requires a full GC cycle. If you know that you will need all the heap memory, you can set the initial heap size and the maximum heap size to the same value. This way the pause that happens when the garbage collector grows the heap can be avoided.

The ratio of the size between the old generation and the new generation of the heap is controlled by the -XX:NewRatio=N flag. N is typically between 2 and 8 by default. A ratio of 2 means that the old generation size, divided by the new generation size, is equal to 2. In other words, two thirds of the heap memory will be dedicated to the old generation. A ratio of 3 will dedicate three quarters of the heap to the old generation, and a ratio of 1 will keep the two generations about the same size. A ratio of 1 is quite aggressive, but may be necessary if your transactions changes a lot of data. Having a large new generation can also be important if you run Cypher queries that need to keep a lot of data resident, for example when sorting big result sets.

If the new generation is too small, short-lived objects may be moved to the old generation too soon. This is called premature promotion and will slow the database down by increasing the frequency of old generation GC cycles. If the new generation is too big, the garbage collector may decide that the old generation does not have enough space to fit all the objects it expects to promote from the new to the old generation. This turns new generation GC cycles into old generation GC cycles, again slowing the database down. Running more concurrent threads means that more allocations can take place in a given span of time, in turn increasing the pressure on the new generation in particular.

[Caution]Caution

The Compressed OOPs feature in the JVM allows object references to be compressed to use only 32 bits. The feature saves a lot of memory, but is not enabled for heaps larger than 32 GB. Gains from increasing the heap size beyond 32 GB can therefore be small or even negative, unless the increase is significant (64 GB or above).

Neo4j has a number of long-lived objects, that stay around in the old generation, effectively for the lifetime of the Java process. To process them efficiently, and without adversely affecting the GC pause time, we recommend using a concurrent garbage collector.

[Tip]Tip

The recommended garbage collector to use when running Neo4j in production is the G1 garbage collector. G1 is turned on by default in server deployments. For embedded deployments, it can be turned on by supplying -XX:+UseG1GC as a JVM parameter.

How to tune the specific GC algorithm depends on both the JVM version and the workload. It is recommended to test the GC settings under realistic load for days or weeks. Problems like heap fragmentation can take a long time to surface.

Disks, RAM and other tips

As with any persistence solution, performance depends a lot on the persistence media used. Better disks equals better performance.

If you have multiple disks or persistence media available it may be a good idea to divide the store files and transaction logs across those disks. Keeping the store files on disks with low seek time can do wonders for read operations. Today a typical mechanical drive has an average seek time of about 5ms. This can cause a query or traversal to be very slow when the amount of RAM assigned to the page cache is too small. A new, good SATA enabled SSD has an average seek time of less than 100 microseconds, meaning those scenarios will execute at least 50 times faster. However, this is still tens or hundreds of times slower than accessing RAM.

To avoid hitting disk you need more RAM. On a standard mechanical drive you can handle graphs with a few tens of millions of primitives (nodes, relationships and properties) with 2-3 GBs of RAM. A server with 8-16 GBs of RAM can handle graphs with hundreds of millions of primitives, and a good server with 16-64 GBs can handle billions of primitives. However, if you invest in a good SSD you will be able to handle much larger graphs on less RAM.

Use tools like dstat or vmstat to gather information when your application is running. If the swap or paging numbers are high, that is a sign that the Lucene indexes don’t quite fit in memory. In this case, queries that do index lookups will have high latencies.

When Neo4j starts up, its page cache is empty and needs to warm up. This can take a while, especially for large stores. It is not uncommon to see a long period with many blocks being read from the drive, and high IO wait times.

Neo4j also flushes its page cache in the background, so it is not uncommon to see a steady trickle of blocks being written to the drive during steady-state. This background flushing only produces a small amount of IO wait, however. If the IO wait times are high during steady-state, it may be a sign that Neo4j is bottle-necked on the random IO performance of the drive. The best drives for running Neo4j are fast SSDs that can take lots of random IOPS.

Linux filesystem tuning

Databases often produce many small and random reads when querying data, and few sequential writes when committing changes. Neo4j is no different in this regard.

By default, most Linux distributions schedule IO requests using the Completely Fair Queuing (CFQ) algorithm, which provides a good balance between throughput and latency. The particular IO workload of a database, however, is better served by the Deadline scheduler. The Deadline scheduler gives preference to read requests, and processes them as soon as possible. This tends to decrease the latency of reads, while the latency of writes goes up. Since the writes are usually sequential, their lingering in the IO queue increases the change of overlapping or adjacent write requests being merged together. This effectively reduces the number of writes that are sent to the drive.

On Linux, the IO scheduler for a drive, in this case sda, can be changed at runtime like this:

$ echo 'deadline' > /sys/block/sda/queue/scheduler
$ cat               /sys/block/sda/queue/scheduler
noop [deadline] cfq

Another recommended practice is to disable file and directory access time updates. This way, the file system won’t have to issue writes that update this meta-data, thus improving write performance. This can be accomplished by setting the noatime,nodiratime mount options in fstab, or when issuing the disk mount command.

There may be other tuning options relevant to any particular file system, but it is important to make sure that barriers are enabled. Barriers prevent certain reorderings of writes. They are important for maintaining the integrity of the transaction log, in case a power failure happens.

Setting the number of open files

Linux platforms impose an upper limit on the number of concurrent files a user may have open. This number is reported for the current user and session with the ulimit -n command:

user@localhost:~$ ulimit -n
1024

The usual default of 1024 is often not enough. This is especially true when many indexes are used or a server installation sees too many connections. Network sockets count against the limit as well. Users are therefore encouraged to increase the limit to a healthy value of 40 000 or more, depending on usage patterns. It is possible to set the limit with the ulimit command, but only for the root user, and it only affects the current session. To set the value system wide, follow the instructions for your platform.

What follows is the procedure to set the open file descriptor limit to 40 000 for user neo4j under Ubuntu 10.04 and later.

[Note]Note

If you opted to run the neo4j service as a different user, change the first field in step 2 accordingly.

  1. Become root, since all operations that follow require editing protected system files.

    user@localhost:~$ sudo su -
    Password:
    root@localhost:~$
  2. Edit /etc/security/limits.conf and add these two lines:

    neo4j   soft    nofile  40000
    neo4j   hard    nofile  40000
  3. Edit /etc/pam.d/su and uncomment or add the following line:

    session    required   pam_limits.so
  4. A restart is required for the settings to take effect.

    After the above procedure, the neo4j user will have a limit of 40 000 simultaneous open files. If you continue experiencing exceptions on Too many open files or Could not stat() directory, you may have to raise the limit further.