Chapter 6. Performance

Table of Contents

6.1. Modifying configuration settings
6.2. Cypher tuning
6.3. Memory tuning
6.4. Transaction logs
6.5. Compressed property value storage
6.6. Linux file system tuning
6.7. Disks, RAM and other tips

This chapter describes some of the internal workings of Neo4j memory settings and how to adjust them for optimal performance.

6.1. Modifying configuration settings

6.2. Cypher tuning

The first thing to look at when Neo4j is not performing as expected is how the Cypher queries are being executed. Make sure that they don’t do more work than they have to. Some queries may accidentally be written in a way that generates a large cartesian product. Other queries may have to perform expensive label scans because an important index is missing. The Neo4j developer manual has more information on how to investigate Cypher performance issues.
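For instance, prefixing a query with PROFILE runs it and shows the execution plan together with row counts, while EXPLAIN shows the plan without running the query. The query and index below are purely illustrative:

PROFILE
MATCH (p:Person {name: 'Alice'})-[:KNOWS]->(friend)
RETURN friend.name

If the plan contains a NodeByLabelScan or AllNodesScan where an index seek was expected, adding an index, for example CREATE INDEX ON :Person(name), typically removes the expensive scan.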

6.3. Memory tuning

On startup, Neo4j will automatically configure default values for any memory-related configuration parameters that are not explicitly defined in its configuration. In doing so, it will assume that all of the RAM on the machine is available for running Neo4j.

There are three types of memory to consider: OS Memory, Page Cache and Heap Space.

Note that the OS memory is not explicitly configurable: it is what is left over once the page cache and heap space have been specified. If the page cache and heap space are configured to be equal to or greater than the available RAM, or if too little head room is left for the OS, the OS will start swapping to disk, which will heavily affect performance. Therefore, follow this checklist:

  1. Plan OS memory sizing
  2. Plan page cache sizing
  3. Plan heap sizing
  4. Do the sanity check:

Actual OS allocation = available RAM - (page cache + heap size)

Make sure that your system is configured such that it will never need to swap.
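As an illustration, using the parameters described in the following sections, a dedicated server with 32 GB of RAM might be divided up roughly as follows; the figures are hypothetical and must be adapted to the actual store size and workload:

# Hypothetical sizing for a dedicated server with 32 GB of RAM
dbms.memory.heap.initial_size=8000
dbms.memory.heap.max_size=8000
dbms.memory.pagecache.size=20g

# Sanity check: 32 GB - (20 GB + 8 GB) = 4 GB left for the OS and file buffer cache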

6.3.1. OS memory sizing

Some memory must be reserved for all activities on the server that are not Neo4j related. In addition, leave enough memory for the operating system file buffer cache to fit the contents of the index and schema directories, since index lookup performance will suffer if the indexes cannot fit in memory. 1G is a good starting point when Neo4j is the only server running on that machine.

OS Memory = 1GB + (size of graph.db/index) + (size of graph.db/schema)
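For example, with illustrative figures: if graph.db/index takes up 1GB and graph.db/schema takes up 2GB on disk, plan for roughly 1GB + 1GB + 2GB = 4GB of OS memory.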

6.3.2. Page cache sizing

The page cache is used to cache the Neo4j data as stored on disk. Ensuring that all, or at least most, of the graph data from disk is cached into memory will help avoid costly disk access and result in optimal performance. You can determine the total memory needed for the page cache by summing up the sizes of the NEO4J_HOME/data/databases/graph.db/*store.db* files and adding 20% for growth.

The parameter for specifying the page cache is dbms.memory.pagecache.size. It determines how much memory Neo4j is allowed to use for this cache.

If this is not explicitly defined on startup, Neo4j will look at how much available memory the machine has, subtract the JVM max heap allocation from that, and then use 50% of what is left for the page cache. This is considered the default configuration.
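For example, on a machine with 16 GB of RAM and a 4 GB max heap, the default page cache size would be (16 GB - 4 GB) × 50% = 6 GB (illustrative figures).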

The following are two possible methods for estimating the page cache size:

  1. For an existing Neo4j database, sum up the size of all the *store.db* files in your store file directory to figure out how big a page cache you need to fit all your data, and add another 20% for growth. For instance, on a POSIX system you can get the total by running $ du -hc *store.db* in the data/databases/graph.db directory, as shown in the sketch after this list.
  2. For a new Neo4j database, it is useful to run an import with a fraction (e.g. 1/100th) of the data, sum up the sizes of the resulting store files, and scale the result back up to the full data set (× 100), adding another 20% for growth. For example: import 1/100th of the data, sum up the sizes of the resulting database files, and then multiply by 120 for a total estimate of the database size, including 20% for growth.
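The following is a sketch of the first method, with hypothetical store file sizes:

$ cd NEO4J_HOME/data/databases/graph.db
$ du -hc *store.db*
540M    neostore.nodestore.db
3.2G    neostore.relationshipstore.db
1.1G    neostore.propertystore.db
...
4.9G    total

Adding 20% for growth to the 4.9G total gives roughly 6G, so the page cache could be configured as:

dbms.memory.pagecache.size=6g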
Parameter: dbms.memory.pagecache.size

Possible values: The maximum amount of memory to use for the page cache, either in bytes, or in greater byte-like units, such as 100m for 100 megabytes, or 4g for 4 gigabytes.

Effect: The amount of memory to use for mapping the store files, in a unit of bytes. This will automatically be rounded down to the nearest whole page. This value cannot be zero. For extremely small and memory-constrained deployments, it is recommended to still reserve at least a couple of megabytes for the page cache.

Parameter: unsupported.dbms.report_configuration

Possible values: true or false

Effect: If set to true the current configuration settings will be written to the default system output, mostly the console or the log files.

6.3.3. Heap sizing

The size of the available heap memory is an important aspect for the performance of Neo4j.

Generally speaking, it is beneficial to configure a large enough heap space to sustain concurrent operations. For many setups, a heap size between 8G and 16G is large enough to run Neo4j reliably.

The heap memory size is determined by the parameters in NEO4J_HOME/conf/neo4j-wrapper.conf, namely dbms.memory.heap.initial_size and dbms.memory.heap.max_size, which specify the heap size in megabytes, e.g. 16000. It is recommended to set these two parameters to the same value to avoid unwanted full garbage collection pauses.
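For example, to configure a 16 GB heap, the relevant lines in NEO4J_HOME/conf/neo4j-wrapper.conf would look like this (the value is illustrative):

dbms.memory.heap.initial_size=16000
dbms.memory.heap.max_size=16000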

6.3.4. Tuning of the garbage collector

The heap is separated into an old generation and a young generation. New objects are allocated in the young generation, and later moved to the old generation if they stay live (in use) for long enough. When a generation fills up, the garbage collector performs a collection, during which all other threads in the process are paused. The young generation is quick to collect, since the pause time correlates with the live set of objects and is independent of the size of the young generation. In the old generation, pause times roughly correlate with the size of the heap. For this reason, the heap should ideally be sized and tuned such that transaction and query state never makes it to the old generation.

The heap size is configured with the dbms.memory.heap.max_size (in MBs) setting in the neo4j-wrapper.conf file. The initial size of the heap is specified by the dbms.memory.heap.initial_size setting, or with the -Xms???m flag, or chosen heuristically by the JVM itself if left unspecified. The JVM will automatically grow the heap as needed, up to the maximum size. The growing of the heap requires a full garbage collection cycle. It is recommended to set the initial heap size and the maximum heap size to the same value. This way the pause that happens when the garbage collector grows the heap can be avoided.

The ratio of the size between the old generation and the new generation of the heap is controlled by the -XX:NewRatio=N flag. N is typically between 2 and 8 by default. A ratio of 2 means that the old generation size, divided by the new generation size, is equal to 2. In other words, two thirds of the heap memory will be dedicated to the old generation. A ratio of 3 will dedicate three quarters of the heap to the old generation, and a ratio of 1 will keep the two generations about the same size. A ratio of 1 is quite aggressive, but may be necessary if your transactions change a lot of data. Having a large new generation can also be important if you run Cypher queries that need to keep a lot of data resident, for example when sorting big result sets.

If the new generation is too small, short-lived objects may be moved to the old generation too soon. This is called premature promotion and will slow the database down by increasing the frequency of old generation garbage collection cycles. If the new generation is too big, the garbage collector may decide that the old generation does not have enough space to fit all the objects it expects to promote from the new to the old generation. This turns new generation garbage collection cycles into old generation garbage collection cycles, again slowing the database down. Running more concurrent threads means that more allocations can take place in a given span of time, in turn increasing the pressure on the new generation in particular.

The Compressed OOPs feature in the JVM allows object references to be compressed to use only 32 bits. The feature saves a lot of memory, but is not enabled for heaps larger than 32 GB. Gains from increasing the heap size beyond 32 GB can therefore be small or even negative, unless the increase is significant (64 GB or above).

Neo4j has a number of long-lived objects that stay around in the old generation, effectively for the lifetime of the Java process. To process them efficiently, and without adversely affecting the garbage collection pause time, we recommend using a concurrent garbage collector.

How to tune the specific garbage collection algorithm depends on both the JVM version and the workload. It is recommended to test the garbage collection settings under realistic load for days or weeks. Problems like heap fragmentation can take a long time to surface.

To gain good performance, these are the things to look into first:

  • Make sure the JVM is not spending too much time performing garbage collection. The goal is to have a large enough heap to make sure that heavy/peak load will not result in so-called GC thrashing. Performance can drop by as much as two orders of magnitude when GC thrashing happens. Having a too large heap may also hurt performance, so you may have to try some different heap sizes.
  • Use a concurrent garbage collector. We find that -XX:+UseG1GC works well in most use-cases.

    • The Neo4j JVM needs enough heap memory for the transaction state and query processing, plus some head-room for the garbage collector. Because the heap memory needs are so workload dependent, it is common to see configurations from 1 GB, up to 32 GBs of heap memory.
  • Start the JVM with the -server flag and a good sized heap.

    • The operating system on a dedicated server can usually make do with 1 to 2 GBs of memory, but the more physical memory the machine has, the more memory the operating system will need.

Edit the following properties:

Table 6.1. neo4j-wrapper.conf JVM tuning properties

Property Name                      Meaning
dbms.memory.heap.initial_size      initial heap size (in MB)
dbms.memory.heap.max_size          maximum heap size (in MB)
dbms.jvm.additional                additional literal JVM parameter
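As a sketch, the JVM flags discussed in this section can be passed through dbms.jvm.additional, one flag per line; which flags are appropriate depends on the JVM version and the workload:

dbms.jvm.additional=-server
dbms.jvm.additional=-XX:+UseG1GC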

6.4. Transaction logs

The transaction logs record all operations in the database. They are the source of truth in scenarios where the database needs to be recovered. Transaction logs are used to provide for incremental backups, as well as for cluster operations. For any given configuration at least the latest non-empty transaction log will be kept.

By default, log switches happen when log sizes surpass 250 MB. This can be configured using the dbms.tx_log.rotation.size parameter (see Table A.69).

There are several different means of controlling the number of transaction logs that are kept, using the dbms.tx_log.rotation.retention_policy parameter (see Table A.68). The format in which this is configured is:

dbms.tx_log.rotation.retention_policy=<true/false>
dbms.tx_log.rotation.retention_policy=<amount> <type>

For example:

# Will keep logical logs indefinitely
dbms.tx_log.rotation.retention_policy=true

# Will keep only the most recent non-empty log
dbms.tx_log.rotation.retention_policy=false

# Will keep logical logs that contain any transaction committed within 30 days
dbms.tx_log.rotation.retention_policy=30 days

# Will keep logical logs that contain any of the most recent 500 000 transactions
dbms.tx_log.rotation.retention_policy=500k txs

Full list:

  • files: Number of most recent logical log files to keep. Example: "10 files"
  • size: Max disk size to allow log files to occupy. Example: "300M size" or "1G size"
  • txs: Number of latest transactions to keep. Example: "250k txs" or "5M txs"
  • hours: Keep logs which contain any transaction committed within N hours from current time. Example: "10 hours"
  • days: Keep logs which contain any transaction committed within N days from current time. Example: "50 days"

6.5. Compressed property value storage

Neo4j can in many cases compress and inline the storage of property values, such as short arrays and strings, with the purpose of saving disk space and possibly an I/O operation.

Compressed storage of short arrays

Neo4j will try to store your primitive arrays in a compressed way. To do that, it employs a "bit-shaving" algorithm that tries to reduce the number of bits required for storing the members of the array. In particular:

  1. For each member of the array, it determines the position of the leftmost set bit.
  2. It determines the largest such position among all members of the array.
  3. It reduces all members to that number of bits.
  4. It stores those values, prefixed by a small header.

This means that if even a single negative value is included in the array, the original size of the primitives will be used, since the leftmost (sign) bit of a negative value is always set.

There is a possibility that the result can be inlined in the property record if:

  • It is less than 24 bytes after compression.
  • It has less than 64 members.

For example, an array long[] {0L, 1L, 2L, 4L} will be inlined, as the largest entry (4) requires 3 bits to store, so the whole array fits in 4 × 3 = 12 bits. The array long[] {-1L, 1L, 2L, 4L}, however, requires the full 64 bits for the -1 entry, so it needs 4 × 64 bits = 32 bytes and will end up in the dynamic store.

Compressed storage of short strings

Neo4j will try to classify your strings into a short string class, and if it succeeds it will treat the string accordingly. In that case, the string is stored without indirection in the property store, inlined in the property record, meaning that the dynamic string store is not involved in storing that value, leading to a reduced disk footprint. Additionally, when no string record is needed to store the property, it can be read and written in a single lookup, leading to performance improvements and less disk space required.

The various classes for short strings are:

  • Numerical, consisting of digits 0..9 and the punctuation space, period, dash, plus, comma and apostrophe.
  • Date, consisting of digits 0..9 and the punctuation space, dash, colon, slash, plus and comma.
  • Hex (lower case), consisting of digits 0..9 and lower case letters a..f.
  • Hex (upper case), consisting of digits 0..9 and upper case letters A..F.
  • Upper case, consisting of upper case letters A..Z, and the punctuation space, underscore, period, dash, colon and slash.
  • Lower case, like upper but with lower case letters a..z instead of upper case
  • E-mail, consisting of lower case letters a..z and the punctuation comma, underscore, period, dash, plus and the at sign (@).
  • URI, consisting of lower case letters a..z, digits 0..9 and most punctuation available.
  • Alpha-numerical, consisting of both upper and lower case letters a..zA..Z, digits 0..9 and the punctuation space and underscore.
  • Alpha-symbolical, consisting of both upper and lower case letters a..zA..Z and the punctuation space, underscore, period, dash, colon, slash, plus, comma, apostrophe, at sign, pipe and semicolon.
  • European, consisting of most accented European characters and digits, plus the punctuation space, dash, underscore and period (like Latin 1 but with less punctuation).
  • Latin 1.
  • UTF-8.

In addition to the string’s contents, the number of characters also determines whether the string can be inlined. Each class has its own character count limit, which is given in Table 6.2.

Table 6.2. Character count limits

String class                               Character count limit
Numerical, Date and Hex                    54
Uppercase, Lowercase and E-mail            43
URI, Alphanumerical and Alphasymbolical    36
European                                   31
Latin1                                     27
UTF-8                                      14

This means that the largest inlineable string is 54 characters long and must be of the Numerical class, and also that all strings of 14 characters or fewer will always be inlined.
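As an illustration, consider the following hypothetical property values:

CREATE (e:Event {scheduled: '2016-10-14 13:37', summary: 'A much longer free-text value that exceeds the per-class character limits'})

The scheduled value consists only of digits, spaces, dashes and colons, so it falls into the Date class and, at 16 characters, is inlined in the property record. The summary value is longer than any of the limits above, so it is stored via the dynamic string store.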

Also note that the above limits apply to the default 41 byte PropertyRecord layout; if that is changed by editing the source and recompiling, the limits above have to be recalculated.

6.6. Linux file system tuning

Databases often produce many small and random reads when querying data, and few sequential writes when committing changes.

By default, most Linux distributions schedule IO requests using the Completely Fair Queuing (CFQ) algorithm, which provides a good balance between throughput and latency. The particular IO workload of a database, however, is better served by the Deadline scheduler. The Deadline scheduler gives preference to read requests, and processes them as soon as possible. This tends to decrease the latency of reads, while the latency of writes goes up. Since the writes are usually sequential, their lingering in the IO queue increases the chance of overlapping or adjacent write requests being merged together. This effectively reduces the number of writes that are sent to the drive.

On Linux, the IO scheduler for a drive, in this case sda, can be changed at runtime like this:

$ echo 'deadline' > /sys/block/sda/queue/scheduler
$ cat               /sys/block/sda/queue/scheduler
noop [deadline] cfq

Another recommended practice is to disable file and directory access time updates. This way, the file system won’t have to issue writes that update this meta-data, thus improving write performance. This can be accomplished by setting the noatime,nodiratime mount options in fstab, or when issuing the disk mount command.
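For example, an /etc/fstab entry for the file system holding the Neo4j store files might look like the following; the device, mount point and file system type are illustrative:

/dev/sdb1  /var/lib/neo4j  ext4  defaults,noatime,nodiratime  0  2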

6.7. Disks, RAM and other tips

As with any persistence solution, performance depends a lot on the persistence media used. Better disks equal better performance.

If you have multiple disks or persistence media available it may be a good idea to divide the store files and transaction logs across those disks. Keeping the store files on disks with low seek time can do wonders for read operations. Today a typical mechanical drive has an average seek time of about 5ms. This can cause a query or traversal to be very slow when the amount of RAM assigned to the page cache is too small. A new, good SATA enabled SSD has an average seek time of less than 100 microseconds, meaning those scenarios will execute at least 50 times faster. However, this is still tens or hundreds of times slower than accessing RAM.

To avoid hitting disk you need more RAM. On a standard mechanical drive you can handle graphs with a few tens of millions of primitives (nodes, relationships and properties) with 2-3 GBs of RAM. A server with 8-16 GBs of RAM can handle graphs with hundreds of millions of primitives, and a good server with 16-64 GBs can handle billions of primitives. However, if you invest in a good SSD you will be able to handle much larger graphs on less RAM.

Use tools like dstat or vmstat to gather information when your application is running. If the swap or paging numbers are high, that is a sign that the Lucene indexes don’t quite fit in memory. In this case, queries that do index lookups will have high latencies.
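For example, vmstat can be run with a sampling interval:

$ vmstat 5

Consistently non-zero values in the si and so (swap-in/swap-out) columns indicate that the system is swapping, and high bi/bo together with high wa point at heavy disk IO.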

When Neo4j starts up, its page cache is empty and needs to warm up. This can take a while, especially for large stores. It is not uncommon to see a long period with many blocks being read from the drive, and high IO wait times.

Neo4j also flushes its page cache in the background, so it is not uncommon to see a steady trickle of blocks being written to the drive during steady-state. This background flushing only produces a small amount of IO wait, however. If the IO wait times are high during steady-state, it may be a sign that Neo4j is bottle-necked on the random IO performance of the drive. The best drives for running Neo4j are fast SSDs that can take lots of random IOPS.