24.11. Memory mapped IO settings

Introduction

Each file in the Neo4j store is accessed through the Neo4j page cache when reading from or writing to the store files. Since there is only one page cache, there is only one setting for specifying how much memory Neo4j is allowed to use for page caching. The shared page cache ensures that memory is divided optimally among the various store files [1], depending on how the database is used and what data is popular.

The memory for the page cache is allocated outside the normal Java heap, so you need to take both the Java heap and the page cache into consideration in your capacity planning. Other processes running on the OS will impact the availability of such memory. Neo4j requires the entire JVM heap, plus the memory configured for the page cache, to be available as physical memory; other processes can therefore only use whatever memory remains after that allocation is made for Neo4j.
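As a concrete example (the numbers are purely illustrative), on a server with 32 GB of RAM:

 8 GB JVM heap + 20 GB page cache = 28 GB physical memory reserved for Neo4j
32 GB total    - 28 GB            =  4 GB left for the OS and other processes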

Important

Make sure that your system is configured such that it will never need to swap. If memory belonging to the Neo4j process gets swapped out, it can lead to considerable performance degradation.

The amount of memory available to the page cache can be configured in two ways, both using the mapped_memory_total_size setting. You can either specify the number of bytes available to the page cache directly, e.g. 150M or 4G, or you can specify a percentage of the available memory, which is what the default setting of 50% means. In either case, the amount of memory reserved for the page cache is capped at the amount of free memory measured when the database starts.
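For example, either of the following would be a valid configuration; the values are illustrative, as is the assumption that the setting lives alongside your other database settings in conf/neo4j.properties:

# An explicit maximum size:
mapped_memory_total_size=4G

# ...or a fraction of the free memory measured at startup (50% is the default):
mapped_memory_total_size=50%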

For optimal performance, you want as much of your data as possible to fit in the page cache. To figure out how big a page cache you need to fit all your data, sum up the sizes of all the neostore.* files in your store file directory. Bear in mind that the store files will grow as you add more nodes, relationships and properties.
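A minimal sketch of that calculation in Java; the store directory location varies by installation, so the path you pass in is an assumption:

import java.io.File;

public class NeostoreSize
{
    public static void main( String[] args )
    {
        // Point this at your store directory, for example data/graph.db
        File storeDir = new File( args[0] );
        long total = 0;
        for ( File file : storeDir.listFiles() )
        {
            // Store files are named neostore, neostore.nodestore.db, and so on
            if ( file.isFile() && file.getName().startsWith( "neostore" ) )
            {
                total += file.length();
            }
        }
        System.out.printf( "neostore.* files: %,d bytes (%.2f GiB)%n",
                total, total / (double) ( 1L << 30 ) );
    }
}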

Batch insert example

Read general information on batch insertion in Chapter 38, Batch Insertion.

The configuration should suit the data set you are about to inject using BatchInsert. Let's say we have a graph with a random-like structure, containing 10M nodes and 100M relationships. Each node (and maybe some of the relationships) has different properties of string and Java primitive types. The important thing is that the page cache has enough memory to work with, so that it doesn't slow down the BatchInserter:

mapped_memory_total_size=4G
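A minimal sketch of passing this setting to a BatchInserter programmatically, assuming the Neo4j 2.x batch insertion API; the store path and property values are illustrative:

import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class BatchInsertConfigExample
{
    public static void main( String[] args )
    {
        // Give the page cache enough memory to hold the graph being inserted
        Map<String, String> config = new HashMap<>();
        config.put( "mapped_memory_total_size", "4G" );

        BatchInserter inserter = BatchInserters.inserter( "data/graph.db", config );
        try
        {
            Map<String, Object> properties = new HashMap<>();
            properties.put( "name", "Alice" );
            long alice = inserter.createNode( properties );

            properties.put( "name", "Bob" );
            long bob = inserter.createNode( properties );

            inserter.createRelationship( alice, bob,
                    DynamicRelationshipType.withName( "KNOWS" ), null );
        }
        finally
        {
            // shutdown() flushes everything to the store files
            inserter.shutdown();
        }
    }
}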

The configuration above will more or less fit the entire graph in memory. A rough formula for estimating the memory needed looks like this:

bytes_needed = number_of_nodes * 15
             + number_of_relationships * 34
             + number_of_properties * 64

Note that the size of an individual property depends heavily on what data it contains. The numbers given in the above formula are only a rough estimate.
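As a worked example with the data set above, the node and relationship terms alone come to about 3.55 billion bytes, which within a 4G page cache leaves room for roughly 7M properties at 64 bytes each:

 10,000,000 nodes         * 15 bytes =   150,000,000 bytes
100,000,000 relationships * 34 bytes = 3,400,000,000 bytes
                                       ---------------
                                       3,550,000,000 bytes

4,000,000,000 - 3,550,000,000 = 450,000,000 bytes, or ~7,000,000 properties at 64 bytes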



[1] This is an informal comparison to the store-specific memory mapping settings of previous versions of Neo4j. We are not claiming that our page replacement algorithms are optimal in the formal sense. Truly optimal page replacement algorithms require knowledge of events arbitrarily far into the future.