Goals This guide explains Neo4j sizing and hardware calculation tips and best practices. It details what you need to make Neo4j fly. Prerequisites You should know how to download and install Neo4j on your system. You should be accustomed to… Learn More →
This guide explains Neo4j sizing and hardware calculation tips and best practices. It details what you need to make Neo4j fly.
You should know how to download and install Neo4j on your system. You should be accustomed to the graph data model and have written your Neo4j application. You should know that Neo4j employs its own persistence and caching implementation, have seen the files on disk (
data/graph.db), and have glanced at
Neo4j is a native graph database; it handles the graph data records down to the disk/filesystem level. On top of the memory mapped filesystem, Neo4j employs additional caches for loaded Node and Relationship-Records.
Neo4j’s storage is organized in record-based files per data structure – nodes, relationships, properties, labels, and so on. Each node and relationship record block is directly addressable by its id.
The blocks have different sizes to accommodate the different type of data they contain:
- Node records: labels, pointer to first relationship, pointer to first property block.
- Relationship records: relationship-type, pointers to: start-node, end-node, first property block, next-relationship and previous relationship for both start and end-nodes
- Property records: up to four property entries, each with a property-key id, and an encoded / compressed property value or pointer to string or array storage list
Nodes occupy 15B of space, relationships occupy 31B of space and properties occupy 41B of space.
An example disk space calculation is:
Low Level Cache
The low level cache requirements are the same as for the disk space. Its configuration (memory mapped regions per file), is done in
conf/neo4j.properties. You should try to memory-map as much data as your RAM allows (allow for 6 to 16GB heap depending on availability and use-cases).
In optimizing a graph database solution for performance, we should bear the following guidelines in mind:
- Utilize the filesystem cache as much as possible, and map store files in their entirety into this cache
- Have as much RAM available as possible
- Tune the JVM heap and keep an eye on GC behaviors
- Consider using fast disks—SSDs or enterprise flash hardware to bring up the bottom line when disk access becomes inevitable
There are three areas in which we can optimize for performance:
- Keep the object cache to a reasonable size (2 to 16 GB, about 50% of the heap)
- Increase the file cache (from 2 GB all the way up to 200 GB or more in exceptional circumstances)
- Increase the percentage of the store mapped into the filesystem cache
- Invest in faster disks: SSDs or enterprise flash hardware