This section covers Neo4j I/O behavior, and how to optimize for operations on disk.
Databases often produce many small and random reads when querying data, and few sequential writes when committing changes.
By default, most Linux distributions schedule I/O requests using the Completely Fair Queuing (CFQ) algorithm, which provides a good balance between throughput and latency. The particular I/O workload of a database, however, is better served by the Deadline scheduler. The Deadline scheduler gives preference to read requests, and processes them as soon as possible. This tends to decrease the latency of reads, while the latency of writes goes up. Since the writes are usually sequential, their lingering in the I/O queue increases the chance that overlapping or adjacent write requests are merged. This effectively reduces the number of writes that are sent to the drive.
On Linux, the I/O scheduler for a drive, in this case sda, can be changed at runtime like this:
$ echo 'deadline' > /sys/block/sda/queue/scheduler
$ cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
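To confirm which scheduler is active on each drive, the bracketed entry in the scheduler file can be extracted with a small script. The following is a minimal sketch, not part of Neo4j: the function names active_scheduler and report_schedulers are hypothetical, and it assumes the sysfs layout shown above, where the active scheduler is the name enclosed in square brackets.

```shell
#!/bin/sh
# Hypothetical helper: print the active scheduler, that is, the name in
# square brackets, from a line such as "noop [deadline] cfq".
# Prints nothing if the line contains no bracketed entry.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Hypothetical helper: report the active scheduler for every block device
# that exposes a readable scheduler file under /sys/block.
report_schedulers() {
    for f in /sys/block/*/queue/scheduler; do
        [ -r "$f" ] || continue          # skip unmatched globs and unreadable files
        dev=${f#/sys/block/}             # strip the leading path
        dev=${dev%%/*}                   # keep only the device name
        printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
    done
}
```

Calling report_schedulers prints one line per device. Reading these files does not require elevated privileges, whereas writing to them normally does. A value written to sysfs at runtime is not kept across reboots.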
Another recommended practice is to disable file and directory access time updates.
This way, the file system won’t have to issue writes that update this meta-data, thus improving write performance.
This can be accomplished by setting the noatime,nodiratime mount options in fstab, or when issuing the disk mount command.
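After remounting, it can be useful to verify that the options took effect. The following is a minimal sketch under stated assumptions: the function names has_noatime and mount_options are hypothetical, and the sketch assumes a Linux system that lists mounted file systems in /proc/mounts. It checks only for noatime, since on Linux that option is generally documented as also covering directory access times.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the comma-separated mount options in $1
# include noatime as a whole option (relatime does not match).
has_noatime() {
    case ",$1," in
        *,noatime,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the mount options of the file system mounted
# at the mount point given in $1, as recorded in /proc/mounts.
mount_options() {
    awk -v mp="$1" '$2 == mp { print $4 }' /proc/mounts
}
```

For example, has_noatime "$(mount_options /data)" succeeds only if the file system at /data is mounted with noatime. Here /data is a placeholder for the mount point that holds the database files.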