2.6. CAPI Flash

This section describes how to use CAPI Flash as storage for Neo4j.

Neo4j can be configured to use CAPI Flash as storage for its store files, instead of the file system. CAPI is the Coherent Accelerator Processor Interface technology from IBM, which allows an FPGA (Field Programmable Gate Array) on a PCIe (Peripheral Component Interconnect Express) expansion card to share a coherent view of memory with a Power8 CPU. CAPI Flash is an application of this technology to access storage, either embedded on the CAPI card or via Fibre Channel to flash storage appliances.

The Neo4j CAPI Flash integration allows greater I/O throughput and better scaling for concurrent I/O load. It also avoids double caching of the store files, which improves memory utilisation and avoids block tearing. By extension, it avoids the read-modify-write problem that can occur when file writes are not aligned to the native block size of the underlying storage system. Together, these advantages improve the performance of Neo4j, in particular for highly concurrent read workloads.

The Neo4j CAPI Flash integration is an extension that is compatible with Neo4j Enterprise Edition. It is available for download at the Neo4j download site.

Compatibility Notice: If upgrading from Neo4j 3.1.0 or 3.1.1 to any later version, a manual upgrade process must be followed. See Section 2.6.4, “Upgrading from 3.1.0 or 3.1.1”.

2.6.1. Configuring Neo4j to run on CAPI Flash

There are three main steps for configuring Neo4j to work with CAPI Flash:

  1. You must ensure that the environment (the Power8 system and its configuration) is set up to give Neo4j access to the CAPI Flash hardware.

    See “Power8 System & CAPI Flash configuration” for more information.

  2. You will need the neo4j-blockdevice-VERSION.jar file that matches the version of Neo4j that will be run on CAPI Flash.

    See “Neo4j Block Device Integration Library” for more information.

  3. Some settings must be added to neo4j.conf in order to enable CAPI Flash and specify how it should work.

    See “Neo4j Block Device configuration” for more information.

Before beginning, ensure that Neo4j is not running. Ideally, Neo4j is configured for CAPI Flash on a clean installation. However, it is possible to migrate an existing Neo4j database onto CAPI Flash storage using the neo4j-admin blockdev import command. Refer to Section 2.6.3, “Admin commands for CAPI Flash” for more information on how to do this.

Power8 System & CAPI Flash configuration

First, review the documentation for the CAPI Flash hardware to ensure that it is installed correctly and that it is working. Also make sure that the CAPI Flash devices exposed through the operating system (typically through a path like /dev/sgX where X is a number) are accessible, readable and writeable to the user running the Neo4j database.
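
As a quick sanity check of the access requirements described above, a sketch like the following reports which device paths the current user cannot open for both reading and writing. The function name inaccessible_devices and the example paths are illustrative assumptions; run it as the user that will run Neo4j.

```python
import os


def inaccessible_devices(paths):
    """Return the paths the current user cannot both read and write."""
    return [p for p in paths if not os.access(p, os.R_OK | os.W_OK)]


if __name__ == "__main__":
    # Hypothetical device paths; substitute the ones present on your system.
    for path in inaccessible_devices(["/dev/sg2", "/dev/sg3"]):
        print("no read/write access:", path)
```

An empty result only shows that the permission bits allow access. It does not prove that the CAPI Flash hardware itself is working.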

In a typical installation, the CAPI Flash devices will be read/write accessible to every user in the cxl group. If Neo4j is going to run as a dedicated neo4j user, then this user can be added to the cxl group by running a sudo usermod -a -G cxl neo4j command. Assuming the CAPI Flash software has been installed in the /opt/ibm/capikv directory, a user in the cxl group will be able to inspect what devices are available:

root:~$ /opt/ibm/capikv/bin/cxlfstatus
CXL Flash Device Status

Found 0601 0000:01:00.0 Slot2
  Device:  SCSI       Block    Mode       LUN WWID                           Persist
  sg2:     1:0:0:0,   sdb,     superpipe, 60025380025382462300035000000000,  sg0200      sd0200
  sg3:     1:1:0:0,   sdc,     superpipe, 60025380025382462300048000000000,  sg0210      sd0210

Found 0601 0005:01:00.0 Slot4
  Device:  SCSI       Block    Mode       LUN WWID                           Persist
  sg4:     2:0:0:0,   sdd,     superpipe, 60025380025382463300014000000000,  sg0400      sd0400
  sg5:     2:1:0:0,   sde,     superpipe, 60025380025382463300160000000000,  sg0510      sd0510


In the output above, we have two CAPI FlashGT cards, each with two SSDs installed. The sgX devices can be found in the /dev directory as /dev/sgX. The Mode field can be either legacy or superpipe, and in order for Neo4j to work on the devices, Mode must be set to superpipe. The last two columns are the "persistent port names", and are the recommended ports to use in the Neo4j configuration. More on that below.
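
To pick out the persistent port names without reading the table by eye, the status output can be parsed with a few lines of Python. This is a minimal sketch that assumes the column layout matches the listing above (seven whitespace- or comma-separated columns per device row); the function names are hypothetical and not part of the CAPI Flash tooling.

```python
def parse_cxlfstatus(text):
    """Parse device rows from cxlfstatus output into dictionaries."""
    devices = []
    for line in text.splitlines():
        fields = line.replace(",", " ").split()
        # Device rows start with an ephemeral name such as "sg2:" and have
        # seven columns; header and "Found ..." lines are skipped.
        if len(fields) == 7 and fields[0].startswith("sg") and fields[0].endswith(":"):
            devices.append({
                "ephemeral": fields[0].rstrip(":"),
                "mode": fields[3],
                "persistent": fields[5],
            })
    return devices


def usable_ports(text):
    """Return persistent /dev paths of devices that are in superpipe mode."""
    return ["/dev/" + d["persistent"]
            for d in parse_cxlfstatus(text)
            if d["mode"] == "superpipe"]
```

Devices in legacy mode are filtered out because Neo4j requires superpipe mode.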

Neo4j expects to have exclusive access to the CAPI Flash devices, and that only a single Neo4j instance will be using the devices at any point in time. Furthermore, Neo4j expects access to the physical LUNs (Logical Unit Numbers). This means that virtual LUNs are not supported. Virtual LUNs are an operational mode of CAPI Flash in which CAPI Flash is used as an extension to RAM. This is not applicable when CAPI Flash is used as storage, as is the case when using it with Neo4j.

Next, figure out the topology of the CAPI Flash hardware. Each of the /dev/sgX devices is known in CAPI Flash hardware terms as a port. Each CAPI Flash card can have more than one port, and a system can have more than one CAPI Flash card installed. Furthermore, when using Fibre Channel attached flash, two different ports can refer to the same Logical Block Address (LBA) space. How to make Neo4j take advantage of more than one device is covered in “Neo4j Block Device configuration”. This feature is a significant contributor to the high concurrent I/O throughput that CAPI Flash offers.

When configuring Neo4j to use CAPI Flash, it is important that the persistent port names (such as /dev/sg0200, as opposed to /dev/sg2) are used for the device specifier. Persistent port names do not change, even if the system is restarted, so any reference to them will always reference the same physical storage hardware. The other (ephemeral) port names may change their meaning between system restarts, such that they point to different storage hardware, and would thus end up representing a different configuration. If this happens then Neo4j will log an error and refuse to start.

Finally, the CAPI Flash block library needs to be installed on your system and available for use by Neo4j. This is usually a file called libcflsh_block.so and is typically found in a /opt/ibm/capikv/lib directory. The library may have to be readable and executable by the Neo4j user. The exact minimal working set of permissions depends on your setup, but in most cases, it is safe to mark the library as readable and executable to everyone since the CAPI devices themselves also require permissions.

Neo4j Block Device Integration Library

The Neo4j Block Device Integration Library is distributed as a neo4j-blockdevice-VERSION.jar file, where the VERSION is composed of a target Neo4j version, e.g. 3.1.0, and a stitch version digit, such as neo4j-blockdevice- The stitch version allows more than one version of the block device integration library to be released for a given version of Neo4j. The library is only compatible with the given specific version of Neo4j. The integration library jar file is placed in the <neo4j-home>/lib directory, and given the same access permissions as its sibling jar files. This will ensure that the library is part of the classpath for Neo4j.

Neo4j Block Device configuration

Three parameters must be configured in neo4j.conf in order for the Neo4j Block Device Integration to work:

  • Set dbms.memory.pagecache.swapper=capi to enable the CAPI Flash block device integration.
  • Set dbms.memory.pagecache.swapper.capi.lib to the path of the libcflsh_block.so library.
  • Set dbms.memory.pagecache.swapper.capi.device to a device specifier that references all relevant persistent CAPI Flash ports and describes their topology. See the section called “The device specifier and CAPI Flash device topology”.
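
The three settings above can be checked mechanically before formatting the device. The sketch below is an illustration only: the helper names are hypothetical, the parser handles just plain key=value lines and # comments, and the example values reuse the typical library location and a persistent port name mentioned earlier in this section.

```python
REQUIRED = (
    "dbms.memory.pagecache.swapper",
    "dbms.memory.pagecache.swapper.capi.lib",
    "dbms.memory.pagecache.swapper.capi.device",
)


def read_settings(conf_text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_capi_settings(conf_text):
    """Return the required CAPI Flash settings that are absent or wrong."""
    settings = read_settings(conf_text)
    problems = [key for key in REQUIRED if not settings.get(key)]
    # The swapper must be set to exactly "capi" to enable the integration.
    if settings.get(REQUIRED[0]) not in (None, "", "capi"):
        problems.append(REQUIRED[0] + " (must be 'capi')")
    return problems


# Example values only; adjust the paths to your own installation.
EXAMPLE_CONF = """\
dbms.memory.pagecache.swapper=capi
dbms.memory.pagecache.swapper.capi.lib=/opt/ibm/capikv/lib/libcflsh_block.so
dbms.memory.pagecache.swapper.capi.device=/dev/sg0200
"""

print(missing_capi_settings(EXAMPLE_CONF))  # prints []
```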

Once everything has been configured correctly, the CAPI Flash device needs to be formatted. This is done with the neo4j-admin blockdev format command:

$neo4j-home> bin/neo4j-admin blockdev format

Neo4j stores its data in files, so the block device integration library comes with an embedded file system called DBFS. Formatting the device writes the necessary metadata for DBFS to work. Note that formatting the device will remove all data on the device. This cannot be undone.

After the device has been formatted, Neo4j can be started and will run with CAPI Flash as storage.

The device specifier and CAPI Flash device topology

The block device integration library exposes a virtual block device that can be composed of multiple physical block devices. This is configured using the dbms.memory.pagecache.swapper.capi.device setting in neo4j.conf.

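The easiest device specifier configuration is one that is based on a single physical block device. In this case, the device specifier is simply the path to that physical device. For example, for a device with the persistent port name sg0100:

dbms.memory.pagecache.swapper.capi.device=/dev/sg0100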

If we have two devices exposed to us and they represent two different physical devices, then they will have different LBA-spaces (Logical Block Addressing). This means that LBA 0 on one is a different block than LBA 0 on the other. In this case we can bundle them up and use them as a single, larger device that has a capacity that is the sum of the two devices. We do this by providing the paths to both devices in the device specifier, separated by two consecutive path-separator characters. The path separator character is semicolon ; on Windows, and colon : on all other platforms.

Below is an example where sg0100 and sg0200 have different LBA-spaces, and are combined into a single, larger, logical device:

dbms.memory.pagecache.swapper.capi.device=/dev/sg0100::/dev/sg0200

The logical device consists of interleaving 16 MiB stripes from each of the underlying devices in the order given by the device specifier. This effectively creates a software defined RAID-0 array of the underlying devices.
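
The striping described above can be modelled in a few lines. This is a sketch of a conventional round-robin RAID-0 mapping using the 16 MiB stripe size stated above; it assumes stripes are assigned to devices strictly in specifier order, and it ignores any metadata the library itself may place on the devices. The function name locate is hypothetical.

```python
STRIPE_BYTES = 16 * 1024 * 1024  # 16 MiB stripe size, as stated above


def locate(logical_offset, device_count, stripe_bytes=STRIPE_BYTES):
    """Map a logical byte offset to (device index, byte offset on that device)."""
    stripe_index, within_stripe = divmod(logical_offset, stripe_bytes)
    device_index = stripe_index % device_count    # round-robin over the devices
    device_stripe = stripe_index // device_count  # stripe number on that device
    return device_index, device_stripe * stripe_bytes + within_stripe
```

With two devices, logical offset 0 lands on the first device, the offset at exactly 16 MiB lands at the start of the second device, and the offset at 32 MiB wraps around to the second stripe of the first device.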

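By combining as many devices as required, a logical device can be created with a capacity that is much larger than that provided by any individual device. Note that the participating underlying devices must all have the same capacity. Below is an example where three devices are combined, here using persistent port names from the cxlfstatus listing earlier in this section:

dbms.memory.pagecache.swapper.capi.device=/dev/sg0200::/dev/sg0210::/dev/sg0400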

Some devices can expose more than one port. They will look like multiple distinct devices to the operating system, even though they have the same LBA-spaces. An example is the CAPI Flash cards that connect to IBM's FlashSystem appliances through Fibre Channel. These cards have two FC (Fibre Channel) ports, that is, two physical cables going out from the CAPI Flash card, that can both go to the same appliance and can be configured to expose the same LBA-space. Assume that sg0100 and sg0200 represent these ports. In this case, it will not matter if LBA 0 is accessed through sg0100 or sg0200. It will be the same physical block regardless of which port is used. We describe such a setup in the device specifier by separating the paths to sg0100 and sg0200 with a single path separator:

dbms.memory.pagecache.swapper.capi.device=/dev/sg0100:/dev/sg0200

These two features, software RAID-0 and multi-port devices, can be combined. Devices that share an LBA-space will be grouped together, each separated by a single separator character, and each of the groups in turn separated by two separator characters. For instance, below is an example where sg1 and sg2 share an LBA-space, while sg3 and sg4 share a different LBA-space:

dbms.memory.pagecache.swapper.capi.device=/dev/sg1:/dev/sg2::/dev/sg3:/dev/sg4
                                          └───────┬───────┘  └───────┬───────┘
                                              device a           device b

Regardless of how the devices are combined, all devices must still have exactly the same capacity. Also note that the above example only used the ephemeral port names to keep the diagram compact.
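
The grouping rules for the device specifier can be expressed as a small parser. The following is a sketch under the rules stated in this section (a double separator between LBA-spaces, a single separator between ports that share one); the function names are hypothetical and this is not the parser used by the integration library. The separator defaults to the colon used on non-Windows platforms.

```python
def parse_device_specifier(specifier, separator=":"):
    """Split a device specifier into groups of ports sharing an LBA-space."""
    groups = []
    for group in specifier.split(separator * 2):
        ports = group.split(separator)
        if not all(ports):
            raise ValueError("empty path in device specifier: %r" % specifier)
        groups.append(ports)
    return groups


def build_device_specifier(groups, separator=":"):
    """Inverse of parse_device_specifier."""
    return (separator * 2).join(separator.join(ports) for ports in groups)
```

For the combined example above, parse_device_specifier("/dev/sg1:/dev/sg2::/dev/sg3:/dev/sg4") yields two groups of two ports each, and the number of groups is the number of stripes that are interleaved.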

2.6.2. Limitations

Not all features of Neo4j are yet fully compatible with CAPI Flash as storage. The following is a list of features that are not supported:

Dump and load facilities of the neo4j-admin command

The admin commands neo4j-admin dump and neo4j-admin load currently do not work with databases that are stored on CAPI Flash. The block device integration library provides other commands that can be used instead, but they are admittedly not quite as convenient at this time.

Changing the dbms.memory.pagecache.swapper parameters

Once the database has been started with a particular configuration, the dbms.memory.pagecache.swapper parameter cannot be changed. If you do so anyway, the database will log an error, and refuse to start.

Additionally, the nature of the block device integration technology itself has the following limitations:

  • The dbms.memory.pagecache.swapper.* configurations describe where the storage is located, and how it is put together. They cannot be changed (not even the order of devices in the device specifier can be modified) without formatting (neo4j-admin blockdev format) the devices afterwards. If the configuration is changed without reformatting the devices, then Neo4j will log an error and refuse to start.
  • All of the devices given in the device specifier must have exactly the same capacity. Specifically, they must all have the same block size, and they must all have the same number of blocks. The easiest way to ensure this is to use devices of the same make and model for all of them.
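
The capacity rule can be illustrated with a short calculation. This sketch assumes that the block size and block count of each LBA-space have already been obtained by other means; the function name and the figures used with it are made up for illustration.

```python
def logical_capacity_bytes(group_geometries):
    """Capacity of the combined logical device.

    group_geometries holds one (block_size, block_count) tuple per LBA-space.
    Raises ValueError when the devices do not all share the same geometry.
    """
    if not group_geometries:
        raise ValueError("at least one device is required")
    if len(set(group_geometries)) != 1:
        raise ValueError("all devices must have the same block size and block count")
    block_size, block_count = group_geometries[0]
    return len(group_geometries) * block_size * block_count
```

Two devices with 4096-byte blocks and 1000 blocks each would give a logical device of 8192000 bytes, while mixing in a device with a different block size or block count is rejected.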

2.6.3. Admin commands for CAPI Flash

The Neo4j Block Device Integration Library adds the following commands to the neo4j-admin utility:

  • neo4j-admin blockdev help prints a help message for all the block device specific admin commands.
  • neo4j-admin blockdev format formats the configured block device with DBFS. This removes all data on the device.
  • neo4j-admin blockdev ls gives a listing of the files stored on the configured block device.
  • neo4j-admin blockdev fsck checks the consistency of the DBFS file system metadata.
  • neo4j-admin blockdev import <from-database-path> imports the given existing database onto the configured block device storage.
  • neo4j-admin blockdev dump <file> dumps the binary contents of the given file on the block device to standard out.
  • neo4j-admin blockdev rename <source-path> <target-path> moves everything on the block device from the given source path to the target path.

Never use neo4j-admin commands on a running database unless they are explicitly documented to support this.

All of these commands require that the neo4j-blockdevice-*.jar file has been installed in the <neo4j-install-dir>/lib directory, and that the neo4j.conf file has been properly configured to use CAPI Flash.

If the block device integration has not been properly configured, then the following error message will be shown:

$neo4j-home> bin/neo4j-admin blockdev help
neo4j-admin blockdev <sub-command> [options]

    Configure, inspect and administrate the Neo4j block-device integration.
    Use the 'help' sub-command for more information.

This database has not been configured to use custom block device storage

Once the block device integration parameters have been configured in neo4j.conf, the help command will be more useful:

$neo4j-home> bin/neo4j-admin blockdev help
neo4j-admin blockdev <sub-command> [options]

    Configure, inspect and administrate the Neo4j block-device integration.
    Use the 'help' sub-command for more information.

The following sub-commands are available:

  * help
    Print this help message.

  * format
    Format the block device with a file system, erasing all data on it.

  * ls
    List the files stored on the block device, and their size.

  * fsck
    Check the consistency of the file system metadata on the block device.

  * import <from-database-path>
    Import an existing Neo4j database from the given path on the local file
    system, onto the block device at the same path.

  * dump <file>
    Dump the binary contents of the given file on the block device to standard
    out.

  * rename <source-path> <target-path>
    Move everything on device from source path to target path. The 'move' is
    effectively only a name change of the files. This is useful when performing
    a database-wide file move operation.
$neo4j-home>

The format command

The format command formats the configured device with DBFS, the file system that is embedded with the Neo4j block device integration library.

This command must be called after completing the configuration of dbms.memory.pagecache.swapper* in neo4j.conf, but before starting Neo4j. The command effectively removes all data on the configured device and prepares a clean file system for the database.

Below is an example showing the output:

$neo4j-home> bin/neo4j-admin blockdev format
Done! Device has been formatted with DBFS: /dev/sg1::/dev/sg2
$neo4j-home>

The ls command

The ls command lists the files stored on the configured block device. After a format, the device will be empty:

$neo4j-home> bin/neo4j-admin blockdev ls
Total: 0 files, 0 bytes

After the database has been started we can see that some files have been created:

$neo4j-home> bin/neo4j-admin blockdev ls
/neo4j-home/data/databases/graph.db/neostore.nodestore.db.labels            4096 bytes
/neo4j-home/data/databases/graph.db/neostore.nodestore.db                   4096 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db.index.keys    4096 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db.index         4096 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db.strings       4096 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db.arrays        24576 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db               16384 bytes
/neo4j-home/data/databases/graph.db/neostore.relationshipstore.db           12288 bytes
/neo4j-home/data/databases/graph.db/neostore.relationshiptypestore.db.names 4096 bytes
/neo4j-home/data/databases/graph.db/neostore.relationshiptypestore.db       4096 bytes
/neo4j-home/data/databases/graph.db/neostore.labeltokenstore.db.names       4096 bytes
/neo4j-home/data/databases/graph.db/neostore.labeltokenstore.db             4096 bytes
/neo4j-home/data/databases/graph.db/neostore.schemastore.db                 4096 bytes
/neo4j-home/data/databases/graph.db/neostore.relationshipgroupstore.db      4096 bytes
/neo4j-home/data/databases/graph.db/neostore                                4096 bytes
Total: 15 files, 102400 bytes

The ls command will list the absolute paths of all the files on the device, without regard for your current working directory. This is because it is operating on a file system that is unrelated to, and disconnected from, your normal file system.

The fsck command

The fsck command checks the DBFS file system metadata to verify that it is consistent. If you experience an apparent inconsistency with data in a CAPI Flash database installation, it is advisable to run this command before doing a consistency check on the graph data itself. The fsck command is very fast compared to neo4j-admin check-consistency, and the latter will not be meaningful if fsck reports failures.

A passing fsck looks like this:

$neo4j-home> bin/neo4j-admin blockdev fsck
DBFS file system is consistent!

If fsck reports any errors, they are unfortunately not user-repairable. Instead, send the fsck output, along with neo4j.conf and debug.log, to Neo4j support.

The import command

The import command imports an existing database onto the block device storage while keeping its location the same. This means that the imported store files will have the same file names and paths on the block device as they did on the normal file system. This allows Neo4j to start up with the database immediately after the import.

Say, for instance, that you already have some data in the default graph.db database and would like to store it on block device storage. After having configured Neo4j to use block device storage, you can import it with the bin/neo4j-admin blockdev import data/databases/graph.db command:

$neo4j-home> bin/neo4j-admin blockdev import data/databases/graph.db
2016-11-24 17:53:21.994+0000 INFO  [o.n.i.p.PageCache] Configured dbms.memory.pagecache.swapper: capi
neostore: 1/1. Done!
neostore.propertystore.db.arrays: 2/2. Done!
neostore.propertystore.db.index.keys: 1/1. Done!
neostore.labeltokenstore.db: 0/0. Done!
neostore.propertystore.db.strings: 1/1. Done!
neostore.nodestore.db: 2/2. Done!
neostore.relationshiptypestore.db.names: 1/1. Done!
neostore.propertystore.db.index: 0/0. Done!
neostore.labeltokenstore.db.names: 1/1. Done!
neostore.nodestore.db.labels: 0/0. Done!
neostore.relationshiptypestore.db: 0/0. Done!
neostore.schemastore.db: 0/0. Done!
neostore.relationshipgroupstore.db: 10/10. Done!
neostore.propertystore.db: 57/57. Done!
neostore.relationshipstore.db: 6074/6074. Done!
Done, PT0.628S.

The import command reports its progress as a percentage, and in terms of blocks.

The dump command

The dump command takes an absolute path to a file on the block device storage, and writes its binary contents to the standard output stream. This can be used as a rudimentary export feature. The typical usage is to pipe the output into a file. Any error messages (for example, when the file name is mistyped) are written to the standard error stream, so they will not be hidden when the standard output is sent into a pipe.

In the example below, the relationship store file is dumped, piped through gzip to be compressed, and then written to a file in a backup directory:

$neo4j-home> bin/neo4j-admin blockdev dump /neo4j-home/data/databases/graph.db/neostore.relationshipstore.db | gzip > /var/backup/neostore.relationshipstore.db.gz
$neo4j-home>

The rename command

The rename command can be used to rename or move files on the block device storage. It takes a source-path parameter, which is used to match from the start of the absolute paths of files on the block device, and a target-path parameter that will replace the source-path portion of all matching paths. The matching is done on a path-element basis, so if we, for instance, want to rename the /neo4j-home directory to /home, we have to spell out the whole neo4j-home path element, e.g. rename /neo4j-home /home. Just saying rename /neo4j- / will not work.

To illustrate, we can use rename to change the name of the database directory from the default graph.db to, say, example.db. If we have the following files on the block device storage:

$neo4j-home> bin/neo4j-admin blockdev ls
/neo4j-home/data/databases/graph.db/neostore.nodestore.db.labels            4096 bytes
/neo4j-home/data/databases/graph.db/neostore.nodestore.db                   4096 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db.index.keys    4096 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db.index         4096 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db.strings       4096 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db.arrays        24576 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db               16384 bytes
/neo4j-home/data/databases/graph.db/neostore.relationshipstore.db           12288 bytes
/neo4j-home/data/databases/graph.db/neostore.relationshiptypestore.db.names 4096 bytes
/neo4j-home/data/databases/graph.db/neostore.relationshiptypestore.db       4096 bytes
/neo4j-home/data/databases/graph.db/neostore.labeltokenstore.db.names       4096 bytes
/neo4j-home/data/databases/graph.db/neostore.labeltokenstore.db             4096 bytes
/neo4j-home/data/databases/graph.db/neostore.schemastore.db                 4096 bytes
/neo4j-home/data/databases/graph.db/neostore.relationshipgroupstore.db      4096 bytes
/neo4j-home/data/databases/graph.db/neostore                                4096 bytes
Total: 15 files, 102400 bytes

Then our rename command can be given by:

$neo4j-home> bin/neo4j-admin blockdev rename  /neo4j-home/data/databases/graph.db /neo4j-home/data/databases/example.db
rename from /neo4j-home/data/databases/graph.db/neostore.relationshiptypestore.db
rename to   /neo4j-home/data/databases/example.db/neostore.relationshiptypestore.db
rename from /neo4j-home/data/databases/graph.db/neostore.nodestore.db.labels
rename to   /neo4j-home/data/databases/example.db/neostore.nodestore.db.labels
rename from /neo4j-home/data/databases/graph.db/neostore.labeltokenstore.db.names
rename to   /neo4j-home/data/databases/example.db/neostore.labeltokenstore.db.names
rename from /neo4j-home/data/databases/graph.db/neostore.propertystore.db.arrays
rename to   /neo4j-home/data/databases/example.db/neostore.propertystore.db.arrays
rename from /neo4j-home/data/databases/graph.db/neostore.propertystore.db.strings
rename to   /neo4j-home/data/databases/example.db/neostore.propertystore.db.strings
rename from /neo4j-home/data/databases/graph.db/neostore.relationshiptypestore.db.names
rename to   /neo4j-home/data/databases/example.db/neostore.relationshiptypestore.db.names
rename from /neo4j-home/data/databases/graph.db/neostore.propertystore.db.index
rename to   /neo4j-home/data/databases/example.db/neostore.propertystore.db.index
rename from /neo4j-home/data/databases/graph.db/neostore.labeltokenstore.db
rename to   /neo4j-home/data/databases/example.db/neostore.labeltokenstore.db
rename from /neo4j-home/data/databases/graph.db/neostore.schemastore.db
rename to   /neo4j-home/data/databases/example.db/neostore.schemastore.db
rename from /neo4j-home/data/databases/graph.db/neostore.nodestore.db
rename to   /neo4j-home/data/databases/example.db/neostore.nodestore.db
rename from /neo4j-home/data/databases/graph.db/neostore.propertystore.db.index.keys
rename to   /neo4j-home/data/databases/example.db/neostore.propertystore.db.index.keys
rename from /neo4j-home/data/databases/graph.db/neostore
rename to   /neo4j-home/data/databases/example.db/neostore
rename from /neo4j-home/data/databases/graph.db/neostore.relationshipstore.db
rename to   /neo4j-home/data/databases/example.db/neostore.relationshipstore.db
rename from /neo4j-home/data/databases/graph.db/neostore.propertystore.db
rename to   /neo4j-home/data/databases/example.db/neostore.propertystore.db
rename from /neo4j-home/data/databases/graph.db/neostore.relationshipgroupstore.db
rename to   /neo4j-home/data/databases/example.db/neostore.relationshipgroupstore.db

The rename command can also be used on individual files, by providing the complete absolute path for the given file.
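
The path-element matching described for rename can be modelled as follows. This is an illustrative model of the documented behaviour only, not the library's implementation, and the function name rename_path is hypothetical.

```python
def rename_path(path, source, target):
    """Return the renamed path, or the path unchanged if source does not match.

    Matching is done on whole path elements from the start of the path.
    """
    path_parts = path.strip("/").split("/")
    source_parts = source.strip("/").split("/")
    if path_parts[:len(source_parts)] != source_parts:
        return path  # source is not a whole-element prefix of this path
    remainder = path_parts[len(source_parts):]
    return "/".join([target.rstrip("/")] + remainder)
```

In this model, renaming /neo4j-home to /home turns /neo4j-home/data into /home/data, whereas a source of /neo4j- leaves the path untouched because neo4j- is not a complete path element.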

2.6.4. Upgrading from 3.1.0 or 3.1.1

If upgrading from any version other than 3.1.0 or 3.1.1, please refer to the standard upgrade procedures.

If upgrading from 3.1.0 or 3.1.1 to any later version, some manual steps are necessary due to a DBFS format change. See below for details.

  1. Cleanly shut down the database if it is running.
  2. Back up Neo4j using your regular backup method.
  3. Now take a backup of all your CAPI-hosted store files with the neo4j-admin blockdev dump command, using the pre-upgrade version of Neo4j (3.1.0 or 3.1.1).
  4. Install Neo4j 3.4.10 and drop the 3.4.10 version of neo4j-blockdevice-*.jar into the lib directory (see Section 3.2, “File locations”).
  5. Review the settings in the configuration files of the previous installation and transfer any custom settings to the 3.4.10 installation. In particular, transfer the CAPI Flash-specific configuration settings dbms.memory.pagecache.swapper, dbms.memory.pagecache.swapper.capi.lib, and dbms.memory.pagecache.swapper.capi.device in neo4j.conf. If databases are stored in a custom location, configure dbms.directories.data to point to it. If the database is not called graph.db, set dbms.active_database in neo4j.conf to the name of the database.

The following steps are performed using the newly installed Neo4j 3.4.10.

  1. Use neo4j-admin blockdev format to reformat to the new DBFS version.
  2. Use neo4j-admin blockdev import to re-import the store files you backed up in step 3 above.
  3. If necessary, use the neo4j-admin blockdev rename command to ensure that the store files regain their original path names.
  4. Set dbms.allow_upgrade=true in neo4j.conf. Neo4j will fail to start without this configuration.
  5. Start up Neo4j. The database upgrade will take place during startup. Information about the upgrade and a progress indicator are logged into debug.log.
  6. When upgrade has finished, dbms.allow_upgrade should be set to false or be removed from neo4j.conf.
  7. It is good practice to make a full backup immediately after the upgrade.
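
Backing up every CAPI-hosted store file with neo4j-admin blockdev dump (step 3 of the first list above) means issuing one dump per file. The sketch below generates those command lines from the output of neo4j-admin blockdev ls. It assumes the ls output format shown in Section 2.6.3 and file paths without whitespace; the function names and the backup directory are hypothetical. It only builds the command strings and does not run them.

```python
import shlex


def parse_ls_output(text):
    """Return (path, size_in_bytes) pairs from 'neo4j-admin blockdev ls' output."""
    files = []
    for line in text.splitlines():
        fields = line.split()
        # File rows look like: <absolute path> <size> bytes
        if len(fields) == 3 and fields[0].startswith("/") and fields[2] == "bytes":
            files.append((fields[0], int(fields[1])))
    return files


def dump_commands(ls_text, backup_dir):
    """Build one shell command per file that dumps it, gzip-compressed."""
    commands = []
    for path, _size in parse_ls_output(ls_text):
        # Files are named after the last path element of the source file.
        target = backup_dir.rstrip("/") + "/" + path.rsplit("/", 1)[-1] + ".gz"
        commands.append("bin/neo4j-admin blockdev dump %s | gzip > %s"
                        % (shlex.quote(path), shlex.quote(target)))
    return commands
```

Because the target file name is taken from the last path element only, files with the same name in different directories on the block device would overwrite each other; keep a separate backup directory per database directory in that case.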