Configure Apache Arrow server

GDS supports importing graphs and exporting properties via Apache Arrow Flight. This chapter is dedicated to configuring the Arrow Flight Server as part of the Neo4j and GDS installation. For using Arrow Flight with an Arrow client, please refer to our documentation for projecting graphs and streaming properties.

The simplest way to use Arrow is through our Neo4j Graph Data Science Client, which uses Arrow by default if available.

Arrow is bundled with GDS Enterprise Edition which must be installed.

Installation

Arrow is installed by default on Neo4j AuraDS.

On a standalone Neo4j Server, Arrow needs to be explicitly enabled and configured. The Flight Server is disabled by default, to enable it, add the following to your $NEO4J_HOME/conf/neo4j.conf file:

gds.arrow.enabled=true

The following additional settings are available:

Name Default Optional Description

gds.arrow.listen_address

localhost:8491

Yes

This setting specifies how the Arrow Flight Server listens for incoming connections. It consists of two parts; an IP address (e.g. 127.0.0.1 or 0.0.0.0) and a port number (e.g. 7687), and is expressed in the format <ip-address>:<port-number>.

gds.arrow.advertised_listen_address

localhost:8491

Yes

This setting specifies the address that clients should use for connecting to the Arrow Flight Server. This is useful if the server runs behind a proxy that forwards the advertised address to an internal address. The advertised address consists of two parts; an address (fully qualified domain name, hostname, or IP address) and a port number (e.g. 8491), and is expressed in the format <address>:<port-number>.

gds.arrow.abortion_timeout

10

Yes

The maximum time in minutes to wait for the next command before aborting the import process.

gds.arrow.batch_size

10000

Yes

The batch size used for arrow property export.

Note, that any change to the configuration requires a database restart.

You can run CALL gds.debug.arrow() to check that Arrow is available.

Authentication

Client connections to the Arrow Flight server are authenticated using the Neo4j native auth provider. Any authenticated user can perform all available Arrow operations, i.e., graph projection and property streaming. There are no dedicated roles to configure.

To enable authentication, use the following DBMS setting:

dbms.security.auth_enabled=true

Encryption

Communication between client and server can optionally be encrypted. The Arrow Flight server is re-using the Neo4j native SSL framework. In terms of configuration scope, the Arrow Server supports https and bolt. If both scopes are configured, the Arrow Server prioritizes the https scope.

To enable encryption for https, use the following DBMS settings:

dbms.ssl.policy.https.enabled=true
dbms.ssl.policy.https.private_key=private.key
dbms.ssl.policy.https.public_certificate=public.crt

It is currently not possible to use a certificate where the private key is protected by a password. Such a certificate can be used to secure Neo4j. For Arrow Flight, only certificates with a password-less private key are accepted.

Flight server encryption can also be deactivated, even if it is configured for Neo4j. To disable encryption, use the following settings:

gds.arrow.encryption.never=true

The setting can only used to deactivate encryption for the GDS Flight server. It cannot be used to deactivate encryption for the Neo4j server. It cannot be used to activate encryption for the GDS Flight server if the Neo4j server has no encryption configured.

Monitoring

To return details about the status of the GDS Flight server, GDS provides the gds.debug.arrow procedure.

Run the debug procedure.
CALL gds.debug.arrow()
YIELD
  running: Boolean,
  enabled: Boolean,
  listenAddress: String,
  batchSize: Integer,
  abortionTimeout: Integer
Table 1. Results
Name Type Description

running

Boolean

True, if the Arrow Flight Server is currently running.

enabled

Boolean

True, if the corresponding setting is enabled.

versions

List

A list of supported command versions (e.g. ["v1", "v2"]).

listenAddress

String

The address (host and port) the Arrow Flight Client should connect to.

batchSize

Integer

The batch size used for arrow property export.

abortionTimeout

Duration

The maximum time to wait for the next command before aborting the import process.

advertisedListenAddress

String

DEPRECATED: Same as listenAddress.

serverLocation

String

DEPRECATED: Always NULL.

Versioning

All features that the GDS Arrow Flight server exposes are versioned. This allows us to make changes to existing features, introduce new ones or remove deprecated ones without breaking existing clients. The versioning scheme is applied to the commands that the client sends to the server. A command is a GDS-specific abstraction over Arrow Flight Actions, Descriptors and Tickets.

Commands are sent by the client as UTF-8-encoded JSON documents. Each command is associated with additional meta-data, such as the version of the command.

{
    name: "MY_COMMAND",
    version: "v1",
    body: {
        ...
    }
}

The only exception from that are Flight Actions, where the version is part of the action type. The version is always at the beginning of the action type, separated by a forward slash (/).

Action type: V1/CREATE_GRAPH
Action body: {
    ...
}

All available actions can be requested from the GDS Arrow Flight Server by using the LIST_ACTIONS endpoint.

Up until GDS 2.6, commands were not versioned as GDS Arrow features were still in alpha. In GDS 2.6, the GDS Arrow server supports both, versioned and prior alpha commands. Alpha commands are considered deprecated for deletion and will be removed in a future release.