Apache Arrow

This feature is in the alpha tier. For more information on feature tiers, see Operations reference.

GDS supports importing graphs and exporting properties via Apache Arrow Flight. This chapter is dedicated to configuring the Arrow Flight Server as part of the Neo4j and GDS installation. For using Arrow Flight with an Arrow client, please refer to our documentation for projecting graphs and streaming properties.

Arrow is bundled with GDS Enterprise Edition, which must be installed.

1. Installation

On a standalone Neo4j Server, Arrow needs to be explicitly enabled and configured. The Flight Server is disabled by default. To enable it, add the following to your $NEO4J_HOME/conf/neo4j.conf file:

gds.arrow.enabled=true

The following additional settings are available:

| Name | Default | Optional | Description |
|---|---|---|---|
| gds.arrow.listen_address | localhost:8491 | Yes | This setting specifies how the Arrow Flight Server listens for incoming connections. It consists of two parts: an IP address (e.g. 127.0.0.1 or 0.0.0.0) and a port number (e.g. 8491), and is expressed in the format <ip-address>:<port-number>. |
| gds.arrow.advertised_listen_address | localhost:8491 | Yes | This setting specifies the address that clients should use for connecting to the Arrow Flight Server. This is useful if the server runs behind a proxy that forwards the advertised address to an internal address. The advertised address consists of two parts: an address (fully qualified domain name, hostname, or IP address) and a port number (e.g. 8491), and is expressed in the format <address>:<port-number>. |
| gds.arrow.abortion_timeout | 10 | Yes | The maximum time in minutes to wait for the next command before aborting the import process. |
| gds.arrow.batch_size | 10000 | Yes | The batch size used for Arrow property export. |

Note that any change to the configuration requires a database restart.
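As a rough illustration of how the settings above combine with their defaults, the following Python sketch reads neo4j.conf-style text and resolves the gds.arrow.* values. The helpers `parse_conf`, `split_address`, and `resolve_arrow_settings` are hypothetical names introduced here, not part of GDS or Neo4j, and the parser assumes plain `key=value` lines with `#` comments.

```python
# Hypothetical helpers, not part of GDS: resolve gds.arrow.* settings from
# neo4j.conf-style text, falling back to the defaults documented above.

ARROW_DEFAULTS = {
    "gds.arrow.enabled": "false",  # the Flight Server is disabled by default
    "gds.arrow.listen_address": "localhost:8491",
    "gds.arrow.advertised_listen_address": "localhost:8491",
    "gds.arrow.abortion_timeout": "10",  # minutes
    "gds.arrow.batch_size": "10000",
}


def parse_conf(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def split_address(value):
    """Split '<address>:<port-number>' into (address, port)."""
    address, _, port = value.rpartition(":")
    return address, int(port)


def resolve_arrow_settings(text):
    """Return the effective gds.arrow.* settings for the given conf text."""
    configured = parse_conf(text)
    return {key: configured.get(key, default)
            for key, default in ARROW_DEFAULTS.items()}


conf = """
# enable the Arrow Flight Server
gds.arrow.enabled=true
gds.arrow.listen_address=0.0.0.0:8491
"""
effective = resolve_arrow_settings(conf)
host, port = split_address(effective["gds.arrow.listen_address"])
```

Settings that are absent from the file keep their documented defaults, so `effective` always contains all five keys.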

2. Authentication

Client connections to the Arrow Flight server are authenticated using the Neo4j native auth provider. Any authenticated user can perform all available Arrow operations, i.e., graph projection and property streaming. There are no dedicated roles to configure.

To enable authentication, use the following DBMS setting:

dbms.security.auth_enabled=true

3. Encryption

Communication between client and server can optionally be encrypted. The Arrow Flight server reuses the Neo4j native SSL framework. In terms of configuration scope, the Arrow Server supports https and bolt. If both scopes are configured, the Arrow Server prioritizes the https scope.

To enable encryption for https, use the following DBMS settings:

dbms.ssl.policy.https.enabled=true
dbms.ssl.policy.https.private_key=private.key
dbms.ssl.policy.https.public_certificate=public.crt

It is currently not possible to use a certificate whose private key is protected by a password. Such a certificate can still be used to secure Neo4j itself, but for Arrow Flight only certificates with a password-less private key are accepted.
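One way to catch a password-protected key before restarting the database is to inspect the key file's text. The sketch below assumes the private key is stored in PEM format and relies only on the standard PEM markers for encrypted keys. The function `is_password_protected` is a hypothetical helper, not something GDS or Neo4j provides, and the sample keys contain placeholder bodies rather than real key material.

```python
# Hypothetical pre-flight check, not part of GDS or Neo4j: detect whether a
# PEM-encoded private key is password protected by inspecting its text.

def is_password_protected(pem_text):
    """Return True if the PEM text looks like an encrypted private key."""
    # PKCS#8 encrypted keys use a dedicated PEM label.
    if "-----BEGIN ENCRYPTED PRIVATE KEY-----" in pem_text:
        return True
    # Traditional OpenSSL-format keys mark encryption in a header line.
    if "Proc-Type: 4,ENCRYPTED" in pem_text:
        return True
    return False


# Placeholder key bodies ("MIIB..."), used only to exercise the check.
plain_key = (
    "-----BEGIN PRIVATE KEY-----\n"
    "MIIB...\n"
    "-----END PRIVATE KEY-----\n"
)
locked_key = (
    "-----BEGIN ENCRYPTED PRIVATE KEY-----\n"
    "MIIB...\n"
    "-----END ENCRYPTED PRIVATE KEY-----\n"
)
```

A key for which this check returns True would need to be re-exported without a password before it can be used for Arrow Flight.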

Flight server encryption can also be deactivated, even if it is configured for Neo4j. To disable encryption, use the following setting:

gds.arrow.encryption.never=true

The setting can only be used to deactivate encryption for the GDS Flight server. It cannot be used to deactivate encryption for the Neo4j server, nor can it activate encryption for the GDS Flight server if the Neo4j server has no encryption configured.
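The encryption rules described in this section can be summarized as a small decision function. This is only a sketch of the documented behavior, not the server's actual implementation: `arrow_encryption_scope` is a hypothetical name, and the key `dbms.ssl.policy.bolt.enabled` is assumed by analogy with the https setting shown above.

```python
# Sketch of the documented precedence, not the server's actual code:
# which SSL policy scope (if any) the Arrow Flight server would use.

def arrow_encryption_scope(settings):
    """Return 'https', 'bolt', or None for a dict of conf settings."""

    def enabled(key):
        return settings.get(key, "false").strip().lower() == "true"

    # gds.arrow.encryption.never only ever turns encryption off.
    if enabled("gds.arrow.encryption.never"):
        return None
    # If both scopes are configured, https takes priority over bolt.
    if enabled("dbms.ssl.policy.https.enabled"):
        return "https"
    # Assumed setting name, mirroring the https policy key.
    if enabled("dbms.ssl.policy.bolt.enabled"):
        return "bolt"
    # No SSL policy configured for Neo4j means no encryption for Arrow.
    return None
```

For example, a configuration that enables both the https and bolt policies resolves to `"https"`, while adding `gds.arrow.encryption.never=true` resolves to `None`.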

4. Monitoring

To return details about the status of the GDS Flight server, GDS provides the gds.debug.arrow procedure.

Run the debug procedure.
CALL gds.debug.arrow()
YIELD
  running: Boolean,
  enabled: Boolean,
  listenAddress: String,
  batchSize: Integer,
  abortionTimeout: Integer
Table 1. Results
| Name | Type | Description |
|---|---|---|
| running | Boolean | True, if the Arrow Flight Server is currently running. |
| enabled | Boolean | True, if the corresponding setting is enabled. |
| listenAddress | String | The address (host and port) the Arrow Flight Client should connect to. |
| batchSize | Integer | The batch size used for Arrow property export. |
| abortionTimeout | Duration | The maximum time to wait for the next command before aborting the import process. |
| advertisedListenAddress | String | DEPRECATED: Same as listenAddress. |
| serverLocation | String | DEPRECATED: Always NULL. |
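A client-side health check could interpret one result row of the procedure as follows. This is a minimal sketch: `describe_arrow_status` is a hypothetical helper, the row is assumed to arrive as a dictionary keyed by the column names in the results table, and executing `STATUS_QUERY` against a database (for example through a Neo4j driver) is left out.

```python
# Hypothetical helper, not part of GDS: turn one row returned by
# gds.debug.arrow() into a short human-readable status message.

STATUS_QUERY = (
    "CALL gds.debug.arrow() "
    "YIELD running, enabled, listenAddress "
    "RETURN running, enabled, listenAddress"
)


def describe_arrow_status(row):
    """Summarize the Flight server state from a result row (a dict)."""
    if not row["enabled"]:
        return "disabled: set gds.arrow.enabled=true and restart the database"
    if not row["running"]:
        return "enabled but not running"
    return "running, clients should connect to " + row["listenAddress"]


# Example row, shaped like the results table; the values are illustrative.
example_row = {
    "running": True,
    "enabled": True,
    "listenAddress": "localhost:8491",
}
message = describe_arrow_status(example_row)
```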