Apache Arrow

This chapter explains how to set up Apache Arrow™ in the Neo4j Graph Data Science library.

This feature is in the alpha tier.

This feature is not available in AuraDS.

GDS supports importing graphs and exporting properties via Apache Arrow Flight. This chapter is dedicated to configuring the Arrow Flight Server as part of the Neo4j and GDS installation. For using Arrow Flight with an Arrow client, please refer to our documentation for projecting graphs and streaming properties.

Arrow is bundled with GDS Enterprise Edition, which must be installed.

1. Installation

On a standalone Neo4j Server, Arrow needs to be explicitly enabled and configured. The Flight Server is disabled by default. To enable it, add the following to your $NEO4J_HOME/conf/neo4j.conf file:

gds.arrow.enabled=true

The following additional settings are available:

Name | Default | Optional | Description
gds.arrow.listen_address | localhost:8491 | Yes | Address the GDS Arrow Flight Server should bind to.
gds.arrow.abortion_timeout | 10 | Yes | The maximum time in minutes to wait for the next command before aborting the import process.
gds.arrow.batch_size | 10000 | Yes | The batch size used for Arrow property export.

Note that any change to the configuration requires a database restart.
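As a quick sanity check before restarting, the effective Arrow settings can be derived from the configuration file. The following is a minimal sketch, not part of GDS: the helper name effective_arrow_settings and the example configuration text are hypothetical, the defaults are copied from the table above, and letting the last occurrence of a key win is an assumption of the sketch rather than documented Neo4j behavior.

```python
# Defaults copied from the settings table; gds.arrow.enabled is "false"
# because the Flight Server is disabled unless explicitly enabled.
DEFAULTS = {
    "gds.arrow.enabled": "false",
    "gds.arrow.listen_address": "localhost:8491",
    "gds.arrow.abortion_timeout": "10",
    "gds.arrow.batch_size": "10000",
}


def effective_arrow_settings(conf_text: str) -> dict:
    """Return the gds.arrow.* settings found in neo4j.conf text, with defaults.

    Hypothetical helper for illustration; it is not a GDS or Neo4j API.
    """
    settings = dict(DEFAULTS)
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        # Only the Arrow settings listed above are tracked; a later
        # occurrence overrides an earlier one (assumption of this sketch).
        if sep and key in settings:
            settings[key] = value.strip()
    return settings


# Hypothetical excerpt of $NEO4J_HOME/conf/neo4j.conf
example_conf = """
# enable the GDS Arrow Flight Server
gds.arrow.enabled=true
gds.arrow.batch_size=50000
"""

arrow_settings = effective_arrow_settings(example_conf)
for name, value in sorted(arrow_settings.items()):
    print(f"{name}={value}")
```

In practice, the text would be read from the actual neo4j.conf file. Settings that are absent from the file fall back to the defaults listed in the table.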

2. Authentication

Client connections to the Arrow Flight server are authenticated using the Neo4j native auth provider. Any authenticated user can perform all available Arrow operations, i.e., graph projection and property streaming. There are no dedicated roles to configure.

To enable authentication, use the following DBMS setting:

dbms.security.auth_enabled=true

3. Encryption

Communication between client and server can optionally be encrypted. The Arrow Flight server reuses the Neo4j native SSL framework. In terms of configuration scope, the Arrow Server supports https and bolt. If both scopes are configured, the Arrow Server prioritizes the https scope.

To enable encryption for https, use the following DBMS settings:

dbms.ssl.policy.https.enabled=true
dbms.ssl.policy.https.private_key=private.key
dbms.ssl.policy.https.public_certificate=public.crt
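The scope priority described in this section can be illustrated with a short sketch. It is not the server's actual implementation: the function name arrow_ssl_scope is hypothetical, settings are assumed to be available as a plain dictionary of strings, and a scope is treated as "configured" when its enabled flag is set to true.

```python
from typing import Optional


def arrow_ssl_scope(settings: dict) -> Optional[str]:
    """Pick the SSL policy scope in the documented priority: https, then bolt.

    Hypothetical helper for illustration only. A scope counts as configured
    here when dbms.ssl.policy.<scope>.enabled is "true" (an assumption).
    """
    for scope in ("https", "bolt"):
        flag = settings.get(f"dbms.ssl.policy.{scope}.enabled", "false")
        if flag.strip().lower() == "true":
            return scope
    # Neither supported scope is enabled: communication stays unencrypted.
    return None


both_scopes = {
    "dbms.ssl.policy.https.enabled": "true",
    "dbms.ssl.policy.bolt.enabled": "true",
}
bolt_only = {"dbms.ssl.policy.bolt.enabled": "true"}

print(arrow_ssl_scope(both_scopes))
print(arrow_ssl_scope(bolt_only))
print(arrow_ssl_scope({}))
```

With both scopes enabled, the sketch selects https; with only bolt enabled, it falls back to bolt; with neither, it returns None.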

4. Monitoring

To return details about the status of the GDS Flight server, GDS provides the gds.debug.arrow procedure.

Run the debug procedure.
CALL gds.debug.arrow()
YIELD
  running: Boolean,
  enabled: Boolean,
  listenAddress: String,
  batchSize: Integer,
  abortionTimeout: Integer
Table 1. Results
Name | Type | Description
running | Boolean | True, if the Arrow Flight Server is currently running.
enabled | Boolean | True, if the corresponding setting is enabled.
listenAddress | String | Address (host and port) the Arrow Flight Server is bound to.
batchSize | Integer | The batch size used for Arrow property export.
abortionTimeout | Duration | The maximum time to wait for the next command before aborting the import process.
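The result columns can be combined into a simple health check. The sketch below assumes that the single result row of gds.debug.arrow has already been fetched by some Neo4j client and converted to a dictionary keyed by the column names above. The function describe_arrow_status and the sample rows are hypothetical and only illustrate how the running and enabled fields relate.

```python
def describe_arrow_status(row: dict) -> str:
    """Summarize one result row of gds.debug.arrow (hypothetical helper)."""
    if not row["enabled"]:
        # The setting gds.arrow.enabled is off, so the server cannot be running.
        return "disabled: set gds.arrow.enabled=true and restart the database"
    if not row["running"]:
        # Enabled in the configuration, but the server did not come up.
        return "enabled but not running"
    return f"running on {row['listenAddress']} (batch size {row['batchSize']})"


# Hypothetical rows, shaped like the YIELD columns of gds.debug.arrow.
healthy_row = {
    "running": True,
    "enabled": True,
    "listenAddress": "localhost:8491",
    "batchSize": 10000,
    "abortionTimeout": 10,
}
disabled_row = dict(healthy_row, running=False, enabled=False)

print(describe_arrow_status(healthy_row))
print(describe_arrow_status(disabled_row))
```

A row with enabled set to true and running set to false indicates that the setting is active but the server is not up, which is the case worth investigating first.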