GDS with Neo4j Causal Cluster

This feature is not available in AuraDS

It is possible to run GDS as part of a Neo4j Causal Cluster deployment. Since GDS performs large computations using the full resources of the system, it is not suitable to run on the cluster's Core members. Instead, a Read Replica instance is used to deploy the GDS library and process analytical workloads. Calls to GDS write procedures are internally directed to the cluster LEADER instance via server-side routing.

1. Deployment

Please refer to the official Neo4j documentation for details on how to set up a Neo4j Causal Cluster. Note that the link points to the documentation for the latest Neo4j version; configuration settings may differ in earlier versions.

  • The cluster must contain at least one Read Replica instance.

    • A single Core member and a Read Replica is a valid scenario.

    • GDS workloads are not load-balanced if there is more than one Read Replica instance.

  • The cluster should be configured to use server-side routing.

  • The GDS plugin must be deployed on the Read Replica.

    • A valid GDS Enterprise Edition license must be installed and configured on the Read Replica.

    • The driver connection used to operate GDS should be made using the bolt:// protocol, or routed to the Read Replica instance via a server policy.
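Taken together, the Read Replica's neo4j.conf might look like the following sketch. The setting names below are from the Neo4j 4.x line and may differ in other versions; the license path is a placeholder.

```properties
# Run this instance as a Read Replica in the Causal Cluster
dbms.mode=READ_REPLICA

# Enable server-side routing so GDS write procedures reach the LEADER
dbms.routing.enabled=true
dbms.routing.default_router=SERVER

# Allow the GDS procedures to run
dbms.security.procedures.unrestricted=gds.*

# GDS Enterprise Edition license (placeholder path)
gds.enterprise.license_file=/path/to/my/license/file
```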

For more information on setting up, configuring and managing a Neo4j Causal Cluster, please refer to the documentation.

When working with cluster configuration, be aware of Neo4j's strict config validation.

When configuring GDS for a Read Replica you will introduce GDS-specific settings into neo4j.conf. This is fine on the Read Replica: with the GDS plugin installed, Neo4j can validate those settings.

However, you may not be able to reuse the same configuration file verbatim on the Core cluster members. Since the GDS plugin is not installed there, Neo4j cannot validate the GDS-specific settings, and a validation failure means Neo4j will refuse to start.

It is of course also possible to turn strict validation off.
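For example, a Core member that shares a configuration file containing GDS settings could disable strict validation. This is a sketch assuming the Neo4j 4.x setting name; check the name for your version.

```properties
# Do not fail startup on unrecognised settings (e.g. gds.* on a Core
# member without the GDS plugin installed); setting name as of Neo4j 4.x
dbms.config.strict_validation=false
```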

2. GDS Configuration

The following optional settings can be used to control transaction size.

Property                   Default

gds.cluster.tx.min.size    10000

gds.cluster.tx.max.size    100000

The batch size for writing node properties is computed from both values, together with the configured concurrency and the total node count. The batch size for writing relationships uses the lower of the two values. Some procedures support a batch size configuration in the procedure call parameters, which takes precedence if present.
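As an illustration of how such a computation could look, the sketch below spreads the node count across the writer threads and clamps the result between the two settings; the relationship batch size is simply the lower setting. This is a hypothetical reconstruction for clarity, not the GDS implementation.

```python
import math

# Defaults of the two GDS cluster settings shown above
GDS_CLUSTER_TX_MIN_SIZE = 10_000
GDS_CLUSTER_TX_MAX_SIZE = 100_000

def node_property_batch_size(node_count: int, write_concurrency: int,
                             min_size: int = GDS_CLUSTER_TX_MIN_SIZE,
                             max_size: int = GDS_CLUSTER_TX_MAX_SIZE) -> int:
    """Hypothetical rule: divide nodes across writer threads, then clamp
    the per-transaction size into the [min_size, max_size] range."""
    per_thread = math.ceil(node_count / write_concurrency)
    return max(min_size, min(max_size, per_thread))

def relationship_batch_size(min_size: int = GDS_CLUSTER_TX_MIN_SIZE,
                            max_size: int = GDS_CLUSTER_TX_MAX_SIZE) -> int:
    """Relationship writes use the lower of the two settings."""
    return min(min_size, max_size)

# 1M nodes over 4 writer threads -> 250_000, clamped to the 100_000 maximum
print(node_property_batch_size(1_000_000, 4))   # 100000
# 20k nodes over 4 threads -> 5_000, raised to the 10_000 minimum
print(node_property_batch_size(20_000, 4))      # 10000
print(relationship_batch_size())                # 10000
```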