GDS with Neo4j Causal Cluster

This section describes how the Neo4j Graph Data Science library can be used in a Neo4j Causal Cluster deployment.

This feature is not available in AuraDS.

GDS can run as part of a Neo4j Causal Cluster deployment. Because GDS performs large computations that use the full resources of the system, it is not suitable to run on the cluster’s Core members. Instead, the GDS library is deployed on a Read Replica instance, which processes the analytical workloads. Calls to GDS write procedures are internally directed to the cluster LEADER instance via server-side routing.

1. Deployment

  • The cluster must contain at least one Read Replica instance

    • A single Core member with one Read Replica is a valid scenario.

    • GDS workloads are not load-balanced if there is more than one Read Replica instance.

  • The cluster should be configured to use server-side routing.

  • The GDS plugin must be deployed on the Read Replica.

    • A valid GDS Enterprise Edition license must be installed and configured on the Read Replica.

    • The driver connection used to operate GDS should be made using the bolt:// protocol, or be routed to the Read Replica instance through a server policy.
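The direct connection described in the list above can be sketched as follows. This is a minimal illustration that assumes the official Neo4j Python driver is installed; the host name, the `is_direct_bolt_uri` helper and the `gds_version` function are hypothetical names introduced only for this example.

```python
from urllib.parse import urlparse

# Hypothetical address of the Read Replica that hosts the GDS plugin.
READ_REPLICA_URI = "bolt://read-replica.example.com:7687"


def is_direct_bolt_uri(uri: str) -> bool:
    """Return True for direct bolt schemes, False for routing schemes such as neo4j://."""
    return urlparse(uri).scheme in {"bolt", "bolt+s", "bolt+ssc"}


def gds_version(uri: str, user: str, password: str) -> str:
    """Connect directly to the Read Replica and return the installed GDS version."""
    if not is_direct_bolt_uri(uri):
        raise ValueError(f"expected a bolt:// URI, got {uri!r}")
    # Imported lazily so the URI check can be used without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            record = session.run("RETURN gds.version() AS version").single()
            return record["version"]
```

Calling `gds_version(READ_REPLICA_URI, "neo4j", "<password>")` against a running Read Replica would confirm that the session reaches the instance where GDS is installed. A neo4j:// URI is rejected by this sketch because it does not cover the server-policy routing alternative.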

For more information on setting up, configuring and managing a Neo4j Causal Cluster, please refer to the documentation.

2. GDS Configuration

The following optional settings can be used to control transaction size.

Property                   Default Value
gds.cluster.tx.min.size    10000
gds.cluster.tx.max.size    100000

The batch size for writing node properties is computed from both values, together with the configured concurrency and the total node count. The batch size for writing relationships is the lower of the two values. Some procedures support a batch size configuration parameter; if it is present in the procedure call, it takes precedence over these settings.
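The text above does not give the exact formula, so the following is only a sketch of one plausible reading: the node count is split evenly across the configured concurrency and the result is clamped between the two settings. The function names and the clamping rule are assumptions made for illustration, not the library's documented behaviour.

```python
import math


def node_property_batch_size(node_count: int, concurrency: int,
                             tx_min_size: int = 10_000,
                             tx_max_size: int = 100_000) -> int:
    """Assumed rule: nodes per thread, clamped to [tx_min_size, tx_max_size]."""
    per_thread = math.ceil(node_count / concurrency)
    return min(tx_max_size, max(tx_min_size, per_thread))


def relationship_batch_size(tx_min_size: int = 10_000,
                            tx_max_size: int = 100_000) -> int:
    """Relationship writes use the lower of the two settings."""
    return min(tx_min_size, tx_max_size)
```

Under this assumption, a graph with 200,000 nodes written with concurrency 4 would use batches of 50,000 nodes, while very small or very large graphs would fall back to the minimum or maximum setting respectively.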