Release Date: 10 October 2022
GDS 2.2.0 is compatible with Neo4j 4.3 versions ≥ 4.3.15 and 4.4 versions ≥ 4.4.9.


For GDS compatibility with previous releases of 4.3 and 4.4, please see GDS 2.1.6. The 2.1 series is also incompatible with Neo4j 3.5.x, 4.0, 4.1, and 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.8.

Breaking changes

  • Link Prediction filtering:
    • Change graph filtering in gds.beta.pipeline.linkPrediction.train
      • Replace parameter nodeLabels with sourceNodeLabel and targetNodeLabel.
      • Replace parameter relationshipTypes with targetRelationshipType.
    • Change graph filtering in gds.beta.pipeline.linkPrediction.predict
      • Replace parameter nodeLabels with optional sourceNodeLabels and targetNodeLabels. By default, they will be derived from the model's train configuration.
      • Change the default value for relationshipTypes to the targetRelationshipType from the model's train configuration.
  • Node Classification & Regression filtering:
    • Change graph filtering in gds.beta.pipeline.nodeClassification.train and gds.alpha.pipeline.nodeRegression.train
      • Replace parameter nodeLabels with targetNodeLabels.
    • Change graph filtering in gds.beta.pipeline.nodeClassification.predict and gds.alpha.pipeline.nodeRegression.predict
      • Replace parameter nodeLabels with targetNodeLabels. By default, they will be derived from the model's train configuration.
  • Promoting Collapse Path to beta tier
    • Changed the procedure name to gds.beta.collapsePath.mutate
    • Use parameter pathTemplates to specify multiple path templates.
  • Promoting CELF to beta tier
    • Moved gds.alpha.influenceMaximization.celf.stream to gds.beta.influenceMaximization.celf.stream
  • For graphs created with gds.graph.project.cypher, reduce the output of gds.graph.list to print only the names of parameters. This avoids printing the parameter values, which could lead to long procedure execution times.
  • RandomWalk algorithm promoted to product tier
    • gds.beta.randomWalk.stats => gds.randomWalk.stats
    • gds.beta.randomWalk.stats.estimate => gds.randomWalk.stats.estimate
    • gds.beta.randomWalk.stream => gds.randomWalk.stream
    • gds.beta.randomWalk.stream.estimate => gds.randomWalk.stream.estimate
  • Removed debug_log config field from Arrow Create Database action.
  • Node2Vec uses new embedding initializer NORMALIZED as default.
  • Dropped support for older patches:
    • For 4.3, only 4.3.15 and later are supported.
    • For 4.4, only 4.4.9 and later are supported.
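
The parameter rename for Node Classification and Node Regression training can be applied mechanically to existing configuration maps. The following is a minimal sketch, assuming the configuration is held as a Python dict before being passed to the procedure; the helper name migrate_node_pipeline_config and the example keys other than nodeLabels and targetNodeLabels are illustrative and not part of GDS.

```python
def migrate_node_pipeline_config(config):
    """Return a copy of a train/predict config with `nodeLabels`
    renamed to `targetNodeLabels` (hypothetical helper, not a GDS API)."""
    migrated = dict(config)  # shallow copy; the caller's dict is left untouched
    if "nodeLabels" in migrated:
        if "targetNodeLabels" in migrated:
            # Refuse to guess which of the two values should win.
            raise ValueError("config contains both nodeLabels and targetNodeLabels")
        migrated["targetNodeLabels"] = migrated.pop("nodeLabels")
    return migrated


# Illustrative pre-2.2 style configuration.
old_config = {"pipeline": "pipe", "modelName": "model", "nodeLabels": ["Person"]}
print(migrate_node_pipeline_config(old_config))
```

The Link Prediction changes are deliberately left out of this sketch: the old list-valued parameters do not map one-to-one onto sourceNodeLabel, targetNodeLabel and targetRelationshipType, so those values need to be chosen by hand.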

New features

  • Link Prediction filtering:
    • Support heterogeneous Link Prediction pipelines by allowing configuration of the node labels and relationship type to train and predict on.
    • See Breaking changes above for more details.
  • K-means:
    • Added centroids and average node-centroid distance to result for Mutate, Stats, and Write modes.
    • Added distance to centroid per node result in Stream mode.
    • Introduced a parameter numberOfRestarts that runs K-Means multiple times and picks the one with the minimum node-centroid distance.
    • Introduced a parameter computeSilhouette that, if enabled, computes silhouette-related metrics.
    • Introduced a parameter initialSampler which can select different sampling strategies for picking the first centroids.
      • Added the K-means++ initialization algorithm which can be enabled by setting initialSampler=kmeans++.
    • Introduced the parameter seedCentroids, which supplies user-defined initial centroids to K-means (as an alternative to the initialSampler strategies above).
  • Introduced a new scaler Center for ScaleProperties that subtracts the mean from each value.
  • Expose penaltyL2 to configure the L2 regularization term of the loss function in gds.beta.graphSage.train.
  • Add Multilayer Perceptron as a training method for node classification (gds.alpha.pipeline.nodeClassification.addMLP) and link prediction (gds.alpha.pipeline.linkPrediction.addMLP).
  • Add SAME_CATEGORY feature type to gds.beta.pipeline.linkPrediction.addFeature.
  • Added new procedure gds.beta.graph.relationships.stream that streams relationship topology.
  • Added Arrow export endpoint gds.beta.graph.relationships.stream that streams relationship topology.
  • Added new procedure gds.alpha.graph.sample.rwr that creates a new graph projection by sampling using random walk with restarts.
  • Added the ability to collapse multiple paths using gds.beta.collapsePath.mutate.
  • Promoting CELF algorithm to beta tier.
    • Added gds.beta.influenceMaximization.celf.stats
    • Added gds.beta.influenceMaximization.celf.mutate
    • Added gds.beta.influenceMaximization.celf.write
    • Added progress tracking capabilities.
    • Added memory estimation.
  • Progress tracking for KMeans algorithm.
  • Memory estimation for KMeans.
    • Added gds.alpha.kmeans.mutate.estimate
    • Added gds.alpha.kmeans.stats.estimate
    • Added gds.alpha.kmeans.stream.estimate
    • Added gds.alpha.kmeans.write.estimate
  • Added procedure to compute modularity for pre-computed communities.
    • gds.alpha.modularity.stats
    • gds.alpha.modularity.stream
  • Added new config options to the GDS Flight server.
    • gds.arrow.encryption.never deactivates the server encryption even if it would otherwise be enabled.
    • gds.arrow.advertised_listen_address sets the server location that clients should connect to.
  • Added support for importing String node identifiers for the Arrow CREATE_DATABASE action.
  • Added capability to run BetweennessCentrality using relationship weights.
    • Added relationshipWeightProperty optional configuration parameter.
  • Added stats mode procedures for RandomWalk.
    • gds.beta.randomWalk.stats
    • gds.beta.randomWalk.stats.estimate
  • Introduced the ability to configure defaults and limits for configuration parameters.
    • gds.alpha.config.defaults.list
    • gds.alpha.config.defaults.set
    • gds.alpha.config.limits.list
    • gds.alpha.config.limits.set
  • Introduce new configuration parameters contextNodeLabels and contextRelationshipTypes in nodePropertySteps.
    • gds.beta.pipeline.linkPrediction.addNodeProperty
    • gds.beta.pipeline.nodeClassification.addNodeProperty
    • gds.alpha.pipeline.nodeRegression.addNodeProperty
    • The context is used to enlarge the input graph to the node property steps when running gds.beta.pipeline.linkPrediction.[train|predict], gds.beta.pipeline.nodeClassification.[train|predict] and gds.alpha.pipeline.nodeRegression.[train|predict].
  • Leiden
    • Add capability to mutate intermediateCommunities when includeIntermediateCommunities is set to true.
    • Add capability to write intermediateCommunities when includeIntermediateCommunities is set to true.
  • Node2Vec adds new embedding initializer NORMALIZED configured with the parameter embeddingInitializer.
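
For readers unfamiliar with the K-means++ initialization mentioned in the K-means items, the following is a minimal sketch of the textbook seeding procedure: the first centroid is drawn uniformly, and each further centroid is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. It is not the GDS implementation; the function names and the sample points are assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_plus_plus_seeds(points, k, rng):
    """Pick k initial centroids from `points` using k-means++ seeding."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # First centroid: chosen uniformly at random.
    centroids = [rng.choice(points)]
    # Squared distance from every point to its nearest chosen centroid.
    nearest = [squared_distance(p, centroids[0]) for p in points]
    while len(centroids) < k:
        if sum(nearest) > 0:
            # Far-away points are proportionally more likely to be picked.
            candidate = rng.choices(points, weights=nearest, k=1)[0]
        else:
            # Every point coincides with a centroid; fall back to uniform choice.
            candidate = rng.choice(points)
        centroids.append(candidate)
        nearest = [min(d, squared_distance(p, candidate))
                   for p, d in zip(points, nearest)]
    return centroids


# Illustrative data: two well-separated groups of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans_plus_plus_seeds(points, k=2, rng=random.Random(42)))
```

Because an already chosen point has weight zero, the sketch never selects the same point twice while unchosen distinct points remain.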

Bug fixes

  • Fixed a bug where eager checking for business rules around GDS on a Neo4j cluster would cause the cluster to fail to start.
  • Fixed a bug where Neo4j users with admin role could not see all graphs in the catalog on GDS enterprise.
  • Fixed a bug in random graph generation where the resulting graph can end up with an incorrect relationship schema.
  • Fixed a bug where a schema filter would not create a deep copy of the property schema map.
  • Fixed a bug where modularity could have been incorrectly updated in ModularityOptimization. This may affect the number of performed iterations for ModularityOptimization or number of levels for Louvain.
  • Fixed a bug where restoring from csv could not read values wrapped in quotes.
  • Fixed a bug where KNN did not use the expected search space. This will improve the result but also increase the runtime.
  • Fixed a bug in ML autotuning where maxTrials included model evaluations with concrete configs.
  • Fixed a bug where gds.triangleCount and gds.localClusteringCoefficient were allowed to run on directed graphs.
  • Fixed a bug in gds.graph.export and Arrow DB import where the writeConcurrency was not respected.
  • Fixed a bug with Node Operations where gds.graph.nodeProperties.write, gds.graph.nodeProperties.drop, gds.graph.nodeProperties.stream and gds.graph.nodeProperty.stream would not accept String input for parameters nodeLabels and/or nodeProperties.
  • Fixed a bug where Node2Vec would report negative losses.
  • Fixed a bug with gds.graph.nodeProperties.stream and gds.graph.nodeProperty.stream where the wrong nodes were returned when specifying a nodeLabels filter and using Arrow.
  • Fixed a bug in the Louvain algorithm where aggregating dense communities could potentially lead to an exception.
  • Fixed a bug where model loading was attempted even for unlicensed users, which could cause database startup to fail.

Improvements

  • Better error handling in K-means.
  • Improve memory estimation for gds.beta.pipeline.linkPrediction.train when the nodePropertySteps used a weighted graph.
  • Improve runtime of feature generation in gds.beta.pipeline.linkPrediction.[train|predict].
  • Improve performance of gds.graph.project.cypher by using the subscriber API.
  • Improve convergence criteria for LogisticRegression and LinearRegression trainers, by making it independent of the number of batches. This affects gds.alpha.pipeline.nodeRegression.train, gds.beta.pipeline.[linkPrediction|nodeClassification].train.
  • Improve error handling on invalid user input.
  • Cypher on GDS projections is now capable of setting labels on nodes.
  • Promoting CELF algorithm to beta tier.
    • Improved performance.
  • Added a new column serverLocation to gds.debug.arrow() that shows the actual location where the server is running, which might be different from the configured location.
  • Improve runtime of KNN by reusing similarity computations. This also affects gds.beta.pipeline.linkPrediction.predict when using the approximate search strategy.
  • Configuration keys for Node-/Relationship- and Property-Projections are now case-insensitive.
  • The gds.debug.sysInfo procedure now shows the license expiration date when run with a valid GDS license.
  • Role-based access control is now available to everyone, not only to licensed, commercial users.
  • The Arrow create database endpoint is now capable of creating properties with an improved range of property types: string, string[], datetime, local datetime.
  • Improve error message thrown when calling gds.beta.pipeline.[nodeClassification|linkPrediction].[train|predict] for too small graphs.
  • Added a new, optional method close to PregelComputation, allowing implementers to close any opened resources, such as ThreadLocals.
  • Added new feature toggle procedures for enabling / disabling Arrow database import (default: enabled):
    • gds.features.enableArrowDatabaseImport
    • gds.features.enableArrowDatabaseImport.reset
  • Runtime improvements for RandomWalk, especially for the case of first-order random walks.

Other changes

  • Renamed and deprecated some graph management procedures:
    • Renamed gds.alpha.graph.removeGraphProperty to gds.alpha.graph.graphProperty.drop.
    • Renamed gds.alpha.graph.streamGraphProperty to gds.alpha.graph.graphProperty.stream.
    • Deprecated gds.graph.removeNodeProperties in favor of the new procedure gds.graph.nodeProperties.drop.
    • Deprecated gds.graph.streamNodeProperties in favor of the new procedure gds.graph.nodeProperties.stream.
    • Deprecated gds.graph.streamNodeProperty in favor of the new procedure gds.graph.nodeProperty.stream.
    • Deprecated gds.graph.streamRelationshipProperties in favor of the new procedure gds.graph.relationshipProperties.stream.
    • Deprecated gds.graph.streamRelationshipProperty in favor of the new procedure gds.graph.relationshipProperty.stream.
    • Deprecated gds.graph.writeNodeProperties in favor of the new procedure gds.graph.nodeProperties.write.
    • Deprecated gds.graph.writeRelationship in favor of the new procedure gds.graph.relationship.write.
    • Deprecated gds.graph.deleteRelationships in favor of the new procedure gds.graph.relationships.drop.
  • CSV format changes for backup/restore:
    • Export no longer writes the databaseId field.
    • Import can still read the databaseId field but does not use it.
  • Deprecated enableDebugLog config option for gds.graph.export.
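
The renamed and deprecated graph management procedures listed in this section can be updated in stored Cypher text with a simple lookup table. The following is a minimal sketch, assuming queries are available as plain strings; the old and new procedure names are taken from the list above, while the function name modernize_query and the sample query are hypothetical.

```python
import re

# Old procedure name -> replacement, as listed under "Other changes".
REPLACEMENTS = {
    "gds.alpha.graph.removeGraphProperty": "gds.alpha.graph.graphProperty.drop",
    "gds.alpha.graph.streamGraphProperty": "gds.alpha.graph.graphProperty.stream",
    "gds.graph.removeNodeProperties": "gds.graph.nodeProperties.drop",
    "gds.graph.streamNodeProperties": "gds.graph.nodeProperties.stream",
    "gds.graph.streamNodeProperty": "gds.graph.nodeProperty.stream",
    "gds.graph.streamRelationshipProperties": "gds.graph.relationshipProperties.stream",
    "gds.graph.streamRelationshipProperty": "gds.graph.relationshipProperty.stream",
    "gds.graph.writeNodeProperties": "gds.graph.nodeProperties.write",
    "gds.graph.writeRelationship": "gds.graph.relationship.write",
    "gds.graph.deleteRelationships": "gds.graph.relationships.drop",
}

# Match whole procedure names only: no identifier character or dot may
# directly precede or follow the match. Longer names are tried first.
_PATTERN = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(name) for name in sorted(REPLACEMENTS, key=len, reverse=True))
    + r")(?![\w.])"
)


def modernize_query(query):
    """Replace old procedure names in a Cypher string with their new names."""
    return _PATTERN.sub(lambda match: REPLACEMENTS[match.group(1)], query)


# Hypothetical query using a deprecated procedure name.
print(modernize_query("CALL gds.graph.streamNodeProperty('myGraph', 'score')"))
```

The sketch rewrites names only; it does not check whether the arguments or yielded columns of the new procedures differ from the old ones.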