Graph Data Science 2.0.0

Release Date: 24 March 2022

GDS 2.0.0 is compatible with Neo4j 4.3 and 4.4 but not Neo4j 3.5.x, 4.0, 4.1, or 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.6.

Breaking changes

Moved BFS to product tier
- gds.alpha.bfs => gds.bfs.stream
- Added support for gds.bfs.stream.estimate
- Removed configuration parameter relationshipWeightProperty.
- Renamed configuration parameter startNodeId to sourceNode.
- Renamed YIELD field startNodeId to sourceNode.
Moved DFS to product tier
- gds.alpha.dfs => gds.dfs.stream
- Added support for gds.dfs.stream.estimate
- Removed configuration parameter relationshipWeightProperty.
- Renamed configuration parameter startNodeId to sourceNode.
- Renamed YIELD field startNodeId to sourceNode.
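For callers migrating existing BFS or DFS invocations, the parameter changes above amount to one key rename and one key removal. A minimal sketch in Python, assuming the algorithm configuration is held as a plain dictionary before it is sent to the database; the helper name migrate_traversal_config is hypothetical and not part of GDS:

```python
def migrate_traversal_config(old_config):
    """Return a copy of a GDS 1.x BFS/DFS config using the 2.0 parameter names.

    Hypothetical helper for illustration only. It applies just the two
    configuration changes listed in these release notes.
    """
    new_config = dict(old_config)  # leave the caller's dictionary untouched
    # relationshipWeightProperty was removed in 2.0, so drop it if present.
    new_config.pop("relationshipWeightProperty", None)
    # startNodeId was renamed to sourceNode.
    if "startNodeId" in new_config:
        new_config["sourceNode"] = new_config.pop("startNodeId")
    return new_config


old = {"startNodeId": 42, "relationshipWeightProperty": "cost", "concurrency": 4}
print(migrate_traversal_config(old))
```

Code that reads the result stream needs the matching change on the output side, since the YIELD field is now called sourceNode as well.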
Moved KNN to product tier
- gds.beta.knn.mutate => gds.knn.mutate
- gds.beta.knn.stats => gds.knn.stats
- gds.beta.knn.stream => gds.knn.stream
- gds.beta.knn.write => gds.knn.write
- Removed ANN (superseded by KNN)
- nodeWeightProperty for KNN replaced by nodeProperties, which accepts multiple properties.
Similarity:
- Moved alpha similarity functions to product tier.
  - gds.alpha.similarity.cosine => gds.similarity.cosine
  - gds.alpha.similarity.euclidean => gds.similarity.euclidean
  - gds.alpha.similarity.euclideanDistance => gds.similarity.euclideanDistance
  - gds.alpha.similarity.jaccard => gds.similarity.jaccard
  - gds.alpha.similarity.overlap => gds.similarity.overlap
  - gds.alpha.similarity.pearson => gds.similarity.pearson
  - Pearson similarity function no longer accepts Lists of Maps, but computes over Lists of Numbers like the other similarity functions.
  - Removed gds.alpha.similarity.asVector function.
- Removed alpha similarity procedures (similarity metrics added as modes for KNN and Node Similarity).
  - gds.alpha.similarity.cosine
  - gds.alpha.similarity.euclidean
  - gds.alpha.similarity.overlap
  - gds.alpha.similarity.pearson
  - gds.alpha.ml.ann
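To make the Pearson change concrete: the function now takes two equally long lists of numbers, as the other similarity functions do. The following is a plain Python sketch of the textbook Pearson correlation over such lists, not the GDS implementation; returning 0.0 when one list has zero variance is an assumption of this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long lists of numbers."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty lists of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    # Assumption of this sketch: a constant list yields 0.0 instead of an error.
    return cov / denom if denom else 0.0


print(pearson([1, 2, 3], [2, 4, 6]))
print(pearson([1, 2, 3], [3, 2, 1]))
```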
Moved delta stepping shortest path to product tier
- gds.alpha.shortestPath.deltaStepping => gds.allShortestPaths.delta.[stream|write|mutate], each with an estimate mode
Moved Closeness Centrality to beta tier
- gds.alpha.closeness.stream => gds.beta.closeness.stream
- gds.alpha.closeness.stats => gds.beta.closeness.stats
- gds.alpha.closeness.write => gds.beta.closeness.write
- gds.alpha.closeness.mutate => gds.beta.closeness.mutate
- Removed return item nodes from write and mutate mode.
- Renamed configuration parameter improved to useWassermanFaust.
- Renamed YIELD field centrality to score in stream mode.
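The useWassermanFaust parameter selects the Wasserman-Faust variant of closeness, which scales the score by the fraction of the graph a node can reach. The sketch below shows the formula as it is commonly stated, for an unweighted, undirected graph given as an adjacency dictionary. It is an illustration under those assumptions, not the GDS implementation, and the function name is hypothetical.

```python
from collections import deque


def closeness(graph, node, use_wasserman_faust=False):
    """Closeness of `node` in an unweighted graph given as {node: [neighbors]}."""
    # Breadth-first search to collect hop distances to every reachable node.
    distances = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    reachable = len(distances) - 1          # reachable nodes, excluding `node`
    total_distance = sum(distances.values())
    if total_distance == 0:
        return 0.0                          # isolated node
    score = reachable / total_distance
    if use_wasserman_faust:
        # Scale by the share of all other nodes that are reachable.
        score *= reachable / (len(graph) - 1)
    return score


# Two components: a-b-c and d-e.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": ["e"], "e": ["d"]}
print(closeness(graph, "b"), closeness(graph, "b", use_wasserman_faust=True))
```

Without the scaling, a node in a tiny component can score as high as a node that is central in a large one; the variant penalizes the former.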
Moved link prediction pipeline procedures to beta tier:
- gds.beta.pipeline.linkPrediction.addFeature
- gds.beta.pipeline.linkPrediction.addNodeProperty
- gds.beta.pipeline.linkPrediction.configureParams
- gds.beta.pipeline.linkPrediction.configureSplit
- gds.beta.pipeline.linkPrediction.create
- gds.beta.pipeline.linkPrediction.predict.mutate
- gds.beta.pipeline.linkPrediction.predict.mutate.estimate
- gds.beta.pipeline.linkPrediction.predict.stream
- gds.beta.pipeline.linkPrediction.predict.stream.estimate
- gds.beta.pipeline.linkPrediction.train
- gds.beta.pipeline.linkPrediction.train.estimate
Moved node classification pipeline procedures to beta tier:
- gds.beta.pipeline.nodeClassification.selectFeatures
- gds.beta.pipeline.nodeClassification.addNodeProperty
- gds.beta.pipeline.nodeClassification.configureParams
- gds.beta.pipeline.nodeClassification.configureSplit
- gds.beta.pipeline.nodeClassification.create
- gds.beta.pipeline.nodeClassification.predict.mutate
- gds.beta.pipeline.nodeClassification.predict.mutate.estimate
- gds.beta.pipeline.nodeClassification.predict.stream
- gds.beta.pipeline.nodeClassification.predict.stream.estimate
- gds.beta.pipeline.nodeClassification.predict.write
- gds.beta.pipeline.nodeClassification.predict.write.estimate
- gds.beta.pipeline.nodeClassification.train
- gds.beta.pipeline.nodeClassification.train.estimate
Removed non-pipeline versions of Node Classification, including procedures:
- gds.alpha.ml.nodeClassification.predict.mutate
- gds.alpha.ml.nodeClassification.predict.mutate.estimate
- gds.alpha.ml.nodeClassification.predict.stream
- gds.alpha.ml.nodeClassification.predict.stream.estimate
- gds.alpha.ml.nodeClassification.predict.write
- gds.alpha.ml.nodeClassification.predict.write.estimate
- gds.alpha.ml.nodeClassification.train
- gds.alpha.ml.nodeClassification.train.estimate
Removed non-pipeline versions of Link Prediction, including procedures:
- gds.alpha.ml.linkPrediction.predict.mutate
- gds.alpha.ml.linkPrediction.predict.mutate.estimate
- gds.alpha.ml.linkPrediction.predict.stream
- gds.alpha.ml.linkPrediction.predict.stream.estimate
- gds.alpha.ml.linkPrediction.predict.write
- gds.alpha.ml.linkPrediction.predict.write.estimate
- gds.alpha.ml.linkPrediction.train
- gds.alpha.ml.linkPrediction.train.estimate
Additional changes to node classification & link prediction
- Removed batchSize parameter for Node Classification pipeline predict modes, because it is not useful.
- The procedure resolution for the taskName parameter of gds.alpha.ml.pipeline.linkPrediction.addNodeProperty and gds.alpha.ml.pipeline.nodeClassification.addNodeProperty changed and now requires the inclusion of the tier, e.g. 'scaleProperties' must now be written as 'alpha.scaleProperties'.
- Changed node classification and link prediction training pipelines management from the model catalog to the new pipeline catalog. Trained pipelines (which we refer to as models) are still managed in the model catalog.
- Replaced gds.beta.pipeline.[nodeClassification|linkPrediction].configureParams(pipelineName::String, parameterSpace::List of Map) with gds.beta.pipeline.[nodeClassification|linkPrediction].addLogisticRegression(pipelineName::String, config::Map). This also removes the previous default model candidate.
- Removed useBiasFeature parameter in gds.beta.pipeline.linkPrediction.addLogisticRegression.
Graph Projection:
- Renamed gds.graph.create to gds.graph.project.
- In gds.graph.project, defining the same node property for different labels with different neoPropertyKeys is no longer allowed.
- Inputs for comparison expressions in graph.project.subgraph must resolve to the same type, i.e., long or double.
Removed support for anonymous graph syntax from algorithm execution. Only explicit, named graphs are supported.
- Memory estimation is an exception to this.
Changed the syntax of memory estimation. The graph name or graph create config always goes into the first parameter, and the algorithm config always into the second.
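The two-parameter shape of the new estimation syntax can be sketched as a small Python helper that builds the Cypher call and its parameters. The helper name, the graph name 'myGraph', and the projection keys in the second example are illustrative assumptions, not taken from these notes; check the GDS manual for the exact keys an estimation config accepts.

```python
def build_estimate_call(procedure, graph_name_or_config, algo_config):
    """Build a parameterized Cypher call for the estimate mode of `procedure`.

    Hypothetical helper: the first Cypher parameter carries the graph name
    (or a graph config map), the second carries the algorithm config.
    """
    query = f"CALL {procedure}.estimate($graph, $config)"
    params = {"graph": graph_name_or_config, "config": algo_config}
    return query, params


# Estimation against a named graph from the catalog.
query, params = build_estimate_call(
    "gds.pageRank.stream", "myGraph", {"maxIterations": 20}
)
print(query, params)

# Estimation with a graph config in the first position (keys are assumptions).
anon_query, anon_params = build_estimate_call(
    "gds.pageRank.stream",
    {"nodeProjection": "*", "relationshipProjection": "*"},
    {"maxIterations": 20},
)
print(anon_query, anon_params)
```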
Dropped Neo4j 4.2 support
Removed USE_PRE_AGGREGATION feature toggle.
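Several of the breaking changes above are pure renames, which makes them candidates for a mechanical rewrite of stored queries. A minimal sketch, assuming queries are available as strings; the helper migrate_query is hypothetical, the mapping covers only a subset of the renames listed in this section, and it deliberately leaves parameter and YIELD field changes to manual review.

```python
import re

# Old procedure name -> new procedure name, taken from the list above.
RENAMED = {
    "gds.graph.create": "gds.graph.project",
    "gds.beta.knn.mutate": "gds.knn.mutate",
    "gds.beta.knn.stats": "gds.knn.stats",
    "gds.beta.knn.stream": "gds.knn.stream",
    "gds.beta.knn.write": "gds.knn.write",
    "gds.alpha.closeness.stream": "gds.beta.closeness.stream",
    "gds.alpha.closeness.write": "gds.beta.closeness.write",
}


def migrate_query(query):
    """Rewrite renamed procedure calls in a Cypher query string."""
    for old, new in RENAMED.items():
        # Match the name only when it is a complete call target: not preceded
        # by another identifier character and directly followed by "(".
        pattern = r"(?<![\w.])" + re.escape(old) + r"(?=\s*\()"
        query = re.sub(pattern, new, query)
    return query


print(migrate_query("CALL gds.graph.create('g', 'Person', 'KNOWS')"))
print(migrate_query("CALL gds.beta.knn.stream('g', {topK: 5})"))
```

Requiring the opening parenthesis keeps the rewrite from touching longer names that merely start with a renamed prefix, such as gds.graph.create.cypher.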

New features

KNN graduated to product tier:
- Added a random walk sampler for initializing KNN based on the topology of the input graph. The configuration key initialSampler accepts either UNIFORM or RANDOM_WALK.
- Added possibility to exclude pairs of nodes in the K-Nearest Neighbor algorithm that have a similarity below a given threshold defined with an optional configuration parameter similarityCutoff.
- Added perturbation rate to KNN, to reduce the risk of some neighbors not being explored. Configured with perturbationRate as a value between 0 and 1.
- Improved normalization of KNN metrics to make them consistent and usable in combination.
- Added metrics ranIterations, didConverge and nodePairsConsidered to the result of gds.knn.[stats|mutate|write].
- KNN can compute similarity over multiple node properties, specified with the new nodeProperties parameter.
- Added new similarity metrics to KNN, configured per property via the nodeProperties key.
  - Euclidean
  - Overlap
  - Pearson
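The effect of similarityCutoff can be illustrated with an exact, brute-force neighbor search. This is a sketch only: GDS KNN is an approximate algorithm and works differently, cosine is just one possible metric, and treating the cutoff as inclusive is an assumption of this example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equally long numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_neighbors(vectors, k, similarity_cutoff=0.0):
    """For each node, return up to k most similar other nodes above the cutoff."""
    result = {}
    for node, vec in vectors.items():
        scored = [
            (cosine(vec, other_vec), other)
            for other, other_vec in vectors.items()
            if other != node
        ]
        # Drop pairs whose similarity falls below the cutoff before ranking.
        kept = [(score, other) for score, other in scored if score >= similarity_cutoff]
        kept.sort(key=lambda pair: pair[0], reverse=True)
        result[node] = [(other, score) for score, other in kept[:k]]
    return result


vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k_neighbors(vectors, k=2, similarity_cutoff=0.5))
```

With a cutoff, a node may end up with fewer than k neighbors, which is the intended trade-off: weak pairs are excluded rather than padded in.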
Added similarity metric selection to Node Similarity, configured with similarityMetric (supports Jaccard or Overlap).
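The two metrics differ only in the denominator. A short Python sketch of the standard set-based definitions, applied to the neighbor sets of two nodes; this shows the formulas, not the GDS implementation, and returning 0.0 for empty sets is an assumption of the sketch.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap(a, b):
    """Size of the intersection divided by size of the smaller set."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


neighbors_x = {1, 2, 3}
neighbors_y = {2, 3, 4, 5}
print(jaccard(neighbors_x, neighbors_y), overlap(neighbors_x, neighbors_y))
```

Overlap scores a small neighbor set that is fully contained in a larger one as 1.0, whereas Jaccard penalizes the size difference.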
BFS & DFS graduated to product tier
- Added support for mutate mode with gds.dfs.mutate, gds.bfs.mutate
- Added support for estimate mode to gds.bfs.[stream|mutate] and gds.dfs.[stream|mutate] procedures.
- Added progress logging support
Added a new parallel single-source shortest path algorithm to product-tier:
- gds.allShortestPaths.delta.stream
- gds.allShortestPaths.delta.stream.estimate
- gds.allShortestPaths.delta.write
- gds.allShortestPaths.delta.write.estimate
- gds.allShortestPaths.delta.mutate
- gds.allShortestPaths.delta.mutate.estimate
Closeness Centrality graduated to beta tier, added:
- gds.beta.closeness.mutate
- gds.beta.closeness.stats
Node Classification:
- Models produced with gds.alpha.ml.pipeline.nodeClassification.train can now be stored (persisted) using gds.alpha.model.store.
- Added estimate mode to gds.alpha.ml.pipeline.nodeClassification.[train|predict.stream|predict.mutate|predict.write] procedures.
- Added modelSelectionStats to gds.alpha.ml.pipeline.nodeClassification.train
- Only save metrics for winning model inside modelInfo.
Link Prediction:
- Models produced with gds.alpha.ml.pipeline.linkPrediction.train can now be stored (persisted) using gds.alpha.model.store.
- Added estimate mode to gds.alpha.ml.pipeline.linkPrediction.[train|predict.stream|predict.mutate] procedures.
- Added modelSelectionStats to gds.alpha.ml.pipeline.linkPrediction.train
- Only save metrics for winning model inside modelInfo.
Added support for Random Forest models in both Link Prediction and Node Classification pipelines with gds.alpha.pipeline.[linkPrediction|nodeClassification].addRandomForest
Added pipeline catalog procedures for managing training pipelines:
- gds.beta.pipeline.list
- gds.beta.pipeline.exists
- gds.beta.pipeline.drop
Added new way of projecting a graph using Cypher: gds.alpha.graph.project, which is an aggregation rather than a procedure.
Added surface for hints and warnings generated by executed tasks with the new gds.alpha.userLog logging procedure.
Support for write back from Neo4j Causal Cluster Read Replica instance (requires Enterprise GDS).
Support for graph projections backup and restore with gds.alpha.backup and gds.alpha.restore (requires Enterprise GDS)

Bug fixes

Fixed a bug where Node2Vec would produce an ArrayIndexOutOfBoundsException (AIOOBE) on sufficiently large graphs.
Fixed a bug where ForkJoin pools were not properly closed, which could lead to out-of-memory errors (OOMs) when using Pregel-based algorithms, e.g. PageRank.
GraphSAGE:
- Fixed a bug where gds.beta.graphSage would produce incorrect results for smaller graphs.
- Fixed a bug where gds.beta.graphSage would produce incorrect results for the pool aggregator.
Node Classification & Link Prediction pipelines:
- Fixed a bug where gds.alpha.ml.pipeline.nodeClassification.train would train a model under the wrong username and not be accessible for the actual user.
- Fixed a bug where gds.alpha.ml.pipeline.nodeClassification.train and gds.alpha.ml.pipeline.linkPrediction.train would skip applying a penalty to the weight of the last feature.
- Fixed a bug where the trainConfig of persisted models would not be shown to the user.
- Fixed a bug where gds.alpha.ml.pipeline.nodeClassification.train would not scale penalty to train set size correctly.
Fixed a bug in gds.beta.graph.create.subgraph where long values greater than 2^53 were not properly handled during expression evaluation.
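The 2^53 boundary is the point beyond which a 64-bit floating-point number can no longer represent every integer. The release note does not state the root cause, so the following Python sketch only demonstrates the general precision limit, on the assumption that long values were at some point handled as doubles.

```python
LIMIT = 2 ** 53


def survives_double_round_trip(value):
    """True if an integer keeps its exact value after conversion to a double."""
    return int(float(value)) == value


# Up to 2^53 every integer is exactly representable as a double.
print(survives_double_round_trip(LIMIT))
# 2^53 + 1 is not: it rounds to a neighboring representable value.
print(survives_double_round_trip(LIMIT + 1))
# As a consequence, two distinct long values can compare equal as doubles.
print(float(LIMIT) == float(LIMIT + 1))
```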
Triangle Count & Local Clustering Coefficient
- Fixed a bug where gds.triangleCount and gds.localClusteringCoefficient might produce wrong results when using a nodeLabels filter.
- Fixed a bug where graph intersection used in Triangle Count and Local Clustering Coefficient would fail on union node filtered graphs.
Fixed a bug where gds.alpha.closeness might produce incorrect results for directed graphs.
Fixed a bug where function gds.alpha.similarity.cosine and procedures gds.alpha.similarity.cosine.[stats|stream|write] returned the absolute value of the cosine computation, instead of the cosine value itself.
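The difference matters whenever two vectors point in opposing directions. A small Python sketch of the standard cosine formula shows what taking the absolute value hides; it illustrates the symptom only and is not the GDS code.

```python
import math


def cosine_similarity(a, b):
    """Standard cosine similarity; the sign of the result is preserved."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


a = [3.0, 4.0]
b = [-3.0, -4.0]   # exactly opposite direction to a

signed = cosine_similarity(a, b)
print(signed)        # negative: the vectors are maximally dissimilar
print(abs(signed))   # the old behavior would report them as identical in direction
```

Results that relied on the old behavior may therefore change sign after upgrading, and rankings that mixed positively and negatively correlated pairs may reorder.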
Fixed a bug where Cypher on GDS would try to access node properties as relationship properties and vice versa.
Fixed a bug where gds.graph.create.cypher would sometimes not display the root cause in case of an error.
Fixed a bug where concurrently computing degrees on a node filtered graph would produce an ArrayIndexOutOfBoundsException.
Fixed a bug where the memory estimation for generated Pregel procedures was calculated incorrectly.

Improvements

GraphSAGE:
- Improved runtime performance for gds.beta.graphSage when using the relationshipWeight configuration parameter.
- Improved memory usage of gds.beta.graphSage by computing the features per batch lazily.
Memory estimation for gds.graph.project returns the estimated peak memory consumption during loading instead of the estimated final graph size.
Reduced memory consumption while loading using Native or Cypher projections.
gds.alpha.ml.pipeline.[nodeClassification|linkPrediction].train will raise an error when any of the train, test, or validation sets is empty.
Added failIfMissing flag to gds.beta.[pipeline|model].drop.
Implemented batched prediction for LinkPrediction which improves runtime.
Breadth First / Depth First Search:
- Parallel implementation of gds.bfs.stream.
- Result field path of gds.bfs.stream and gds.dfs.stream will only be computed if explicitly specified in the YIELD clause or there is no YIELD clause.
Provide more information to users if a node is missing a particular property in KNN.
