Release Date: 24 March 2022
GDS 2.0.0 is compatible with Neo4j 4.3 and 4.4 but not Neo4j 3.5.x, 4.0, 4.1, or 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.6.
Breaking changes
- Moved BFS to product tier 
- gds.alpha.bfs => gds.bfs.stream
- Added support for gds.bfs.stream.estimate
- Removed configuration parameter relationshipWeightProperty.
- Renamed configuration parameter startNodeId to sourceNode.
- Renamed YIELD field startNodeId to sourceNode.
 
- Moved DFS to product tier 
- gds.alpha.dfs => gds.dfs.stream
- Added support for gds.dfs.stream.estimate
- Removed configuration parameter relationshipWeightProperty.
- Renamed configuration parameter startNodeId to sourceNode.
- Renamed YIELD field startNodeId to sourceNode.
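The BFS and DFS parameter changes above can be applied mechanically to an existing configuration map. The following is a minimal sketch, not part of GDS: `migrate_traversal_config` is a hypothetical helper name, and it assumes the configuration is held as a plain Python dictionary before being passed to a driver.

```python
from typing import Any, Dict

# Parameter rename listed above for BFS/DFS (GDS 1.x name -> GDS 2.0 name).
RENAMED_KEYS = {"startNodeId": "sourceNode"}
# Parameter that BFS/DFS no longer accept in GDS 2.0.
REMOVED_KEYS = {"relationshipWeightProperty"}


def migrate_traversal_config(old_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a 1.x BFS/DFS config using the 2.0 parameter names.

    Hypothetical helper for illustration; it is not shipped with GDS.
    """
    new_config: Dict[str, Any] = {}
    for key, value in old_config.items():
        if key in REMOVED_KEYS:
            continue  # removed in 2.0, so it is not carried over
        new_config[RENAMED_KEYS.get(key, key)] = value
    return new_config


old = {"startNodeId": 42, "concurrency": 4, "relationshipWeightProperty": "cost"}
print(migrate_traversal_config(old))
```

Queries that read the old `startNodeId` YIELD field need the same rename; this sketch only covers the configuration map.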
 
- Moved KNN to product tier 
- gds.beta.knn.mutate => gds.knn.mutate
- gds.beta.knn.stats => gds.knn.stats
- gds.beta.knn.stream => gds.knn.stream
- gds.beta.knn.write => gds.knn.write
- Removed ANN (superseded by KNN)
- nodeWeightProperty for KNN replaced by nodeProperties, which accepts multiple properties.
 
- Similarity: 
- Moved alpha similarity functions to product tier.  
- gds.alpha.similarity.cosine => gds.similarity.cosine
- gds.alpha.similarity.euclidean => gds.similarity.euclidean
- gds.alpha.similarity.euclideanDistance => gds.similarity.euclideanDistance
- gds.alpha.similarity.jaccard => gds.similarity.jaccard
- gds.alpha.similarity.overlap => gds.similarity.overlap
- gds.alpha.similarity.pearson => gds.similarity.pearson
- Pearson similarity function no longer accepts Lists of Maps, but computes over Lists of Numbers like the other similarity functions.
- Removed gds.alpha.similarity.asVector function.
 
- Removed alpha similarity procedures (similarity metrics added as modes for KNN and Node Similarity).  
- gds.alpha.similarity.cosine
- gds.alpha.similarity.euclidean
- gds.alpha.similarity.overlap
- gds.alpha.similarity.pearson
- gds.alpha.ml.ann
 
 
- Moved delta stepping shortest path to product tier 
- gds.alpha.shortestPath.deltaStepping => gds.allShortestPaths.delta.[write, stream, mutate, estimate]
 
- Moved Closeness Centrality to beta tier 
- gds.alpha.closeness.stream => gds.beta.closeness.stream
- gds.alpha.closeness.stats => gds.beta.closeness.stats
- gds.alpha.closeness.write => gds.beta.closeness.write
- gds.alpha.closeness.mutate => gds.beta.closeness.mutate
- Removed return item nodes from write and mutate mode.
- Renamed configuration parameter improved to useWassermanFaust.
- Renamed YIELD field centrality to score in stream mode.
 
- Moved link prediction pipeline procedures to beta tier:
- gds.beta.pipeline.linkPrediction.addFeature
- gds.beta.pipeline.linkPrediction.addNodeProperty
- gds.beta.pipeline.linkPrediction.configureParams
- gds.beta.pipeline.linkPrediction.configureSplit
- gds.beta.pipeline.linkPrediction.create
- gds.beta.pipeline.linkPrediction.predict.mutate
- gds.beta.pipeline.linkPrediction.predict.mutate.estimate
- gds.beta.pipeline.linkPrediction.predict.stream
- gds.beta.pipeline.linkPrediction.predict.stream.estimate
- gds.beta.pipeline.linkPrediction.train
- gds.beta.pipeline.linkPrediction.train.estimate
 
- Moved node classification pipeline procedures to beta tier:
- gds.beta.pipeline.nodeClassification.selectFeatures
- gds.beta.pipeline.nodeClassification.addNodeProperty
- gds.beta.pipeline.nodeClassification.configureParams
- gds.beta.pipeline.nodeClassification.configureSplit
- gds.beta.pipeline.nodeClassification.create
- gds.beta.pipeline.nodeClassification.predict.mutate
- gds.beta.pipeline.nodeClassification.predict.mutate.estimate
- gds.beta.pipeline.nodeClassification.predict.stream
- gds.beta.pipeline.nodeClassification.predict.stream.estimate
- gds.beta.pipeline.nodeClassification.predict.write
- gds.beta.pipeline.nodeClassification.predict.write.estimate
- gds.beta.pipeline.nodeClassification.train
- gds.beta.pipeline.nodeClassification.train.estimate
 
- Removed non-pipeline versions of Node Classification, including procedures: 
- gds.alpha.ml.nodeClassification.predict.mutate
- gds.alpha.ml.nodeClassification.predict.mutate.estimate
- gds.alpha.ml.nodeClassification.predict.stream
- gds.alpha.ml.nodeClassification.predict.stream.estimate
- gds.alpha.ml.nodeClassification.predict.write
- gds.alpha.ml.nodeClassification.predict.write.estimate
- gds.alpha.ml.nodeClassification.train
- gds.alpha.ml.nodeClassification.train.estimate
 
- Removed non-pipeline versions of Link Prediction, including procedures: 
- gds.alpha.ml.linkPrediction.predict.mutate
- gds.alpha.ml.linkPrediction.predict.mutate.estimate
- gds.alpha.ml.linkPrediction.predict.stream
- gds.alpha.ml.linkPrediction.predict.stream.estimate
- gds.alpha.ml.linkPrediction.predict.write
- gds.alpha.ml.linkPrediction.predict.write.estimate
- gds.alpha.ml.linkPrediction.train
- gds.alpha.ml.linkPrediction.train.estimate
 
- Additional changes to node classification & link prediction 
- Removed batchSize parameter for Node Classification pipeline predict modes, because it is not useful.
- The procedure resolution for the taskName parameter of gds.alpha.ml.pipeline.linkPrediction.addNodeProperty and gds.alpha.ml.pipeline.nodeClassification.addNodeProperty changed and now requires the inclusion of the tier, e.g. 'scaleProperties' must now be written as 'alpha.scaleProperties'.
- Changed node classification and link prediction training pipelines management from the model catalog to the new pipeline catalog. Trained pipelines (which we refer to as models) are still managed in the model catalog.
- Replaced gds.beta.pipeline.[nodeClassification|linkPrediction].configureParams(pipelineName::String, parameterSpace::List of Map) by gds.beta.pipeline.[nodeClassification|linkPrediction].addLogisticRegression(pipelineName::String, config::Map). This also removes the previous default model candidate.
- Removed useBiasFeature parameter in gds.beta.pipeline.linkPrediction.addLogisticRegression.
 
- Graph Projection: 
- gds.graph.create renamed to gds.graph.project
- In gds.graph.project, defining the same node property for different labels with different neoPropertyKeys is no longer allowed.
- Inputs for comparison expressions in graph.project.subgraph must resolve to the same type, i.e., long or double.
 
- Removed support for anonymous graph syntax from algorithm execution. Only explicit, named graphs are supported. 
- Memory estimation is an exception to this.
 
- Changed the syntax of memory estimation. The graph name or graph create config always goes into the first parameter, the algorithm config always into the second.
- Dropped Neo4j 4.2 support
- Removed USE_PRE_AGGREGATION feature toggle.
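Several of the renames above are pure prefix or name changes, so stored Cypher queries can be updated with a text rewrite. The sketch below is illustrative only: `rewrite_procedure_names` is a hypothetical helper, it covers only the KNN, Closeness Centrality, and `gds.graph.create` renames from the list above, and it does not touch renamed configuration parameters or YIELD fields.

```python
import re

# Prefix renames taken from the list above (old prefix -> new prefix).
PREFIX_RENAMES = {
    "gds.beta.knn": "gds.knn",
    "gds.alpha.closeness": "gds.beta.closeness",
}
# Exact procedure rename taken from the list above.
EXACT_RENAMES = {"gds.graph.create": "gds.graph.project"}


def rewrite_procedure_names(query: str) -> str:
    """Rewrite selected GDS 1.x procedure names in a Cypher string.

    Hypothetical helper for illustration; it is not shipped with GDS.
    """
    for old, new in PREFIX_RENAMES.items():
        # Match the prefix only as a whole name followed by a mode (".stream", ...).
        pattern = r"(?<![\w.])" + re.escape(old) + r"(?=\.)"
        query = re.sub(pattern, new, query)
    for old, new in EXACT_RENAMES.items():
        # Match the exact name only when it is directly followed by "(".
        pattern = r"(?<![\w.])" + re.escape(old) + r"(?=\s*\()"
        query = re.sub(pattern, new, query)
    return query


print(rewrite_procedure_names("CALL gds.beta.knn.stream('g', {})"))
print(rewrite_procedure_names("CALL gds.graph.create('g', 'Person', 'KNOWS')"))
```

Because the exact rename requires an opening parenthesis right after the name, other procedures under the `gds.graph.create` namespace are deliberately left untouched and need to be reviewed by hand.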
New features
- KNN graduated to product tier: 
- Added a random walk sampler for initializing KNN based on the topology of the input graph. The configuration key initialSampler accepts either UNIFORM or RANDOM_WALK.
- Added possibility to exclude pairs of nodes in the K-Nearest Neighbor algorithm that have a similarity below a given threshold defined with an optional configuration parameter similarityCutoff.
- Added perturbation rate to KNN, to reduce the risk of some neighbors not being explored. Configured with perturbationRate as a value between 0 and 1.
- Improved normalization of KNN metrics to make them consistent and usable in combination.
- KNN supports multiple node properties via the nodeProperties key.
- Added metrics ranIterations, didConverge and nodePairsConsidered to the result of gds.knn.[stats|mutate|write].
- KNN can compute similarity over multiple node properties, specified with the new nodeProperties parameter.
- Added new similarity metrics to KNN, configured per property via the nodeProperties key:
- Euclidean
- Overlap
- Pearson
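The value constraints stated above for the new KNN configuration keys can be checked on the client side before a call is made. This is a minimal sketch under assumptions: `check_knn_config` is a hypothetical helper, the bounds of `perturbationRate` are assumed to be inclusive (the notes only say "between 0 and 1"), and `embedding` is a made-up property name.

```python
from typing import Any, Dict

# Values accepted by initialSampler according to the notes above.
ALLOWED_SAMPLERS = {"UNIFORM", "RANDOM_WALK"}


def check_knn_config(config: Dict[str, Any]) -> None:
    """Raise ValueError if a KNN config violates the constraints listed above.

    Hypothetical client-side helper; GDS performs its own validation.
    """
    sampler = config.get("initialSampler")
    if sampler is not None and sampler not in ALLOWED_SAMPLERS:
        raise ValueError(f"initialSampler must be one of {sorted(ALLOWED_SAMPLERS)}")
    rate = config.get("perturbationRate")
    # Assumption: both ends of the range are allowed.
    if rate is not None and not 0.0 <= rate <= 1.0:
        raise ValueError("perturbationRate must be between 0 and 1")


knn_config = {
    "nodeProperties": ["embedding"],  # hypothetical property name
    "initialSampler": "RANDOM_WALK",
    "perturbationRate": 0.1,
    "similarityCutoff": 0.5,
}
check_knn_config(knn_config)  # passes silently for a valid config
```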
 
 
- Added similarity metric selection to Node Similarity configured with similarityMetric (supports Jaccard or Overlap)
- BFS & DFS graduated to product tier 
- Added support for mutate mode with gds.dfs.mutate, gds.bfs.mutate
- Added support for estimate mode to gds.bfs.[stream|mutate] and gds.dfs.[stream|mutate] procedures.
- Added progress logging support
 
- Added a new parallel single-source shortest path algorithm to product-tier: 
- gds.allShortestPaths.delta.stream
- gds.allShortestPaths.delta.stream.estimate
- gds.allShortestPaths.delta.write
- gds.allShortestPaths.delta.write.estimate
- gds.allShortestPaths.delta.mutate
- gds.allShortestPaths.delta.mutate.estimate
 
- Closeness Centrality graduated to beta tier, added: 
- gds.beta.closeness.mutate
- gds.beta.closeness.stats
 
- Node Classification: 
- Models produced with gds.alpha.ml.pipeline.nodeClassification.train can now be stored (persisted) using gds.alpha.model.store.
- Added estimate mode to gds.alpha.ml.pipeline.nodeClassification.[train|predict.stream|predict.mutate|predict.write] procedures.
- Added modelSelectionStats to gds.alpha.ml.pipeline.nodeClassification.train.
- Only save metrics for winning model inside modelInfo.
 
- Link Prediction: 
- Models produced with gds.alpha.ml.pipeline.linkPrediction.train can now be stored (persisted) using gds.alpha.model.store.
- Added estimate mode to gds.alpha.ml.pipeline.linkPrediction.train procedure.
- Added estimate mode to gds.alpha.ml.pipeline.linkPrediction.[train|predict.stream|predict.mutate] procedures.
- Added modelSelectionStats to gds.alpha.ml.pipeline.linkPrediction.train.
- Only save metrics for winning model inside modelInfo.
 
- Added support for Random Forest models in both Link Prediction and Node Classification pipelines with gds.alpha.pipeline.[linkPrediction|nodeClassification].addRandomForest
- Added pipeline catalog procedures for managing training pipelines: 
- gds.beta.pipeline.list
- gds.beta.pipeline.exists
- gds.beta.pipeline.drop
 
- Added new way of projecting a graph using Cypher: gds.alpha.graph.project, which is an aggregation rather than a procedure.
- Added surface for hints and warnings generated by executed tasks with the new gds.alpha.userLog logging procedure.
- Support for write back from Neo4j Causal Cluster Read Replica instance (requires Enterprise GDS).
- Support for graph projections backup and restore with gds.alpha.backup and gds.alpha.restore (requires Enterprise GDS).
Bug fixes
- Fixed a bug where Node2Vec would produce an ArrayIndexOutOfBoundsException (AIOOBE) on sufficiently large graphs.
- Fixed a bug where ForkJoin pools were not properly closed, which could lead to out-of-memory errors (OOMs) when using Pregel-based algorithms, e.g. PageRank.
- GraphSAGE: 
- Fixed a bug where gds.beta.graphSage would produce incorrect results for smaller graphs.
- Fixed a bug where gds.beta.graphSage would produce incorrect results for the pool aggregator.
 
- Node Classification & Link Prediction pipelines: 
- Fixed a bug where gds.alpha.ml.pipeline.nodeClassification.train would train a model under the wrong username and not be accessible for the actual user.
- Fixed a bug where gds.alpha.ml.pipeline.nodeClassification.train and gds.alpha.ml.pipeline.linkPrediction.train would skip applying a penalty to the weight of the last feature.
- Fixed a bug where the trainConfig of persisted models would not be shown to the user.
- Fixed a bug where gds.alpha.ml.pipeline.nodeClassification.train would not scale penalty to train set size correctly.
 
- Fixed a bug in gds.beta.graph.create.subgraph where long values greater than 2^53 were not properly handled during expression evaluation.
- Triangle Count & Local Clustering Coefficient 
- Fixed a bug where gds.triangleCount and gds.localClusteringCoefficient might produce wrong results when using a nodeLabels filter.
- Fixed a bug where graph intersection used in Triangle Count and Local Clustering Coefficient would fail on union node filtered graphs.
 
- Fixed a bug where gds.alpha.closeness might produce incorrect results for directed graphs.
- Fixed a bug where function gds.alpha.similarity.cosine and procedures gds.alpha.similarity.cosine.[stats,stream,write] returned the absolute value of the cosine computation, instead of the cosine value itself.
- Fixed a bug where Cypher on GDS would try to access node properties as relationship properties and vice versa.
- Fixed a bug where gds.graph.create.cypher would sometimes not display the root cause in case of an error.
- Fixed a bug where concurrently computing degrees on a node filtered graph would produce an ArrayIndexOutOfBoundsException (AIOOBE).
- Fixed a bug where the memory estimation for generated Pregel procedures was calculated incorrectly.
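The 2^53 threshold in the subgraph fix listed above is the point beyond which a 64-bit floating-point number can no longer represent every integer exactly. The notes do not state the cause of the bug, so treating it as a long-to-double conversion issue is an assumption; the sketch below only demonstrates the precision limit itself, using Python floats (IEEE 754 doubles on mainstream platforms). `survives_double_round_trip` is a name made up for this example.

```python
LIMIT = 2 ** 53  # largest power of two up to which all integers fit in a double


def survives_double_round_trip(value: int) -> bool:
    """Return True if converting to a double and back preserves the integer."""
    return int(float(value)) == value


print(survives_double_round_trip(LIMIT))      # True
print(survives_double_round_trip(LIMIT + 1))  # False: rounds back to 2**53
```

Comparing such values as doubles would treat `2**53` and `2**53 + 1` as equal, which is one way a filter expression on large long values could go wrong.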
Improvements
- GraphSAGE: 
- Improved runtime performance for gds.beta.graphSage when using the relationshipWeight configuration parameter.
- Improved memory usage of gds.beta.graphSage by computing the features per batch lazily.
 
- Memory estimation for gds.graph.project returns the estimated peak memory consumption during loading instead of the estimated final graph size.
- Reduced memory consumption while loading using Native or Cypher projections.
- gds.alpha.ml.pipeline.[nodeClassification|linkPrediction].train will raise an error when any of the train, test, or validation sets is empty.
- Added failIfMissing flag to gds.beta.[pipeline|model].drop.
- Implemented batched prediction for Link Prediction, which improves runtime.
- Breadth First / Depth First Search: 
- Parallel implementation of gds.bfs.stream.
- Result field path of gds.bfs.stream and gds.dfs.stream will only be computed if explicitly specified in the YIELD clause or there is no YIELD clause.
 
- Provide more information to users if a node is missing a particular property in KNN.