Release Date: 24 March 2022
GDS 2.0.0 is compatible with Neo4j 4.3 and 4.4, but not with Neo4j 3.5.x, 4.0, 4.1, or 4.2. For a 3.5-compatible release, see GDS 1.1.7. For a 4.0-compatible release, see GDS 1.6.5. For a 4.1- or 4.2-compatible release, see GDS 1.8.6.

Breaking changes
- Moved BFS to product tier
  - `gds.alpha.bfs` => `gds.bfs.stream`
  - Added support for `gds.bfs.stream.estimate`.
  - Removed configuration parameter `relationshipWeightProperty`.
  - Renamed configuration parameter `startNodeId` to `sourceNode`.
  - Renamed YIELD field `startNodeId` to `sourceNode`.
- Moved DFS to product tier
  - `gds.alpha.dfs` => `gds.dfs.stream`
  - Added support for `gds.dfs.stream.estimate`.
  - Removed configuration parameter `relationshipWeightProperty`.
  - Renamed configuration parameter `startNodeId` to `sourceNode`.
  - Renamed YIELD field `startNodeId` to `sourceNode`.
- Moved KNN to product tier
  - `gds.beta.knn.mutate` => `gds.knn.mutate`
  - `gds.beta.knn.stats` => `gds.knn.stats`
  - `gds.beta.knn.stream` => `gds.knn.stream`
  - `gds.beta.knn.write` => `gds.knn.write`
  - Removed ANN (superseded by KNN).
  - `nodeWeightProperty` for KNN replaced by `nodeProperties`, which accepts multiple properties.
- Similarity:
  - Moved alpha similarity functions to product tier.
    - `gds.alpha.similarity.cosine` => `gds.similarity.cosine`
    - `gds.alpha.similarity.euclidean` => `gds.similarity.euclidean`
    - `gds.alpha.similarity.euclideanDistance` => `gds.similarity.euclideanDistance`
    - `gds.alpha.similarity.jaccard` => `gds.similarity.jaccard`
    - `gds.alpha.similarity.overlap` => `gds.similarity.overlap`
    - `gds.alpha.similarity.pearson` => `gds.similarity.pearson`
    - Pearson similarity function no longer accepts Lists of Maps, but computes over Lists of Numbers like the other similarity functions.
    - Removed `gds.alpha.similarity.asVector` function.
  - Removed alpha similarity procedures (similarity metrics added as modes for KNN and Node Similarity).
    - `gds.alpha.similarity.cosine`
    - `gds.alpha.similarity.euclidean`
    - `gds.alpha.similarity.overlap`
    - `gds.alpha.similarity.pearson`
    - `gds.alpha.ml.ann`
- Moved delta stepping shortest path to product tier
  - `gds.alpha.shortestPath.deltaStepping` => `gds.allShortestPaths.delta.[write, stream, mutate, estimate]`
- Moved Closeness Centrality to beta tier
  - `gds.alpha.closeness.stream` => `gds.beta.closeness.stream`
  - `gds.alpha.closeness.stats` => `gds.beta.closeness.stats`
  - `gds.alpha.closeness.write` => `gds.beta.closeness.write`
  - `gds.alpha.closeness.mutate` => `gds.beta.closeness.mutate`
  - Removed return item `nodes` from `write` and `mutate` mode.
  - Renamed configuration parameter `improved` to `useWassermanFaust`.
  - Renamed YIELD field `centrality` to `score` in `stream` mode.
- Moved link prediction pipeline procedures to `beta` tier:
  - `gds.beta.pipeline.linkPrediction.addFeature`
  - `gds.beta.pipeline.linkPrediction.addNodeProperty`
  - `gds.beta.pipeline.linkPrediction.configureParams`
  - `gds.beta.pipeline.linkPrediction.configureSplit`
  - `gds.beta.pipeline.linkPrediction.create`
  - `gds.beta.pipeline.linkPrediction.predict.mutate`
  - `gds.beta.pipeline.linkPrediction.predict.mutate.estimate`
  - `gds.beta.pipeline.linkPrediction.predict.stream`
  - `gds.beta.pipeline.linkPrediction.predict.stream.estimate`
  - `gds.beta.pipeline.linkPrediction.train`
  - `gds.beta.pipeline.linkPrediction.train.estimate`
- Moved node classification pipeline procedures to `beta` tier:
  - `gds.beta.pipeline.nodeClassification.selectFeatures`
  - `gds.beta.pipeline.nodeClassification.addNodeProperty`
  - `gds.beta.pipeline.nodeClassification.configureParams`
  - `gds.beta.pipeline.nodeClassification.configureSplit`
  - `gds.beta.pipeline.nodeClassification.create`
  - `gds.beta.pipeline.nodeClassification.predict.mutate`
  - `gds.beta.pipeline.nodeClassification.predict.mutate.estimate`
  - `gds.beta.pipeline.nodeClassification.predict.stream`
  - `gds.beta.pipeline.nodeClassification.predict.stream.estimate`
  - `gds.beta.pipeline.nodeClassification.predict.write`
  - `gds.beta.pipeline.nodeClassification.predict.write.estimate`
  - `gds.beta.pipeline.nodeClassification.train`
  - `gds.beta.pipeline.nodeClassification.train.estimate`
- Removed non-pipeline versions of Node Classification, including procedures:
  - `gds.alpha.ml.nodeClassification.predict.mutate`
  - `gds.alpha.ml.nodeClassification.predict.mutate.estimate`
  - `gds.alpha.ml.nodeClassification.predict.stream`
  - `gds.alpha.ml.nodeClassification.predict.stream.estimate`
  - `gds.alpha.ml.nodeClassification.predict.write`
  - `gds.alpha.ml.nodeClassification.predict.write.estimate`
  - `gds.alpha.ml.nodeClassification.train`
  - `gds.alpha.ml.nodeClassification.train.estimate`
- Removed non-pipeline versions of Link Prediction, including procedures:
  - `gds.alpha.ml.linkPrediction.predict.mutate`
  - `gds.alpha.ml.linkPrediction.predict.mutate.estimate`
  - `gds.alpha.ml.linkPrediction.predict.stream`
  - `gds.alpha.ml.linkPrediction.predict.stream.estimate`
  - `gds.alpha.ml.linkPrediction.predict.write`
  - `gds.alpha.ml.linkPrediction.predict.write.estimate`
  - `gds.alpha.ml.linkPrediction.train`
  - `gds.alpha.ml.linkPrediction.train.estimate`
- Additional changes to node classification & link prediction
  - Removed `batchSize` parameter for Node Classification pipeline predict modes, because it is not useful.
  - The procedure resolution for the `taskName` parameter of `gds.alpha.ml.pipeline.linkPrediction.addNodeProperty` and `gds.alpha.ml.pipeline.nodeClassification.addNodeProperty` changed and now requires the inclusion of the tier, e.g. `'scaleProperties'` must now be written as `'alpha.scaleProperties'`.
  - Changed node classification and link prediction training pipelines management from the model catalog to the new pipeline catalog. Trained pipelines (which we refer to as models) are still managed in the model catalog.
  - Replaced `gds.beta.pipeline.[nodeClassification|linkPrediction].configureParams(pipelineName::String, parameterSpace::List of Map)` by `gds.beta.pipeline.[nodeClassification|linkPrediction].addLogisticRegression(pipelineName::String, config::Map)`. This also removes the previous default model candidate.
  - Removed `useBiasFeature` parameter in `gds.beta.pipeline.linkPrediction.addLogisticRegression`.
- Graph Projection:
  - `gds.graph.create` renamed to `gds.graph.project`.
  - In `gds.graph.project`, defining the same node property for different labels with different `neoPropertyKeys` is no longer allowed.
  - Inputs for comparison expressions in `graph.project.subgraph` must resolve to the same type, i.e., `long` or `double`.
- Removed support for anonymous graph syntax from algorithm execution. Only explicit, named graphs are supported.
  - Memory estimation is an exception to this.
- Changed the syntax of memory estimation. The graph name or graph create config always goes into the first parameter, the algorithm config always into the second.
- Dropped `Neo4j 4.2` support.
- Removed `USE_PRE_AGGREGATION` feature toggle.
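
The procedure renames above are mechanical enough to script when migrating stored Cypher queries. The following is a minimal sketch, not an official migration tool: `RENAMES` and `migrate_query` are hypothetical names, the table covers only a few of the renames listed in this section, and it assumes the alpha BFS and DFS procedures were invoked in `stream` mode. Parameter changes such as `startNodeId` to `sourceNode`, and the removal of anonymous graphs, still need manual review.

```python
import re

# Old-to-new procedure names, copied from the renames listed above.
# Assumption: alpha BFS/DFS were called as `gds.alpha.bfs.stream` and
# `gds.alpha.dfs.stream`; adjust the keys if your queries differ.
RENAMES = {
    "gds.graph.create": "gds.graph.project",
    "gds.alpha.bfs.stream": "gds.bfs.stream",
    "gds.alpha.dfs.stream": "gds.dfs.stream",
    "gds.beta.knn.mutate": "gds.knn.mutate",
    "gds.beta.knn.stats": "gds.knn.stats",
    "gds.beta.knn.stream": "gds.knn.stream",
    "gds.beta.knn.write": "gds.knn.write",
    "gds.alpha.closeness.stream": "gds.beta.closeness.stream",
    "gds.alpha.closeness.write": "gds.beta.closeness.write",
}


def migrate_query(query: str) -> str:
    """Rewrite pre-2.0 procedure names in a Cypher query string (hypothetical helper)."""
    for old, new in RENAMES.items():
        # Match the complete dotted name only: no identifier character or dot
        # may precede or follow it, so longer names such as
        # `gds.graph.create.cypher` are deliberately left untouched.
        pattern = re.compile(r"(?<![\w.])" + re.escape(old) + r"(?![\w.])")
        query = pattern.sub(new, query)
    return query


if __name__ == "__main__":
    print(migrate_query("CALL gds.graph.create('g', 'Person', 'KNOWS')"))
    print(migrate_query("CALL gds.beta.knn.stream('g', {})"))
```

Matching whole names rather than prefixes is a deliberate choice here: it avoids silently rewriting procedures that this sketch does not know about.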
New features
- KNN graduated to product tier:
  - Added a random walk sampler for initializing KNN based on the topology of the input graph. The configuration key `initialSampler` accepts either `UNIFORM` or `RANDOM_WALK`.
  - Added possibility to exclude pairs of nodes in the K-Nearest Neighbor algorithm that have a similarity below a given threshold defined with an optional configuration parameter `similarityCutoff`.
  - Added perturbation rate to KNN, to reduce the risk of some neighbors not being explored. Configured with `perturbationRate` as a value between 0 and 1.
  - Improved normalization of KNN metrics to make them consistent and usable in combination.
  - KNN supports multiple node properties via the `nodeProperties` key.
  - Added metrics `ranIterations`, `didConverge` and `nodePairsConsidered` to the result of `gds.knn.[stats|mutate|write]`.
  - KNN can compute similarity over multiple node properties, specified with the new `nodeProperties` parameter.
  - Added new similarity metrics to KNN, configured per property via the `nodeProperties` key.
    - Euclidean
    - Overlap
    - Pearson
- Added similarity metric selection to Node Similarity, configured with `similarityMetric` (supports Jaccard or Overlap).
- BFS & DFS graduated to product tier
  - Added support for `mutate` mode with `gds.dfs.mutate`, `gds.bfs.mutate`.
  - Added support for `estimate` mode to `gds.bfs.[stream|mutate]` and `gds.dfs.[stream|mutate]` procedures.
  - Added progress logging support.
- Added a new parallel single-source shortest path algorithm to product tier:
  - `gds.allShortestPaths.delta.stream`
  - `gds.allShortestPaths.delta.stream.estimate`
  - `gds.allShortestPaths.delta.write`
  - `gds.allShortestPaths.delta.write.estimate`
  - `gds.allShortestPaths.delta.mutate`
  - `gds.allShortestPaths.delta.mutate.estimate`
- Closeness Centrality graduated to beta tier, added:
  - `gds.beta.closeness.mutate`
  - `gds.beta.closeness.stats`
- Node Classification:
  - Models produced with `gds.alpha.ml.pipeline.nodeClassification.train` can now be stored (persisted) using `gds.alpha.model.store`.
  - Added `estimate` mode to `gds.alpha.ml.pipeline.nodeClassification.[train|predict.stream|predict.mutate|predict.write]` procedures.
  - Added `modelSelectionStats` to `gds.alpha.ml.pipeline.nodeClassification.train`.
  - Only save metrics for winning model inside `modelInfo`.
- Link Prediction:
  - Models produced with `gds.alpha.ml.pipeline.linkPrediction.train` can now be stored (persisted) using `gds.alpha.model.store`.
  - Added `estimate` mode to `gds.alpha.ml.pipeline.linkPrediction.train` procedure.
  - Added `estimate` mode to `gds.alpha.ml.pipeline.linkPrediction.[train|predict.stream|predict.mutate]` procedures.
  - Added `modelSelectionStats` to `gds.alpha.ml.pipeline.linkPrediction.train`.
  - Only save metrics for winning model inside `modelInfo`.
- Added support for Random Forest models in both Link Prediction and Node Classification pipelines with `gds.alpha.pipeline.[linkPrediction|nodeClassification].addRandomForest`.
- Added pipeline catalog procedures for managing training pipelines:
  - `gds.beta.pipeline.list`
  - `gds.beta.pipeline.exists`
  - `gds.beta.pipeline.drop`
- Added new way of projecting a graph using Cypher: `gds.alpha.graph.project`, which is an aggregation rather than a procedure.
- Added surface for hints and warnings generated by executed tasks with the new `gds.alpha.userLog` logging procedure.
- Support for write back from Neo4j Causal Cluster Read Replica instance (requires Enterprise GDS).
- Support for graph projections backup and restore with `gds.alpha.backup` and `gds.alpha.restore` (requires Enterprise GDS).
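
To make the `similarityCutoff` option for KNN more concrete, here is a small sketch of the idea in plain Python. It is a brute-force illustration and not the approximate algorithm that GDS implements: `top_k_neighbors`, its arguments, and the choice of cosine similarity are all assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_neighbors(vectors, k, similarity_cutoff=0.0):
    """Return, per node, up to k most similar other nodes (hypothetical helper).

    Pairs whose similarity is below `similarity_cutoff` are dropped before
    the top-k selection, so a node can end up with fewer than k neighbors.
    """
    result = {}
    for node, vec in vectors.items():
        scored = [
            (other, cosine(vec, other_vec))
            for other, other_vec in vectors.items()
            if other != node
        ]
        kept = [(other, score) for other, score in scored if score >= similarity_cutoff]
        kept.sort(key=lambda pair: pair[1], reverse=True)
        result[node] = kept[:k]
    return result


if __name__ == "__main__":
    vectors = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0]}
    for node, neighbors in top_k_neighbors(vectors, k=2, similarity_cutoff=0.5).items():
        print(node, [other for other, _ in neighbors])
```

With a cutoff of 0.5 in this toy data, node `c` is not similar enough to either other node and ends up with no neighbors at all, which is the behavior the cutoff is meant to allow.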
Bug fixes
- Fixed a bug where Node2Vec would produce an ArrayIndexOutOfBoundsException (AIOOBE) on sufficiently large graphs.
- Fixed a bug where ForkJoin pools were not properly closed, which could lead to out-of-memory errors (OOMs) when using Pregel-based algorithms, e.g. Page Rank.
- GraphSAGE:
  - Fixed a bug where `gds.beta.graphSage` would produce incorrect results for smaller graphs.
  - Fixed a bug where `gds.beta.graphSage` would produce incorrect results for the pool aggregator.
- Node Classification & Link Prediction pipelines:
  - Fixed a bug where `gds.alpha.ml.pipeline.nodeClassification.train` would train a model under the wrong username and not be accessible for the actual user.
  - Fixed a bug where `gds.alpha.ml.pipeline.nodeClassification.train` and `gds.alpha.ml.pipeline.linkPrediction.train` would skip applying a penalty to the weight of the last feature.
  - Fixed a bug where the `trainConfig` of persisted models would not be shown to the user.
  - Fixed a bug where `gds.alpha.ml.pipeline.nodeClassification.train` would not scale penalty to train set size correctly.
- Fixed a bug in `gds.beta.graph.create.subgraph` where long values greater than `2^53` were not properly handled during expression evaluation.
- Triangle Count & Local Clustering Coefficient
  - Fixed a bug where `gds.triangleCount` and `gds.localClusteringCoefficient` might produce wrong results when using a `nodeLabels` filter.
  - Fixed a bug where graph intersection used in Triangle Count and Local Clustering Coefficient would fail on union node filtered graphs.
- Fixed a bug where `gds.alpha.closeness` might produce incorrect results for directed graphs.
- Fixed a bug where function `gds.alpha.similarity.cosine` and procedures `gds.alpha.similarity.cosine.[stats,stream,write]` returned the absolute value of the cosine computation, instead of the cosine value itself.
- Fixed a bug where Cypher on GDS would try to access node properties as relationship properties and vice versa.
- Fixed a bug where `gds.graph.create.cypher` would sometimes not display the root cause in case of an error.
- Fixed a bug where concurrently computing degrees on a node filtered graph would produce an AIOOBE.
- Fixed a bug where the memory estimation for generated Pregel procedures was calculated incorrectly.
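
Two of the fixes above involve numeric edge cases that are easy to reproduce outside GDS. The sketch below is plain Python and does not call any GDS code. It shows why returning the absolute value of a cosine hides vectors that point in opposite directions, and why integers above `2^53` are a hazard when they pass through a double. That the subgraph bug was caused by such a conversion is an assumption; the notes do not state the root cause. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Signed cosine similarity of two equal-length, non-zero numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def round_trip_through_double(n: int) -> int:
    """Convert an integer to a 64-bit float and back again."""
    return int(float(n))


if __name__ == "__main__":
    opposite = cosine([1.0, 0.0], [-1.0, 0.0])
    print(opposite)       # signed value: the vectors point in opposite directions
    print(abs(opposite))  # the absolute value makes them look identical

    big = 2**53 + 1
    # A double has a 53-bit significand, so this odd value cannot be represented.
    print(round_trip_through_double(big) == big)
```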
Improvements
- GraphSAGE:
  - Improved runtime performance for `gds.beta.graphSage` when using the `relationshipWeight` configuration parameter.
  - Improved memory usage of `gds.beta.graphSage` by computing the features per batch lazily.
- Memory estimation for `gds.graph.project` returns the estimated peak memory consumption during loading instead of the estimated final graph size.
- Reduced memory consumption while loading using Native or Cypher projections.
- `gds.alpha.ml.pipeline.[nodeClassification|linkPrediction].train` will raise an error when any of the train, test, or validation sets is empty.
- Added `failIfMissing` flag to `gds.beta.[pipeline|model].drop`.
- Implemented batched prediction for LinkPrediction, which improves runtime.
- Breadth First / Depth First Search:
  - Parallel implementation of `gds.bfs.stream`.
  - Result field `path` of `gds.bfs.stream` and `gds.dfs.stream` will only be computed if explicitly specified in the `YIELD` clause or there is no `YIELD` clause.
- Provide more information to users if a node is missing a particular property in KNN.