Release Date: 18 March 2022
GDS 2.0.0 is compatible with Neo4j 4.3 and 4.4, but not with Neo4j 3.5.x, 4.0, 4.1, or 4.2. For a 3.5-compatible release, please see GDS 1.1.7. For a 4.0-compatible release, please see GDS 1.6.5. For a 4.1- or 4.2-compatible release, please see GDS 1.8.6.
Breaking changes
- Moved BFS to product tier:
  gds.alpha.bfs => gds.bfs.stream
  - Added support for gds.bfs.stream.estimate.
  - Removed configuration parameter relationshipWeightProperty.
  - Renamed configuration parameter startNodeId to sourceNode.
  - Renamed YIELD field startNodeId to sourceNode.
- Moved DFS to product tier:
  gds.alpha.dfs => gds.dfs.stream
  - Added support for gds.dfs.stream.estimate.
  - Removed configuration parameter relationshipWeightProperty.
  - Renamed configuration parameter startNodeId to sourceNode.
  - Renamed YIELD field startNodeId to sourceNode.
- Moved KNN to product tier:
  gds.beta.knn.mutate => gds.knn.mutate
  gds.beta.knn.stats => gds.knn.stats
  gds.beta.knn.stream => gds.knn.stream
  gds.beta.knn.write => gds.knn.write
  - Removed ANN (superseded by KNN).
  - nodeWeightProperty for KNN replaced by nodeProperties, which accepts multiple properties.
- Similarity:
  - Moved alpha similarity functions to product tier:
    gds.alpha.similarity.cosine => gds.similarity.cosine
    gds.alpha.similarity.euclidean => gds.similarity.euclidean
    gds.alpha.similarity.euclideanDistance => gds.similarity.euclideanDistance
    gds.alpha.similarity.jaccard => gds.similarity.jaccard
    gds.alpha.similarity.overlap => gds.similarity.overlap
    gds.alpha.similarity.pearson => gds.similarity.pearson
  - The Pearson similarity function no longer accepts Lists of Maps; like the other similarity functions, it now computes over Lists of Numbers.
  - Removed the gds.alpha.similarity.asVector function.
  - Removed alpha similarity procedures (the similarity metrics were added as modes for KNN and Node Similarity):
    gds.alpha.similarity.cosine
    gds.alpha.similarity.euclidean
    gds.alpha.similarity.overlap
    gds.alpha.similarity.pearson
    gds.alpha.ml.ann
- Moved delta stepping shortest path to product tier:
  gds.alpha.shortestPath.deltaStepping => gds.allShortestPaths.delta.[stream|write|mutate] (each with an estimate variant)
- Moved Closeness Centrality to beta tier:
  gds.alpha.closeness.stream => gds.beta.closeness.stream
  gds.alpha.closeness.stats => gds.beta.closeness.stats
  gds.alpha.closeness.write => gds.beta.closeness.write
  gds.alpha.closeness.mutate => gds.beta.closeness.mutate
  - Removed return item nodes from write and mutate modes.
  - Renamed configuration parameter improved to useWassermanFaust.
  - Renamed YIELD field centrality to score in stream mode.
- Moved link prediction pipeline procedures to beta tier:
  gds.beta.pipeline.linkPrediction.addFeature
  gds.beta.pipeline.linkPrediction.addNodeProperty
  gds.beta.pipeline.linkPrediction.configureParams
  gds.beta.pipeline.linkPrediction.configureSplit
  gds.beta.pipeline.linkPrediction.create
  gds.beta.pipeline.linkPrediction.predict.mutate
  gds.beta.pipeline.linkPrediction.predict.mutate.estimate
  gds.beta.pipeline.linkPrediction.predict.stream
  gds.beta.pipeline.linkPrediction.predict.stream.estimate
  gds.beta.pipeline.linkPrediction.train
  gds.beta.pipeline.linkPrediction.train.estimate
- Moved node classification pipeline procedures to beta tier:
  gds.beta.pipeline.nodeClassification.selectFeatures
  gds.beta.pipeline.nodeClassification.addNodeProperty
  gds.beta.pipeline.nodeClassification.configureParams
  gds.beta.pipeline.nodeClassification.configureSplit
  gds.beta.pipeline.nodeClassification.create
  gds.beta.pipeline.nodeClassification.predict.mutate
  gds.beta.pipeline.nodeClassification.predict.mutate.estimate
  gds.beta.pipeline.nodeClassification.predict.stream
  gds.beta.pipeline.nodeClassification.predict.stream.estimate
  gds.beta.pipeline.nodeClassification.predict.write
  gds.beta.pipeline.nodeClassification.predict.write.estimate
  gds.beta.pipeline.nodeClassification.train
  gds.beta.pipeline.nodeClassification.train.estimate
- Removed non-pipeline versions of Node Classification, including procedures:
gds.alpha.ml.nodeClassification.predict.mutate
gds.alpha.ml.nodeClassification.predict.mutate.estimate
gds.alpha.ml.nodeClassification.predict.stream
gds.alpha.ml.nodeClassification.predict.stream.estimate
gds.alpha.ml.nodeClassification.predict.write
gds.alpha.ml.nodeClassification.predict.write.estimate
gds.alpha.ml.nodeClassification.train
gds.alpha.ml.nodeClassification.train.estimate
- Removed non-pipeline versions of Link Prediction, including procedures:
gds.alpha.ml.linkPrediction.predict.mutate
gds.alpha.ml.linkPrediction.predict.mutate.estimate
gds.alpha.ml.linkPrediction.predict.stream
gds.alpha.ml.linkPrediction.predict.stream.estimate
gds.alpha.ml.linkPrediction.predict.write
gds.alpha.ml.linkPrediction.predict.write.estimate
gds.alpha.ml.linkPrediction.train
gds.alpha.ml.linkPrediction.train.estimate
- Additional changes to node classification & link prediction:
  - Removed the batchSize parameter for Node Classification pipeline predict modes, because it is not useful.
  - The procedure resolution for the taskName parameter of gds.alpha.ml.pipeline.linkPrediction.addNodeProperty and gds.alpha.ml.pipeline.nodeClassification.addNodeProperty changed and now requires the inclusion of the tier, e.g. 'scaleProperties' must now be written as 'alpha.scaleProperties'.
  - Moved management of node classification and link prediction training pipelines from the model catalog to the new pipeline catalog. Trained pipelines (which we refer to as models) are still managed in the model catalog.
  - Replaced gds.beta.pipeline.[nodeClassification|linkPrediction].configureParams(pipelineName::String, parameterSpace::List of Map) by gds.beta.pipeline.[nodeClassification|linkPrediction].addLogisticRegression(pipelineName::String, config::Map). This also removes the previous default model candidate.
  - Removed the useBiasFeature parameter in gds.beta.pipeline.linkPrediction.addLogisticRegression.
- Graph Projection:
  - gds.graph.create renamed to gds.graph.project.
  - In gds.graph.project, defining the same node property for different labels with different neoPropertyKeys is no longer allowed.
  - Inputs for comparison expressions in graph.project.subgraph must resolve to the same type, i.e., long or double.
- Removed support for anonymous graph syntax from algorithm execution. Only explicit, named graphs are supported.
  - Memory estimation is an exception to this.
- Changed the syntax of memory estimation. The graph name or graph create config always goes into the first parameter; the algorithm config always goes into the second.
- Dropped Neo4j 4.2 support.
- Removed USE_PRE_AGGREGATION feature toggle.
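Several of the breaking changes above are pure renames. As a rough aid for updating existing Cypher query strings, the following Python sketch applies a few of those renames with regular expressions. The helper migrate_query and its rename table are hypothetical and not part of GDS. It assumes that sub-procedures such as gds.graph.create.cypher or .estimate variants follow the same prefix rename, and it deliberately leaves configuration keys (e.g. startNodeId), YIELD fields, BFS/DFS calls, and removed procedures untouched, so migrated queries still need manual review.

```python
import re
from typing import List, Tuple

# Hypothetical rename table built from a subset of the breaking changes above.
# Each entry is (regex for the GDS 1.x name, GDS 2.0.0 replacement).
RENAMES: List[Tuple[str, str]] = [
    # Assumes sub-procedures (.cypher, .estimate, ...) follow the same rename.
    (r"\bgds\.graph\.create\b", "gds.graph.project"),
    # Only the modes listed in the notes; other names are left alone.
    (r"\bgds\.beta\.knn\.(mutate|stats|stream|write)\b", r"gds.knn.\1"),
    (r"\bgds\.alpha\.closeness\.(stream|stats|write|mutate)\b", r"gds.beta.closeness.\1"),
    # Only the similarity *functions* moved; the procedures were removed, so
    # rewrite a name only when it is directly followed by "(" (a function call).
    (r"\bgds\.alpha\.similarity\."
     r"(cosine|euclideanDistance|euclidean|jaccard|overlap|pearson)(?=\s*\()",
     r"gds.similarity.\1"),
]


def migrate_query(query: str) -> str:
    """Apply the known renames; config keys and YIELD fields are NOT migrated."""
    for pattern, replacement in RENAMES:
        query = re.sub(pattern, replacement, query)
    return query
```

For example, migrate_query("CALL gds.beta.knn.stream('g', {topK: 1})") yields "CALL gds.knn.stream('g', {topK: 1})", while a removed procedure call such as gds.alpha.similarity.cosine.stream(...) is returned unchanged because the function pattern requires the name to be followed directly by a parenthesis.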
New features
- KNN graduated to product tier:
  - Added a random walk sampler for initializing KNN based on the topology of the input graph. The configuration key initialSampler accepts either UNIFORM or RANDOM_WALK.
  - Added the possibility to exclude pairs of nodes in the K-Nearest Neighbor algorithm whose similarity is below a given threshold, defined with the optional configuration parameter similarityCutoff.
  - Added a perturbation rate to KNN to reduce the risk of some neighbors not being explored. It is configured with perturbationRate, a value between 0 and 1.
  - Improved normalization of KNN metrics to make them consistent and usable in combination.
  - KNN can compute similarity over multiple node properties, specified with the new nodeProperties key.
  - Added metrics ranIterations, didConverge and nodePairsConsidered to the result of gds.knn.[stats|mutate|write].
  - Added new similarity metrics to KNN, configured per property via the nodeProperties key:
    - Euclidean
    - Overlap
    - Pearson
- Added similarity metric selection to Node Similarity, configured with similarityMetric (supports Jaccard or Overlap).
- BFS & DFS graduated to product tier:
  - Added support for mutate mode with gds.dfs.mutate and gds.bfs.mutate.
  - Added support for estimate mode to gds.bfs.[stream|mutate] and gds.dfs.[stream|mutate] procedures.
  - Added progress logging support.
- Added a new parallel single-source shortest path algorithm to product tier:
  gds.allShortestPaths.delta.stream
  gds.allShortestPaths.delta.stream.estimate
  gds.allShortestPaths.delta.write
  gds.allShortestPaths.delta.write.estimate
  gds.allShortestPaths.delta.mutate
  gds.allShortestPaths.delta.mutate.estimate
- Closeness Centrality graduated to beta tier, added:
  gds.beta.closeness.mutate
  gds.beta.closeness.stats
- Node Classification:
  - Models produced with gds.alpha.ml.pipeline.nodeClassification.train can now be stored (persisted) using gds.alpha.model.store.
  - Added estimate mode to gds.alpha.ml.pipeline.nodeClassification.[train|predict.stream|predict.mutate|predict.write] procedures.
  - Added modelSelectionStats to gds.alpha.ml.pipeline.nodeClassification.train.
  - Only the metrics of the winning model are saved in modelInfo.
- Link Prediction:
  - Models produced with gds.alpha.ml.pipeline.linkPrediction.train can now be stored (persisted) using gds.alpha.model.store.
  - Added estimate mode to gds.alpha.ml.pipeline.linkPrediction.[train|predict.stream|predict.mutate] procedures.
  - Added modelSelectionStats to gds.alpha.ml.pipeline.linkPrediction.train.
  - Only the metrics of the winning model are saved in modelInfo.
- Added support for Random Forest models in both Link Prediction and Node Classification pipelines with gds.alpha.pipeline.[linkPrediction|nodeClassification].addRandomForest.
- Added pipeline catalog procedures for managing training pipelines:
  gds.beta.pipeline.list
  gds.beta.pipeline.exists
  gds.beta.pipeline.drop
- Added a new way of projecting a graph using Cypher: gds.alpha.graph.project, which is an aggregation rather than a procedure.
- Added a surface for hints and warnings generated by executed tasks with the new gds.alpha.userLog logging procedure.
- Support for write-back from a Neo4j Causal Cluster Read Replica instance (requires Enterprise GDS).
- Support for graph projection backup and restore with gds.alpha.backup and gds.alpha.restore (requires Enterprise GDS).
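To make the interaction of the neighbour count, similarityCutoff and the per-property metrics described under "KNN graduated to product tier" concrete, here is a small Python sketch. It is an exact brute-force search, not the approximate, iterative algorithm that gds.knn actually runs, and it ignores initialSampler and perturbationRate. The function names, the top_k argument standing in for the GDS topK setting, the metric formulas (e.g. Euclidean similarity as 1 / (1 + distance), Overlap over sets), and the rule that multiple properties are combined by averaging are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def euclidean(a: Vector, b: Vector) -> float:
    # Assumed normalisation: distance mapped into (0, 1], identical vectors -> 1.0.
    return 1.0 / (1.0 + math.dist(a, b))


def overlap(a: Vector, b: Vector) -> float:
    # Assumed set semantics: |A & B| / min(|A|, |B|).
    sa, sb = set(a), set(b)
    smaller = min(len(sa), len(sb))
    return len(sa & sb) / smaller if smaller else 0.0


METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "COSINE": cosine,
    "EUCLIDEAN": euclidean,
    "OVERLAP": overlap,
}


def knn(nodes: Dict[int, Dict[str, Vector]],
        node_properties: Dict[str, str],
        top_k: int,
        similarity_cutoff: float = 0.0) -> Dict[int, List[Tuple[int, float]]]:
    """Exact top-k neighbours per node; hypothetical stand-in for gds.knn."""
    result: Dict[int, List[Tuple[int, float]]] = {}
    for u, u_props in nodes.items():
        scored: List[Tuple[int, float]] = []
        for v, v_props in nodes.items():
            if u == v:
                continue
            # One similarity per configured property, then averaged (assumption).
            sims = [METRICS[metric](u_props[prop], v_props[prop])
                    for prop, metric in node_properties.items()]
            score = sum(sims) / len(sims)
            # Pairs below the cutoff are never kept as neighbours.
            if score >= similarity_cutoff:
                scored.append((v, score))
        # Highest similarity first; ties broken by node id for determinism.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        result[u] = scored[:top_k]
    return result
```

With node_properties={"emb": "COSINE"} and similarity_cutoff=0.5, a node whose best match scores below 0.5 simply ends up with an empty neighbour list, which is the effect the similarityCutoff feature describes.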
Bug fixes
- Fixed a bug where Node2Vec would throw an ArrayIndexOutOfBoundsException (AIOOBE) on sufficiently large graphs.
- Fixed a bug where ForkJoin pools were not properly closed, which could lead to out-of-memory errors (OOMs) when using Pregel-based algorithms, e.g. PageRank.
- GraphSAGE:
  - Fixed a bug where gds.beta.graphSage would produce incorrect results for smaller graphs.
  - Fixed a bug where gds.beta.graphSage would produce incorrect results for the pool aggregator.
- Node Classification & Link Prediction pipelines:
  - Fixed a bug where gds.alpha.ml.pipeline.nodeClassification.train would train a model under the wrong username, making it inaccessible to the actual user.
  - Fixed a bug where gds.alpha.ml.pipeline.nodeClassification.train and gds.alpha.ml.pipeline.linkPrediction.train would skip applying a penalty to the weight of the last feature.
  - Fixed a bug where the trainConfig of persisted models would not be shown to the user.
  - Fixed a bug where gds.alpha.ml.pipeline.nodeClassification.train would not correctly scale the penalty to the train set size.
- Fixed a bug in gds.beta.graph.create.subgraph where long values greater than 2^53 were not properly handled during expression evaluation.
- Triangle Count & Local Clustering Coefficient:
  - Fixed a bug where gds.triangleCount and gds.localClusteringCoefficient might produce wrong results when using a nodeLabels filter.
  - Fixed a bug where the graph intersection used in Triangle Count and Local Clustering Coefficient would fail on union node-filtered graphs.
- Fixed a bug where gds.alpha.closeness might produce incorrect results for directed graphs.
- Fixed a bug where the function gds.alpha.similarity.cosine and the procedures gds.alpha.similarity.cosine.[stats|stream|write] returned the absolute value of the cosine computation instead of the cosine value itself.
- Fixed a bug where Cypher on GDS would try to access node properties as relationship properties and vice versa.
- Fixed a bug where gds.graph.create.cypher would sometimes not display the root cause of an error.
- Fixed a bug where concurrently computing degrees on a node-filtered graph would throw an AIOOBE.
- Fixed a bug where the memory estimation for generated Pregel procedures was calculated incorrectly.
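The cosine fix above and the earlier breaking change to Pearson (Lists of Numbers instead of Lists of Maps) both concern the underlying formulas. The Python sketch below shows the textbook definitions these functions are expected to follow; it is not GDS source code, the function names are illustrative, and returning 0.0 for zero-norm or constant inputs is an assumption of this sketch. Opposite vectors yield -1.0 from the signed cosine, whereas taking the absolute value, as the bug did, would report 1.0.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Signed cosine of the angle between a and b, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 0.0  # assumption of this sketch for zero vectors
    # The bug fixed above amounted to returning abs(dot / norm) here.
    return dot / norm


def pearson_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length Lists of Numbers, in [-1, 1]."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    if denom == 0:
        return 0.0  # assumption of this sketch for constant inputs
    return cov / denom
```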
Improvements
- GraphSAGE:
  - Improved runtime performance for gds.beta.graphSage when using the relationshipWeight configuration parameter.
  - Improved memory usage of gds.beta.graphSage by computing the features per batch lazily.
- Memory estimation for gds.graph.project returns the estimated peak memory consumption during loading instead of the estimated final graph size.
- Reduced memory consumption while loading using Native or Cypher projections.
- gds.alpha.ml.pipeline.[nodeClassification|linkPrediction].train will raise an error when any of the train, test, or validation sets is empty.
- Added failIfMissing flag to gds.beta.[pipeline|model].drop.
- Implemented batched prediction for Link Prediction, which improves runtime.
- Breadth First / Depth First Search:
  - Parallel implementation of gds.bfs.stream.
  - The result field path of gds.bfs.stream and gds.dfs.stream will only be computed if it is explicitly specified in the YIELD clause, or if there is no YIELD clause.
- Provide more information to users if a node is missing a particular property in KNN.
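The BFS and DFS changes in these notes (sourceNode as the start parameter, and path computed only when it is yielded) can be pictured with the sequential Python sketch below. It is not the parallel GDS implementation; the function names, the build_path flag standing in for the YIELD-driven behaviour, and the representation of path as pairs of consecutively visited nodes are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List


def _result(source_node: int, order: List[int], build_path: bool) -> Dict[str, object]:
    result: Dict[str, object] = {"sourceNode": source_node, "nodeIds": order}
    if build_path:
        # Materialised only on request, mirroring "path is computed only if yielded".
        result["path"] = list(zip(order, order[1:]))
    return result


def bfs(adjacency: Dict[int, List[int]], source_node: int,
        build_path: bool = False) -> Dict[str, object]:
    """Breadth-first visiting order starting at source_node."""
    visited = {source_node}
    order = [source_node]
    queue = deque([source_node])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return _result(source_node, order, build_path)


def dfs(adjacency: Dict[int, List[int]], source_node: int,
        build_path: bool = False) -> Dict[str, object]:
    """Depth-first visiting order starting at source_node (iterative)."""
    visited = set()
    order: List[int] = []
    stack = [source_node]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push in reverse so neighbours are explored in adjacency-list order.
        for neighbour in reversed(adjacency.get(node, [])):
            if neighbour not in visited:
                stack.append(neighbour)
    return _result(source_node, order, build_path)
```

On the graph {0: [1, 2], 1: [3], 2: [3], 3: []}, bfs from node 0 visits [0, 1, 2, 3] while dfs visits [0, 1, 3, 2]; the path entry only appears when build_path=True, so callers that never ask for it skip that work.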