Release Date: 2 June 2022
GDS 2.1.0-preview is compatible with Neo4j 4.3 and 4.4 but not Neo4j 3.5.x, 4.0, 4.1, or 4.2. For a 3.5-compatible release, please see GDS 1.1.7. For a 4.0-compatible release, please see GDS 1.6.5. For a 4.1- or 4.2-compatible release, please see GDS 1.8.7.
Breaking Changes
- Removed the redundant information of parameter space and split config from the info of the models trained by gds.beta.pipeline.[nodeClassification|linkPrediction].train. The information is now accessible only via the Pipeline Catalog.
- Removed the label parameter from gds.graph.removeNodeProperties.
- Supported config parameters are timeoutInSeconds and concurrency.
New Features
- (Enterprise Only) Apache Arrow and Flight RPC can now be used to improve certain import and export tasks:
  - Project a new in-memory graph or Neo4j database via Arrow Flight RPC, for example by using gds.alpha.graph.construct from the GDS Python client.
  - Export node, relationship, and graph properties directly via Arrow Flight RPC, for example by using the existing stream*Properties functionality from the GDS Python client.
  - Flight RPC is secured with the same authorization and encryption that the Neo4j database is using.
- New Algorithm: K-Means Clustering. Added the following procedures:
  - gds.alpha.kmeans.mutate
  - gds.alpha.kmeans.stats
  - gds.alpha.kmeans.stream
- New Algorithm: Leiden. Added the following procedures:
  - gds.alpha.leiden.mutate
  - gds.alpha.leiden.stats
  - gds.alpha.leiden.stream
- Added new similarity variant Filtered Node Similarity to alpha tier, accepting source and target node filters:
  - gds.alpha.nodeSimilarity.filtered.mutate
  - gds.alpha.nodeSimilarity.filtered.stream
  - gds.alpha.nodeSimilarity.filtered.write
- Added new similarity variant Filtered KNN to alpha tier, accepting source and target node filters:
  - gds.alpha.knn.filtered.mutate
  - gds.alpha.knn.filtered.stream
- Added new procedures for delta stepping:
  - gds.allShortestPaths.delta.stats
  - gds.allShortestPaths.delta.stats.estimate
- Added new procedures for BFS:
  - gds.bfs.stats
  - gds.bfs.stats.estimate
- Added Node Regression Pipelines with the following procedures:
  - gds.alpha.pipeline.nodeRegression.create
  - gds.alpha.pipeline.nodeRegression.configureAutoTuning
  - gds.alpha.pipeline.nodeRegression.configureSplit
  - gds.alpha.pipeline.nodeRegression.addLinearRegression
  - gds.alpha.pipeline.nodeRegression.addRandomForest
  - gds.alpha.pipeline.nodeRegression.addNodeProperty
  - gds.alpha.pipeline.nodeRegression.selectFeatures
  - gds.alpha.pipeline.nodeRegression.train
  - gds.alpha.pipeline.nodeRegression.predict.stream
  - gds.alpha.pipeline.nodeRegression.predict.mutate
- Autotuning Support for Machine Learning Pipelines:
  - Added new procedures gds.alpha.pipeline.[nodeClassification|nodeRegression|linkPrediction].configureAutoTuning.
  - Added syntax to specify ranges for parameters in gds.alpha.pipeline.[linkPrediction|nodeClassification|nodeRegression].addRandomForest, gds.beta.pipeline.[linkPrediction|nodeClassification].addLogisticRegression, and gds.alpha.pipeline.nodeRegression.addLinearRegression.
- Additional Machine Learning Pipeline Functionality:
  - Exposed learningRate for the LogisticRegression models, which can be added using gds.beta.pipeline.[nodeClassification|linkPrediction].addLogisticRegression.
  - Exposed minLeafSize for RandomForest models, which can be added using gds.alpha.pipeline.[nodeClassification|linkPrediction].addRandomForest.
  - Exposed criterion for RandomForestClassification models, which can be added using gds.alpha.pipeline.[nodeClassification|linkPrediction].addRandomForest. Also added support for the ENTROPY impurity criterion.
  - Updated structure of modelSelectionStats yield in gds.beta.pipeline.[linkPrediction|nodeClassification].train.
  - Added support for the OUT_OF_BAG_ERROR metric in gds.beta.pipeline.[linkPrediction|nodeClassification].train, which applies only to RandomForest models.
  - Exposed batchesPerIteration in gds.beta.graphSage.train to configure the number of batches considered per iteration.
- Cypher Aggregation now accepts any INTEGER value for source and target nodes
- Added ShardedIdMap which adds support for external node ids ranging from 0 to Long.MAX_VALUE.
  - The id map is disabled by default and can be enabled via feature toggle USE_SHARDED_ID_MAP.
- Added procedures for exporting graph properties to the alpha tier:
  - gds.alpha.graph.streamGraphProperty
  - gds.alpha.graph.removeGraphProperty
- Exposed a new string config parameter jobId for graph projection and algorithm procedures, which allows for easier tracking of a job via e.g. gds.beta.listProgress.
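The range syntax for auto-tuning is easier to picture with a concrete shape. The following is a minimal sketch in Python (chosen because these notes mention the GDS Python client) that only builds Cypher text and parameter maps and does not connect to a database. The helper names (`param_range`, `add_logistic_regression`, `configure_auto_tuning`), the `penalty` setting, the `maxTrials` key, and the `{range: [low, high]}` map shape are assumptions for illustration, not taken from these notes, and should be checked against the GDS manual for this release.

```python
from typing import Any, Dict, List, Tuple

# A query is Cypher text plus the parameter map a driver would send with it.
Query = Tuple[str, Dict[str, Any]]

# Pipeline kinds for which these notes list addLogisticRegression.
PIPELINE_KINDS = ("nodeClassification", "linkPrediction")


def param_range(low: float, high: float) -> Dict[str, List[float]]:
    """Assumed map shape for a tunable range: {range: [low, high]}."""
    if low >= high:
        raise ValueError("lower bound must be smaller than upper bound")
    return {"range": [low, high]}


def add_logistic_regression(kind: str, pipeline: str, config: Dict[str, Any]) -> Query:
    """Build a call that adds a logistic regression candidate to a pipeline."""
    if kind not in PIPELINE_KINDS:
        raise ValueError(f"unsupported pipeline kind: {kind}")
    cypher = f"CALL gds.beta.pipeline.{kind}.addLogisticRegression($pipeline, $config)"
    return cypher, {"pipeline": pipeline, "config": config}


def configure_auto_tuning(kind: str, pipeline: str, max_trials: int) -> Query:
    """Build a call that limits how many candidates auto-tuning tries."""
    if kind not in PIPELINE_KINDS:
        raise ValueError(f"unsupported pipeline kind: {kind}")
    cypher = f"CALL gds.alpha.pipeline.{kind}.configureAutoTuning($pipeline, $config)"
    # "maxTrials" is an assumed key name.
    return cypher, {"pipeline": pipeline, "config": {"maxTrials": max_trials}}


# A fixed learningRate next to a ranged (assumed) penalty setting.
candidate = add_logistic_regression(
    "nodeClassification",
    "my-pipe",
    {"learningRate": 0.01, "penalty": param_range(0.0001, 100.0)},
)
tuning = configure_auto_tuning("nodeClassification", "my-pipe", max_trials=5)
```

Each returned pair could be handed to a Neo4j driver session as query text and parameters. Passing the configuration as a parameter map avoids building Cypher map literals by hand.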
Bug fixes
- Fixed a bug in gds.beta.pipeline.[nodeClassification|linkPrediction].addNodeProperty where gds.beta.graphSage.mutate could not be added.
- Fixed a bug where the procedures gds.beta.pipeline.linkPrediction.predict.[mutate|stream] threw an error when given the argument initialSampler.
- Fixed a bug with running Triangle Count on filtered graphs that could cause an ArrayIndexOutOfBounds Error.
- Fixed a bug where graphSage.train incorrectly reported didConverge as false.
- Fixed a bug in CollapsePath where a provided nodeFilter would be ignored.
- Fixed a bug in gds.louvain.stream when the consecutiveIds parameter was enabled.
- Fixed a bug in RandomWalk where not consuming all stream results could lead to a state where GDS would become unable to run further procedures.
Improvements
- When a query is failed by the memory guard, information is logged as well as sent to the user in the raised exception.
- Machine learning pipelines:
  - gds.beta.pipeline.[nodeClassification|linkPrediction].train.estimate now takes the memory usage of random forest training into account when applicable.
  - gds.beta.pipeline.[nodeClassification|linkPrediction].predict.[mutate|stream|write].estimate now takes random forest prediction memory overhead into account.
  - Improved early validation of graph and prediction pipeline in gds.beta.pipeline.[nodeClassification|linkPrediction].predict.
  - Improved memory estimation for gds.beta.pipeline.[nodeClassification|linkPrediction].train.estimate.
  - Improved memory estimation in gds.beta.pipeline.linkPrediction.train.estimate.
  - Added training-method-specific debug-level logging during the model selection phase of gds.beta.pipeline.linkPrediction.train, gds.beta.pipeline.nodeClassification.train and gds.alpha.pipeline.nodeRegression.train.
  - Improved logging in Link Prediction and Node Classification training.
  - Reduced computational complexity and constant overhead of random forest training, added via gds.alpha.pipeline.[linkPrediction|nodeClassification].addRandomForest. It now runs up to 80% faster.
  - Improved runtime of gds.beta.pipeline.[nodeClassification|linkPrediction].train if the model candidate is of type LogisticRegression. Training may be up to 40% faster.
- GraphSAGE:
  - Improved progress logging for GraphSage.
  - Improved the modelInfo of models created by gds.beta.graphSage.train to include the loss per iteration and ranIterations per epoch.
  - gds.beta.graphSage.train now uses the average loss per node. This removes the implicit dependency between the tolerance and batchSize parameters.
  - Embedding generation using gds.beta.graphSage.[stream|write|mutate] now validates that either both the input graph and the model graph include a relationshipWeightProperty, or neither does. Before, if the model was trained on an unweighted graph, the relationship weight on the input graph was silently ignored (or vice versa).
  - Changed the gradient computation in gds.beta.graphSage.train. Instead of averaging the gradient over all batches, the batchSamplingRatio is used to set the number of batches to consider. By default, this improves the runtime by up to 90%.
- Exposed training details by returning and logging lossPerIteration in gds.beta.node2vec.
- Graph Projections:
  - Added support for query parameters for gds.beta.graph.project.subgraph by passing a parameters Cypher map as part of the procedure configuration.
  - Improved error message for gds.beta.graph.project.subgraph when comparing expressions with incompatible types and one of them is a literal expression.
  - Improved memory usage while projecting a graph that has multiple relationship properties for the same relationship type.
  - It is now possible to specify relationshipTypes: [] in order to project a graph with no relationships.
- Graph Export:
  - Changed gds.graph.export to export internal node identifiers instead of original ids. This avoids fragmentation of the newly created store.
  - Added progress tracking for gds.graph.export.
- Added concurrency configuration parameter to gds.alpha.backup and gds.alpha.restore.
- Added query support for mutated properties for Cypher on GDS.
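To illustrate the new parameters map for gds.beta.graph.project.subgraph, here is a minimal Python sketch that builds the call and checks on the client side that every `$name` placeholder used in a filter has a value. It does not connect to a database. The helper name `project_subgraph`, the positional argument order (graph name, source graph name, node filter, relationship filter, configuration), and the `n` variable in the example filter are assumptions for illustration; only the `parameters` configuration key comes from these notes.

```python
import re
from typing import Any, Dict, Optional, Tuple

# Matches Cypher-style placeholders such as $age inside a filter expression.
PLACEHOLDER = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def project_subgraph(
    graph_name: str,
    from_graph_name: str,
    node_filter: str,
    relationship_filter: str = "*",
    parameters: Optional[Dict[str, Any]] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Build a subgraph projection call with filter parameters in the config."""
    params = dict(parameters or {})
    used = set(PLACEHOLDER.findall(node_filter))
    used |= set(PLACEHOLDER.findall(relationship_filter))
    missing = sorted(used - set(params))
    if missing:
        raise ValueError("no value given for filter parameters: " + ", ".join(missing))
    # Assumed argument order; the filters travel as plain strings, and the
    # values they reference travel inside the configuration map.
    cypher = (
        "CALL gds.beta.graph.project.subgraph("
        "$graphName, $fromGraphName, $nodeFilter, $relationshipFilter, $config)"
    )
    return cypher, {
        "graphName": graph_name,
        "fromGraphName": from_graph_name,
        "nodeFilter": node_filter,
        "relationshipFilter": relationship_filter,
        "config": {"parameters": params},
    }


# Hypothetical graph names and property; the threshold is supplied as a parameter.
cypher, query_params = project_subgraph(
    "adults", "people", "n.age > $age", parameters={"age": 18}
)
```

Keeping the threshold in the parameters map rather than concatenating it into the filter string lets the same filter text be reused with different values.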