Release Date: 2 June 2022
GDS 2.1.0-preview is compatible with Neo4j 4.3 and 4.4 but not Neo4j 3.5.x, 4.0, 4.1, or 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.7.
Breaking Changes
- Removed the redundant information of parameter space and split config from the info of the models trained by gds.beta.pipeline.[nodeClassification|linkPrediction].train. The information is now accessible only via the Pipeline Catalog.
- Removed the label parameter from gds.graph.removeNodeProperties.
- Supported config parameters are timeoutInSeconds and concurrency.
New Features
- (Enterprise Only) Apache Arrow and Flight RPC can now be used to improve certain import and export tasks:
  - Project a new in-memory graph or Neo4j database via Arrow Flight RPC, for example by using gds.alpha.graph.construct from the GDS Python client.
  - Export node, relationship, and graph properties directly via Arrow Flight RPC, for example by using the existing stream*Properties functionality from the GDS Python client.
  - Flight RPC is secured with the same authorization and encryption that the Neo4j database is using.
- New Algorithm: K-Means Clustering. Added the following procedures:
gds.alpha.kmeans.mutate
gds.alpha.kmeans.stats
gds.alpha.kmeans.stream
- New Algorithm: Leiden. Added the following procedures:
gds.alpha.leiden.mutate
gds.alpha.leiden.stats
gds.alpha.leiden.stream
- Added new similarity variant Filtered Node Similarity to alpha tier, accepting source and target node filters
gds.alpha.nodeSimilarity.filtered.mutate
gds.alpha.nodeSimilarity.filtered.stream
gds.alpha.nodeSimilarity.filtered.write
- Added new similarity variant Filtered KNN to alpha tier, accepting source and target node filters
gds.alpha.knn.filtered.mutate
gds.alpha.knn.filtered.stream
- Added new procedures for delta stepping:
gds.allShortestPaths.delta.stats
gds.allShortestPaths.delta.stats.estimate
- Added new procedures for BFS:
gds.bfs.stats
gds.bfs.stats.estimate
- Added Node Regression Pipelines with the following procedures
gds.alpha.pipeline.nodeRegression.create
gds.alpha.pipeline.nodeRegression.configureAutoTuning
gds.alpha.pipeline.nodeRegression.configureSplit
gds.alpha.pipeline.nodeRegression.addLinearRegression
gds.alpha.pipeline.nodeRegression.addRandomForest
gds.alpha.pipeline.nodeRegression.addNodeProperty
gds.alpha.pipeline.nodeRegression.selectFeatures
gds.alpha.pipeline.nodeRegression.train
gds.alpha.pipeline.nodeRegression.predict.stream
gds.alpha.pipeline.nodeRegression.predict.mutate
- Autotuning Support for Machine Learning Pipelines:
  - Added new procedures gds.alpha.pipeline.[nodeClassification|nodeRegression|linkPrediction].configureAutoTuning.
  - Added syntax to specify ranges for parameters in gds.alpha.pipeline.[linkPrediction|nodeClassification|nodeRegression].addRandomForest, gds.beta.pipeline.[linkPrediction|nodeClassification].addLogisticRegression, and gds.alpha.pipeline.nodeRegression.addLinearRegression.
- Additional Machine Learning Pipeline Functionality:
  - Exposed learningRate for the LogisticRegression models, which can be added using gds.beta.pipeline.[nodeClassification|linkPrediction].addLogisticRegression.
  - Exposed minLeafSize for RandomForest models, which can be added using gds.alpha.pipeline.[nodeClassification|linkPrediction].addRandomForest.
  - Exposed criterion for RandomForestClassification models, which can be added using gds.alpha.pipeline.[nodeClassification|linkPrediction].addRandomForest. Also added support for the ENTROPY impurity criterion.
  - Updated structure of the modelSelectionStats yield in gds.beta.pipeline.[linkPrediction|nodeClassification].train.
  - Added support for the OUT_OF_BAG_ERROR metric in gds.beta.pipeline.[linkPrediction|nodeClassification].train, which applies only to RandomForest models.
  - Exposed batchesPerIteration in gds.beta.graphSage.train to configure the number of batches considered per iteration.
- Cypher Aggregation now accepts any INTEGER value for source and target nodes
- Added ShardedIdMap which adds support for external node ids ranging from 0 to Long.MAX_VALUE.
  - The id map is disabled by default and can be enabled via feature toggle USE_SHARDED_ID_MAP.
- Added procedures for exporting graph properties to the alpha tier
gds.alpha.graph.streamGraphProperty
gds.alpha.graph.removeGraphProperty
- Exposed a new string config parameter jobId for graph projection and algorithm procedures, which allows for easier tracking of a job via e.g. gds.beta.listProgress.
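As a rough illustration of the jobId parameter from the last item above, the following Python sketch builds the two Cypher calls involved: an algorithm run tagged with a job id, and a gds.beta.listProgress lookup for that id. The graph name, the job id, the choice of gds.pageRank.stream as the algorithm, and the helper function names are assumptions made for this example. The session argument is assumed to be any object with a neo4j-driver-style run(query, parameters) method.

```python
"""Sketch: tag a GDS algorithm run with a jobId and look it up again."""

# Hypothetical names, chosen for illustration only.
GRAPH_NAME = "myGraph"       # assumed: an already projected in-memory graph
JOB_ID = "pagerank-nightly"  # assumed: any string chosen by the caller

# The algorithm call passes jobId inside its configuration map.
RUN_QUERY = (
    "CALL gds.pageRank.stream($graphName, {jobId: $jobId}) "
    "YIELD nodeId, score "
    "RETURN nodeId, score"
)

# gds.beta.listProgress accepts a job id and reports on that job only.
PROGRESS_QUERY = (
    "CALL gds.beta.listProgress($jobId) "
    "YIELD jobId, taskName, progress "
    "RETURN jobId, taskName, progress"
)


def run_tagged(session, graph_name=GRAPH_NAME, job_id=JOB_ID):
    """Run the tagged algorithm on a session-like object exposing run()."""
    return session.run(RUN_QUERY, {"graphName": graph_name, "jobId": job_id})


def check_progress(session, job_id=JOB_ID):
    """Ask GDS for the progress entry of the tagged job."""
    return session.run(PROGRESS_QUERY, {"jobId": job_id})
```

Progress is most useful while the job is still running, so in practice check_progress would typically be called from a second session.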
Bug fixes
- Fixed a bug in gds.beta.pipeline.[nodeClassification|linkPrediction].addNodeProperty where gds.beta.graphSage.mutate could not be added.
- Fixed a bug where the procedures gds.beta.pipeline.linkPrediction.predict.[mutate|stream] threw an error when given the argument initialSampler.
- Fixed a bug with running Triangle Count on filtered graphs that could cause an ArrayIndexOutOfBounds Error.
- Fixed a bug where graphSage.train incorrectly reported didConverge as false.
- Fixed a bug in CollapsePath where a provided nodeFilter would be ignored.
- Fixed a bug in gds.louvain.stream when the consecutiveIds parameter was enabled.
- Fixed a bug in RandomWalk where not consuming all stream results could lead to a state where GDS would become unable to run further procedures.
Improvements
- When the memory guard fails a query, the information is logged as well as sent to the user in the raised exception.
- Machine learning pipelines:
  - gds.beta.pipeline.[nodeClassification|linkPrediction].train.estimate now takes the memory usage of random forest training into account when applicable.
  - gds.beta.pipeline.[nodeClassification|linkPrediction].predict.[mutate|stream|write].estimate now takes random forest prediction memory overhead into account.
  - Improve early validation of graph and prediction pipeline in gds.beta.pipeline.[nodeClassification|linkPrediction].predict.
  - Improve memory estimation for gds.beta.pipeline.[nodeClassification|linkPrediction].train.estimate.
  - Improve memory estimation in gds.beta.pipeline.linkPrediction.train.estimate.
  - Add training method specific debug level logging during the model selection phase of gds.beta.pipeline.linkPrediction.train, gds.beta.pipeline.nodeClassification.train and gds.alpha.pipeline.nodeRegression.train.
  - Improved logging in Link Prediction and Node Classification training.
  - Reduced computational complexity and constant overhead of random forest training, added via gds.alpha.pipeline.[linkPrediction|nodeClassification].addRandomForest. It now runs up to 80% faster.
  - Improve runtime of gds.beta.pipeline.[nodeClassification|linkPrediction].train if the model candidate is of type LogisticRegression. Training may be up to 40% faster.
- GraphSAGE:
  - Improved progress logging for GraphSage.
  - Improve the modelInfo of models created by gds.beta.graphSage.train to include the loss per iteration and ranIterations per epoch.
  - Use the average loss per node in gds.beta.graphSage.train. This removes the implicit dependency between the tolerance and batchSize parameters.
  - We now validate for embedding generation using gds.beta.graphSage.[stream|write|mutate] to ensure that either both the input graph and the model graph include relationshipWeightProperty, or neither does. Before, if the model was trained on an unweighted graph, the relationship weight on the input graph was silently ignored (or vice versa).
  - Change the gradient computation in gds.beta.graphSage.train. Instead of averaging the gradient over all batches we use the batchSamplingRatio for setting the number of batches to consider. By default, this significantly improves the runtime by up to 90%.
- Expose training details by returning and logging lossPerIteration in gds.beta.node2vec.
- Graph Projections:
  - Add support for query parameters for gds.beta.graph.project.subgraph by passing a parameters Cypher map as part of the procedure configuration.
  - Improved error message for gds.beta.graph.project.subgraph when comparing expressions with incompatible types and one of them is a literal expression.
  - Improved memory usage while projecting a graph that has multiple relationship properties for the same relationship type.
  - It is now possible to specify relationshipTypes: [], in order to project a graph with no relationships.
- Graph Export:
  - Changed gds.graph.export to export internal node identifiers instead of original ids. This avoids fragmentation of the newly created store.
  - Add progress tracking for gds.graph.export.
- Add concurrency configuration parameter to gds.alpha.backup and gds.alpha.restore.
- Added query support for mutated properties for Cypher on GDS.
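The query-parameter support for gds.beta.graph.project.subgraph listed under Graph Projections above can be sketched as follows. This is a minimal example under stated assumptions: the graph names, the age node property, the minAge parameter name, and the helper function are hypothetical, and the session argument is assumed to be any object with a neo4j-driver-style run(query, parameters) method. The point it illustrates is that placeholders inside the filter expression are resolved from the parameters map in the procedure configuration.

```python
"""Sketch: project a filtered subgraph using a parameterised node filter."""

# Hypothetical names, chosen for illustration only.
FROM_GRAPH = "people"             # assumed: an existing in-memory graph
NEW_GRAPH = "adults"              # assumed: name for the filtered graph
NODE_FILTER = "n.age > $minAge"   # $minAge is filled from `parameters`
REL_FILTER = "*"                  # keep all relationships

# The filter strings travel as ordinary string arguments; the values for
# their placeholders are passed in the `parameters` entry of the config map.
SUBGRAPH_QUERY = (
    "CALL gds.beta.graph.project.subgraph("
    "$newGraph, $fromGraph, $nodeFilter, $relFilter, "
    "{parameters: $filterParams}) "
    "YIELD graphName, nodeCount, relationshipCount "
    "RETURN graphName, nodeCount, relationshipCount"
)


def project_subgraph(session, min_age):
    """Run the projection on a session-like object exposing run()."""
    params = {
        "newGraph": NEW_GRAPH,
        "fromGraph": FROM_GRAPH,
        "nodeFilter": NODE_FILTER,
        "relFilter": REL_FILTER,
        "filterParams": {"minAge": min_age},
    }
    return session.run(SUBGRAPH_QUERY, params)
```

Keeping the threshold in the parameters map rather than concatenating it into the filter string lets the same filter expression be reused with different values.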