Release Date: 10 October 2022
GDS 2.2.0 is compatible with Neo4j 4.3 versions ≥ 4.3.15 and Neo4j 4.4 versions ≥ 4.4.9.
For GDS compatibility with earlier patch releases of 4.3 and 4.4, please see GDS 2.1.6. The 2.2 series is also incompatible with Neo4j 3.5.x, 4.0, 4.1, and 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.8.
Breaking changes
- Link Prediction filtering:
  - Change graph filtering in gds.beta.pipeline.linkPrediction.train
    - Replace parameter nodeLabels with sourceNodeLabel and targetNodeLabel.
    - Replace parameter relationshipTypes with targetRelationshipType.
  - Change graph filtering in gds.beta.pipeline.linkPrediction.predict
    - Replace parameter nodeLabels with optional sourceNodeLabels and targetNodeLabels. By default, they are derived from the model's train configuration.
    - Change the default value of relationshipTypes to the targetRelationshipType from the model's train configuration.
- Node Classification & Regression filtering:
  - Change graph filtering in gds.beta.pipeline.nodeClassification.train and gds.alpha.pipeline.nodeRegression.train
    - Replace parameter nodeLabels with targetNodeLabels.
  - Change graph filtering in gds.beta.pipeline.nodeClassification.predict and gds.alpha.pipeline.nodeRegression.predict
    - Replace parameter nodeLabels with targetNodeLabels. By default, they are derived from the model's train configuration.
- Promoting Collapse Path to beta tier
  - Changed the procedure name to gds.beta.collapsePath.mutate.
  - The parameter pathTemplates now specifies multiple path templates.
- Promoting CELF to beta tier
  - Moved gds.alpha.influenceMaximization.celf.stream to gds.beta.influenceMaximization.celf.stream.
- For graphs created with gds.graph.project.cypher, reduce the output of gds.graph.list to print only the names of the entries in parameters. This avoids printing the parameter values, which could lead to long procedure execution times.
- RandomWalk algorithm promoted to product tier
  - gds.beta.randomWalk.stats => gds.randomWalk.stats
  - gds.beta.randomWalk.stats.estimate => gds.randomWalk.stats.estimate
  - gds.beta.randomWalk.stream => gds.randomWalk.stream
  - gds.beta.randomWalk.stream.estimate => gds.randomWalk.stream.estimate
- Removed the debug_log config field from the Arrow Create Database action.
- Node2Vec uses the new embedding initializer NORMALIZED by default.
- Dropped support for older patch releases:
  - for 4.3, only 4.3.15 and later are supported
  - for 4.4, only 4.4.9 and later are supported
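Because the procedure renames above are one-to-one, existing Cypher query strings can be migrated with a plain text substitution. The Python sketch below is a hypothetical helper (migrate_query and RENAMED_PROCEDURES are not part of GDS). It covers only the RandomWalk and CELF renames, whose old and new names are both listed above; parameter changes such as nodeLabels still need manual review.

```python
import re

# Hypothetical helper: old -> new procedure names, taken from the
# "Breaking changes" list of GDS 2.2.0 (RandomWalk and CELF renames only).
RENAMED_PROCEDURES = {
    "gds.beta.randomWalk.stats": "gds.randomWalk.stats",
    "gds.beta.randomWalk.stats.estimate": "gds.randomWalk.stats.estimate",
    "gds.beta.randomWalk.stream": "gds.randomWalk.stream",
    "gds.beta.randomWalk.stream.estimate": "gds.randomWalk.stream.estimate",
    "gds.alpha.influenceMaximization.celf.stream": "gds.beta.influenceMaximization.celf.stream",
}

# Longest names first, plus a look-ahead that forbids a trailing "." or word
# character, so "...stats" never matches the prefix of "...stats.estimate".
_PATTERN = re.compile(
    "(?:"
    + "|".join(re.escape(n) for n in sorted(RENAMED_PROCEDURES, key=len, reverse=True))
    + r")(?![\w.])"
)


def migrate_query(query: str) -> str:
    """Return the query with every renamed procedure replaced by its new name."""
    return _PATTERN.sub(lambda m: RENAMED_PROCEDURES[m.group(0)], query)


if __name__ == "__main__":
    print(migrate_query("CALL gds.beta.randomWalk.stream('myGraph', {}) YIELD nodeIds"))
    # prints: CALL gds.randomWalk.stream('myGraph', {}) YIELD nodeIds
```

Queries that already use the new names pass through unchanged, so the helper can be run more than once on the same code base.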
New features
- Link Prediction filtering:
  - Supports heterogeneous Link Prediction pipelines by allowing users to configure which node labels and relationship type to train and predict for.
  - See Breaking changes above for more details.
- K-means:
  - Added centroids and average node-centroid distance to the results of the Mutate, Stats, and Write modes.
  - Added the distance to the centroid for each node to the Stream mode result.
  - Introduced a parameter numberOfRestarts that runs K-means multiple times and picks the run with the minimum node-centroid distance.
  - Introduced a parameter computeSilhouette that, if enabled, computes silhouette-related metrics.
  - Introduced a parameter initialSampler which selects the sampling strategy for picking the initial centroids.
    - Added the K-means++ initialization algorithm, which can be enabled by setting initialSampler=kmeans++.
  - Introduced the parameter seedCentroids, which seeds K-means with user-provided initial centroids instead of sampling them as described above.
- Introduced a new scaler Center for ScaleProperties that subtracts the mean from each value.
- Expose penaltyL2 to configure the L2 regularization term of the loss function in gds.beta.graphSage.train.
- Add Multilayer Perceptron as a training method for node classification (gds.alpha.pipeline.nodeClassification.addMLP) and link prediction (gds.alpha.pipeline.linkPrediction.addMLP).
- Add the SAME_CATEGORY feature type to gds.beta.pipeline.linkPrediction.addFeature.
- Added new procedure gds.beta.graph.relationships.stream that streams relationship topology.
- Added Arrow export endpoint gds.beta.graph.relationships.stream that streams relationship topology.
- Added new procedure gds.alpha.graph.sample.rwr that creates a new graph projection by sampling with random walks with restarts.
- Added the ability to collapse multiple paths using gds.beta.collapsePath.mutate.
- Promoting CELF algorithm to beta tier.
  - Added gds.beta.influenceMaximization.celf.stats
  - Added gds.beta.influenceMaximization.celf.mutate
  - Added gds.beta.influenceMaximization.celf.write
  - Added progress tracking capabilities.
  - Added memory estimation.
- Progress tracking for the KMeans algorithm.
- Memory estimation for KMeans.
  - Added gds.alpha.kmeans.mutate.estimate
  - Added gds.alpha.kmeans.stats.estimate
  - Added gds.alpha.kmeans.stream.estimate
  - Added gds.alpha.kmeans.write.estimate
- Added procedures to compute modularity for pre-computed communities.
  - gds.alpha.modularity.stats
  - gds.alpha.modularity.stream
- Added new config options to the GDS Flight server.
  - gds.arrow.encryption.never deactivates server encryption even if it would otherwise be enabled.
  - gds.arrow.advertised_listen_address sets the server location that clients should connect to.
- Added support for importing String node identifiers for the Arrow CREATE_DATABASE action.
- Added the capability to run BetweennessCentrality using relationship weights.
  - Added the optional configuration parameter relationshipWeightProperty.
- Added stats mode procedures for RandomWalk.
  - gds.beta.randomWalk.stats
  - gds.beta.randomWalk.stats.estimate
- Introduced the ability to configure defaults and limits for configuration parameters.
  - gds.alpha.config.defaults.list
  - gds.alpha.config.defaults.set
  - gds.alpha.config.limits.list
  - gds.alpha.config.limits.set
- Introduce new configuration parameters contextNodeLabels and contextRelationshipTypes in nodePropertySteps.
  - gds.beta.pipeline.linkPrediction.addNodeProperty
  - gds.beta.pipeline.nodeClassification.addNodeProperty
  - gds.alpha.pipeline.nodeRegression.addNodeProperty
  - The context is used to enlarge the input graph of the node property steps when running gds.beta.pipeline.linkPrediction.[train|predict], gds.beta.pipeline.nodeClassification.[train|predict] and gds.alpha.pipeline.nodeRegression.[train|predict].
- Leiden:
  - Add capability to mutate intermediateCommunities when includeIntermediateCommunities is set to true.
  - Add capability to write intermediateCommunities when includeIntermediateCommunities is set to true.
- Node2Vec adds the new embedding initializer NORMALIZED, configured with the parameter embeddingInitializer.
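The release notes do not define which silhouette metrics computeSilhouette reports. For reference, the standard silhouette coefficient of a point compares its mean distance a to the other members of its own cluster with the smallest mean distance b to the members of any other cluster: s = (b - a) / max(a, b), so values near 1 indicate a well-separated assignment. A minimal Python sketch, assuming Euclidean distance and the common convention s = 0 for singleton clusters; it is not the GDS implementation.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_scores(points: List[Sequence[float]], labels: List[int]) -> List[float]:
    """Silhouette coefficient per point; needs at least two distinct clusters."""
    clusters: Dict[int, List[int]] = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(p, points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != labels[i]
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return scores


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
    s = silhouette_scores(pts, [0, 0, 1, 1])
    print(sum(s) / len(s))  # average silhouette over all points
```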
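K-means++ is a well-known seeding rule: the first centroid is drawn uniformly at random, and each further centroid is drawn with probability proportional to its squared distance to the nearest centroid already chosen, which tends to spread the initial centroids apart. The Python sketch below illustrates that rule on plain coordinate tuples; the function names are hypothetical and this is not the GDS implementation behind initialSampler=kmeans++.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_plus_plus_init(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Pick k initial centroids with the k-means++ seeding rule."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # First centroid: uniform random choice.
    centroids = [points[rng.randrange(len(points))]]
    # nearest[i]: squared distance from points[i] to its closest chosen centroid.
    nearest = [squared_distance(p, centroids[0]) for p in points]
    while len(centroids) < k:
        if sum(nearest) == 0:
            # Every point coincides with a centroid; fall back to a uniform choice.
            idx = rng.randrange(len(points))
        else:
            # Draw the next centroid with probability proportional to nearest[i].
            idx = rng.choices(range(len(points)), weights=nearest, k=1)[0]
        centroids.append(points[idx])
        nearest = [min(d, squared_distance(p, points[idx])) for d, p in zip(nearest, points)]
    return centroids
```

Points that coincide with an existing centroid get weight 0, so they are never picked again while any other point remains.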
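The Center scaler is described as subtracting the mean from each value. For a single scalar property this amounts to the short Python sketch below; the function name is hypothetical, and how GDS treats array-valued properties is not covered here.

```python
from typing import List


def center(values: List[float]) -> List[float]:
    """Shift values so that their arithmetic mean becomes 0."""
    if not values:
        return []
    mean = sum(values) / len(values)
    return [v - mean for v in values]


print(center([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```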
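The notes do not describe the sampling parameters of gds.alpha.graph.sample.rwr. The general idea of random walk with restarts sampling is to walk from a start node, jump back to the start with some restart probability at every step, collect the visited nodes until the desired fraction of the graph is reached, and keep the subgraph induced by those nodes. A Python sketch of that idea; the function and parameter names (rwr_sample, sampling_ratio, restart_probability, max_steps) are hypothetical, and the GDS procedure may differ, for example in how it chooses start nodes.

```python
import random
from typing import Dict, List, Set, Tuple

Graph = Dict[int, List[int]]  # adjacency lists; every node is a key


def rwr_sample(
    graph: Graph,
    start: int,
    sampling_ratio: float,
    restart_probability: float,
    rng: random.Random,
    max_steps: int = 100_000,
) -> Tuple[Set[int], List[Tuple[int, int]]]:
    """Sample nodes with a random walk with restarts and return the induced subgraph."""
    target = max(1, round(sampling_ratio * len(graph)))
    visited: Set[int] = {start}
    current = start
    # max_steps bounds the walk if fewer than `target` nodes are reachable.
    for _ in range(max_steps):
        if len(visited) >= target:
            break
        neighbors = graph[current]
        if not neighbors or rng.random() < restart_probability:
            current = start  # restart: jump back to the start node
        else:
            current = rng.choice(neighbors)
            visited.add(current)
    # Induced subgraph: keep every relationship whose endpoints were both sampled.
    edges = [(u, v) for u in visited for v in graph[u] if v in visited]
    return visited, edges
```

A higher restart probability keeps the sample closer to the start node; a lower one lets the walk explore further before returning.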
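CELF (Cost-Effective Lazy Forward) speeds up greedy influence maximization by exploiting submodularity: a node's marginal gain can only shrink as the seed set grows, so gains stored in a priority queue from earlier rounds are upper bounds, and only the top entry needs to be re-evaluated. The Python sketch below assumes the Independent Cascade model with Monte Carlo spread estimation; the names celf, simulate_spread, p and runs are illustrative, and this is not the GDS implementation.

```python
import heapq
import random
from typing import Dict, List, Sequence, Tuple

Graph = Dict[int, List[int]]  # directed adjacency lists; every node is a key


def simulate_spread(graph: Graph, seeds: Sequence[int], p: float,
                    rng: random.Random, runs: int) -> float:
    """Average number of nodes activated by Independent Cascade from `seeds`."""
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            newly_active = []
            for u in frontier:
                for v in graph[u]:
                    # Each newly active node gets one chance to activate each neighbor.
                    if v not in active and rng.random() < p:
                        active.add(v)
                        newly_active.append(v)
            frontier = newly_active
        total += len(active)
    return total / runs


def celf(graph: Graph, k: int, p: float = 0.1, runs: int = 100,
         seed: int = 42) -> Tuple[List[int], float]:
    """Select up to k seed nodes with the CELF lazy-greedy strategy."""
    rng = random.Random(seed)
    # Heap entries: (-marginal gain, node, seed-set size when the gain was computed).
    heap = [(-simulate_spread(graph, [v], p, rng, runs), v, 0) for v in sorted(graph)]
    heapq.heapify(heap)
    seeds: List[int] = []
    spread = 0.0
    while len(seeds) < k and heap:
        neg_gain, v, computed_for = heapq.heappop(heap)
        if computed_for == len(seeds):
            # Gain is fresh and at least every other (possibly stale) upper bound.
            seeds.append(v)
            spread += -neg_gain
        else:
            # Stale upper bound: recompute against the current seed set and requeue.
            gain = simulate_spread(graph, seeds + [v], p, rng, runs) - spread
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds, spread
```

With p = 1.0 every activation attempt succeeds, so the spread of a seed set is simply the number of nodes reachable from it; this makes the lazy-greedy selection easy to check by hand on small graphs.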
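The new modularity procedures score an existing community assignment rather than computing one. For an undirected graph with total edge weight m, the standard definition is Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ], where L_c is the weight of the edges inside community c and d_c is the sum of the weighted degrees of its members. A Python sketch, assuming an undirected edge list in which each edge appears once and there are no self-loops; it does not reproduce the signature or output columns of gds.alpha.modularity.stats or gds.alpha.modularity.stream.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable, float]],
               community: Dict[Hashable, Hashable]) -> float:
    """Modularity of a fixed community assignment on an undirected, weighted graph.

    Each undirected edge (u, v, weight) must be listed exactly once.
    """
    internal: Dict[Hashable, float] = defaultdict(float)  # edge weight inside each community
    degree: Dict[Hashable, float] = defaultdict(float)    # summed weighted degree per community
    m = 0.0                                               # total edge weight
    for u, v, w in edges:
        m += w
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edge weight")
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles joined by a single edge, with each triangle as its own community, this gives Q = 2 * (3/7 - (7/14)^2) = 5/14; putting all nodes into one community gives Q = 0.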
Bug fixes
- Fixed a bug where eager checking for business rules around GDS on a Neo4j cluster would cause the cluster to fail to start.
- Fixed a bug where Neo4j users with the admin role could not see all graphs in the catalog on GDS Enterprise.
- Fixed a bug in random graph generation where the resulting graph could end up with an incorrect relationship schema.
- Fixed a bug where a schema filter would not create a deep copy of the property schema map.
- Fixed a bug where modularity could have been incorrectly updated in ModularityOptimization. This may affect the number of performed iterations for ModularityOptimization or number of levels for Louvain.
- Fixed a bug where restoring from CSV could not read values wrapped in quotes.
- Fixed a bug where KNN did not use the expected search space. This will improve the result but also increase the runtime.
- Fixed a bug in ML autotuning where maxTrials included model evaluations with concrete configs.
- Fixed a bug where gds.triangleCount and gds.localClusteringCoefficient were allowed to run on directed graphs.
- Fixed a bug in gds.graph.export and Arrow DB import where writeConcurrency was not respected.
- Fixed a bug with Node Operations where gds.graph.nodeProperties.write, gds.graph.nodeProperties.drop, gds.graph.nodeProperties.stream and gds.graph.nodeProperty.stream would not accept String input for the parameters nodeLabels and/or nodeProperties.
- Fixed a bug where Node2Vec would report negative losses.
- Fixed a bug with gds.graph.nodeProperties.stream and gds.graph.nodeProperty.stream where the wrong nodes were returned when specifying a nodeLabels filter and using Arrow.
- Fixed a bug in the Louvain algorithm where aggregating dense communities could lead to an exception.
- Fixed a bug where model loading was attempted even for unlicensed users, which could cause database startup to fail.
Improvements
- Better error handling in K-means
- Improve memory estimation for gds.beta.pipeline.linkPrediction.train when the nodePropertySteps use a weighted graph.
- Improve runtime of feature generation in gds.beta.pipeline.linkPrediction.[train|predict].
- Improve performance of gds.graph.project.cypher by using the subscriber API.
- Improve convergence criteria for the LogisticRegression and LinearRegression trainers by making them independent of the number of batches. This affects gds.alpha.pipeline.nodeRegression.train and gds.beta.pipeline.[linkPrediction|nodeClassification].train.
- Improve error handling on invalid user input.
- Cypher on GDS projections is now capable of setting labels on nodes.
- Promoting CELF algorithm to beta tier.
  - Improved performance.
- A new column serverLocation in gds.debug.arrow() shows the actual location where the server is running, which might differ from the configured location.
- Improve runtime of KNN by reusing similarity computations. This also affects gds.beta.pipeline.linkPrediction.predict when using the approximate search strategy.
- Configuration keys for Node-/Relationship- and Property-Projections are now case-insensitive.
- The gds.debug.sysInfo procedure now shows the license expiration date when run with a valid GDS license.
- Role-based access control is now available to everyone, not only to licensed commercial users.
- The Arrow create database endpoint can now create properties with a wider range of property types: string, string[], datetime, and local datetime.
- Improve the error message thrown when calling gds.beta.pipeline.[nodeClassification|linkPrediction].[train|predict] on graphs that are too small.
- Added a new, optional method close to PregelComputation, allowing implementers to close any opened resources, such as ThreadLocals.
- Added new feature toggle procedures for enabling or disabling Arrow database import (default: enabled):
  - gds.features.enableArrowDatabaseImport
  - gds.features.enableArrowDatabaseImport.reset
- Runtime improvements for RandomWalk, especially for first-order random walks.
Other changes
- Renamed and deprecated some graph management procedures:
  - Renamed gds.alpha.graph.removeGraphProperty to gds.alpha.graph.graphProperty.drop.
  - Renamed gds.alpha.graph.streamGraphProperty to gds.alpha.graph.graphProperty.stream.
  - Deprecated gds.graph.removeNodeProperties in favor of the new procedure gds.graph.nodeProperties.drop.
  - Deprecated gds.graph.streamNodeProperties in favor of the new procedure gds.graph.nodeProperties.stream.
  - Deprecated gds.graph.streamNodeProperty in favor of the new procedure gds.graph.nodeProperty.stream.
  - Deprecated gds.graph.streamRelationshipProperties in favor of the new procedure gds.graph.relationshipProperties.stream.
  - Deprecated gds.graph.streamRelationshipProperty in favor of the new procedure gds.graph.relationshipProperty.stream.
  - Deprecated gds.graph.writeNodeProperties in favor of the new procedure gds.graph.nodeProperties.write.
  - Deprecated gds.graph.writeRelationship in favor of the new procedure gds.graph.relationship.write.
  - Deprecated gds.graph.deleteRelationships in favor of the new procedure gds.graph.relationships.drop.
- CSV format changes for backup/restore:
  - Export no longer writes the databaseId field.
  - Import can still read the databaseId field but does not use it.
- Deprecated the enableDebugLog config option for gds.graph.export.
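Deprecated procedures keep working in this release, but it can be useful to locate them in existing queries before migrating. The Python sketch below is a hypothetical helper (find_deprecated and DEPRECATED are not part of GDS) that scans a Cypher query string for the deprecated names listed above and reports their replacements.

```python
import re
from typing import List, Tuple

# Hypothetical table: deprecated procedure -> replacement, from "Other changes".
DEPRECATED = {
    "gds.graph.removeNodeProperties": "gds.graph.nodeProperties.drop",
    "gds.graph.streamNodeProperties": "gds.graph.nodeProperties.stream",
    "gds.graph.streamNodeProperty": "gds.graph.nodeProperty.stream",
    "gds.graph.streamRelationshipProperties": "gds.graph.relationshipProperties.stream",
    "gds.graph.streamRelationshipProperty": "gds.graph.relationshipProperty.stream",
    "gds.graph.writeNodeProperties": "gds.graph.nodeProperties.write",
    "gds.graph.writeRelationship": "gds.graph.relationship.write",
    "gds.graph.deleteRelationships": "gds.graph.relationships.drop",
}


def find_deprecated(query: str) -> List[Tuple[str, str]]:
    """Return (deprecated name, replacement) pairs for each deprecated call in the query."""
    found = []
    # Match whole dotted names such as gds.graph.streamNodeProperties, so a
    # shorter name is never reported for a longer one.
    for name in re.findall(r"gds(?:\.\w+)+", query):
        if name in DEPRECATED:
            found.append((name, DEPRECATED[name]))
    return found
```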