Release Date: 11 May 2020
GDS 1.4.0 is compatible with Neo4j 4.0 and 4.1, but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.6.
Breaking changes
- License key configuration was renamed from licenseFile to license_file for consistency with Bloom.
- Removed sparsity parameter from gds.alpha.randomProjection.*
- Renamed gds.alpha.randomProjection to gds.fastRP due to productization.
- Renamed embeddingSize parameter to embeddingDimension for fastRP, GraphSAGE and Node2Vec.
- Renamed projectedFeatureSize to projectedFeatureDimension for GraphSAGE.
- Renamed nodePropertyNames to featureProperties in gds.beta.fastRPExtended and gds.beta.graphSage.train.
- Defaults for gds.fastRP have changed for the following configuration parameters:
  - iterationWeights now has default [0.0, 1.0, 1.0]
  - normalizeL2 has been removed and its effect is always applied
- Removed alpha procedures for GraphSage (replaced with beta tier, see New Features section):
  - gds.alpha.graphSage.stream
  - gds.alpha.graphSage.write
- GraphSage no longer directly calculates embeddings. Instead, it has been split into train (to generate a named model) and write, mutate, and stream (to apply the model predictions to your data).
- Due to the creation of a train mode for GraphSage, the following configuration parameters were moved to gds.beta.graphSage.train:
  - embeddingSize
  - aggregator
  - activationFunction
  - sampleSizes
  - nodePropertyNames
  - tolerance
  - learningRate
  - epochs
  - maxIterations
  - searchDepth
  - negativeSampleWeight
  - degreeAsProperty
- gds.beta.graphSage.stream procedure now requires modelName configuration parameter.
- gds.beta.graphSage.write procedure now requires modelName configuration parameter.
- Removed startLoss and epochLosses from the result columns of gds.beta.graphSage.write.
- Added the graph create config as a return field to the train procedure, affecting gds.beta.graphSage.train.
- Fixed result column name embeddings to embedding in GraphSAGE, to align with the other embeddings.
- Removed configuration parameter maxCost from gds.alpha.bfs/dfs.
- Unlocking the Enterprise Edition of the Graph Data Science library requires a license key. The previous config setting has been removed.
- Removed degreeDistribution from gds.graph.drop return columns.
- gds.pageRank now respects the concurrency setting. It will not run if there is insufficient memory for the given concurrency setting.
- Alpha similarity algorithms no longer accept graph name as a parameter. The algorithm never used the named graph, and now the possibility to specify one is removed.
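The configuration parameter renames listed above can be applied mechanically to existing configuration maps. The following Python sketch shows one way to do that, assuming the configuration is held as a plain dictionary before it is passed to a procedure call. The names migrate_config and RENAMED_KEYS are hypothetical and not part of GDS; the mapping covers only the three renames named in this section.

```python
# Hypothetical migration helper; these names are illustrative, not part of GDS.
RENAMED_KEYS = {
    "embeddingSize": "embeddingDimension",
    "projectedFeatureSize": "projectedFeatureDimension",
    "nodePropertyNames": "featureProperties",
}


def migrate_config(config):
    """Return a copy of a pre-1.4.0 config dict with renamed keys updated."""
    migrated = {}
    for key, value in config.items():
        new_key = RENAMED_KEYS.get(key, key)
        # Refuse ambiguous input where both the old and the new name are set.
        if new_key in migrated:
            raise ValueError(f"both old and new name given for {new_key!r}")
        migrated[new_key] = value
    return migrated


old_config = {"embeddingSize": 64, "nodePropertyNames": ["age"], "epochs": 5}
new_config = migrate_config(old_config)
print(new_config)
```

Keys that were not renamed pass through unchanged, so the helper can be run over any configuration map. Removed parameters such as sparsity, normalizeL2 and maxCost are deliberately not handled here and would still need to be deleted by hand.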
New features
- Promoted GraphSage to the beta tier and added support for inductive models with the train mode.
  - This adds procedures:
    - gds.beta.graphSage.mutate
    - gds.beta.graphSage.mutate.estimate
    - gds.beta.graphSage.stream
    - gds.beta.graphSage.stream.estimate
    - gds.beta.graphSage.train
    - gds.beta.graphSage.train.estimate
    - gds.beta.graphSage.write
    - gds.beta.graphSage.write.estimate
  - And removes alpha procedures:
    - gds.alpha.graphSage.stream
    - gds.alpha.graphSage.write
- GraphSage supports relationship weights, driven by relationshipWeightProperty
- GraphSage supports node labels via projectedFeatureSize
- Introduced the model catalog to manage trained models, including:
  - gds.beta.model.exists – a procedure to check if a model exists in the catalog
  - gds.beta.model.list – lists all available models
  - gds.beta.model.drop – removes a model from the catalog
- The Random Projection algorithm has been promoted to the product tier and we have added:
  - gds.fastRP.stats
  - gds.fastRP.mutate
  - gds.fastRP.estimate
  - Added procedures for stats and mutate modes, as well as estimates for all modes.
- FastRP has been extended to support relationship weights and directions
- FastRP supports integer configuration for iteration weights.
- We’ve added support for node property features for FastRP in the beta namespace with FastRPExtended:
  - gds.beta.fastRPExtended.mutate
  - gds.beta.fastRPExtended.stream
  - gds.beta.fastRPExtended.stats
  - gds.beta.fastRPExtended.write
  - gds.beta.fastRPExtended.mutate.estimate
  - gds.beta.fastRPExtended.stream.estimate
  - gds.beta.fastRPExtended.stats.estimate
  - gds.beta.fastRPExtended.write.estimate
- We’ve added the K-Nearest Neighbors (KNN) algorithm to the beta tier:
  - gds.beta.knn.mutate and gds.beta.knn.mutate.estimate
  - gds.beta.knn.stats and gds.beta.knn.stats.estimate
  - gds.beta.knn.stream and gds.beta.knn.stream.estimate
  - gds.beta.knn.write and gds.beta.knn.write.estimate
- The in-memory graph now supports list properties. This allows embedding results to be stored in memory, and embeddings to be loaded from nodes for KNN or similarity calculations.
- Pregel framework
  - Added Pregel annotation processor to generate GDS procedures for custom Pregel algorithms.
  - Pregel now supports long and double array node values.
  - Added support for composite node state to allow complex data types on nodes.
  - Reduced memory consumption.
  - Improved memory estimation.
  - Simplified message iteration in compute methods.
  - Split context into Init- and ComputeContext and simplified API.
  - Removed K1ColoringExample standalone project.
  - Added pregel-bootstrap standalone project.
  - Added pregel-examples module.
- Licensing: GDS Enterprise edition now requires license keys issued by Neo4j to unlock enterprise features
- Added density property to the output of graph in graph.list.
- Added a failIfMissing flag to gds.graph.drop
Bug fixes
- Pregel:
  - Fixed a bug in Pregel that could lead to incorrect results when running in parallel.
  - Fixed a cast exception when returning array node properties in generated Pregel procedures.
- Fixed a bug in a multi-source BFS traversal strategy that could affect the following procedures:
  - gds.alpha.closeness
  - gds.alpha.closeness.harmonic
  - gds.alpha.allShortestPaths
- Fixed a bug in gds.alpha.shortestPath.deltaStepping where large relationship weights led to incorrect results
- Weakly connected components:
  - Fixed a bug in WCC where componentCount would be negative when the graph is empty.
  - Fixed a regression where WCC could run more slowly with increased concurrency.
- Fixed bugs in Louvain:
  - communityCount is no longer negative when the graph is empty.
  - Changes to maxIterations are no longer ignored.
- Fixed a bug in LabelPropagation where communityCount would be negative when the graph is empty.
- Fixed a bug in KNN where it failed when run on graphs with filtered values
- Fixed bugs in gds.graph.export:
  - Previously, at most one relationship property per relationship type would be exported (now all are exported).
  - Default array node properties (null) no longer lead to an exception.
- Graph loading:
  - Fixed a bug where using node label projections including properties on large graphs and high concurrency could lead to loss of some properties.
  - Fixed a bug in graph creation which could cause an AIOOB exception during node loading.
  - The readConcurrency config parameter can no longer be overwritten by the concurrency param when it is explicitly set in an implicit graph creation config
- Fixed a bug in memory estimation of large anonymous fictitious graphs.
- Fixed a bug in gds.alpha.dfs/bfs, where the algorithm did not terminate for graphs containing loops.
- Fixed result column name embeddings to embedding in GraphSAGE, to align with the other embeddings.
- Fixed a bug in Node2Vec where many disconnected nodes would cause a StackOverflowError
- Fixed a bug in RandomProjection where each iteration weight was multiplied by all previous iteration weights.
- Similarity algorithms:
  - Fixed a bug where Alpha Similarity algorithms would load a graph even though it was not needed
  - Fixed a bug where similarity algorithms would not remove the placeholder graph if config validation fails on invalid user input.
- Fixed a bug where community statistic computation could overflow for large community ids.
- Fixed a bug where DegreeCentrality returned incorrect values when concurrency > 1.
- Fixed a bug where ClosenessCentrality was using a slightly incorrect formula for the Wasserman-Faust algorithm.
- Fixed a bug that affected gds.triangleCount() and gds.alpha.triangles() where not all triangles would be counted under certain conditions.
- Parallel edges in a graph no longer lead to incorrect Local Clustering Coefficient and Triangle Count results.
- The Long.MIN_VALUE fallback property values will now be translated to Double.NaN if a double value is requested.
- Fixed a bug where graphs with multiple labels would sometimes fail when converting property values.
Improvements
- fastRPExtended and graphSage now fail if node properties are Double.NaN
- gds.fastRP now accepts integer iterationWeights
- If graphSage.train is run on a graph without relationships, GDS now fails gracefully with an appropriate error message
- Added validation that properties used by GraphSage exist on graph
- Added validation for embeddingSize >= 1
- Added a failIfExists flag to graph creation to enable a user to specify that if a graph already exists, it should be overwritten without failing.
- Progress logging:
  - We now log progress in equally spaced percentages. This is 0-100% either in steps of 1, or in larger steps if there are fewer than 100 batches. For example, if there are 50 batches, completing one batch means 2% progress, so it would log in steps of 2.
  - Decreased the logging frequency when running with a high concurrency.
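To make the percentage arithmetic concrete, here is a small Python sketch of the logging rule as described above. It is not the GDS implementation: logged_percentages is a hypothetical function, and rounding down to whole percentages when the batch count does not divide 100 evenly is an assumption.

```python
def logged_percentages(batch_count):
    """Percent values that would be logged while batch_count batches complete."""
    if batch_count <= 0:
        raise ValueError("batch_count must be positive")
    logged = []
    last = 0
    for done in range(1, batch_count + 1):
        # Whole-number progress after `done` batches have completed.
        percent = done * 100 // batch_count
        # Log only when the whole-number percentage has advanced.
        if percent > last:
            logged.append(percent)
            last = percent
    return logged


steps = logged_percentages(50)
print(steps[:3], len(steps))
```

With 50 batches, every completed batch advances progress by 2%, so each batch produces one log line. With 100 or more batches, several batches can fall within the same percentage, and only the first one to reach it produces a log line.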
- Added postProcessingMillis to gds.localClusteringCoefficient and gds.triangleCount for modes: mutate, write, stats
  - It is always zero for now, but this is a standard result column for these modes
- Parallelized computation of result statistics for the following community detection procedures:
  - gds.wcc.write, gds.wcc.mutate and gds.wcc.stats
  - gds.louvain.write, gds.louvain.mutate and gds.louvain.stats
  - gds.labelPropagation.write, gds.labelPropagation.mutate and gds.labelPropagation.stats
  - gds.beta.modularityOptimization.write and gds.beta.modularityOptimization.mutate
  - gds.alpha.scc.write
- Added graph schema to the result columns of gds.model.list and gds.model.drop
- Validate property existence (e.g. seedProperty) when running algorithms on Cypher projections.
- Elements in a Pregel composite schema may be set public/private in order to include or exclude them from generated procedure results
- Improved memory estimation for * node projections.
- Introduced parallel graph construction to improve performance of Louvain and Node Similarity
- In-memory graphs in multidatabase:
  - When in-memory graphs are created, they are now associated with the database in use during creation time to prevent errors when running in a multi-database environment.
  - gds.graph.info() returns the database name the graph has been created on.
  - Named graphs can only be used on the database they have been created on.