Release Date: 11 May 2020

GDS 1.4.0 is compatible with Neo4j 4.0 and 4.1, but not with Neo4j 3.5.x. For a 3.5-compatible release, see GDS 1.1.6.

Breaking changes

  • License key configuration was renamed from licenseFile to license_file for consistency with Bloom
  • Removed sparsity parameter from gds.alpha.randomProjection.*
  • Renamed gds.alpha.randomProjection to gds.fastRP due to productization.
  • Renamed embeddingSize parameter to embeddingDimension for fastRP, GraphSAGE and Node2Vec.
  • Renamed projectedFeatureSize to projectedFeatureDimension for GraphSAGE
  • Renamed nodePropertyNames to featureProperties in gds.beta.fastRPExtended and gds.beta.graphSage.train
  • The following gds.fastRP configuration parameters have changed:
    • iterationWeights now has default [0.0, 1.0, 1.0]
    • normalizeL2 has been removed and its effect is always applied
  • Removed alpha procedures for GraphSage (replaced with beta tier, see New Features section)
    • gds.alpha.graphSage.stream
    • gds.alpha.graphSage.write
  • GraphSage no longer calculates embeddings directly. It has been split into a train mode, which generates a named model, and write, mutate, and stream modes, which apply the model's predictions to your data.
  • Due to the creation of a train mode for GraphSage, the following configuration parameters moved to gds.beta.graphSage.train:
    • embeddingSize
    • aggregator
    • activationFunction
    • sampleSizes
    • nodePropertyNames
    • tolerance
    • learningRate
    • epochs
    • maxIterations
    • searchDepth
    • negativeSampleWeight
    • degreeAsProperty
  • The gds.beta.graphSage.stream procedure now requires the modelName configuration parameter.
  • The gds.beta.graphSage.write procedure now requires the modelName configuration parameter.
  • Removed startLoss and epochLosses from the result columns of gds.beta.graphSage.write.
  • Added the graph create config as a return field to the train procedure, affecting gds.beta.graphSage.train
  • Fixed result column name embeddings to embedding in GraphSAGE, to align with the other embeddings.
  • Removed configuration parameter maxCost from gds.alpha.bfs/dfs.
  • Unlocking the Enterprise Edition of the Graph Data Science library requires a license key. The previous config setting has been removed.
  • Removed degreeDistribution from gds.graph.drop return columns.
  • gds.pageRank now respects the concurrency setting. It will not run if there is insufficient memory for the given concurrency setting.
  • Alpha similarity algorithms no longer accept graph name as a parameter. The algorithm never used the named graph, and now the possibility to specify one is removed.
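
The parameter renames listed above can be applied mechanically to configuration maps written for earlier releases. The following is a minimal sketch, assuming the configuration is held as a plain Python dictionary before being sent to the database. The helper name migrate_config and the RENAMED_KEYS table are hypothetical and not part of GDS; only the old and new parameter names come from the list above.

```python
# Hypothetical helper, not part of GDS: renames pre-1.4 configuration keys
# according to the renames listed in these release notes.
RENAMED_KEYS = {
    "embeddingSize": "embeddingDimension",
    "projectedFeatureSize": "projectedFeatureDimension",
    "nodePropertyNames": "featureProperties",
}


def migrate_config(config):
    """Return a copy of `config` with renamed keys; other keys are kept as-is."""
    migrated = {}
    for key, value in config.items():
        new_key = RENAMED_KEYS.get(key, key)
        if new_key in migrated:
            # Both the old and the new name were supplied; refuse to guess.
            raise ValueError(f"conflicting values for {new_key!r}")
        migrated[new_key] = value
    return migrated


old_config = {"embeddingSize": 64, "nodePropertyNames": ["age"], "epochs": 5}
print(migrate_config(old_config))
```

The sketch only renames keys. Parameters that were removed outright, such as sparsity or normalizeL2, still need to be deleted by hand, because which ones apply depends on the procedure being called.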

New features

  • Promoted GraphSage to the beta tier and added support for inductive models with the train mode
    • This adds procedures
      • gds.beta.graphSage.mutate
      • gds.beta.graphSage.mutate.estimate
      • gds.beta.graphSage.stream
      • gds.beta.graphSage.stream.estimate
      • gds.beta.graphSage.train
      • gds.beta.graphSage.train.estimate
      • gds.beta.graphSage.write
      • gds.beta.graphSage.write.estimate
    • And removes alpha procedures
      • gds.alpha.graphSage.stream
      • gds.alpha.graphSage.write
  • GraphSage supports relationship weights, driven by relationshipWeightProperty
  • GraphSage supports node labels via projectedFeatureDimension
  • Introduced the model catalog to manage trained models, including:
    • gds.beta.model.exists – a procedure to check if a model exists in the catalog
    • gds.beta.model.list – lists all available models
    • gds.beta.model.drop – removes a model from the catalog
  • The Random Projection algorithm has been promoted to the product tier and we have added:
    • gds.fastRP.stats
    • gds.fastRP.mutate
    • gds.fastRP.estimate
    • Added procedures for the stats and mutate modes, as well as estimates for all modes.
  • FastRP has been extended to support relationship weights and directions
  • FastRP supports integer configuration for iteration weights.
  • We’ve added support for node property features for FastRP in the beta namespace with FastRPExtended:
    • gds.beta.fastRPExtended.mutate
    • gds.beta.fastRPExtended.stream
    • gds.beta.fastRPExtended.stats
    • gds.beta.fastRPExtended.write
    • gds.beta.fastRPExtended.mutate.estimate
    • gds.beta.fastRPExtended.stream.estimate
    • gds.beta.fastRPExtended.stats.estimate
    • gds.beta.fastRPExtended.write.estimate
  • We’ve added the K-Nearest Neighbors (KNN) algorithm to the beta tier:
    • gds.beta.knn.mutate and gds.beta.knn.mutate.estimate
    • gds.beta.knn.stats and gds.beta.knn.stats.estimate
    • gds.beta.knn.stream and gds.beta.knn.stream.estimate
    • gds.beta.knn.write and gds.beta.knn.write.estimate
  • The in-memory graph now supports list properties, so embedding results can be stored in memory and embeddings can be loaded from nodes for KNN or similarity calculations.
  • Pregel framework
    • Added Pregel annotation processor to generate GDS procedures for custom Pregel algorithms.
    • Pregel now supports long and double array node values.
    • Added support for composite node state to allow complex data types on nodes.
    • Reduced memory consumption.
    • Improved memory estimation.
    • Simplified message iteration in compute methods.
    • Split context into Init- and ComputeContext and simplified API.
    • Removed K1ColoringExample standalone project.
    • Added pregel-bootstrap standalone project.
    • Added pregel-examples module.
  • Licensing: GDS Enterprise edition now requires license keys issued by Neo4j to unlock enterprise features
  • Added a density property to the graph output of gds.graph.list.
  • Added a failIfMissing flag to gds.graph.drop
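
These notes do not say how the new density property is computed. As a rough illustration only, the sketch below assumes the common definition of density for a directed graph without self-loops: the number of relationships divided by the number of possible ordered node pairs. The function name directed_density and the handling of graphs with fewer than two nodes are assumptions, not taken from GDS.

```python
def directed_density(node_count, relationship_count):
    """Fraction of possible directed relationships (no self-loops) that exist.

    Assumption: this is the textbook definition, not a formula confirmed
    by these release notes.
    """
    if node_count < 2:
        # Assumption: no node pairs exist, so report a density of zero.
        return 0.0
    return relationship_count / (node_count * (node_count - 1))


# 4 nodes allow 4 * 3 = 12 directed relationships; 6 of them are present.
print(directed_density(4, 6))
```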

Bug fixes

  • Pregel:
    • Fixed a bug in Pregel that could lead to incorrect results when running in parallel.
    • Fixed a cast exception when returning array node properties in generated Pregel procedures.
  • Fixed a bug in a multi-source BFS traversal strategy that could affect the following procedures:
    • gds.alpha.closeness
    • gds.alpha.closeness.harmonic
    • gds.alpha.allShortestPaths
  • Fixed a bug in gds.alpha.shortestPath.deltaStepping where large relationship weights led to incorrect results
  • Weakly connected components:
    • Fixed a bug in WCC where componentCount would be negative when the graph is empty.
    • Fixed a regression where WCC could run more slowly with increased concurrency.
  • Fixed bugs in Louvain:
    • communityCount is no longer negative when the graph is empty.
    • changes to maxIterations are no longer ignored.
  • Fixed a bug in LabelPropagation where communityCount would be negative when the graph is empty.
  • Fixed a bug in KNN where it failed when run on graphs with filtered values
  • Fixed bugs in gds.graph.export:
    • All relationship properties are now exported. Previously, at most one relationship property per relationship type was exported.
    • Default (null) array node properties no longer lead to an exception.
  • Graph loading:
    • Fixed a bug where using node label projections including properties on large graphs and high concurrency could lead to loss of some properties.
    • Fixed a bug in graph creation that could cause an ArrayIndexOutOfBoundsException during node loading.
    • The readConcurrency config parameter can no longer be overridden by the concurrency parameter when it is explicitly set in an implicit graph creation config.
  • Fixed a bug in memory estimation of large anonymous fictitious graphs.
  • Fixed a bug in gds.alpha.dfs/bfs where the algorithm did not terminate for graphs containing loops.
  • Fixed result column name embeddings to embedding in GraphSAGE, to align with the other embeddings.
  • Fixed a bug in Node2Vec where many disconnected nodes would cause a StackOverflowError
  • Fixed a bug in RandomProjection where each iteration weight was multiplied by all previous iteration weights.
  • Similarity algorithms:
    • Fixed a bug where Alpha Similarity algorithms would load a graph even though it was not needed
    • Fixed a bug where similarity algorithms would not remove the placeholder graph if config validation fails on invalid user input.
  • Fixed a bug where community statistic computation could overflow for large community ids.
  • Fixed a bug where DegreeCentrality returned incorrect values when concurrency > 1.
  • Fixed a bug where ClosenessCentrality used a slightly incorrect formula for the Wasserman-Faust algorithm.
  • Fixed a bug that affected gds.triangleCount() and gds.alpha.triangles() where not all triangles would be counted under certain conditions.
  • Parallel edges in a graph no longer lead to incorrect Local Clustering Coefficient and Triangle Count results.
  • The Long.MIN_VALUE fallback property values will now be translated to Double.NaN if a double value is requested.
  • Fixed a bug where graphs with multiple labels would sometimes fail when converting property values.
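
The Long.MIN_VALUE fallback translation mentioned above can be pictured as a small conversion step. This is a sketch of the described behaviour only, not the GDS implementation: the names LONG_MIN_VALUE and as_double are hypothetical, and the constant is the value of Java's Long.MIN_VALUE written out in Python.

```python
import math

# Value of Java's Long.MIN_VALUE, which the notes describe as the long fallback.
LONG_MIN_VALUE = -2**63


def as_double(raw_value):
    """Convert a stored long property value to a double.

    The long fallback value becomes NaN instead of a huge negative number;
    every other value is converted numerically.
    """
    if raw_value == LONG_MIN_VALUE:
        return math.nan
    return float(raw_value)


print(as_double(42))
print(as_double(LONG_MIN_VALUE))
```

Mapping the fallback to NaN keeps a missing value recognisable as missing after the type change, rather than letting it enter calculations as an extreme number.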

Improvements

  • fastRPExtended and graphSage now fail if node properties are Double.NaN
  • gds.fastRP now accepts integer iterationWeights
  • If graphSage.train is run on a graph without relationships, GDS now fails gracefully with an appropriate error message
  • Added validation that properties used by GraphSage exist on the graph
  • Added validation for embeddingSize >= 1
  • Added a failIfExists flag to graph creation to enable a user to specify that if a graph already exists, it should be overwritten without failing.
  • Progress logging:
    • We now log progress in equally spaced percentages. This is 0-100% either in steps of 1, or in larger steps if there are fewer than 100 batches. For example, if there are 50 batches, completing one batch means 2% progress, so it would log in steps of 2.
    • Decreased the logging frequency when running with a high concurrency.
  • Added postProcessingMillis to the mutate, write, and stats modes of gds.localClusteringCoefficient and gds.triangleCount. It is always zero for now, but it is a standard result column for these modes.
  • Parallelized computation of result statistics for the following community detection procedures:
    • gds.wcc.write, gds.wcc.mutate and gds.wcc.stats
    • gds.louvain.write, gds.louvain.mutate and gds.louvain.stats
    • gds.labelPropagation.write, gds.labelPropagation.mutate and gds.labelPropagation.stats
    • gds.beta.modularityOptimization.write and gds.beta.modularityOptimization.mutate
    • gds.alpha.scc.write
  • Added graph schema to the result columns of gds.beta.model.list and gds.beta.model.drop
  • Validate property existence (e.g. seedProperty) when running algorithms on Cypher projections.
  • Elements in a Pregel composite schema may be set public/private in order to include or exclude them from generated procedure results
  • Improved memory estimation for * node projections.
  • Introduced parallel graph construction to improve performance of Louvain and Node Similarity
  • In-memory graphs in multi-database environments:
    • When in-memory graphs are created, they are now associated with the database in use at creation time to prevent errors when running in a multi-database environment.
    • gds.graph.info() returns the database name the graph has been created on.
    • Named graphs can only be used on the database they have been created on.
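
The progress logging rule described above (whole percentages, in larger steps when there are fewer than 100 batches) can be sketched as follows. This is an illustration of the stated behaviour under the assumption that a message is emitted whenever the integer percentage increases. The function name progress_log_points is hypothetical and this is not the GDS logging code.

```python
def progress_log_points(batch_count):
    """Return the percentages that would be logged as batches complete."""
    logged = []
    last_percent = 0
    for completed in range(1, batch_count + 1):
        percent = (100 * completed) // batch_count
        # Log only when the whole-number percentage has advanced.
        if percent > last_percent:
            logged.append(percent)
            last_percent = percent
    return logged


# 50 batches: each batch is worth 2%, so progress is logged in steps of 2.
print(progress_log_points(50)[:5])
# 1000 batches: several batches share a percentage, so steps stay at 1.
print(len(progress_log_points(1000)))
```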