Changelog

This page contains a raw changelog of Neo4j Graph Analytics for Snowflake.

1.0.8

Added

Added support for the node embedding algorithm hashgnn.

Changed

Made configuration validation of numerical values in compute configurations faster.

Fixed

Fixed a bug where the experimental.visualize endpoint was not able to handle some kinds of node columns as input to byColumn of the nodeColoring configuration.

1.0.7

Added

Added support for setting log levels for a single job execution using the runtime configuration.
Added support for the new labelFilter parameter of the triangle_count algorithm, which allows users to specify the labels of the nodes in the triangles counted by the algorithm.
Added support for the community detection algorithm leiden.
Added support for the node embedding algorithm node2vec.
Added support for the path finding algorithm delta_stepping.
Added support for the filtered node similarity algorithm node_similarity_filtered.

Changed

Writing to tables now uses Parquet files instead of CSV files as intermediate file format.
This improves write throughput, especially when writing large amounts of data including many complex properties, such as embeddings.
When writing arrays, such as in fast_rp, the resulting column type will be a structured array, e.g. ARRAY(FLOAT) instead of ARRAY.
Changed to use more familiar type names in error messages, such as BIGINT instead of long.

Fixed

Added missing validation that the corresponding *table parameter is mandatory when a *node parameter is specified.
Improved graph.dijkstra_single_source by capturing target node ID types, and rendering them accordingly.
Fixed a bug where columns of some types (like VARCHAR) in table input to the experimental.visualize endpoint caused errors.
Fixed a bug where the experimental.visualize endpoint did not handle the maxAllowedNodes parameter correctly, leading to errors.
Fixed a bug where the experimental.visualize endpoint did not normalize the sizes of nodes correctly.

1.0.6

Added

Added a new endpoint experimental.visualize, based on the neo4j-viz Python library, which takes graphs represented as tables as input, and outputs an HTML visualization of the graph.
Improved the error message shown when a config parameter, such as sourceNode, refers to a node that has not been projected into the in-memory graph.
Added sourceTargetNodePairsTable parameter to the Dijkstra Source-Target algorithm, allowing batch processing of multiple source-target pairs from a table with SOURCENODEID and TARGETNODEID columns.
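
To illustrate what batch processing of source-target pairs means, here is a minimal sketch in plain Python. It does not call the application; the toy graph, the `dijkstra` helper and the in-memory `pairs` rows are hypothetical stand-ins, and only the SOURCENODEID and TARGETNODEID column names come from the entry above.

```python
import heapq


def dijkstra(adjacency, source, target):
    """Return the cost of the cheapest path from source to target, or None."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, weight in adjacency.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return None  # target is unreachable


# Hypothetical toy graph: node -> list of (neighbor, weight).
adjacency = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 2.0)], "c": []}

# Rows as they might appear in a pairs table (column names from the entry above).
pairs = [
    {"SOURCENODEID": "a", "TARGETNODEID": "c"},
    {"SOURCENODEID": "b", "TARGETNODEID": "c"},
    {"SOURCENODEID": "c", "TARGETNODEID": "a"},
]

# One shortest-path computation per row, collected into a single result set.
results = [
    (row["SOURCENODEID"], row["TARGETNODEID"],
     dijkstra(adjacency, row["SOURCENODEID"], row["TARGETNODEID"]))
    for row in pairs
]
```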

Changed

Fixed

Fixed default table prefix resolution for model train algorithms (GraphSage).
Fixed a bug where graph.dijkstra_single_source would not render target node IDs correctly in the results if both BIGINT and VARCHAR node IDs were present.
Fixed a bug where NULL node properties were not handled correctly by some algorithms.
Fixed a bug where integer random seeds below a certain value caused an IllegalArgumentException.

1.0.5

Added

Improved eager validation of job configurations, ensuring that:
- project and write configurations contain all mandatory parameters
- default table prefix is valid
- all normalized table names are valid
- provided node and relationship tables exist and are accessible
- provided node and relationship tables adhere to the expected schema
- source and target tables are provided as nodeTables
Improved error reporting for job configuration validation
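
The idea behind eager validation can be sketched roughly as follows. This is not the application's implementation: the function name, the error messages and the assumed configuration shape (a `project` map with a `nodeTables` list and a `relationshipTables` map carrying `sourceTable` and `targetTable`) are illustrative assumptions, and only a subset of the checks listed above is covered.

```python
def validate_job_config(config, existing_tables):
    """Collect configuration problems up front instead of failing mid-job.

    `config` is a hypothetical job configuration; `existing_tables` is the
    set of table names assumed to exist and be accessible.
    """
    errors = []
    project = config.get("project", {})
    node_tables = project.get("nodeTables", [])

    # Mandatory parameter check.
    if not node_tables:
        errors.append("project.nodeTables is mandatory")

    # Existence check for node tables.
    for table in node_tables:
        if table not in existing_tables:
            errors.append(f"node table {table} does not exist or is not accessible")

    # Relationship tables must exist and reference projected node tables.
    for rel_table, spec in project.get("relationshipTables", {}).items():
        if rel_table not in existing_tables:
            errors.append(f"relationship table {rel_table} does not exist or is not accessible")
        for key in ("sourceTable", "targetTable"):
            if spec.get(key) not in node_tables:
                errors.append(f"{rel_table}.{key} must be listed in nodeTables")

    return errors
```

Returning a list of all problems, rather than raising on the first one, is one way to report every configuration issue in a single round trip.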

Changed

graph.dijkstra and graph.dijkstra_single_source now also return node IDs of paths, and costs.
Removed experimental.dijkstra in favor of graph.dijkstra.

Fixed

Fixed a bug in graph.fastpath, where providing a firstRelationshipType always failed configuration validation, even if the type was valid.
Fixed a bug where an import was missing for some endpoints, causing confusing error messages when something failed in the job configuration validation.

1.0.4

Added

Added experimental endpoint experimental.dijkstra for running Dijkstra’s algorithm and returning all paths as part of the result.
Added eager validation for some aspects of job configurations, making jobs fail fast if the configuration is invalid.

Changed

Fixed

1.0.3

Added

Added DegreeCentrality algorithm and procedure graph.degree.

Changed

Fixed

1.0.2

Added

Added support for setting defaultTablePrefix as a global setting in algorithm configurations.

Changed

Fixed

1.0.1

Added

Changed

Fixed

1.0.0

Added

Changed

Fixed

0.3.14

Added

Added admin.show_jobs procedure to list all finished jobs in the system.
Added TriangleCounting algorithm and procedure graph.triangle_count.

Changed

admin.get_max_nodes replaces internal.get_max_nodes
admin.set_max_nodes replaces internal.set_max_nodes
admin.get_min_nodes replaces internal.get_min_nodes
admin.set_min_nodes replaces internal.set_min_nodes
graph.job_log replaces internal.job_service_log

Fixed

Diagnostic information that was lost with the shift to running transient job services is restored temporarily by changing the log level to DEBUG.

0.3.13

Added

Changed

Fixed

Fixed a problem in Dijkstra and PageRank where the result configuration entry could show internal node IDs.
Worked around a limitation in SPCS event sharing.

0.3.12

Added

Added procedures internal.get_min_nodes, internal.get_max_nodes, internal.set_min_nodes and internal.set_max_nodes to manage the number of nodes in compute pools.
The log endpoint internal.job_service_log now includes the stack trace when Python-based algorithms fail.

Changed

Aligned the API syntax of the graphsage and fastpath algorithms, such as top-level keys and camel-cased parameters, with that of all other algorithms.

Fixed

Fixed a bug in graph.gs_nc_train, graph.gs_nc_predict, graph.gs_unsup_train and graph.gs_unsup_predict, where GPUs were not utilized.

0.3.11

Added

Changed

Fixed

0.3.10

Added

Added procedures:
- graph.betweenness
- graph.dijkstra
- graph.dijkstra_single_source
- graph.drop_model
- graph.fastpath
- graph.fast_rp
- graph.graph
- graph.gs_nc_predict
- graph.gs_nc_train
- graph.gs_unsup_predict
- graph.gs_unsup_train
- graph.knn
- graph.louvain
- graph.model_exists
- graph.node_similarity
- graph.page_rank
- graph.show_available_compute_pools
- graph.show_models
- graph.wcc

Changed

Fixed

0.3.9

Added

Changed

Fixed

Restore broken data on available compute pools.

0.3.8

Added

Support for GPU compute pool GPU_NV_XS, available in most Azure regions.
Added gml.show_available_compute_pools and gds.show_available_compute_pools. These are replacements for the gml.list_available_compute_pools and gds.list_available_compute_pools procedures, which will be removed in a future release.

Changed

Fixed

Compute pool and warehouse creation no longer fails when a compute pool instance family is unavailable in a particular region.

0.3.7

Added

Support for defaultTablePrefix in gds.graph_project, enabling a common prefix for all tables in the projection.
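
As a rough illustration of what a common table prefix does, the following sketch resolves table names against a prefix. The `resolve_table_name` helper is hypothetical, and the rule that a name containing a dot counts as already qualified is an assumption, not documented behavior.

```python
def resolve_table_name(name, default_table_prefix=None):
    """Prepend the prefix to `name` unless it already looks qualified.

    Assumption for illustration: any name containing a dot is treated as
    already qualified and returned unchanged.
    """
    if default_table_prefix and "." not in name:
        return f"{default_table_prefix}.{name}"
    return name


# Hypothetical database, schema and table names.
prefix = "EXAMPLE_DB.DATA_SCHEMA"
tables = [resolve_table_name(name, prefix) for name in ("PERSON", "OTHER_DB.PUBLIC.CITY")]
```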
Grant OPERATE on application-managed compute pools to the APP_ADMIN role.

Changed

Replaced the map with a list of tables or views in nodeTables within gds.graph_project. The corresponding label is now inferred from the table name. This is a breaking change.
Removed type parameter from list entries of relationshipTables in gds.graph_project. The relationship type is now inferred from the table name.
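
The following sketch shows one plausible way a label or relationship type could be derived from a table name. The `infer_label` helper and the table names are hypothetical, and using the last dot-separated segment of the name is an assumption; the exact rule (for example around quoting or case) is not stated in this changelog.

```python
def infer_label(table_name):
    """Use the last dot-separated segment of the table name as the label.

    Assumption for illustration only; the real inference rule may differ.
    """
    return table_name.split(".")[-1]


# nodeTables as a plain list of tables or views (hypothetical names).
node_tables = ["EXAMPLE_DB.DATA_SCHEMA.PERSON", "EXAMPLE_DB.DATA_SCHEMA.CITY"]
labels = {table: infer_label(table) for table in node_tables}
```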

Fixed

Fixed a bug where write_relationships could write wrong node IDs if multiple node tables were involved in the projection.

0.3.6

Added

Changed

Fixed

0.3.5

Added

Support for projecting heterogeneous graphs from multiple node and relationship tables.
- This is a breaking change, as the syntax changed for:
  - gds.graph_project
  - gds.write_nodeproperties
  - gds.write_relationships
  - Algorithm configurations that include node references (e.g. path algorithms).
Support for table-unique, non-integer node identifiers in input tables.
- We now support VARCHAR and BIGINT node identifiers.
- Node identifiers only need to be unique within the table they are projected from.
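
Table-unique identifiers can be pictured as keys of the form (table, original ID). The sketch below maps such keys to consecutive internal integers; the `build_id_mapping` helper and the sample tables are hypothetical, and the application's actual internal ID scheme is not described in this changelog.

```python
from itertools import count


def build_id_mapping(tables):
    """Map (table, original_id) pairs to consecutive internal integer IDs.

    IDs only need to be unique within their own table, so the same value
    may appear in several tables without clashing.
    """
    next_id = count()
    mapping = {}
    for table, ids in tables.items():
        for original_id in ids:
            key = (table, original_id)
            if key in mapping:
                raise ValueError(f"duplicate id {original_id!r} in table {table}")
            mapping[key] = next(next_id)
    return mapping


# The ID 1 occurs in both tables; identifiers may be integers or strings.
tables = {"PERSON": [1, 2], "CITY": [1, "berlin"]}
mapping = build_id_mapping(tables)
```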

Changed

Fixed

0.3.4

Added

Procedure gds.list_available_compute_pools to list compute pools available for use with GDS Sessions.
Procedure gml.list_available_compute_pools to list compute pools available for use with GML Sessions.
New machine learning algorithm FastPath gml.fastpath for computing path embeddings.
Added endpoints for managing models:
- Check existence for a model: gml.model_exists
- List models: gml.model_list
- Drop a model: gml.model_drop

Changed

If an invalid compute pool selector is used, raise an exception with clear messaging and a list of valid compute pool selectors.
Telemetry event sharing changes.
- Errors and warnings ⇒ Mandatory
- Traces ⇒ Mandatory
- Usage logs ⇒ Mandatory
- Debug logs ⇒ Optional
- Metrics ⇒ Optional

Fixed

0.3.3

Added

Changed

Fixed

A recent change in Snowflake requires GPU compute pool usage to be declared up front in the application manifest, or compute pool creation fails.

0.3.2

Added

Changed

Slimmed down return values from GraphSAGE endpoints.
Improved logging for GraphSAGE.
Any gml training algorithm (currently GraphSAGE) now fails early if the model name already exists.
Added the failure reason to the log table in case of failure for gml training and prediction algorithms.

Fixed

Fixed bug leading to progress of more than 100% being logged for GraphSAGE.

0.3.1

Added

Changed

Fixed

Fixed an issue where GraphSAGE could run out of shared memory.
Removed target_label from config for gml.gs_nc_predict because it was unused.

0.3.0

Added

graph_project now supports projecting node identifier columns as BIGINT or VARCHAR.
- This allows for more flexible node identifier columns, e.g., when using UUIDs.
- For BIGINT there will be a ~2x regression in projection runtime, which will be addressed in an upcoming release.
Graph machine learning runtime.
- gml.create_session
- gml.stop_session
- gml.list
Supervised GraphSAGE
- gml.gs_nc_train
- gml.gs_nc_predict
Unsupervised GraphSAGE
- gml.gs_unsup_train
- gml.gs_unsup_predict
Support for GPU compute pool GPU_NV_S.

Changed

Fixed

0.2.19

Added

graph_list shows heap memory usage of the in-memory graph.
Added support for compute pool type HIGHMEM_X64_L.

Changed

Projecting from an empty node table is no longer allowed and will return an error.

Fixed

Invalid function parameters now fail with a better error message and are not server errors anymore.
- This fixes long-running queries that would eventually fail with a server error.

0.2.18

Added

Added support for gds.drop_nodeproperties to drop node properties from a graph.

Changed

Improved service logging.
- Separated logging for server layer (snowgraph) and application layer (gds).
- Added more detailed logging for endpoint execution.
- Allow setting log level via internal.set_log_level(logger, level) function.

Fixed

0.2.17

Added

Changed

Fixed

Fixed a bug where graph drop might stall for a long time trying to drop a graph that doesn’t exist.
Disabled fail-early on write back when missing privilege to create table, because privilege check was flaky.

0.2.16

Added

Added support for the HITS algorithm via the command gds.hits.
Added support for gds.graph_filter to filter subgraphs based on node and relationship properties.

Changed

Concurrency now defaults to number of cores. Affects 'concurrency', 'readConcurrency' and 'writeConcurrency'.
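
A default of this kind can be sketched as below. The `resolve_concurrency` helper is hypothetical and `os.cpu_count()` merely stands in for however the service determines its core count; only the three parameter names come from the entry above.

```python
import os

CONCURRENCY_KEYS = ("concurrency", "readConcurrency", "writeConcurrency")


def resolve_concurrency(config):
    """Fill in missing concurrency settings with the number of available cores."""
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return {key: config.get(key, cores) for key in CONCURRENCY_KEYS}


# An explicit value is kept; the other two fall back to the core count.
settings = resolve_concurrency({"concurrency": 2})
```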

Fixed

0.2.15

Added

Changed

Fixed

0.2.14

Added

Changed

Fixed

0.2.13

Added

Added support for the Speaker-Listener Label Propagation algorithm via the command gds.sllpa.

Changed

The application creates five compute pools of its own, from which the consumer selects one to run on.
The application creates its own query warehouse, which the consumer configures according to their requirements.
The application requires grants for the CREATE COMPUTE POOL and CREATE WAREHOUSE privileges.

Fixed

Various documentation fixes.

0.2.12

Added

Changed

gds.indirect_exposure now computes exposure, hop, parent and root for each node.
- This can be defined in the configuration using 'mutateProperties': { 'exposure': '<key>', 'hop': '<key>', 'parent': '<key>', 'root': '<key>' }.
- The algorithm currently only supports max aggregation; the exposureReducer config has been removed.

Fixed

0.2.11

Added

Changed

Fixed

0.2.10

Added

gds.indirect_exposure allows specifying an exposureReducer function to aggregate the exposure of multiple neighbors.
- The default exposureReducer function is SUM; possible values are SUM and MAX.
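
The difference between the two reducers can be shown with a small sketch. It only demonstrates how several neighbor exposures are combined into one value; how exposure propagates through the graph is not modeled, and the `reduce_exposure` helper and the sample values are hypothetical.

```python
REDUCERS = {"SUM": sum, "MAX": max}


def reduce_exposure(neighbor_exposures, exposure_reducer="SUM"):
    """Aggregate the exposures of multiple neighbors into a single value."""
    try:
        reducer = REDUCERS[exposure_reducer]
    except KeyError:
        raise ValueError(f"exposureReducer must be one of {sorted(REDUCERS)}") from None
    # A node without exposed neighbors gets no exposure (assumption).
    return reducer(neighbor_exposures) if neighbor_exposures else 0.0


exposures = [0.25, 0.5]
summed = reduce_exposure(exposures)            # default reducer is SUM
largest = reduce_exposure(exposures, "MAX")
```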

Changed

Fixed

0.2.9

Added

Added gds.indirect_exposure algorithm for risk analysis.
Post upgrade, calling gds.create_session will explicitly drop and re-create the service.

Changed

Fixed

0.2.8

Added

Added support for node id ranges that use the full BIGINT range.

Fixed

Fixed sizing of JVM heap memory for GDS service.

0.2.7

Added

GDS gets the calling Snowflake user’s username
- to project, list and drop graphs per user
- to run algorithms on the user's own graphs
GDS gets the calling Snowflake user’s current role
- to set admin privileges if the current role has the application role APP_ADMIN
Support semi-structured ARRAY type for node property projections. Element types can be BIGINT or DOUBLE.
gds.write_nodeproperties_to_table and gds.write_relationships_to_table
- Both functions upload data to an app-internal stage and then copy the data into the specified consumer table.
gds.write_nodeproperties_to_stage and gds.write_relationships_to_stage
- Both functions upload data to a consumer-defined stage for further processing.
gds.write_nodeproperties_to_table supports writing semi-structured ARRAY type
- Element types can be BIGINT or DOUBLE
gds.graph_project supports setting an orientation for relationships
- possible values are NATURAL (default), UNDIRECTED and REVERSED
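
The three orientation values can be illustrated on a plain edge list. This is a sketch of the general concept, not of the projection code; the `apply_orientation` helper is hypothetical, and representing an undirected relationship as two directed entries is an assumption made for the example.

```python
def apply_orientation(relationships, orientation="NATURAL"):
    """Return (source, target) pairs as they would be seen under an orientation."""
    if orientation == "NATURAL":
        # Keep the direction stored in the table.
        return list(relationships)
    if orientation == "REVERSED":
        # Swap source and target.
        return [(target, source) for source, target in relationships]
    if orientation == "UNDIRECTED":
        # Make each relationship traversable in both directions.
        return [edge for source, target in relationships
                for edge in ((source, target), (target, source))]
    raise ValueError(f"unknown orientation: {orientation}")


relationships = [(1, 2), (2, 3)]
reversed_rels = apply_orientation(relationships, "REVERSED")
undirected_rels = apply_orientation(relationships, "UNDIRECTED")
```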

Changed

Renamed to "Neo4j Graph Data Science" (and long form "Neo4j Graph Data Science <version>" in text).
write_nodeproperties and write_relationships parameter outputTable changed to table
write_nodeproperties and write_relationships are now aliases
- write_nodeproperties is an alias for write_nodeproperties_to_table
- write_relationships is an alias for write_relationships_to_table
Automatic eviction of GDS operation results (graph project, algorithms):
- Results can be accessed via the gds.result_list and gds.result functions.
- When an operation finishes, the result is kept for 2 more hours before it gets evicted.
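
Time-based eviction of this kind can be sketched as a small store that drops entries older than the retention period. The `ResultStore` class and its method names are hypothetical and do not mirror gds.result_list or gds.result; only the two-hour retention comes from the entry above.

```python
import time

RETENTION_SECONDS = 2 * 60 * 60  # results are kept for 2 hours after finishing


class ResultStore:
    """Keeps operation results and evicts them once the retention period has passed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so eviction can be tested without waiting
        self._results = {}

    def put(self, job_id, result):
        # Record the finish time together with the result.
        self._results[job_id] = (self._clock(), result)

    def get(self, job_id):
        self._evict_expired()
        entry = self._results.get(job_id)
        return entry[1] if entry else None

    def job_ids(self):
        self._evict_expired()
        return sorted(self._results)

    def _evict_expired(self):
        now = self._clock()
        expired = [job_id for job_id, (finished_at, _) in self._results.items()
                   if now - finished_at > RETENTION_SECONDS]
        for job_id in expired:
            del self._results[job_id]
```

Evicting lazily on access keeps the sketch free of background threads; a real service might instead run eviction on a timer.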

0.2.6

Changed

Use snowpark-sdk for schema operations.

0.2.5

Fixed

Made sure that relationship property shows up in in-memory graph.
write_relationships now correctly writes relationships to the table.

0.2.4

Changed

graph_project, write_nodeproperties, and write_relationships use snowflake-jdbc driver instead of snowpark-sdk.