Weakly Connected Components
Neo4j Graph Analytics for Snowflake is in Public Preview and is not intended for production use.
Introduction
The Weakly Connected Components (WCC) algorithm finds sets of connected nodes in directed and undirected graphs.
Two nodes are connected if there exists a path between them.
The set of all nodes that are connected with each other forms a component.
In contrast to Strongly Connected Components (SCC), the direction of relationships on the path between two nodes is not considered.
For example, in a directed graph (a)→(b), a and b will be in the same component, even if there is no directed relationship (b)→(a).
WCC is often used early in an analysis to understand the structure of a graph. Once the components are known, other algorithms can be run independently on each identified component.
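The definition above can be illustrated with a small union-find sketch in Python. This is only a minimal illustration of what "weakly connected" means, not the implementation used by Neo4j Graph Analytics; the function name `weakly_connected_components` and the node names are hypothetical.

```python
def weakly_connected_components(nodes, relationships):
    """Return a dict mapping each node to a representative of its component."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for source, target in relationships:
        # Direction is ignored: (source)->(target) merges both endpoints.
        root_s, root_t = find(source), find(target)
        if root_s != root_t:
            parent[root_t] = root_s

    return {node: find(node) for node in nodes}


# A single directed relationship (a)->(b); c has no relationships.
components = weakly_connected_components(["a", "b", "c"], [("a", "b")])
print(components["a"] == components["b"])  # True
print(components["a"] == components["c"])  # False
```

Here a and b share a component although there is no relationship (b)→(a), while the isolated node c forms a component of its own.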
The implementation of the algorithm is based on the following papers:
Syntax
CALL Neo4j_Graph_Analytics.graph.wcc(
    'CPU_X64_L',          -- (1)
    {
        'project': {...}, -- (2)
        'compute': {...}, -- (3)
        'write':   {...}  -- (4)
    }
);
(1) Compute pool selector.
(2) Project config.
(3) Compute config.
(4) Write config.
Name | Type | Default | Optional | Description |
---|---|---|---|---|
computePoolSelector | String | | no | The selector for the compute pool on which to run the WCC job. |
configuration | Map | | no | Configuration for graph project, algorithm compute and result write back. |
The configuration map consists of the following three entries.
For more details on the Project configuration below, refer to the Project documentation.
Name | Description |
---|---|
nodeTables | List of node tables. |
relationshipTables | Map of relationship types to relationship tables. |
Name | Type | Default | Optional | Description |
---|---|---|---|---|
mutateProperty | String | | yes | The node property that will be written back to the Snowflake database. |
relationshipWeightProperty | String | | yes | Name of the relationship property to use as weights. If unspecified, the algorithm runs unweighted. |
seedProperty | String | | yes | Used to set the initial component for a node. The property value needs to be a number. |
threshold | Float | | yes | The value of the weight above which the relationship is considered in the computation. |
consecutiveIds | Boolean | | yes | Flag to decide whether component identifiers are mapped into a consecutive id space (requires additional memory). |
minComponentSize | Integer | | yes | Only nodes in components whose size is greater than or equal to the given value are returned. |
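To make the effect of threshold and minComponentSize more concrete, the following Python sketch applies both ideas to the small user graph from the Examples section. It is an illustration under stated assumptions, not the library's implementation: the function `components` and its parameters `threshold` and `min_component_size` are hypothetical stand-ins for the configuration keys, and "above" is read here as strictly greater than the threshold.

```python
from collections import Counter

USERS = ["Alice", "Bridget", "Charles", "Doug", "Mark", "Michael"]
LINKS = [
    ("Alice", "Bridget", 0.5),
    ("Alice", "Charles", 4.0),
    ("Mark", "Doug", 1.1),
    ("Mark", "Michael", 2.0),
]


def components(nodes, links, threshold=None, min_component_size=0):
    """Map each node to a component representative (hypothetical helper)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for source, target, weight in links:
        # Assumption: only weights strictly above the threshold are considered.
        if threshold is not None and weight <= threshold:
            continue
        parent[find(target)] = find(source)

    roots = {n: find(n) for n in nodes}
    sizes = Counter(roots.values())
    # Keep only nodes whose component has at least min_component_size members.
    return {n: r for n, r in roots.items() if sizes[r] >= min_component_size}


unweighted = components(USERS, LINKS)
weighted = components(USERS, LINKS, threshold=1.0)
filtered = components(USERS, LINKS, threshold=1.0, min_component_size=2)

print(len(set(unweighted.values())))  # 2
print(len(set(weighted.values())))    # 3
print(sorted(filtered))  # ['Alice', 'Charles', 'Doug', 'Mark', 'Michael']
```

With a threshold of 1.0, the relationship between Alice and Bridget (weight 0.5) is ignored, so Bridget ends up in a component of size one and is dropped once a minimum component size of 2 is requested.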
For more details on the Write configuration below, refer to the Write documentation.
Name | Type | Default | Optional | Description |
---|---|---|---|---|
nodeLabel | String | | no | Node label in the in-memory graph from which to write a node property. |
nodeProperty | String | | yes | The node property that will be written back to the Snowflake database. |
outputTable | String | | no | Table in Snowflake database to which node properties are written. |
Examples
In this section we will show examples of running the Weakly Connected Components algorithm on a concrete graph. The intention is to illustrate what the results look like and to provide a guide on how to make use of the algorithm in a real setting. We will do this on a small user network graph of a handful of nodes connected in a particular pattern. The example graph is defined by the following tables:
CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.USERS (NODEID STRING);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.USERS VALUES
('Alice'),
('Bridget'),
('Charles'),
('Doug'),
('Mark'),
('Michael');
CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.LINKS (SOURCENODEID STRING, TARGETNODEID STRING, WEIGHT FLOAT);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.LINKS VALUES
('Alice', 'Bridget', 0.5),
('Alice', 'Charles', 4),
('Mark', 'Doug', 1.1),
('Mark', 'Michael', 2);
This graph has two connected components, each with three nodes.
The relationships that connect the nodes in each component have a property weight which determines the strength of the relationship.
In the following examples we will demonstrate using the Weakly Connected Components algorithm on this graph.
Run job
Running a WCC job involves three steps: Project, Compute and Write.
Running the query requires a setup of grants for the application, your consumer role and your environment. Please see the Getting started page for more on this.
We also assume that the application name is the default Neo4j_Graph_Analytics. If you chose a different app name during installation, please use that name instead.
CALL Neo4j_Graph_Analytics.graph.wcc('CPU_X64_XS', {
    'project': {
        'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
        'nodeTables': ['USERS'],
        'relationshipTables': {
            'LINKS': {
                'sourceTable': 'USERS',
                'targetTable': 'USERS'
            }
        }
    },
    'compute': {},
    'write': [
        {
            'nodeLabel': 'USERS',
            'outputTable': 'EXAMPLE_DB.DATA_SCHEMA.USERS_COMPONENTS'
        }
    ]
});
JOB_ID | JOB_START | JOB_END | JOB_RESULT |
---|---|---|---|
job_492026bbeaa6422eb4502a18def04cd6 | 2025-04-30 13:53:53.702000 | 2025-04-30 13:54:00.716000 | { "project_1": { "graphName": "snowgraph", "nodeCount": 6, "nodeMillis": 212, "relationshipCount": 4, "relationshipMillis": 519, "totalMillis": 731 }, "wcc_1": { "componentCount": 2, "componentDistribution": { "max": 3, "mean": 3, "min": 3, "p1": 3, "p10": 3, "p25": 3, "p5": 3, "p50": 3, "p75": 3, "p90": 3, "p95": 3, "p99": 3, "p999": 3 }, "computeMillis": 9, "configuration": { "concurrency": 2, "consecutiveIds": false, "jobId": "582c3c87-66ed-421a-b800-74e6c15d9734", "logProgress": true, "mutateProperty": "component", "nodeLabels": [ "" ], "relationshipTypes": [ "" ], "seedProperty": null, "sudo": false, "threshold": 0 }, "mutateMillis": 2, "nodePropertiesWritten": 6, "postProcessingMillis": 20, "preProcessingMillis": 6 }, "write_node_property_1": { "exportMillis": 2532, "nodeLabel": "USERS", "nodeProperty": "component", "outputTable": "EXAMPLE_DB.DATA_SCHEMA.USERS_COMPONENTS", "propertiesExported": 6 } } |
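The JOB_RESULT value is a JSON document, so it can be inspected programmatically once fetched by a client. The Python sketch below parses an abbreviated copy of the value shown above (only a few of its keys are reproduced) and reads the component count and size distribution; how the value is fetched from Snowflake is outside the scope of this sketch.

```python
import json

# Abbreviated copy of the JOB_RESULT value shown above; other keys omitted.
job_result = json.loads("""
{
  "project_1": {"nodeCount": 6, "relationshipCount": 4},
  "wcc_1": {
    "componentCount": 2,
    "componentDistribution": {"min": 3, "mean": 3, "max": 3},
    "nodePropertiesWritten": 6
  },
  "write_node_property_1": {
    "outputTable": "EXAMPLE_DB.DATA_SCHEMA.USERS_COMPONENTS",
    "propertiesExported": 6
  }
}
""")

wcc = job_result["wcc_1"]
print(wcc["componentCount"])                # 2
print(wcc["componentDistribution"]["max"])  # 3
print(job_result["write_node_property_1"]["outputTable"])
```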
The returned result contains information about the job execution and result distribution. Additionally, the component ID for each of the nodes has been written back to the Snowflake database. We can query it like so:
SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.USERS_COMPONENTS;
NODEID | COMPONENT |
---|---|
Alice | 0 |
Bridget | 0 |
Charles | 0 |
Doug | 3 |
Mark | 3 |
Michael | 3 |
The result shows that the algorithm identifies two components. This can be verified in the example graph.
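One way to check a result like this is to group the rows by component id and compare the groups rather than the ids themselves. The Python sketch below does this for the rows shown above; the helper name `partition` is hypothetical, and the rows are copied by hand instead of being read from Snowflake.

```python
from collections import defaultdict

# Rows of USERS_COMPONENTS as shown above: (NODEID, COMPONENT).
rows = [
    ("Alice", 0), ("Bridget", 0), ("Charles", 0),
    ("Doug", 3), ("Mark", 3), ("Michael", 3),
]


def partition(rows):
    """Group node ids by component id, discarding the ids themselves."""
    groups = defaultdict(set)
    for node_id, component in rows:
        groups[component].add(node_id)
    return {frozenset(members) for members in groups.values()}


# The same grouping with the two component ids swapped.
relabelled = [(node_id, 3 if component == 0 else 0) for node_id, component in rows]

print(len(partition(rows)))                      # 2
print(partition(rows) == partition(relabelled))  # True
```

Comparing partitions in this way treats two results as equal whenever they group the nodes identically, regardless of which numeric id each component received.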
The default behaviour of the algorithm is to run unweighted, i.e. without using relationship weights.
The weighted option will be demonstrated in Weighted.
The actual component ids may differ because the order of nodes projected in the in-memory graph is not guaranteed.
For this case it is equally plausible to get the inverse solution, for instance when our community