Weakly Connected Components

Introduction

The Weakly Connected Components (WCC) algorithm finds sets of connected nodes in directed and undirected graphs. Two nodes are connected if there exists a path between them. The set of all nodes that are connected with each other forms a component. In contrast to Strongly Connected Components (SCC), the direction of relationships on the path between two nodes is not considered. For example, in a directed graph (a)→(b), a and b will be in the same component, even if there is no directed relationship (b)→(a).
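As a rough illustration of this definition (not the library's actual implementation), the following Python sketch groups nodes with a simple union-find structure and treats every relationship as undirected. The function name `weakly_connected_components` and the edge-list input format are assumptions made for this example.

```python
def weakly_connected_components(nodes, relationships):
    """Group nodes into components, ignoring relationship direction."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for source, target in relationships:
        # Direction is irrelevant: (a)->(b) merges a and b just like (b)->(a).
        parent[find(target)] = find(source)

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), []).append(node)
    # Sort for a deterministic, readable result.
    return sorted(sorted(members) for members in groups.values())


components = weakly_connected_components(["a", "b", "c"], [("a", "b")])
print(components)  # [['a', 'b'], ['c']]
```

Here a and b end up together although only (a)→(b) exists, while the isolated node c forms its own component.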

WCC is often used early in an analysis to understand the structure of a graph. Once the components are identified, other algorithms can be run independently on each of them.

The implementation of the algorithm is based on the following papers:

Syntax

This section covers the syntax used to execute the Weakly Connected Components algorithm.

Run WCC.

CALL Neo4j_Graph_Analytics.graph.wcc(
  'CPU_X64_XS',                    (1)
  {
    ['defaultTablePrefix': '...',] (2)
    'project': {...},              (3)
    'compute': {...},              (4)
    'write':   {...}               (5)
  }
);

1	Compute pool selector.
2	Optional prefix for table references.
3	Project config.
4	Compute config.
5	Write config.

Table 1. Parameters
Name	Type	Default	Optional	Description
computePoolSelector	String	`n/a`	no	The selector for the compute pool on which to run the WCC job.
configuration	Map	`{}`	no	Configuration for graph project, algorithm compute and result write back.

The configuration map consists of the following three entries.

For more details on the Project configuration below, refer to the Project documentation.

Table 2. Project configuration
Name	Description
nodeTables	List of node tables.
relationshipTables	Map of relationship types to relationship tables.

Table 3. Compute configuration
Name	Type	Default	Optional	Description
mutateProperty	String	`'component'`	yes	The node property that will be written back to the Snowflake database.
relationshipWeightProperty	String	`null`	yes	Name of the relationship property to use as weights. If unspecified, the algorithm runs unweighted.
seedProperty	String	`n/a`	yes	Used to set the initial component for a node. The property value needs to be a number.
threshold	Float	`null`	yes	The value of the weight above which the relationship is considered in the computation.
consecutiveIds	Boolean	`false`	yes	Flag to decide whether component identifiers are mapped into a consecutive id space (requires additional memory).
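To illustrate what mapping into a consecutive ID space means, here is a minimal Python sketch. The helper `to_consecutive_ids` is hypothetical, and numbering the components in ascending order of their original IDs is an assumption; the ordering actually used by the algorithm is not stated here.

```python
def to_consecutive_ids(component_by_node):
    """Remap arbitrary component IDs to 0, 1, 2, ... (hypothetical helper)."""
    # Assumption: new IDs follow the ascending order of the original IDs.
    original_ids = sorted(set(component_by_node.values()))
    new_id = {old: new for new, old in enumerate(original_ids)}
    return {node: new_id[old] for node, old in component_by_node.items()}


# Made-up component IDs with gaps between them.
sparse = {"n0": 0, "n1": 0, "n2": 7, "n3": 42}
dense = to_consecutive_ids(sparse)
print(dense)  # {'n0': 0, 'n1': 0, 'n2': 1, 'n3': 2}
```

The extra lookup table built in the sketch hints at why the option needs additional memory.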

For more details on the Write configuration below, refer to the Write documentation.

Table 4. Write configuration
Name	Type	Default	Optional	Description
nodeLabel	String	`n/a`	no	Node label in the in-memory graph from which to write a node property.
nodeProperty	String	`'component'`	yes	The node property that will be written back to the Snowflake database.
outputTable	String	`n/a`	no	Table in Snowflake database to which node properties are written.

Examples

In this section we will show examples of running the Weakly Connected Components algorithm on a concrete graph. The intention is to illustrate what the results look like and to provide a guide on how to make use of the algorithm in a real setting. We will do this on a small user network graph of a handful of nodes connected in a particular pattern. The example graph looks like this:

The following SQL statements will create the example graph tables in the Snowflake database:

CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.USERS (NODEID VARCHAR);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.USERS VALUES
  ('Alice'),
  ('Bridget'),
  ('Charles'),
  ('Doug'),
  ('Mark'),
  ('Michael');

CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.LINKS (SOURCENODEID VARCHAR, TARGETNODEID VARCHAR, WEIGHT FLOAT);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.LINKS VALUES
  ('Alice', 'Bridget', 0.5),
  ('Alice', 'Charles', 4),
  ('Mark',  'Doug',    1.1),
  ('Mark',  'Michael', 2);

This graph has two connected components, each with three nodes. The relationships that connect the nodes in each component have a property weight which determines the strength of the relationship.

In the following examples we will demonstrate using the Weakly Connected Components algorithm on this graph.

Run job

Running a WCC job involves three steps: Project, Compute and Write.

To run the query, there is a required setup of grants for the application, your consumer role and your environment. Please see the Getting started page for more on this.

We also assume that the application name is the default Neo4j_Graph_Analytics. If you chose a different app name during installation, please replace it with that.

The following will run a Weakly Connected Components job:

CALL Neo4j_Graph_Analytics.graph.wcc('CPU_X64_XS', {
  'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
  'project': {
    'nodeTables': ['USERS'],
    'relationshipTables': {
      'LINKS': {
        'sourceTable': 'USERS',
        'targetTable': 'USERS'
      }
    }
  },
  'compute': {},
  'write': [
    {
      'nodeLabel': 'USERS',
      'outputTable': 'USERS_COMPONENTS'
    }
  ]
});

Table 5. Results
JOB_ID	JOB_START	JOB_END
job_492026bbeaa6422eb4502a18def04cd6	2025-04-30 13:53:53.702000	2025-04-30 13:54:00.716000

JOB_RESULT

{
  "project_1": {
    "graphName": "snowgraph",
    "nodeCount": 6,
    "nodeMillis": 212,
    "relationshipCount": 4,
    "relationshipMillis": 519,
    "totalMillis": 731
  },
  "wcc_1": {
    "componentCount": 2,
    "componentDistribution": {
      "max": 3,
      "mean": 3,
      "min": 3,
      "p1": 3,
      "p10": 3,
      "p25": 3,
      "p5": 3,
      "p50": 3,
      "p75": 3,
      "p90": 3,
      "p95": 3,
      "p99": 3,
      "p999": 3
    },
    "computeMillis": 9,
    "configuration": {
      "concurrency": 2,
      "consecutiveIds": false,
      "jobId": "582c3c87-66ed-421a-b800-74e6c15d9734",
      "logProgress": true,
      "mutateProperty": "component",
      "nodeLabels": ["*"],
      "relationshipTypes": ["*"],
      "seedProperty": null,
      "sudo": false,
      "threshold": 0
    },
    "mutateMillis": 2,
    "nodePropertiesWritten": 6,
    "postProcessingMillis": 20,
    "preProcessingMillis": 6
  },
  "write_node_property_1": {
    "exportMillis": 2532,
    "nodeLabel": "USERS",
    "nodeProperty": "component",
    "outputTable": "EXAMPLE_DB.DATA_SCHEMA.USERS_COMPONENTS",
    "propertiesExported": 6
  }
}

The returned result contains information about the job execution and result distribution. Additionally, the component ID for each of the nodes has been written back to the Snowflake database. We can query it like so:

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.USERS_COMPONENTS;

Table 6. Results
NODEID	COMPONENT
Alice	0
Bridget	0
Charles	0
Doug	3
Mark	3
Michael	3

The result shows that the algorithm identifies two components. This can be verified in the example graph.

The actual component IDs may differ because the order in which nodes are projected into the in-memory graph is not guaranteed. In this case, it is equally plausible to get the inverse solution, for instance with the nodes of component 0 assigned to component 3 and vice versa.
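Because only the grouping is stable and not the ID values, two runs are easier to compare on the grouping itself. The Python sketch below uses a hypothetical helper `partition` that turns NODEID/COMPONENT rows into a set of node groups, so that results can be compared regardless of the IDs assigned. The second run shown is made-up data with the two IDs swapped.

```python
from collections import defaultdict


def partition(rows):
    """Convert (node_id, component_id) rows into a label-independent grouping."""
    groups = defaultdict(set)
    for node_id, component_id in rows:
        groups[component_id].add(node_id)
    # A set of frozensets ignores both the ID values and the row order.
    return {frozenset(members) for members in groups.values()}


run_a = [("Alice", 0), ("Bridget", 0), ("Charles", 0),
         ("Doug", 3), ("Mark", 3), ("Michael", 3)]
# Hypothetical second run: same grouping, component IDs swapped.
run_b = [("Alice", 3), ("Bridget", 3), ("Charles", 3),
         ("Doug", 0), ("Mark", 0), ("Michael", 0)]

print(partition(run_a) == partition(run_b))  # True
```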

Weighted WCC

By configuring the algorithm to use a weight we can increase granularity in the way the algorithm calculates component assignment. We do this by specifying the property key with the relationshipWeightProperty configuration parameter. Additionally, we can specify a threshold for the weight value. Then, only weights greater than the threshold value will be considered by the algorithm. We do this by specifying the threshold value with the threshold configuration parameter.

If a relationship does not have the specified weight property, the algorithm falls back to using a default value of zero.
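The effect of the weight threshold can be sketched in Python as follows. This is an illustrative union-find model under the rules stated above (only weights strictly greater than the threshold count, and a missing weight defaults to zero), not the library's implementation. The function name `components_above_threshold` and the use of `None` for a missing weight are assumptions.

```python
def components_above_threshold(nodes, relationships, threshold, default_weight=0.0):
    """Union-find over relationships whose weight is strictly above threshold."""
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for source, target, weight in relationships:
        # A missing weight (None) falls back to the default of zero.
        effective = default_weight if weight is None else weight
        if effective > threshold:
            parent[find(target)] = find(source)

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in groups.values())


nodes = ["Alice", "Bridget", "Charles", "Doug", "Mark", "Michael"]
links = [("Alice", "Bridget", 0.5), ("Alice", "Charles", 4),
         ("Mark", "Doug", 1.1), ("Mark", "Michael", 2)]
result = components_above_threshold(nodes, links, threshold=1.0)
print(result)
# [['Alice', 'Charles'], ['Bridget'], ['Doug', 'Mark', 'Michael']]
```

With a threshold of 1.0 the Alice-Bridget relationship (weight 0.5) is skipped, so Bridget is left in a component of her own.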

The following will run a Weakly Connected Components job with weights and threshold configured:

CALL Neo4j_Graph_Analytics.graph.wcc('CPU_X64_XS', {
  'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
  'project': {
    'nodeTables': ['USERS'],
    'relationshipTables': {
      'LINKS': {
        'sourceTable': 'USERS',
        'targetTable': 'USERS'
      }
    }
  },
  'compute': {
      'relationshipWeightProperty': 'WEIGHT',
      'threshold': 1.0
  },
  'write': [
    {
      'nodeLabel': 'USERS',
      'outputTable': 'USERS_COMPONENTS'
    }
  ]
});

Table 7. Results
JOB_ID	JOB_START	JOB_END
job_d4f7ab2a37224183a0f4b3fd56003919	2025-06-18 12:10:52.657	2025-06-18 12:10:57.709

JOB_RESULT

{
  "project_1": {
    "graphName": "snowgraph",
    "nodeCount": 6,
    "nodeMillis": 150,
    "relationshipCount": 4,
    "relationshipMillis": 388,
    "totalMillis": 538
  },
  "wcc_1": {
    "componentCount": 3,
    "componentDistribution": {
      "max": 3,
      "mean": 2,
      "min": 1,
      "p1": 1,
      "p10": 1,
      "p25": 1,
      "p5": 1,
      "p50": 2,
      "p75": 3,
      "p90": 3,
      "p95": 3,
      "p99": 3,
      "p999": 3
    },
    "computeMillis": 21,
    "configuration": {
      "concurrency": 2,
      "consecutiveIds": false,
      "jobId": "a4a2c46b-9723-4f17-badd-b052707d5a90",
      "logProgress": true,
      "mutateProperty": "component",
      "nodeLabels": ["*"],
      "relationshipTypes": ["*"],
      "relationshipWeightProperty": "WEIGHT",
      "seedProperty": null,
      "sudo": false,
      "threshold": 1
    },
    "mutateMillis": 3,
    "nodePropertiesWritten": 6,
    "postProcessingMillis": 35,
    "preProcessingMillis": 10
  },
  "write_node_property_1": {
    "exportMillis": 1940,
    "nodeLabel": "USERS",
    "nodeProperty": "component",
    "outputTable": "EXAMPLE_DB.DATA_SCHEMA.USERS_COMPONENTS",
    "propertiesExported": 6
  }
}

As we can see from the results below, the node named 'Bridget' is now in its own component, because its only relationship (weight 0.5) is below the configured threshold of 1.0 and is therefore ignored.

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.USERS_COMPONENTS;

Table 8. Results
NODEID	COMPONENT
Alice	0
Bridget	1
Charles	0
Doug	3
Mark	3
Michael	3

The actual component IDs may differ because the order in which nodes are projected into the in-memory graph is not guaranteed. In this case, it is equally plausible to get a different labelling, for instance with the nodes of component 0 assigned to component 3 and vice versa.

Seeded components

It is possible to define preliminary component IDs for nodes using the seedProperty configuration parameter. This is helpful if we want to retain components from a previous run, and it is known that no components have been split by removing relationships. The property value needs to be a number.

The algorithm first checks if there is a seeded component ID assigned to the node. If there is one, that component ID is used. Otherwise, a new unique component ID is assigned to the node.

Once every node belongs to a component, the algorithm merges components of connected nodes. When components are merged, the resulting component is always the one with the lower component ID. Note that the consecutiveIds configuration option cannot be used in combination with seeding in order to retain the seeding values.

The algorithm assumes that nodes with the same seed value do in fact belong to the same component. If any two nodes in different components have the same seed, behavior is undefined. In that case, we recommend running WCC without seeds.
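The seeding rules described above can be modelled with a short Python sketch. It is a simplified illustration, not the library's code: the function name `seeded_components` is hypothetical, and giving unseeded nodes fresh IDs above the largest seed is an assumption about how "a new unique component ID" is chosen.

```python
def seeded_components(nodes, seeds, relationships):
    """Assign component IDs from seeds, then merge towards the lower ID."""
    known = [seed for seed in seeds.values() if seed is not None]
    # Assumption: fresh IDs start just above the largest seed value.
    next_id = max(known, default=-1) + 1

    component = {}
    for node in nodes:
        seed = seeds.get(node)
        if seed is None:
            component[node] = next_id  # unseeded node: new unique ID
            next_id += 1
        else:
            component[node] = seed  # seeded node: keep the given ID

    # Union-find over component IDs; the root is always the lowest ID.
    parent = {cid: cid for cid in set(component.values())}

    def find(cid):
        while parent[cid] != cid:
            parent[cid] = parent[parent[cid]]
            cid = parent[cid]
        return cid

    for source, target in relationships:
        a, b = find(component[source]), find(component[target])
        if a != b:
            parent[max(a, b)] = min(a, b)  # the lower ID survives

    return {node: find(component[node]) for node in nodes}


# Made-up input: d has no seed and is connected to c.
seeds = {"a": 5, "b": 5, "c": 2, "d": None}
assigned = seeded_components(["a", "b", "c", "d"], seeds, [("a", "b"), ("c", "d")])
print(assigned)  # {'a': 5, 'b': 5, 'c': 2, 'd': 2}
```

The unseeded node d first receives a fresh ID and is then merged into the component of c, whose lower seeded ID is retained.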

To demonstrate this in practice, we will go through a few steps:

We will set up new input where some nodes have a component ID and some do not.
We will run WCC with the seedProperty parameter pointing at the component IDs.
We will observe that existing component IDs persist, and that new nodes are added to existing components.

Start with input data: Mats is a new node who does not have a component seed. He is connected to Bridget though.

CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.USERS (NODEID VARCHAR, COMPONENTID NUMBER);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.USERS VALUES
  ('Alice', 2),
  ('Bridget', 0),
  ('Charles', 2),
  ('Doug', 1),
  ('Mark', 1),
  ('Mats', NULL),
  ('Michael', 1);

CREATE OR REPLACE TABLE EXAMPLE_DB.DATA_SCHEMA.LINKS (SOURCENODEID VARCHAR, TARGETNODEID VARCHAR, WEIGHT FLOAT);
INSERT INTO EXAMPLE_DB.DATA_SCHEMA.LINKS VALUES
  ('Alice', 'Bridget', 0.5),
  ('Alice', 'Charles', 4),
  ('Bridget', 'Mats', 2),
  ('Mark',  'Doug',    1.1),
  ('Mark',  'Michael', 2);

We can now run the WCC job on the new input data:

CALL Neo4j_Graph_Analytics.graph.wcc('CPU_X64_XS', {
  'defaultTablePrefix': 'EXAMPLE_DB.DATA_SCHEMA',
  'project': {
    'nodeTables': ['USERS'],
    'relationshipTables': {
      'LINKS': {
        'sourceTable': 'USERS',
        'targetTable': 'USERS'
      }
    }
  },
  'compute': {
      'relationshipWeightProperty': 'WEIGHT',
      'seedProperty': 'COMPONENTID',
      'threshold': 1.0
  },
  'write': [
    {
      'nodeLabel': 'USERS',
      'outputTable': 'USERS_COMPONENTS'
    }
  ]
});

Table 9. Results
JOB_ID	JOB_START	JOB_END
job_5c839fe33a934e7dbf80c891a28ac852	2025-06-19 08:10:52.076	2025-06-19 08:10:57.285

JOB_RESULT

{
  "project_1": {
    "graphName": "snowgraph",
    "nodeCount": 7,
    "nodeMillis": 150,
    "relationshipCount": 5,
    "relationshipMillis": 286,
    "totalMillis": 436
  },
  "wcc_1": {
    "componentCount": 3,
    "componentDistribution": {
      "max": 3,
      "mean": 2.3333333333333335,
      "min": 2,
      "p1": 2,
      "p10": 2,
      "p25": 2,
      "p5": 2,
      "p50": 2,
      "p75": 3,
      "p90": 3,
      "p95": 3,
      "p99": 3,
      "p999": 3
    },
    "computeMillis": 10,
    "configuration": {
      "concurrency": 2,
      "consecutiveIds": false,
      "jobId": "job_5c839fe33a934e7dbf80c891a28ac852",
      "logProgress": true,
      "mutateProperty": "component",
      "nodeLabels": ["*"],
      "relationshipTypes": ["*"],
      "relationshipWeightProperty": "WEIGHT",
      "seedProperty": "COMPONENTID",
      "sudo": false,
      "threshold": 1
    },
    "mutateMillis": 1,
    "nodePropertiesWritten": 7,
    "postProcessingMillis": 23,
    "preProcessingMillis": 16
  },
  "write_node_property_1": {
    "exportMillis": 1863,
    "nodeLabel": "USERS",
    "nodeProperty": "component",
    "outputTable": "EXAMPLE_DB.DATA_SCHEMA.USERS_COMPONENTS",
    "propertiesExported": 7
  }
}

We see that new node Mats, given his connection to Bridget, was put in the same component as her, retaining her old component label 0:

SELECT * FROM EXAMPLE_DB.DATA_SCHEMA.USERS_COMPONENTS;

Table 10. Results
NODEID	COMPONENT
Alice	2
Bridget	0
Charles	2
Doug	1
Mark	1
Mats	0
Michael	1

The result shows that despite not having the seedProperty when it was projected, the node 'Mats' has been assigned to the same component as the node 'Bridget'. This is correct because these two nodes are connected.