5.3.3. Weakly Connected Components

This section describes the Weakly Connected Components (WCC) algorithm in the Neo4j Graph Data Science library.

This topic includes:

5.3.3.1. Introduction

The WCC algorithm finds sets of connected nodes in an undirected graph, where all nodes in the same set form a connected component. WCC is often used early in an analysis to understand the structure of a graph.

WCC has previously been known as Union Find or Connected Components in this User Guide.

For more information on this algorithm, see:

Running this algorithm requires sufficient memory availability. Before running this algorithm, we recommend that you read Section 3.1, “Memory Estimation”.

5.3.3.2. Syntax

Write mode

Run WCC in write mode on a graph stored in the catalog. 

CALL gds.wcc.write(
  graphName: String,
  configuration: Map
)
YIELD
  // general write return columns
  componentCount: Integer,
  componentDistribution: Map

Table 5.118. Parameters
Name Type Default Optional Description

graphName

String

n/a

no

The name of a graph stored in the catalog.

configuration

Map

{}

yes

Configuration for algorithm-specifics and/or graph filtering.

Table 5.119. General configuration for algorithm execution on a named graph.
Name Type Default Optional Description

nodeLabels

String[]

['*']

yes

Filter the named graph using the given node labels.

relationshipTypes

String[]

['*']

yes

Filter the named graph using the given relationship types.

concurrency

Integer

4

yes

The number of concurrent threads used for running the algorithm. Also provides the default value for 'writeConcurrency'.

writeConcurrency

Integer

value of 'concurrency'

yes

WRITE mode only: The number of concurrent threads used for writing the result.

writeProperty

String

n/a

no

WRITE mode only: The node property to which the component ID is written to.

Run WCC in write mode on an anonymous graph. 

CALL gds.wcc.write(configuration: Map)
YIELD
  // general write return columns
  componentCount: Integer,
  componentDistribution: Map

Table 5.120. Parameters
Name Type Default Optional Description

configuration

Map

null

no

Configuration for anonymous graph creation and algorithm-specifics.

Table 5.121. General configuration for algorithm execution on an anonymous graph.
Name Type Default Optional Description

nodeProjection

String, String[] or Map

null

yes

The node projection used for anonymous graph creation via a Native projection.

relationshipProjection

String, String[] or Map

null

yes

The relationship projection used for anonymous graph creation a Native projection.

nodeQuery

String

null

yes

The Cypher query used to select the nodes for anonymous graph creation via a Cypher projection.

relationshipQuery

String

null

yes

The Cypher query used to select the relationships for anonymous graph creation via a Cypher projection.

nodeProperties

String, String[] or Map

null

yes

The node properties to project during anonymous graph creation.

relationshipProperties

String, String[] or Map

null

yes

The relationship properties to project during anonymous graph creation.

concurrency

Integer

4

yes

The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency' and 'writeConcurrency'.

readConcurrency

Integer

value of 'concurrency'

yes

The number of concurrent threads used for creating the graph.

writeConcurrency

Integer

value of 'concurrency'

yes

WRITE mode only: The number of concurrent threads used for writing the result.

writeProperty

String

n/a

no

WRITE mode only: The node property to which the component ID is written to.

Table 5.122. Algorithm specific configuration
Name Type Default Optional Description

relationshipWeightProperty

String

null

yes

The relationship property that contains the weight. If null, the graph is treated as unweighted. Must be numeric.

defaultValue

Float

null

yes

The default value of the relationship weight in case it is missing or invalid.

seedProperty

String

n/a

yes

Used to set the initial component for a node. The property value needs to be a number.

threshold

Float

null

yes

The value of the weight above which the relationship is considered in the computation.

consecutiveIds

Boolean

false

yes

Flag to decide whether component identifiers are mapped into a consecutive id space (requires additional memory).

Table 5.123. Results
Name Type Description

createMillis

Integer

Milliseconds for loading data.

computeMillis

Integer

Milliseconds for running the algorithm.

writeMillis

Integer

Milliseconds for writing result data back.

postProcessingMillis

Integer

Milliseconds for computing component count and distribution statistics.

nodePropertiesWritten

Integer

The number of node properties written.

relationshipPropertiesWritten

Integer

The number of relationship properties written.

componentCount

Integer

The number of computed components.

componentDistribution

Map

Map containing min, max, mean as well as p50, p75, p90, p95, p99 and p999 percentile values of component sizes.

configuration

Map

The configuration used for running the algorithm.

Mutate mode

Run WCC in mutate mode on a graph stored in the catalog. 

CALL gds.wcc.mutate(
  graphName: String,
  configuration: Map
)
YIELD
  // general mutate return columns
  componentCount: Integer,
  componentDistribution: Map

The configuration for the mutate mode is similar to the write mode. Instead of specifying a writeProperty, we need to specify a mutateProperty. Also, specifying writeConcurrency is not possible in mutate mode.

The following will run the algorithm and store the results in myGraph

CALL gds.wcc.mutate('myGraph', { mutateProperty: 'componentId' })

Stream mode

Run WCC in stream mode on a graph stored in the catalog. 

CALL gds.wcc.stream(
  graphName: String,
  configuration: Map
)
YIELD
  nodeId: Integer,
  componentId: Integer

Table 5.124. Parameters
Name Type Default Optional Description

graphName

String

n/a

no

The name of a graph stored in the catalog.

configuration

Map

{}

yes

Configuration for algorithm-specifics and/or graph filtering.

Table 5.125. General configuration for algorithm execution on a named graph.
Name Type Default Optional Description

nodeLabels

String[]

['*']

yes

Filter the named graph using the given node labels.

relationshipTypes

String[]

['*']

yes

Filter the named graph using the given relationship types.

concurrency

Integer

4

yes

The number of concurrent threads used for running the algorithm. Also provides the default value for 'writeConcurrency'.

writeConcurrency

Integer

value of 'concurrency'

yes

WRITE mode only: The number of concurrent threads used for writing the result.

writeProperty

String

n/a

no

WRITE mode only: The node property to which the component ID is written to.

Run WCC in stream mode on an anonymous graph. 

CALL gds.wcc.stream(configuration: Map)
YIELD
  nodeId: Integer,
  componentId: Integer

Table 5.126. Parameters
Name Type Default Optional Description

configuration

Map

null

no

Configuration for anonymous graph creation and algorithm-specifics.

Table 5.127. General configuration for algorithm execution on an anonymous graph.
Name Type Default Optional Description

nodeProjection

String, String[] or Map

null

yes

The node projection used for anonymous graph creation via a Native projection.

relationshipProjection

String, String[] or Map

null

yes

The relationship projection used for anonymous graph creation a Native projection.

nodeQuery

String

null

yes

The Cypher query used to select the nodes for anonymous graph creation via a Cypher projection.

relationshipQuery

String

null

yes

The Cypher query used to select the relationships for anonymous graph creation via a Cypher projection.

nodeProperties

String, String[] or Map

null

yes

The node properties to project during anonymous graph creation.

relationshipProperties

String, String[] or Map

null

yes

The relationship properties to project during anonymous graph creation.

concurrency

Integer

4

yes

The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency' and 'writeConcurrency'.

readConcurrency

Integer

value of 'concurrency'

yes

The number of concurrent threads used for creating the graph.

writeConcurrency

Integer

value of 'concurrency'

yes

WRITE mode only: The number of concurrent threads used for writing the result.

writeProperty

String

n/a

no

WRITE mode only: The node property to which the component ID is written to.

Table 5.128. Algorithm specific configuration
Name Type Default Optional Description

relationshipWeightProperty

String

null

yes

The relationship property that contains the weight. If null, the graph is treated as unweighted. Must be numeric.

defaultValue

Float

null

yes

The default value of the relationship weight in case it is missing or invalid.

seedProperty

String

n/a

yes

Used to set the initial component for a node. The property value needs to be a number.

threshold

Float

null

yes

The value of the weight above which the relationship is considered in the computation.

consecutiveIds

Boolean

false

yes

Flag to decide whether component identifiers are mapped into a consecutive id space (requires additional memory).

Table 5.129. Results
Name Type Description

nodeId

Integer

The Neo4j node ID.

componentId

Integer

The component ID.

Stats mode

Run WCC in stats mode on a named graph. 

CALL gds.wcc.stats(
  graphName: String,
  configuration: Map
)
YIELD
  computeMillis: Integer,
  postProcessingMillis: Integer,
  componentCount: Integer,
  componentDistribution: Map,

Table 5.130. Parameters
Name Type Default Optional Description

graphName

String

n/a

no

The name of a graph stored in the catalog.

configuration

Map

{}

yes

Configuration for algorithm-specifics and/or graph filtering.

Run WCC in stats mode on an anonymous graph. 

CALL gds.wcc.stats(configuration: Map)
YIELD
  createMillis: Integer,
  computeMillis: Integer,
  postProcessingMillis: Integer,
  componentCount: Integer,
  componentDistribution: Map,

Table 5.131. Parameters
Name Type Default Optional Description

configuration

Map

null

no

Configuration for anonymous graph creation and algorithm-specifics.

The configuration is the same as for the write mode.

Table 5.132. Results
Name Type Description

createMillis

Integer

Milliseconds for loading data.

computeMillis

Integer

Milliseconds for running the algorithm.

postProcessingMillis

Integer

Milliseconds for computing component count and distribution statistics.

componentCount

Integer

The number of computed components.

componentDistribution

Map

Map containing min, max, mean as well as p50, p75, p90, p95, p99 and p999 percentile values of component sizes.

configuration

Map

The configuration used for running the algorithm.

Estimate mode

The following will estimate the memory requirements for running the algorithm. The mode can be substituted with the available modes (stream, write and stats).

Run WCC in estimate mode on a named graph. 

CALL gds.wcc.<mode>.estimate(
  graphName: String,
  configuration: Map
)

Table 5.133. Parameters
Name Type Default Optional Description

graphName

String

n/a

no

The name of a graph stored in the catalog.

configuration

Map

{}

yes

Configuration for algorithm-specifics and/or graph filtering.

Run WCC in estimate mode on an anonymous graph. 

CALL gds.wcc.<mode>.estimate(configuration: Map)

Table 5.134. Parameters
Name Type Default Optional Description

configuration

Map

null

no

Configuration for anonymous graph creation and algorithm-specifics.

For memory estimation on a named graph the configuration for creating that graph is used. For the anonymous graph, configuration is used as for Section 4.2, “Native projection” or Section 4.3, “Cypher projection”. The graph estimation on an anonymous graph is based on the configuration pertaining to anonymous creation or so-called fictive estimation controlled by the options in the table below.

Table 5.135. Configuration
Name Type Default Optional Description

nodeCount

Integer

-1

yes

The number of nodes in a fictive graph.

relationshipCount

Integer

-1

yes

The number of relationships in a fictive graph.

Setting the nodeCount and relationshipCount parameters results in fictive graph estimation which allows a memory estimation without loading the graph. Additionally algorithm specific parameters can also be provided as config which influence the estimation of memory usage specific to the algorithm.

Table 5.136. Memory estimation results.
Name Type Description

nodeCount

Integer

Node count of the graph used in the estimation.

relationshipCount

Integer

Relationship count of the graph used in the estimation.

requiredMemory

Integer

Human readable version for required memory.

bytesMin

Integer

Minimum number of bytes to be consumed.

bytesMax

Integer

Maximum number of bytes to be consumed.

heapPercentageMin

Float

The minimum percentage of the configured max heap (-Xmx) to be consumed.

heapPercentageMax

Float

The maximum percentage of the configured max heap (-Xmx) to be consumed.

treeView

Map

Human readable version of memory estimation.

mapView

Map

Detailed information on memory consumption.

5.3.3.3. Examples

Consider the graph created by the following Cypher statement:

CREATE (nAlice:User {name: 'Alice'})
CREATE (nBridget:User {name: 'Bridget'})
CREATE (nCharles:User {name: 'Charles'})
CREATE (nDoug:User {name: 'Doug'})
CREATE (nMark:User {name: 'Mark'})
CREATE (nMichael:User {name: 'Michael'})

CREATE (nAlice)-[:LINK {weight: 0.5}]->(nBridget)
CREATE (nAlice)-[:LINK {weight: 4}]->(nCharles)
CREATE (nMark)-[:LINK {weight: 1.1}]->(nDoug)
CREATE (nMark)-[:LINK {weight: 2}]->(nMichael);

This graph has two connected components, each with three nodes. The relationships that connect the nodes in each component have a property weight which determines the strength of the relationship. In the following examples we will demonstrate using the Weakly Connected Components algorithm on this graph.

We can load this graph into the in-memory catalog.

In the examples below we will use named graphs and standard projections as the norm. However, Cypher projection and anonymous graphs could also be used.

The following statement will create the graph and store it in the graph catalog. 

CALL gds.graph.create(
    'myGraph',
    'User',
    'LINK',
    {
        relationshipProperties: 'weight'
    }
)

In the following examples we will demonstrate using the WCC algorithm on this graph.

Unweighted

The following will run the algorithm and stream results: 

CALL gds.wcc.stream('myGraph')
YIELD nodeId, componentId
RETURN gds.util.asNode(nodeId).name AS name, componentId ORDER BY componentId, name

Table 5.137. Results
name componentId

"Alice"

0

"Bridget"

0

"Charles"

0

"Doug"

3

"Mark"

3

"Michael"

3

To instead write the component ID to a node property in the Neo4j graph, use this query:

The following will run the algorithm and write back results: 

CALL gds.wcc.write('myGraph', { writeProperty: 'componentId' })
YIELD nodePropertiesWritten, componentCount;

Table 5.138. Results
nodePropertiesWritten componentCount

6

2

As we can see from the results, the nodes connected to one another are calculated by the algorithm as belonging to the same connected component.

Weighted

By configuring the algorithm to use a weight we can increase granularity in the way the algorithm calculates component assignment. We do this by specifying the property key with the relationshipWeightProperty configuration parameter. Additionally, we can specify a threshold for the weight value. Then, only weights greater than the threshold value will be considered by the algorithm. We do this by specifying the threshold value with the threshold configuration parameter.

If a relationship does not have a weight property, a default weight is used. The default is zero, and can be configured to another value using the defaultValue configuration parameter.

The following will run the algorithm and stream results: 

CALL gds.wcc.stream('myGraph', { relationshipWeightProperty: 'weight', threshold: 1.0 })
YIELD nodeId, componentId
RETURN gds.util.asNode(nodeId).name AS Name, componentId AS ComponentId ORDER BY ComponentId, Name

Table 5.139. Results
Name ComponentId

"Alice"

0

"Charles"

0

"Bridget"

1

"Doug"

3

"Mark"

3

"Michael"

3

The following will run the algorithm and write back results: 

CALL gds.wcc.write('myGraph', {
    writeProperty: 'componentId',
    relationshipWeightProperty: 'weight',
    threshold: 1.0
})
YIELD nodePropertiesWritten, componentCount;

Table 5.140. Results
nodePropertiesWritten componentCount

6

3

As we can see from the results, the node named 'Bridget' is now in its own component, due to its relationship weight being less than the configured threshold and thus ignored.

Seeded components

It is possible to define preliminary component IDs for nodes using the seedProperty configuration parameter. This is helpful if we want to retain components from a previous run and it is known that no components have been split by removing relationships. The property value needs to be a number.

The algorithm first checks if there is a seeded component ID assigned to the node. If there is one, that component ID is used. Otherwise, a new unique component ID is assigned to the node.

Once every node belongs to a component, the algorithm merges components of connected nodes. When components are merged, the resulting component is always the one with the lower component ID. Note that the consecutiveIds configuration option cannot be used in combination with seeding in order to retain the seeding values.

The algorithm assumes that nodes with the same seed value do in fact belong to the same component. If any two nodes in different components have the same seed, behavior is undefined. It is then recommended to run WCC without seeds.

To show this in practice, we will run the algorithm, then add another node to our graph, then run the algorithm again with the seedProperty configuration parameter. We will use the weighted variant of WCC.

The following will run the algorithm and write back results: 

CALL gds.wcc.write('myGraph', {
    writeProperty: 'componentId',
    relationshipWeightProperty: 'weight',
    threshold: 1.0
})
YIELD nodePropertiesWritten, componentCount;

Table 5.141. Results
nodePropertiesWritten componentCount

6

3

The following will create a new node in the Neo4j graph, with no component ID: 

MATCH (b:User {name: 'Bridget'})
CREATE (b)-[:LINK {weight: 2.0}]->(new:User {name: 'Mats'})

Note, that we can not use our already created graph as it does not contain the component id. We will therefore create a second graph that contains the previously computed component id.

The following will create a new graph containing the previously computed component id: 

CALL gds.graph.create(
    'myGraph-seeded',
    'User',
    'LINK',
    {
        nodeProperties: 'componentId',
        relationshipProperties: 'weight'
    }
)

The following will run the algorithm and stream results: 

CALL gds.wcc.stream('myGraph-seeded', {
    seedProperty: 'componentId',
    relationshipWeightProperty: 'weight',
    threshold: 1.0
})
YIELD nodeId, componentId
RETURN gds.util.asNode(nodeId).name AS name, componentId ORDER BY componentId, name

Table 5.142. Results
name componentId

"Alice"

0

"Charles"

0

"Bridget"

1

"Mats"

1

"Doug"

3

"Mark"

3

"Michael"

3

The following will run the algorithm and write back results: 

CALL gds.wcc.write('myGraph-seeded', {
    seedProperty: 'componentId',
    writeProperty: 'componentId',
    relationshipWeightProperty: 'weight',
    threshold: 1.0
})
YIELD nodePropertiesWritten, componentCount;

Table 5.143. Results
nodePropertiesWritten componentCount

1

3

If the seedProperty configuration parameter has the same value as writeProperty, the algorithm only writes properties for nodes where the component ID has changed. If they differ, the algorithm writes properties for all nodes.

Memory Estimation

The following will estimate the memory requirements for running the algorithm: 

CALL gds.wcc.write.estimate('myGraph', {
  writeProperty: 'communityId'
})
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory

Table 5.144. Results
nodeCount relationshipCount bytesMin bytesMax requiredMemory

6

4

176

176

"176 Bytes"

Stats

The following will run the algorithm and returns the result in form of statistical and measurement values. 

CALL gds.wcc.stats('myGraph')
YIELD componentCount

Table 5.145. Results
componentCount

2