6.3. The Weakly Connected Components algorithm

This section describes the Weakly Connected Components (WCC) algorithm in the Neo4j Graph Algorithms library.

This topic includes:

6.3.1. Introduction

The WCC algorithm finds sets of connected nodes in an undirected graph, where all nodes in the same set form a connected component. WCC is often used early in an analysis to understand the structure of a graph.

WCC has previously been known as Union Find or Connected Components in this User Guide. Currently, the WCC algorithm still uses the syntax of algo.unionFind.

For more information on this algorithm, see:

Running this algorithm requires sufficient memory availability. Before running this algorithm, we recommend that you read Section 2.4, “Memory Requirements”.

6.3.2. Syntax

The following describes the API for running the algorithm and writing results back to Neo4j: 

CALL algo.unionFind(label: STRING, relationship: STRING, {
  write: BOOLEAN,
  writeProperty: STRING
  // additional configuration
})
YIELD nodes, setCount, loadMillis, computeMillis, writeMillis

Table 6.22. Parameters
Name Type Default Optional Description

node label

string

null

yes

The node label to load from the graph. If null, load all nodes.

relationship

string

null

yes

The relationship type to load from the graph. If null, load all relationships.

config

map

{}

yes

Additional configuration, see below.

Table 6.23. Configuration
Name Type Default Optional Description

concurrency

int

available CPUs

yes

The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency' and 'writeConcurrency'. This is dependent on the Neo4j edition; for more information, see Section 1.4.2, “CPU”.

readConcurrency

int

value of 'concurrency'

yes

The number of concurrent threads used for reading the graph.

writeConcurrency

int

value of 'concurrency'

yes

The number of concurrent threads used for writing the result.

weightProperty

string

null

yes

The property name that contains weight. If null, treats the graph as unweighted. Must be numeric.

seedProperty

string

n/a

yes

Used to set the initial community for a node. The property value needs to be a number.

write

boolean

true

yes

Specifies if the result should be written back as a node property.

writeProperty

string

'partition'

yes

The property name written back the ID of the partition particular node belongs to.

threshold

float

null

yes

The value of the weight above which the relationship is not thrown away.

defaultValue

float

null

yes

The default value of the weight in case it is missing or invalid.

consecutiveIds

boolean

false

yes

Community identifiers are mapped into a consecutive id space (requires additional memory).

graph

string

'huge'

yes

Use 'huge' when describing the subset of the graph with node label and relationship type parameters. Use 'cypher' for describing the subset using Cypher queries for nodes and relationships.

Table 6.24. Results
Name Type Description

loadMillis

int

Milliseconds for loading data.

computeMillis

int

Milliseconds for running the algorithm.

writeMillis

int

Milliseconds for writing result data back.

postProcessingMillis

int

Milliseconds for computing percentiles and community count.

nodes

int

The number of nodes considered.

communityCount

int

The number of communities found.

p1

double

The 1 percentile of community size.

p5

double

The 5 percentile of community size.

p10

double

The 10 percentile of community size.

p25

double

The 25 percentile of community size.

p50

double

The 50 percentile of community size.

p75

double

The 75 percentile of community size.

p90

double

The 90 percentile of community size.

p95

double

The 95 percentile of community size.

p99

double

The 99 percentile of community size.

p100

double

The 100 percentile of community size.

write

boolean

Specifies if the result was written back as a node property.

writeProperty

string

The property name written back to.

The following describes the API for running the algorithm and stream results: 

CALL algo.unionFind.stream(label: STRING, relationship: STRING, {
  // configuration
})
YIELD nodeId, setId

Table 6.25. Parameters
Name Type Default Optional Description

node label

string

null

yes

The node label to load from the graph. If null, load all nodes.

relationship type

string

null

yes

The relationship type to load from the graph. If null, load all relationships.

config

map

{}

yes

Additional configuration, see below.

Table 6.26. Configuration
Name Type Default Optional Description

concurrency

int

available CPUs

yes

The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency'. This is dependent on the Neo4j edition; for more information, see Section 1.4.2, “CPU”.

readConcurrency

int

value of 'concurrency'

yes

The number of concurrent threads used for reading the graph.

weightProperty

string

null

yes

The property name that contains weight. If null, treats the graph as unweighted. Must be numeric.

seedProperty

string

n/a

yes

Used to set the initial community for a node. The property value needs to be a number.

threshold

float

null

yes

The value of the weight above which the relationship is not thrown away.

defaultValue

float

null

yes

The default value of the weight in case it is missing or invalid.

consecutiveIds

boolean

false

yes

Community identifiers are mapped into a consecutive id space (requires additional memory).

graph

string

'huge'

yes

Use 'huge' when describing the subset of the graph with node label and relationship type parameters. Use 'cypher' for describing the subset using Cypher queries for nodes and relationships.

Table 6.27. Results
Name Type Description

nodeId

int

Node ID

setId

int

Partition ID

6.3.3. Examples

Consider the graph created by the following Cypher statement:

CREATE (nAlice:User {name: 'Alice'})
CREATE (nBridget:User {name: 'Bridget'})
CREATE (nCharles:User {name: 'Charles'})
CREATE (nDoug:User {name: 'Doug'})
CREATE (nMark:User {name: 'Mark'})
CREATE (nMichael:User {name: 'Michael'})

CREATE (nAlice)-[:LINK {weight: 0.5}]->(nBridget)
CREATE (nAlice)-[:LINK {weight: 4}]->(nCharles)
CREATE (nMark)-[:LINK {weight: 1.1}]->(nDoug)
CREATE (nMark)-[:LINK {weight: 2}]->(nMichael);

This graph has two connected components, each with three nodes. The relationships that connect the nodes in each component have a property weight which determines the strength of the relationship. In the following examples we will demonstrate using the Weakly Connected Components algorithm on this graph.

6.3.3.1. Unweighted

The following will load the graph, run the algorithm, and stream results: 

CALL algo.unionFind.stream('User', 'LINK')
YIELD nodeId, setId
RETURN algo.asNode(nodeId).name AS Name, setId AS ComponentId
ORDER BY ComponentId, Name

Table 6.28. Results
Name ComponentId

"Alice"

0

"Bridget"

0

"Charles"

0

"Doug"

3

"Mark"

3

"Michael"

3

To instead write the component ID to a node property in the Neo4j graph, use this query:

The following will load the graph with weights, run the algorithm, and write back results: 

CALL algo.unionFind('User', 'LINK', {
  write: true,
  writeProperty: 'componentId'
})
YIELD nodes AS Nodes, setCount AS NbrOfComponents, writeProperty AS PropertyName;

Table 6.29. Results
Nodes NbrOfComponents PropertyName

6

2

"componentId"

As we can see from the results, the nodes connected to one another are calculated by the algorithm as belonging to the same connected component.

6.3.3.2. Weighted

By configuring the algorithm to use a weight we can increase granularity in the way the algorithm calculates component assignment. We do this by specifying the property key with the weightProperty configuration parameter. Additionally, we can specify a threshold for the weight value in such a way that only weighs greater than the threshold value will be considered by the algorithm. We do this by specifying the threshold value with the threshold configuration parameter.

If a relationship does not have a weight property, a default weight is used. The default is zero, and can be configured to another value using the defaultValue configuration parameter.

The following will load the graph with weights, run the algorithm, and stream results: 

CALL algo.unionFind.stream('User', 'LINK', {
  weightProperty: 'weight',
  threshold: 1.0
})
YIELD nodeId, setId

RETURN algo.asNode(nodeId).name AS Name, setId AS ComponentId
ORDER BY ComponentId, Name

Table 6.30. Results
Name ComponentId

"Alice"

0

"Charles"

0

"Bridget"

1

"Doug"

3

"Mark"

3

"Michael"

3

The following will load the graph with weights, run the algorithm, and write back results: 

CALL algo.unionFind('User', 'LINK', {
  write: true,
  writeProperty: "componentId",
  weightProperty: 'weight',
  threshold: 1.0
})
YIELD nodes AS Nodes, setCount AS NbrOfComponents, writeProperty AS PropertyName;

Table 6.31. Results
Nodes NbrOfComponents PropertyName

6

3

"componentId"

As we can see from the results, the node named 'Bridget' is now in its own component, due to its relationship weight being less than the configured threshold and thus ignored.

6.3.3.3. Seeded components

It is possible to define preliminary component IDs for nodes using the seedProperty configuration parameter. This is helpful if we want to retain components from a previous run and it is known that no components have been split by removing relationships. The property value needs to be a number.

The algorithm first checks if there is a seeded component ID assigned to the node. If there is one, that component ID is used. Otherwise, a new unique component ID is assigned to the node.

Once every node belongs to a component, the algorithm merges components of connected nodes. When components are merged, the resulting component is always the one with the lower component ID.

The algorithm assumes that nodes with the same seed value do in fact belong to the same component. If any two nodes in different components have the same seed, behavior is undefined. It is then recommended to run WCC without seeds.

To show this in practice, we will run the algorithm, then add another node to our graph, then run the algorithm again with the seedProperty configuration parameter. We will use the weighted variant of WCC.

The following will load the graph, run the algorithm, and write back results: 

CALL algo.unionFind('User', 'LINK', {
  write: true,
  writeProperty: 'componentId',
  weightProperty: 'weight',
  threshold: 1.0
})
YIELD nodes AS Nodes, setCount AS NbrOfComponents, writeProperty AS PropertyName;

Table 6.32. Results
Nodes NbrOfComponents PropertyName

6

3

"componentId"

The following will create a new node in the Neo4j graph, with no component ID: 

MATCH (b:User {name: 'Bridget'})
CREATE (b)-[:LINK {weight: 2.0}]->(new:User {name: 'Mats'})

Table 6.33. Results

No data returned.

The following will load the graph, run the algorithm, and stream results: 

CALL algo.unionFind.stream('User', 'LINK', {
  seedProperty: 'componentId',
  weightProperty: 'weight',
  threshold: 1.0
})
YIELD nodeId, setId

RETURN algo.asNode(nodeId).name AS Name, setId AS ComponentId
ORDER BY ComponentId, Name

Table 6.34. Results
Name ComponentId

"Alice"

0

"Charles"

0

"Bridget"

1

"Mats"

1

"Doug"

3

"Mark"

3

"Michael"

3

The following will load the graph, run the algorithm, and write back results: 

CALL algo.unionFind('User', 'LINK', {
  seedProperty: 'componentId',
  weightProperty: 'weight',
  threshold: 1.0,
  write: true,
  writeProperty: 'componentId'
})
YIELD nodes AS Nodes, setCount AS NbrOfComponents, writeProperty AS PropertyName;

Table 6.35. Results
Nodes NbrOfComponents PropertyName

7

3

"componentId"

If the seedProperty configuration parameter has the same value as writeProperty, the algorithm only writes properties for nodes where the component ID has changed. If they differ, the algorithm writes properties for all nodes.

6.3.3.4. Named graphs and Cypher projections

In the examples above, we have relied on the implicit loading of graphs for the algorithm computation. However, like other algorithms WCC also accepts named graphs and Cypher projections as inputs. See Projected Graph Model for more details.

Using a named graph: 

CALL algo.graph.load('myGraph', 'User', 'LINK');

CALL algo.unionFind.stream(null, null, {graph: 'myGraph'})
YIELD nodeId, setId
RETURN algo.asNode(nodeId).name AS Name, setId AS ComponentId
ORDER BY ComponentId, Name;

Table 6.36. Results
Name ComponentId

"Alice"

0

"Bridget"

0

"Charles"

0

"Doug"

3

"Mark"

3

"Michael"

3

As we can see, the results are identical to the results in the Section 6.3.3.1, “Unweighted” example.

Using a Cypher projection: 

CALL algo.unionFind.stream(
  'MATCH (u:User) RETURN id(u) AS id',
  'MATCH (u1:User)-[:LINK]->(u2:User)
   RETURN id(u1) AS source, id(u2) AS target',
   {graph:'cypher'}
)
YIELD nodeId, setId
RETURN algo.asNode(nodeId).name AS Name, setId AS ComponentId
ORDER BY ComponentId, Name

Table 6.37. Results
Name ComponentId

"Alice"

0

"Bridget"

0

"Charles"

0

"Doug"

3

"Mark"

3

"Michael"

3

Again, results are identical, as the Cypher projection we use mimics the behaviour of the default loading configuration. Of course, the Cypher projection feature enables more advanced control over which exact parts of the graph to compute over; please see Section 2.2, “Cypher projection” for more details.