Weakly Connected Components
1. Introduction
The Weakly Connected Components (WCC) algorithm finds sets of connected nodes in directed and undirected graphs.
Two nodes are connected if there exists a path between them.
The set of all nodes that are connected with each other forms a component.
In contrast to Strongly Connected Components (SCC), the direction of relationships on the path between two nodes is not considered.
For example, in a directed graph (a)→(b), a and b will be in the same component, even if there is no directed relationship (b)→(a).
WCC is often used early in an analysis to understand the structure of a graph. Using WCC to understand the graph structure enables running other algorithms independently on an identified cluster.
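As a rough illustration of the definition above (not the GDS implementation), the following Python sketch groups nodes with a small union-find structure and treats every directed relationship as if it were undirected. The function name weakly_connected_components and the edge-list input format are assumptions made for this example.

```python
def weakly_connected_components(nodes, relationships):
    """Return a dict mapping each node to a representative of its component.

    Each (source, target) pair is treated as undirected, which is what
    distinguishes weak from strong connectivity.
    """
    parent = {node: node for node in nodes}

    def find(node):
        # Walk up to the root, halving the path along the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for source, target in relationships:
        root_s, root_t = find(source), find(target)
        if root_s != root_t:
            # Direction is ignored: (a)->(b) joins a and b either way.
            parent[max(root_s, root_t)] = min(root_s, root_t)

    return {node: find(node) for node in nodes}


components = weakly_connected_components(["a", "b", "c"], [("a", "b")])
print(components)  # {'a': 'a', 'b': 'a', 'c': 'c'}
```

Here a and b share a representative although only (a)→(b) exists, while c stays in a component of its own.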
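The implementation of the algorithm is based on the following papers:

Wait-free Parallel Algorithms for the Union-Find Problem

Optimizing Parallel Graph Connectivity Computation via Subgraph Sampling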
2. Syntax
This section covers the syntax used to execute the Weakly Connected Components algorithm in each of its execution modes. We describe the named graph variant of the syntax. To learn more about general syntax variants, see Syntax overview.
CALL gds.wcc.stream(
  graphName: String,
  configuration: Map
)
YIELD
  nodeId: Integer,
  componentId: Integer

| Name | Type | Default | Optional | Description |
| --- | --- | --- | --- | --- |
| graphName | String | | no | The name of a graph stored in the catalog. |
| configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |

| Name | Type | Default | Optional | Description |
| --- | --- | --- | --- | --- |
| | List of String | | yes | Filter the named graph using the given node labels. |
| | List of String | | yes | Filter the named graph using the given relationship types. |
| concurrency | Integer | | yes | The number of concurrent threads used for running the algorithm. |
| | String | | yes | An ID that can be provided to more easily track the algorithm’s progress. |
| | Boolean | | yes | If disabled the progress percentage will not be logged. |
| relationshipWeightProperty | String | | yes | Name of the relationship property to use as weights. If unspecified, the algorithm runs unweighted. |
| seedProperty | String | | yes | Used to set the initial component for a node. The property value needs to be a number. |
| threshold | Float | | yes | The value of the weight above which the relationship is considered in the computation. |
| consecutiveIds | Boolean | | yes | Flag to decide whether component identifiers are mapped into a consecutive id space (requires additional memory). |
| minComponentSize | Integer | | yes | Only nodes inside components whose size is greater than or equal to the given value are returned. |

| Name | Type | Description |
| --- | --- | --- |
| nodeId | Integer | Node ID. |
| componentId | Integer | Component ID. |
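To make the effect of minComponentSize on streamed rows more concrete, here is a minimal Python sketch of the filtering rule as described in the table above. It is only an illustration: the function name and the sample rows are hypothetical, and the procedure itself performs this filtering internally.

```python
from collections import Counter


def filter_by_min_component_size(component_of, min_component_size):
    """Keep only rows whose component has at least min_component_size nodes."""
    sizes = Counter(component_of.values())
    return {
        node_id: component_id
        for node_id, component_id in component_of.items()
        if sizes[component_id] >= min_component_size
    }


# Hypothetical stream output: nodeId -> componentId.
rows = {0: 0, 1: 0, 2: 0, 3: 3, 4: 3, 5: 5}
print(filter_by_min_component_size(rows, 2))
# {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}
```

Node 5 is dropped because its component contains a single node, which is below the requested minimum of 2.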
CALL gds.wcc.stats(
  graphName: String,
  configuration: Map
)
YIELD
  componentCount: Integer,
  preProcessingMillis: Integer,
  computeMillis: Integer,
  postProcessingMillis: Integer,
  componentDistribution: Map,
  configuration: Map

| Name | Type | Default | Optional | Description |
| --- | --- | --- | --- | --- |
| graphName | String | | no | The name of a graph stored in the catalog. |
| configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |

| Name | Type | Default | Optional | Description |
| --- | --- | --- | --- | --- |
| | List of String | | yes | Filter the named graph using the given node labels. |
| | List of String | | yes | Filter the named graph using the given relationship types. |
| concurrency | Integer | | yes | The number of concurrent threads used for running the algorithm. |
| | String | | yes | An ID that can be provided to more easily track the algorithm’s progress. |
| | Boolean | | yes | If disabled the progress percentage will not be logged. |
| relationshipWeightProperty | String | | yes | Name of the relationship property to use as weights. If unspecified, the algorithm runs unweighted. |
| seedProperty | String | | yes | Used to set the initial component for a node. The property value needs to be a number. |
| threshold | Float | | yes | The value of the weight above which the relationship is considered in the computation. |
| consecutiveIds | Boolean | | yes | Flag to decide whether component identifiers are mapped into a consecutive id space (requires additional memory). |

| Name | Type | Description |
| --- | --- | --- |
| componentCount | Integer | The number of computed components. |
| preProcessingMillis | Integer | Milliseconds for preprocessing the data. |
| computeMillis | Integer | Milliseconds for running the algorithm. |
| postProcessingMillis | Integer | Milliseconds for computing component count and distribution statistics. |
| componentDistribution | Map | Map containing min, max, mean as well as p50, p75, p90, p95, p99 and p999 percentile values of component sizes. |
| configuration | Map | The configuration used for running the algorithm. |
CALL gds.wcc.mutate(
  graphName: String,
  configuration: Map
)
YIELD
  componentCount: Integer,
  nodePropertiesWritten: Integer,
  preProcessingMillis: Integer,
  computeMillis: Integer,
  mutateMillis: Integer,
  postProcessingMillis: Integer,
  componentDistribution: Map,
  configuration: Map

| Name | Type | Default | Optional | Description |
| --- | --- | --- | --- | --- |
| graphName | String | | no | The name of a graph stored in the catalog. |
| configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |

| Name | Type | Default | Optional | Description |
| --- | --- | --- | --- | --- |
| mutateProperty | String | | no | The node property in the GDS graph to which the component ID is written. |
| | List of String | | yes | Filter the named graph using the given node labels. |
| | List of String | | yes | Filter the named graph using the given relationship types. |
| concurrency | Integer | | yes | The number of concurrent threads used for running the algorithm. |
| | String | | yes | An ID that can be provided to more easily track the algorithm’s progress. |
| relationshipWeightProperty | String | | yes | Name of the relationship property to use as weights. If unspecified, the algorithm runs unweighted. |
| seedProperty | String | | yes | Used to set the initial component for a node. The property value needs to be a number. |
| threshold | Float | | yes | The value of the weight above which the relationship is considered in the computation. |
| consecutiveIds | Boolean | | yes | Flag to decide whether component identifiers are mapped into a consecutive id space (requires additional memory). |

| Name | Type | Description |
| --- | --- | --- |
| componentCount | Integer | The number of computed components. |
| nodePropertiesWritten | Integer | The number of node properties written. |
| preProcessingMillis | Integer | Milliseconds for preprocessing the data. |
| computeMillis | Integer | Milliseconds for running the algorithm. |
| mutateMillis | Integer | Milliseconds for adding properties to the projected graph. |
| postProcessingMillis | Integer | Milliseconds for computing component count and distribution statistics. |
| componentDistribution | Map | Map containing min, max, mean as well as p50, p75, p90, p95, p99 and p999 percentile values of component sizes. |
| configuration | Map | The configuration used for running the algorithm. |
CALL gds.wcc.write(
  graphName: String,
  configuration: Map
)
YIELD
  componentCount: Integer,
  nodePropertiesWritten: Integer,
  preProcessingMillis: Integer,
  computeMillis: Integer,
  writeMillis: Integer,
  postProcessingMillis: Integer,
  componentDistribution: Map,
  configuration: Map

| Name | Type | Default | Optional | Description |
| --- | --- | --- | --- | --- |
| graphName | String | | no | The name of a graph stored in the catalog. |
| configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |

| Name | Type | Default | Optional | Description |
| --- | --- | --- | --- | --- |
| | List of String | | yes | Filter the named graph using the given node labels. |
| | List of String | | yes | Filter the named graph using the given relationship types. |
| concurrency | Integer | | yes | The number of concurrent threads used for running the algorithm. |
| | String | | yes | An ID that can be provided to more easily track the algorithm’s progress. |
| | Boolean | | yes | If disabled the progress percentage will not be logged. |
| | Integer | | yes | The number of concurrent threads used for writing the result to Neo4j. |
| writeProperty | String | | no | The node property in the Neo4j database to which the component ID is written. |
| relationshipWeightProperty | String | | yes | Name of the relationship property to use as weights. If unspecified, the algorithm runs unweighted. |
| seedProperty | String | | yes | Used to set the initial component for a node. The property value needs to be a number. |
| threshold | Float | | yes | The value of the weight above which the relationship is considered in the computation. |
| consecutiveIds | Boolean | | yes | Flag to decide whether component identifiers are mapped into a consecutive id space (requires additional memory). |
| minComponentSize | Integer | | yes | Only nodes inside components whose size is greater than or equal to the given value will be written to the underlying Neo4j database. |

| Name | Type | Description |
| --- | --- | --- |
| componentCount | Integer | The number of computed components. |
| nodePropertiesWritten | Integer | The number of node properties written. |
| preProcessingMillis | Integer | Milliseconds for preprocessing the data. |
| computeMillis | Integer | Milliseconds for running the algorithm. |
| writeMillis | Integer | Milliseconds for writing result back to Neo4j. |
| postProcessingMillis | Integer | Milliseconds for computing component count and distribution statistics. |
| componentDistribution | Map | Map containing min, max, mean as well as p50, p75, p90, p95, p99 and p999 percentile values of component sizes. |
| configuration | Map | The configuration used for running the algorithm. |
3. Examples
In this section we will show examples of running the Weakly Connected Components algorithm on a concrete graph. The intention is to illustrate what the results look like and to provide a guide on how to make use of the algorithm in a real setting. We will do this on a small user network graph of a handful of nodes connected in a particular pattern. The example graph looks like this:
CREATE
  (nAlice:User {name: 'Alice'}),
  (nBridget:User {name: 'Bridget'}),
  (nCharles:User {name: 'Charles'}),
  (nDoug:User {name: 'Doug'}),
  (nMark:User {name: 'Mark'}),
  (nMichael:User {name: 'Michael'}),
  (nAlice)-[:LINK {weight: 0.5}]->(nBridget),
  (nAlice)-[:LINK {weight: 4}]->(nCharles),
  (nMark)-[:LINK {weight: 1.1}]->(nDoug),
  (nMark)-[:LINK {weight: 2}]->(nMichael);
This graph has two connected components, each with three nodes.
The relationships that connect the nodes in each component have a property weight which determines the strength of the relationship.
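The claim that the example graph has two components of three nodes each can be checked by hand, or with a short Python sketch such as the one below. It uses a plain breadth-first search over an undirected view of the relationships; the constant and function names are hypothetical, and this is not how GDS computes the result.

```python
from collections import defaultdict, deque

# The example graph as (source, target, weight) triples.
LINKS = [
    ("Alice", "Bridget", 0.5),
    ("Alice", "Charles", 4),
    ("Mark", "Doug", 1.1),
    ("Mark", "Michael", 2),
]
USERS = ["Alice", "Bridget", "Charles", "Doug", "Mark", "Michael"]


def component_members(users, links):
    """Group users into weakly connected components with a breadth-first search."""
    neighbours = defaultdict(set)
    for source, target, _weight in links:
        # Add both directions so that relationship direction is ignored.
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen, components = set(), []
    for user in users:
        if user in seen:
            continue
        seen.add(user)
        queue, members = deque([user]), []
        while queue:
            current = queue.popleft()
            members.append(current)
            for neighbour in neighbours[current] - seen:
                seen.add(neighbour)
                queue.append(neighbour)
        components.append(sorted(members))
    return components


print(component_members(USERS, LINKS))
# [['Alice', 'Bridget', 'Charles'], ['Doug', 'Mark', 'Michael']]
```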
In the examples below we will use named graphs and native projections as the norm. However, Cypher projections can also be used. 
CALL gds.graph.project(
  'myGraph',
  'User',
  'LINK',
  {
    relationshipProperties: 'weight'
  }
)
In the following examples we will demonstrate using the Weakly Connected Components algorithm on this graph.
3.1. Memory Estimation
First off, we will estimate the cost of running the algorithm using the estimate procedure.
This can be done with any execution mode.
We will use the write mode in this example.
Estimating the algorithm is useful to understand the memory impact that running the algorithm on your graph will have.
When you later actually run the algorithm in one of the execution modes, the system will perform an estimation.
If the estimation shows that there is a very high probability of the execution going over its memory limitations, the execution is prohibited.
To read more about this, see Automatic estimation and execution blocking.
For more details on estimate in general, see Memory Estimation.
CALL gds.wcc.write.estimate('myGraph', { writeProperty: 'component' })
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory

| nodeCount | relationshipCount | bytesMin | bytesMax | requiredMemory |
| --- | --- | --- | --- | --- |
| 6 | 4 | 112 | 112 | "112 Bytes" |

3.2. Stream
In the stream execution mode, the algorithm returns the component ID for each node.
This allows us to inspect the results directly or post-process them in Cypher without any side effects.
For example, we can order the results to see the nodes that belong to the same component displayed next to each other.
For more details on the stream mode in general, see Stream.
CALL gds.wcc.stream('myGraph')
YIELD nodeId, componentId
RETURN gds.util.asNode(nodeId).name AS name, componentId
ORDER BY componentId, name

| name | componentId |
| --- | --- |
| "Alice" | 0 |
| "Bridget" | 0 |
| "Charles" | 0 |
| "Doug" | 3 |
| "Mark" | 3 |
| "Michael" | 3 |

The result shows that the algorithm identifies two components. This can be verified in the example graph.
The default behaviour of the algorithm is to run unweighted, i.e. without using relationship weights.
The weighted option will be demonstrated in Weighted.
3.3. Stats
In the stats execution mode, the algorithm returns a single row containing a summary of the algorithm result.
This execution mode does not have any side effects.
It can be useful for evaluating algorithm performance by inspecting the computeMillis return item.
In the examples below we will omit returning the timings.
The full signature of the procedure can be found in the syntax section.
For more details on the stats mode in general, see Stats.
The following runs the algorithm in stats mode:

CALL gds.wcc.stats('myGraph')
YIELD componentCount

| componentCount |
| --- |
| 2 |

The result shows that myGraph has two components and this can be verified by looking at the example graph.
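The stats mode can also yield componentDistribution, a map of summary statistics over component sizes. The Python sketch below shows one way such a summary could be derived from component IDs. It is an illustration only: the function name is hypothetical, only a subset of the percentiles is computed, and the nearest-rank percentile method is an assumption, since the source does not state how GDS computes percentiles.

```python
import math
from collections import Counter


def component_distribution(component_ids):
    """Summarise component sizes; percentiles use the nearest-rank method."""
    sizes = sorted(Counter(component_ids).values())

    def percentile(p):
        # Nearest-rank (an assumption): smallest size with at least
        # p percent of all sizes at or below it.
        rank = max(1, math.ceil(p / 100 * len(sizes)))
        return sizes[rank - 1]

    return {
        "min": sizes[0],
        "max": sizes[-1],
        "mean": sum(sizes) / len(sizes),
        "p50": percentile(50),
        "p90": percentile(90),
        "p99": percentile(99),
    }


# Component IDs from the stream example: two components of three nodes each.
print(component_distribution([0, 0, 0, 3, 3, 3]))
# {'min': 3, 'max': 3, 'mean': 3.0, 'p50': 3, 'p90': 3, 'p99': 3}
```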
3.4. Mutate
The mutate execution mode extends the stats mode with an important side effect: updating the named graph with a new node property containing the component ID for that node.
The name of the new property is specified using the mandatory configuration parameter mutateProperty.
The result is a single summary row, similar to stats, but with some additional metrics.
The mutate mode is especially useful when multiple algorithms are used in conjunction.
For more details on the mutate mode in general, see Mutate.
The following runs the algorithm in mutate mode:

CALL gds.wcc.mutate('myGraph', { mutateProperty: 'componentId' })
YIELD nodePropertiesWritten, componentCount;

| nodePropertiesWritten | componentCount |
| --- | --- |
| 6 | 2 |

3.5. Write
The write execution mode extends the stats mode with an important side effect: writing the component ID for each node as a property to the Neo4j database.
The name of the new property is specified using the mandatory configuration parameter writeProperty.
The result is a single summary row, similar to stats, but with some additional metrics.
The write mode enables directly persisting the results to the database.
For more details on the write mode in general, see Write.
The following runs the algorithm in write mode:

CALL gds.wcc.write('myGraph', { writeProperty: 'componentId' })
YIELD nodePropertiesWritten, componentCount;

| nodePropertiesWritten | componentCount |
| --- | --- |
| 6 | 2 |

As we can see from the results, the nodes connected to one another are calculated by the algorithm as belonging to the same connected component.
3.6. Weighted
By configuring the algorithm to use a weight we can increase granularity in the way the algorithm calculates component assignment.
We do this by specifying the property key with the relationshipWeightProperty configuration parameter.
Additionally, we can specify a threshold for the weight value.
Then, only weights greater than the threshold value will be considered by the algorithm.
We do this by specifying the threshold value with the threshold configuration parameter.
If a relationship does not have the specified weight property, the algorithm falls back to using a default value of zero.

CALL gds.wcc.stream('myGraph', {
  relationshipWeightProperty: 'weight',
  threshold: 1.0
}) YIELD nodeId, componentId
RETURN gds.util.asNode(nodeId).name AS Name, componentId AS ComponentId
ORDER BY ComponentId, Name
Name  ComponentId 













As we can see from the results, the node named 'Bridget' is now in its own component, due to its relationship weight being less than the configured threshold and thus ignored.
We are using stream mode to illustrate running the algorithm as weighted or unweighted. All the other algorithm modes also support this configuration parameter.
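The thresholding rule described in this section can be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not the GDS code: the function name is hypothetical, a missing weight is modelled as None, and a relationship is used only if its weight is strictly greater than the threshold, as the text above describes.

```python
def weighted_components(nodes, links, threshold, default_weight=0.0):
    """Union-find over links whose weight is strictly greater than threshold."""
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for source, target, weight in links:
        # A missing weight (None here) falls back to the default of zero.
        effective = default_weight if weight is None else weight
        if effective > threshold:
            root_s, root_t = find(source), find(target)
            if root_s != root_t:
                parent[max(root_s, root_t)] = min(root_s, root_t)

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in groups.values())


users = ["Alice", "Bridget", "Charles", "Doug", "Mark", "Michael"]
links = [
    ("Alice", "Bridget", 0.5),
    ("Alice", "Charles", 4),
    ("Mark", "Doug", 1.1),
    ("Mark", "Michael", 2),
]
print(weighted_components(users, links, threshold=1.0))
# [['Alice', 'Charles'], ['Bridget'], ['Doug', 'Mark', 'Michael']]
```

With a threshold of 1.0 the relationship between Alice and Bridget (weight 0.5) is skipped, so Bridget ends up alone, which agrees with the observation above.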
3.7. Seeded components
It is possible to define preliminary component IDs for nodes using the seedProperty configuration parameter.
This is helpful if we want to retain components from a previous run and it is known that no components have been split by removing relationships.
The property value needs to be a number.
The algorithm first checks if there is a seeded component ID assigned to the node. If there is one, that component ID is used. Otherwise, a new unique component ID is assigned to the node.
Once every node belongs to a component, the algorithm merges components of connected nodes.
When components are merged, the resulting component is always the one with the lower component ID.
Note that the consecutiveIds configuration option cannot be used in combination with seeding, because the seeding values have to be retained.
The algorithm assumes that nodes with the same seed value do in fact belong to the same component. If any two nodes in different components have the same seed, behavior is undefined. In that case, we recommend running WCC without seeds.
To demonstrate this in practice, we will go through a few steps:

1. We will run the algorithm and write the results to Neo4j.
2. Then we will add another node to our graph; this node will not have the property computed in Step 1.
3. We will project a new graph that has the result from Step 1 as nodeProperty.
4. And then we will run the algorithm again, this time in stream mode, and we will use the seedProperty configuration parameter.

We will use the weighted variant of WCC.
Step 1
The following runs the algorithm in write mode:

CALL gds.wcc.write('myGraph', {
  writeProperty: 'componentId',
  relationshipWeightProperty: 'weight',
  threshold: 1.0
})
YIELD nodePropertiesWritten, componentCount;

| nodePropertiesWritten | componentCount |
| --- | --- |
| 6 | 3 |

Step 2
After the algorithm has finished writing to Neo4j we want to create a new node in the database.
MATCH (b:User {name: 'Bridget'})
CREATE (b)-[:LINK {weight: 2.0}]->(new:User {name: 'Mats'})
Step 3
Note that we cannot use our already projected graph as it does not contain the component ID. We will therefore project a second graph that contains the previously computed component ID.

CALL gds.graph.project(
  'myGraphseeded',
  'User',
  'LINK',
  {
    nodeProperties: 'componentId',
    relationshipProperties: 'weight'
  }
)
Step 4
The following runs the algorithm in stream mode using seedProperty:

CALL gds.wcc.stream('myGraphseeded', {
  seedProperty: 'componentId',
  relationshipWeightProperty: 'weight',
  threshold: 1.0
}) YIELD nodeId, componentId
RETURN gds.util.asNode(nodeId).name AS name, componentId
ORDER BY componentId, name

| name | componentId |
| --- | --- |
| "Alice" | 0 |
| "Charles" | 0 |
| "Bridget" | 1 |
| "Mats" | 1 |
| "Doug" | 3 |
| "Mark" | 3 |
| "Michael" | 3 |

The result shows that despite not having the seedProperty when it was projected, the node 'Mats' has been assigned to the same component as the node 'Bridget'.
This is correct because these two nodes are connected.
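The seeding rules described in this section (use the seed if present, otherwise assign a new unique ID, and keep the lower ID when components merge) can be sketched in Python as follows. This is a simplified illustration, not the GDS implementation: it propagates the lower ID until nothing changes instead of using union-find, the function name is hypothetical, and the way fresh IDs are chosen (one above the largest seed) is an assumption.

```python
def seeded_components(nodes, seeds, links):
    """Assign component IDs, honouring seeds and keeping the lower ID on merge.

    nodes: node names; seeds: name -> seeded component ID (may omit nodes);
    links: (source, target) pairs already filtered by weight.
    """
    # Assumption: unseeded nodes get fresh IDs above every seed value.
    next_id = max(seeds.values(), default=-1) + 1
    component = {}
    for node in nodes:
        if node in seeds:
            component[node] = seeds[node]
        else:
            component[node] = next_id
            next_id += 1

    # Repeatedly propagate the lower ID across links until nothing changes.
    changed = True
    while changed:
        changed = False
        for source, target in links:
            lower = min(component[source], component[target])
            if component[source] != lower or component[target] != lower:
                component[source] = component[target] = lower
                changed = True
    return component


seeds = {"Alice": 0, "Charles": 0, "Bridget": 1, "Doug": 3, "Mark": 3, "Michael": 3}
nodes = ["Alice", "Bridget", "Charles", "Doug", "Mark", "Michael", "Mats"]
links = [("Alice", "Charles"), ("Mark", "Doug"), ("Mark", "Michael"), ("Bridget", "Mats")]
print(seeded_components(nodes, seeds, links)["Mats"])  # 1
```

The unseeded node Mats starts with a fresh ID and then takes over the lower ID 1 from Bridget, while all seeded nodes keep their previous component IDs.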
3.8. Writing Seeded components
In the previous section we demonstrated the seedProperty usage in stream mode.
It is also available in the other modes of the algorithm.
Below is an example of how to use seedProperty in write mode.
Note that the example below relies on Steps 1-3 from the previous section.
The following runs the algorithm in write mode using seedProperty:

CALL gds.wcc.write('myGraphseeded', {
  seedProperty: 'componentId',
  writeProperty: 'componentId',
  relationshipWeightProperty: 'weight',
  threshold: 1.0
})
YIELD nodePropertiesWritten, componentCount;

| nodePropertiesWritten | componentCount |
| --- | --- |
| 1 | 3 |

If the 
3.9. Graph Sampling optimization
The WCC implementation provides two compute strategies:

The unsampled strategy as described in Wait-free Parallel Algorithms for the Union-Find Problem.

The sampled strategy as described in Optimizing Parallel Graph Connectivity Computation via Subgraph Sampling.

While both strategies provide very good performance, the sampled strategy is usually the faster one. Which strategy is used depends on the input graph. If the relationships of the graph are …

… undirected, the algorithm picks the sampled strategy.

… directed, the algorithm picks the unsampled strategy.

… directed and inverse indexed, the algorithm picks the sampled strategy.

The direction of a relationship is defined by the orientation which can be set during a graph projection.
While NATURAL and REVERSE orientation result in a directed graph, the UNDIRECTED orientation leads to undirected relationships.
In order to create a directed graph with inverse indexed relationships, one can use the indexInverse parameter as part of the relationship projection.
An inverse index allows the algorithm to traverse the relationships of a node according to the opposite orientation.
If the graph is projected using a NATURAL orientation, the inverse index represents the REVERSE orientation and vice versa.
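The selection rules listed above can be summarised as a small decision function. The Python sketch below only mirrors those rules for illustration; the function name and its parameters are hypothetical and do not correspond to anything in the library.

```python
def pick_compute_strategy(orientation, index_inverse=False):
    """Return 'sampled' or 'unsampled' following the rules listed above."""
    if orientation == "UNDIRECTED":
        return "sampled"
    if orientation in ("NATURAL", "REVERSE"):
        # Directed relationships need an inverse index for sampling.
        return "sampled" if index_inverse else "unsampled"
    raise ValueError(f"unknown orientation: {orientation!r}")


print(pick_compute_strategy("UNDIRECTED"))                   # sampled
print(pick_compute_strategy("NATURAL"))                      # unsampled
print(pick_compute_strategy("NATURAL", index_inverse=True))  # sampled
```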
The following projects the graph with inverse indexed relationships under the name myIndexedGraph:

CALL gds.graph.project(
  'myIndexedGraph',
  'User',
  {LINK: {orientation: 'NATURAL', indexInverse: true }}
)
The following query is identical to the stream example in the previous section.
This time, we execute WCC on myIndexedGraph which will allow the algorithm to use the sampled strategy.

CALL gds.wcc.stream('myIndexedGraph', {concurrency: 1, consecutiveIds: true})
YIELD nodeId, componentId
RETURN gds.util.asNode(nodeId).name AS name, componentId
ORDER BY componentId, name

| name | componentId |
| --- | --- |
| "Alice" | 0 |
| "Bridget" | 0 |
| "Charles" | 0 |
| "Doug" | 1 |
| "Mark" | 1 |
| "Michael" | 1 |

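The component IDs in this result are 0 and 1, whereas the earlier stream example returned 0 and 3; the query here sets consecutiveIds: true, which maps component identifiers into a consecutive ID space. A minimal Python sketch of such a remapping is shown below. The function name is hypothetical, and assigning new IDs in order of first appearance is an assumption, since the source does not describe the order GDS uses.

```python
def to_consecutive_ids(component_of):
    """Remap component IDs to 0, 1, 2, ... in order of first appearance."""
    mapping = {}
    remapped = {}
    for node, component_id in component_of.items():
        if component_id not in mapping:
            # First time this component is seen: give it the next free ID.
            mapping[component_id] = len(mapping)
        remapped[node] = mapping[component_id]
    return remapped


original = {"Alice": 0, "Bridget": 0, "Charles": 0, "Doug": 3, "Mark": 3, "Michael": 3}
print(to_consecutive_ids(original))
# {'Alice': 0, 'Bridget': 0, 'Charles': 0, 'Doug': 1, 'Mark': 1, 'Michael': 1}
```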
Because of the randomness in the Graph sampling optimization we are using 