This section describes the Weakly Connected Components (WCC) algorithm in the Neo4j Graph Algorithms library.
This is documentation for the Graph Algorithms Library, which has been deprecated by the Graph Data Science Library (GDS). |
This topic includes:
The WCC algorithm finds sets of connected nodes in an undirected graph, where all nodes in the same set form a connected component. WCC is often used early in an analysis to understand the structure of a graph.
WCC has previously been known as Union Find or Connected Components in this User Guide.
Currently, the WCC algorithm still uses the syntax of algo.unionFind
.
For more information on this algorithm, see:
Running this algorithm requires sufficient memory availability. Before running this algorithm, we recommend that you read Section 2.4, “Memory Requirements”. |
The following describes the API for running the algorithm and writing results back to Neo4j:
CALL algo.unionFind(label: STRING, relationship: STRING, {
write: BOOLEAN,
writeProperty: STRING
// additional configuration
})
YIELD nodes, setCount, loadMillis, computeMillis, writeMillis
Name | Type | Default | Optional | Description |
---|---|---|---|---|
node label |
string |
|
yes |
The node label to load from the graph. If |
relationship |
string |
|
yes |
The relationship type to load from the graph. If |
config |
map |
|
yes |
Additional configuration, see below. |
Name | Type | Default | Optional | Description |
---|---|---|---|---|
|
int |
available CPUs |
yes |
The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency' and 'writeConcurrency'. This is dependent on the Neo4j edition; for more information, see Section 1.4.2, “CPU”. |
|
int |
value of 'concurrency' |
yes |
The number of concurrent threads used for reading the graph. |
|
int |
value of 'concurrency' |
yes |
The number of concurrent threads used for writing the result. |
|
string |
|
yes |
The property name that contains weight. If |
|
string |
n/a |
yes |
Used to set the initial community for a node. The property value needs to be a number. |
|
boolean |
|
yes |
Specifies if the result should be written back as a node property. |
|
string |
|
yes |
The property name written back the ID of the partition particular node belongs to. |
|
float |
|
yes |
The value of the weight above which the relationship is not thrown away. |
|
float |
|
yes |
The default value of the weight in case it is missing or invalid. |
|
boolean |
|
yes |
Community identifiers are mapped into a consecutive id space (requires additional memory). |
|
string |
|
yes |
Use |
Name | Type | Description |
---|---|---|
|
int |
Milliseconds for loading data. |
|
int |
Milliseconds for running the algorithm. |
|
int |
Milliseconds for writing result data back. |
|
int |
Milliseconds for computing percentiles and community count. |
|
int |
The number of nodes considered. |
|
int |
The number of communities found. |
|
double |
The 1 percentile of community size. |
|
double |
The 5 percentile of community size. |
|
double |
The 10 percentile of community size. |
|
double |
The 25 percentile of community size. |
|
double |
The 50 percentile of community size. |
|
double |
The 75 percentile of community size. |
|
double |
The 90 percentile of community size. |
|
double |
The 95 percentile of community size. |
|
double |
The 99 percentile of community size. |
|
double |
The 100 percentile of community size. |
|
boolean |
Specifies if the result was written back as a node property. |
|
string |
The property name written back to. |
The following describes the API for running the algorithm and stream results:
CALL algo.unionFind.stream(label: STRING, relationship: STRING, {
// configuration
})
YIELD nodeId, setId
Name | Type | Default | Optional | Description |
---|---|---|---|---|
node label |
string |
|
yes |
The node label to load from the graph. If null, load all nodes. |
relationship type |
string |
|
yes |
The relationship type to load from the graph. If null, load all relationships. |
config |
map |
|
yes |
Additional configuration, see below. |
Name | Type | Default | Optional | Description |
---|---|---|---|---|
|
int |
available CPUs |
yes |
The number of concurrent threads used for running the algorithm. Also provides the default value for 'readConcurrency'. This is dependent on the Neo4j edition; for more information, see Section 1.4.2, “CPU”. |
|
int |
value of 'concurrency' |
yes |
The number of concurrent threads used for reading the graph. |
|
string |
|
yes |
The property name that contains weight. If null, treats the graph as unweighted. Must be numeric. |
|
string |
n/a |
yes |
Used to set the initial community for a node. The property value needs to be a number. |
|
float |
|
yes |
The value of the weight above which the relationship is not thrown away. |
|
float |
|
yes |
The default value of the weight in case it is missing or invalid. |
|
boolean |
|
yes |
Community identifiers are mapped into a consecutive id space (requires additional memory). |
|
string |
|
yes |
Use |
Name | Type | Description |
---|---|---|
|
int |
Node ID |
|
int |
Partition ID |
Consider the graph created by the following Cypher statement:
CREATE (nAlice:User {name: 'Alice'})
CREATE (nBridget:User {name: 'Bridget'})
CREATE (nCharles:User {name: 'Charles'})
CREATE (nDoug:User {name: 'Doug'})
CREATE (nMark:User {name: 'Mark'})
CREATE (nMichael:User {name: 'Michael'})
CREATE (nAlice)-[:LINK {weight: 0.5}]->(nBridget)
CREATE (nAlice)-[:LINK {weight: 4}]->(nCharles)
CREATE (nMark)-[:LINK {weight: 1.1}]->(nDoug)
CREATE (nMark)-[:LINK {weight: 2}]->(nMichael);
This graph has two connected components, each with three nodes.
The relationships that connect the nodes in each component have a property weight
which determines the strength of the relationship.
In the following examples we will demonstrate using the Weakly Connected Components algorithm on this graph.
The following will load the graph, run the algorithm, and stream results:
CALL algo.unionFind.stream('User', 'LINK')
YIELD nodeId, setId
RETURN algo.asNode(nodeId).name AS Name, setId AS ComponentId
ORDER BY ComponentId, Name
Name | ComponentId |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
To instead write the component ID to a node property in the Neo4j graph, use this query:
The following will load the graph with weights, run the algorithm, and write back results:
CALL algo.unionFind('User', 'LINK', {
write: true,
writeProperty: 'componentId'
})
YIELD nodes AS Nodes, setCount AS NbrOfComponents, writeProperty AS PropertyName;
Nodes | NbrOfComponents | PropertyName |
---|---|---|
|
|
|
As we can see from the results, the nodes connected to one another are calculated by the algorithm as belonging to the same connected component.
By configuring the algorithm to use a weight we can increase granularity in the way the algorithm calculates component assignment.
We do this by specifying the property key with the weightProperty
configuration parameter.
Additionally, we can specify a threshold for the weight value in such a way that only weighs greater than the threshold value
will be considered by the algorithm.
We do this by specifying the threshold value with the threshold
configuration parameter.
If a relationship does not have a weight property, a default weight is used.
The default is zero, and can be configured to another value using the defaultValue
configuration parameter.
The following will load the graph with weights, run the algorithm, and stream results:
CALL algo.unionFind.stream('User', 'LINK', {
weightProperty: 'weight',
threshold: 1.0
})
YIELD nodeId, setId
RETURN algo.asNode(nodeId).name AS Name, setId AS ComponentId
ORDER BY ComponentId, Name
Name | ComponentId |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
The following will load the graph with weights, run the algorithm, and write back results:
CALL algo.unionFind('User', 'LINK', {
write: true,
writeProperty: "componentId",
weightProperty: 'weight',
threshold: 1.0
})
YIELD nodes AS Nodes, setCount AS NbrOfComponents, writeProperty AS PropertyName;
Nodes | NbrOfComponents | PropertyName |
---|---|---|
|
|
|
As we can see from the results, the node named 'Bridget' is now in its own component, due to its relationship weight being less than the configured threshold and thus ignored.
It is possible to define preliminary component IDs for nodes using the seedProperty
configuration parameter.
This is helpful if we want to retain components from a previous run and it is known that no components have been split by
removing relationships.
The property value needs to be a number.
The algorithm first checks if there is a seeded component ID assigned to the node. If there is one, that component ID is used. Otherwise, a new unique component ID is assigned to the node.
Once every node belongs to a component, the algorithm merges components of connected nodes. When components are merged, the resulting component is always the one with the lower component ID.
The algorithm assumes that nodes with the same seed value do in fact belong to the same component. If any two nodes in different components have the same seed, behavior is undefined. It is then recommended to run WCC without seeds. |
To show this in practice, we will run the algorithm, then add another node to our graph, then run the algorithm again with
the seedProperty
configuration parameter.
We will use the weighted variant of WCC.
The following will load the graph, run the algorithm, and write back results:
CALL algo.unionFind('User', 'LINK', {
write: true,
writeProperty: 'componentId',
weightProperty: 'weight',
threshold: 1.0
})
YIELD nodes AS Nodes, setCount AS NbrOfComponents, writeProperty AS PropertyName;
Nodes | NbrOfComponents | PropertyName |
---|---|---|
|
|
|
The following will create a new node in the Neo4j graph, with no component ID:
MATCH (b:User {name: 'Bridget'})
CREATE (b)-[:LINK {weight: 2.0}]->(new:User {name: 'Mats'})
No data returned. |
The following will load the graph, run the algorithm, and stream results:
CALL algo.unionFind.stream('User', 'LINK', {
seedProperty: 'componentId',
weightProperty: 'weight',
threshold: 1.0
})
YIELD nodeId, setId
RETURN algo.asNode(nodeId).name AS Name, setId AS ComponentId
ORDER BY ComponentId, Name
Name | ComponentId |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The following will load the graph, run the algorithm, and write back results:
CALL algo.unionFind('User', 'LINK', {
seedProperty: 'componentId',
weightProperty: 'weight',
threshold: 1.0,
write: true,
writeProperty: 'componentId'
})
YIELD nodes AS Nodes, setCount AS NbrOfComponents, writeProperty AS PropertyName;
Nodes | NbrOfComponents | PropertyName |
---|---|---|
|
|
|
If the |
In the examples above, we have relied on the implicit loading of graphs for the algorithm computation. However, like other algorithms WCC also accepts named graphs and Cypher projections as inputs. See Projected Graph Model for more details.
Using a named graph:
CALL algo.graph.load('myGraph', 'User', 'LINK');
CALL algo.unionFind.stream(null, null, {graph: 'myGraph'})
YIELD nodeId, setId
RETURN algo.asNode(nodeId).name AS Name, setId AS ComponentId
ORDER BY ComponentId, Name;
Name | ComponentId |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
As we can see, the results are identical to the results in the Section 6.3.3.1, “Unweighted” example.
Using a Cypher projection:
CALL algo.unionFind.stream(
'MATCH (u:User) RETURN id(u) AS id',
'MATCH (u1:User)-[:LINK]->(u2:User)
RETURN id(u1) AS source, id(u2) AS target',
{graph:'cypher'}
)
YIELD nodeId, setId
RETURN algo.asNode(nodeId).name AS Name, setId AS ComponentId
ORDER BY ComponentId, Name
Name | ComponentId |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
Again, results are identical, as the Cypher projection we use mimics the behaviour of the default loading configuration. Of course, the Cypher projection feature enables more advanced control over which exact parts of the graph to compute over; please see Section 2.2, “Cypher projection” for more details.