This section describes the Label Propagation algorithm in the Neo4j Graph Algorithms library.
The Label Propagation algorithm (LPA) is a fast algorithm for finding communities in a graph. It detects these communities using network structure alone as its guide, and doesn’t require a pre-defined objective function or prior information about the communities.
One interesting feature of LPA is that nodes can be assigned preliminary labels to narrow down the range of solutions generated. This means that it can be used as semi-supervised way of finding communities where we hand-pick some initial communities.
LPA is a relatively new algorithm, and was only proposed by Raghavan et al in 2007, in "Near linear time algorithm to detect community structures in large-scale networks". It works by propagating labels throughout the network and forming communities based on this process of label propagation.
The intuition behind the algorithm is that a single label can quickly become dominant in a densely connected group of nodes, but will have trouble crossing a sparsely connected region. Labels will get trapped inside a densely connected group of nodes, and those nodes that end up with the same label when the algorithms finish can be considered part of the same community.
The algorithm works as follows:
As labels propagate, densely connected groups of nodes quickly reach a consensus on a unique label. At the end of the propagation only a few labels will remain - most will have disappeared. Nodes that have the same label at convergence are said to belong to the same community.
In contrast with other algorithms, label propagation can result in different community structures when run multiple times on the same graph. The range of solutions can be narrowed if some nodes are given preliminary labels, while others are unlabelled. Unlabelled nodes will be more likely to adapt the preliminary labels.
The following will create a sample graph:
MERGE (nAlice:User {id:'Alice'}) SET nAlice.seed_label=52
MERGE (nBridget:User {id:'Bridget'}) SET nBridget.seed_label=21
MERGE (nCharles:User {id:'Charles'}) SET nCharles.seed_label=43
MERGE (nDoug:User {id:'Doug'}) SET nDoug.seed_label=21
MERGE (nMark:User {id:'Mark'}) SET nMark.seed_label=19
MERGE (nMichael:User {id:'Michael'}) SET nMichael.seed_label=52
MERGE (nAlice)-[:FOLLOW]->(nBridget)
MERGE (nAlice)-[:FOLLOW]->(nCharles)
MERGE (nMark)-[:FOLLOW]->(nDoug)
MERGE (nBridget)-[:FOLLOW]->(nMichael)
MERGE (nDoug)-[:FOLLOW]->(nMark)
MERGE (nMichael)-[:FOLLOW]->(nAlice)
MERGE (nAlice)-[:FOLLOW]->(nMichael)
MERGE (nBridget)-[:FOLLOW]->(nAlice)
MERGE (nMichael)-[:FOLLOW]->(nBridget)
MERGE (nCharles)-[:FOLLOW]->(nDoug);
The following will run the algorithm and stream results:
CALL algo.labelPropagation.stream("User", "FOLLOW",
{direction: "OUTGOING", iterations: 10})
The following will run the algorithm and write back results:
CALL algo.labelPropagation('User', 'FOLLOW','OUTGOING',
{iterations:10,partitionProperty:'partition', write:true})
YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, write, partitionProperty;
Name | Partition |
---|---|
Alice |
5 |
Charles |
4 |
Bridget |
5 |
Michael |
5 |
Doug |
4 |
Mark |
4 |
Our algorithm found two communities, with 3 members each.
It appears that Michael, Bridget, and Alice belong together, as do Doug and Mark. Only Charles doesn’t strongly fit into either side, but ends up with Doug and Mark.
At the beginning of the algorithm, every node is initialized with unique label (called as identifier) and the labels propagate through the network.
It is possible to define preliminary labels (identifiers) of nodes using the partitionProperty
parameter.
We need to save a preliminary set of labels that we would like to run the Label Propagation algorithm with as a property of
nodes (must be a number).
In our example graph we saved them as the property seed_label
.
The algorithm first checks if there is a seed label assigned to the node, and loads it if there is one. If there isn’t one, it assigns the node new unique label (node ID is used). Using this preliminary set of labels (identifiers), it then sequentially updates each node’s label to a new one, which is the most frequent label among its neighbors at every label propagation step (iteration).
The following will run the algorithm with pre-defined labels:
CALL algo.labelPropagation('User', 'FOLLOW','OUTGOING',
{iterations:10,partitionProperty:'seed_label', write:true})
YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, write, partitionProperty;
The following will run the algorithm and write back results:
CALL algo.labelPropagation(label:String, relationship:String, direction:String, {iterations:1,
weightProperty:'weight', partitionProperty:'partition', write:true, concurrency:4})
YIELD nodes, iterations, didConverge, loadMillis, computeMillis, writeMillis, write, weightProperty, partitionProperty
Name | Type | Default | Optional | Description |
---|---|---|---|---|
label |
string |
null |
yes |
The label to load from the graph. If null, load all nodes |
relationship |
string |
null |
yes |
The relationship-type to load from the graph. If null, load all relationships |
direction |
string |
'OUTGOING' |
yes |
The relationship-direction to use in the algorithm |
concurrency |
int |
available CPUs |
yes |
The number of concurrent threads |
iterations |
int |
1 |
yes |
The maximum number of iterations to run |
weightProperty |
string |
'weight' |
yes |
The property name of node and/or relationship that contain weight. Must be numeric. |
partitionProperty |
string |
'partition' |
yes |
The property name written back to the partition of the graph in which the node reside. Can be used to define initial set of labels (must be a number) |
write |
boolean |
true |
yes |
Specifies if the result should be written back as a node property |
graph |
string |
'heavy' |
yes |
Use 'heavy' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node-statement and relationship-statement |
Name | Type | Description |
---|---|---|
loadMillis |
int |
Milliseconds for loading data |
computeMillis |
int |
Milliseconds for running the algorithm |
writeMillis |
int |
Milliseconds for writing result data back |
postProcessingMillis |
int |
Milliseconds for computing percentiles and community count |
nodes |
int |
The number of nodes considered |
communityCount |
int |
The number of communities found |
iterations |
int |
The number of iterations that were executed |
didConverge |
boolean |
True if the algorithm did converge to a stable labelling within the provided number of maximum iterations |
p1 |
double |
The 1 percentile of community size. |
p5 |
double |
The 5 percentile of community size. |
p10 |
double |
The 10 percentile of community size. |
p25 |
double |
The 25 percentile of community size. |
p50 |
double |
The 50 percentile of community size. |
p75 |
double |
The 75 percentile of community size. |
p90 |
double |
The 90 percentile of community size. |
p95 |
double |
The 95 percentile of community size. |
p99 |
double |
The 99 percentile of community size. |
p100 |
double |
The 100 percentile of community size. |
write |
boolean |
Specifies if the result was written back as a node property |
partitionProperty |
string |
The property name written back to |
weightProperty |
string |
The property name that contains weight |
The following will run the algorithm and stream back results:
CALL algo.labelPropagation.stream(label:String, relationship:String, {iterations:1,
weightProperty:'weight', partitionProperty:'partition', concurrency:4, direction:'OUTGOING'})
YIELD nodeId, label
Name | Type | Default | Optional | Description |
---|---|---|---|---|
label |
string |
null |
yes |
The label to load from the graph. If null, load all nodes |
relationship |
string |
null |
yes |
The relationship-type to load from the graph. If null, load all relationships |
direction |
string |
'OUTGOING' |
yes |
The relationship-direction to use in the algorithm |
concurrency |
int |
available CPUs |
yes |
The number of concurrent threads |
iterations |
int |
1 |
yes |
The maximum number of iterations to run |
weightProperty |
string |
'weight' |
yes |
The property name of node and/or relationship that contain weight. Must be numeric. |
partitionProperty |
string |
'partition' |
yes |
The property name written back to the partition of the graph in which the node reside. Can be used to define initial set of labels (must be a number) |
graph |
string |
'heavy' |
yes |
Use 'heavy' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node-statement and relationship-statement |
Name | Type | Description |
---|---|---|
nodeId |
int |
Node ID |
label |
int |
Community ID |
If label and relationship-type are not selective enough to describe your subgraph to run the algorithm on, you can use Cypher statements to load or project subsets of your graph. This can also be used to run algorithms on a virtual graph.
Set graph:'cypher'
in the config:
CALL algo.labelPropagation(
'MATCH (p:User) RETURN id(p) as id, p.weight as weight, id(p) as value',
'MATCH (p1:User)-[f:FRIEND]->(p2:User)
RETURN id(p1) as source, id(p2) as target, f.weight as weight',
"OUT",
{graph:'cypher',write:true});
The Label Propagation algorithm supports the following graph types:
✓ directed, unweighted:
✓ directed, weighted
✓ undirected, unweighted
✓ undirected, weighted