5.2. The Label Propagation algorithm

This section describes the Label Propagation algorithm in the Neo4j Graph Algorithms library.

The Label Propagation algorithm (LPA) is a fast algorithm for finding communities in a graph. It detects these communities using network structure alone as its guide, and doesn’t require a pre-defined objective function or prior information about the communities.

One interesting feature of LPA is that nodes can be assigned preliminary labels to narrow down the range of solutions generated. This means that it can be used as semi-supervised way of finding communities where we hand-pick some initial communities.

This section includes:

5.2.1. History and explanation

LPA is a relatively new algorithm, and was only proposed by Raghavan et al in 2007, in "Near linear time algorithm to detect community structures in large-scale networks". It works by propagating labels throughout the network and forming communities based on this process of label propagation.

The intuition behind the algorithm is that a single label can quickly become dominant in a densely connected group of nodes, but will have trouble crossing a sparsely connected region. Labels will get trapped inside a densely connected group of nodes, and those nodes that end up with the same label when the algorithms finish can be considered part of the same community.

The algorithm works as follows:

  • Every node is initialized with a unique label (an identifier).
  • These labels propagate through the network.
  • At every iteration of propagation, each node updates its label to the one that the maximum numbers of its neighbours belongs to. Ties are broken uniformly and randomly.
  • LPA reaches convergence when each node has the majority label of its neighbours.

As labels propagate, densely connected groups of nodes quickly reach a consensus on a unique label. At the end of the propagation only a few labels will remain - most will have disappeared. Nodes that have the same label at convergence are said to belong to the same community.

5.2.2. Use-cases - when to use the Label Propagation algorithm

5.2.3. Constraints - when not to use the Label Propagation algorithm

In contrast with other algorithms, label propagation can result in different community structures when run multiple times on the same graph. The range of solutions can be narrowed if some nodes are given preliminary labels, while others are unlabelled. Unlabelled nodes will be more likely to adapt the preliminary labels.

5.2.4. Label Propagation algorithm sample

label propagation

The following will create a sample graph: 

MERGE (nAlice:User {id:'Alice'}) SET nAlice.seed_label=52
MERGE (nBridget:User {id:'Bridget'}) SET nBridget.seed_label=21
MERGE (nCharles:User {id:'Charles'}) SET nCharles.seed_label=43
MERGE (nDoug:User {id:'Doug'}) SET nDoug.seed_label=21
MERGE (nMark:User {id:'Mark'}) SET nMark.seed_label=19
MERGE (nMichael:User {id:'Michael'}) SET nMichael.seed_label=52

MERGE (nAlice)-[:FOLLOW]->(nBridget)
MERGE (nAlice)-[:FOLLOW]->(nCharles)
MERGE (nMark)-[:FOLLOW]->(nDoug)
MERGE (nBridget)-[:FOLLOW]->(nMichael)
MERGE (nDoug)-[:FOLLOW]->(nMark)
MERGE (nMichael)-[:FOLLOW]->(nAlice)
MERGE (nAlice)-[:FOLLOW]->(nMichael)
MERGE (nBridget)-[:FOLLOW]->(nAlice)
MERGE (nMichael)-[:FOLLOW]->(nBridget)
MERGE (nCharles)-[:FOLLOW]->(nDoug);

The following will run the algorithm and stream results: 

CALL algo.labelPropagation.stream("User", "FOLLOW",
  {direction: "OUTGOING", iterations: 10})

The following will run the algorithm and write back results: 

CALL algo.labelPropagation('User', 'FOLLOW',
  {iterations:10, writeProperty:'partition', write:true, direction: 'OUTGOING'})
YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, write, writeProperty;

Table 5.8. Results
Name Partition

Alice

5

Charles

4

Bridget

5

Michael

5

Doug

4

Mark

4

Our algorithm found two communities, with 3 members each.

It appears that Michael, Bridget, and Alice belong together, as do Doug and Mark. Only Charles doesn’t strongly fit into either side, but ends up with Doug and Mark.

5.2.4.1. Using seed labels

At the beginning of the algorithm, every node is initialized with unique label (called as identifier) and the labels propagate through the network.

It is possible to define preliminary labels (identifiers) of nodes using the partitionProperty parameter. We need to save a preliminary set of labels that we would like to run the Label Propagation algorithm with as a property of nodes (must be a number). In our example graph we saved them as the property seed_label.

The algorithm first checks if there is a seed label assigned to the node, and loads it if there is one. If there isn’t one, it assigns the node new unique label (node ID is used). Using this preliminary set of labels (identifiers), it then sequentially updates each node’s label to a new one, which is the most frequent label among its neighbors at every label propagation step (iteration).

The following will run the algorithm with pre-defined labels: 

CALL algo.labelPropagation('User', 'FOLLOW',
  {iterations:10,partitionProperty:'seed_label', writeProperty: "partition", write:true, direction: 'OUTGOING'})
YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, write, writeProperty;

5.2.5. Cypher projection

If label and relationship-type are not selective enough to describe your subgraph to run the algorithm on, you can use Cypher statements to load or project subsets of your graph. This can also be used to run algorithms on a virtual graph. You can learn more in the Section 1.3.2, “Cypher projection” section of the manual.

Set graph:'cypher' in the config: 

CALL algo.labelPropagation(
  'MATCH (p:User) RETURN id(p) as id, p.weight as weight, id(p) as value',
  'MATCH (p1:User)-[f:FRIEND]->(p2:User)
   RETURN id(p1) as source, id(p2) as target, f.weight as weight',
  {graph:'cypher',write:true});

5.2.6. Syntax

The following will run the algorithm and write back results: 

CALL algo.labelPropagation(label:String, relationship:String, {iterations:1,
    weightProperty:'weight', writeProperty:'partition', write:true, concurrency:4})
YIELD nodes, iterations, didConverge, loadMillis, computeMillis, writeMillis, write, weightProperty, writeProperty

Table 5.9. Parameters
Name Type Default Optional Description

label

string

null

yes

The label to load from the graph. If null, load all nodes

relationship

string

null

yes

The relationship-type to load from the graph. If null, load all relationships

direction

string

'OUTGOING'

yes

The relationship-direction to use in the algorithm

concurrency

int

available CPUs

yes

The number of concurrent threads

iterations

int

1

yes

The maximum number of iterations to run

weightProperty

string

'weight'

yes

The property name of node and/or relationship that contain weight. Must be numeric.

partitionProperty

string

'partition'

yes

Used to define initial set of labels (must be a number)

writeProperty

string

'partition'

yes

The property name written back to the partition of the graph in which the node resides.

write

boolean

true

yes

Specifies if the result should be written back as a node property

graph

string

'heavy'

yes

Use 'heavy' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node-statement and relationship-statement

Table 5.10. Results
Name Type Description

loadMillis

int

Milliseconds for loading data

computeMillis

int

Milliseconds for running the algorithm

writeMillis

int

Milliseconds for writing result data back

postProcessingMillis

int

Milliseconds for computing percentiles and community count

nodes

int

The number of nodes considered

communityCount

int

The number of communities found

iterations

int

The number of iterations that were executed

didConverge

boolean

True if the algorithm did converge to a stable labelling within the provided number of maximum iterations

p1

double

The 1 percentile of community size.

p5

double

The 5 percentile of community size.

p10

double

The 10 percentile of community size.

p25

double

The 25 percentile of community size.

p50

double

The 50 percentile of community size.

p75

double

The 75 percentile of community size.

p90

double

The 90 percentile of community size.

p95

double

The 95 percentile of community size.

p99

double

The 99 percentile of community size.

p100

double

The 100 percentile of community size.

write

boolean

Specifies if the result was written back as a node property

writeProperty

string

The property name written back to

weightProperty

string

The property name that contains weight

The following will run the algorithm and stream back results: 

CALL algo.labelPropagation.stream(label:String, relationship:String, {iterations:1,
    weightProperty:'weight', writeProperty:'partition', concurrency:4, direction:'OUTGOING'})
YIELD nodeId, label

Table 5.11. Parameters
Name Type Default Optional Description

label

string

null

yes

The label to load from the graph. If null, load all nodes

relationship

string

null

yes

The relationship-type to load from the graph. If null, load all relationships

direction

string

'OUTGOING'

yes

The relationship-direction to use in the algorithm

concurrency

int

available CPUs

yes

The number of concurrent threads

iterations

int

1

yes

The maximum number of iterations to run

weightProperty

string

'weight'

yes

The property name of node and/or relationship that contain weight. Must be numeric.

partitionProperty

string

'partition'

yes

Used to define initial set of labels (must be a number)

writeProperty

string

'partition'

yes

The property name written back to the partition of the graph in which the node resides.

graph

string

'heavy'

yes

Use 'heavy' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node-statement and relationship-statement

Table 5.12. Results
Name Type Description

nodeId

int

Node ID

label

int

Community ID

5.2.7. Graph type support

The Label Propagation algorithm supports the following graph types:

  • ✓ directed, unweighted:

    • direction: 'INCOMING' or 'OUTGOING', weightProperty: null
  • ✓ directed, weighted

    • direction: 'INCOMING' or 'OUTGOING', weightProperty : 'weight'
  • ✓ undirected, unweighted

    • direction: 'BOTH', weightProperty: null
  • ✓ undirected, weighted

    • direction: 'BOTH', weightProperty: 'weight'