Label Propagation
This section describes the Label Propagation algorithm in the Neo4j Graph Data Science library.
1. Introduction
The Label Propagation algorithm (LPA) is a fast algorithm for finding communities in a graph. It detects these communities using network structure alone as its guide, and doesn’t require a predefined objective function or prior information about the communities.
LPA works by propagating labels throughout the network and forming communities based on this process of label propagation.
The intuition behind the algorithm is that a single label can quickly become dominant in a densely connected group of nodes, but will have trouble crossing a sparsely connected region. Labels will get trapped inside a densely connected group of nodes, and those nodes that end up with the same label when the algorithm finishes can be considered part of the same community.
The algorithm works as follows:

Every node is initialized with a unique community label (an identifier).

These labels propagate through the network.

At every iteration of propagation, each node updates its label to the one that the maximum number of its neighbours belong to. Ties are broken arbitrarily but deterministically.

LPA reaches convergence when each node has the majority label of its neighbours.

LPA stops when either convergence or the user-defined maximum number of iterations is reached.
As labels propagate, densely connected groups of nodes quickly reach a consensus on a unique label. At the end of the propagation only a few labels will remain; most will have disappeared. Nodes that have the same community label at convergence are said to belong to the same community.
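The propagation steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the library's implementation: the function name label_propagation, the undirected adjacency dictionary, the in-place update order (sorted node IDs) and the tie-breaking rule (keep the current label if it is among the most frequent, otherwise take the largest) are all assumptions made for this example.

```python
from collections import Counter


def label_propagation(neighbors, max_iterations=10):
    """Return (labels, ran_iterations, did_converge) for an adjacency dict.

    Assumptions for this sketch: undirected adjacency lists, in-place updates
    in sorted node order, and a simple deterministic tie-break.
    """
    # Every node starts with a unique label (its own ID).
    labels = {node: node for node in neighbors}
    for iteration in range(1, max_iterations + 1):
        changed = False
        for node in sorted(neighbors):
            counts = Counter(labels[n] for n in neighbors[node])
            if not counts:
                continue  # isolated nodes keep their label
            top = max(counts.values())
            candidates = [label for label, c in counts.items() if c == top]
            if labels[node] in candidates:
                continue  # already holds a most frequent label
            labels[node] = max(candidates)  # deterministic tie-break
            changed = True
        if not changed:
            return labels, iteration, True
    return labels, max_iterations, False


# Hypothetical graph: two triangles (0, 1, 2) and (3, 4, 5) joined by edge 2-3.
neighbors = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
labels, ran_iterations, did_converge = label_propagation(neighbors)
# labels == {0: 2, 1: 2, 2: 2, 3: 5, 4: 5, 5: 5}; converged in 2 iterations
```

Each triangle settles on its own label because the single bridging edge is outvoted by the two edges inside the triangle. With a different tie-breaking rule or update order, the outcome on such a small graph can differ.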
One interesting feature of LPA is that nodes can be assigned preliminary labels to narrow down the range of solutions generated. This means that it can be used as a semi-supervised way of finding communities, where we hand-pick some initial communities.
For more information on this algorithm, see:
Running this algorithm requires sufficient memory availability. Before running this algorithm, we recommend that you read Memory Estimation. 
2. Syntax
This section covers the syntax used to execute the Label Propagation algorithm in each of its execution modes. We are describing the named graph variant of the syntax. To learn more about general syntax variants, see Syntax overview.
CALL gds.labelPropagation.stream(
  graphName: String,
  configuration: Map
)
YIELD
  nodeId: Integer,
  communityId: Integer

| Name | Type | Default | Optional | Description |
| graphName | String | | no | The name of a graph stored in the catalog. |
| configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |

| Name | Type | Default | Optional | Description |
| | List of String | | yes | Filter the named graph using the given node labels. |
| | List of String | | yes | Filter the named graph using the given relationship types. |
| | Integer | | yes | The number of concurrent threads used for running the algorithm. |

| Name | Type | Default | Optional | Description |
| | Integer | 10 | yes | The maximum number of iterations to run. |
| nodeWeightProperty | String | null | yes | The name of a node property that contains node weights. |
| relationshipWeightProperty | String | null | yes | Name of the relationship property to use as weights. If unspecified, the algorithm runs unweighted. |
| seedProperty | String | n/a | yes | The name of a node property that defines an initial numeric label. |
| consecutiveIds | Boolean | false | yes | Flag to decide whether component identifiers are mapped into a consecutive id space (requires additional memory). |

| Name | Type | Description |
| nodeId | Integer | Node ID. |
| communityId | Integer | Community ID. |
CALL gds.labelPropagation.stats(
  graphName: String,
  configuration: Map
)
YIELD
  preProcessingMillis: Integer,
  computeMillis: Integer,
  postProcessingMillis: Integer,
  communityCount: Integer,
  ranIterations: Integer,
  didConverge: Boolean,
  communityDistribution: Map,
  configuration: Map

| Name | Type | Default | Optional | Description |
| graphName | String | | no | The name of a graph stored in the catalog. |
| configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |

| Name | Type | Default | Optional | Description |
| | List of String | | yes | Filter the named graph using the given node labels. |
| | List of String | | yes | Filter the named graph using the given relationship types. |
| | Integer | | yes | The number of concurrent threads used for running the algorithm. |

| Name | Type | Default | Optional | Description |
| | Integer | 10 | yes | The maximum number of iterations to run. |
| nodeWeightProperty | String | null | yes | The name of a node property that contains node weights. |
| relationshipWeightProperty | String | null | yes | Name of the relationship property to use as weights. If unspecified, the algorithm runs unweighted. |
| seedProperty | String | n/a | yes | The name of a node property that defines an initial numeric label. |
| consecutiveIds | Boolean | false | yes | Flag to decide whether component identifiers are mapped into a consecutive id space (requires additional memory). |

| Name | Type | Description |
| preProcessingMillis | Integer | Milliseconds for preprocessing the data. |
| computeMillis | Integer | Milliseconds for running the algorithm. |
| postProcessingMillis | Integer | Milliseconds for computing percentiles and community count. |
| communityCount | Integer | The number of communities found. |
| ranIterations | Integer | The number of iterations that were executed. |
| didConverge | Boolean | True if the algorithm did converge to a stable labelling within the provided number of maximum iterations. |
| communityDistribution | Map | Map containing min, max, mean as well as p50, p75, p90, p95, p99 and p999 percentile values of community size. |
| configuration | Map | The configuration used for running the algorithm. |
CALL gds.labelPropagation.mutate(
  graphName: String,
  configuration: Map
)
YIELD
  preProcessingMillis: Integer,
  computeMillis: Integer,
  mutateMillis: Integer,
  postProcessingMillis: Integer,
  nodePropertiesWritten: Integer,
  communityCount: Integer,
  ranIterations: Integer,
  didConverge: Boolean,
  communityDistribution: Map,
  configuration: Map

| Name | Type | Default | Optional | Description |
| graphName | String | | no | The name of a graph stored in the catalog. |
| configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |

| Name | Type | Default | Optional | Description |
| | List of String | | yes | Filter the named graph using the given node labels. |
| | List of String | | yes | Filter the named graph using the given relationship types. |
| | Integer | | yes | The number of concurrent threads used for running the algorithm. |
| mutateProperty | String | | no | The node property in the GDS graph to which the community ID is written. |

| Name | Type | Default | Optional | Description |
| | Integer | 10 | yes | The maximum number of iterations to run. |
| nodeWeightProperty | String | null | yes | The name of a node property that contains node weights. |
| relationshipWeightProperty | String | null | yes | Name of the relationship property to use as weights. If unspecified, the algorithm runs unweighted. |
| seedProperty | String | n/a | yes | The name of a node property that defines an initial numeric label. |
| consecutiveIds | Boolean | false | yes | Flag to decide whether component identifiers are mapped into a consecutive id space (requires additional memory). |

| Name | Type | Description |
| preProcessingMillis | Integer | Milliseconds for preprocessing the data. |
| computeMillis | Integer | Milliseconds for running the algorithm. |
| mutateMillis | Integer | Milliseconds for adding properties to the in-memory graph. |
| postProcessingMillis | Integer | Milliseconds for computing percentiles and community count. |
| nodePropertiesWritten | Integer | The number of node properties written. |
| communityCount | Integer | The number of communities found. |
| ranIterations | Integer | The number of iterations that were executed. |
| didConverge | Boolean | True if the algorithm did converge to a stable labelling within the provided number of maximum iterations. |
| communityDistribution | Map | Map containing min, max, mean as well as p50, p75, p90, p95, p99 and p999 percentile values of community size. |
| configuration | Map | The configuration used for running the algorithm. |
CALL gds.labelPropagation.write(
  graphName: String,
  configuration: Map
)
YIELD
  preProcessingMillis: Integer,
  computeMillis: Integer,
  writeMillis: Integer,
  postProcessingMillis: Integer,
  nodePropertiesWritten: Integer,
  communityCount: Integer,
  ranIterations: Integer,
  didConverge: Boolean,
  communityDistribution: Map,
  configuration: Map

| Name | Type | Default | Optional | Description |
| graphName | String | | no | The name of a graph stored in the catalog. |
| configuration | Map | | yes | Configuration for algorithm-specifics and/or graph filtering. |

| Name | Type | Default | Optional | Description |
| | List of String | | yes | Filter the named graph using the given node labels. |
| | List of String | | yes | Filter the named graph using the given relationship types. |
| | Integer | | yes | The number of concurrent threads used for running the algorithm. Also provides the default value for 'writeConcurrency'. |
| writeConcurrency | Integer | | yes | The number of concurrent threads used for writing the result to Neo4j. |
| writeProperty | String | | no | The node property in the Neo4j database to which the community ID is written. |

| Name | Type | Default | Optional | Description |
| | Integer | 10 | yes | The maximum number of iterations to run. |
| nodeWeightProperty | String | null | yes | The name of a node property that contains node weights. |
| relationshipWeightProperty | String | null | yes | Name of the relationship property to use as weights. If unspecified, the algorithm runs unweighted. |
| seedProperty | String | n/a | yes | The name of a node property that defines an initial numeric label. |
| consecutiveIds | Boolean | false | yes | Flag to decide whether component identifiers are mapped into a consecutive id space (requires additional memory). |
| minCommunitySize | Integer | 0 | yes | Only community ids of communities with a size greater than or equal to the given value are written to Neo4j. |

| Name | Type | Description |
| preProcessingMillis | Integer | Milliseconds for preprocessing the data. |
| computeMillis | Integer | Milliseconds for running the algorithm. |
| writeMillis | Integer | Milliseconds for writing result data back. |
| postProcessingMillis | Integer | Milliseconds for computing percentiles and community count. |
| nodePropertiesWritten | Integer | The number of node properties written. |
| communityCount | Integer | The number of communities found. |
| ranIterations | Integer | The number of iterations that were executed. |
| didConverge | Boolean | True if the algorithm did converge to a stable labelling within the provided number of maximum iterations. |
| communityDistribution | Map | Map containing min, max, mean as well as p50, p75, p90, p95, p99 and p999 percentile values of community size. |
| configuration | Map | The configuration used for running the algorithm. |
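The effect of the minCommunitySize option in write mode can be pictured as a filter over the computed node-to-community mapping. The following Python sketch only illustrates the rule stated in the description (size greater than or equal to the given value); the function name filter_small_communities and the sample mapping are hypothetical, and the library's internal handling may differ.

```python
from collections import Counter


def filter_small_communities(community_by_node, min_community_size=0):
    """Keep only nodes whose community has at least min_community_size members."""
    sizes = Counter(community_by_node.values())
    return {
        node: community
        for node, community in community_by_node.items()
        if sizes[community] >= min_community_size
    }


# Hypothetical result: community 7 has three members, community 9 has one.
communities = {"a": 7, "b": 7, "c": 7, "d": 9}
written = filter_small_communities(communities, min_community_size=2)
# written == {"a": 7, "b": 7, "c": 7}
```

With the default value of 0, every community passes the filter and all community IDs are kept.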
3. Examples
In this section we will show examples of running the Label Propagation algorithm on a concrete graph. The intention is to illustrate what the results look like and to provide a guide on how to make use of the algorithm in a real setting. We will do this on a small social network graph of a handful of nodes connected in a particular pattern. The example graph is created with the following Cypher statement:
CREATE
  (alice:User {name: 'Alice', seed_label: 52}),
  (bridget:User {name: 'Bridget', seed_label: 21}),
  (charles:User {name: 'Charles', seed_label: 43}),
  (doug:User {name: 'Doug', seed_label: 21}),
  (mark:User {name: 'Mark', seed_label: 19}),
  (michael:User {name: 'Michael', seed_label: 52}),
  (alice)-[:FOLLOW {weight: 1}]->(bridget),
  (alice)-[:FOLLOW {weight: 10}]->(charles),
  (mark)-[:FOLLOW {weight: 1}]->(doug),
  (bridget)-[:FOLLOW {weight: 1}]->(michael),
  (doug)-[:FOLLOW {weight: 1}]->(mark),
  (michael)-[:FOLLOW {weight: 1}]->(alice),
  (alice)-[:FOLLOW {weight: 1}]->(michael),
  (bridget)-[:FOLLOW {weight: 1}]->(alice),
  (michael)-[:FOLLOW {weight: 1}]->(bridget),
  (charles)-[:FOLLOW {weight: 1}]->(doug)
This graph represents six users, some of whom follow each other.
Besides a name property, each user also has a seed_label property.
The seed_label property represents a value in the graph used to seed the node with a label.
For example, this can be a result from a previous run of the Label Propagation algorithm.
In addition, each relationship has a weight property.
In the examples below we will use named graphs and native projections as the norm. However, Cypher projections can also be used. 
CALL gds.graph.project(
  'myGraph',
  'User',
  'FOLLOW',
  {
    nodeProperties: 'seed_label',
    relationshipProperties: 'weight'
  }
)
In the following examples we will demonstrate using the Label Propagation algorithm on this graph.
3.1. Memory Estimation
First off, we will estimate the cost of running the algorithm using the estimate procedure.
This can be done with any execution mode.
We will use the write mode in this example.
Estimating the algorithm is useful to understand the memory impact that running the algorithm on your graph will have.
When you later actually run the algorithm in one of the execution modes, the system will perform an estimation.
If the estimation shows that there is a very high probability of the execution going over its memory limitations, the execution is prohibited.
To read more about this, see Automatic estimation and execution blocking.
For more details on estimate in general, see Memory Estimation.
CALL gds.labelPropagation.write.estimate('myGraph', { writeProperty: 'community' })
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory
| nodeCount | relationshipCount | bytesMin | bytesMax | requiredMemory |
| 6 | 10 | 1608 | 1608 | "1608 Bytes" |
3.2. Stream
In the stream execution mode, the algorithm returns the community ID for each node.
This allows us to inspect the results directly or post-process them in Cypher without any side effects.
For example, we can order the results to see the nodes that belong to the same communities displayed next to each other.
For more details on the stream mode in general, see Stream.
CALL gds.labelPropagation.stream('myGraph')
YIELD nodeId, communityId AS Community
RETURN gds.util.asNode(nodeId).name AS Name, Community
ORDER BY Community, Name
| Name | Community |
| "Alice" | 1 |
| "Bridget" | 1 |
| "Michael" | 1 |
| "Charles" | 4 |
| "Doug" | 4 |
| "Mark" | 4 |
In the above example we can see that our graph has two communities, each containing three nodes.
The default behaviour of the algorithm is to run unweighted, i.e. without using node or relationship weights.
The weighted option will be demonstrated in Weighted.
3.3. Stats
In the stats execution mode, the algorithm returns a single row containing a summary of the algorithm result.
This execution mode does not have any side effects.
It can be useful for evaluating algorithm performance by inspecting the computeMillis return item.
In the examples below we will omit returning the timings.
The full signature of the procedure can be found in the syntax section.
For more details on the stats mode in general, see Stats.
Running the algorithm in stats mode:
CALL gds.labelPropagation.stats('myGraph')
YIELD communityCount, ranIterations, didConverge
| communityCount | ranIterations | didConverge |
| 2 | 3 | true |
As we can see from the example above, the algorithm finds two communities and converges in three iterations.
Note that we ran the algorithm unweighted.
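As a cross-check, the community count reported by stats can be recomputed from the rows returned by the stream example. The Python sketch below does this by hand; the rows are copied from the stream result table, while the function name summarize and the choice to report only min, max and mean (leaving out the percentiles, whose exact computation is not described here) are assumptions of this example.

```python
from collections import Counter

# Rows copied from the stream example's result table: (name, communityId).
rows = [
    ("Alice", 1), ("Bridget", 1), ("Michael", 1),
    ("Charles", 4), ("Doug", 4), ("Mark", 4),
]


def summarize(rows):
    """Count communities and describe their sizes."""
    sizes = Counter(community for _, community in rows)
    ordered = sorted(sizes.values())
    return {
        "communityCount": len(sizes),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": sum(ordered) / len(ordered),
    }


summary = summarize(rows)
# summary == {"communityCount": 2, "min": 3, "max": 3, "mean": 3.0}
```

The two communities of three nodes each agree with the communityCount value of 2 shown above.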
3.4. Mutate
The mutate execution mode extends the stats mode with an important side effect: updating the named graph with a new node property containing the community ID for that node.
The name of the new property is specified using the mandatory configuration parameter mutateProperty.
The result is a single summary row, similar to stats, but with some additional metrics.
The mutate mode is especially useful when multiple algorithms are used in conjunction.
For more details on the mutate mode in general, see Mutate.
CALL gds.labelPropagation.mutate('myGraph', { mutateProperty: 'community' })
YIELD communityCount, ranIterations, didConverge
| communityCount | ranIterations | didConverge |
| 2 | 3 | true |
The returned result is the same as in the stats example.
Additionally, the graph 'myGraph' now has a node property community which stores the community ID for each node.
To find out how to inspect the new schema of the in-memory graph, see Listing graphs.
3.5. Write
The write execution mode extends the stats mode with an important side effect: writing the community ID for each node as a property to the Neo4j database.
The name of the new property is specified using the mandatory configuration parameter writeProperty.
The result is a single summary row, similar to stats, but with some additional metrics.
The write mode enables directly persisting the results to the database.
For more details on the write mode in general, see Write.
CALL gds.labelPropagation.write('myGraph', { writeProperty: 'community' })
YIELD communityCount, ranIterations, didConverge
| communityCount | ranIterations | didConverge |
| 2 | 3 | true |
The returned result is the same as in the stats example.
Additionally, each of the six nodes now has a new property community in the Neo4j database, containing the community ID for that node.
3.6. Weighted
The Label Propagation algorithm can also be configured to take node and/or relationship weights into account.
By specifying a node weight via the nodeWeightProperty key, we can control the influence of a node's community on its neighbors.
During the computation of the weight of a specific community, the node property will be multiplied by the weight of that node's relationships.
When we projected myGraph, we also projected the relationship property weight.
In order to tell the algorithm to consider this property as a relationship weight, we have to set the relationshipWeightProperty configuration parameter to weight.
CALL gds.labelPropagation.stream('myGraph', { relationshipWeightProperty: 'weight' })
YIELD nodeId, communityId AS Community
RETURN gds.util.asNode(nodeId).name AS Name, Community
ORDER BY Community, Name
| Name | Community |
| "Bridget" | 2 |
| "Michael" | 2 |
| "Alice" | 4 |
| "Charles" | 4 |
| "Doug" | 4 |
| "Mark" | 4 |
Compared to the unweighted run of the algorithm, we still have two communities, but they contain two and four nodes respectively.
Using the weighted relationships, the nodes Alice and Charles are now in the same community, as there is a strong link between them.
We have used the stream mode to demonstrate running the algorithm using weights; the configuration parameters are available for all the modes of the algorithm.
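To see why the heavy relationship pulls Alice towards Charles, a single weighted vote can be worked through by hand. The Python sketch below multiplies each relationship weight by the neighbouring node's weight and sums the products per label, as described above. Several things are assumptions made purely for illustration: the function name weighted_vote, the placeholder labels "B" and "C", the premise that Bridget and Michael already share a label at the moment of the vote, node weights of 1.0, voting over Alice's outgoing relationships, and breaking ties by the smallest label. The relationship weights (1, 10, 1) come from the example graph.

```python
from collections import defaultdict


def weighted_vote(votes):
    """Pick a label from (neighbor_label, relationship_weight, node_weight) votes."""
    scores = defaultdict(float)
    for label, relationship_weight, node_weight in votes:
        # Each neighbour contributes relationship weight times node weight.
        scores[label] += relationship_weight * node_weight
    top = max(scores.values())
    # Assumed tie-break: smallest label among the top scorers.
    return min(label for label, score in scores.items() if score == top)


# Alice -> Bridget (weight 1), Alice -> Charles (weight 10), Alice -> Michael (weight 1).
alice_weighted = [("B", 1.0, 1.0), ("C", 10.0, 1.0), ("B", 1.0, 1.0)]
# The same votes with every relationship weight set to 1 (unweighted run).
alice_unweighted = [(label, 1.0, 1.0) for label, _, _ in alice_weighted]

weighted_choice = weighted_vote(alice_weighted)      # "C": 10.0 beats 2.0
unweighted_choice = weighted_vote(alice_unweighted)  # "B": 2.0 beats 1.0
```

Under these assumptions the unweighted vote keeps Alice with Bridget and Michael, while the weighted vote moves her to Charles's label, which matches the shift seen between the two result tables.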

3.7. Seeded communities
At the beginning of the algorithm computation, every node is initialized with a unique label, and the labels propagate through the network.
An initial set of labels can be provided by setting the seedProperty configuration parameter.
When we projected myGraph, we also projected the node property seed_label.
We can use this node property as seedProperty.
The algorithm first checks if there is a seed label assigned to the node. If no seed label is present, the algorithm assigns a new unique label to the node. Using this preliminary set of labels, it then sequentially updates each node’s label to a new one, which is the most frequent label among its neighbors at every iteration of label propagation.
To retain the seeding values, the consecutiveIds configuration option cannot be used in combination with seedProperty.

CALL gds.labelPropagation.stream('myGraph', { seedProperty: 'seed_label' })
YIELD nodeId, communityId AS Community
RETURN gds.util.asNode(nodeId).name AS Name, Community
ORDER BY Community, Name
| Name | Community |
| "Charles" | 19 |
| "Doug" | 19 |
| "Mark" | 19 |
| "Alice" | 21 |
| "Bridget" | 21 |
| "Michael" | 21 |
As we can see, the communities are based on the seed_label property: concretely, 19 is from the node Mark and 21 from Doug.
We have used the stream mode to demonstrate running the algorithm using seedProperty; this configuration parameter is available for all the modes of the algorithm.
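The initial labelling step for a seeded run can be sketched as follows. The seed values are taken from the example graph, except that Charles's seed is deliberately left out here (a hypothetical change) to show the fallback for unseeded nodes. The function name initial_labels and the choice to start fresh labels just above the largest seed are assumptions of this sketch; the text above only states that an unseeded node receives a new unique label.

```python
def initial_labels(nodes, seeds):
    """Use the seed label where present, otherwise assign a fresh unique label."""
    # Assumption: fresh labels start above the largest seed to avoid collisions.
    next_label = max(seeds.values(), default=-1) + 1
    labels = {}
    for node in nodes:
        if node in seeds:
            labels[node] = seeds[node]
        else:
            labels[node] = next_label
            next_label += 1
    return labels


nodes = ["Alice", "Bridget", "Charles", "Doug", "Mark", "Michael"]
# seed_label values from the example graph; Charles is omitted on purpose.
seeds = {"Alice": 52, "Bridget": 21, "Doug": 21, "Mark": 19, "Michael": 52}

labels = initial_labels(nodes, seeds)
# labels["Charles"] == 53; every other node keeps its seed_label
```

Nodes that share a seed, such as Bridget and Doug with 21, start out in the same community, which is what makes seeding usable as a semi-supervised starting point.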
