This section describes the Louvain algorithm in the Neo4j Graph Algorithms library.
The Louvain method of community detection is an algorithm for detecting communities in networks. It maximizes a modularity score for each community, where the modularity quantifies the quality of an assignment of nodes to communities by evaluating how much more densely connected the nodes within a community are, compared to how connected they would be in a random network.
The Louvain algorithm is one of the fastest modularity-based algorithms, and works well with large graphs. It also reveals a hierarchy of communities at different scales, which can be useful for understanding the global functioning of a network.
The "Louvain algorithm" was proposed in 2008 by authors from the University of Louvain.
The method consists of repeated application of two steps. The first step is a "greedy" assignment of nodes to communities, favoring local optimizations of modularity. The second step is the definition of a new coarse-grained network, based on the communities found in the first step. These two steps are repeated until no further modularity-increasing reassignments of communities are possible.
The algorithm is initialized with each node in its own community.
In the first stage we iterate through each of the nodes in the network. We take each node, remove it from its current community and replace it in the community of ones of its neighbors. We compute the modularity change for each of the node’s neighbors. If none of these modularity changes are positive, the node stays in its current community. If some of the modularity changes are positive, the node moves into the community where the modularity change is most positive. Ties are resolved arbitrarily. We repeat this process for each node until one pass through all nodes yields no community assignment changes.
The second stage in the Louvain method uses the communities that were discovered in the community reassignment stage, to define a new coarse-grained network. In this network, the newly discovered communities are the nodes. The relationship weight between the nodes representing two communities is the sum of the relationship weights between the lower-level nodes of each community.
The rest of the Louvain method consists of the repeated application of stages 1 and 2. By applying stage 1 (the community reassignment phase) to the coarse-grained graph, we find a second tier of communities of communities of nodes. Then, in the next application of stage 2, we define a new coarse-grained graph at this higher-level of the hierarchy. We keep going like this until an application of stage 1 yields no reassignments. At that point, repeated application of stages 1 and 2 will not yield any more modularity-optimizing changes, so the process is complete.
Although the Louvain method, and modularity optimization algorithms more generally, have found wide application across many domains, some problems with these algorithms have been identified:
This sample will explain the Louvain algorithm, using a simple graph:
The following will create a sample graph:
MERGE (nAlice:User {id:'Alice'})
MERGE (nBridget:User {id:'Bridget'})
MERGE (nCharles:User {id:'Charles'})
MERGE (nDoug:User {id:'Doug'})
MERGE (nMark:User {id:'Mark'})
MERGE (nMichael:User {id:'Michael'})
MERGE (nAlice)-[:FRIEND]->(nBridget)
MERGE (nAlice)-[:FRIEND]->(nCharles)
MERGE (nMark)-[:FRIEND]->(nDoug)
MERGE (nBridget)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nMark)
MERGE (nAlice)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nDoug);
The following will run the algorithm and stream results:
CALL algo.louvain.stream('User', 'FRIEND', {})
YIELD nodeId, community
RETURN algo.getNodeById(nodeId).id AS user, community
ORDER BY community;
The following will run the algorithm and write back results:
CALL algo.louvain('User', 'FRIEND',
{write:true, writeProperty:'community'})
YIELD nodes, communityCount, iterations, loadMillis, computeMillis, writeMillis;
Name | Community |
---|---|
Alice |
0 |
Bridget |
0 |
Michael |
0 |
Charles |
1 |
Doug |
1 |
Mark |
1 |
Our algorithm found two communities with 3 members each.
Mark, Doug, and Charles are all friends with each other, as are Bridget, Alice, and Michael. Charles is the only one who has friends in both communities, but he has more in community 4 so he fits better in that one.
This sample will explain the hierarchical communities option of the Louvain algorithm:
The following will create a sample graph:
MERGE (nAlice:User {id:'Alice'})
MERGE (nBridget:User {id:'Bridget'})
MERGE (nCharles:User {id:'Charles'})
MERGE (nDoug:User {id:'Doug'})
MERGE (nMark:User {id:'Mark'})
MERGE (nMichael:User {id:'Michael'})
MERGE (nKarin:User {id:'Karin'})
MERGE (nAmy:User {id:'Amy'})
MERGE (nAlice)-[:FRIEND]->(nBridget)
MERGE (nAlice)-[:FRIEND]->(nCharles)
MERGE (nMark)-[:FRIEND]->(nDoug)
MERGE (nBridget)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nMark)
MERGE (nAlice)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nDoug)
MERGE (nMark)-[:FRIEND]->(nKarin)
MERGE (nKarin)-[:FRIEND]->(nAmy)
MERGE (nAmy)-[:FRIEND]->(nDoug);
The following will run the algorithm and stream results:
CALL algo.louvain.stream('User', 'FRIEND', {includeIntermediateCommunities: true})
YIELD nodeId, communities
RETURN algo.getNodeById(nodeId).id AS user, communities
ORDER BY communities;
The following will run the algorithm and write back results:
CALL algo.louvain('User', 'FRIEND', {
write:true,
includeIntermediateCommunities: true,
intermediateCommunitiesWriteProperty: 'communities'
})
YIELD nodes, communityCount, iterations, loadMillis, computeMillis, writeMillis;
Name | Communities |
---|---|
Alice |
[0,0] |
Bridget |
[0,0] |
Michael |
[0,0] |
Charles |
[1,1] |
Doug |
[1,1] |
Mark |
[1,1] |
Karin |
[2,1] |
Amy |
[2,1] |
Our algorithm found two hierarchical levels of communities.
On the first level it found three communities with Alice, Bridget and Michael forming the first community, Charles, Doug and Mark forming the second one and Karin and Amy forming the third one. On the second level it found two communities. Alice, Bridget and Michael stay in the same community, but the other two communities merge into a single one.
This sample will explain the pre-defined communities option of the Louvain algorithm.
Initial or pre-defined communities for nodes are specified using the communityProperty
parameter.
The following will create a sample graph:
MERGE (nAlice:User {id:'Alice'}) SET nAlice.community = 0
MERGE (nBridget:User {id:'Bridget'}) SET nBridget.community = 0
MERGE (nCharles:User {id:'Charles'}) SET nCharles.community = 1
MERGE (nDoug:User {id:'Doug'}) SET nDoug.community = 1
MERGE (nMark:User {id:'Mark'}) SET nMark.community = 1
MERGE (nMichael:User {id:'Michael'}) SET nMichael.community = 0
MERGE (nKarin:User {id:'Karin'}) SET nKarin.community = 1
MERGE (nAmy:User {id:'Amy'})
MERGE (nAlice)-[:FRIEND]->(nBridget)
MERGE (nAlice)-[:FRIEND]->(nCharles)
MERGE (nMark)-[:FRIEND]->(nDoug)
MERGE (nBridget)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nMark)
MERGE (nAlice)-[:FRIEND]->(nMichael)
MERGE (nCharles)-[:FRIEND]->(nDoug)
MERGE (nMark)-[:FRIEND]->(nKarin)
MERGE (nKarin)-[:FRIEND]->(nAmy)
MERGE (nAmy)-[:FRIEND]->(nDoug);
The following will run the algorithm and stream results:
CALL algo.louvain.stream('User', 'FRIEND', {communityProperty: 'community'})
YIELD nodeId, communities
RETURN algo.getNodeById(nodeId).id AS user, communities
ORDER BY communities;
The following will run the algorithm and write back results:
CALL algo.louvain('User', 'FRIEND', {
write:true,
communityProperty: "community",
writeProperty: "newCommunity"
})
YIELD nodes, communityCount, iterations, loadMillis, computeMillis, writeMillis;
In this example the pre-defined communities are read from the community
property, and the communities computed by the algorithm are stored to the newCommunity
property.
Name | Community |
---|---|
Alice |
0 |
Bridget |
0 |
Michael |
0 |
Charles |
1 |
Doug |
1 |
Mark |
1 |
Karin |
1 |
Amy |
1 |
The following will run the algorithm and write back results:
CALL algo.louvain(label:String, relationship:String,
{weightProperty:'weight', defaultValue:1.0, write: true, writeProperty:'community', concurrency:4})
YIELD nodes, communityCount, iterations, loadMillis, computeMillis, writeMillis
Name | Type | Default | Optional | Description |
---|---|---|---|---|
label |
string |
null |
yes |
The label to load from the graph. If null, load all nodes |
relationship |
string |
null |
yes |
The relationship-type to load from the graph. If null, load all relationships |
weightProperty |
string |
null |
yes |
The property name that contains weight. If null, treats the graph as unweighted. Must be numeric. |
write |
boolean |
true |
yes |
Specifies if the result should be written back as a node property |
writeProperty |
string |
'community' |
yes |
The property name written back to the ID of the community that particular node belongs to |
communityProperty |
string |
null |
yes |
The property name that contains an initial or pre-defined community (must be a number) |
defaultValue |
float |
null |
yes |
The default value of the weight in case it is missing or invalid |
includeIntermediateCommunities |
boolean |
false |
yes |
Specifies whether an array of intermediate communities should be returned |
intermediateCommunitiesWriteProperty |
string |
'communities' |
yes |
The property name written back to the ID of the intermediate communities that particular node belongs to |
concurrency |
int |
available CPUs |
yes |
The number of concurrent threads |
graph |
string |
'heavy' |
yes |
Use 'heavy' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node-statement and relationship-statement |
Name | Type | Description |
---|---|---|
loadMillis |
int |
Milliseconds for loading data |
computeMillis |
int |
Milliseconds for running the algorithm |
writeMillis |
int |
Milliseconds for writing result data back |
postProcessingMillis |
int |
Milliseconds for computing percentiles and community count |
nodes |
int |
The number of nodes considered |
communityCount |
int |
The number of communities found |
iterations |
int |
The number of iterations run |
modularities |
list of double |
List of modularities at each iteration |
modularity |
double |
Modularity after final iteration |
p1 |
double |
The 1 percentile of community size. |
p5 |
double |
The 5 percentile of community size. |
p10 |
double |
The 10 percentile of community size. |
p25 |
double |
The 25 percentile of community size. |
p50 |
double |
The 50 percentile of community size. |
p75 |
double |
The 75 percentile of community size. |
p90 |
double |
The 90 percentile of community size. |
p95 |
double |
The 95 percentile of community size. |
p99 |
double |
The 99 percentile of community size. |
p100 |
double |
The 100 percentile of community size. |
write |
boolean |
Specifies if the result was written back as a node property |
writeProperty |
string |
The property name written back to |
includeIntermediateCommunities |
boolean |
Specifies if the intermediate communities were written back as a node property |
intermediateCommunitiesWriteProperty |
string |
The property name written back to for intermediate communities |
The following will run the algorithm and stream results:
CALL algo.louvain.stream(label:String, relationship:String,
{weightProperty:'propertyName', defaultValue:1.0, concurrency:4})
YIELD nodeId, community - yields a community to each node id
Name | Type | Default | Optional | Description |
---|---|---|---|---|
label |
string |
null |
yes |
The label to load from the graph. If null, load all nodes |
relationship |
string |
null |
yes |
The relationship-type to load from the graph. If null, load all relationships |
weightProperty |
string |
null |
yes |
The property name that contains weight. If null, treats the graph as unweighted. Must be numeric. |
defaultValue |
float |
1.0 |
yes |
The default value of the weight if it is missing or invalid |
graph |
string |
'heavy' |
yes |
Use 'heavy' when describing the subset of the graph with label and relationship-type parameter. Use 'cypher' for describing the subset with cypher node-statement and relationship-statement |
Name | Type | Description |
---|---|---|
nodeId |
int |
Node ID |
community |
int |
Community ID after final iteration |
communities |
list of int |
Community IDs for each iteration |
If our projected graph contains more than 2 billion nodes or relationships, we need to use huge graph projection, as the default label and relationship-type projection has a limitation of 2 billion nodes and 2 billion relationships.
Set graph:'huge'
in the config:
CALL algo.louvain('User', 'FRIEND',{graph:'huge'})
YIELD nodes, communityCount, iterations, loadMillis, computeMillis, writeMillis;
If label and relationship-type are not selective enough to describe your subgraph to run the algorithm on, you can use Cypher statements to load or project subsets of your graph. This can also be used to run algorithms on a virtual graph.
Set graph:'cypher'
in the config:
CALL algo.louvain(
'MATCH (p:User) RETURN id(p) as id',
'MATCH (p1:User)-[f:FRIEND]-(p2:User)
RETURN id(p1) as source, id(p2) as target, f.weight as weight',
{graph:'cypher',write:true});
The Louvain algorithm supports the following graph types:
✓ undirected, unweighted
✓ undirected, weighted